[m-rev.] for review: Implement map.equal/2

Paul Bone paul at bone.id.au
Sat Oct 19 17:56:17 AEDT 2013
Previous message: [m-rev.] for review: Implement map.equal/2
Next message: [m-rev.] diff: allow the target architecture to be set in the environment
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Zoltan,

It's good to hear from you.

On Sat, Oct 19, 2013 at 10:14:14AM +1100, Zoltan Somogyi wrote:
> This means that two maps with the same
> > +% key-value pairs that were constructed differently may fail to unify or
> > +% compare as unequal, for example if items are inserted into one of the maps
> > +% in a different order.
> 
> Given the explanation in the second half of the sentence, you don't need
> "that were constructed differently". However, you do need to use past tense:
> "if items WERE inserted".
> 
> > This behaviour may change in the future.
> 
> What behavior?

I was referring to a discussion that Julien and I had about adding
user-defined equality to the tree234 type (so that a semidet unification
calls the equal predicate).  However I put this sentence in the wrong place.
I've fixed it by deleting this sentence as it's simply not required and
would only confuse the reader.


> >  
> > +tree234.equal(MapA, MapB) :-
> > +    ( pointer_equal(MapA, MapB) ->
> > +        true
> > +    ;
> > +        equal(MapA, [MapB], [])
> > +    ).
> 
> I strongly dislike having two predicates with different jobs being named the
> same, even if they have different arities. I therefore think equal/3 should
> be renamed something like "match_map_in_stack".

match_tree_in_stack makes sense to me.

> While this predicate, and its helpers, will usually work on maps, not all
> 234 trees are used to implement maps, and the variable names should
> reflect that.

Okay.

> I think the comment describing the overall strategy has to be on equal/2,
> since otherwise, the call to equal/3 does not make sense to the reader.
> I also think the last two arguments should be named, so the definition
> should look something like
> 
>     tree234.equal(TreeA, TreeB) :-
>         % We first try to see if the two trees are identical, since this
>         % can save a large amount of work. If they aren't, then we must
>         % compare their structures. To do this, we use a stack to store
>         % the as-yet-unexplored parts of TreeB in their original order
>         % while the match_tree_in_stack predicate recurses on TreeA.
>         ( pointer_equal(TreeA, TreeB) ->
>             true
>         ;
>             StackB0 = [TreeB],
>             match_tree_in_stack(TreeA, StackB0, StackB),
>             % Fail if we have any parts of TreeB left over.
>             StackB = []
>         ).
> 
> By the way, can it ever happen that for two semantically equivalent trees,
> the final StackB is NOT [], but [empty], or [empty, empty], etc? If yes,
> we need to handle it. If not, we must document WHY it cannot happen.

There's only one way that that can happen.  I will change the code that it
doesn't and explain why.

> > +:- pred match_kv(K::in, V::in,
> > +    list(tree234(K, V))::in, list(tree234(K, V))::out) is semidet.
> > +
> > +match_kv(K, V, !Stack) :-
> > +    !.Stack = [B | !:Stack],
> > +    match_kv(K, V, B, !Stack).
> 
> Again, predicate names should differ, not just arities. May I suggest
> match_kv_against_stack for match_kv/4, and match_kv_against_subtree_and_stack
> for match_kv/5?

I like those names.

> > +    % match_kv(K, V, Top, !Stack)
> > +    %
> > +    % Match a key-value pair against one item from the stack of trees.
> 
> "Match a key-value pair against a stack of subtrees, from which the first
> item has already been extracted."
> 
> The predicate does not make sense unless B is the part of the original TreeB
> that is immediately before !.StackB.
> 
> > +    % If the match is successful then the matched item from the stack is
> > +    % updated.  Top is the top of the input stack.
> 
> I don't know what "update" this is trying to refer to. Also, there is nothing
> named "Top".
>

I've moved this comment to the match_kv_against_stack predicate, formerly
match_kv/4, it now reads:

    % match_kv_against_stack(KA, VA, !StackB)
    %
    % Match a key-value pair against the leftmost key-value pair in the
    % stack of subtrees.  The first item on the stack is the leftmost
    % subtree.  The matched key-value pair is removed from the stack.


> > +    require_complete_switch [B]
> > +    (
> > +        B = empty,
> > +        % This case is rarely executed, because empty items are never put on
> > +        % the stack within match_kv/4 or match_kv/5.
> > +        match_kv(K, V, !Stack)
> 
> This is NOT rarely executed. You don't put plain empty on the stack, but ...

It could have been if equal/2 was called with an empty tree as it's second
argument.  I've now removed that possibility.

> > +        B = two(K1, V1, Left, Right),
> > +        ( Left = empty ->
> > +            % Try to match the items.
> > +            K = K1,
> > +            V = V1,
> > +            % Update stack and return.
> > +            ( Right \= empty ->
> > +                !:Stack = [Right | !.Stack]
> > +            ;
> > +                true
> > +            )
> > +        ;
> > +            % Break up this node and recurse
> > +            !:Stack = [two(K1, V1, empty, Right) | !.Stack],
> > +            match_kv(K, V, Left, !Stack)
> 
> ... you put empty on the stack indirectly, here and in the cases for three
> and four nodes.
> 

Only when it's a subtree of some other node.

> Given this fact, I don't think the Right = empty test above is speeding
> up the code directly; it just does a test that would be done later anyway.
> The reason why I think it is still useful is that it avoids allocating
> a cons cell, which should speed up the code indirectly by postponing gc
> a tiny bit.

And avoiding the overhead of the allocation.  I think the optimisation is
worth while, but I've not measured it.  However, the main reason that I
wanted to avoid putting empty nodes on the stack is to make it easier to
determine when the tree represented by the stack is empty.  Now that I've
fixed the only remaining way that empty nodes could have been placed on the
stack I'm happy with it.

> > --- /dev/null
> > +++ b/tests/general/map_equal.exp
> > @@ -0,0 +1,10 @@
> > +Map1 = Map1: unifiable, equal
> > +Map1 = Map1Copy: unifiable, equal
> > +Map1 = Map2: not unifiable, not equal
> > +Map2 = Map1: not unifiable, not equal
> > +Map1 = empty: not unifiable, not equal
> > +empty = Map1: not unifiable, not equal
> > +empty = empty: unifiable, equal
> > +empty = copy(empty): unifiable, equal
> > +Map3 = Map4: not unifiable, equal
> > +Map4 = Map3: not unifiable, equal
> 
> I don't think this test is sufficient to convince me about the absence
> of a problem about leftover empty subtrees on StackB.
> 
> A pragmatic solution would be to put a wrapper around tree234.equal
> that tests whether it gets the same result (success or failure) as
> converting both maps to association lists, and aborts if it doesn't.
> If some big programs work (if slowly) with such a sanity check, we can
> look for a proof for why there is no such problem. If they abort ...
>

There are three reasons why I think my algorithm is correct.  I hope to
convince you that the wrapper idea won't be necessary:

    1. Although my test case is small I checked it with the debugger's
       coverage testing support and it covers all the cases in all of my
       predicates.

    2. Now that I've added instantiation sub-typing I can guarantee that
       there is never an empty tree in the stack of trees.
    
    3. In match_kv_against_stack/4 and match_kv_against_subtree_and_stack/5
       each clause has the same pattern as the others.  So if the reader
       understands the case for two/4 and is satisfied that it is correct
       then he/she can read the other cases and easily determine of they're
       correct.

> That is a lot of critical feedback, but Paul, the original strategy is
> very well done.

Thanks.  I tried hard on making this efficient; the alternative was to
convert both maps to sorted assoc lists and unify those - which will be
slower as it allocates more memory.  The intuition for my algorithm is that
to recurse on any data structure one normally uses the program stack, for
two data structures you almost want two program stacks, so I just made this
second stack explicit.

Cheers.


-- 
Paul Bone
http://www.bone.id.au
Previous message: [m-rev.] for review: Implement map.equal/2
Next message: [m-rev.] diff: allow the target architecture to be set in the environment
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list