[m-rev.] for review: Implement map.equal/2

Zoltan Somogyi zs at unimelb.edu.au
Sat Oct 19 10:14:14 AEDT 2013


On 18-Oct-2013, Paul Bone <paul at bone.id.au> wrote:
> Implement map.equal/2
> 
> The equality test operation on 2-3-4 trees tests if they are structurally

You mean that UNIFICATION tests this. Your new equality predicate does not.

> equal.  They might also be considered equal if their sets of key-value pairs
> are equivalent.  I've created an equal/2 predicate in the tree234 and
> map modules in the standard library to test this.

> --- a/NEWS
> +++ b/NEWS
> @@ -18,6 +18,10 @@ This is a bug-fix release.
>    call modulo constructor optimisation on the Java and (probably) C#
>    grades.  (Bug #300)
>  
> +Changes to the Mercury standard library:
> +
> +* Equality predicates are now defined for 2-3-4 trees and maps.

I think you need to say what these are semantic equality predicates,
not structural ones.

> --- a/library/map.m
> +++ b/library/map.m
> @@ -19,6 +19,15 @@
>  % tree234.m.  Virtually all the predicates in this file just
>  % forward the work to the corresponding predicate in tree234.m.
>  %
> +% Note: 2-3-4 trees do not have a canonical representation for any given
> +% map.  Therefore, two maps with the same key-value pairs may have different
> +% internal representations.

"same SET of key-value pairs"

This means that two maps with the same
> +% key-value pairs that were constructed differently may fail to unify or
> +% compare as unequal, for example if items are inserted into one of the maps
> +% in a different order.

Given the explanation in the second half of the sentence, you don't need
"that were constructed differently". However, you do need to use past tense:
"if items WERE inserted".

> This behaviour may change in the future.

What behavior?

> See
> +% map.equal/2 below which can be used to test if two maps have the same
> +% key-value pairs.

Same SET of ...

> +    % True if both maps have the same key-value pairs, regardless of how the
> +    % maps were constructed.

Same SET of ...

> +    % Unifying maps does not work as one might expect because the internal
> +    % structure of two equivalent maps (according to this predicate) may be
> +    % different.

... because the internal structures of two maps that contain the same set
of key-value pairs may be different.

> +:- pred tree234.equal(tree234(K, V)::in, tree234(K, V)::in) is semidet.

I think this needs documentation as well. A copy of the comment on map.equal
will do, mutatis mutandis.

> +:- import_module private_builtin.
>  :- import_module require.
>  :- import_module set.
>  :- import_module string.
> @@ -607,6 +610,132 @@ tree234.singleton(K, V) = two(K, V, empty, empty).
>  tree234.is_empty(Tree) :-
>      Tree = empty.
>  
> +tree234.equal(MapA, MapB) :-
> +    ( pointer_equal(MapA, MapB) ->
> +        true
> +    ;
> +        equal(MapA, [MapB], [])
> +    ).

I strongly dislike having two predicates with different jobs being named the
same, even if they have different arities. I therefore think equal/3 should
be renamed something like "match_map_in_stack".

While this predicate, and its helpers, will usually work on maps, not all
234 trees are used to implement maps, and the variable names should
reflect that.

I think the comment describing the overall strategy has to be on equal/2,
since otherwise, the call to equal/3 does not make sense to the reader.
I also think the last two arguments should be named, so the definition
should look something like

    tree234.equal(TreeA, TreeB) :-
        % We first try to see if the two trees are identical, since this
        % can save a large amount of work. If they aren't, then we must
        % compare their structures. To do this, we use a stack to store
        % the as-yet-unexplored parts of TreeB in their original order
        % while the match_tree_in_stack predicate recurses on TreeA.
        ( pointer_equal(TreeA, TreeB) ->
            true
        ;
            StackB0 = [TreeB],
            match_tree_in_stack(TreeA, StackB0, StackB),
            % Fail if we have any parts of TreeB left over.
            StackB = []
        ).

By the way, can it ever happen that for two semantically equivalent trees,
the final StackB is NOT [], but [empty], or [empty, empty], etc? If yes,
we need to handle it. If not, we must document WHY it cannot happen.

> +    % Use a stack to store the unexplored parts of B while you recurse on A
> +    % matcking KV pairs.  use two nested loops, the out loop recurses on A,
> +    % the inner loop matches from B and the explicit stack.

I don't think this is readable, and in any case, if you have the comment I
wrote above, you don't need it.

> +    % Note that B is conceptually the top of B's stack.  Allowing us to use
> +    % it to make pointer comprisons.

There is no "B" in the code. Also, the notion of the "top" of the stack
is not really useful; the most important thing is that the first item item
on the stack is the LEFTMOST fragment of the original TreeB. In any case,
I think the need for this comment can be avoided if you rewrite this ...

> +equal(MapA, !StackB) :-
> +    (
> +        !.StackB = [MapB | !:StackB],
> +        pointer_equal(MapA, MapB)

... as this:

    equal(SubTreeA, !StackB) :-
        (
            !.StackB = [LeftmostSubTreeB | !:StackB],
            pointer_equal(SubTreeA, LeftmostSubTreeB)

Note the variable names.

> +:- pred match_kv(K::in, V::in,
> +    list(tree234(K, V))::in, list(tree234(K, V))::out) is semidet.
> +
> +match_kv(K, V, !Stack) :-
> +    !.Stack = [B | !:Stack],
> +    match_kv(K, V, B, !Stack).

Again, predicate names should differ, not just arities. May I suggest
match_kv_against_stack for match_kv/4, and match_kv_against_subtree_and_stack
for match_kv/5?

> +    % match_kv(K, V, Top, !Stack)
> +    %
> +    % Match a key-value pair against one item from the stack of trees.

"Match a key-value pair against a stack of subtrees, from which the first
item has already been extracted."

The predicate does not make sense unless B is the part of the original TreeB
that is immediately before !.StackB.

> +    % If the match is successful then the matched item from the stack is
> +    % updated.  Top is the top of the input stack.

I don't know what "update" this is trying to refer to. Also, there is nothing
named "Top".

> +:- pred match_kv(K::in, V::in, tree234(K, V)::in,
> +    list(tree234(K, V))::in, list(tree234(K, V))::out) is semidet.
> +
> +match_kv(K, V, B, !Stack) :-

K -> KA
V -> VA
B -> FirstB
Stack -> StackB

> +    require_complete_switch [B]
> +    (
> +        B = empty,
> +        % This case is rarely executed, because empty items are never put on
> +        % the stack within match_kv/4 or match_kv/5.
> +        match_kv(K, V, !Stack)

This is NOT rarely executed. You don't put plain empty on the stack, but ...

> +        B = two(K1, V1, Left, Right),
> +        ( Left = empty ->
> +            % Try to match the items.
> +            K = K1,
> +            V = V1,
> +            % Update stack and return.
> +            ( Right \= empty ->
> +                !:Stack = [Right | !.Stack]
> +            ;
> +                true
> +            )
> +        ;
> +            % Break up this node and recurse
> +            !:Stack = [two(K1, V1, empty, Right) | !.Stack],
> +            match_kv(K, V, Left, !Stack)

... you put empty on the stack indirectly, here and in the cases for three
and four nodes.

Given this fact, I don't think the Right = empty test above is speeding
up the code directly; it just does a test that would be done later anyway.
The reason why I think it is still useful is that it avoids allocating
a cons cell, which should speed up the code indirectly by postponing gc
a tiny bit.

> --- /dev/null
> +++ b/tests/general/map_equal.exp
> @@ -0,0 +1,10 @@
> +Map1 = Map1: unifiable, equal
> +Map1 = Map1Copy: unifiable, equal
> +Map1 = Map2: not unifiable, not equal
> +Map2 = Map1: not unifiable, not equal
> +Map1 = empty: not unifiable, not equal
> +empty = Map1: not unifiable, not equal
> +empty = empty: unifiable, equal
> +empty = copy(empty): unifiable, equal
> +Map3 = Map4: not unifiable, equal
> +Map4 = Map3: not unifiable, equal

I don't think this test is sufficient to convince me about the absence
of a problem about leftover empty subtrees on StackB.

A pragmatic solution would be to put a wrapper around tree234.equal
that tests whether it gets the same result (success or failure) as
converting both maps to association lists, and aborts if it doesn't.
If some big programs work (if slowly) with such a sanity check, we can
look for a proof for why there is no such problem. If they abort ...

That is a lot of critical feedback, but Paul, the original strategy is
very well done.

Zoltan.



More information about the reviews mailing list