[m-dev.] map.merge argument order

Paul Bone paul at bone.id.au
Mon Dec 9 18:32:04 AEDT 2013


On Mon, Dec 09, 2013 at 04:54:18PM +1100, Julien Fischer wrote:
> On Mon, Dec 9, 2013 at 4:56 PM, Paul Bone <paul at bone.id.au> wrote:
> 
> > On Mon, Dec 09, 2013 at 04:40:42PM +1100, Julien Fischer wrote:
> > > On Mon, Dec 9, 2013 at 4:31 PM, Paul Bone <paul at bone.id.au> wrote:
> > > >
> > > > Check again, I'm asserting that the _current_ one isn't good for SV
> > > >
> > >
> > > I get that.
> > >
> > >
> > > > notation and the proposed change is.
> > >
> > >
> > > *What* proposed change?  So far as I can see you haven't proposed
> > > any concrete change.  My two cents on this:  the current predicate
> > versions
> > > of map.merge/3 is fine because:
> >
> > I'm proposing that merge(MapA, MapB, Map) take time proportional to the
> > size
> > of MapA.  Currently it takes time proportional to the size of MapB.
> >
> > > (1) the efficiency implications of the argument ordering are _clearly_
> > > documented
> > > (2) the arguments are in the order that is consistent with what the
> > > rest of the standard library expects, e.g. list.foldl(map.merge, … does
> > what
> > > you would expect it to.
> >
> > No it doesn't.
> >
> 
> What on earth do you think it does then?

Here is the declaration and definition:

        % Merge the contents of the two maps.
        % Throws an exception if both sets of keys are not disjoint.
        %
        % The cost of this predicate is proportional to the number of elements
        % in the second map, so for efficiency, you want to put the bigger map
        % first and the smaller map second.
        %
    :- func merge(map(K, V), map(K, V)) = map(K, V).
    :- pred merge(map(K, V)::in, map(K, V)::in, map(K, V)::out) is det.

    map.merge(MA, MB, M) :-
        map.to_assoc_list(MB, MBList),
        map.det_insert_from_assoc_list(MBList, MA, M).

        % Apply map.det_insert to key - value pairs from the assoc_lists.
        %
    :- pred map.det_insert_from_assoc_list(assoc_list(K, V)::in,
        map(K, V)::in, map(K, V)::out) is det.

    map.det_insert_from_assoc_list([], !Map).
    map.det_insert_from_assoc_list([K - V | KVs], !Map) :-
        map.det_insert(K, V, !Map),
        map.det_insert_from_assoc_list(KVs, !Map).


I think that it transforms the _second_ argument (MB) into an assoc list,
which takes linear time.  Then det_insert_from_assoc_list recurses over that
list (also linear time in the _second_ argument).  On each iteration it
updates the map that it was passed as it goes, this map was the _first_
argument to merge/3, each update costs logarithmic time in the cost of the
growing map.  So the cost is approximately:

    Cost ~= MB + MB * log(MA)

This is roughly O(N * log(M)) where N is the size of the _second_ map and M
is the size of the _first_ map.

I believe this interpretation is correct and I've checked it a number of
times.  Please let me know if I'm mistaken.

A function that is linear in it's argument will grow faster than one that is
logarithmic in it's argument.  Therefore the dominating cost is the linear
one.  Therefore merge/3's execution time is proportional to the size of its
_second_ argument.

Therefore, if you have two maps you wish to merge it is more optimal to pass
the larger map as the _first_ argument of merge/3, and the smaller one as
the second argument.  When merge/3 is used as a higher order parameter in a
predicate such as list.foldl/4.  The accumulator is passed in the second and
third arguments of merge/3.  I would expect an accumulating map to be larger
than the sub-maps being merged, therefore foldl over merge/3 is inefficient,
as it passes the larger map in the _second_ argument rather than the
_first_.


> > (3) there is no additional overhead in traversing the input maps as
> > > with Zoltan's proposed version that determines the sizes of the
> > > inputs dynamically.
> >
> > I agree with this point.  However Zoltan's solution was to keep a version
> > of
> > the predicate no additional runtime costs for those programmers that know
> > which map is smaller.
> 
> 
> Which should be map.merge/3.
> 
> I'm curious to why (of all things) you think this is a problem.

Because 1) performance is important, 2) be both can't be right about the
characteristics of merge/3, so at least one of us is writing pathologically
slow code.  That is important to me.

Regarding the original issue, a simple matter of convenience.  That is not
very important, however it is still somewhat important so I maintain that it
was useful to ask the question.  For example, Zoltan's assertion that these
things are usually _not_ used with state variable notation is useful in
determining if my proposed change is valuable.

Thanks.


-- 
Paul Bone
http://www.bone.id.au



More information about the developers mailing list