[m-dev.] map.merge argument order

Matt Giuca matt.giuca at gmail.com
Mon Dec 9 19:05:02 AEDT 2013


I agree with Paul in principle.

Although the order of the first two arguments is semantically irrelevant,
it matters for performance. I think about operations like this from an
object-oriented perspective. In an object-oriented language, map.merge
might be written as:

void Map::Merge(Map other);

That is, it takes an "other" map, and merges it into this one,
destructively updating the "this" map. In this context, it would be
relatively uncontroversial to implement it assuming "this" is big and
"other" is small, so that you could repeatedly add small "other" maps to an
ever-growing "this" map.

In Mercury, I would translate a void destructive method into a predicate
where the "this" object is the last two arguments (in, out). So:

map.merge(Other, !This).

This lets you, for example, use list.foldl to merge a bunch of small maps
into a single large map with the best performance.

Having said all that, I *strongly dislike* changing libraries like this.
I've been burned by this before, when the Mercury team re-ordered some
arguments to map.insert to be more conducive to state variables, breaking
hundreds of lines of code. This change would be possibly worse, because it
wouldn't triggered any compiler errors, but may introduce subtle
performance issues into peoples' code that has already taken into account
the characteristics of this predicate. (Note: This is just my opinion; I
have no personal stake in this as I don't think I have any code that uses
this predicate.)

(Does svmap have a 'merge' with the arguments in the correct order? That
was the workaround last time. If not, why not add it there?)


On Mon, Dec 9, 2013 at 6:32 PM, Paul Bone <paul at bone.id.au> wrote:

> On Mon, Dec 09, 2013 at 04:54:18PM +1100, Julien Fischer wrote:
> > On Mon, Dec 9, 2013 at 4:56 PM, Paul Bone <paul at bone.id.au> wrote:
> >
> > > On Mon, Dec 09, 2013 at 04:40:42PM +1100, Julien Fischer wrote:
> > > > On Mon, Dec 9, 2013 at 4:31 PM, Paul Bone <paul at bone.id.au> wrote:
> > > > >
> > > > > Check again, I'm asserting that the _current_ one isn't good for SV
> > > > >
> > > >
> > > > I get that.
> > > >
> > > >
> > > > > notation and the proposed change is.
> > > >
> > > >
> > > > *What* proposed change?  So far as I can see you haven't proposed
> > > > any concrete change.  My two cents on this:  the current predicate
> > > versions
> > > > of map.merge/3 is fine because:
> > >
> > > I'm proposing that merge(MapA, MapB, Map) take time proportional to the
> > > size
> > > of MapA.  Currently it takes time proportional to the size of MapB.
> > >
> > > > (1) the efficiency implications of the argument ordering are
> _clearly_
> > > > documented
> > > > (2) the arguments are in the order that is consistent with what the
> > > > rest of the standard library expects, e.g. list.foldl(map.merge, …
> does
> > > what
> > > > you would expect it to.
> > >
> > > No it doesn't.
> > >
> >
> > What on earth do you think it does then?
>
> Here is the declaration and definition:
>
>         % Merge the contents of the two maps.
>         % Throws an exception if both sets of keys are not disjoint.
>         %
>         % The cost of this predicate is proportional to the number of
> elements
>         % in the second map, so for efficiency, you want to put the bigger
> map
>         % first and the smaller map second.
>         %
>     :- func merge(map(K, V), map(K, V)) = map(K, V).
>     :- pred merge(map(K, V)::in, map(K, V)::in, map(K, V)::out) is det.
>
>     map.merge(MA, MB, M) :-
>         map.to_assoc_list(MB, MBList),
>         map.det_insert_from_assoc_list(MBList, MA, M).
>
>         % Apply map.det_insert to key - value pairs from the assoc_lists.
>         %
>     :- pred map.det_insert_from_assoc_list(assoc_list(K, V)::in,
>         map(K, V)::in, map(K, V)::out) is det.
>
>     map.det_insert_from_assoc_list([], !Map).
>     map.det_insert_from_assoc_list([K - V | KVs], !Map) :-
>         map.det_insert(K, V, !Map),
>         map.det_insert_from_assoc_list(KVs, !Map).
>
>
> I think that it transforms the _second_ argument (MB) into an assoc list,
> which takes linear time.  Then det_insert_from_assoc_list recurses over
> that
> list (also linear time in the _second_ argument).  On each iteration it
> updates the map that it was passed as it goes, this map was the _first_
> argument to merge/3, each update costs logarithmic time in the cost of the
> growing map.  So the cost is approximately:
>
>     Cost ~= MB + MB * log(MA)
>
> This is roughly O(N * log(M)) where N is the size of the _second_ map and M
> is the size of the _first_ map.
>
> I believe this interpretation is correct and I've checked it a number of
> times.  Please let me know if I'm mistaken.
>
> A function that is linear in it's argument will grow faster than one that
> is
> logarithmic in it's argument.  Therefore the dominating cost is the linear
> one.  Therefore merge/3's execution time is proportional to the size of its
> _second_ argument.
>
> Therefore, if you have two maps you wish to merge it is more optimal to
> pass
> the larger map as the _first_ argument of merge/3, and the smaller one as
> the second argument.  When merge/3 is used as a higher order parameter in a
> predicate such as list.foldl/4.  The accumulator is passed in the second
> and
> third arguments of merge/3.  I would expect an accumulating map to be
> larger
> than the sub-maps being merged, therefore foldl over merge/3 is
> inefficient,
> as it passes the larger map in the _second_ argument rather than the
> _first_.
>
>
> > > (3) there is no additional overhead in traversing the input maps as
> > > > with Zoltan's proposed version that determines the sizes of the
> > > > inputs dynamically.
> > >
> > > I agree with this point.  However Zoltan's solution was to keep a
> version
> > > of
> > > the predicate no additional runtime costs for those programmers that
> know
> > > which map is smaller.
> >
> >
> > Which should be map.merge/3.
> >
> > I'm curious to why (of all things) you think this is a problem.
>
> Because 1) performance is important, 2) be both can't be right about the
> characteristics of merge/3, so at least one of us is writing pathologically
> slow code.  That is important to me.
>
> Regarding the original issue, a simple matter of convenience.  That is not
> very important, however it is still somewhat important so I maintain that
> it
> was useful to ask the question.  For example, Zoltan's assertion that these
> things are usually _not_ used with state variable notation is useful in
> determining if my proposed change is valuable.
>
> Thanks.
>
>
> --
> Paul Bone
> http://www.bone.id.au
> _______________________________________________
> developers mailing list
> developers at lists.mercurylang.org
> http://lists.mercurylang.org/listinfo/developers
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/developers/attachments/20131209/91d68670/attachment.html>


More information about the developers mailing list