<div dir="ltr">I agree with Paul in principle.<div><br></div><div>Although the order of the first two arguments is semantically irrelevant, it matters for performance. I think about operations like this from an object-oriented perspective. In an object-oriented language, map.merge might be written as:</div>
<div><br></div><div>void Map::Merge(Map other);</div><div><br></div><div>That is, it takes an "other" map, and merges it into this one, destructively updating the "this" map. In this context, it would be relatively uncontroversial to implement it assuming "this" is big and "other" is small, so that you could repeatedly add small "other" maps to an ever-growing "this" map.</div>
<div><br></div><div>In Mercury, I would translate a void destructive method into a predicate where the "this" object is the last two arguments (in, out). So:</div><div><br></div><div>map.merge(Other, !This).</div>
<div><br></div><div>This lets you, for example, use list.foldl to merge a bunch of small maps into a single large map with the best performance.</div><div><br></div><div>Having said all that, I <i>strongly dislike</i> changing libraries like this. I've been burned by this before, when the Mercury team re-ordered some arguments to map.insert to be more conducive to state variables, breaking hundreds of lines of code. This change would be possibly worse, because it wouldn't triggered any compiler errors, but may introduce subtle performance issues into peoples' code that has already taken into account the characteristics of this predicate. (Note: This is just my opinion; I have no personal stake in this as I don't think I have any code that uses this predicate.)</div>
<div><br></div><div>(Does svmap have a 'merge' with the arguments in the correct order? That was the workaround last time. If not, why not add it there?)</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Mon, Dec 9, 2013 at 6:32 PM, Paul Bone <span dir="ltr"><<a href="mailto:paul@bone.id.au" target="_blank">paul@bone.id.au</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Mon, Dec 09, 2013 at 04:54:18PM +1100, Julien Fischer wrote:<br>
> On Mon, Dec 9, 2013 at 4:56 PM, Paul Bone <<a href="mailto:paul@bone.id.au">paul@bone.id.au</a>> wrote:<br>
><br>
> > On Mon, Dec 09, 2013 at 04:40:42PM +1100, Julien Fischer wrote:<br>
> > > On Mon, Dec 9, 2013 at 4:31 PM, Paul Bone <<a href="mailto:paul@bone.id.au">paul@bone.id.au</a>> wrote:<br>
> > > ><br>
> > > > Check again, I'm asserting that the _current_ one isn't good for SV<br>
> > > ><br>
> > ><br>
> > > I get that.<br>
> > ><br>
> > ><br>
> > > > notation and the proposed change is.<br>
> > ><br>
> > ><br>
> > > *What* proposed change? So far as I can see you haven't proposed<br>
> > > any concrete change. My two cents on this: the current predicate<br>
> > versions<br>
> > > of map.merge/3 is fine because:<br>
> ><br>
> > I'm proposing that merge(MapA, MapB, Map) take time proportional to the<br>
> > size<br>
> > of MapA. Currently it takes time proportional to the size of MapB.<br>
> ><br>
> > > (1) the efficiency implications of the argument ordering are _clearly_<br>
> > > documented<br>
> > > (2) the arguments are in the order that is consistent with what the<br>
> > > rest of the standard library expects, e.g. list.foldl(map.merge, … does<br>
> > what<br>
> > > you would expect it to.<br>
> ><br>
> > No it doesn't.<br>
> ><br>
><br>
> What on earth do you think it does then?<br>
<br>
</div>Here is the declaration and definition:<br>
<div class="im"><br>
% Merge the contents of the two maps.<br>
% Throws an exception if both sets of keys are not disjoint.<br>
%<br>
% The cost of this predicate is proportional to the number of elements<br>
% in the second map, so for efficiency, you want to put the bigger map<br>
% first and the smaller map second.<br>
%<br>
:- func merge(map(K, V), map(K, V)) = map(K, V).<br>
:- pred merge(map(K, V)::in, map(K, V)::in, map(K, V)::out) is det.<br>
<br>
</div> map.merge(MA, MB, M) :-<br>
map.to_assoc_list(MB, MBList),<br>
map.det_insert_from_assoc_list(MBList, MA, M).<br>
<br>
% Apply map.det_insert to key - value pairs from the assoc_lists.<br>
%<br>
:- pred map.det_insert_from_assoc_list(assoc_list(K, V)::in,<br>
<div class="im"> map(K, V)::in, map(K, V)::out) is det.<br>
<br>
</div> map.det_insert_from_assoc_list([], !Map).<br>
map.det_insert_from_assoc_list([K - V | KVs], !Map) :-<br>
map.det_insert(K, V, !Map),<br>
map.det_insert_from_assoc_list(KVs, !Map).<br>
<br>
<br>
I think that it transforms the _second_ argument (MB) into an assoc list,<br>
which takes linear time. Then det_insert_from_assoc_list recurses over that<br>
list (also linear time in the _second_ argument). On each iteration it<br>
updates the map that it was passed as it goes, this map was the _first_<br>
argument to merge/3, each update costs logarithmic time in the cost of the<br>
growing map. So the cost is approximately:<br>
<br>
Cost ~= MB + MB * log(MA)<br>
<br>
This is roughly O(N * log(M)) where N is the size of the _second_ map and M<br>
is the size of the _first_ map.<br>
<br>
I believe this interpretation is correct and I've checked it a number of<br>
times. Please let me know if I'm mistaken.<br>
<br>
A function that is linear in it's argument will grow faster than one that is<br>
logarithmic in it's argument. Therefore the dominating cost is the linear<br>
one. Therefore merge/3's execution time is proportional to the size of its<br>
_second_ argument.<br>
<br>
Therefore, if you have two maps you wish to merge it is more optimal to pass<br>
the larger map as the _first_ argument of merge/3, and the smaller one as<br>
the second argument. When merge/3 is used as a higher order parameter in a<br>
predicate such as list.foldl/4. The accumulator is passed in the second and<br>
third arguments of merge/3. I would expect an accumulating map to be larger<br>
than the sub-maps being merged, therefore foldl over merge/3 is inefficient,<br>
as it passes the larger map in the _second_ argument rather than the<br>
_first_.<br>
<div class="im"><br>
<br>
> > (3) there is no additional overhead in traversing the input maps as<br>
> > > with Zoltan's proposed version that determines the sizes of the<br>
> > > inputs dynamically.<br>
> ><br>
> > I agree with this point. However Zoltan's solution was to keep a version<br>
> > of<br>
> > the predicate no additional runtime costs for those programmers that know<br>
> > which map is smaller.<br>
><br>
><br>
> Which should be map.merge/3.<br>
><br>
> I'm curious to why (of all things) you think this is a problem.<br>
<br>
</div>Because 1) performance is important, 2) be both can't be right about the<br>
characteristics of merge/3, so at least one of us is writing pathologically<br>
slow code. That is important to me.<br>
<br>
Regarding the original issue, a simple matter of convenience. That is not<br>
very important, however it is still somewhat important so I maintain that it<br>
was useful to ask the question. For example, Zoltan's assertion that these<br>
things are usually _not_ used with state variable notation is useful in<br>
determining if my proposed change is valuable.<br>
<br>
Thanks.<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
--<br>
Paul Bone<br>
<a href="http://www.bone.id.au" target="_blank">http://www.bone.id.au</a><br>
_______________________________________________<br>
developers mailing list<br>
<a href="mailto:developers@lists.mercurylang.org">developers@lists.mercurylang.org</a><br>
<a href="http://lists.mercurylang.org/listinfo/developers" target="_blank">http://lists.mercurylang.org/listinfo/developers</a><br>
</div></div></blockquote></div><br></div>