[m-dev.] map.merge argument order
Paul Bone
paul at bone.id.au
Mon Dec 9 18:32:04 AEDT 2013
On Mon, Dec 09, 2013 at 04:54:18PM +1100, Julien Fischer wrote:
> On Mon, Dec 9, 2013 at 4:56 PM, Paul Bone <paul at bone.id.au> wrote:
>
> > On Mon, Dec 09, 2013 at 04:40:42PM +1100, Julien Fischer wrote:
> > > On Mon, Dec 9, 2013 at 4:31 PM, Paul Bone <paul at bone.id.au> wrote:
> > > >
> > > > Check again, I'm asserting that the _current_ one isn't good for SV
> > > >
> > >
> > > I get that.
> > >
> > >
> > > > notation and the proposed change is.
> > >
> > >
> > > *What* proposed change? So far as I can see you haven't proposed
> > > any concrete change. My two cents on this: the current predicate
> > versions
> > > of map.merge/3 is fine because:
> >
> > I'm proposing that merge(MapA, MapB, Map) take time proportional to the
> > size
> > of MapA. Currently it takes time proportional to the size of MapB.
> >
> > > (1) the efficiency implications of the argument ordering are _clearly_
> > > documented
> > > (2) the arguments are in the order that is consistent with what the
> > > rest of the standard library expects, e.g. list.foldl(map.merge, … does
> > what
> > > you would expect it to.
> >
> > No it doesn't.
> >
>
> What on earth do you think it does then?
Here is the declaration and definition:
% Merge the contents of the two maps.
% Throws an exception if both sets of keys are not disjoint.
%
% The cost of this predicate is proportional to the number of elements
% in the second map, so for efficiency, you want to put the bigger map
% first and the smaller map second.
%
:- func merge(map(K, V), map(K, V)) = map(K, V).
:- pred merge(map(K, V)::in, map(K, V)::in, map(K, V)::out) is det.
map.merge(MA, MB, M) :-
map.to_assoc_list(MB, MBList),
map.det_insert_from_assoc_list(MBList, MA, M).
% Apply map.det_insert to key - value pairs from the assoc_lists.
%
:- pred map.det_insert_from_assoc_list(assoc_list(K, V)::in,
map(K, V)::in, map(K, V)::out) is det.
map.det_insert_from_assoc_list([], !Map).
map.det_insert_from_assoc_list([K - V | KVs], !Map) :-
map.det_insert(K, V, !Map),
map.det_insert_from_assoc_list(KVs, !Map).
I think that it transforms the _second_ argument (MB) into an assoc list,
which takes linear time. Then det_insert_from_assoc_list recurses over that
list (also linear time in the _second_ argument). On each iteration it
updates the map that it was passed as it goes, this map was the _first_
argument to merge/3, each update costs logarithmic time in the cost of the
growing map. So the cost is approximately:
Cost ~= MB + MB * log(MA)
This is roughly O(N * log(M)) where N is the size of the _second_ map and M
is the size of the _first_ map.
I believe this interpretation is correct and I've checked it a number of
times. Please let me know if I'm mistaken.
A function that is linear in it's argument will grow faster than one that is
logarithmic in it's argument. Therefore the dominating cost is the linear
one. Therefore merge/3's execution time is proportional to the size of its
_second_ argument.
Therefore, if you have two maps you wish to merge it is more optimal to pass
the larger map as the _first_ argument of merge/3, and the smaller one as
the second argument. When merge/3 is used as a higher order parameter in a
predicate such as list.foldl/4. The accumulator is passed in the second and
third arguments of merge/3. I would expect an accumulating map to be larger
than the sub-maps being merged, therefore foldl over merge/3 is inefficient,
as it passes the larger map in the _second_ argument rather than the
_first_.
> > (3) there is no additional overhead in traversing the input maps as
> > > with Zoltan's proposed version that determines the sizes of the
> > > inputs dynamically.
> >
> > I agree with this point. However Zoltan's solution was to keep a version
> > of
> > the predicate no additional runtime costs for those programmers that know
> > which map is smaller.
>
>
> Which should be map.merge/3.
>
> I'm curious to why (of all things) you think this is a problem.
Because 1) performance is important, 2) be both can't be right about the
characteristics of merge/3, so at least one of us is writing pathologically
slow code. That is important to me.
Regarding the original issue, a simple matter of convenience. That is not
very important, however it is still somewhat important so I maintain that it
was useful to ask the question. For example, Zoltan's assertion that these
things are usually _not_ used with state variable notation is useful in
determining if my proposed change is valuable.
Thanks.
--
Paul Bone
http://www.bone.id.au
More information about the developers
mailing list