[m-rev.] for post-commit review: det_remove*, rev_sorted_list_to_set

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Sep 9 13:51:56 AEST 2019



On Mon, 9 Sep 2019 12:56:31 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
> >     Document that the input to sorted_list_to_set predicates and functions
> >     should not have duplicates, unless the predicate in question detects
> >     and removes duplicates. The different set modules are inconsistent
> >     in this regard.
> 
> Perhaps we ought to give the version that doesn't check a different name,
> e.g. unchecked_sorted_list_to_set?

Given the existence of list_to_set, the point of sorted_list_to_set is to allow
the avoidance of the overhead of (a) sorting the list, and (b) deleting any duplicates.
In theory, we could have THREE unchecked versions: one that checks sortedness,
one that checks the absence of adjacent duplicates, and one that checks both.

As it happens, only the last makes sense. Since it can be done by calling compare
on each pair of adjacent elements and throwing an exception on = and >, it has the
same overhead as the other two while offering more safety.

So yes, having one checked and one unchecked version works. But this leaves
an interesting question: how do we notify users that they should consider
replacing existing calls to sorted_list_to_set with the unchecked versions?
In most of the set modules, the existing sorted_list_to_set *was* unchecked,
so any call site that has been working all this time should switch over to the
unchecked version for performance. But we can't mark sorted_list_to_set as
obsolete if that remains the name of the checked version. But just adding
checking to these modules' versions of sorted_list_to_set would silently
add overhead to all their existing callers. One way to avoid this would be
to name the two versions unchecked_sorted_list_to_set and
checked_sorted_list_to_set, keep the existing sorted_list_to_set in each
set module, and mark it obsolete in favor of the checked and unchecked versions.
Opinions?

In some modules, there is also a version of sorted_list_to_set that takes
the length of the list as an extra parameter, which exists solely to avoid
the cost of computing the length inside sorted_list_to_set when the caller
already knows that info. There is no point in offering a checked version
of that. On the other hand, I think it would make sense to change its argument
order from <List, Set, ListLen> to <List, ListLen, Set>. This would be
a breaking change, but we could add an unchecked_ prefix to its name
at the same time. Opinions?

> One note about this; although they are more specialised, diet.m,
> fat_sparse_bitset, sparse_bitset.m and tree_bitset.m would probably
> benefit from following the same predicate ordering (where possible) as
> the generic set implementations.

I forgot about the bitset versions. While I saw the diffs for diet.m go by,
they did not form a lasting impression. I think it may be worth considering
whether to rename diet.m to set_diet.m.

I will have a go at them later this week.

Zoltan.


More information about the reviews mailing list