[m-rev.] for post-commit review: det_remove*, rev_sorted_list_to_set

Peter Wang novalazy at gmail.com
Mon Sep 9 14:42:41 AEST 2019


On Mon, 09 Sep 2019 13:51:56 +1000 (AEST), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> 
> On Mon, 9 Sep 2019 12:56:31 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
> > >     Document that the input to sorted_list_to_set predicates and functions
> > >     should not have duplicates, unless the predicate in question detects
> > >     and removes duplicates. The different set modules are inconsistent
> > >     in this regard.
> > 
> > Perhaps we ought to give the version that doesn't check a different name,
> > e.g. unchecked_sorted_list_to_set?
> 
> Given the existence of list_to_set, the point of sorted_list_to_set is to allow
> the avoidance of the overhead of (a) sorting the list, and (b) deleting any duplicates.
> In theory, we could have THREE unchecked versions: one that checks sortedness,
> one that checks the absence of adjacent duplicates, and one that checks both.
> 
> As it happens, only the last makes sense. Since it can be done by calling compare
> on each pair of adjacent elements and throwing an exception on = and >, it has the
> same overhead as the other two while offering more safety.
> 
> So yes, having one checked and one unchecked version works. But this leaves
> an interesting question: how do we notify users that they should consider
> replacing existing calls to sorted_list_to_set with the unchecked versions?
> In most of the set modules, the existing sorted_list_to_set *was* unchecked,
> so any call site that has been working all this time should switch over to the
> unchecked version for performance. But we can't mark sorted_list_to_set as
> obsolete if that remains the name of the checked version. But just adding
> checking to these modules' versions of sorted_list_to_set would silently
> add overhead to all their existing callers. One way to avoid this would be
> to name the two versions unchecked_sorted_list_to_set and
> checked_sorted_list_to_set, keep the existing sorted_list_to_set in each
> set module, and mark it obsolete in favor of the checked and unchecked versions.
> Opinions?

To me, the predicates/functions with "sorted_list_to_" in their names
already indicate that it is the caller's responsibility to pass sorted
lists as input. The simplest solution is to leave them as unchecked, but
make the documentation more explicit.

It's unfortunately that sorted_list_to_set assumes no duplicates.
I think predicates/functions that assume no duplicates should have an
indication in their names, e.g. sorted_no_dup_list_to_set.

> 
> In some modules, there is also a version of sorted_list_to_set that takes
> the length of the list as an extra parameter, which exists solely to avoid
> the cost of computing the length inside sorted_list_to_set when the caller
> already knows that info. There is no point in offering a checked version
> of that. On the other hand, I think it would make sense to change its argument
> order from <List, Set, ListLen> to <List, ListLen, Set>. This would be
> a breaking change, but we could add an unchecked_ prefix to its name
> at the same time. Opinions?

It only exists in set_bbbtree, right? I've never seen that used so
I would go as far as removing the _len version to reduce unused code.

Peter


More information about the reviews mailing list