[m-rev.] for post-commit review: det_remove*, rev_sorted_list_to_set

Julien Fischer jfischer at opturion.com
Mon Sep 9 14:40:05 AEST 2019


Hi Zoltan,

On Mon, 9 Sep 2019, Zoltan Somogyi wrote:

> On Mon, 9 Sep 2019 12:56:31 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
>>>     Document that the input to sorted_list_to_set predicates and functions
>>>     should not have duplicates, unless the predicate in question detects
>>>     and removes duplicates. The different set modules are inconsistent
>>>     in this regard.
>>
>> Perhaps we ought to give the version that doesn't check a different name,
>> e.g. unchecked_sorted_list_to_set?
>
> Given the existence of list_to_set, the point of sorted_list_to_set is to allow
> the avoidance of the overhead of (a) sorting the list, and (b) deleting any duplicates.
> In theory, we could have THREE unchecked versions: one that checks sortedness,
> one that checks the absence of adjacent duplicates, and one that checks both.
>
> As it happens, only the last makes sense. Since it can be done by calling compare
> on each pair of adjacent elements and throwing an exception on = and >, it has the
> same overhead as the other two while offering more safety.
>
> So yes, having one checked and one unchecked version works. But this leaves
> an interesting question: how do we notify users that they should consider
> replacing existing calls to sorted_list_to_set with the unchecked versions?
> In most of the set modules, the existing sorted_list_to_set *was* unchecked,
> so any call site that has been working all this time should switch over to the
> unchecked version for performance. But we can't mark sorted_list_to_set as
> obsolete if that remains the name of the checked version. But just adding
> checking to these modules' versions of sorted_list_to_set would silently
> add overhead to all their existing callers. One way to avoid this would be
> to name the two versions unchecked_sorted_list_to_set and
> checked_sorted_list_to_set, keep the existing sorted_list_to_set in each
> set module, and mark it obsolete in favor of the checked and unchecked versions.
> Opinions?

In the short term:

As you suggest, add the unchecked_ and checked_ versions, deprecate the
unprefixed version.

In the longer term:

Change the unprefixed version to be a synonym for the checked_ version
and deprecate the checked_version.  The practice throughout most of the
stdlib is for unsafe / unchecked predicates to be prefixed and to use
the unprefixed names for the safe versions.

> In some modules, there is also a version of sorted_list_to_set that takes
> the length of the list as an extra parameter, which exists solely to avoid
> the cost of computing the length inside sorted_list_to_set when the caller
> already knows that info. There is no point in offering a checked version
> of that. On the other hand, I think it would make sense to change its argument
> order from <List, Set, ListLen> to <List, ListLen, Set>. This would be
> a breaking change, but we could add an unchecked_ prefix to its name
> at the same time. Opinions?

IIRC, that occurs only in set_bbbtree; I've never seen that module used
in the wild, so I think we can get away with a breaking change here.

>> One note about this; although they are more specialised, diet.m,
>> fat_sparse_bitset, sparse_bitset.m and tree_bitset.m would probably
>> benefit from following the same predicate ordering (where possible) as
>> the generic set implementations.
>
> I forgot about the bitset versions. While I saw the diffs for diet.m go by,
> they did not form a lasting impression. I think it may be worth considering
> whether to rename diet.m to set_diet.m.

That's not worth doing IMO.

Julien.


More information about the reviews mailing list