[m-rev.] for review: rationalise hash functions across the standard library

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun Feb 16 13:38:13 AEDT 2020


2020-02-16 13:25 GMT+11:00 Peter Wang<novalazy at gmail.com>:
>> Peter, could you tell me why version_hash_table uses user-defined equality?
>> The change was done by pbone, but I hope you may have discussed the
>> issue with him. The discussion on m-rev in May 2013, when the user defined
>> equality was added, says nothing about *why* it was added, except one
>> possibly-relevant vague reference to "MC needing it", and I see nothing
>> relevant in that time frame in m-dev.
> 
> I don't know.

In that case, would you mind if I deleted it? Semantically, yes,
the equality we want is the one now specified by "where equality is"
and not the default syntactic equality, but the same is true for
most of the other collection types, and we don't have user defined
equality for most of them. So having them for version_hash_tables
is just a (seemingly) pointless difference from hash_tables themselves.

>> +:- pred update(K::in, V::in, assoc_list(K, V)::in, assoc_list(K, V)::out)
>> +    is semidet.
>> +
> 
> Add documentation for this, in particular noting that only the first
> element with the given key is updated.

Will do, in both modules.

>> @@ -564,42 +608,59 @@ lookup(HT, K) =
>>          func_error($pred, "key not found")
>>      ).
>>  
>> +% XXX The convention in other library modules is that
>> +% - elem is shorthand for search, NOT lookup, and
>> +% - det_elem is shorthand for lookup.
>>  elem(K, HT) = lookup(HT, K).
> 
> I wouldn't mind doing away with all elem and det_elems.

I wouldn't either, but that would break existing code.

>> +:- pred foldlf(func(K, V, T) = T, kv_list(K, V), T, T).
>> +:- mode foldlf(func(in, in, in) = out is det, in, in, out) is det.
>> +:- mode foldlf(func(in, in, di) = uo is det, in, di, uo) is det.
>> +
>> +foldlf(F, KVs, !A) :-
>> +    (
>> +        KVs = kv_nil
>> +    ;
>> +        KVs = kv_cons(K, V, TailKVs),
>> +        !:A = F(K, V, !.A),
>> +        foldlf(F, TailKVs, !A)
>> +    ).
>> +
> 
> I guess you didn't want to add this to kv_list as well. That's fine.

No, I didn't. I don't think hash_table.m itself should have a fold *predicate*
that takes a *function*, since this goes against the convention we use
in all the other standard library modules.

>> -munge(N, X) =
>> -    (X `unchecked_left_shift` N) `xor`
>> -    (X `unchecked_right_shift` (int.bits_per_int - N)).
>> +accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
>> +    % XXX This is a REALLY BAD algorithm, with shift amounts that
>> +    % will routinely exceed the word size.
>> +    HashAcc =
>> +        (HashAcc0 `unchecked_left_shift` HashValue) `xor`
>> +        (HashAcc0 `unchecked_right_shift` (int.bits_per_int - HashValue)).
> 
> Oops, munge was originally called with fixed values of N, 5 or 3.
> By the original intention, accumulate_hash_value should be:
> 
> accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
>     N = 5,
>     HashAcc =
>         (HashAcc0 `unchecked_left_shift` N) `xor`
>         (HashAcc0 `unchecked_right_shift` (int.bits_per_int - N)) `xor`
> 	HashValue.
> 
> Something like FNV-1a should do a lot better,
> but generic_hash is deprecated anyway.

OK, I will leave it as it is for now.

Thanks for the review.

Zoltan.


More information about the reviews mailing list