[m-rev.] for review: rationalise hash functions across the standard library

Peter Wang novalazy at gmail.com
Sun Feb 16 13:25:14 AEDT 2020


On Sat, 15 Feb 2020 11:11:21 +1100 (AEDT), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> 
> On Fri, 14 Feb 2020 20:27:39 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
> > You can go ahead; I won't be touching hash_table code for a few
> > weeks at least.
> 
> Done. The diff is attached. It is for review by Peter, with special attention
> to the new XXXs.
> 
> Peter, could you tell me why version_hash_table uses user-defined equality?
> The change was done by pbone, but I hope you may have discussed the
> issue with him. The discussion on m-rev in May 2013, when the user defined
> equality was added, says nothing about *why* it was added, except one
> possibly-relevant vague reference to "MC needing it", and I see nothing
> relevant in that time frame in m-dev.

I don't know.


> diff --git a/library/assoc_list.m b/library/assoc_list.m
> index fd38e7271..75459f5e0 100644
> --- a/library/assoc_list.m
> +++ b/library/assoc_list.m
> @@ -91,6 +91,14 @@
>      %
>  :- func assoc_list(K, V) ^ det_elem(K) = V is det.
>  
> +%---------------------------------------------------------------------------%
> +%
> +% Updating elements in assoc_lists.
> +%
> +
> +:- pred update(K::in, V::in, assoc_list(K, V)::in, assoc_list(K, V)::out)
> +    is semidet.
> +

Add documentation for this, in particular noting that only the first
element with the given key is updated.

> @@ -564,42 +608,59 @@ lookup(HT, K) =
>          func_error($pred, "key not found")
>      ).
>  
> +% XXX The convention in other library modules is that
> +% - elem is shorthand for search, NOT lookup, and
> +% - det_elem is shorthand for lookup.
>  elem(K, HT) = lookup(HT, K).

I wouldn't mind doing away with all elem and det_elems.

> +:- pred foldlf(func(K, V, T) = T, kv_list(K, V), T, T).
> +:- mode foldlf(func(in, in, in) = out is det, in, in, out) is det.
> +:- mode foldlf(func(in, in, di) = uo is det, in, di, uo) is det.
> +
> +foldlf(F, KVs, !A) :-
> +    (
> +        KVs = kv_nil
> +    ;
> +        KVs = kv_cons(K, V, TailKVs),
> +        !:A = F(K, V, !.A),
> +        foldlf(F, TailKVs, !A)
> +    ).
> +

I guess you didn't want to add this to kv_list as well. That's fine.

> -munge(N, X) =
> -    (X `unchecked_left_shift` N) `xor`
> -    (X `unchecked_right_shift` (int.bits_per_int - N)).
> +accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
> +    % XXX This is a REALLY BAD algorithm, with shift amounts that
> +    % will routinely exceed the word size.
> +    HashAcc =
> +        (HashAcc0 `unchecked_left_shift` HashValue) `xor`
> +        (HashAcc0 `unchecked_right_shift` (int.bits_per_int - HashValue)).

Oops, munge was originally called with fixed values of N, 5 or 3.
By the original intention, accumulate_hash_value should be:

accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
    N = 5,
    HashAcc =
        (HashAcc0 `unchecked_left_shift` N) `xor`
        (HashAcc0 `unchecked_right_shift` (int.bits_per_int - N)) `xor`
	HashValue.

Something like FNV-1a should do a lot better,
but generic_hash is deprecated anyway.

> diff --git a/library/kv_list.m b/library/kv_list.m
> index cf951782d..e63940802 100644
> --- a/library/kv_list.m
> +++ b/library/kv_list.m
> @@ -107,6 +107,14 @@
>      %
>  :- func kv_list(K, V) ^ det_elem(K) = V is det.
>  
> +%---------------------------------------------------------------------------%
> +%
> +% Updating elements in kv_lists.
> +%
> +
> +:- pred update(K::in, V::in, kv_list(K, V)::in, kv_list(K, V)::out)
> +    is semidet.
> +

This needs a comment as well.

The rest looks fine.

Peter


More information about the reviews mailing list