[m-rev.] for review: rationalise hash functions across the standard library
Peter Wang
novalazy at gmail.com
Sun Feb 16 13:25:14 AEDT 2020
On Sat, 15 Feb 2020 11:11:21 +1100 (AEDT), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
>
>
> On Fri, 14 Feb 2020 20:27:39 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
> > You can go ahead; I won't be touching hash_table code for a few
> > weeks at least.
>
> Done. The diff is attached. It is for review by Peter, with special attention
> to the new XXXs.
>
> Peter, could you tell me why version_hash_table uses user-defined equality?
> The change was done by pbone, but I hope you may have discussed the
> issue with him. The discussion on m-rev in May 2013, when the user defined
> equality was added, says nothing about *why* it was added, except one
> possibly-relevant vague reference to "MC needing it", and I see nothing
> relevant in that time frame in m-dev.
I don't know.
> diff --git a/library/assoc_list.m b/library/assoc_list.m
> index fd38e7271..75459f5e0 100644
> --- a/library/assoc_list.m
> +++ b/library/assoc_list.m
> @@ -91,6 +91,14 @@
> %
> :- func assoc_list(K, V) ^ det_elem(K) = V is det.
>
> +%---------------------------------------------------------------------------%
> +%
> +% Updating elements in assoc_lists.
> +%
> +
> +:- pred update(K::in, V::in, assoc_list(K, V)::in, assoc_list(K, V)::out)
> + is semidet.
> +
Add documentation for this, in particular noting that only the first
element with the given key is updated.
> @@ -564,42 +608,59 @@ lookup(HT, K) =
> func_error($pred, "key not found")
> ).
>
> +% XXX The convention in other library modules is that
> +% - elem is shorthand for search, NOT lookup, and
> +% - det_elem is shorthand for lookup.
> elem(K, HT) = lookup(HT, K).
I wouldn't mind doing away with all elem and det_elems.
> +:- pred foldlf(func(K, V, T) = T, kv_list(K, V), T, T).
> +:- mode foldlf(func(in, in, in) = out is det, in, in, out) is det.
> +:- mode foldlf(func(in, in, di) = uo is det, in, di, uo) is det.
> +
> +foldlf(F, KVs, !A) :-
> + (
> + KVs = kv_nil
> + ;
> + KVs = kv_cons(K, V, TailKVs),
> + !:A = F(K, V, !.A),
> + foldlf(F, TailKVs, !A)
> + ).
> +
I guess you didn't want to add this to kv_list as well. That's fine.
> -munge(N, X) =
> - (X `unchecked_left_shift` N) `xor`
> - (X `unchecked_right_shift` (int.bits_per_int - N)).
> +accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
> + % XXX This is a REALLY BAD algorithm, with shift amounts that
> + % will routinely exceed the word size.
> + HashAcc =
> + (HashAcc0 `unchecked_left_shift` HashValue) `xor`
> + (HashAcc0 `unchecked_right_shift` (int.bits_per_int - HashValue)).
Oops, munge was originally called with fixed values of N, 5 or 3.
By the original intention, accumulate_hash_value should be:
accumulate_hash_value(HashValue, HashAcc0, HashAcc) :-
N = 5,
HashAcc =
(HashAcc0 `unchecked_left_shift` N) `xor`
(HashAcc0 `unchecked_right_shift` (int.bits_per_int - N)) `xor`
HashValue.
Something like FNV-1a should do a lot better,
but generic_hash is deprecated anyway.
> diff --git a/library/kv_list.m b/library/kv_list.m
> index cf951782d..e63940802 100644
> --- a/library/kv_list.m
> +++ b/library/kv_list.m
> @@ -107,6 +107,14 @@
> %
> :- func kv_list(K, V) ^ det_elem(K) = V is det.
>
> +%---------------------------------------------------------------------------%
> +%
> +% Updating elements in kv_lists.
> +%
> +
> +:- pred update(K::in, V::in, kv_list(K, V)::in, kv_list(K, V)::out)
> + is semidet.
> +
This needs a comment as well.
The rest looks fine.
Peter
More information about the reviews
mailing list