[m-dev.] For review: hash table implementation
Fergus Henderson
fjh at cs.mu.OZ.AU
Fri Feb 2 00:33:45 AEDT 2001
That looks good.
The main worry is how it will interact with uniqueness checking.
On 31-Jan-2001, Ralph Becket <rbeck at microsoft.com> wrote:
> % Returns the number of buckets in a hash table.
> %
> :- func num_buckets(hash_table(K, V)) = int.
> :- mode num_buckets(hash_table_ui) = out is det.
>
> % Returns the number of occupants in a hash table.
> %
> :- func num_occupants(hash_table(K, V)) = int.
> :- mode num_occupants(hash_table_ui) = out is det.
>
> % Insert an item into a hash table; if one is already there then
> % it is overwritten.
> %
> :- func insert(hash_table(K, V), K, V) = hash_table(K, V).
> :- mode insert(hash_table_di, in, in) = hash_table_uo is det.
That should be spelt `set', for consistence with `map__set'.
It would be nice to have `det_insert' and `det_update' too.
> :- func 'elem :='(K, hash_table(K, V), V) = hash_table(K, V).
> :- mode 'elem :='(in, hash_table_di, in) = hash_table_uo is det.
>
> % Delete the entry (if any) for the given key.
> :- func delete(hash_table(K, V), K) = hash_table(K, V).
> :- mode delete(hash_table_di, in) = hash_table_uo is det.
I suggest s/(if any)// and instead spell out a bit more clearly
exactly what happens in that case:
% If the key is not present, leave the map unchanged.
> % Lookup the value associated with the given key. An exception
> % is raised if there is no entry for the key.
> %
> :- func lookup(hash_table(K, V), K) = V.
> :- mode lookup(hash_table_ui, in) = out is det.
>
> :- func elem(K, hash_table(K, V)) = V.
> :- mode elem(in, hash_table_ui) = out is det.
A comment here, like the one in map.m, would make things much
clearer.
> % Like lookup, but just fails if there is no entry for the key.
> %
> :- pred search(hash_table(K, V), K, V).
> :- mode search(hash_table_ui, in, out) is semidet.
>
> % Convert a hash table into an association list.
> %
> :- func to_assoc_list(hash_table(K, V)) = assoc_list(K, V).
> :- mode to_assoc_list(hash_table_ui) = out is det.
>
> %
> ----------------------------------------------------------------------------
> %
> %
> ----------------------------------------------------------------------------
> %
>
> :- implementation.
>
> :- import_module exception, math, std_util, bool, exception, char, list.
>
>
>
> :- type hash_table(K, V)
> ---> ht(
> num_buckets :: int,
> num_occupants :: int,
> max_occupants :: int,
> hash_pred :: hash_pred(K),
> keys :: array(maybe(K)),
> values :: array(V)
> ).
Hmm, did you consider using a bitmap rather than using maybe(K)?
:- type hash_table(K, V)
---> ht(
...
bitmap :: array(int), % size = num_buckets / bits_per_int
keys :: array(K), % size = num_buckets (or 0)
values :: array(V) % size = num_buckets (or 0)
).
That would reduce the number of cells allocated quite a bit.
For a hash table with B buckets and N occupants, it would be four
cells instead of N + 3 cells.
cells words
ralph's original ht 1 6
representation + key array 1 B + 1
+ yes/1 maybes N N
+ value array 1 B + 1
= N + 3 2B + N + 8
with bitmap ht 1 7
+ key array 1 B + 1
+ bit map 1 B/8 + 1
+ value array 1 B + 1
= 4 2.125*B + 10
The access cost would probably be reasonable similar.
> % THE HASHING SCHEME
> %
> % The user provided hashing function computes two independent
> % hashes H1 and H2 for a given key K.
> %
> % We calculate D = H2 \/ 1 (to ensure that D is non-zero and
> % odd and is therefore coprime to the number of buckets)
It might be better to use (H2 << 1) \/ 1; I imagine that for typical
hash functions the lower bits of H2 will be more random that the upper ones.
> :- pred find_slot(hash_table(K, V), K, int, bool).
> :- mode find_slot(hash_table_ui, in, out, out) is det.
>
> find_slot(HT, K, H, IsEmpty) :-
> (HT ^ hash_pred)(K, Hash1a, Hash2a),
> int__abs(Hash1a, Hash1),
> int__abs(Hash2a, Hash2),
> H0 = Hash1 `rem` HT ^ num_buckets,
> Delta = Hash2 \/ 1, % Have to ensure it's odd and
> non-zero.
> find_slot_0(HT, K, H0, Delta, H, IsEmpty).
>
> %
> ----------------------------------------------------------------------------
> %
>
> :- pred find_slot_0(hash_table(K, V), K, int, int, int, bool).
> :- mode find_slot_0(hash_table_ui, in, in, in, out, out) is det.
Don't put a ------- line between things like `find_slot' and
`find_slot_0'. Conceptually these are a unit.
Also, `find_slot_0' should be named `find_slot_2',
according to our coding guidelines, and for consistency
with the code elsewhere.
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list