[m-dev.] For review: hash table implementation

Fergus Henderson fjh at cs.mu.OZ.AU
Fri Feb 2 00:33:45 AEDT 2001


That looks good.

The main worry is how it will interact with uniqueness checking.

On 31-Jan-2001, Ralph Becket <rbeck at microsoft.com> wrote:
>     % Returns the number of buckets in a hash table.
>     %
> :- func num_buckets(hash_table(K, V)) = int.
> :- mode num_buckets(hash_table_ui) = out is det.
> 
>     % Returns the number of occupants in a hash table.
>     %
> :- func num_occupants(hash_table(K, V)) = int.
> :- mode num_occupants(hash_table_ui) = out is det.
> 
>     % Insert an item into a hash table; if one is already there then
>     % it is overwritten.
>     %
> :- func insert(hash_table(K, V), K, V) = hash_table(K, V).
> :- mode insert(hash_table_di, in, in) = hash_table_uo is det.

That should be spelt `set', for consistence with `map__set'.
It would be nice to have `det_insert' and `det_update' too.

> :- func 'elem :='(K, hash_table(K, V), V) = hash_table(K, V).
> :- mode 'elem :='(in, hash_table_di, in) = hash_table_uo is det.
> 
>     % Delete the entry (if any) for the given key.
> :- func delete(hash_table(K, V), K) = hash_table(K, V).
> :- mode delete(hash_table_di, in) = hash_table_uo is det.

I suggest s/(if any)// and instead spell out a bit more clearly
exactly what happens in that case:

      % If the key is not present, leave the map unchanged.

>     % Lookup the value associated with the given key.  An exception
>     % is raised if there is no entry for the key.
>     %
> :- func lookup(hash_table(K, V), K) = V.
> :- mode lookup(hash_table_ui, in) = out is det.
> 
> :- func elem(K, hash_table(K, V)) = V.
> :- mode elem(in, hash_table_ui) = out is det.

A comment here, like the one in map.m, would make things much
clearer.

>     % Like lookup, but just fails if there is no entry for the key.
>     %
> :- pred search(hash_table(K, V), K, V).
> :- mode search(hash_table_ui, in, out) is semidet.
> 
>     % Convert a hash table into an association list.
>     %
> :- func to_assoc_list(hash_table(K, V)) = assoc_list(K, V).
> :- mode to_assoc_list(hash_table_ui) = out is det.
> 
> %
> ----------------------------------------------------------------------------
> %
> %
> ----------------------------------------------------------------------------
> %
> 
> :- implementation.
> 
> :- import_module exception, math, std_util, bool, exception, char, list.
> 
> 
> 
> :- type hash_table(K, V) 
>     --->    ht(
>                 num_buckets             :: int,
>                 num_occupants           :: int,
>                 max_occupants           :: int,
>                 hash_pred               :: hash_pred(K),
>                 keys                    :: array(maybe(K)),
>                 values                  :: array(V)
>             ).

Hmm, did you consider using a bitmap rather than using maybe(K)?

:- type hash_table(K, V) 
    --->    ht(
		...
                bitmap                  :: array(int), % size = num_buckets / bits_per_int
                keys                    :: array(K),   % size = num_buckets (or 0)
                values                  :: array(V)    % size = num_buckets (or 0)
            ).

That would reduce the number of cells allocated quite a bit.
For a hash table with B buckets and N occupants, it would be four
cells instead of N + 3 cells.

						cells		words
	ralph's original	  ht		1		6
	representation		+ key array	1		B + 1
				+ yes/1 maybes	N		N
				+ value array	1		B + 1
				= 		N + 3		2B + N + 8

	with bitmap		  ht		1		7
				+ key array	1		B + 1
				+ bit map	1		B/8 + 1
				+ value array	1		B + 1
				=		4		2.125*B + 10

The access cost would probably be reasonable similar.

>     % THE HASHING SCHEME
>     %
>     % The user provided hashing function computes two independent
>     % hashes H1 and H2 for a given key K.
>     %
>     % We calculate D = H2 \/ 1 (to ensure that D is non-zero and
>     % odd and is therefore coprime to the number of buckets)

It might be better to use (H2 << 1) \/ 1; I imagine that for typical
hash functions the lower bits of H2 will be more random that the upper ones.

> :- pred find_slot(hash_table(K, V), K, int, bool).
> :- mode find_slot(hash_table_ui, in, out, out) is det.
> 
> find_slot(HT, K, H, IsEmpty) :-
>     (HT ^ hash_pred)(K, Hash1a, Hash2a),
>     int__abs(Hash1a, Hash1),
>     int__abs(Hash2a, Hash2),
>     H0    = Hash1 `rem` HT ^ num_buckets,
>     Delta = Hash2 \/ 1,                 % Have to ensure it's odd and
> non-zero.
>     find_slot_0(HT, K, H0, Delta, H, IsEmpty).
> 
> %
> ----------------------------------------------------------------------------
> %
> 
> :- pred find_slot_0(hash_table(K, V), K, int, int, int, bool).
> :- mode find_slot_0(hash_table_ui, in, in, in, out, out) is det.

Don't put a ------- line between things like `find_slot' and
`find_slot_0'.  Conceptually these are a unit.

Also, `find_slot_0' should be named `find_slot_2',
according to our coding guidelines, and for consistency
with the code elsewhere.


-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list