[m-dev.] For review: hash table implementation

Ralph Becket rbeck at microsoft.com
Fri Feb 2 01:37:53 AEDT 2001


>From Fergus Henderson on 01/02/2001 13:36:00
> That looks good.
> 
> The main worry is how it will interact with uniqueness checking.

Yes and that's going to become more of an issue in general as time 
goes on, I reckon.

> >     % Insert an item into a hash table; if one is already there then
> >     % it is overwritten.
> >     %
> > :- func insert(hash_table(K, V), K, V) = hash_table(K, V).
> > :- mode insert(hash_table_di, in, in) = hash_table_uo is det.
> 
> That should be spelt `set', for consistence with `map__set'.
> It would be nice to have `det_insert' and `det_update' too.

Done.

> I suggest s/(if any)// and instead spell out a bit more clearly
> exactly what happens in that case:

Done.

> > :- func elem(K, hash_table(K, V)) = V.
> > :- mode elem(in, hash_table_ui) = out is det.
> 
> A comment here, like the one in map.m, would make things much
> clearer.

Done, and for 'elem :=', too.

> Hmm, did you consider using a bitmap rather than using maybe(K)?

No, but an excellent idea.

> That would reduce the number of cells allocated quite a bit.
> For a hash table with B buckets and N occupants, it would be four
> cells instead of N + 3 cells.
> 
> 						cells		words
> 	ralph's original	  ht		1		6
> 	representation		+ key array	1		B + 1
> 				+ yes/1 maybes	N		N
> 				+ value array	1		B + 1
> 				= 		N + 3		2B + N + 8
> 
> 	with bitmap		  ht		1		7
> 				+ key array	1		B + 1
> 				+ bit map	1		B/8 + 1
> 				+ value array	1		B + 1
> 				=		4		2.125*B + 10
> 
> The access cost would probably be reasonable similar.

I'd be surprised to see any difference in access time.  Also, let's hope
we get of the order of 32 bits/word rather than 8 :)

I'll implement this and re-present the diff.

> >     % We calculate D = H2 \/ 1 (to ensure that D is non-zero and
> >     % odd and is therefore coprime to the number of buckets)
> 
> It might be better to use (H2 << 1) \/ 1; I imagine that for typical
> hash functions the lower bits of H2 will be more random that the upper
ones.

Good idea - done.  I've used (H2 + H2 + 1) since << is more heavyweight
than is required.

> > :- pred find_slot_0(hash_table(K, V), K, int, int, int, bool).
> > :- mode find_slot_0(hash_table_ui, in, in, in, out, out) is det.
> 
> Don't put a ------- line between things like `find_slot' and
> `find_slot_0'.  Conceptually these are a unit.
> 
> Also, `find_slot_0' should be named `find_slot_2',
> according to our coding guidelines, and for consistency
> with the code elsewhere.

Consider that (and my coding style) changed.

New diff coming any time now...

--
Ralph Becket      |      MSR Cambridge      |      rbeck at microsoft.com 

--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list