[m-dev.] hash consing

Fergus Henderson fjh at cs.mu.OZ.AU
Thu Aug 12 03:34:49 AEST 1999


Zoltan and I had a longish discussion about hash consing data types.
The following description is based on Zoltan's notes about the topic
as revised by me after our discussion.

hash consing
------------

Syntax:

	:- pragma hash_cons(typector/arity)
	:- pragma hash_cons(typector/arity, [ctor1/arity1, ctor2/arity2, ...])
		(or perhaps some better name instead of `hash_cons',
		e.g `canonical')

Semantics:
	
	Calls to the specified constructors for the specified type
	should be tabled; as a result, equality checking and tabling
	for values of this type will be O(1) rather than O(N).

Semantic checking:
		
	The pragma must appear in same section (interface/implementation) as
	the type declaration.

	The mode checker should record that the result of a construction
		unification creating a term of a hash-consed type
		is not unique.
	The mode checker should disallow creation of partially instantiated
		values for hash-cons types
		(Alternatively, it could allow them; but in that case, they
		should be canonicalized only at the point when the value
		becomes ground; even then, that could be confusing behaviour,
		so it's probably not worth supporting this.)

Implementation:

	(1) Ensure that constructions are hash-consed

	For each hash-consed constructor C/N, any unification which constructs
	a ground value using that constructor will get transformed into
	a call to a function hash_cons_C/N which is defined by

		:- pragma memo(hash_cons_C/N).
		hash_cons_C(X1, ..., XN) = C(X1, ..., XN).

	The functor descriptor for each constructor C/N (which is part
	of the type_layout_info and type_ctors_info RTTI structures)
	should include a field containing the address of the table
	variable for hash_cons_C/N; for constructors which are not
	hash-consed, this field will be NULL. 
	std_util__construct/4 should check this field and handle the
	case when it is not null appropriately.

	(2) Make use of the uniqueness property

	Unification should be changed as follows.
	For partially hash-consed types, unify(in, in) would do a switch
	on the functor, and then for hash-consed functors it would
	just compare the address.
	For completely hash-consed types unify(in, in) should just compare
	the address; no switch is needed in this case.

	Tabling should be changed as follows.
	Arguments with completely hash-consed types should be tabled
	as if they were ints, i.e. using a hash table which hashes on
	the term address.
	For partially hash-consed types, the table would as usual be a trie
	which switches on the functor, and then for hash-consed functors the
	next level of the table would table the whole term as if it were an
	int (i.e. it would be a hash table which hashes on the
	term address).

Optimizations:

	When tabling a procedure which has several input arguments that are
	all tabled as if they were ints (the arguments might be ints,
	enums, or completely hash-consed types), we should consider
	using a single hash table which hashes all of the arguments
	rather than a chain of hash tables.

	The tabling of discriminated union types whose type is known
	statically should be optimized.  Currently it is not; we just
	call table_lookup_insert_user, which uses inefficient RTTI,
	even though RTTI is not needed.

	Currently tabling of polymorphic procedures is stupidly inefficient.
	Polymorphic tabled procedures are tabled first on their implicit
	type_info arguments, and then on their other arguments, which is fine.
	But tabling of the type_info arguments is done using
	table_lookup_insert_poly (passing it the type_info for type_info!)
	rather than table_lookup_insert_type_info,
	so it unnecessarily tables the type_info for type_info.
	And tabling of the polymorphically-typed arguments is done
	using table_lookup_insert_poly rather than table_lookup_insert_user,
	so it unnecessarily tables the type_info for the type again,
	which is unnecessary since that value has already been tabled.
	This is pretty woeful -- the chain of tables is about twice
	as long as it needs to be.
	(For reference, the predicate table_lookup_insert_poly tables both
	the type_info for its argument and then the argument itself,
	whereas table_lookup_insert_user tables just the argument.
	table_lookup_insert_type_info is a hypothetical new predicate
	which just tables the type_info.)

	We should try to allow inlining and type specialization for
	tabled procedures (this should work OK now that the table
	variables are not simply C static variables).

	The simplest way of doing type specialization for tabled polymorphic
	procedures would just specialize the body of the procedure;
	the tabling code itself would remain pretty much unchanged.
	The next level up would be to specialize calls to
	table_lookup_insert_user where the type is known statically.
	The most complicated approach would be to also specialize calls
	to table_lookup_insert_typeinfo where the type is known statically;
	these could be handled by inserting initialization code which at
	program start-up calls table_lookup_insert_type_info to insert the
	type_info for that type into the unspecialized procedure's table,
	saving the address of the sub-table returned in a new global variable;
	then the specialized code could start off its search using the
	sub-table stored in the global variable, thus avoiding the call to
	table_lookup_insert_type_info.
	
Additional notes:

	There are some difficulties with hash consing types containing
	higher-order predicate terms (closures): equality on closures
	is undecidable, so the can't be hash-consed.
	In some cases, we can detect this at compile time;
	such cases should probably be reported as compile errors.
	But in other cases, e.g. where you have a polymorphic type,
	it can't be detected until run-time.
	The same issue may apply for other types which can't be compared
	for equality (e.g. `c_pointer').

	As part of our language-level RTTI/reflection support
	(which is currently in std_util.m but which ought to
	be moved to a separate module), we should perhaps provide
	an interface which says whether or not a type or functor
	is hash-consed.  For the former, a simple predicate
	`:- pred type_ctor_is_hash_consed(type_ctor::in) is semidet.'
	would be sufficient.  For the latter, it's a little more
	complicated.  `get_functor', `deconstruct', and possibly
	also `functor' could all be changed to return a new `functor_info'
	abstract type, with corresponding functions `functor_name',
	`functor_arity', `functor_arg_types', rather than just returning
	the name, arity, and argument types, and this ADT could have an
	additional function `functor_is_hashconsed'.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list