[m-dev.] hash consing
Fergus Henderson
fjh at cs.mu.OZ.AU
Thu Aug 12 03:34:49 AEST 1999
Zoltan and I had a longish discussion about hash consing data types.
The following description is based on Zoltan's notes about the topic
as revised by me after our discussion.
hash consing
------------
Syntax:
:- pragma hash_cons(typector/arity)
:- pragma hash_cons(typector/arity, [ctor1/arity1, ctor2/arity2, ...])
(or perhaps some better name instead of `hash_cons',
e.g `canonical')
Semantics:
Calls to the specified constructors for the specified type
should be tabled; as a result, equality checking and tabling
for values of this type will be O(1) rather than O(N).
Semantic checking:
The pragma must appear in same section (interface/implementation) as
the type declaration.
The mode checker should record that the result of a construction
unification creating a term of a hash-consed type
is not unique.
The mode checker should disallow creation of partially instantiated
values for hash-cons types
(Alternatively, it could allow them; but in that case, they
should be canonicalized only at the point when the value
becomes ground; even then, that could be confusing behaviour,
so it's probably not worth supporting this.)
Implementation:
(1) Ensure that constructions are hash-consed
For each hash-consed constructor C/N, any unification which constructs
a ground value using that constructor will get transformed into
a call to a function hash_cons_C/N which is defined by
:- pragma memo(hash_cons_C/N).
hash_cons_C(X1, ..., XN) = C(X1, ..., XN).
The functor descriptor for each constructor C/N (which is part
of the type_layout_info and type_ctors_info RTTI structures)
should include a field containing the address of the table
variable for hash_cons_C/N; for constructors which are not
hash-consed, this field will be NULL.
std_util__construct/4 should check this field and handle the
case when it is not null appropriately.
(2) Make use of the uniqueness property
Unification should be changed as follows.
For partially hash-consed types, unify(in, in) would do a switch
on the functor, and then for hash-consed functors it would
just compare the address.
For completely hash-consed types unify(in, in) should just compare
the address; no switch is needed in this case.
Tabling should be changed as follows.
Arguments with completely hash-consed types should be tabled
as if they were ints, i.e. using a hash table which hashes on
the term address.
For partially hash-consed types, the table would as usual be a trie
which switches on the functor, and then for hash-consed functors the
next level of the table would table the whole term as if it were an
int (i.e. it would be a hash table which hashes on the
term address).
Optimizations:
When tabling a procedure which has several input arguments that are
all tabled as if they were ints (the arguments might be ints,
enums, or completely hash-consed types), we should consider
using a single hash table which hashes all of the arguments
rather than a chain of hash tables.
The tabling of discriminated union types whose type is known
statically should be optimized. Currently it is not; we just
call table_lookup_insert_user, which uses inefficient RTTI,
even though RTTI is not needed.
Currently tabling of polymorphic procedures is stupidly inefficient.
Polymorphic tabled procedures are tabled first on their implicit
type_info arguments, and then on their other arguments, which is fine.
But tabling of the type_info arguments is done using
table_lookup_insert_poly (passing it the type_info for type_info!)
rather than table_lookup_insert_type_info,
so it unnecessarily tables the type_info for type_info.
And tabling of the polymorphically-typed arguments is done
using table_lookup_insert_poly rather than table_lookup_insert_user,
so it unnecessarily tables the type_info for the type again,
which is unnecessary since that value has already been tabled.
This is pretty woeful -- the chain of tables is about twice
as long as it needs to be.
(For reference, the predicate table_lookup_insert_poly tables both
the type_info for its argument and then the argument itself,
whereas table_lookup_insert_user tables just the argument.
table_lookup_insert_type_info is a hypothetical new predicate
which just tables the type_info.)
We should try to allow inlining and type specialization for
tabled procedures (this should work OK now that the table
variables are not simply C static variables).
The simplest way of doing type specialization for tabled polymorphic
procedures would just specialize the body of the procedure;
the tabling code itself would remain pretty much unchanged.
The next level up would be to specialize calls to
table_lookup_insert_user where the type is known statically.
The most complicated approach would be to also specialize calls
to table_lookup_insert_typeinfo where the type is known statically;
these could be handled by inserting initialization code which at
program start-up calls table_lookup_insert_type_info to insert the
type_info for that type into the unspecialized procedure's table,
saving the address of the sub-table returned in a new global variable;
then the specialized code could start off its search using the
sub-table stored in the global variable, thus avoiding the call to
table_lookup_insert_type_info.
Additional notes:
There are some difficulties with hash consing types containing
higher-order predicate terms (closures): equality on closures
is undecidable, so the can't be hash-consed.
In some cases, we can detect this at compile time;
such cases should probably be reported as compile errors.
But in other cases, e.g. where you have a polymorphic type,
it can't be detected until run-time.
The same issue may apply for other types which can't be compared
for equality (e.g. `c_pointer').
As part of our language-level RTTI/reflection support
(which is currently in std_util.m but which ought to
be moved to a separate module), we should perhaps provide
an interface which says whether or not a type or functor
is hash-consed. For the former, a simple predicate
`:- pred type_ctor_is_hash_consed(type_ctor::in) is semidet.'
would be sufficient. For the latter, it's a little more
complicated. `get_functor', `deconstruct', and possibly
also `functor' could all be changed to return a new `functor_info'
abstract type, with corresponding functions `functor_name',
`functor_arity', `functor_arg_types', rather than just returning
the name, arity, and argument types, and this ADT could have an
additional function `functor_is_hashconsed'.
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh> | of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3 | -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list