[m-dev.] tabling, extensibility, and dynamic type class casts

Fergus Henderson fjh at cs.mu.OZ.AU
Thu Nov 30 20:45:17 AEDT 2000


Tyson and I just had a long discussion about compare/3 and hashing,
type classes, dynamic type class casts, and restrictions on instance
declarations.  This is an attempt to summarize what we talked about.

The issue that came up was how should we handle tabling for types
like c_pointer.  This is an instance of the more general question
"How do I provide an operation that works on a large set of types
(possibly on all types), but which works differently on some
user-extensible subset of types?".

Hashing (i.e. "func hash(T) = int") and comparison (i.e. the standard
compare/3 predicate) are good examples of this.  For most types, you
can implement hash/1 using std_util__deconstruct/4.  The standard
compare/3 could be implemented similarly, or at least specified in
that way; our current implementation happens to use a different
technique, but this could be considered as just an optimization.
The reason for preferring to specify compare/3 as being implemented
using std_util__deconstruct is to reduce the number of primitive
constructs in the language.  The RTTI stuff has to be treated as
builtin, if it is to work reliably [*], but compare/3 need not be,
since compare/3 can be defined in terms of deconstruct/4 (but not vice
versa).  Furthermore, it would be inconsistent to specify compare/3 as
being a language builtin rather than just a standard library
predicate, since we don't do the same thing for hash/1.

However, compare/3 and hash/1 ought to work differently for some types,
e.g. for types like set_unordlist that ought to have user-defined
equality.  Currently we don't have a good solution for that.
If you define a type with a user-defined equality, then deconstruct/4
will not work for that type, and so hash/1 and compare/3 won't work
for that type either.

One way to work around that is to make hash/1 and compare/3 type
class methods; then the compiler will check to see that you don't
use them for types on which they are not defined.

However, that doesn't work well, for two reasons.  Firstly, it is a
pain, since you would have to explicitly define instance declarations
for every type, even though in most cases you will just end up using 
the default implementation in terms of deconstruct.  We could provide
equivalents to Haskell's "deriving Ord", and we could even make them
implicit, but that could only be done for a fixed set of such type
classes, and so it is not a general solution.

An alternative approach is to add support for dynamic type class
casts, and to use those.  In fact in some sense we already have syntax
for dynamic type class casts: just use construct/3 on an appropriate
existentially quantified type!  We just don't have the necessary
run-time support.  But it could be added.

OK, so far so good.  The stuff above just revises and summarizes our
earlier discussions on this topic.

Now, suppose we then go ahead and define hash/1 and compare/3
using dynamic type class casts, so that e.g. hash/1 will first
check whether something is an instance of the `non_default_hash' typeclass,
and if so it will use the method from that typeclass, otherwise it
will use the default implementation using deconstruct/4.

That is, suppose hash/1 is implemented like so:

	:- func hash(T) = int.
	hash(X) =
		if simple_construct(X) = any_non_default_hash(X1) then
			non_default_hash(X1)
		else
			default_hash(X).

	:- typeclass non_default_hash(T) where [
		non_default_hash(T) = int
	].

	:- type any_non_default_hash --->
		some [T] any_non_default_hash(T) <= non_default_hash(T).

	:- func default_hash(T) = int.
	% ... implemented using deconstruct/4

	% This should go in the standard library...
	:- func simple_construct(ArgT) = T is semidet.
	simple_construct(Object) = Result :-
		univ_to_type(construct(type_of(Result), 0, [univ(Object)]),
			Result).

Now, all this RTTI stuff looks expensive.  But the idea is that
the if the type is known statically, the compiler will be able
to specialize all of this.

Unfortunately, however, [this is where we start to make some new
observations] the compiler *can't* do complete compile-time
specialization of dynamic type class casts, because the relevant
instance definition could be in any module, and the implementation
hence can't determine whether there is any matching instance
declaration until *link* time.  Well, it can do the optimization
in the case when the type class cast will succeed, but it can't
optimize type class casts which would fail.

There is however a way to allow compile-time optimization in
such cases.  We could add `:- no_instance' declarations, and
declare e.g. `:- no_instance non_default_hashable(term)';
this would allow the compiler to optimize cases like the above.
It would be an error to have both an `instance' and `no_instance'
declaration for the same typeclass and type(s); this could be
detected at link time as in our current implementation.

One problem with dynamic type class casts is that implementing
them efficiently requires restrictions on what kind of instance
declarations are allowed.  This is one of the main motivations
behind our current set of such restrictions.  But these restrictions
are in many cases onerous and overly restrictive in practice.
One possible solution to that problem is to have two different
kinds of type classes, one for which dynamic type class casts
are allowed (and arbitrary instance declarations aren't),
and the other for which there are few restrictions on instance
declarations but for which dynamic type class casts are not
allowed, or at least are not efficient.  Even better, we could
have just one kind of type class, but allow declarations saying
how the type class should be "indexed"; such declarations would
impose restrictions on what kinds of instance declarations are
allowed, but would conversely allow efficient dynamic type
class casts.  You might want to index differently on different
types in a multiparameter type class.  The indexing issue is
analagous to indexing ordinary clauses.

[Aside: Another potential problem with dynamic type class casts is their
interaction with abstraction boundaries.  Should private instance
declarations be allowed?  If so, should they be accesible via
dynamic type class casts?

If the answers to those questions are "Yes" and "No" respectively,
then technically we wouldn't actually need to add `:- no_instance'
declarations, since private instance declarations could be used
to achieve the same effect!  However, that's a rather awful hack.
And it would be tedious too.  So I wouldn't really recommend it.]

[*] Glasgow Haskell implements RTTI using a type class based library,
but that approach has a severe drawback: is not safe to use in
untrusted code.  The reason for this is that the safety of an unsafe
type cast in the library depends on the user's code defining instances
for the RTTI type classes only in appropriate ways, and an untrusted
user might abuse that.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list