[mercury-users] Tagged datatypes and boxing

Peter Schachte schachte at cs.mu.OZ.AU
Sat Sep 25 00:53:07 AEST 1999

On Thu, Sep 23, 1999 at 05:27:22AM -0700, Ralph Becket wrote:
> I was very surprised when my compression program that *shouldn't* be doing
> any structure creation turned out to be doing just that in spades.
> A swift application of mprof later and it turns out that the culprits are a
> couple of predicates in the inner loop that return tagged integers (there
> are
> only two possible tags in each case).  I had naively expected the compiler
> to
> steal a bit for this purpose, but of course that's unreasonable.

It's not just integers.  Consider these types:

	:- type color ---> red ; green ; blue.

	:- type colorhue ---> light(color) ; dark(color).

Clearly only 3 bits are needed to represent a colorhue, yet Mercury
still uses a boxed color, ie, 2 words.  One scheme that would be more
economical with memory would be for the compiler to keep track of how
many bits are actually needed to represent each type.  When
determining how to represent a new discriminated union, it could
determine the total size of the arguments of each constructor, and add
the log of the number of constructors.  If this is no larger than the
word size of the machine, that'll be the size of this type, otherwise
it'll have to be boxed (so the size will be the word size).  The
actual representation would have to put the tag bits for a type in the
more significant bits, and the argument in the least significant bits.

For types with only nilary and unary constructors, this would only
require arithmetic comparisons to distinguish functors, addition and
subtraction for construction and deconstruction, and one extra shift
for switching.  For types with some constructors with more than one
argument, masking and shifting will be needed.  Whether the time saved
by avoiding memory allocation compensates for the type wasted in
masking and shifted is unclear, but I expect when you count avoided
cache misses, page faults, and, gasp, garbage collections, it does.

What about types involving pointers?  Here Mercury already does almost
as well as it can, allowing 4 or 8 constructors each with at most 1
address argument.  It could be improved a bit by allowing any number
of nilary constructors to share a single 2 or 3 bit tag in the low
bits, and then use the remaining bits to distinguish their values.
With this representation, any constructor can be recognized with a
single instruction.  The cost in this case is in switching, where a
second dispatch would be necessary for the nilary constructors.
Again, I expect reduced memory turnover would more than compensate.

These optimizations would certainly complicate the code to determine
how to represent a discriminated union type, but I expect that change
would be manageable.  The more worrying question is how much much of
the rest of the compiler would be affected, and how badly?

Peter Schachte                     Lisa, if you don't like your job you
mailto:schachte at cs.mu.OZ.AU        don't strike. You just go in every day
http://www.cs.mu.oz.au/~schachte/  and do it really half-assed. That's the
PGP: finger schachte at  American way. -- Homer Simpson 
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-users-request at cs.mu.oz.au Message: subscribe

More information about the users mailing list