# [mercury-users] Tagged datatypes and boxing

Richard A. O'Keefe ok at hermes.otago.ac.nz
Tue Sep 28 09:02:12 AEST 1999

```The debate is between me and Peter Scachte.

> Open-endedness is precisely what I want at this stage in the discussion.
> The whole point is that I don't *know* what everyone will find appropriate
> for constraining variables.
>
> 	type smallprime = X:integer where 0 < X & X < 1024 & prime(X).
>
> Yes, why not?

I suspect we are talking about different things here.  I believe you
suggesting that it should be possible to specify types in a more
expressive way than is possible with only regular types.  This is
something Lee Naish has advocated for quite a while, and, details
aside, seems like a good idea to me.  The devil is in the details,
though.

Agreed.  To be honest, the ad hoc syntax is mine, but the *idea* is
exactly Lee Naish's idea.

What I was referring to was just *representation*.

because subranges in Pascal have *nothing* to do with representation.
Come to that, they don't have anything to do with representation in

subtype Nat8 is Integer range 0 .. 2**8 - 1;

then what I get is a subset of the *values* of Integer (0 to 255)
with *exactly the same representation* as Integer.  You can change
that with an entirely separate declaration

for Nat8'Size use 8;

which only makes sense *given* the range bounds.

You could probably
represent the primes between 0 and 1024 with 7 or 8 bits, but you'd be
crazy to do so.

*BUT* the compiler would be free to exploit the bounds 0..1023 (as
1024 is not a prime) to represent the primes between 0 and 1024 with
TEN bits.

As you observe, you could represent the odd integers
between 1 and 511 in 8 bits, but as soon as you tried to do math with
them you'd be sorry you did.

Let's face it, you will deeply regret *ANY* packing if you try to do
much, even loads and stores, with the packed data.  There is, after
all, a reason packing fell out of disfavour in Pascal:  the code
bloats up and slows down like you wouldn't believe.

You *don't* do arithmetic with odd integers between 1 and 511
any more than modern machines do arithmetic with bytes.  When you
fetch a byte (or other packed value) from memory, it is widened to
a more convenient format (typically 32 bits); when you store a
byte (or other packed value) into memory, it is narrowed again
(and may be bounds checked).

This is partly the point of the type/subtype distinction.
Suppose I have

:- integer subtype (X: odd8 :- 0 < X < 512, X mod 2 =:= 1).

and
X, Y, Z: odd8

What is the type of X+Y+Z?  Is it odd8?  No way!  It's *integer*.
(A compiler that does interval analysis, and such compilers exist,
would realise that the subtype is some subset of [3,1533], which
is clearly not odd8.)  So the implementation would be

from_odd8(N) where 0 <= N, N < 256 -> N*2+1.

to_odd8(N) where 0 < N, N < 512, X mod 2 = 1 -> (N-1) div 2.

and
(S:odd8) = X+Y+Z
would be implemented as
S = to_odd8(from_odd8(X) + from_odd8(Y) + from_odd8(Z)).

If the compiler will not, for example, pack -500..+500 into 10
bits, which requires nearly as much implementation complexity
as the odd8 example, then the *programmer* will have to.

You could represent the numbers between
1024 and 1040 in 4 bits, but even that seems rather dubious if you're
going to be doing much math with them.

Once again, you *DON'T* do calculations on packed forms.  Pascal
didn't, modern hardware can't.  Even on PDP-10s, the hardware could
only do 18-bit or 36-bit integer arithmetic.

This is why I suggested the
old Pascal-style subranges for specifying small integer
representation:  it has all the expressiveness you'd want for that.

Once again, it _didn't_ have that expressiveness in Pascal.
Pascal subrange notation merely describes a set of values.
To get packing, you have to use the 'packed' keyword, which
applies to record and array types.  And there are all sorts of
limitations on packed fields/array elements in Pascal, for example
(and for good reason) they cannot be passed as 'var' parameters.
Ada *does* allow its equivalent, but expressly involves conversion
to general representation on entry and back to packed form on exit.
Even C notionally doesn't do arithmetic *directly* on anything
smaller than an 'int'.

Even if you argue for open-ended typing information, I still think it
makes sense to have a specialized syntax for specifying a
representation.

My point was that the "specialised syntax" can *be* that subset of
the open-ended typing information which the compiler is capable
of recognising as useful for the purpose.  One compiler might
recognise nothing; another might recognise constant bounds; another
might recognise modular information.  Why raise the limitations of
some compilers to the level of special-purpose straitjacket syntax?
--------------------------------------------------------------------------
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au