[m-dev.] mini types

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat May 30 09:09:43 AEST 2020


At the moment, the only types that the compiler considers to be
sub-word-sized are enums, {int,uint}{8,16,32}, and chars. However,
some du types are also sub-word-sized. Consider t1:

:- type e1              % enum type
    --->    e1_f1
    ;       e1_f2.

:- type t1
    --->    t1_f1(e1, int8)         % arguments take 9 bits
    ;       t1_f2(e1, e1, e1).      % arguments take 3 bits

A value of type t1 won't exceed 13 bits in size:

- 3 bits for the zero ptag (not needed, but present to make the direct arg
  optimization applicable to the type),

- 1 bit for a local sectag to distinguish the two function symbols, and

- a maximum of 9 bits for the arguments.
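
To make the 13-bit count concrete, here is a minimal sketch in Python of one possible layout. The field order (ptag in the low 3 bits, then the sectag, then the arguments) and all the helper names are assumptions made for illustration; they are not taken from the compiler.

```python
# Hypothetical layout of a 13-bit t1 value (an assumption, not the
# compiler's actual layout): bits 0-2 ptag, bit 3 local sectag,
# bits 4-12 arguments.
PTAG_BITS = 3
SECTAG_BITS = 1
ARG_SHIFT = PTAG_BITS + SECTAG_BITS    # arguments start at bit 4

def pack_t1_f1(e, i8):
    """t1_f1(e1, int8): sectag 0, then 1 bit of e1 and 8 bits of int8."""
    assert e in (0, 1) and -128 <= i8 <= 127
    args = e | ((i8 & 0xFF) << 1)      # 9 argument bits
    sectag = 0
    return (args << ARG_SHIFT) | (sectag << PTAG_BITS)  # ptag bits stay zero

def pack_t1_f2(a, b, c):
    """t1_f2(e1, e1, e1): sectag 1, then three 1-bit enums."""
    assert all(x in (0, 1) for x in (a, b, c))
    args = a | (b << 1) | (c << 2)     # 3 argument bits
    sectag = 1
    return (args << ARG_SHIFT) | (sectag << PTAG_BITS)

def unpack_t1(word):
    """Recover the function symbol and its arguments from a packed value."""
    sectag = (word >> PTAG_BITS) & 1
    args = word >> ARG_SHIFT
    if sectag == 0:
        raw = (args >> 1) & 0xFF
        i8 = raw - 256 if raw >= 128 else raw   # undo two's complement
        return ("t1_f1", args & 1, i8)
    return ("t1_f2", args & 1, (args >> 1) & 1, (args >> 2) & 1)
```

Under this layout every packed value is below 2**13, and unpack_t1 inverts both packing functions.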

It should therefore be possible to pack a value of type t1 into a word
together with values of other sub-word-sized types, which is why I call
such types mini types. It should even be possible for a mini type to be
an argument in other mini types:

:- type t2
    --->    t2_f1(t1, t1).          % arguments take 26 bits

The current design for deciding type representations can relatively easily
be adapted to handle this example, though the reason for this is complex.
Given a type such as t1 above, the compiler, when building the module's
.int3 file, can assume that the reference to type e1 in the definition
of type t1 refers to the type e1 defined in this module, because if it
does not, then the unqualified reference to e1 is ambiguous, which is
an error. Therefore *if* the reference to e1 is *not* to the local definition,
then the programmer will have to resolve the ambiguity before the program
will compile, which will require the .int3 file to be rebuilt, which means
that the incorrect info about type t1 in the original .int3 file won't ever
be used to inform type representation decisions that end up in an executable
(though they will end up in .o files that will later be rebuilt).

However, there is a common use case for mini types which this approach
cannot recognize: the use of enum types from other modules.

When we build .int3 files, the only input is the module's source file;
we don't read any interface files. This means that when we see
a definition like

:- type t3
    --->    t3_f1(bool, int8)       % arguments take 9 bits
    ;       t3_f2(bool, bool, bool). % arguments take 3 bits

the compiler at present has no idea what type "bool" refers to, which means
it cannot assume that it is sub-word-sized, which means that it cannot
recognize that t3 is also a mini type.
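
The problem can be sketched as a size computation over a table of types whose sizes are already known. The Python below is only an illustration: the function name, the 64-bit word size, and the rule "3 ptag bits, plus enough sectag bits to distinguish the function symbols, plus the widest argument list" are assumptions extrapolated from the t1 example, not the compiler's actual algorithm.

```python
WORD_BITS = 64    # assumed word size
PTAG_BITS = 3

# Sizes in bits of types that are known without reading any interface file.
BUILTIN_SIZES = {"int8": 8, "uint8": 8, "int16": 16, "uint16": 16,
                 "int32": 32, "uint32": 32}

def mini_type_bits(functors, known_sizes):
    """Return the total bits of a du type, or None if the type cannot be
    shown to be a mini type. functors holds one list of argument type
    names per function symbol."""
    arg_bits = []
    for arg_types in functors:
        sizes = [known_sizes.get(t) for t in arg_types]
        if None in sizes:
            return None               # some argument type has unknown size
        arg_bits.append(sum(sizes))
    sectag_bits = (len(functors) - 1).bit_length()
    total = PTAG_BITS + sectag_bits + max(arg_bits)
    return total if total < WORD_BITS else None

# e1 is defined in the same module, so its size is known locally.
known = dict(BUILTIN_SIZES, e1=1)
t1 = [["e1", "int8"], ["e1", "e1", "e1"]]
t3 = [["bool", "int8"], ["bool", "bool", "bool"]]
```

With this table, mini_type_bits(t1, known) gives 13, while mini_type_bits(t3, known) gives None because nothing in the table says how big bool is. Adding bool with a size of 1 to the table makes t3 come out at 13 bits as well.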

I can see two broad approaches that will let the compiler recognize t3
as a mini type when building the .int3 file.

The first approach is to reserve the names of some enum types defined
in the Mercury library, and simply not allow users to define types
with those names. The names would certainly include enum, but the library
also defines a whole bunch of other enum types:

comparison_result                           in builtin.m
day_of_week                                 in calendar.m
noncanon_handling                           in deconstruct.m
whence                                      in io.m and stream.m
access_type, file_type, whence              in io.m
stream_content, stream_mode                 in io.m
integer_base, integer_size, signedness      in lexer.m and term.m
assoc                                       in ops.m
poly_kind, string_format_flag_*             in string.parse_util.m
*_status                                    in table_builtin.m
adjacent_to_graphic_token                   in term_io.m
dst                                         in time.m

and it is far from clear whether we would want to reserve any of these
(with the exception of the ones that are *already* defined in more than
one module :-()

The other approach is to allow programmers to indicate that a type
with a given name has a size of N bits. This could be done in several ways:

- a pragma such as ":- pragma type_size(enum, 1)."

- a compiler option such as --type-size enum=1

- a compiler option such as --type-sizes <type_size_file>, where the
  type_size_file contains lines such as "enum 1".

In all these cases, the compiler would *assume* that the assertion given by
the programmer is true when generating the .int3 file, but would *check*
whether the assertion is true when generating the .int file (when it has
all the info needed for this), generating an error for each failed assertion.
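
The assume-then-check scheme could look roughly like the Python sketch below. The file format follows the "enum 1" example; the function names, the error message wording, and the decision to treat any mismatch (not just an understated size) as a failed assertion are assumptions.

```python
def parse_type_sizes(text):
    """Parse lines such as "enum 1" into {type_name: bits}."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():              # skip blank lines
            name, bits = line.split()
            sizes[name] = int(bits)
    return sizes

def check_assertions(asserted, actual):
    """Return one error message per asserted size that is not true.
    actual maps type names to their real size in bits; a type that is
    not sub-word-sized is simply absent from it."""
    errors = []
    for name, bits in sorted(asserted.items()):
        real = actual.get(name)
        if real != bits:
            errors.append(f"type_size assertion failed for {name}: "
                          f"asserted {bits} bits, actual {real}")
    return errors
```

The table returned by parse_type_sizes would be trusted while generating the .int3 file; check_assertions stands in for the later pass that runs once the real definitions are available, and it returns an empty list when every assertion holds.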

To make maximal use of mini types, the compiler would want to know whether
values of a mini type can be compared as a whole, as a single N-bit unsigned
number, which is possible only in special circumstances (all arguments have
to be themselves comparable as unsigned, and the type has to have either
just one function symbol, or one constant and one nonconstant function symbol,
in that order). This means that e.g. the pragmas above should have room
for an assertion to this effect, as in ":- pragma type_size(enum, 1,
comparable_as_unsigned).".
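
Why the arguments must themselves be comparable as unsigned can be seen in a small Python sketch. It assumes a type with a single function symbol whose first argument is packed into the most significant bits; that placement, and the helper name pack, are assumptions for illustration. The case of one constant followed by one nonconstant function symbol is not covered here.

```python
from itertools import product

def pack(fields, widths):
    """Pack fields into one unsigned number, first field most significant."""
    word = 0
    for value, width in zip(fields, widths):
        word = (word << width) | (value & ((1 << width) - 1))
    return word

# A one-functor type f(e1, uint8): both arguments compare as unsigned,
# so comparing packed words agrees with comparing argument by argument.
widths = [1, 8]
values = list(product(range(2), range(256)))
unsigned_ok = all((pack(a, widths) < pack(b, widths)) == (a < b)
                  for a in values for b in values)

# With int8 in place of uint8, two's complement breaks the ordering:
# f(0, -1) should compare less than f(0, 1), but its packed bits are larger.
signed_breaks = pack((0, -1), widths) > pack((0, 1), widths)
```

Here unsigned_ok and signed_breaks both come out true, which is the reason a type such as t1, with its int8 argument, could not carry the comparable_as_unsigned assertion under this layout.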

The first approach is easier to implement, but the second is more general,
since it works for types that are defined outside the Mercury standard library.
I therefore prefer the second approach.

Opinions?

Zoltan.

