[m-dev.] getting type representation decisions from .int files

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Jun 30 09:08:33 AEST 2021

I have just about finished the first draft of code that changes when
the compiler makes decisions about type representation.

Currently, each compiler invocation that generates code
makes such decisions for the types referenced in the module
being compiled, based on the information it has from
the .int* files it has read in. This means that it is in theory
possible for the compiler to compute different representations
for the same type when compiling two different modules
because it has access to slightly different sets of declarations
when compiling the two modules. (This can happen when
e.g. "mmc m1.m" reads an m3.int3, but "mmc m2.m" does not.)
After a lot of work on what used to be modules.m (and is now
grab_modules.m), this is not a problem if you do not enable
argument packing, but it IS a problem if you do enable it.
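
As a purely illustrative sketch of how this can go wrong with argument
packing: the type names below are made up, the module names follow the
m1/m2/m3 example above, and the layouts in the comments are schematic,
not actual compiler output.

```mercury
% In hypothetical module m3: an enum with three values,
% for which two bits would be enough.
:- type colour
    --->    red
    ;       green
    ;       blue.

% A hypothetical type that both m1 and m2 refer to.
:- type t
    --->    f(m3.colour, bool, int).

% While compiling m1 (which has read m3.int3, and so knows that
% colour is an enum), the compiler could pack the first two
% arguments into a single word:
%
%   f: [colour: 2 bits | bool: 1 bit] [int]
%
% While compiling m2 (which has not read m3.int3, and so has to
% assume that colour needs a full word), it would not:
%
%   f: [colour] [bool] [int]
```

If the two modules then exchange values of type t, each would read
the other's cells using the wrong layout.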

Last year, I worked towards the new setup in which the compiler
decides the representation of each type defined in a module
when it creates the .int file for that module, and every compiler
invocation that generates target code gets its information about
the representation of type m2.t from the type_repn item for type t
from m2.int. Since all compilations get their info about any given
type's representation from the same file, the results will be
consistent, provided that all those compilations look at the same
up-to-date version of that .int file.

The design I used last year added a new field to the augmented
compilation unit type. Besides (the parse tree of) the source code
of the module being compiled, this type contains the .int0 files
of its ancestor modules, the .int files of the directly imported modules,
the .int2 files of the indirectly imported modules, some .opt and/or
.trans_opt files, and the .int files needed to make sense of the .opt/.trans_opt
files. The .int and .int2 files among these will define the representations
of all the types that the compiler has access to when generating code
for this compilation unit, with one exception: the types defined in
the module being compiled itself. To plug this gap, I added to
the augmented compilation unit a new field to contain the .int file
of the module being compiled, from which we use *only* the type
representation items, and no others.

This extra field requires slight changes to a nontrivial number of files,
which have caused repeated conflicts as other changes are made
to the affected modules. I would therefore like to commit the
change-in-progress I am working on. Almost all the code in the diff
is executed only if experiment flags, which default off, are explicitly
switched on, so there should be no interference with the normal
operation of the compiler. The remainder of the change involves
tweaks to the types containing type representation information,
and to the code that generates or consumes values of those types.
I would of course bootcheck each change before checkin.

Does anyone object to me committing such work-in-progress?

The logical reviewer for this work is Peter, since he was the last
person, and maybe the only one, to work on du_type_layout.m
apart from me. Peter, do you want to review each  change I make
as I make them, either pre- or post-commit, or should I just make
progress checkins without review, with a review of the whole thing
after it is done, but before switching it on?

My current draft does not handle subtypes yet, because I am
not sure I fully understand them. I am fuzzy on two questions.

First, would it be fair to say that a subtype may differ from
its supertype in only two ways:

- it may delete one or more of the supertype's constructors; and/or
- in the remaining constructors, it may replace some arguments' types
  with subtypes of those types.

If so, I would prefer to include this in the reference manual, replacing
the existing text that talks about such restrictions.
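
To make the two kinds of difference listed above concrete, here is
a minimal sketch. The type names are made up for illustration, and
it assumes the "=<" subtype declaration syntax described in the
reference manual.

```mercury
:- type fruit
    --->    apple
    ;       pear
    ;       lemon
    ;       orange.

% Difference 1: the subtype deletes some of the supertype's
% constructors (apple and pear).
:- type citrus =< fruit
    --->    lemon
    ;       orange.

:- type basket
    --->    empty
    ;       one(fruit)
    ;       two(fruit, fruit).

% Differences 1 and 2 together: the subtype deletes the constructor
% empty, and in the remaining constructors it replaces the argument
% type fruit with its subtype citrus.
:- type citrus_basket =< basket
    --->    one(citrus)
    ;       two(citrus, citrus).
```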

Second, I don't understand this sentence in the reference manual:

Any variable that occurs in @var{supertype} must occur in @var{subtype}.

Is it talking about each type constructor's parameters, or about
all the type vars in each type's definition? The wording implies
the latter, but that can't be correct, since the subtype may omit
a data constructor that has an existentially typed argument.
And if it is talking about the former, then why does it say only
"must occur", instead of requiring the subtype's and the supertype's
type parameter lists to be identical? What is there to be gained
by allowing the subtype to add type parameters, or to permute
any existing type parameters? And where is this restriction enforced?
I couldn't find anything related to this in check_subtype_defn in add_type.m.
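
To make the existential case concrete, here is a sketch with made-up
type names. Whether the quoted sentence is meant to allow or forbid
this is exactly the question.

```mercury
% The type variable T occurs in the definition of the supertype,
% but only as an existentially quantified variable of one constructor.
:- type item
    --->    plain(int)
    ;       some [T] boxed(T).

% The subtype omits the constructor boxed, so T does not occur
% anywhere in its definition. Neither type has any type parameters.
:- type plain_item =< item
    --->    plain(int).
```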

