[m-dev.] mini types

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun May 31 21:14:54 AEST 2020

On Sun, 31 May 2020 20:12:24 +1000, Michael Day <mikeday at yeslogic.com> wrote:

> Hi Zoltan,
> > I can see two broad approaches that will let the compiler recognize t3
> > as a mini type when building the .int3 file.
> Quick question: how much of this proposal is dependent on the current 
> way in which the compiler resolves imports and handles separate 
> compilation through the generation of various interface files and how 
> much is inherent to the language itself?

The issue the proposal is trying to address is not caused by the language,
but by the way the compiler communicates information from one module
to another.

> Because the obvious counter proposal would be for the compiler to do the 
> right thing without needing the programmer to spell it out, so it would 
> be good to know if that's ruled out by factors that would be impossible 
> to change or factors that would be merely impractical to change :)

Factors that would be impractical to change.

Currently, the compiler decides how values of each type are represented
when it generates code for a module, say module A. The set of types whose
representations it decides has to include not just the types defined in A,
but also the types defined in other modules B that A imports, and those
can depend on the definitions of types defined in module C that B imports
(e.g. when B defines a type to be equivalent to another type in C).
There is no simple way (that I know of) to guarantee that the compiler
sees the same set of relevant type definitions when deciding the representation
of type B.t1 when compiling module B as when compiling module A.
*Most* of the time, it does, but sometimes it doesn't, and when it doesn't,
the result is silent data corruption setting a time bomb that usually goes off
in some unrelated piece of code, making it very hard to debug. (That's the
second time I am writing that today :-) And even if this could be fixed,
there is nothing to guarantee that it would *stay* fixed as the compiler
is updated over the years.

This is why I have been working towards the model of having a module's
.int file being the single source of truth about the representation of the
types defined in that module. This has a simple correctness argument
that would be hard to accidentally screw up.

When generating A.int, and deciding the representation of a type t2
defined in A, if one of t2's function symbols has an argument whose type
is B.t1, the compiler needs to know whether values of B.t1 fit in N bits,
where N < 32 or N < 64. Given that B.t1 may be abstract exported from B,
it can't get this info from B.t1's definition, since that may not be visible;
it has to get it from type representation information in B.int3,
put there by the compiler when B.int3 was created.
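
To make the situation concrete, here is a minimal sketch of two
hypothetical modules. Only the type names t1 and t2 come from the text
above; the module names a and b and the function symbols lo, hi and f
are made up for illustration. Module b abstract-exports t1, so b's
interface says nothing about how many bits a value of t1 needs.

```mercury
% File b.m (hypothetical).
:- module b.
:- interface.

    % Abstract exported: importers see only the name of the type.
:- type t1.

:- implementation.

    % Visible only inside b. An enum with two constants
    % needs just one bit.
:- type t1
    --->    lo
    ;       hi.

:- end_module b.

% File a.m (hypothetical, a separate source file).
:- module a.
:- interface.
:- import_module b.

    % Whether the two t1 arguments can be packed into less than
    % a word depends on the size of b.t1, which a cannot learn
    % from the abstract declaration alone.
:- type t2
    --->    f(b.t1, b.t1, int).

:- end_module a.
```

In this sketch, the only place the compiler could learn that b.t1 is
small when generating a.int is a type representation item in b.int3.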

A compiler invocation such as "mmc --make-short-interface B.m"
generates B.int3 from B.m, and nothing else: it does not read in
any other files.

A compiler invocation such as "mmc --make-interface A.m"
generates both A.int and A.int2, from both A.m and from the .int3 files
of the modules imported by A.

At the moment, the compiler puts type representation information
into .int3 files about the *simple* types defined in the module,
where simple is defined as a type that is either an equivalence type,
a direct dummy type, an enum type, or a notag type. The criterion
that unites these kinds of types is that one can test whether
a type falls into any one of these categories without reference
to the definition of any other type. This is important, because
that other type may be defined in another module, and the compiler
has no access to any information about any other module when
it is creating the .int3 file. This includes the bool module of the
standard library :-(, which is why we cannot add mini types
as a fifth category of simple type unless we either (a) restrict
the argument types to the ones defined in the same module,
or (b) adopt one of the proposals in the post that started
this thread.
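
As an illustration, here is a sketch of one type in each of the four
simple categories, followed by a type of the kind that cannot be
classified locally. All the module, type and function symbol names are
hypothetical, and the last example assumes that a mini type is one whose
values fit in less than a word, as the discussion above suggests.

```mercury
:- module simple_examples.
:- interface.
:- import_module bool.

    % Equivalence type: another name for an existing type.
:- type metres == int.

    % Direct dummy type: a single function symbol with no arguments.
:- type token
    --->    token.

    % Enum type: every function symbol is a constant.
:- type colour
    --->    red
    ;       green
    ;       blue.

    % Notag type: one function symbol with exactly one argument.
:- type wrapped
    --->    wrapped(int).

    % Candidate mini type. Deciding whether it is small requires
    % knowing the size of bool, which is defined in another module,
    % so the decision cannot be made from this file alone.
:- type flags
    --->    flags(bool, colour).

:- end_module simple_examples.
```

Each of the first four definitions can be classified by looking at its
own shape only, while classifying flags needs the representation of
bool.bool.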

When the above design is implemented RSN, the compiler will also put type
representation information for *all* of the types defined in module A
into A.int and into A.int2. To do this, it needs access to information
about the representations of all the types that occur as argument
types in the type definitions in A. If one of those types is a type defined
in B, the compiler can know that it is a mini type only if B.int3
has a type representation item saying so.

The only way I can see to allow mini types to include references
to non-locally-defined types without adopting one of the approaches
in my original proposal would be to introduce a whole new kind
of interface files to go between .int/.int2 files on the one hand
and .int3 files on the other hand: call them .int2.5 files :-)
When generating A.int2.5, the compiler would read the .int3 files
of the modules imported by A, and put into it the representations
of the mini types defined in A, which could now include arguments
whose types are defined not in A, but in modules imported by A.
Then the compiler would read the .int2.5 files of the modules
imported by A when generating A.int. One *could* possibly
make this work, but I, for one, would not want to even try;
it is far too much work, and far too much performance impact
on the compiler, compared to the possible gain.
And since one mini type can include another, full generality
would require many more kinds of interface files between
.int3 and .int, which would be even more impractical.

> > - a pragma such as ":- pragma type_size(enum, 1)."
> > 
> > - a compiler option such as --type-size enum=1
> > 
> > - a compiler option such as --type-sizes <type_size_file>, where the
> >    type_size_file contains lines such as "enum 1".
> Personally I prefer the pragma, although I would prefer it even more if 
> the compiler could generate the pragma for me.

The pragma is my preference as well.

The compiler couldn't generate the pragma when creating the module's
.int3 file, but it *could* generate an informational message (telling the
programmer about how such a pragma could help) when generating
its .int file, or when generating target code. This message could include
a pragma that the programmer could cut-and-paste into the source code.

