[m-rev.] for review: simplify the creation of .int3 files

Zoltan Somogyi zoltan.somogyi at runbox.com
Tue Jan 29 22:04:26 AEDT 2019

The motivation for this diff requires some background.

The argument packing optimizations require that all modules that use a type
agree on that type's representation. At the moment, the compiler computes
the representation of every type every time it compiles a module. The results
of these computations will be consistent between compilations *only* to the
extent that the compiler sees the same information relevant to these decisions
on every compilation. Given our current approach to interface files, I see
no way to make a convincing correctness argument for such consistency.

The approach for which I *can* see a correctness argument is this.

- When we create a module's .int3 file, we put ':- type_representation' items
  into it for the "simple" types, those whose representation du_type_layout.m
  decides on the first pass (direct dummies, notag types, enums, types known
  to be equivalent to sub-word-sized types (enums, {int,uint}{8,16,32}, char),
  and word aligned pointers). We would also pass on has_direct_arg annotations
  if present. (I would argue that after arg packing is switched on, explicit
  has_direct_arg annotations should become unnecessary.)

- When we create a module's .int file, we run du_type_layout.m to compute
  the representation of all types defined in the module: the simple types
  as above, and the non-simple types using the type_representation items
  from the .int3 files of the imported modules. We put the results for both
  kinds of types into the module's .int file. This file becomes the single
  source of truth for the representation of the types defined in this module.
  (We agreed on the desirability of a single source of truth on m-dev about
  15 months ago.)

- When we generate code for a module, we read in the module's *own* .int file
  as well as the .int files of the modules it imports, and take the type
  representation decisions from there; we don't run du_type_layout.m again.

If the build system (either mmake, or mmc --make) ensures that every .int
file read by the last step is up to date whenever we generate code for
a module, then consistency is assured.
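
The three steps above can be sketched as a toy model in Python. This is only
an illustration under stated assumptions, not the compiler's actual code:
the function names, the dictionary layout, the 64-bit word size and the use
of already fully qualified type names as keys are all hypothetical, and the
only "simple" types modelled are enums.

```python
WORD_BITS = 64  # assumed word size for this sketch


def make_int3(module):
    """Step 1: record representations decidable from this module alone.

    Only enums are modelled here; an enum with n constants is assumed
    to fit in ceil(log2(n)) bits.
    """
    int3 = {}
    for name, defn in module["types"].items():
        if defn["kind"] == "enum":
            bits = (len(defn["constants"]) - 1).bit_length()
            int3[name] = {"kind": "enum", "bits": bits}
    return int3


def make_int(module, imported_int3s):
    """Step 2: decide the representation of every type defined in this module,
    using only the module itself and the .int3 data of imported modules."""
    own_simple = make_int3(module)
    known = dict(own_simple)
    for int3 in imported_int3s:
        known.update(int3)
    int_file = dict(own_simple)
    for name, defn in module["types"].items():
        if defn["kind"] == "du":
            # Arguments of known sub-word-sized types are packed;
            # every other argument is given a full word.
            arg_bits = [known[a]["bits"] if a in known else WORD_BITS
                        for a in defn["args"]]
            int_file[name] = {"kind": "du", "arg_bits": arg_bits}
    return int_file


def representation_for_codegen(type_name, int_files):
    """Step 3: look the decision up in the .int data; never recompute it."""
    for int_file in int_files:
        if type_name in int_file:
            return int_file[type_name]
    raise KeyError(type_name)


# Hypothetical modules "colour" and "shape"; shape imports colour.
colour = {"types": {
    "colour.rgb": {"kind": "enum", "constants": ["r", "g", "b"]},
}}
shape = {"types": {
    "shape.filled": {"kind": "enum", "constants": ["no", "yes"]},
    "shape.dot": {"kind": "du",
                  "args": ["colour.rgb", "shape.filled", "int"]},
}}

colour_int3 = make_int3(colour)
shape_int = make_int(shape, [colour_int3])
dot_repr = representation_for_codegen("shape.dot", [shape_int])
```

In this model, every consumer of shape.dot gets its layout from shape_int,
so two consumers can disagree only if they read different versions of that
one file, which is exactly what the build system is asked to prevent.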

This scheme does have a drawback.

If module A contains ":- type t_a == bool", then neither module A nor
the modules importing it can pack values of type t_a into a one-bit field
in a heap cell. The reason is that the compiler invocation that creates A.int3
does not know anything about any module except A, so it does not know that bool
is a one-bit enum. Indeed, it may NOT be a one-bit enum. Module A, instead
of importing bool.m from the standard library, may import a module cheater.m
containing a definition such as ":- type bool == int". And having module A
contain ":- type t_a == bool.bool" instead does not help. The compiler
can see that bool.bool is a qualified name, but it does not know whether it is
*fully* qualified; it could actually be a partially qualified version of
a type named bool in a module named "cheater.bool".
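
The ambiguity can be illustrated with a small Python sketch. It assumes
a simplified model in which a name as written may denote any fully qualified
name that ends with the same dot-separated components; the function
may_refer_to is hypothetical, and the compiler's real resolution rules
are more detailed than this.

```python
def may_refer_to(written, fully_qualified):
    """True if the name as written could denote the fully qualified name,
    i.e. its components form a suffix of the fully qualified components."""
    w = written.split(".")
    f = fully_qualified.split(".")
    return len(w) <= len(f) and f[-len(w):] == w


# Fully qualified type names that might exist in other modules.
# Without reading those modules, the compiler cannot rule any of them out.
candidates = ["bool.bool", "cheater.bool", "cheater.bool.bool"]

# ":- type t_a == bool" could mean the library type or cheater's type.
unqualified_matches = [c for c in candidates if may_refer_to("bool", c)]

# ":- type t_a == bool.bool" still has more than one possible meaning.
qualified_matches = [c for c in candidates if may_refer_to("bool.bool", c)]
```

In both cases more than one candidate survives, so the invocation creating
A.int3 cannot commit to a one-bit representation for t_a.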

I have some ideas for how this drawback can be fixed, but I think we should
agree about the overall approach before considering them.

There may be some other incompatibilities. At the moment, the compiler
does not enforce the requirement that for a unification X = f, the f
comes from a type defined in an imported module; it could come from
a module whose .int2 or .int3 file the compiler read after seeing
an import_module item for that module in another module that *was*
imported. To make the correctness argument work, this must stop.


This diff is step 1a in achieving the above plan.

Step 1 is to rationalize the code that generates all interface files
to make that code easier to understand and modify, since the byzantine
nature of the existing code for creating interface files would otherwise
make all the later steps much harder.

Step 1a inside step 1 is to rationalize the code that generates .int3 files.

Step 1b will be simplifying the code that used to be shared between
the creation of .int3 files and other interface files; some predicates,
and some parts of other predicates, won't be needed anymore.

Step 1c will rationalize the code creating .int0 files.

Step 1d will rationalize the code creating .int1 and .int2 files.

All parts of step 1 are intended to keep the contents we put
into interface files bit-identical with what we generated up to now.
Nevertheless, this diff generates different contents for five .int3 files
(see the attached file DIFF.s1s2). This is because this diff fixes
an old bug; the new output is what the output *should* have been
all along.

Step 2 will start (intentionally) changing the contents of interface files
by fixing the potential problems I find during step 1. And as I pare down
each interface file's contents to the minimum required, I also intend
to document their contents in compiler/notes. If I can figure them out,
I also intend to document the *rationale* for the decisions of what
was included/excluded in each kind of interface file.

Step 3a will put type_representation items into .int3 files,
with the readers of those files ignoring them.

Step 3b will start using those type_representation items in .int3 files
for simple types to compute type representations for complex types,
which it will put into type_representation items in .int files,
which will also start by being ignored.

Step 3c will then switch over to the new scheme: it will start taking
type representation information from .int files, as described in the
correctness argument above.


The diff is for review by Julien or Peter, but I would like everyone's
feedback on the plans above.

To make a review easier, I am attaching the summary of how the pre-diff
compiler computed what goes into a .int3 file. I would suggest that
whoever does the review, do it in two stages: check whether the summary
is a correct description of what the old code does, and then check whether
what the new code does for each kind of item is what the summary calls for
that kind of item. Otherwise, the correspondence between the too-complex
old code and the simple new code is too hard to keep track of in one's head.

I am also attaching the output of git diff both with and without -b.

Thanks in advance,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Log.int3
Type: application/octet-stream
Size: 2327 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20190129/9ca8fa73/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFF.int3
Type: application/octet-stream
Size: 23720 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20190129/9ca8fa73/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFFb.int3
Type: application/octet-stream
Size: 20391 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20190129/9ca8fa73/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFF.s1s2
Type: application/octet-stream
Size: 2859 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20190129/9ca8fa73/attachment-0007.obj>