[m-rev.] for review: parsing .used files

Julien Fischer jfischer at opturion.com
Tue Apr 20 17:02:01 AEST 2021


Hi Zoltan,

On Mon, 19 Apr 2021, Zoltan Somogyi wrote:

> 2021-04-19 16:35 GMT+10:00 "Julien Fischer" <jfischer at opturion.com>:

>>> +    % Alternatively, the .used file could contain two terms, the version
>>> +    % number info, and everything else. We could then select the predicate
>>> +    % we use to read in everything else based on the version number.
>>
>> Using a separate initial term for the version number would be my preference.
>
> OK. This happens to be relevant right now, because since I sent that diff,
> I have started work on changing how recompilation.usage.m works.
> It has traditionally also interleaved (a) figuring out what information
> should go into the .used file, and (b) actually writing it out. My current
> incomplete diff changes this so that it first constructs a single term
> that represents the information to be written out, and then just writes it out.
> This incomplete diff now bootchecks with one obvious exception:
> since the code writing out .used files has switched to a new format
> but the code reading them in hasn't, all the recompilation tests fail.
>
> The type that defines my proposed new file format is attached.
> I think it is a reasonable compromise between our need
> as compiler writers for expressive function symbol names
> and the need to keep those names short for when we have to look at
> .used files themselves. (Neither of which users have to do.)
> I look forward to your feedback on this design.

I think that's fine.  I would reorder the definitions so that
the type used_classes/0 occurs immediately after used_file_items/0.
I would also suggest that the new type be defined in its own module
rather than in recompilation.usage.

> A couple of other questions about next steps.

Taking a step back, there are a couple of issues with smart
recompilation.  The first is that it does not work with mmc --make.
As a consequence I suspect few, if any, users actually use it.
(At least I think it doesn't work with mmc --make and the compiler
certainly generates warnings that say that is the case ...)

Second, it does not work with --intermodule-optimization.
(Again, the compiler generates warnings to this effect.)
The compiler's usage message (and XXXs in the code itself) suggest that
it could be supported, although I wonder whether that's useful.

> First, my diff cannot be committed without updating the code
> that reads in and uses the contents of .used files. That second
> part means updating the code in recompilation.check.m to work
> based on the terms of the type in the attachment. Supporting
> backward compatibility would require adding code to transform
> the existing data structure to this one, which is a nontrivial amount
> of work. Is there any point in doing this work?

No.

> I don't think so, since the recompilation package is set up so that
> any syntax error in a .used file is handled by simply rebuilding
> everything, and having the .used file contain a new format is a
> guaranteed syntax error (since the version number will change, even if
> nothing else does). But I am willing to be persuaded that backward
> compatibility is needed.
>
> The second question is about timestamps. We currently represent them
> as strings that have to fit a strict format: yyyy-mm-dd hh:mm:ss.
> I was thinking about a du representation, with six integer fields,
> most of which can be uint8. I see no downside to this change;
> does anyone else?

I don't see the need for it; see below.

> Another aspect of this is that a du type has room for an OS native
> representation of time. For unix, that would be seconds since 1970 jan 1;
> I don't know what Windows and other OSs use.

Seconds are too coarse-grained a resolution.  All the platforms we
support can do better.  (Some of them make it a little more difficult
than others, but the support is there.)

> For .used files, we could
> write out both the OS native representation, which would mean nothing
> to humans, and the yyyy-mm-dd hh:mm:ss that they can read, and
> then pay attention to only the native representation after reading it
> back in, unless that reading takes place on a different OS, in which case
> we would compute the native representation from the human-readable
> one just the way we do now.

I think we only need the native representation; the only people who
are going to read .used files are developers and converting an epoch
timestamp to a readable time isn't that difficult.

> And yet another aspect that we should think about is the inclusion
> of sub-second-resolution time information. Some OSs now support
> nanosecond resolution in e.g. file modification times, though of course
> not all of that resolution is useful yet.

I suggest using a representation of timestamps based on Java's Instant
type (java.time.Instant).  That should cover all of the points above,
except readability.

In Mercury, that would be something like the following:

     :- type timestamp
          --->   timestamp(
                     seconds :: int64,  % Seconds from the epoch.
                     nanos   :: uint32  % [0, 999,999,999]
                 ).

where nanos is the number of nanoseconds further along the time line
from the seconds field.

This will work with clocks of any resolution from one second down to
one nanosecond (i.e. it will be portable).
(I think ISO 8601 has a readable representation of this format, e.g.
2021-04-20T11:07:22.956087, but for this use case you could probably
just write the raw components of the timestamp and be done with it.)
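
As a rough sketch of how such a type could be used (this is only an
illustration, not code from the diff: the module name timestamp_sketch,
the predicate is_newer and the example values are all made up), putting
the seconds field before the nanos field should let the builtin
compare/3 order timestamps chronologically, as far as I understand the
standard ordering on du types.  That only holds if nanos is always kept
within its stated range.  Writing the raw components is then just a
matter of writing out the term:

```mercury
:- module timestamp_sketch.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- type timestamp
    --->    timestamp(
                seconds :: int64,   % Seconds from the epoch.
                nanos   :: uint32   % [0, 999,999,999]
            ).

    % is_newer(A, B) succeeds iff A is strictly later than B.
    % (Hypothetical helper.) It relies on compare/3 looking at the
    % arguments from left to right, i.e. seconds first and then nanos,
    % and assumes nanos is normalized to [0, 999,999,999].
    %
:- pred is_newer(timestamp::in, timestamp::in) is semidet.

is_newer(A, B) :-
    compare((>), A, B).

main(!IO) :-
    % Arbitrary example values that differ only by one nanosecond.
    Old = timestamp(1618902442i64, 956087000u32),
    New = timestamp(1618902442i64, 956087001u32),

    % Write the raw components of each timestamp as a term.
    io.write_line(Old, !IO),
    io.write_line(New, !IO),

    ( if is_newer(New, Old) then
        io.write_string("New is newer than Old\n", !IO)
    else
        io.write_string("New is not newer than Old\n", !IO)
    ).
```

A reader for .used files could then compare the stored timestamp with
the file's current one without parsing a date string at all.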

Julien.

