[m-dev.] making .trans_opt files in parallel

Tue Dec 13 17:18:58 AEDT 2022

2022-12-13 16:00 GMT+11:00 "Peter Wang" <novalazy at gmail.com>:
>> What we want to specify is effectively a dag; if module B is reachable
>> from module A, then module A may depend on B.trans_opt; otherwise,
>> it may not. Each no_trans_opt pragma would put only one constraint
>> on the dag, and there would be no easy way to check that two such
>> pragmas in different modules basically rule out both A using B.trans_opt
>> and B using A.trans_opt, which would not be a correctness problem,
>> but may be a performance problem.
>> 
> 
> Yes, I don't really mind that.

That is certainly a valid position.

> (I don't think .trans_opt files provide
> much value as it is, correct me if I'm wrong.)

They allow intermodule inlining of small but possibly often-used
predicates and functions. Whether that is "much value" is in the eye
of the beholder. I happen to think it is borderline useful. I agree that
the termination and other analysis results in .trans_opt files were
much more useful for writing papers than they are in practice :-(

>> I think it would be better to specify the whole dag all at once,
>> probably as the contents of a file, whose name can be passed
>> to mmc in an option. Unfortunately, I don't know of a simple way
>> to encode dag in a text file in a way that is
>> 
>> - unambiguous,
>> - easy to read and understand,
>> - but does not reduce all dags to a total order.
>> 
> 
> An obvious place to start is something like the dot language, e.g.
> 
>     bar -> foo;
>     baz -> bar;

I believe the write_graph predicate in generate_dep_d_files.m
already writes out the imports graph using that language.
We would not want to have to parse the *whole* of the dot language,
but only the subset needed to describe dags in one particular
prescribed way. Or something even simpler (see the scc list below).

> Nodes (module names) may appear multiple times. It may be convenient to
> reverse the order of the edges so that the imported module is written on
> the left. As in dot, we may also allow grouping and/or nesting, say:
> 
>     foo <- {
> 	bar,
> 	baz <- quux
>     }
> 
> I wrote a quick program see what it looks like on the standard library
> (attached).

I have turned your dependency description into an actual dot file,
and converted that to a PDF; these are attached. The cluttered and
unreadable nature of the PDF shows that these are not really usable
for visualization for programs of this scale.

However, once we have the graph, we have library predicates
that can turn them into a list of SCCs. It would not be difficult
to write a Mercury program that reads all the .import_graph files
generated for a program, and output that list of SCCs, like this:

scc 1: m1
scc 2: m2 m3
scc 3: m4
etc

This implicitly specifies a dag. (The "scc N:" part is just my explanation;
it wouldn't be part of the file.) Unfortunately, the dag it specifies is not
suited for making .trans_opt files in parallel, your original goal.

> I'm not sure I like the "whole dag" approach better.
> 
> 1. Changes to the module imports would need to be mirrored in the dag,

No, they wouldn't; intermodule optimization would continue to work.
True, as more and more changes to imports accumulate, the initial dag
may move farther away from the now-optimal dag, but just above you said
you weren't concerned about that.

> and it's likely people will forget (also being a pain).

Forgetting would not be a problem, and one wouldn't have to keep
adjusting the dag. It would be enough to check every few months
whether the old dag is still good enough, and if not, to adjust or rebuild it.
There are two possible senses of "good enough": does a dag allow as much
parallelism in building .trans_opt files as you want, and does it allow as much
cross-module optimization as you want. These two measures are opposites.
You can always get more parallelism by deleting edges from the dag, but
deleted edges mean lost opportunities for cross module optimization.
You therefore want different settings for day-to-day work than for releases.

I think we are talking at cross purposes because we have two different dags
in mind. Your GRAPH contains an edge for every single import, use and include
declaration in the program; that indeed would be a pain to maintain. I am talking
about a smaller dag derived from something like GRAPH, which just decides
"is the compilation of A.m allowed to read B.trans_opt?".

> The compiler
> could warn about edges that don't match a module import declaration,

"not match" in what sense? In the presence of cyclic imports, there *will be*
imports that go against the dependency direction imposed by the dag,
but warning about that would be pointless, because short of breaking
the cycle by e.g. merging all the modules in the cycle into one module,
there would be nothing that programmers could do about it.

> 2. Do we require modules that don't participate in a cycle to be
> included in the dag? There's no technical reason to do so, so it's
> just busywork.

>From this point of view, I would divide modules into three categories, not two.
These would be:

1. Modules that neither depend on other modules, nor do any other modules
    depend on them.

2. Modules that either depend on other modules or have other modules
   depend on them, but are not in a cycle.

3. Modules that are part of a cyclic dependency.

We agree that type 3 modules should be in the dag. I think type 2 modules
should also be in the dag, because including their position in the dag
in a file that is available to compiler invocations is a simple and effective way
to specify which module can read which other module's .trans_opt file.

I agree that type 1 modules need not be in the dag, but that is not a surprise,
since they are independent of every other module. Such modules should not
have .trans_opt files generated for them, since the absence of the module
from the dag would mean that no other module would be allowed to read
such a .trans_opt file anyway.

> 3. Do we necessarily require a dag? Some smaller cycles may be
> acceptable, letting the compiler impose its arbitrary ordering
> as it does now.

That depends on which "we" you are talking about.

An invocation of mmc --intermod-opt A.m definitely needs to know
whether it is allowed to read B.trans_opt, which means it needs to know
the part of the dag connecting A and B. So mmc needs access to arbitrary
pieces of the dag on demand, which means it needs to be able to access
the whole dag.

This is a different question from whether we require users to fully
specify a dag. Given a user-specified set of modules which all depend
on one another, the compiler could impose its own order on them
without requiring the user to do so. The order could be alphabetical
(a smaller version of the system we have now), or it could be based on
the order in which they are written in a file, such as the list of sccs above.
The latter would have the advantage that it would allow sufficiently
knowledgeable users to specify the total order derived from the dag.

Zoltan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GRAPH.dot
Type: application/msword
Size: 20516 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/developers/attachments/20221213/e839c39a/attachment-0002.dot>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GRAPH.dot
Type: application/msword
Size: 20516 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/developers/attachments/20221213/e839c39a/attachment-0003.dot>