[m-rev.] for post-commit review: fill in much of the "compilation in detail" chapter

Julien Fischer jfischer at opturion.com
Sun Aug 17 02:03:44 AEST 2025


On Sat, 16 Aug 2025 at 08:29, Zoltan Somogyi <zoltan.somogyi at runbox.com> wrote:

> Fill in a large part of "compilation detail".
>
> Minor improvements to the preceding chapters.
>
> diff --git a/doc/mercury_user_guide.texi b/doc/mercury_user_guide.texi
> index 6a5a52be5..94d8bdb75 100644
> --- a/doc/mercury_user_guide.texi
> +++ b/doc/mercury_user_guide.texi
> @@ -208,22 +208,31 @@ Any command line argument
>  that does not start with either @samp{=} or @samp{@@}
>  will be treated either as the argument of an option
>  (if it immediately follows an option that takes an argument),
> -or as the name of a file or of a module
> +or as a @dfn{non-option argument}
>  (if it does not immediately follow an option that takes an argument).
> -The compiler assumes that arguments ending in @samp{.m} are file names,
> -while all other arguments are module names.
>  @end itemize
>
> +In the absence of the @code{--make} option,
> +whose description we will defer until @ref{Introduction to mmc --make},
> +all non-option arguments should be either
> +the name of a file, or the name of a module.
> + at c XXX what about e.g. "mmc --make prog.clean"?

I don't understand that XXX; you've already said that this applies in
the absence of the --make option.

> +The compiler assumes that
> +non-option arguments ending in @samp{.m} are file names,
> +while all other non-option arguments are module names.
> +Both file names and module names tell the compiler
> +what code it should operate on.
> +
>  @node Option arguments
>  @subsection Option arguments
>
> -The Mercury compiler follows the usual conventions around options.
> +The Mercury compiler follows the usual Unix conventions around options.

Specifically, Mercury follows the GNU conventions for command line options.
(Those go above and beyond what, for example, POSIX requires.)
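
To illustrate the distinction, here is a rough sketch using Python's
standard getopt module as a stand-in. It shows only how GNU-style and
POSIX-style scanning differ in general; it is not how mmc parses its
command line, and the option table below is a made-up two-entry subset.

```python
import getopt

# Hypothetical, tiny option table: one short option taking an argument
# and one long option taking none.
SHORT_OPTIONS = "O:"
LONG_OPTIONS = ["inhibit-warnings"]

# A command line with the options placed *after* the file name.
argv = ["prog.m", "--inhibit-warnings", "-O5"]

# POSIX-style scanning stops at the first non-option argument,
# so everything is left over as non-option arguments.
posix_opts, posix_rest = getopt.getopt(argv, SHORT_OPTIONS, LONG_OPTIONS)

# GNU-style scanning lets options and non-option arguments be intermixed.
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, SHORT_OPTIONS, LONG_OPTIONS)

print(posix_opts, posix_rest)
print(gnu_opts, gnu_rest)
```

Note that gnu_getopt falls back to POSIX-style scanning if the
POSIXLY_CORRECT environment variable is set.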

>  (Some of its options (e.g.@: @samp{-c}, @samp{-o}, and @samp{-I})
>  have a similar meaning to that in compilers for other languages,
>  though of course most are specific to Mercury.)
>
> -Like most other Unix programs,
> -it supports both short (single-character) and long option names.
> +Like most other Unix programs, it supports both
> +short (single-character) and long (multi-character) option names.
>
>  On command lines, an option argument that starts with @samp{--}
>  specifies a single long option.

...

> @@ -374,6 +388,7 @@ mmc @code{prog.m}
>
>  or, more generally,
>
> + at c ZZZ @var{list of options} looks strange in the generated HTML.

Perhaps just mmc @var{options}?

Maybe provide a concrete example of an invocation with multiple options,
e.g.

   mmc --inhibit-warnings -O5 prog.m


>  @example
>  mmc @var{list of options} @code{prog.m}
>  @end example

> @@ -1864,69 +1904,427 @@ also be included in the generated launcher shell script or batch file.

...

> -Since targeting C is the default,
> -this tells @samp{mmc} to generate C code,
> -and then invoke the configured C compiler to translate that to object code.
> - at samp{mmc} puts the generated C code into a file called @file{@var{module}.c}
> -and the generated object code into a file called @file{@var{module}.o},
> -where @var{module} is the name of the Mercury module
> -defined in @file{@var{filename}.m}.
>  If the source file contains nested modules,
> -then each submodule will get compiled to separate C and object files.
> +then both the main module in the file,
> +and all the submodules nested inside it, directly or indirect,

s/indirect/indirectly/

> +will all get compiled
> +first to separate @file{.c} (or @file{.java}, or @file{.cs} files,
> +and then to separate @file{.o} (or @file{.class}, or @file{.dll} files.
> +
> +However, before you can compile a module,
> +you must first make the interface files
> +for the modules that it imports, either directly or indirectly.
> +And if you want to compile the program with intermodule optimization,
> +then you must first also create the files that enable that.
> +The next two sections cover these in turn.
> +
> + at c ----------------------------------------------------------------------------
> +
> + at node Creating interface files
> + at section Creating interface files
>
> -Before you can compile a module,
> -you must make the interface files
> -for the modules that it imports (directly or indirectly).
>  You can create the interface files for one or more source files
>  using the following commands:
>
>  @example
> -mmc --make-short-int @var{filename1}.m @var{filename2}.m @dots{}
> -mmc --make-priv-int @var{filename1}.m @var{filename2}.m @dots{}
> -mmc --make-int @var{filename1}.m @var{filename2}.m @dots{}
> +mmc --make-short-interface @var{module_1}.m @var{module_2}.m @dots{}
> +or, equivalently,
> +mmc --make-short-int @var{module_1}.m @var{module_2}.m @dots{}
> +
> +mmc --make-private-interface @var{module_1}.m @var{module_2}.m @dots{}
> +or, equivalently,
> +mmc --make-priv-int @var{module_1}.m @var{module_2}.m @dots{}
> +
> +mmc --make-interface @var{module_1}.m @var{module_2}.m @dots{}
> +or, equivalently,
> +mmc --make-int @var{module_1}.m @var{module_2}.m @dots{}

I don't think giving both the full and abbreviated option names here
adds anything; just use the full option names. That should also simplify
the descriptions below.

>  @end example
> + at findex --make-short-interface
>  @findex --make-short-int
> + at findex --make-private-interface
>  @findex --make-priv-int
> + at findex --make-interface
>  @findex --make-int
>
> -The first command builds (or rebuilds)
> + at itemize @bullet
> + at item
> +The commands in the first pair build (or rebuild)
>  the @samp{.int3} file of each module contained in the named source files.
> -The second command builds (or rebuilds)
> +(@emph{Not} just the top module of each source file.)
> + at item
> +The commands in the second pair build (or rebuild)
>  the @samp{.int0} file of each module contained in the named source files.
>  (Note that only modules that have submodules need @samp{.int0} files.)
> -The third command builds (or rebuilds)
> + at item
> +The commands in the third pair build (or rebuild)
>  both the @samp{.int} and @samp{.int2} file
>  of each module contained in the named source files.
> + at end itemize

...

> + at node Creating optimization files
> + at section Creating optimization files
> +
> +By default, when @command{mmc} compiles a module, say @var{module_a},
> +the only code it has access to is the code of @var{module_a} itself.
> +The only source of information that @command{mmc} has
> +about the modules imported by @var{module_a} are their @file{.int} files.
> +Beyond the definitions of types, insts and modes,
> +these contain the @emph{declarations} of predicates and functions,
> +but their @emph{definitions}.

but *not* their @emph{definitions}.

> +However, the compiler usual optimizations could do a better job

s/compiler/compiler's/

> +if they @emph{did} have access
> +to the definitions of those predicates and functions.
> +This is why the Mercury compiler has a mechanism for providing that access.
> +This mechanism, intermodule optimization, has two faces:
> +recording extra information about the nominally-private parts of each module
> +in a file,
> +and making use of that information while compiling other modules.
> +
> +Commands that do the first part look like this:
>
>  @example
> -mmc --make-opt-int @var{filename1}.m @var{filename2}.m @dots{}
> +mmc --make-optimization-interface @var{module_1}.m @var{module_2}.m @dots{}
> +or, equivalently,
> +mmc --make-opt-int @var{module_1}.m @var{module_2}.m @dots{}
>  @end example
> + at findex --make-optimization-interface
>  @findex --make-opt-int
>
> -If you are going to compile with @samp{--transitive-intermodule-optimization}
> -enabled, then you also need to create the transitive optimization files.
> - at findex --transitive-intermodule-optimization
> +Each of these commands
> +will build @file{@var{module_1}.opt}, @file{@var{module_2}.opt},
> +and in general a @file{.opt} file for each named module.
> +These files contain information that is normally private to the module
> +that the @file{.opt} file is for,
> +but which may be useful for optimization.
> +Mostly, this includes the definitions (i.e. the code)
> +of both public and private predicates and functions of the module,
> +if those definitions match one or more from a list of criteria,
> +which include (but are not limited to) the following.
> +
> + at itemize
> + at item
> +Predicates and function definitions that are so simple
> +that inlining calls to them
> +(meaning replacing the call
> +with an appropriately-renamed copy of the callee's definition)
> +is likely to result in a speedup.
> + at item
> +Predicates and function definitions
> +that contain switches on the values of arguments,
> +meaning that after inlining calls to them
> +at call sites that know the values of those arguments,
> +the switch can be eliminated.
> + at item
> +Predicates and function definitions that have higher-order arguments,
> +meaning that after inlining calls to them
> +at call sites that know the values of those higher order arguments,
> +the higher-order calls in the inlined version
> +can be replaced by first order calls.
> + at end itemize
> +
> +Beside such code, @file{.opt} files also contain
> +definitions and declarations needed to make sense of that code,
> +such as the definitions of the types, insts and modes they involve,
> +and the declarations of both
> +the predicates and functions they define.
> +and the predicates and functions they call.
> +
> +After @file{.opt} files have been created,
> +any invocation of @command{mmc} to compile say @var{module_1}
> +with the @code{--intermodule-optimization} option
> +(or @code{--intermod-opt} for short),
> +will read in, and use,
> +the @file{.opt} files of the modules that @var{module_1} imports.
> +
> +In some cases, when compiling e.g. @var{module_1},
> +an optimization would like access to information that is derived
> +not just from a module that @var{module_1} imports, call it @var{module_2},
> +but also from modules that @var{module_2} imports,
> +and they import, and so on.
> +The information that these optimizations need
> +is not so much the code of e.g. predicates
> +defined in modules that @var{module_2} imports,
> +but their effect on the properties
> +of the predicates and functions of @var{module_2} itself.
> +
> +Consider a conjunction such as
> + at example
> +... p(...), q(...), r(...), ...
> + at end example
> +where the definition of @code{r}
> +traverses a data structure created by @code{p}.
> +The Mercury compiler contains an optimization
> +that can fuse two traversals into one.
> +The optimization is called @emph{deforestation},
> +because it can eliminate the intermediate data structure
> +created by @code{p} and consumed by @file{r},
> +and in logic programming languages,
> +data structures are terms, which can be viewed as trees.
> +
> +Deforestation can fuse two traversals only if they are next to each other,
> +and in this case, the two calls to be fused are @emph{not} next to each other.
> +The first step is therefor to replace

s/therefor/therefore/

> + at example
> +... p(...), q(...), r(...), ...
> + at end example
> +with
> + at example
> +... p(...), r(...), q(...), ...
> + at end example
> +However, this is safe only in certain circumstances.
> +
> +One situation in which it is unsafe
> +occurs when @code{q} is semidet, meaning it can fail. and
> + at code{r} can throw an exception.
> +This is because in this case, the reordering above
> +can replace code that simply fails with code that throws an exception.
> +This is an @emph{no observable effect} on the execution of the program,
> +which optimizations are not allowed to make.
> +
> +To perform the above reordering,
> +the compiler needs to know that @code{r} can never throw an exception.
> +(It would also need to be able to rule out other situations
> +that could cause the reordering to have an observable effect,
> +but in this example, we are focusing on just this one.)
> +For this, it needs to know not just
> +that the code of @code{r} (which must be available
> +if we are considering fusing it with the code of @code{p})
> +contains no code to throw an exception,
> +but also that the same is true for the predicates and functions it calls,
> +and the predicates and functions they call, directly or indirectly.
> +In effect, we need to know that no predicate or function
> +in the call tree of @code{r} can throw an exception.
> +
> +ZZZ
> +
> +To make this possible in at least some cases,
> +Mercury has a mechanism to make such information available:
> + at code{.trans_opt} files.
> +These files contain analysis results,
> +with compiler options specifying the set of analyses
> +whose results they contain.
> +
> +One of these analyses is exception analysis,
> +which computes safe approximations to the set of exceptions
> +that each predicate or function can possibly throw.

It's very much an approximation, since the only sets it tracks are
pretty much "empty" and "non-empty".
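
A minimal sketch of what I mean, in Python rather than Mercury. The
names (WILL_NOT_THROW, MAY_THROW, analyse) are hypothetical and this is
not the compiler's actual code; the real analysis may distinguish more
cases. The point is only that the result is a two-valued status per
predicate or function, and that anything unknown is treated as possibly
throwing so that the approximation stays safe.

```python
# Hypothetical two-point lattice: "empty set" versus "non-empty set".
WILL_NOT_THROW = "will_not_throw"
MAY_THROW = "may_throw"


def analyse(calls, throws_directly):
    """Compute a safe exception status for each procedure.

    calls: dict mapping a procedure name to the list of names it calls.
    throws_directly: set of names whose own code contains a throw.
    """
    status = {name: WILL_NOT_THROW for name in calls}
    changed = True
    # Iterate to a fixpoint; a status can only move towards MAY_THROW.
    while changed:
        changed = False
        for name, callees in calls.items():
            if status[name] == MAY_THROW:
                continue
            # A callee with no recorded result is assumed to throw.
            callee_may_throw = any(
                status.get(callee, MAY_THROW) == MAY_THROW
                for callee in callees
            )
            if name in throws_directly or callee_may_throw:
                status[name] = MAY_THROW
                changed = True
    return status


# A chain in which f calls g, g calls h, and h calls i.
chain = {"f": ["g"], "g": ["h"], "h": ["i"], "i": []}
nobody_throws = analyse(chain, throws_directly=set())
i_throws = analyse(chain, throws_directly={"i"})
unknown_callee = analyse({"f": ["not_analysed"]}, throws_directly=set())
```

If only the last procedure in the chain throws, every caller above it
ends up as MAY_THROW as well, which is why the result for one module
depends on the results for the modules it imports.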

> +If this approximations is the empty set,

s/approximations/approximation/

> +then we know for sure that
> +the predicate or function cannot throw any exception.
> +(An approximation can overestimate
> +the set of actions that the predicate or function may perform,
> +but it is safe only if will never underestimate that set.)

only if *it* will

> +
> +Consider a call chain between functions where
> + at code{f} calls @code{g},
> + at code{g} calls @code{h}, and
> + at code{h} calls @code{i}.
> +with @code{f}, @code{g}, @code{h}, @code{i} being defined in
> + at code{module_f}, @code{module_g}, @code{module_h} and @code{module_i}
> +respectively.
> +Suppose none of these functions contain
> +any calls other than the ones listed here,
> +and none of these modules contain anything else.
> +In that case,
> +
> + at itemize
> + at item
> +to know whether @code{i} can throw exceptions,
> +we need only the code of @code{i};
> + at item
> +to know whether @code{h} can throw exceptions,
> +we need the code of @code{h} and the results of the analysis for @code{i};
> + at item
> +to know whether @code{g} can throw exceptions,
> +we need the code of @code{g} and the results of the analysis for @code{h};
> + at item
> +to know whether @code{f} can throw exceptions,
> +we need the code of @code{f} and the results of the analysis for @code{g}.
> + at end itemize
> +
> +These dependencies transfor to the files involved:

s/transfor/transfer/

The rest looks fine.

Julien.

