[m-rev.] for review: --generate-dependencies-ints

Peter Wang novalazy at gmail.com
Wed Oct 11 15:03:22 AEDT 2023


On Wed, 11 Oct 2023 02:42:15 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review by anyone. The benchmarking script, and the summary
> of its results that the log message references, are attached.
> The two arenas (one built by the benchmarking script
> the conventional way, one built using the new option)
> end up with identical contents, so the new option works.
> 
> The attached diff just adds a new capability, but does not add
> any uses of it. That is because using it requires decisions
> that I would like your feedback on.
> 
> One decision is: should the new option --generate-dependencies-ints
> replace the existing --generate-dependencies option? By this I mean
> actually *deleting* the new option, and making its implementation
> replace the old implementation of --generate-dependencies.
> This case would require an update of the documentation of
> the old option, instead of documentation of the new option.
> Any announcement of the new capability would also depend
> on this decision.

No to deleting --generate-dependencies completely, see below.

> 
> If we want to keep both options, the next question is: how do we
> get mmake to use the new option? We would need some mmake target suffix
> that Mmake.rules would recognize as calling for the new option
> --generate-dependencies-ints, instead of just --generate-dependencies.
> What should that suffix be? I think .depend-ints would be a
> logical choice, but I am not sure it would look right to users.

Seems okay.

> 
> The speedup demonstrated by the benchmark results comes from
> avoiding unnecessary opening, reading and parsing of both
> source files and interface files. That same saving would
> be available if we extended --generate-dependecies-ints
> to --generate-dependecies-ints-targets, i.e. we also generated
> .c, .cs or .java files. The reason I did not do that
> is that compilation is a significantly more expensive process
> than the generation of .intN files, and therefore benefits
> much more from parallelism. This means that looping over
> the modules to be compiled with list.map2_foldl2 would not do;
> one would want to set up some worker processes, and feed them
> modules to generate target language code for from a work queue.
> That is a much bigger change than this one.
> 
> If we did implement --generate-dependecies-ints-targets,
> that would turn mmc into a whole-of-program compiler.
> (Optionally, since separate compilation would of course
> still be available.) However, it would do so only for
> programs that don't need intermodule optimization.
> Adding intermodule optimization to this framework
> would actually be the hardest part. We *could* generate
> .opt files using worker processes and a work queue,
> but in that case, recording the parse trees of the .opt files
> being written out would be a problem. While recording
> the parse tree in the *worker process that created it*
> would be trivial, *getting that parse tree back to the
> main process* would be far from trivial. And without that,
> the process of generating target language code would
> be required to open, read and parse .opt files, thus
> leaving some speedup opportunities on the table.

Right.

Also it would be replicating much of mmc --make.

> Above, I talked about integrating the new capability
> with mmake. We should of course integrate it with mmc --make
> as well, but mmc --make *already* uses worker processes,
> so I expect that the issue discussed in the above paragraph
> would need to be tackled even with --generate-dependecies-ints.

mmc --make doesn't use --generate-dependencies,
and can generate interface files in parallel (to an extent).

> Add new option --generate-dependencies-ints.
> 
> This new option extends --generate-dependencies to take advantage of the
> opportunity afforded by the fact that "mmc --generate-dependencies prog.m"
> reads in every Mercury source file in the current directory that is
> part of "prog". It does this by
> 
> - generating the .int3 file of all local-directory modules of the program;
> - generating the .int0 file for each these modules that has submodules;
> - generating the .int and .int2 files of all local-directory modules
>   of the program.
> 
> Normally, the process of creating .int0, .int and .int3 files would
> need to read in .int0 and .int3 files, but in every one of these cases,
> we have just written out those files, so simply holding onto their
> parse trees, we can skip this step. On my laptop, on a directory
> containing library/*.m, mdbcomp/*.m and compiler/*.m, generating
> the dependencies and generating all the .int3/.int0/.int/.int2 files
> takes just over 25 seconds. Using the new option, the same process
> takes less than 10 seconds.

--generate-dependencies-ints doesn't take advantage of parallelism
(yet). Generating interface files in parallel might still be faster,
despite --generate-dependencies-ints doing less work.

> compiler/options.m:
>     Add the new option.
> 
> compiler/op_mode.m:
>     Add a new variant of the existing op_mode for --generate-dependencies,
>     which we select in the presence of the new option.
> 
> compiler/mercury_compile_main.m:
>     Implement the new op_mode.
> 
>     Fix an old issue, which is that "mmc --make-private-interface x.m"
>     generated a .int0 file for *every* module in x.m, even the modules
>     that don't have any submodules.
> 
> compiler/deps_map.m:
>     The code implementing the new option needs to know which modules
>     of the program are in the current directory. The deps_map structure
>     built by the code shared with the implementation of --generate-dependencies
>     has not needed that info until now, so add a new field to the deps
>     structure of each module to provide this info.
> 
> compiler/generate_dep_d_files.m:
>     Return the deps_map created by the code that implements both
>     --generate-dependencies and thee new option to mercury_compile_main.m.

s/thee/the

> 
> compiler/write_module_interface_files.m:
>     Add a flag to the predicates that first construct the parse trees of,
>     and then write out, .int3/.int0/.int/.int2 files, that
>     mercury_compile_main.m can use to tell them to record the parse trees
>     in the have read module maps.
> 
>     Add new variants of two of those predicates that take some of their
>     arguments from burdened_module structures, since that is the form
>     in which mercury_compile_main.m has that info.
> 
> compiler/module_baggage.m:
>     The predicates in write_module_interface_files.m that generate
>     .int0/.int/.int2 files take an argument that should be the timestamp
>     of the source file they are being derived from, if that timestamp
>     is needed for smart recompilation. Until now, we only ever invoked
>     those predicates when we have just read in the source file,
>     and this timestamp was readily available. The code implementing
>     the new option needs to store this info for a short time, and
>     the module baggage is the obvious place to store it, so add this field
>     to the baggage.
> 
> compiler/error_spec.m:
>     In invocation of the compiler with new option may report errors that

s/In/An

>     prevent the construction of interface files for several modules.
>     The new code in mercury_compile.m prints the error_specs that have
>     contexts in the order of those contexts, but we want to print
>     the messages without contexts (which in this case report that
>     certain files could not be written or copied) to have a useful
>     order too. Make this possible by allowing the invisible pieces
>     we use for ordering to specify that order via a string (in this case,
>     the name of the file that e.g. could not be written), rather than
>     the only previous option, an integer.
> 
> compiler/grab_modules.m:
> compiler/make.get_module_dep_info.m:
> compiler/make.module_dep_file.m:
>     Fill in the new field in the module baggage.
> 
> compiler/check_module_interface.m:
> compiler/handle_options.m:
> compiler/make_hlds_error.m:
> compiler/parse_module.m:
> compiler/prog_foreign_enum.m:
> compiler/typecheck_error_util.m:
> compiler/typecheck_msgs.m:
> compiler/write_deps_file.m:
> compiler/write_error_spec.m:
>     Conform to the changes above.

> diff --git a/compiler/write_module_interface_files.m b/compiler/write_module_interface_files.m
> index 3122d4883..c480af661 100644
> --- a/compiler/write_module_interface_files.m
> +++ b/compiler/write_module_interface_files.m
...
>      % write_short_interface_file_int3(ProgressStream, ErrorStream, Globals,
> -    %   ParseTreeModuleSrc, Succeeded, Specs, !IO):
> +    %   AddToHrmm, ParseTreeModuleSrc, Succeeded, Specs,
> +    %   !HaveReadModuleMaps, !IO):
>      %
>      % Output the unqualified short interface file to <module>.int3.
>      %
> +    % Specs will contain any error and/or warning messages resulting
> +    % - from the process of constructing the contents of the .int3 file,
> +    % - from the process of writing it out, and
> +    % - updating the associated timestamp file.
> +    %
> +    % We bind Succeeded to "succeeded" only if all three of those processes
> +    % succeeded without errors. (Specs may contain warnings even then.)
> +    %
> +    % We add the contents of the .int3 file to !HaveReadModuleMaps if we could
> +    % construct it without any errors, *and*  AddToHrmm says we should.
> +    %

Delete the extra space.

The diff looks okay.

Peter


More information about the reviews mailing list