[m-rev.] for review: --generate-dependencies-ints
Peter Wang
novalazy at gmail.com
Wed Oct 11 15:03:22 AEDT 2023
On Wed, 11 Oct 2023 02:42:15 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review by anyone. The benchmarking script, and the summary
> of its results that the log message references, are attached.
> The two arenas (one built by the benchmarking script
> the conventional way, one built using the new option)
> end up with identical contents, so the new option works.
>
> The attached diff just adds a new capability, but does not add
> any uses of it. That is because using it requires decisions
> that I would like your feedback on.
>
> One decision is: should the new option --generate-dependencies-ints
> replace the existing --generate-dependencies option? By this I mean
> actually *deleting* the new option, and making its implementation
> replace the old implementation of --generate-dependencies.
> This case would require an update of the documentation of
> the old option, instead of documentation of the new option.
> Any announcement of the new capability would also depend
> on this decision.
No to deleting --generate-dependencies completely, see below.
>
> If we want to keep both options, the next question is: how do we
> get mmake to use the new option? We would need some mmake target suffix
> that Mmake.rules would recognize as calling for the new option
> --generate-dependencies-ints, instead of just --generate-dependencies.
> What should that suffix be? I think .depend-ints would be a
> logical choice, but I am not sure it would look right to users.
Seems okay.
>
> The speedup demonstrated by the benchmark results comes from
> avoiding unnecessary opening, reading and parsing of both
> source files and interface files. That same saving would
> be available if we extended --generate-dependecies-ints
> to --generate-dependecies-ints-targets, i.e. we also generated
> .c, .cs or .java files. The reason I did not do that
> is that compilation is a significantly more expensive process
> than the generation of .intN files, and therefore benefits
> much more from parallelism. This means that looping over
> the modules to be compiled with list.map2_foldl2 would not do;
> one would want to set up some worker processes, and feed them
> modules to generate target language code for from a work queue.
> That is a much bigger change than this one.
>
> If we did implement --generate-dependecies-ints-targets,
> that would turn mmc into a whole-of-program compiler.
> (Optionally, since separate compilation would of course
> still be available.) However, it would do so only for
> programs that don't need intermodule optimization.
> Adding intermodule optimization to this framework
> would actually be the hardest part. We *could* generate
> .opt files using worker processes and a work queue,
> but in that case, recording the parse trees of the .opt files
> being written out would be a problem. While recording
> the parse tree in the *worker process that created it*
> would be trivial, *getting that parse tree back to the
> main process* would be far from trivial. And without that,
> the process of generating target language code would
> be required to open, read and parse .opt files, thus
> leaving some speedup opportunities on the table.
Right.
Also it would be replicating much of mmc --make.
> Above, I talked about integrating the new capability
> with mmake. We should of course integrate it with mmc --make
> as well, but mmc --make *already* uses worker processes,
> so I expect that the issue discussed in the above paragraph
> would need to be tackled even with --generate-dependecies-ints.
mmc --make doesn't use --generate-dependencies,
and can generate interface files in parallel (to an extent).
> Add new option --generate-dependencies-ints.
>
> This new option extends --generate-dependencies to take advantage of the
> opportunity afforded by the fact that "mmc --generate-dependencies prog.m"
> reads in every Mercury source file in the current directory that is
> part of "prog". It does this by
>
> - generating the .int3 file of all local-directory modules of the program;
> - generating the .int0 file for each these modules that has submodules;
> - generating the .int and .int2 files of all local-directory modules
> of the program.
>
> Normally, the process of creating .int0, .int and .int3 files would
> need to read in .int0 and .int3 files, but in every one of these cases,
> we have just written out those files, so simply holding onto their
> parse trees, we can skip this step. On my laptop, on a directory
> containing library/*.m, mdbcomp/*.m and compiler/*.m, generating
> the dependencies and generating all the .int3/.int0/.int/.int2 files
> takes just over 25 seconds. Using the new option, the same process
> takes less than 10 seconds.
--generate-dependencies-ints doesn't take advantage of parallelism
(yet). Generating interface files in parallel might still be faster,
despite --generate-dependencies-ints doing less work.
> compiler/options.m:
> Add the new option.
>
> compiler/op_mode.m:
> Add a new variant of the existing op_mode for --generate-dependencies,
> which we select in the presence of the new option.
>
> compiler/mercury_compile_main.m:
> Implement the new op_mode.
>
> Fix an old issue, which is that "mmc --make-private-interface x.m"
> generated a .int0 file for *every* module in x.m, even the modules
> that don't have any submodules.
>
> compiler/deps_map.m:
> The code implementing the new option needs to know which modules
> of the program are in the current directory. The deps_map structure
> built by the code shared with the implementation of --generate-dependencies
> has not needed that info until now, so add a new field to the deps
> structure of each module to provide this info.
>
> compiler/generate_dep_d_files.m:
> Return the deps_map created by the code that implements both
> --generate-dependencies and thee new option to mercury_compile_main.m.
s/thee/the
>
> compiler/write_module_interface_files.m:
> Add a flag to the predicates that first construct the parse trees of,
> and then write out, .int3/.int0/.int/.int2 files, that
> mercury_compile_main.m can use to tell them to record the parse trees
> in the have read module maps.
>
> Add new variants of two of those predicates that take some of their
> arguments from burdened_module structures, since that is the form
> in which mercury_compile_main.m has that info.
>
> compiler/module_baggage.m:
> The predicates in write_module_interface_files.m that generate
> .int0/.int/.int2 files take an argument that should be the timestamp
> of the source file they are being derived from, if that timestamp
> is needed for smart recompilation. Until now, we only ever invoked
> those predicates when we have just read in the source file,
> and this timestamp was readily available. The code implementing
> the new option needs to store this info for a short time, and
> the module baggage is the obvious place to store it, so add this field
> to the baggage.
>
> compiler/error_spec.m:
> In invocation of the compiler with new option may report errors that
s/In/An
> prevent the construction of interface files for several modules.
> The new code in mercury_compile.m prints the error_specs that have
> contexts in the order of those contexts, but we want to print
> the messages without contexts (which in this case report that
> certain files could not be written or copied) to have a useful
> order too. Make this possible by allowing the invisible pieces
> we use for ordering to specify that order via a string (in this case,
> the name of the file that e.g. could not be written), rather than
> the only previous option, an integer.
>
> compiler/grab_modules.m:
> compiler/make.get_module_dep_info.m:
> compiler/make.module_dep_file.m:
> Fill in the new field in the module baggage.
>
> compiler/check_module_interface.m:
> compiler/handle_options.m:
> compiler/make_hlds_error.m:
> compiler/parse_module.m:
> compiler/prog_foreign_enum.m:
> compiler/typecheck_error_util.m:
> compiler/typecheck_msgs.m:
> compiler/write_deps_file.m:
> compiler/write_error_spec.m:
> Conform to the changes above.
> diff --git a/compiler/write_module_interface_files.m b/compiler/write_module_interface_files.m
> index 3122d4883..c480af661 100644
> --- a/compiler/write_module_interface_files.m
> +++ b/compiler/write_module_interface_files.m
...
> % write_short_interface_file_int3(ProgressStream, ErrorStream, Globals,
> - % ParseTreeModuleSrc, Succeeded, Specs, !IO):
> + % AddToHrmm, ParseTreeModuleSrc, Succeeded, Specs,
> + % !HaveReadModuleMaps, !IO):
> %
> % Output the unqualified short interface file to <module>.int3.
> %
> + % Specs will contain any error and/or warning messages resulting
> + % - from the process of constructing the contents of the .int3 file,
> + % - from the process of writing it out, and
> + % - updating the associated timestamp file.
> + %
> + % We bind Succeeded to "succeeded" only if all three of those processes
> + % succeeded without errors. (Specs may contain warnings even then.)
> + %
> + % We add the contents of the .int3 file to !HaveReadModuleMaps if we could
> + % construct it without any errors, *and* AddToHrmm says we should.
> + %
Delete the extra space.
The diff looks okay.
Peter
More information about the reviews
mailing list