[m-rev.] for review: replace item_blocks with parse trees during pre-HLDS passes
Julien Fischer
jfischer at opturion.com
Fri Mar 13 01:38:07 AEDT 2020
On Tue, 10 Mar 2020, Zoltan Somogyi wrote:
> For review by Julien, since he kindly volunteered.
> Thank you in advance. Note that some of the XXXs
> in the log message are questions for you. I would
> also appreciate it if, besides reviewing the diff, you
> also tried it out on the programs you are working on.
I have done so and have not encountered any problems. One thing I did
encounter due to the changes in warnings for unused modules is:
search.m:043: warning: module `fd' is imported in the interface, but it is
search.m:043: not used in the interface.
search.m:091: Warning: this `:- import_module' declaration for module `fd' in
search.m:091: the implementation section is redundant, given the
search.m:091: `:- import_module' declaration for the same module in the
search.m:091: interface section.
search.m:043: The previous declaration is here.
We could argubly omit the second warning in cases like the above.
> Make the pre-HLDS passes use file-kind-specific parse trees.
>
> Replacing item blocks file-kind-specific kinds of section markers with
> file-kind-specific parse trees has several benefits.
>
> - It allows us to encode the structural invariants of each kind of file
> we read in in the type of its representation. This the detection of
... using the type of its representation ...
This *makes* the detection of ...
> any accidental violations of those invariants trivial.
>
> - Since each file-kind-specific parse tree has separate lists for separate
> kinds of items, code that wants to operate on one or a few kinds of items
> can just operate on those kinds of items, without having to traverse
> item blocks containing many other kinds of items as well. The most
> important consequence of this is not the improved efficiency, though
> that is nice, but the increased clarity of the code.
>
> - The new design is much more flexible. For example, it should be possible
> to record that e.g. an interface file we read in as a indirect dependency
> (i.e. a file we read not because its module was imported by the module
> we are compiling, but because its module was imported by *another* imported
> module) should be used *only* for the purpose it was read in for. This should
> avoid situations where deleting an import of A from a module, because it
> is not needed anymore, leads the compiler to generate an error message
> about a missing import of module B. This can happen if (a) module B
> always *should* have been imported, since it is used, but (b) module A's
> import of module B lead to module B's interface being available *without*
> an import of B.
>
> Specifically, this flexility should enable us to establish each module's
s/flexility/flexibility/
> .int file as the single source of truth about how values of each type
> defined in that module should be represented. When compiling each source
> file, this approach requires the compiler to read in that module's .int file
> but using only the type_repn items from that .int file, and nothing else.
...
> This change is big and therefore hard to review. Therefore in many files,
> this change adds "XXX CLEANUP" comments to draw attention to places that
> have issues that should be fixed, but whose fixes should come later, in
> separate diffs.
You've got at least one of the comments labelled YYY; is that supposed
to be an XXX CLEANUP one as well?
> compiler/prog_item.m:
...
> Split the existing pragma item kind into three item kinds, which have
> different invariants applying to them.
>
> - The decl (short for declarative) pragmas give the compiler some
> information, such as that a predicate is obsolete or that we
> want to type specialize some predicate or function, that is in effect
> part of the module's interface. Decl pragmas may appear in module
> interfaces, and the compiler may put them into interface files;
> neither statement is true of the other two kinds of pragmas.
>
> - The impl (short for implementation) pragmas are named so
> precisely because they may appear only in implementation sections.
> They give the compiler information information that is private
Doubled-up "information".
> to that module. Examples include foreign_decls, foreign_codes,
> foreign_procs, and promises of clause equivalence, and requests
> for inlining, tabling etc. These will never be put into interface files,
> though some of them can affect the compilation of other modules
> by being included in .opt files.
>
> - The gen (short for generated) pragmas can never (legally) appear
> in source files at all. They record the results of compiler
> analyses e.g. about which arguments of a predicate are unused,
> or what exceptions a function can throw, and accordingly they
> should only ever occur in compiler-generated interface files.
...
> compiler/get_dependencies.m:
> Add operations to gather implicit references to builtin modules
> (which there have to be made available even without an explicit
Delete "there".
> import_module or use_module declaration) in all kinds of parse trees.
> These have more code overall, but will be at runtime, since we need
> only look at the item kinds that may *have* such implicit references.
>
> Add a mechanism to record the result of these gathering operations
> in import_and_or_use_maps.
>
> Give some types, function symbols, predicates and variables
> more meaningful names.
...
> compiler/split_parse_tree_src.m:
> Rearchitect how this module separates out nested submodules from within
> the main module in a file.
>
> Another of the jobs of this module is to generate error messages for
> when module A includes module B twice, with special exception for
... with *a* special exception ...
> the case where A's interface contains nested submodule A.B's interface, and
> A's implementation contains nested submodule A.B's implementation.
> Other than that, has been split_parse_tree_src.m any duplicate inclusions
> of submodules, whether via nesting or via include_module declarations.
What is that last sentence trying to say?
> The problem was that ironically, while it reported duplicate
> include_module declarations as errors, it also *generated* duplicate
> include_module declarations. Since it replaced each nested submodule
> occurrence with an include_module declaration, in the scenario above,
> it generated two include_module declarations for A.B. Even worse,
> The interface incarnation of submodule A.B could contain
> (the interface of) its own nested submodule A.B.C, while its
> implementation incarnation could contain (the implementation section of)
> A.B.C. Each occurrence of A.B.C would be its only occurrence in the
> including part of its parent A.B, which means local tests for duplicates
> do not work. (I found this out the hard way.)
...
> compiler/error_util.m:
> Add an id field to all error_specs, which by convention should be
> filled in with $pred. This allows a person debugging the compiler
> where in the code an undesired error message is coming from
> significantly easier than was previously possible.
There appear to be some words missing from that last sentence.
> Most of the modules that have changes only "to conform to the changes
> above" will be for this change. In many cases, the updated code
> will also simplify the creation of the affected error_specs.
>
> Fix a bug that looked for a phase in only one kind of error_spec.
>
> Add some utility operations needed by other parts of this change.
>
> Delete a previously internal function that has been moved to
> mdbcomp/prim_data.m to make it accessible in other modules as well.
>
> compiler/Mercury.options:
> Ask the compiler to warn about dead predicates in every module
> touched by this change (at least in one its earlier versions).
>
> compiler/add_foreign_enum.m:
> Replace a check for an inappropriately placed foreign_enum declaration
> with a sanity check, since with this diff, the error should be caught
> earlier.
>
> compiler/add_mutable_aux_preds.m:
> Delete a check for an inappropriately placed mutable declaration,
> since with this diff, the error should be caught earlier.
>
> compiler/add_pragma.m:
> Instead of adding pass2 and pass3 pragmas, add decl and impl and
> generated pragmas.
>
> Delete the tests for generated pragma occurring anywhere except
> .opt files, since those tests are now done earlier.
>
> Shorten some too-long predicate names.
>
> compiler/comp_unit_interface.m:
> Operate on as specific kinds of parse trees as the interface of this
> module will allow. (We could operate on more speficic parse trees
s/speficic/specific/
> if we changed the interface, but that is future work).
>
> Use the same predicates for handling duplicate include_module,
> import_module and use_module declarations as everywhere else.
>
> Delete the code of an experiment that shouldn't be needed anymore.
...
> compiler/options.m:
> Add an option to control whether errors and/or warnings detecting
> when deciding what should go into a .intN file be printed,
> thus (potentially) preventing the creation of that file.
>
> XXX Should this option be documented for users?
Yes.
...
> NEWS:
> Mention that we now generate warnings for unused import_module and
> use_module declarations in the interface even if the module has
> submodules.
>
> XXX Should we report this as a breaking change?
Yes.
...
>
> tests/invalid/foreign_enum_invalid.err_exp:
> Expect a different wording for an error message. It is more "standard"
> but slightly less informative.
> XXX Is this ok?
Yes, that's fine. Knowing the type the foreign_enum is for isn't particularly
relevant for the error in question.
...
> diff --git a/compiler/error_util.m b/compiler/error_util.m
> index a3fae4838..a0ca070b1 100644
> --- a/compiler/error_util.m
> +++ b/compiler/error_util.m
> @@ -68,28 +68,49 @@
...
> +% The id field, which is present in all three alternatives, is totally
> +% ignored when printing error_specs. Its job is something completely different:
> +% helping developers track down where in the source code each error_spec
> +% was constructed. Without the id fields, if developers wants to know this,
> +% e.g. because they do not want the message printed, or because there is
> +% a problem with its wording, they have to grep for some words in the message.
> +% However, grepping for a single word will usually get many false hits,
> +% while grepping for two or more consecutive words in the message may miss
> +% the code generating the message, because in that code, some of those
> +% consecutive words are on different lines. On the other hand, if every
> +% place that constructs an error_spec, of any of these three varieties,
> +% fills in the id field with $pred, then finding the right place is easy:
> +% just put a breakpoint on do_write_error_spec, wait for it to be invoked
> +% to print the error_spec whose origin you want to track down, and then
> +% look its id field. Even if the predicate this identifies has several pieces
> +% of code that construct specs, the scope in which you have to search for it
> +% will be easily manageable.
I think a better alternative here would be to have a developer-only option
that cause the compiler to print the id at the end of each error message.
...
> diff --git a/compiler/prog_item.m b/compiler/prog_item.m
> index dac879fb2..7439e429e 100644
> --- a/compiler/prog_item.m
> +++ b/compiler/prog_item.m
...
> @@ -37,17 +71,99 @@
...
> +:- type maybe_implicit_import_and_or_use
> + ---> explicit_avail(
> + section_import_and_or_use
> + )
> + ; implicit_avail(
> + implicit_import_or_use,
> + maybe(section_import_and_or_use)
> + ).
> +
> + % Values of this type specify how each module we have available
> + % was *made* available. If a module had redundant import_module
> + % and/or use_module declarations, each of these has had a warning
> + % generated for it and then discarded. One of these declarations
and was then discarded.
> + % can be made redundant redundant not only by another declaration
> + % of the same kind in the same section, but also by more permissive
> + % declarations; import_module declarations grant more permissions
> + % than use_module declarations, and declarations in the interface
> + % give more permissions than the declarations of the same kind
> + % in the implementation section.
> +:- type section_import_and_or_use_map ==
> + map(module_name, section_import_and_or_use).
> +:- type import_and_or_use_map ==
> + map(module_name, maybe_implicit_import_and_or_use).
> +
> +%---------------------------------------------------------------------------%
> %
> % The parse_tree_{src,int,opt} types define the ASTs we use for source files,
> % interface files and optimization files respectively.
...
> @@ -537,7 +852,7 @@
> ptto_struct_reuse :: list(item_struct_reuse)
> ).
>
> -%-----------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> %
> % A raw compilation unit is one module to be compiled. A parse_tree_src
> % that contains N nested submodules corresponds to 1 + N raw_compilation_units,
> @@ -548,23 +863,121 @@
> % item block containing the items in an interface or implementation section
> % of its module.
> %
> +% XXX CLEANUP We should stop using raw_compilation_units completely,
> +% replacing their use with a parse_tree_module_src. That also contains
> +% the same components (includes, avails, fims and items), but in a more
> +% convenient form, with different kinds of separated out and with
> +% e.g. duplicate imports and type definitions already handled.
> +%
> % Before we convert a raw compilation unit into the HLDS, we augment it
> % with the contents of the interface files of the modules it imports
> % (directly or indirectly), and if requested, with the contents of the
> % optimization files of those modules as well. The augmented compilation unit
> -% will consist of
> -%
> -% - the src_item_blocks, i.e. the item blocks of the original raw compilation
> -% unit, some of which may be marked as implementation but being exported to
> -% submodules,
> -% - the int_item_blocks, which were read from the interfaces of other,
> -% directly imported modules,
> -% - the opt_item_blocks, which were read from the interfaces or optimization
> -% files of other, indirectly imported modules.
> -%
> -% As with the parse tree types above, the contexts in these types
> -% may be term.context_init if the actual context isn't known, but if the
> -% recorded context is not term.context_init, then it is valid.
> +% will consist of the following for compiler invocations that generate
> +% target language code. (For compiler invocations that generate .int and
> +% .int2 files, all the interface files mentioned below will be replaced
> +% by .int3 files.)
> +%
> +% - The module_src field contains the original raw_compilation_unit
> +% after being transformed into a parse_tree_module_src.
> +%
> +% Once upon a time, we dependended on the raw compilation unit
Spelling.
> +% having item blocks marked as "implementation section but exported
> +% to submodules", but now we rely on the fact that some kinds of items
> +% in the implementation section are *always* exported to the current
> +% module's submodules (if there are any), while others are *never* exported
> +% to submodules, and these are already separated in the
> +% parse_tree_module_src.
> +%
> +% - The ancestor_int_specs field contains the .int0 interface files of
> +% the ancestors of this module, which are always implicitly imported.
> +%
> +% - The direct_int_specs field contains the .int files of the modules
> +% directly imported or used by this module, with the "override" exception
> +% noted below.
> +%
> +% - The indirect_int_specs field contains the .int2 files of the modules
> +% indirectly imported or used by this module, again with the "override"
> +% exception noted below.
> +%
> +% In this case, module A "indirectly imports or uses" module C if
> +% module A imports or uses a module B whose .int file uses module C.
> +% (.int files only use modules; they do not import them.)
> +%
> +% The exceptions above are that
> +%
> +% o if a module's .int0 file is in the ancestor_int_specs field,
> +% we don't include its .int1 file in the direct_int_specs field,
> +% or its .int2 file in the indirect_int_specs field. In effect,
> +% the appearance of a module in the ancestor_int_specs field
Trailing whitespace at the end of that last sentence.
> +% overrides (i.e. prevents) its appearance in the direct_int_specs
> +% or the indirect_int_specs fields.
> +%
> +% o if a module's .int file is in the direct_int_specs field,
> +% we don't include its .int2 file in the indirect_int_specs field.
> +% Again, the appearance of a module in the direct_int_specs field
Ditto.
> +% overrides its appearance in the indirect_int_specs field.
> +%
...
> diff --git a/compiler/read_modules.m b/compiler/read_modules.m
> index 96619b6e4..3f503cee5 100644
> --- a/compiler/read_modules.m
> +++ b/compiler/read_modules.m
...
> +:- pred return_timestamp_if_needed(maybe_return_timestamp::in,
> + maybe(timestamp)::in, maybe(timestamp)::out) is det.
> +
> +return_timestamp_if_needed(ReturnTimestamp, MaybeTimestamp0, MaybeTimestamp) :-
> (
> ReturnTimestamp = do_return_timestamp,
> - ModuleTimestamp = module_timestamp(_, Timestamp, _),
> - MaybeTimestamp = yes(Timestamp)
> + (
> + MaybeTimestamp0 = no,
> + % This can happen if
> + %
> + % - code that does not need a timestamp enters a parse tree
> + % into the have_read_module_map, and then later
> + %
> + % - code that does need a timestamp finds the parse tree there.
> + %
> + % We abort because I (zs) don't think this should happen:
> + % the use of have_read_module_maps in module_imports.m never
> + % needs timestamps, the smart recompilation modules always
> + % need timestamps, but the have_read_module_maps they use
> + % are completely separate (for now). % If it turns out I am wrong,
There's a stray '%' there.
> + % or we *do* want these two subsystems to use the same
> + % have_read_module_maps, then there are two obvious possibilities:
> + % either *always* store the timestamp of a file we read in, or
> + % get the timestamp from the OS the first time it is needed
> + % (have_read_module_maps entries include the filename, so this
> + % is possible). The first solution is simpler, the second can
> + % execute fewer system calls.
> + unexpected($pred, "do_return_timestamp but no timestamp")
The change is fine otherwise.
Julien
More information about the reviews
mailing list