[m-rev.] for review: replace item_blocks with parse trees during pre-HLDS passes

Julien Fischer jfischer at opturion.com
Fri Mar 13 01:38:07 AEDT 2020


On Tue, 10 Mar 2020, Zoltan Somogyi wrote:

> For review by Julien, since he kindly volunteered.
> Thank you in advance. Note that some of the XXXs
> in the log message are questions for you. I would
> also appreciate it if, besides reviewing the diff, you
> also tried it out on the programs you are working on.

I have done so and have not encountered any problems.  One thing I did
encounter due to the changes in warnings for unused modules is:

     search.m:043:   warning: module `fd' is imported in the interface, but it is
     search.m:043:   not used in the interface.
     search.m:091: Warning: this `:- import_module' declaration for module `fd' in
     search.m:091:   the implementation section is redundant, given the
     search.m:091:   `:- import_module' declaration for the same module in the
     search.m:091:   interface section.
     search.m:043:   The previous declaration is here.

We could argubly omit the second warning in cases like the above.

> Make the pre-HLDS passes use file-kind-specific parse trees.
> 
> Replacing item blocks file-kind-specific kinds of section markers with
> file-kind-specific parse trees has several benefits.
> 
> - It allows us to encode the structural invariants of each kind of file
>   we read in in the type of its representation. This the detection of

    ... using the type of its representation ...

    This *makes* the detection of ...

>   any accidental violations of those invariants trivial.
> 
> - Since each file-kind-specific parse tree has separate lists for separate
>   kinds of items, code that wants to operate on one or a few kinds of items
>   can just operate on those kinds of items, without having to traverse
>   item blocks containing many other kinds of items as well. The most
>   important consequence of this is not the improved efficiency, though
>   that is nice, but the increased clarity of the code.
> 
> - The new design is much more flexible. For example, it should be possible
>   to record that e.g. an interface file we read in as a indirect dependency
>   (i.e. a file we read not because its module was imported by the module
>   we are compiling, but because its module was imported by *another* imported
>   module) should be used *only* for the purpose it was read in for. This should
>   avoid situations where deleting an import of A from a module, because it
>   is not needed anymore, leads the compiler to generate an error message
>   about a missing import of module B. This can happen if (a) module B
>   always *should* have been imported, since it is used, but (b) module A's
>   import of module B lead to module B's interface being available *without*
>   an import of B.
>
>   Specifically, this flexility should enable us to establish each module's

s/flexility/flexibility/

>   .int file as the single source of truth about how values of each type
>   defined in that module should be represented. When compiling each source
>   file, this approach requires the compiler to read in that module's .int file
>   but using only the type_repn items from that .int file, and nothing else.

...

> This change is big and therefore hard to review. Therefore in many files,
> this change adds "XXX CLEANUP" comments to draw attention to places that
> have issues that should be fixed, but whose fixes should come later, in
> separate diffs.

You've got at least one of the comments labelled YYY; is that supposed
to be an XXX CLEANUP one as well?

> compiler/prog_item.m:

...

>     Split the existing pragma item kind into three item kinds, which have
>     different invariants applying to them.
>
>     - The decl (short for declarative) pragmas give the compiler some
>       information, such as that a predicate is obsolete or that we
>       want to type specialize some predicate or function, that is in effect
>       part of the module's interface. Decl pragmas may appear in module
>       interfaces, and the compiler may put them into interface files;
>       neither statement is true of the other two kinds of pragmas.
>
>     - The impl (short for implementation) pragmas are named so
>       precisely because they may appear only in implementation sections.
>       They give the compiler information information that is private

Doubled-up "information".

>       to that module. Examples include foreign_decls, foreign_codes,
>       foreign_procs, and promises of clause equivalence, and requests
>       for inlining, tabling etc. These will never be put into interface files,
>       though some of them can affect the compilation of other modules
>       by being included in .opt files.
>
>     - The gen (short for generated) pragmas can never (legally) appear
>       in source files at all. They record the results of compiler
>       analyses e.g. about which arguments of a predicate are unused,
>       or what exceptions a function can throw, and accordingly they
>       should only ever occur in compiler-generated interface files.

...

> compiler/get_dependencies.m:
>     Add operations to gather implicit references to builtin modules
>     (which there have to be made available even without an explicit


Delete "there".

>     import_module or use_module declaration) in all kinds of parse trees.
>     These have more code overall, but will be at runtime, since we need
>     only look at the item kinds that may *have* such implicit references.
>
>     Add a mechanism to record the result of these gathering operations
>     in import_and_or_use_maps.
>
>     Give some types, function symbols, predicates and variables
>     more meaningful names.

...

> compiler/split_parse_tree_src.m:
>     Rearchitect how this module separates out nested submodules from within
>     the main module in a file.
>
>     Another of the jobs of this module is to generate error messages for
>     when module A includes module B twice, with special exception for

     ... with *a* special exception ...

>     the case where A's interface contains nested submodule A.B's interface, and
>     A's implementation contains nested submodule A.B's implementation.
>     Other than that, has been split_parse_tree_src.m any duplicate inclusions
>     of submodules, whether via nesting or via include_module declarations.

What is that last sentence trying to say?

>     The problem was that ironically, while it reported duplicate
>     include_module declarations as errors, it also *generated* duplicate
>     include_module declarations. Since it replaced each nested submodule
>     occurrence with an include_module declaration, in the scenario above,
>     it generated two include_module declarations for A.B. Even worse,
>     The interface incarnation of submodule A.B could contain
>     (the interface of) its own nested submodule A.B.C, while its
>     implementation incarnation could contain (the implementation section of)
>     A.B.C. Each occurrence of A.B.C would be its only occurrence in the
>     including part of its parent A.B, which means local tests for duplicates
>     do not work. (I found this out the hard way.)

...

> compiler/error_util.m:
>     Add an id field to all error_specs, which by convention should be
>     filled in with $pred. This allows a person debugging the compiler
>     where in the code an undesired error message is coming from
>     significantly easier than was previously possible.

There appear to be some words missing from that last sentence.

>     Most of the modules that have changes only "to conform to the changes
>     above" will be for this change. In many cases, the updated code
>     will also simplify the creation of the affected error_specs.
>
>     Fix a bug that looked for a phase in only one kind of error_spec.
>
>     Add some utility operations needed by other parts of this change.
>
>     Delete a previously internal function that has been moved to
>     mdbcomp/prim_data.m to make it accessible in other modules as well.
> 
> compiler/Mercury.options:
>     Ask the compiler to warn about dead predicates in every module
>     touched by this change (at least in one its earlier versions).
> 
> compiler/add_foreign_enum.m:
>     Replace a check for an inappropriately placed foreign_enum declaration
>     with a sanity check, since with this diff, the error should be caught
>     earlier.
> 
> compiler/add_mutable_aux_preds.m:
>     Delete a check for an inappropriately placed mutable declaration,
>     since with this diff, the error should be caught earlier.
> 
> compiler/add_pragma.m:
>     Instead of adding pass2 and pass3 pragmas, add decl and impl and
>     generated pragmas.
>
>     Delete the tests for generated pragma occurring anywhere except
>     .opt files, since those tests are now done earlier.
>
>     Shorten some too-long predicate names.
> 
> compiler/comp_unit_interface.m:
>     Operate on as specific kinds of parse trees as the interface of this
>     module will allow. (We could operate on more speficic parse trees

s/speficic/specific/

>     if we changed the interface, but that is future work).
>
>     Use the same predicates for handling duplicate include_module,
>     import_module and use_module declarations as everywhere else.
>
>     Delete the code of an experiment that shouldn't be needed anymore.
...

> compiler/options.m:
>     Add an option to control whether errors and/or warnings detecting
>     when deciding what should go into a .intN file be printed,
>     thus (potentially) preventing the creation of that file.
>
>     XXX Should this option be documented for users?

Yes.

...

> NEWS:
>     Mention that we now generate warnings for unused import_module and
>     use_module declarations in the interface even if the module has
>     submodules.
>
>     XXX Should we report this as a breaking change?

Yes.

...

> 
> tests/invalid/foreign_enum_invalid.err_exp:
>     Expect a different wording for an error message. It is more "standard"
>     but slightly less informative.
>     XXX Is this ok?

Yes, that's fine.  Knowing the type the foreign_enum is for isn't particularly
relevant for the error in question.

...

> diff --git a/compiler/error_util.m b/compiler/error_util.m
> index a3fae4838..a0ca070b1 100644
> --- a/compiler/error_util.m
> +++ b/compiler/error_util.m
> @@ -68,28 +68,49 @@

...

> +% The id field, which is present in all three alternatives, is totally
> +% ignored when printing error_specs. Its job is something completely different:
> +% helping developers track down where in the source code each error_spec
> +% was constructed. Without the id fields, if developers wants to know this,
> +% e.g. because they do not want the message printed, or because there is
> +% a problem with its wording, they have to grep for some words in the message.
> +% However, grepping for a single word will usually get many false hits,
> +% while grepping for two or more consecutive words in the message may miss
> +% the code generating the message, because in that code, some of those
> +% consecutive words are on different lines. On the other hand, if every
> +% place that constructs an error_spec, of any of these three varieties,
> +% fills in the id field with $pred, then finding the right place is easy:
> +% just put a breakpoint on do_write_error_spec, wait for it to be invoked
> +% to print the error_spec whose origin you want to track down, and then
> +% look its id field. Even if the predicate this identifies has several pieces
> +% of code that construct specs, the scope in which you have to search for it
> +% will be easily manageable.

I think a better alternative here would be to have a developer-only option
that cause the compiler to print the id at the end of each error message.

...

> diff --git a/compiler/prog_item.m b/compiler/prog_item.m
> index dac879fb2..7439e429e 100644
> --- a/compiler/prog_item.m
> +++ b/compiler/prog_item.m

...

> @@ -37,17 +71,99 @@

...

> +:- type maybe_implicit_import_and_or_use
> +    --->    explicit_avail(
> +                section_import_and_or_use
> +            )
> +    ;       implicit_avail(
> +                implicit_import_or_use,
> +                maybe(section_import_and_or_use)
> +            ).
> +
> +    % Values of this type specify how each module we have available
> +    % was *made* available. If a module had redundant import_module
> +    % and/or use_module declarations, each of these has had a warning
> +    % generated for it and then discarded. One of these declarations

and was then discarded.

> +    % can be made redundant redundant not only by another declaration
> +    % of the same kind in the same section, but also by more permissive
> +    % declarations; import_module declarations grant more permissions
> +    % than use_module declarations, and declarations in the interface
> +    % give more permissions than the declarations of the same kind
> +    % in the implementation section.
> +:- type section_import_and_or_use_map ==
> +    map(module_name, section_import_and_or_use).
> +:- type import_and_or_use_map ==
> +    map(module_name, maybe_implicit_import_and_or_use).
> +
> +%---------------------------------------------------------------------------%
>  %
>  % The parse_tree_{src,int,opt} types define the ASTs we use for source files,
>  % interface files and optimization files respectively.

...

> @@ -537,7 +852,7 @@
>                  ptto_struct_reuse           :: list(item_struct_reuse)
>              ).
> 
> -%-----------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
>  %
>  % A raw compilation unit is one module to be compiled. A parse_tree_src
>  % that contains N nested submodules corresponds to 1 + N raw_compilation_units,
> @@ -548,23 +863,121 @@
>  % item block containing the items in an interface or implementation section
>  % of its module.
>  %
> +% XXX CLEANUP We should stop using raw_compilation_units completely,
> +% replacing their use with a parse_tree_module_src. That also contains
> +% the same components (includes, avails, fims and items), but in a more
> +% convenient form, with different kinds of separated out and with
> +% e.g. duplicate imports and type definitions already handled.
> +%
>  % Before we convert a raw compilation unit into the HLDS, we augment it
>  % with the contents of the interface files of the modules it imports
>  % (directly or indirectly), and if requested, with the contents of the
>  % optimization files of those modules as well. The augmented compilation unit
> -% will consist of
> -%
> -% - the src_item_blocks, i.e. the item blocks of the original raw compilation
> -%   unit, some of which may be marked as implementation but being exported to
> -%   submodules,
> -% - the int_item_blocks, which were read from the interfaces of other,
> -%   directly imported modules,
> -% - the opt_item_blocks, which were read from the interfaces or optimization
> -%   files of other, indirectly imported modules.
> -%
> -% As with the parse tree types above, the contexts in these types
> -% may be term.context_init if the actual context isn't known, but if the
> -% recorded context is not term.context_init, then it is valid.
> +% will consist of the following for compiler invocations that generate
> +% target language code. (For compiler invocations that generate .int and
> +% .int2 files, all the interface files mentioned below will be replaced
> +% by .int3 files.)
> +%
> +% - The module_src field contains the original raw_compilation_unit
> +%   after being transformed into a parse_tree_module_src.
> +%
> +%   Once upon a time, we dependended on the raw compilation unit

Spelling.

> +%   having item blocks marked as "implementation section but exported
> +%   to submodules", but now we rely on the fact that some kinds of items
> +%   in the implementation section are *always* exported to the current
> +%   module's submodules (if there are any), while others are *never* exported
> +%   to submodules, and these are already separated in the
> +%   parse_tree_module_src.
> +%
> +% - The ancestor_int_specs field contains the .int0 interface files of
> +%   the ancestors of this module, which are always implicitly imported.
> +%
> +% - The direct_int_specs field contains the .int files of the modules
> +%   directly imported or used by this module, with the "override" exception
> +%   noted below.
> +%
> +% - The indirect_int_specs field contains the .int2 files of the modules
> +%   indirectly imported or used by this module, again with the "override"
> +%   exception noted below.
> +%
> +%   In this case, module A "indirectly imports or uses" module C if
> +%   module A imports or uses a module B whose .int file uses module C.
> +%   (.int files only use modules; they do not import them.)
> +%
> +%   The exceptions above are that
> +%
> +%   o   if a module's .int0 file is in the ancestor_int_specs field,
> +%       we don't include its .int1 file in the direct_int_specs field,
> +%       or its .int2 file in the indirect_int_specs field. In effect,
> +%       the appearance of a module in the ancestor_int_specs field

Trailing whitespace at the end of that last sentence.

> +%       overrides (i.e. prevents) its appearance in the direct_int_specs
> +%       or the indirect_int_specs fields.
> +%
> +%   o   if a module's .int file is in the direct_int_specs field,
> +%       we don't include its .int2 file in the indirect_int_specs field.
> +%       Again, the appearance of a module in the direct_int_specs field

Ditto.

> +%       overrides its appearance in the indirect_int_specs field.
> +%

...

> diff --git a/compiler/read_modules.m b/compiler/read_modules.m
> index 96619b6e4..3f503cee5 100644
> --- a/compiler/read_modules.m
> +++ b/compiler/read_modules.m
...

> +:- pred return_timestamp_if_needed(maybe_return_timestamp::in,
> +    maybe(timestamp)::in, maybe(timestamp)::out) is det.
> +
> +return_timestamp_if_needed(ReturnTimestamp, MaybeTimestamp0, MaybeTimestamp) :-
>      (
>          ReturnTimestamp = do_return_timestamp,
> -        ModuleTimestamp = module_timestamp(_, Timestamp, _),
> -        MaybeTimestamp = yes(Timestamp)
> +        (
> +            MaybeTimestamp0 = no,
> +            % This can happen if
> +            %
> +            % - code that does not need a timestamp enters a parse tree
> +            %   into the have_read_module_map, and then later
> +            %
> +            % - code that does need a timestamp finds the parse tree there.
> +            %
> +            % We abort because I (zs) don't think this should happen:
> +            % the use of have_read_module_maps in module_imports.m never
> +            % needs timestamps, the smart recompilation modules always
> +            % need timestamps, but the have_read_module_maps they use
> +            % are completely separate (for now). % If it turns out I am wrong,

There's a stray '%' there.

> +            % or we *do* want these two subsystems to use the same
> +            % have_read_module_maps, then there are two obvious possibilities:
> +            % either *always* store the timestamp of a file we read in, or
> +            % get the timestamp from the OS the first time it is needed
> +            % (have_read_module_maps entries include the filename, so this
> +            % is possible). The first solution is simpler, the second can
> +            % execute fewer system calls.
> +            unexpected($pred, "do_return_timestamp but no timestamp")

The change is fine otherwise.

Julien


More information about the reviews mailing list