[m-rev.] for review: don't have the parser return dummy trees

Julien Fischer jfischer at opturion.com
Sat Apr 30 13:54:25 AEST 2022


On Fri, 29 Apr 2022, Zoltan Somogyi wrote:

> Don't return dummy parse_trees for missing files.
> 
> The predicates that read a Mercury source, interface or optimization file
> used to return four things:
> 
> - the name of the file (for callers who specified only a module name),
> - the timestamp, if requested and available,
> - a parse tree, and
> - a representation of any errors (sets of error categories, and error_specs).
> 
> However, these four things were not independent. Some combinations of their
> values were not allowed, but (a) there was no documentation of what these
> combinations were, and (b) code that processed these four things had to be
> prepared to handle illegal as well as legal combinations.
> 
> This diff makes these predicates return only one result, which contains
> 
> - all four of the above things, when the file could be opened, but
> - only the file name and a representation of the error if the file
>   could not be opened,
> - only the file name and a representation of *no* errors, if the caller
>   asked the predicate to read the file only if its timestamp did not match
>   a specified value, and it does match that value.
> 
> We use a somewhat modified version of an existing type, have_read_module,
> for this. It is modified both by including information that its users
> now need that they did not need before, and shortening the names of its
> function symbols, which now occur in many more places than before.
> 
> compiler/read_modules.m:
>     Make the change to the output arguments described above.
>
>     Making this change requiring having the affected predicates deal with

s/requiring/requires/

>     the case where we either cannot find or cannot open the relevant file.
>     (The two are effectively the same thing in this module, since we search
>     by attempting to open.) Passing that task off to parse_modules.m
>     was always inelegant.
>
>     Simplify the have_read_module_map type by making the key always
>     be a module_name. We deleted the only use of have_read_module_maps
>     with another kind of key a while ago.
>
>     Delete some no-longer-needed predicates.

...

> diff --git a/compiler/deps_map.m b/compiler/deps_map.m
> index 5a838516a..9c7b48ce8 100644
> --- a/compiler/deps_map.m
> +++ b/compiler/deps_map.m

...

> @@ -315,11 +326,70 @@ read_dependencies(Globals, Search, ModuleName, ExpectationContexts,
>      % XXX If SrcSpecs contains error messages, the parse tree may not be
>      % complete, and the rest of this predicate may work on incorrect data.

Why does that comment refer to SrcSpecs?  Both the existing code and this
modification don't contain that variable (assuming SrcSpecs is meant to refer
to a variable)? Does it mean Specs there?

>      read_module_src(Globals, rrm_get_deps(ModuleName), ignore_errors, Search,
> -        ModuleName, ExpectationContexts, SourceFileName,
> -        always_read_module(dont_return_timestamp), _,
> -        ParseTreeSrc, SrcReadModuleErrors, !IO),
> +        ModuleName, ExpectationContexts,
> +        always_read_module(dont_return_timestamp), HaveReadModuleSrc, !IO),
> +    (
> +        HaveReadModuleSrc = have_read_module(SourceFileName, _MaybeTimestamp,
> +            ParseTreeSrc, ReadModuleErrors)
> +    ;
> +        HaveReadModuleSrc = have_not_read_module(SourceFileName,
> +            ReadModuleErrors),
> +        % XXX Creating a dummy parse tree, from which we then construct
> +        % a single burdened module, which we then add to the deps_map,
> +        % preserves old behavior. This old behavior has two effects,
> +        % as as I can tell (zs, 2022 apr 28).

The word "far" is missing.

> +        %
> +        % - It includes modules which are imported but whose sources
> +        %   are not in the current directory among the dependencies
> +        %   of the main module in the main module's .dv file.
> +        %
> +        %   I see three main classes of such modules. The two obvious classes
> +        %   are Mercury library modules and a project's own library modules.


Due to source file mappings a project's own source files may not reside in
the current directory, although I think that distinction is not important here.

> +        %   A third class is nested submodules which are not found because
> +        %   they are looked up as if they were separate submodules. This is
> +        %   exemplified by the tests/invalid_make_int/sub_c test case,
> +        %   where sub_c.m imports sub_a.sub_1, but sub_a.sub_1.m does not
> +        %   exist, because sub_1 is a nested submodule inside sub_a.m.
> +        %
> +        %   It is arguable whether including the first two categories
> +        %   among the main module's dependencies in the .dv file
> +        %   is a good or not. With the right view path, they can detect
> +        %   when the main module needs to be rebuilt due to changes in them,
> +        %   but specifying that view path to --generate-dependencies, and
> +        %   having the compiler actually use that view path, would be
> +        %   a better idea, Including the third is probably a bad idea

Replace that comma with a full stop.

Are you assuming that the second class, the project's own library modules, have
been installed (i.e. as the result of mmake or mmc --make's install target).
It is not unknown for libraries to be used without having been installed.

> +        %   from all respects, though (a) distinguishing that category
> +        %   from the other two is nontrivial, and (b) fixing that bad idea
> +        %   would require changing the expected output of the sub_c test case.

Why is (b) an issue?  Would the new error be less meaningful?

> +        %
> +        %   The commented out code below is a possible replacement of this
> +        %   switch and the call following it. We could use it if decide
> +        %   not to put dummy parse_tree_srcs into the deps_map. Switching
> +        %   to it bootchecks, though only with a different expected output
> +        %   for the sub_c test case.
> +        %
> +        % - Strangely, even the totally empty parse_tree_src we construct
> +        %   will end up having dependencies, because we implicitly import
> +        %   both the public and private builtin modules into all modules,
> +        %   even if they contain nothing that could refer to either builtin
> +        %   module :-( However, these dependencies should not drag into
> +        %   the deps_map any module that other, nondummy modules' dependencies
> +        %   wouldn't drag in anyway.

I am puzzled about what the existing behaviour is here: are we writing out
.dv files in the presence of errors?

> +        ParseTreeSrc = parse_tree_src(ModuleName, term.dummy_context_init,
> +            cord.init)
> +    ),
>      parse_tree_src_to_burdened_module_list(Globals, SourceFileName,
> -        ParseTreeSrc, SrcReadModuleErrors, Specs, BurdenedModules),
> +        ParseTreeSrc, ReadModuleErrors, Specs, BurdenedModules),
> +%   (
> +%       HaveReadModuleSrc = have_read_module(SourceFileName, _MaybeTimestamp,
> +%       parse_tree_src_to_burdened_module_list(Globals, SourceFileName,
> +%       parse_tree_src_to_burdened_module_list(Globals, SourceFileName,
> +%           ParseTreeSrc, ReadModuleErrors, Specs, BurdenedModules)
> +%   ;
> +%       HaveReadModuleSrc = have_not_read_module(_, ReadModuleErrors),
> +%       BurdenedModules = [],
> +%       Specs = get_read_module_specs(ReadModuleErrors)
> +%   ),
>      !:Specs = Specs ++ !.Specs.

...


> diff --git a/compiler/generate_dep_d_files.m b/compiler/generate_dep_d_files.m
> index 324284773..91c84b28f 100644
> --- a/compiler/generate_dep_d_files.m
> +++ b/compiler/generate_dep_d_files.m

...

> @@ -2237,10 +2290,30 @@ update_opt_error_status(Globals, WarnOption, FileName, ModuleErrors,
>          Spec = simplest_no_context_spec($pred, severity_warning,
>              phase_read_files, Pieces),
>          !:Specs = [Spec | !.Specs]
> -        )
> +    ).
>      % NOTE: We do NOT update !Error, since a missing optimization
>      % interface file is not necessarily an error.
> -    else if
> +
> +    % update_opt_error_status_on_success(ModuleErrors, !Specs, !Error):
> +    %
> +    % Work out whether any errors have occurred while reading
> +    % a .opt or .trans_opt file, and update !Error accordingly.
> +    % Note that we can ignore *not finding* a .opt or .trans_opt file,
> +    % a situation handled nu update_opt_error_status_on_failure,

s/nu/by/

> +    % finding any error, whether syntax or semantic, inside one of these files
> +    % that does exist is always fatal. This is because it indicates either
> +    % that the Mercury compiler invocation that created it had a bug,
> +    % or that the files been tampered with later.
> +    %
> +:- pred update_opt_error_status_on_success(read_module_errors::in,
> +    list(error_spec)::in, list(error_spec)::out,
> +    maybe_opt_file_error::in, maybe_opt_file_error::out) is det.
> +
> +update_opt_error_status_on_success(ModuleErrors, !Specs, !Error) :-
> +    FatalErrors = ModuleErrors ^ rm_fatal_errors,
> +    NonFatalErrors0 = ModuleErrors ^ rm_nonfatal_errors,
> +    set.delete(rme_nec, NonFatalErrors0, NonFatalErrors),
> +    ( if
>          set.is_empty(FatalErrors),
>          set.is_empty(NonFatalErrors)
>      then

...

That looks fine otherwise.

Julien.


More information about the reviews mailing list