[m-rev.] for review: Add --trans-opt-deps-spec option.

Zoltan Somogyi zoltan.somogyi at runbox.com
Thu Jan 12 15:14:59 AEDT 2023


2023-01-12 14:13 GMT+11:00 "Peter Wang" <novalazy at gmail.com>:
> This option lets the user provide a file containing information to
> remove some edges from the trans-opt dependency graph, i.e. the graph
> used to determine which .trans_opt files to read when making a module's
> own .trans_opt file.
> 
> The reason to remove edges from the graph is to break dependency cycles.
> .trans_opt files for modules within an SCC have to be made one after
> another, instead of in parallel. For example, the standard library
> contains one large SCC due to circular imports, so making .trans_opt
> files turns out to be a bottleneck when building the standard library
> (on a machine with sufficient parallelism available).
> 
> Furthermore, the user had no control over which modules in an SCC
> could read the .trans_opt files of other modules in the same SCC.
> If a and b happened to import each other, the compiler would always
> break the cycle by allowing a to read b.trans_opt, but not allow b to
> read a.trans_opt, simply based on module names. The new option lets the
> user break the cycle in a way that may improve analysis results.
> 
> compiler/options.m:
>     Add the --trans-opt-deps-spec option.
> 
> compiler/generate_dep_d_files.m:
>     If the option --trans-opt-deps-spec FILE is used, use the
>     information given in the file to remove some edges from the
>     trans-opt dependency graph.
> 
>     If --generate-module-order is passed, also output the module order
>     computed from the trans-opt dependency graph to a file. This can be
>     use to guide the writing of the spec file.

I would reword the last sentence, to something like this:

Users may find this a useful starting point when writing their own spec file.

> @@ -240,9 +240,63 @@ generate_dependencies(Globals, Mode, Search, ModuleName, DepsMap0, !IO) :-
>          % and assume that the each module's `.opt' file might import any
>          % of that module's implementation dependencies; in actual fact,
>          % it will be some subset of that.
> -
>          digraph.tc(ImpDepsGraph, IndirectOptDepsGraph),
>  
> +        % Compute the trans-opt deps for the purpose of making trans-opt
> +        % files. This is normally equal to transitive closure of the indirect

I would say ".trans_opt files".

> +        % dependencies (i.e. IndirectOptDepsGraph) since a module may read the
> +        % `.trans_opt' file of any directly or indirectly imported module.
> +        %
> +        % To deal with cycles in the graph, we impose an arbitrary order on

"we impose" used to be true, but this diff changes that. Reword to say
that we impose that order *by default*.

> +        % modules so that when making the trans-opt file for a module
> +        % "earlier" in the cycle, the compiler may read the trans-opt files
> +        % of modules "later" in the cycle, but not vice versa.
> +        %
> +        % The problem with that is twofold:
> +        % - Lack of parallelism. The trans-opt files for modules within a
> +        %   single SCC have to be made one after another.
> +        % - The arbitrary ordering is likely to produce sub-optimal
> +        %   information transfer between trans-opt files.
> +        %
> +        % To improve the situation, we allow the user to specify a file
> +        % (see read_trans_opt_deps_spec) to manually remove edges in the
> +        % dependency graph, thereby breaking up SCCs and, ideally, converting
> +        % the graph into a dag.

... to specify *a list of edges for the code below to remove* from ...

"thereby breaking up SCCs": some removed edges may not do that;
indeed, the file may contain references to edges that don't exist in the graph.
Reword to say that the *intent* is to allow the breaking up of SCCs.

> +        globals.lookup_maybe_string_option(Globals,
> +            trans_opt_deps_spec, MaybeSpecFileName),
> +        (
> +            MaybeSpecFileName = yes(SpecFileName),
> +            read_trans_opt_deps_spec_file(SpecFileName, MaybeSpec, !IO),
> +            (
> +                MaybeSpec = ok(Spec),
> +                apply_trans_opt_deps_spec(Spec, ImpDepsGraph,
> +                    TransOptDepsGraph0),
> +                digraph.tc(TransOptDepsGraph0, TransOptDepsGraph)
> +            ;
> +                MaybeSpec = error(Error),
> +                report_error(ErrorStream, Error, !IO),
> +                TransOptDepsGraph = IndirectOptDepsGraph
> +            )

Rename Spec to something like EdgesToRemove, to remove confusion
wrt error_specs. Likewise for MaybeSpec.

> +    % The --trans-opt-deps-spec file shall contain a series of terms
> +    % of either form:
> +    %
> +    %   module_allow_deps(M, [ ALLOW ]).
> +    %   module_disallow_deps(M, [ DISALLOW ]).
> +    %
> +    % where M is a Mercury module name,
> +    % and ALLOW and DISALLOW are comma-separated lists of module names.
> +    %
> +    % To make the file less verbose, `builtin' and `private_builtin' are
> +    % implicitly included in an ALLOW list unless M is itself `builtin'
> +    % or `private_builtin'.

I don't think we discussed this earlier, but I think it is worth considering
whether this policy should be extended either 

- to all the library modules whose names contain "builtin". or
- to all the library modules that the compiler can consider a module to depend on
  implicitly, which includes all the "x_builtin" modules, but also the others that
  can be returned by compute_implicit_avail_needs in get_dependencies.m.

> +    % TODO: report errors using error specs
> +    % TODO: report multiple errors

If you want, I can do these after you commit.

> +:- pred read_trans_opt_deps_spec_file(string::in,
> +    maybe_error(trans_opt_deps_spec)::out, io::di, io::uo) is det.
> +
> +read_trans_opt_deps_spec_file(FileName, Result, !IO) :-
> +    io.read_named_file_as_string(FileName, ReadResult, !IO),
> +    (
> +        ReadResult = ok(Contents),
> +        string.length(Contents, ContentsLen),
> +        StartPos = posn(1, 0, 0),

I will look into defining a function in the library to return this tuple,
so people won't have to figure out the numbers each time they write
code like this.

> +parse_trans_opt_deps_spec_file(FileName, Contents, ContentsLen, !Pos,
> +        Result, !SpecFile) :-

I would put the two state var args next to each other.

> +:- pred parse_trans_opt_deps_spec_term(term::in, maybe_error::out,
> +    trans_opt_deps_spec::in, trans_opt_deps_spec::out) is det.
> +
> +parse_trans_opt_deps_spec_term(Term, Result, !SpecFile) :-
> +    ( if
> +        Term = functor(atom(AtomName), [LeftTerm, RightTerm], _Context),
> +        (
> +            AtomName = "module_allow_deps"
> +        ;
> +            AtomName = "module_disallow_deps"
> +        ),
> +        try_parse_symbol_name(LeftTerm, SourceName)
> +    then
> +        ( if
> +            AtomName = "module_allow_deps",
> +            SourceName \= unqualified("builtin"),
> +            SourceName \= unqualified("private_builtin")
> +        then
> +            TargetList0 = [
> +                unqualified("builtin"),
> +                unqualified("private_builtin")
> +            ]
> +        else
> +            TargetList0 = []
> +        ),
> +        parse_trans_opt_deps_spec_module_list(RightTerm, Result0,
> +            TargetList0, TargetList),
> +        (
> +            Result0 = ok,
> +            set.list_to_set(TargetList, TargetSet),
> +            (
> +                AtomName = "module_allow_deps",
> +                AllowOrDisallow = module_allow_deps(TargetSet)
> +            ;
> +                AtomName = "module_disallow_deps",
> +                AllowOrDisallow = module_disallow_deps(TargetSet)
> +            ),

This code seems to include builtin and private_builtin automatically
in DISALLOW lists, as well as ALLOW lists. I don't think this is what you intend.
I would always set TargetList0 to [], and insert these two modules into TargetSet0
in the allow case, if the module name is not itself builtin/private_builtin.

> +:- pred parse_trans_opt_deps_spec_module_list(term::in, maybe_error::out,
> +    list(module_name)::in, list(module_name)::out) is det.
> +
> +parse_trans_opt_deps_spec_module_list(Term, Result, !RevModuleNames) :-
> +    ( if Term = functor(atom("[]"), [], _Context) then
> +        Result = ok
> +    else if Term = functor(atom("[|]"), [HeadTerm, TailTerm], _Context) then
> +        ( if try_parse_symbol_name(HeadTerm, ModuleName) then
> +            !:RevModuleNames = [ModuleName | !.RevModuleNames],
> +            parse_trans_opt_deps_spec_module_list(TailTerm, Result,
> +                !RevModuleNames)
> +        else
> +            get_term_context(Term) = context(FileName, LineNum),
> +            string.format("%s:%d: expected module name",
> +                [s(FileName), i(LineNum)], Msg),
> +            Result = error(Msg)
> +        )
> +    else
> +        get_term_context(Term) = context(FileName, LineNum),
> +        string.format("%s:%d: expected list", [s(FileName), i(LineNum)], Msg),
> +        Result = error(Msg)
> +    ).

You can simplify this code by using list_term_to_term_list; there is
no need to duplicate its logic here.

> +apply_module_allow_deps(SourceKey, AllowSet, TargetKey, !Graph) :-

I would put the two Key args next to each other, both here and in the
next predicate.

> diff --git a/compiler/mercury_compile_make_hlds.m b/compiler/mercury_compile_make_hlds.m
> index 5b61cbf7e..afe5af779 100644
> --- a/compiler/mercury_compile_make_hlds.m
> +++ b/compiler/mercury_compile_make_hlds.m

> @@ -489,14 +498,14 @@ maybe_grab_plain_and_trans_opt_files(ProgressStream, ErrorStream, Globals,
>      (
>          OpModeAugment = opmau_make_trans_opt,
>          (
> -            MaybeTransOptDeps = yes(TransOptDeps),
> +            MaybeDFileTransOptDeps = yes(DFileTransOptDeps),
>              % When creating the trans_opt file, only import the
> -            % trans_opt files which are lower in the ordering.
> +            % trans_opt files which are listed in the `.d' file.

"listed in the .d file" means "anywhere in the .d file". Is this what you
want to say?

> @@ -2388,6 +2390,7 @@ long_option("show-developer-type-repns",            show_developer_type_repns).
>  long_option("show-developer-type-representations",  show_developer_type_repns).
>  long_option("show-dependency-graph",    show_dependency_graph).
>  long_option("imports-graph",            imports_graph).
> +long_option("trans-opt-deps-spec",      trans_opt_deps_spec).
>  long_option("dump-trace-counts",        dump_trace_counts).
>  long_option("dump-hlds",                dump_hlds).
>  long_option("hlds-dump",                dump_hlds).

There should also be a help message paragraph for the new option,
even if it is commented out for now.

Likewise an entry in doc/user_guide.texi.

> +:- type maybe_include_trans_opt_rule
> +    --->    do_not_include_trans_opt_rule
> +    ;       include_trans_opt_rule(trans_opt_rule_info).
> +
> +:- type trans_opt_rule_info
> +    --->    trans_opt_deps_from_order(set(module_name))
> +    ;       trans_opt_deps_from_d_file(set(module_name)).

Please document the semantics of this type.

>      % write_dependency_file(Globals, BurdenedAugCompUnit, MaybeIntermodDeps,
> -    %   AllDeps, MaybeTransOptDeps, !IO):
> +    %   AllDeps, MaybeInclTransOptRule, !IO):
>      %
>      % Write out the per-module makefile dependencies (`.d') file for the
>      % specified module. AllDeps is the set of all module names which the
>      % generated code for this module might depend on, i.e. all that have been
>      % used or imported, directly or indirectly, into this module, including
>      % via .opt or .trans_opt files, and including parent modules of nested
> -    % modules. MaybeTransOptDeps is a list of module names which the
> -    % `.trans_opt' file may depend on. This is set to `no' if the
> -    % dependency list is not available.
> +    % modules. MaybeInclTransOptRule controls whether to include a
> +    % trans_opt_deps rule in the file.

Add ", and if so, what that rule should say."


>  :- pred construct_trans_opt_deps_rule(globals::in,
> -    maybe(list(module_name))::in, set(module_name)::in, string::in,
> -    list(mmake_entry)::out, io::di, io::uo) is det.
> +    maybe_include_trans_opt_rule::in, maybe_intermod_deps::in,
> +    string::in, list(mmake_entry)::out, io::di, io::uo) is det.
>  
> -construct_trans_opt_deps_rule(Globals, MaybeTransOptDeps, LongDeps,
> +construct_trans_opt_deps_rule(Globals, MaybeInclTransOptRule, IntermodDeps,
>          TransOptDateFileName, MmakeRulesTransOpt, !IO) :-
>      (
> -        MaybeTransOptDeps = yes(TransOptDeps0),
> -        set.intersect(set.list_to_set(TransOptDeps0), LongDeps,
> -            TransOptDateDeps),
> +        MaybeInclTransOptRule = include_trans_opt_rule(TransOptRuleInfo),
> +        % There are two cases when we will write a trans_opt_deps rule.
> +        (
> +            % We reach this case when explicitly generating dependencies.

Move the switch unification before this comment, because without that,
this comment does not make sense. Likewise in the second arm.

> +            % TransOptDeps0 are the dependencies taken from the (possibly
> +            % user-adjusted) trans-opt dependency graph.
> +            %
> +            % TransOptOrder contains the list of modules that occur later
> +            % than the current module in a topological ordering of the
> +            % trans-opt dependency graph.

"THE trans-opt dependency graph": there is more than one such graph
(with/without deleted edges), so say explicitly which one you mean.

> +            % We take the intersection of TransOptOrder and TransOptDeps0
> +            % to eliminate any circularities that might arise in the
> +            % trans_opt_deps rules if we were to use TransOptDeps0 as-is.
> +            TransOptRuleInfo = trans_opt_deps_from_order(TransOptOrder),
> +            (
> +                IntermodDeps = intermod_deps(_, _, _, _, TransOptDeps0),
> +                set.intersect(TransOptOrder, TransOptDeps0, TransOptDeps)
> +            ;
> +                IntermodDeps = no_intermod_deps,
> +                unexpected($pred, "no_intermod_deps")
> +            )
> +        ;
> +            % We reach this case when the `.d' file is being automatically
> +            % rewritten after producing target code, etc. We will not have
> +            % computed the trans-opt dependency graph, and we will not have
> +            % read the trans-opt-deps-spec file.
> +            %
> +            % What we can do is write the new `.d' file with the same trans-opt
> +            % dependencies as the old `.d' file. As source files are modified,
> +            % the trans-opt dependencies listed in the `.d' file may become out
> +            % of date, so the user will need to explicitly regenerate
> +            % dependencies.
> +            %
> +            % Note: we used to take the intersection with LongDeps, but this
> +            % case was not separated from the previous case and it greatly
> +            % reduces the set of dependencies, so I'm not sure if it was
> +            % intentional. --pw

Your diff deletes the LongDeps argument of this predicate, so this needs to be
updated to refer to LongDeps *in the caller*.

> +            TransOptRuleInfo = trans_opt_deps_from_d_file(DFileTransOptDeps),
> +            TransOptDeps = DFileTransOptDeps
> +        ),
>          % Note that maybe_read_dependency_file searches for
>          % this exact pattern.
>          make_module_file_names_with_suffix(Globals,
>              ext_other(other_ext(".trans_opt")),
> -            set.to_sorted_list(TransOptDateDeps), TransOptDateDepsFileNames,
> +            set.to_sorted_list(TransOptDeps), TransOptDepsFileNames,
>              !IO),

This change seems strange, but it seems that the original variable name
was simply badly chosen.


> -generate_dependencies_write_d_files(_, [], _, _, _, _, _, _, !IO).
> +generate_dependencies_write_d_files(_, [], _, _, _, _, _, _, _, !IO).
>  generate_dependencies_write_d_files(Globals, [Dep | Deps],
>          IntDepsGraph, ImpDepsGraph, IndirectDepsGraph, IndirectOptDepsGraph,
> -        TransOptOrder, DepsMap, !IO) :-
> +        TransOptDepsGraph, TransOptOrder, DepsMap, !IO) :-
>      generate_dependencies_write_d_file(Globals, Dep,
>          IntDepsGraph, ImpDepsGraph, IndirectDepsGraph, IndirectOptDepsGraph,
> -        TransOptOrder, DepsMap, !IO),
> +        TransOptDepsGraph, TransOptOrder, DepsMap, !IO),
>      generate_dependencies_write_d_files(Globals, Deps,
>          IntDepsGraph, ImpDepsGraph, IndirectDepsGraph, IndirectOptDepsGraph,
> -        TransOptOrder, DepsMap, !IO).
> +        TransOptDepsGraph, TransOptOrder, DepsMap, !IO).
>  
>  :- pred generate_dependencies_write_d_file(globals::in, deps::in,
>      deps_graph::in, deps_graph::in, deps_graph::in, deps_graph::in,
> -    list(module_name)::in, deps_map::in, io::di, io::uo) is det.
> +    deps_graph::in, list(module_name)::in, deps_map::in, io::di, io::uo)
> +    is det.
>  
>  generate_dependencies_write_d_file(Globals, Dep,
>          IntDepsGraph, ImpDepsGraph, IndirectDepsGraph, IndirectOptDepsGraph,
> -        TransOptOrder, _DepsMap, !IO) :-
> +        TransOptDepsGraph, FullTransOptOrder, _DepsMap, !IO) :-
>      % XXX The fact that _DepsMap is unused here may be a bug.
>      Dep = deps(_, BurdenedModule),
>      BurdenedModule = burdened_module(Baggage, ParseTreeModuleSrc),
> @@ -1284,24 +1342,29 @@ generate_dependencies_write_d_file(Globals, Dep,
>              IndirectDeps)
>      ),
>  
> -    IntermodDeps = intermod_deps(IntDeps, ImpDeps, IndirectDeps,
> -        IndirectOptDeps),
> +    get_dependencies_from_graph(TransOptDepsGraph, ModuleName, TransOptDeps0),
> +    set.delete(ModuleName, TransOptDeps0, TransOptDeps),
>  
> -    % Compute the trans-opt dependencies for this module. To avoid
> -    % the possibility of cycles, each module is only allowed to depend
> -    % on modules that occur later than it in the TransOptOrder.
> +    IntermodDeps = intermod_deps(IntDeps, ImpDeps, IndirectDeps,
> +        IndirectOptDeps, TransOptDeps),
> +
> +    % Compute the maximum allowable trans-opt dependencies for this module.
> +    % To avoid the possibility of cycles, each module is only allowed to depend
> +    % on modules that occur later than it in the FullTransOptOrder.

s/later than it in the/after it in/

>      FindModule =
>          ( pred(OtherModule::in) is semidet :-
>              ModuleName \= OtherModule
>          ),
> -    list.drop_while(FindModule, TransOptOrder, TransOptDeps0),
> -    ( if TransOptDeps0 = [_ | TransOptDeps1] then
> +    list.drop_while(FindModule, FullTransOptOrder, TailTransOptOrder),
> +    ( if TailTransOptOrder = [_ | TransOptOrderList] then
>          % The module was found in the list.
> -        TransOptDeps = TransOptDeps1
> +        set.list_to_set(TransOptOrderList, TransOptOrder)
>      else
> -        TransOptDeps = []
> +        set.init(TransOptOrder)
>      ),

Not a new issue, but I would s/FindModule/NotThisModule/

> -maybe_output_module_order(Globals, ModuleName, DepsOrdering, !IO) :-
> -    globals.lookup_bool_option(Globals, generate_module_order, Order),
> +output_module_order(Globals, ModuleName, Ext, DepsOrdering, !IO) :-
> +    module_name_to_file_name(Globals, $pred, do_create_dirs,
> +        ext_other(Ext), ModuleName, OrdFileName, !IO),
> +    get_progress_output_stream(Globals, ModuleName, ProgressStream, !IO),
> +    globals.lookup_bool_option(Globals, verbose, Verbose),
> +    string.format("%% Creating module order file `%s'...",
> +        [s(OrdFileName)], CreatingMsg),
> +    maybe_write_string(ProgressStream, Verbose, CreatingMsg, !IO),
> +    io.open_output(OrdFileName, OrdResult, !IO),
>      (
> -        Order = yes,
> -        module_name_to_file_name(Globals, $pred, do_create_dirs,
> -            ext_other(other_ext(".order")), ModuleName, OrdFileName, !IO),
> -        get_progress_output_stream(Globals, ModuleName, ProgressStream, !IO),
> -        globals.lookup_bool_option(Globals, verbose, Verbose),
> -        string.format("%% Creating module order file `%s'...",
> -            [s(OrdFileName)], CreatingMsg),
> -        maybe_write_string(ProgressStream, Verbose, CreatingMsg, !IO),
> -        io.open_output(OrdFileName, OrdResult, !IO),
> -        (
> -            OrdResult = ok(OrdStream),
> -            io.write_list(OrdStream, DepsOrdering, "\n\n",
> -                write_module_scc(OrdStream), !IO),
> -            io.close_output(OrdStream, !IO),
> -            maybe_write_string(ProgressStream, Verbose, " done.\n", !IO)
> -        ;
> -            OrdResult = error(IOError),
> -            maybe_write_string(ProgressStream, Verbose, " failed.\n", !IO),
> -            maybe_flush_output(ProgressStream, Verbose, !IO),
> -            get_error_output_stream(Globals, ModuleName, ErrorStream, !IO),
> -            io.error_message(IOError, IOErrorMessage),
> -            string.format("error opening file `%s' for output: %s",
> -                [s(OrdFileName), s(IOErrorMessage)], OrdMessage),
> -            report_error(ErrorStream, OrdMessage, !IO)
> -        )
> +        OrdResult = ok(OrdStream),
> +        io.write_list(OrdStream, DepsOrdering, "\n\n",
> +            write_module_scc(OrdStream), !IO),
> +        io.close_output(OrdStream, !IO),
> +        maybe_write_string(ProgressStream, Verbose, " done.\n", !IO)
>      ;
> -        Order = no
> +        OrdResult = error(IOError),
> +        maybe_write_string(ProgressStream, Verbose, " failed.\n", !IO),
> +        maybe_flush_output(ProgressStream, Verbose, !IO),
> +        get_error_output_stream(Globals, ModuleName, ErrorStream, !IO),
> +        io.error_message(IOError, IOErrorMessage),
> +        string.format("error opening file `%s' for output: %s",
> +            [s(OrdFileName), s(IOErrorMessage)], OrdMessage),
> +        report_error(ErrorStream, OrdMessage, !IO)
>      ).

This kind of change is easier to review with "git diff -b".

Apart from the issues above, the diff is fine.

Zoltan.


More information about the reviews mailing list