[m-rev.] for review: optimizing mutually recursive tail calls in MLDS

Julien Fischer jfischer at opturion.com
Sat Sep 2 23:46:26 AEST 2017


Hi Zoltan,

On Tue, 29 Aug 2017, Zoltan Somogyi wrote:

> The logical reviewer for this diff would be Paul, but I don't think
> I can ask him, so it is for review by anyone.
>
> The diff passes bootcheck on my laptop, but I would like to ask Julien,
> whether or not he reviews the change, to test it out in the Java and C#
> grades.

I have done so and not encountered any problems with this diff.

...

> Implement mutual tail call optimization for the MLDS.
> 
> It does not (yet) optimize all the calls that the LLDS backend optimizes,
> with the most significant limitation being that it handles only calls to
> procedures that return all their arguments by reference. This rules out
> det functions (thought it includes *semidet* functions), as well as the
> backends that use copy-out. Lifting those limitations is future work.

...

> diff --git a/compiler/ml_args_util.m b/compiler/ml_args_util.m
> index eeb8600..7645dd2 100644
> --- a/compiler/ml_args_util.m
> +++ b/compiler/ml_args_util.m
> @@ -32,6 +32,7 @@
>
>  :- import_module bool.
>  :- import_module list.
> +:- import_module map.
>
>  %---------------------------------------------------------------------------%
>  %
> @@ -84,6 +85,72 @@
>      list(mer_mode)::in, pred_or_func::in, code_model::in,
>      mlds_func_params::out, ml_gen_info::in, ml_gen_info::out) is det.
> 
> +    % ml_gen_tscc_arg_decls(ModuleInfo, Vars, Types, Modes, VarSet, Context,
> +    %     ProcIdInTscc, NextInArgNum0, NextOutArgNum0, NumOutputs,
> +    %     !OutArgNames, TsccInArgs, TsccInLocalVarDefns, TsccArgs,
> +    %     OwnLocalVarDefns, CopyTsccToOwnStmts) :-

s/:-/:/ at the end there.

> +    % Given a procedure in a TSCC and information about its argument list
> +    % in the form of the arguments' variable names (Vars), types (Types),
> +    % and modes (Modes), return the information the code generator needs
> +    %
> +    % - to generate code for tail calls to this procedure, from other
> +    %   procedures in the TSCC as well from itself, and
> +    % - to generate parameter passing code in the procedure's prologue.
> +    %
> +    % The former is needed before we can start generating code for *any*
> +    % of the procedures in the TSCC.
> +    % 
> +    % Parameter passing between the procedures in a TSCC is done by
> +    % giving each procedure its own list of lvn_tscc_proc_input_vars
> +    % (representing the formal input parameters) and having each tail call
> +    % assign the actual input parameters to these. Results are returned
> +    % through lvn_tscc_output_vars, each of which contains the address
> +    % of the memory slot where the result should be stored.
> +    % Both lvn_tscc_proc_input_vars and lvn_tscc_output_vars include
> +    % an argument sequence number, with separate sequences for input and
> +    % output arguments. These sequences start at 1, so all top level calls
> +    % to ml_gen_tscc_arg_decls should pass 1 as both NextInArgNum0 and
> +    % NextOutArgNum0. ml_gen_tscc_arg_decls returns the total number of
> +    % output arguments, to allow our caller to do a sanity check (requiring
> +    % all procedures in a TSCC to have the same number of output arguments).
> +    %
> +    % ml_gen_tscc_arg_decls will record the names of the output arguments
> +    % in !OutArgNames, if the initial value of !.OutArgNames contains no
> +    % such record.
> +    %
> +    % The value returned in TsccInArgs will be a list of the
> +    % lvn_tscc_proc_input_vars of the input arguments only. This is used
> +    % during the generation of code for tailcalls.
> +    %
> +    % The value returned in TsccInLocalVarDefns will be a local var definition
> +    % of each variable in TsccInArgs. This is used to define those variables
> +    % in copies of the TSCC code that are *not* for this procedure.
> +    %
> +    % The value returned in TsccArgs a list of all the arguments of the

*is* a list of

> +    % procedure, both input and output, in declaration order. This will be
> +    % the argument list of the copy of the TSCC for this procedure,
> +    % and thus it will define the lvn_tscc_proc_input_vars in that copy.
> +    % (Thus the lvn_tscc_proc_input_vars will be defined by exactly one of
> +    % TsccInLocalVarDefns and TsccInArgs in each copy.)
> +    %
> +    % Each procedure's part in the TSCC defines local MLDS variables for each
> +    % argument, both input and output (returned in OwnLocalVarDefns), and
> +    % its code starts by copying the values of each lvn_tscc_proc_input_var
> +    % *and* lvn_tscc_output_var to the local var (from OwnLocalVarDefns)
> +    % corresponding to them. For input argument, this copies the value;
> +    % for output arguments, this copies the address where the argument's value
> +    % should be put. These assignments are returned in CopyTsccToOwnStmts.
> +    %

...

> diff --git a/compiler/ml_gen_info.m b/compiler/ml_gen_info.m
> index 1fbf7dc..a20c7cb 100644
> --- a/compiler/ml_gen_info.m
> +++ b/compiler/ml_gen_info.m

...

>  %---------------------------------------------------------------------------%
>  %
> -% Initialize the ml_gen_info.
> +% Initialize the ml_gen_info, which contains the state of the code generator
> +% while it generates MLDS code for a HLDS procedure.
> +%
> +% When the HLDS procedure is part of a TSCC of several mutually-tail-recursive
> +% procedures, we bundle we generate for each procedure into a single piece of

  ... we bundle *the code* we generate ...

> +% code for *each* entry procedure of the TSCC. To help prevent accidental name
> +% collisions between the compiler-generated local variables of the different
> +% procedures, we get those procedures to use non-overlapping sets of sequence
> +% numbers in the names of those compiler-generated variables, by
>  %
> +% - creating an initial set of counters (the sources of those sequence numbers)
> +%   before starting to generate code for a TSCC, and
> +% - threading the values of those counters from one procedure to the next,
> +%   by calling ml_gen_info_final to pick up the values of those counters
> +%   after code generation is finished for one procedure, and passing them
> +%   to ml_gen_info_init when starting to generate code for the next procedure.
> +%
> +% The ml_gen_tscc_info structure contains not just these counters, but also
> +% information about tail recursion. By definition, the procedures in a TSCC
> +% contain tail recursive calls to the same set of procedures (the procedures
> +% of the TSCC), but TSCCs are computed from calls in the HLDS. A call that is
> +% a tail call in the HLDS is sometimes not a tail call in the MLDS, which can
> +% happen if making the call a tail call would leave a dangling reference.
> +% We therefore need to record whether each procedure in the TSCC has an
> +% *MLDS* tail call generated to it from any of the *other* procedures
> +% in the TSCC, because if it doesn't, then its MLDS code isn't actually
> +% mutually-tail-recursive, and it should be handled as such. (This is because
> +% the code we generate for only *self*-tail-recursive procedures has lower
> +% overhead than the code we generate for *mutually*-tail-recursive procedures.)

...


> diff --git a/compiler/ml_proc_gen.m b/compiler/ml_proc_gen.m
> index 950913c..609b7b6 100644
> --- a/compiler/ml_proc_gen.m
> +++ b/compiler/ml_proc_gen.m

...

>  :- func project_pred_proc_id_info_id(pred_proc_id_info) = pred_proc_id.
> 
> -project_pred_proc_id_info_id(pred_proc_id_info(PredProcId, _, _)) = PredProcId.
> +project_pred_proc_id_info_id(PredProcIdInfo) = PredProcId :-
> +    PredProcIdInfo = pred_proc_id_info(PredProcId, _, _, _).
> 
> +    % Partition the procedures in an SCC into following four categories.

    into *the* following four categories.

> +    %
> +    % - Those that don't contain any tail calls (NoneIdInfos).
> +    % - Those that in which we can only optimize self recursive tail calls,
> +    %   either because they don't contain any mutually tail recursive calls,
> +    %   or because we can't (yet) optimize the mutually tail recursive calls
> +    %   they do contain (SelfIdInfos).
> +    % - Those model_det procedures in which we can optimize mutually tail
> +    %   recursive calls (MutualDetIdInfos).
> +    % - Those model_semi procedures in which we can optimize mutually tail
> +    %   recursive calls (MutualSemiIdInfos).
> +    %
>  :- pred partition_scc_procs(module_info::in, list(pred_proc_id)::in,
>      list(pred_proc_id_info)::out, list(pred_proc_id_info)::out,
> -    list(pred_proc_id_info)::out) is det.
> +    list(pred_proc_id_info)::out, list(pred_proc_id_info)::out) is det.

...

> @@ -550,26 +600,832 @@ construct_func_defn(ModuleInfo, PredProcIdInfo, FuncParams, FuncBody,

...

> +ml_gen_tscc(ModuleInfo, Target, ConstStructMap, _SCCEntryPredProcIds,
> +        TsccCodeModel, TSCCE, !FuncDefns, !GlobalData, !Specs) :-
> +    TSCCE = scc_with_entry_points(PredProcIds, _CalledFromHigherTSCCs,
> +        _ExportedTSCCPredProcIds),
> +    PredProcIdList = set.to_sorted_list(PredProcIds),
> +    (
> +        PredProcIdList = [],
> +        unexpected($pred, "empty TSCC")
> +    ;
> +        PredProcIdList = [SinglePredProcId],
> +        % For a TSCC containing just one procedure, we neither need nor want
> +        % the extra overhead required for managing *mutual* tail recursion.
> +        ml_gen_proc_lookup(ModuleInfo, Target, ConstStructMap, self_tail_rec,
> +            SinglePredProcId, !FuncDefns, !GlobalData, !Specs)
> +    ;
> +        PredProcIdList = [_, _ | _],
> +        % Try to compile each procedure in the TSCC into the MLDS code
> +        % fragment that will go under "top_of_proc_i" for each i.
> +        %
> +        % Caveat 1:
> +        % By definition, every procedure in the TSCC has at least one
> +        % mutually-tail-recursive call to it from another procedure
> +        % in the TSCC. However, the TSCC is computed from the HLDS.
> +        % Some calls that look like tail calls in the HLDS turn out
> +        % *not* to be tail calls in the MLDS, because they pass as input
> +        % arguments the addresses of local variables in the caller's stack
> +        % frame, addresses that would become dangling references if the
> +        % call were made a tail call. If it turns out that *none* of the
> +        % mutually-tail-recursive calls in the HLDS to a procedure
> +        % turn out to be mutually-tail-recursive calls in the MLDS,
> +        % then including that procedure in the scheme we use above
> +        % would be a waste; it would just incur overhead for no gain
> +        % in either stack usage or speed. ml_gen_tscc_trial will return
> +        % the identities of such procedures in NoMutualPredProcIds.
> +        % All the procedures in MutualPredProcIds will have mutually-tail-
> +        % recursive calls to them in the *M*LDS. (Therefore MutualPredProcIds
> +        % may have 0 elements, or 2, or 3, or any other number except 1.)
> +        %
> +        % Caveat 2:
> +        % If the MLDS code we generated for any of MutualPredProcIds contained
> +        % any function definitions nested inside of them, then we have problem.

*a* problem

> +        % The scheme shown above would create M copies of each such nested
> +        % function definition, with M > 1. In the usual case of us generating
> +        % code non-nested code, ml_elim_nested.m would hoist out such nested

Delete the first "code" there.

> +        % definitions M times, leading to the M copies of the hoisted-out
> +        % definitions. Therefore, until we teach ml_elim_nested.m not to do
> +        % that, we abandon the optimization of mutually recursive tail calls
> +        % in the presence of such nested function definitions.

Julien.


More information about the reviews mailing list