[m-rev.] For review: --inline-linear-tail-rec-sccs

Paul Bone paul at bone.id.au
Thu Mar 23 12:13:33 AEDT 2017
Previous message: [m-rev.] For review: --inline-linear-tail-rec-sccs
Next message: [m-rev.] For review: --inline-linear-tail-rec-sccs
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Mar 16, 2017 at 09:30:18AM +1100, Zoltan Somogyi wrote:
> For review by anyone.
> 
> This is something we discussed earlier this year.
> 
> I am particularly interested in whether we should allow
> a procedure in the SCC to have *more* than one tail recursive
> call (to the same callee) despite the code size implications.
> (This would obviously require a new option name, and updated
> documentation.)

As with most optimisations, I think a mix of special cases (like this one)
and general approaches is best.  It's good to have this special (linear)
case handled, but I guess that it would be best to handle all the other
cases with a general "should we inline things in this SCC?" decision.
However such a general solution would need to be revised (maybe turned off)
once the MLDS can do mutual tail recursion since the MLDS tail recursion
won't increase the code size.

A general inlining solution could also consider each call site individually.
It might not eliminate mutual recursion, but it might reduce its impact and
make it easier for the MLDS backend to deal with.

One other special case I can think of is when you have linear recursion, but
not all the recursive calls are tail calls.

p :-
    ...,
    q.   % Tail call

q :-
    ...,
    r,   % Non-tail call
    ....

r :-
    ...,
    p.   % Tail call.

This can be inlined in the same way as you have done.  Except that r should
be inlined into q, rather than being made the self-tail call.  If necessary
(if r is the entry into the SCC), then it could be duplicated, one copy made
to enter the SCC, and other inlined within q.

If there are multiple non-tail calls, but the recursion is otherwise linear,
this may still be worthwhile, it won't result in constant stack usage but
hopefully it will be more efficient.

Now I'll review your diff:

> compiler/mark_tail_calls.m:
>     Generalize the code to look either for
>     
>     - just self tail recursive calls, as at the moment, as needed by both
>       the LLDS and the MLDS code generator (for different purposes), or for
>     - both self and mutual tail recursive calls, as needed by the new
>       kind of inlining.
> 
>     Give the top level predicates names that indicate their overall purpose.
> 
>     Change the representation of the at_tail type to separate out the
>     flag that says whether we have or haven't seen a recursive call
>     in the backward traversal so far. This yields cleaner code
>     in several places.
> 
>     Store the list of generated error_specs, and the flag that says whether
>     the traversal has found any recursive call, in the mark_tail_calls_info
>     structure, instead of returning them as separate output arguments.
>     This is better, since *most* places don't add error_specs or set the flag.

Ah, I see here is where our code would conflict where we duplicated effort.

> compiler/ml_tailcall.m:
>     ZZZ
> 

;-)  I guess you forgot to fill this in.

> diff --git a/compiler/hlds_dependency_graph.m b/compiler/hlds_dependency_graph.m
> index 20cd5ba..53a9e53 100644
> --- a/compiler/hlds_dependency_graph.m
> +++ b/compiler/hlds_dependency_graph.m
> @@ -66,19 +66,43 @@
>  :- pred module_info_rebuild_dependency_info(module_info::in, module_info::out,
>      hlds_dependency_info::out) is det.
>  
> +%---------------------%
> +
>  :- type include_imported
>      --->    include_imported
>      ;       do_not_include_imported.
>  
> -    % Build the dependency graph of predicates.
> +    % Should the dependency graph include an edge from p to q
> +    % only if p calls q in a tail call (only_tail_calls calls for this),
> +    % or if p calls q in any call, and if p references q in a unification
> +    % (all_calls_and_unifies).
> +    %
> +    % Note that only_tail_calls requires recursive calls to be marked by
> +    % mark_tail_calls.m, and mark_tail_calls.m requires a previously built
> +    % dependency graph, which therefore must have been built with
> +    % all_calls_and_unifies.

So I want to check that I understand.

    You build the dependency info with all_calls_and_unifies,
    Then mark tail calls,
    Then build a new dependency info with only_tail_calls,
    Then do inlining, which uses these markers to perform the new
    optimisation.
    Then mark tail calls again for the LLDS potentially generating warnings.

The problem I want to avoid, and it looks like you have, is that warnings
for mutualy recursive calls that are not tail calls (generated by
mark_tail_calls) could be incorrect if generated before inlining, or on dead
versions of the SCCs that have not yet been eliminated.

I'm asking, not because I think your change is incorrect, but so that I
understand.

> diff --git a/compiler/inlining.m b/compiler/inlining.m
> index 72db5d9..325784e 100644
> --- a/compiler/inlining.m
> +++ b/compiler/inlining.m
> @@ -237,84 +251,247 @@ inlining(!ModuleInfo) :-
>      else
>          map.init(NeededMap)
>      ),
> +    Params = inline_params(Simple, SingleUse, LinearTailRec,
> +        CallCost, CompoundThreshold, SimpleThreshold, VarThreshold,
> +        HighLevelCode, AnyTracing, NeededMap),
>  
> -    % Build the call graph and extract the topological sort.
> -    % NOTE: the topological sort returns a list of SCCs. Clearly, we want to
> -    % process the SCCs bottom to top (which is the order that they are
> -    % returned), but it is not easy to guess the best way to flatten each SCC
> -    % to achieve the best result. The current implementation just uses the
> -    % ordering of the list returned by the topological sort. A more
> -    % sophisticated approach would be to break the cycle so that
> -    % the procedure(s) that are called by higher SCCs are processed last,
> -    % but we do not implement that yet.
> -
> -    module_info_ensure_dependency_info(!ModuleInfo, DepInfo),
> -    PredProcs = dependency_info_get_condensed_bottom_up_sccs(DepInfo),
> -    set.init(InlinedProcs0),
> -    do_inlining(PredProcs, NeededMap, Params, InlinedProcs0, !ModuleInfo),
> +    % Build the call graph and extract the list of SCC. We process
> +    % SCCs bottom up, so that if a caller wants to inline a callee
> +    % in a lower SCC, it gets the *already optimized* version of the callee.
> +    % If LinearTailRec = no, we don't try to do anything special about
> +    % calls where the callee is in the *same* SCC as the caller.

List of SCCs

> +
> +    (
> +        LinearTailRec = no,
> +        module_info_ensure_dependency_info(!ModuleInfo, DepInfo)
> +    ;
> +        LinearTailRec = yes,
> +        % For this, we need *accurate* information about SCCs.
> +        % I (zs) am not 100% certain that every pass before this one
> +        % that invalidates any existing dependency info also clobbers
> +        % that dependency info.
> +        module_info_rebuild_dependency_info(!ModuleInfo, DepInfo),
> +        mark_self_and_mutual_tail_rec_calls_in_module(DepInfo, !ModuleInfo)
> +    ),

I see the marking phase is called here.

> +
> +:- pred inline_in_maybe_linear_tail_rec_scc(inline_params::in,
> +    scc_with_entry_points::in, list(pred_proc_id)::in,
> +    set(pred_proc_id)::in, set(pred_proc_id)::out,
> +    module_info::in, module_info::out) is det.
> +
> +inline_in_maybe_linear_tail_rec_scc(Params, SCCEntryPoints, SCCProcs,
> +        !ShouldInlineProcs, !ModuleInfo) :-
> +    SCCEntryPoints = scc_with_entry_points(SCC,
> +        CalledFromHigherSCCs, Exported),
> +    TSCCDepInfo =
> +        build_proc_dependency_graph(!.ModuleInfo, SCC, only_tail_calls),
> +    get_bottom_up_sccs_with_entry_points(!.ModuleInfo, TSCCDepInfo,
> +        TSCCsEntries),
> +    ( if
> +        % If every procedure in the SCC is reachable from every other
> +        % via tail recursive calls, and ...
> +        TSCCsEntries = [_],
> +        % ... the number of tail recursive calls in the SCC is equal
> +        % to the number of procedures in the SCC, ...
> +        TSCCArcs = dependency_info_get_arcs(TSCCDepInfo),
> +        list.length(TSCCArcs, NumTSCCArcs),
> +        list.length(SCCProcs, NumSCCProcs),
> +        NumTSCCArcs = NumSCCProcs

What happens if you have:

    p :-
        q.
    q :-
        (
            r
        ;
            r
        ).
    r :-
        p,  % non-tail call.
        ....

Now there are three procedures, and three recursive tail calls.  But we
probably dont want to inline r twice.  Particularly if r is or becomes large
due to other inlining.

I also wonder how useful it would be to relax this.  If only some of the
tail calls are mutual calls.

    p :-
        q.
    q :-
        (
            r
        ;
            r,  % non-tail call.
            ...
        ).
    r :-
        p.

Inlining r into the tail recursive call site in q.  and q into p will result
in constant stack space _if_ the first branch is executed.  If the first
branch _is_ executed more often then this could be a good optimisation, if
it isn't I don't think it would be a sagnificant pesimisation.  I guess the
real question is would it be an optimisation enough of the time ti make
writing the optimisation worth-while.

> diff --git a/compiler/mark_tail_calls.m b/compiler/mark_tail_calls.m
> index 40ba7cc..36d025f 100644
> --- a/compiler/mark_tail_calls.m
> +++ b/compiler/mark_tail_calls.m
> @@ -1,10 +1,10 @@
> -%-----------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
>  % vim: ft=mercury ts=4 sw=4 et
> -%-----------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
>  % Copyright (C) 2001-2008, 2010-2012 The University of Melbourne.
>  % This file may only be copied under the terms of the GNU General
>  % Public License - see the file COPYING in the Mercury distribution.
> -%-----------------------------------------------------------------------------%

You should update the copyright information on each of the files you touch.
Copyright (C) 2017 The Mercury Team

> @@ -49,42 +51,59 @@
>  
>  :- import_module list.
>  
> -:- pred mark_tail_calls_in_pred(pred_id::in,
> -    module_info::in, module_info::out, pred_info::in, pred_info::out,
> -    list(error_spec)::in, list(error_spec)::out) is det.
> +%---------------------------------------------------------------------------%
>  
> -    % This pass is only required when either
> -    %
> -    % - tail recursion warnings are enabled (via the command line option
> -    %   or the require_tail_recursion pragma), or
> -    % - a debug grade is used.
> +    % Mark both self and mutual tail recursive calls in the module.
>      %
> -    % In all other situations the HLDS traversal can be skipped and
> -    % mark_tail_calls_in_proc will return not_changed to allow the caller
> -    % to avoid updating the proc table.
> +    % Unlike the predicates below serving the LLDS code generator,
> +    % this predicate never generates any error messages, and it never
> +    % restricts it attention to only *self* tail recursive calls.
>      %

Ah.  I see how you can generate warnings after inlining.

> diff --git a/doc/user_guide.texi b/doc/user_guide.texi
> index bb94558..e990a8d 100644
> --- a/doc/user_guide.texi
> +++ b/doc/user_guide.texi
> @@ -8770,6 +8770,14 @@ containing more than @var{threshold} variables. Procedures
>  containing large numbers of variables can cause
>  slow compilation.
>  
> + at item --inline-linear-tail-rec-sccs @var{threshold}
> + at findex --inline-linear-tail-rec-sccs
> +Given a set of mutually recursive procedures (an SCC, or strongly
> +connected component, of the call graph) in which each procedure
> +contains exactly one call to a procedure in the SCC and
> +that call is a tail call, inline the callee at every one of those
> +call sites.
> +

In this and maybe the --help text.  It'd be good to tell users what this
means to them.  Eg: it reduces mutual recursion to self tail recursion,
which the MLDS can optimize directly into a loop.


There's only a couple of things I'd like changed before you commit it.  But
I'd also like to know that the "there are the same number of tail recursive
calls as procedures" check is sufficient, or there's other checks elsewhere
that I missed.

Thanks, once you've committed I'll merge/re-write my change based on this
(your review was first).


-- 
Paul Bone
http://paul.bone.id.au
Previous message: [m-rev.] For review: --inline-linear-tail-rec-sccs
Next message: [m-rev.] For review: --inline-linear-tail-rec-sccs
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list