[m-rev.] for review: set optimization options just once

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun Oct 11 21:18:40 AEDT 2020


2020-10-11 21:14 GMT+11:00 "Zoltan Somogyi" <zoltan.somogyi at runbox.com>:
> 
> 2020-10-11 21:06 GMT+11:00 "Julien Fischer" <jfischer at opturion.com>:
>> On Sun, 11 Oct 2020, Julien Fischer wrote:
>> 
>>> On Sun, 11 Oct 2020, Zoltan Somogyi wrote:
>>>
>>>>  For review by Julien, since it is an attempt to fix a bug he reported,
>>>>  Mantis 522. The optimization improvement mentioned in the log message
>>>>  has bootchecked, but I haven't tried it out on this code yet.
>>>
>>> I will try it out after you commit.
>> 
>> It helps, but it's still not sufficient. 
> 
> Yes, I know, and I said so.
> 
>> Shifting some of the field
>> updates in a separate predicate (as in the diff below) avoids the limit.
>> I'll commit that unless there are any objections.
> 
> I object. That is a crude workaround, while the attached diff
> is a more elegant solution, though it needs to be in the installed
> compiler, not just in the workspace. I intend to commit this diff
> as soon as a final bootcheck is done.

The first send of this was blocked by gmail due to a "potential security
issue", so I am including the attachments in the text.

Zoltan.

Standardize arg vars in common struct optimization.

compiler/common.m:
    When processing a construction unification, replace each argument variable
    with the lowest numbered (and therefore earliest defined) variable in its
    equivalence class. The effect of this is that for source code such as

        !S ^ f1 = F1,
        !S ^ f2 = F2
     
    where S has four fields, while we used to generate HLDS like this:
     
        ( % removable barrier scope
            S0 = struct(_V11, V12, V13, V14),
            S1 = struct(  F1, V12, V13, V14)
        ),
        ( % removable barrier scope
            S1 = struct(V_21, _V22, V23, V24),
            S2 = struct(V_21,   F2, V13, V14)
        ),

    we now generate HLDS whose last goal is this:

            S2 = struct(  F1,   F2, V13, V14)

    By replacing V_21 with F1, we avoid keeping S1 alive, allowing
    other optimizations to delete its its construction, saving the allocation
    and filling in of a heap cell.

    To avoid increasing the number of variables that have to be saved
    on the stack, either across calls or at other stack flushing points,
    we choose the lowest numbered equivalent variable among only the variables
    that we have seen since the last stack flush. This requires recording
    which variables we have seen since the last stack flush.

    Rename common_info_clear_structs as common_info_stack_flush,
    since we now clear more than just a set of structs at each stack flush.

    Put a cheap but virtually-never-fails test after a more expensive test
    that usually fails.

compiler/simplify_goal.m:
    Conform to the predicate rename in common.m.

compiler/simplify_proc.m:
    Raise the threshold at which we turn off common struct optimization,
    to avoid disabling it on convert_options_to_globals.

compiler/simplify_goal_conj.m:
    Add a trace goal to help debug similar issues.

compiler/hlds_out_goal.m:
    Add a utility predicate for simplify_goal_conj.m.

diff --git a/compiler/common.m b/compiler/common.m
index fbb613aca..e6c8adfd9 100644
--- a/compiler/common.m
+++ b/compiler/common.m
@@ -112,9 +112,11 @@
 :- type common_info.
 :- func common_info_init = common_info.
 
-    % Clear the list of structs seen since the last stack flush.
+    % Handle the effects of a stack flush. Specifically, clear both
+    % the list of structs, and the set of variables, seen since
+    % the last stack flush.
     %
-:- pred common_info_clear_structs(common_info::in, common_info::out) is det.
+:- pred common_info_stack_flush(common_info::in, common_info::out) is det.
 
     % If we find a construction that constructs a cell identical to one we
     % have seen before, replace the construction with an assignment from the
@@ -181,6 +183,7 @@
 :- import_module map.
 :- import_module pair.
 :- import_module require.
+:- import_module set.
 :- import_module term.
 
 %---------------------------------------------------------------------------%
@@ -238,6 +241,7 @@
                 var_eqv                 :: eqvclass(prog_var),
                 all_structs             :: struct_map,
                 since_call_structs      :: struct_map,
+                since_call_vars         :: set_of_progvar,
                 seen_calls              :: seen_calls
             ).
 
@@ -297,11 +301,14 @@
 common_info_init = CommonInfo :-
     eqvclass.init(VarEqv0),
     map.init(StructMap0),
+    set_of_var.init(SinceCallVars0),
     map.init(SeenCalls0),
-    CommonInfo = common_info(VarEqv0, StructMap0, StructMap0, SeenCalls0).
+    CommonInfo = common_info(VarEqv0, StructMap0, StructMap0, SinceCallVars0,
+        SeenCalls0).
 
-common_info_clear_structs(!Info) :-
-    !Info ^ since_call_structs := map.init.
+common_info_stack_flush(!Info) :-
+    !Info ^ since_call_structs := map.init,
+    !Info ^ since_call_vars := set_of_var.init.
 
 %---------------------------------------------------------------------------%
 
@@ -414,7 +421,11 @@ common_optimise_unification(Unification0, UnifyMode, !GoalExpr, !GoalInfo,
         record_equivalence(Var1, Var2, !Common)
     ;
         Unification0 = complicated_unify(_, _, _)
-    ).
+    ),
+    NonLocals = goal_info_get_nonlocals(!.GoalInfo),
+    SinceCallVars0 = !.Common ^ since_call_vars,
+    set_of_var.union(NonLocals, SinceCallVars0, SinceCallVars),
+    !Common ^ since_call_vars := SinceCallVars.
 
 :- pred common_optimise_construct(prog_var::in, cons_id::in,
     list(prog_var)::in, mer_inst::in,
@@ -431,15 +442,18 @@ common_optimise_construct(Var, ConsId, ArgVars, LVarFinalInst,
         ArgVars, ArgVarIds, VarEqv0, VarEqv1),
     AllStructMap0 = !.Common ^ all_structs,
     ( if
+        map.search(AllStructMap0, TypeCtor, ConsIdMap0),
+        map.search(ConsIdMap0, ConsId, Structs),
+        find_matching_cell_construct(Structs, VarEqv1, ArgVarIds, OldStruct),
+
         % generate_assign assumes that the output variable is in the
         % instmap_delta, which will not be true if the variable is local
         % to the unification. The optimization is pointless in that case.
+        % This test is after find_matching_cell_construct, because that call
+        % is *much* more likely to fail than this test, even though it is
+        % also significantly more expensive.
         InstMapDelta = goal_info_get_instmap_delta(GoalInfo0),
-        instmap_delta_search_var(InstMapDelta, Var, _),
-
-        map.search(AllStructMap0, TypeCtor, ConsIdMap0),
-        map.search(ConsIdMap0, ConsId, Structs),
-        find_matching_cell_construct(Structs, VarEqv1, ArgVarIds, OldStruct)
+        instmap_delta_search_var(InstMapDelta, Var, _)
     then
         OldStruct = structure(OldVar, _),
         eqvclass.ensure_equivalence(Var, OldVar, VarEqv1, VarEqv),
@@ -461,12 +475,132 @@ common_optimise_construct(Var, ConsId, ArgVars, LVarFinalInst,
             simplify_info_incr_cost_delta(Cost, !Info)
         )
     else
+        common_standardize_and_record_construct(Var, TypeCtor, ConsId, ArgVars,
+            VarEqv1, GoalExpr0, GoalExpr, GoalInfo0, GoalInfo, !Common, !Info)
+    ).
+
+    % The purpose of this predicate is to short-circuit variable-to-variable
+    % equivalences in structure arguments.
+    %
+    % The kind of situation where this matters is a sequence of
+    % updates to various fields of a structure. Consider the code
+    %
+    %   !S ^ f1 = F1,
+    %   !S ^ f2 = F2
+    %
+    % where S has four fields. The compiler represents those two lines as
+    %
+    %   ( % removable barrier scope
+    %       S0 = struct(_V11, V12, V13, V14),
+    %       S1 = struct(  F1, V12, V13, V14)
+    %   ),
+    %   ( % removable barrier scope
+    %       S1 = struct(V_21, _V22, V23, V24),
+    %       S2 = struct(V_21,   F2, V13, V14)
+    %   ),
+    %
+    % The compiler knows that V_21 is equivalent to F1, since both
+    % occur in the same place, the first argument of S1. But as long as
+    % the first argument of S2 is recorded as V_21, the compiler will
+    % need to keep the goal that defines V_21, the deconstruction of S1,
+    % which means that it also needs to keep the *construction* of S1.
+    % This means that the compiler cannot optimize a sequence of field
+    % assignments into the single construction of a new cell with all
+    % the updated field values.
+    %
+    % We handle this by replacing each argument variable in a construction
+    % unification with the lowest-numbered (and therefore earliest-introduced)
+    % variable in its equivalence class (but see next paragraph). That means
+    % that we would make the first argument of S2 be F1, not V_21. And since
+    % we know that V23 and V24 are equivalent to V13 and V14 respectively
+    % (due to the appearance in the third and fourth slots of S1), the args
+    % from which we construct S2 would be F1, F2, V13 and V14.
+    %
+    % There is one qualification to the above. When we look for the lowest
+    % numbered variable in the argument variable's equivalence class,
+    % we confine our attention to the variables that we have seen since
+    % the last call. This is because reading the value of a variable
+    % that we last saw before a call will require the code generator
+    % to save the value of that variable on the stack, which has costs
+    % of its own. Between (a) saving the values of three fields in stack slots
+    % and later loading those values from their stack slots, and (b) saving
+    % just the cell variable on the stack, and later loading it from the stack
+    % and then reading the fields from the heap, (b) is almost certainly
+    % faster, since it does 1 store and 3 loads vs 3 stores and 3 loads.
+    % When reusing just one or two fields, the difference almost certainly
+    % going to be minor, and its direction (which approach is better) will
+    % probably depend on information we don't have right now. I (zs) think
+    % that not requiring extra variables to be stored in stack slots is
+    % probably the better approach overall.
+    %
+:- pred common_standardize_and_record_construct(prog_var::in, type_ctor::in,
+    cons_id::in, list(prog_var)::in, eqvclass(prog_var)::in,
+    hlds_goal_expr::in, hlds_goal_expr::out,
+    hlds_goal_info::in, hlds_goal_info::out,
+    common_info::in, common_info::out,
+    simplify_info::in, simplify_info::out) is det.
+
+common_standardize_and_record_construct(Var, TypeCtor, ConsId, ArgVars, VarEqv,
+        GoalExpr0, GoalExpr, GoalInfo0, GoalInfo, !Common, !Info) :-
+    SinceCallVars = !.Common ^ since_call_vars,
+    list.map(find_representative(SinceCallVars, VarEqv),
+        ArgVars, ArgRepnVars),
+    ( if
+        ArgRepnVars = ArgVars
+    then
         GoalExpr = GoalExpr0,
-        GoalInfo = GoalInfo0,
-        Struct = structure(Var, ArgVars),
-        record_cell_in_maps(TypeCtor, ConsId, Struct, VarEqv1, !Common)
+        GoalInfo = GoalInfo0
+    else if
+        GoalExpr0 = unify(Var, RHS0, UnifyMode, Unification0, Ctxt),
+        RHS0 = rhs_functor(ConsId, IsExistConstr, ArgVars),
+        Unification0 = construct(Var, ConsId, ArgVars, ArgModes, How,
+            Uniq, SubInfo)
+    then
+        Unification = construct(Var, ConsId, ArgRepnVars, ArgModes, How,
+            Uniq, SubInfo),
+        RHS = rhs_functor(ConsId, IsExistConstr, ArgRepnVars),
+        GoalExpr = unify(Var, RHS, UnifyMode, Unification, Ctxt),
+        set_of_var.list_to_set([Var | ArgRepnVars], NonLocals),
+        goal_info_set_nonlocals(NonLocals, GoalInfo0, GoalInfo),
+        !Common ^ var_eqv := VarEqv,
+        simplify_info_set_should_requantify(!Info)
+    else
+        unexpected($pred, "GoalExpr0 has unexpected shape")
+    ),
+    Struct = structure(Var, ArgRepnVars),
+    record_cell_in_maps(TypeCtor, ConsId, Struct, VarEqv, !Common).
+
+%---------------------%
+
+    % Given a variable, return the lowest numbered variable in its
+    % equivalence class that we have seen since the last stack flush.
+    % See the comment on common_standardize_and_record_construct
+    % for the reason why we do this.
+    % 
+:- pred find_representative(set_of_progvar::in,
+    eqvclass(prog_var)::in, prog_var::in, prog_var::out) is det.
+
+find_representative(SinceCallVars, VarEqv, Var, RepnVar) :-
+    EqvVarsSet = get_equivalent_elements(VarEqv, Var),
+    set.to_sorted_list(EqvVarsSet, EqvVars),
+    ( if find_representative_loop(SinceCallVars, EqvVars, RepnVarPrime) then
+        RepnVar = RepnVarPrime
+    else
+        RepnVar = Var
+    ).
+
+:- pred find_representative_loop(set_of_progvar::in, list(prog_var)::in,
+    prog_var::out) is semidet.
+
+find_representative_loop(SinceCallVars, [Var | Vars], RepnVar) :-
+    ( if set_of_var.contains(SinceCallVars, Var) then
+        RepnVar = Var
+    else
+        find_representative_loop(SinceCallVars, Vars, RepnVar)
     ).
 
+%---------------------------------------------------------------------------%
+
 :- pred common_optimise_deconstruct(prog_var::in, cons_id::in,
     list(prog_var)::in, list(unify_mode)::in, can_fail::in,
     hlds_goal_expr::in, hlds_goal_expr::out,
diff --git a/compiler/hlds_out_goal.m b/compiler/hlds_out_goal.m
index 67c4e90ec..fa432d74f 100644
--- a/compiler/hlds_out_goal.m
+++ b/compiler/hlds_out_goal.m
@@ -45,6 +45,12 @@
 :- pred dump_goal(module_info::in, prog_varset::in, hlds_goal::in,
     io::di, io::uo) is det.
 
+    % Print a goal in a way that is suitable for debugging the compiler
+    % (but necessarily for anything else), followed by a newline.
+    %
+:- pred dump_goal_nl(module_info::in, prog_varset::in, hlds_goal::in,
+    io::di, io::uo) is det.
+
     % Print out an HLDS goal. The module_info and prog_varset give
     % the context of the goal. The integer gives the level of indentation
     % to be used within the goal. The string says what should end the line
@@ -163,6 +169,10 @@ dump_goal(ModuleInfo, VarSet, Goal, !IO) :-
     do_write_goal(Info, ModuleInfo, VarSet, TypeQual, VarNamePrint,
         Indent, Follow, Goal, !IO).
 
+dump_goal_nl(ModuleInfo, VarSet, Goal, !IO) :-
+    dump_goal(ModuleInfo, VarSet, Goal, !IO),
+    io.nl(!IO).
+
 write_goal(Info, ModuleInfo, VarSet, VarNamePrint, Indent, Follow, Goal,
         !IO) :-
     % Don't type qualify everything.
diff --git a/compiler/simplify_goal.m b/compiler/simplify_goal.m
index 5d1ca9e5c..11538bf7e 100644
--- a/compiler/simplify_goal.m
+++ b/compiler/simplify_goal.m
@@ -273,10 +273,10 @@ simplify_goal(Goal0, Goal, NestedContext0, InstMap0, !Common, !Info) :-
         Goal3 = Goal2
     ),
     Goal3 = hlds_goal(GoalExpr3, GoalInfo3),
-    maybe_clear_common_structs(before, GoalExpr3, !.Info, !Common),
+    maybe_handle_stack_flush(before, GoalExpr3, !.Info, !Common),
     simplify_goal_expr(GoalExpr3, GoalExpr4, GoalInfo3, GoalInfo4,
         NestedContext, InstMap0, !Common, !Info),
-    maybe_clear_common_structs(after, GoalExpr4, !.Info, !Common),
+    maybe_handle_stack_flush(after, GoalExpr4, !.Info, !Common),
     enforce_unreachability_invariant(GoalInfo4, GoalInfo, !Info),
     Goal = hlds_goal(GoalExpr4, GoalInfo).
 
@@ -374,16 +374,19 @@ simplify_goal_expr(!GoalExpr, !GoalInfo, NestedContext0,
     % When doing deforestation, it may be better to remove
     % as many common structures as possible.
     %
-:- pred maybe_clear_common_structs(before_after::in, hlds_goal_expr::in,
+    % Clear the set of variables seen since the last stack flush,
+    % for the same reason.
+    %
+:- pred maybe_handle_stack_flush(before_after::in, hlds_goal_expr::in,
     simplify_info::in, common_info::in, common_info::out) is det.
 
-maybe_clear_common_structs(BeforeAfter, GoalExpr, Info, !Common) :-
+maybe_handle_stack_flush(BeforeAfter, GoalExpr, Info, !Common) :-
     ( if
         simplify_do_common_struct(Info),
         not simplify_do_extra_common_struct(Info),
         will_flush(GoalExpr, BeforeAfter) = yes
     then
-        common_info_clear_structs(!Common)
+        common_info_stack_flush(!Common)
     else
         true
     ).
diff --git a/compiler/simplify_goal_conj.m b/compiler/simplify_goal_conj.m
index ddc1b25da..0f397625a 100644
--- a/compiler/simplify_goal_conj.m
+++ b/compiler/simplify_goal_conj.m
@@ -45,6 +45,8 @@
 
 :- import_module check_hlds.simplify.simplify_goal.
 :- import_module hlds.goal_util.
+:- import_module hlds.hlds_out.
+:- import_module hlds.hlds_out.hlds_out_goal.
 :- import_module hlds.hlds_pred.
 :- import_module hlds.hlds_rtti.
 :- import_module hlds.make_goal.
@@ -105,6 +107,18 @@ simplify_goal_plain_conj(Goals0, GoalExpr, GoalInfo0, GoalInfo,
             GoalExpr = conj(plain_conj, Goals)
         ),
         GoalInfo = GoalInfo0
+    ),
+    trace [compile_time(flag("debug_simplify_conj")), io(!IO)] (
+        simplify_info_get_module_info(!.Info, ModuleInfo),
+        simplify_info_get_varset(!.Info, VarSet),
+        io.write_string("\n------------------------\n", !IO),
+        io.write_string("\nBEFORE SIMPLIFY_GOAL_PLAIN_CONJ\n\n", !IO),
+        list.foldl(dump_goal_nl(ModuleInfo, VarSet), Goals0, !IO),
+        io.write_string("\nAFTER EXCESS ASSIGN\n\n", !IO),
+        list.foldl(dump_goal_nl(ModuleInfo, VarSet), Goals1, !IO),
+        io.write_string("\nAFTER SIMPLIFY_CONJ\n\n", !IO),
+        list.foldl(dump_goal_nl(ModuleInfo, VarSet), Goals, !IO),
+        io.flush_output(!IO)
     ).
 
 :- pred contains_multisoln_goal(list(hlds_goal)::in) is semidet.
diff --git a/compiler/simplify_proc.m b/compiler/simplify_proc.m
index 93f6b3205..5ab3ad164 100644
--- a/compiler/simplify_proc.m
+++ b/compiler/simplify_proc.m
@@ -331,10 +331,6 @@ simplify_proc_maybe_vary_parameters(ModuleInfo, PredId, ProcInfo,
     proc_info_get_vartypes(ProcInfo, VarTypes0),
     vartypes_count(VarTypes0, NumVars),
     ( if NumVars > turn_off_common_struct_threshold then
-        % If we have too many variables, common_struct takes so long that
-        % either the compiler runs out of memory or the user runs out of
-        % patience. The fact that we would generate better code if the
-        % compilation finished is therefore of limited interest.
         !SimplifyTasks ^ do_common_struct := do_not_opt_common_structs
     else
         true
@@ -361,9 +357,26 @@ simplify_proc_maybe_vary_parameters(ModuleInfo, PredId, ProcInfo,
         )
     ).
 
+    % If we have too many variables, common_struct used to take so long that
+    % either the compiler runs out of memory, or the user runs out of patience.
+    % In such cases, the fact that we would generate better code if the
+    % compilation finished is therefore of limited interest.
+    %
+    % However, since this limit was first imposed, we have optimized
+    % the compiler's infrastructure for such things, e.g. by using much more
+    % compact representations for sets of variables, which permit much faster
+    % operations on them. These changes do not eliminate the danger described
+    % above completely, but they do raise the threshold at which they can
+    % appear.
+    %
+    % As of 2020 october 11, the code of convert_options_to_globals in
+    % handle_options.m has just shy of 9,000 variables at the time of the
+    % first simplify pass (HLDS dump stage 65). The compiler can handle that
+    % easily, so the setting below allows for some growth.
+    % 
 :- func turn_off_common_struct_threshold = int.
 
-turn_off_common_struct_threshold = 1000.
+turn_off_common_struct_threshold = 12000.
 
 :- pred simplify_proc_maybe_mark_modecheck_clauses(
     proc_info::in, proc_info::out) is det.


More information about the reviews mailing list