[m-rev.] for post-commit review: Fix a number of dependent parallel conjunction and loop control bugs.

Paul Bone pbone at csse.unimelb.edu.au
Fri Oct 21 11:15:58 AEDT 2011


For post-commit review by Zoltan,

There are probably nicer ways to fix some problems, for example my live_vars.m
change.  But I'm not worried about that until after the paper deadline.

---

Fix a number of bugs in dependent parallel conjunctions and loop control.

Don't transform left-recursive parallel loops into right-recursive loops when
loop control is enabled.

compiler/par_conj_gen.m:
    If a loop control scope instantiates a non-local variable that is not
    protected by a future ensure that it has a stack slot allocated, and that
    the code_info state (used by the code generator) knows where this variable
    will be on the stack so that once it is needed it can be used.
    An example of this is the variable Y in list.map.

    Use a correctly-sized (but not compressed) stack frame for the spawned off
    code.

    Fix a silly typeo that prevented get_future goals from being added when
    they where needed.

    Tidy up some code.

compiler/code_util.m:
compiler/opt_util.m:
    Move instr_get_rvals_and_lvals from opt_util.m to code_util.m and export it
    so that it can be used by par_conj_gen.  Modify this predicate so that it
    stores rvals and lvals in sets, hopefully reducing the number of rvals and
    lvals that need to be represented.

compiler/dep_par_conj.m:
    Because left recursion has historically been faster than right recursion,
    we used to detect when it was possible to transform right into left
    recursion (by swapping the conjuncts in a parallel conjunction).  Now that
    loop control is effective we disable this hack when loop control is
    enabled.

    This change made it easy for me to test non tail-recursive loop control
    cases like list.map

compiler/live_vars.m:
    When a parallel conjunction is transformed into a loop control scope and
    then liveness analysis is applied it mis-calculates the death of variables
    that die in the loop control scope.

    map_foldl(M, F, [X | Xs], !Acc) :-
        spawn_off(
            M(X, Y),
            F(Y, !Acc)
        ),
        ...
        some_other_code_that_needs_stack_slots,
        ...
        map_foldl(M, F, Xs, !Acc).

    X dies after the call to M (post death), Similarly Y dies after the call to
    F, depending on dep_par_conj.m Y may also need a stack slot.  However,
    since M is executing in parallel with
    some_other_code_that_needs_stack_slots, the stack slots used by X and Y are
    still being used by the spawned off code.  And they may continue to be used
    up to the recursive call.  It is okay if they die within the spawned off
    scope, but according to the some_other_code_that_needs_stack_slots, they
    need to be alive.  so that they have correctly allocated stack slots.  They
    may then die after the recursive call (at which point the barrier for the
    spawned off code will have completed).

    live_vars.m therefore tracks these variables and delays their death (after
    resurrecting them at the end of the scope) until the recursive call.

    Part of this should probably be handled in liveness.m, however the loop
    control scope will still need to resurrect these variables.  Additionally
    there is already code in live_vars to recognize loop control scopes and
    their implicit barriers.

Index: compiler/code_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/code_util.m,v
retrieving revision 1.191
diff -u -p -b -r1.191 code_util.m
--- compiler/code_util.m	17 Oct 2011 04:31:28 -0000	1.191
+++ compiler/code_util.m	21 Oct 2011 00:11:11 -0000
@@ -29,6 +29,7 @@
 :- import_module list.
 :- import_module maybe.
 :- import_module pair.
+:- import_module set.
 
 %-----------------------------------------------------------------------------%
 
@@ -89,6 +90,14 @@
 
 :- func size_of_cell_args(list(cell_arg)) = int.
 
+    % Determine all the rvals and lvals referenced by an instruction.
+    %
+:- pred instr_rvals_and_lvals(instr::in, set(rval)::out, set(lval)::out)
+    is det.
+
+:- pred instrs_rvals_and_lvals(list(instruction)::in, set(rval)::out,
+    set(lval)::out) is det.
+
 %---------------------------------------------------------------------------%
 %---------------------------------------------------------------------------%
 
@@ -446,5 +455,169 @@ size_of_cell_args([CellArg | CellArgs]) 
     Sizes = size_of_cell_args(CellArgs).
 
 %-----------------------------------------------------------------------------%
+
+instr_rvals_and_lvals(comment(_), set.init, set.init).
+instr_rvals_and_lvals(livevals(_), set.init, set.init).
+instr_rvals_and_lvals(block(_, _, Instrs), Rvals, Lvals) :-
+    instrs_rvals_and_lvals(Instrs, Rvals, Lvals).
+instr_rvals_and_lvals(assign(Lval,Rval), make_singleton_set(Rval),
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(keep_assign(Lval,Rval), make_singleton_set(Rval),
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(llcall(_, _, _, _, _, _), set.init, set.init).
+instr_rvals_and_lvals(mkframe(_, _), set.init, set.init).
+instr_rvals_and_lvals(label(_), set.init, set.init).
+instr_rvals_and_lvals(goto(_), set.init, set.init).
+instr_rvals_and_lvals(computed_goto(Rval, _), make_singleton_set(Rval),
+    set.init).
+instr_rvals_and_lvals(arbitrary_c_code(_, _, _), set.init, set.init).
+instr_rvals_and_lvals(if_val(Rval, _), make_singleton_set(Rval), set.init).
+instr_rvals_and_lvals(save_maxfr(Lval), set.init, make_singleton_set(Lval)).
+instr_rvals_and_lvals(restore_maxfr(Lval), set.init, make_singleton_set(Lval)).
+instr_rvals_and_lvals(incr_hp(Lval, _, _, SizeRval, _, _, MaybeRegionRval,
+        MaybeReuse), Rvals, Lvals) :-
+    some [!Rvals, !Lvals] (
+        !:Rvals = make_singleton_set(SizeRval),
+        !:Lvals = make_singleton_set(Lval),
+        (
+            MaybeRegionRval = yes(RegionRval),
+            set.insert(RegionRval, !Rvals)
+        ;
+            MaybeRegionRval = no
+        ),
+        (
+            MaybeReuse = llds_reuse(ReuseRval, MaybeFlagLval),
+            set.insert(ReuseRval, !Rvals),
+            (
+                MaybeFlagLval = yes(FlagLval),
+                set.insert(FlagLval, !Lvals)
+            ;
+                MaybeFlagLval = no
+            )
+        ;
+            MaybeReuse = no_llds_reuse
+        ),
+        Rvals = !.Rvals,
+        Lvals = !.Lvals
+    ).
+instr_rvals_and_lvals(mark_hp(Lval), set.init, make_singleton_set(Lval)).
+instr_rvals_and_lvals(restore_hp(Rval), make_singleton_set(Rval), set.init).
+instr_rvals_and_lvals(free_heap(Rval), make_singleton_set(Rval), set.init).
+    % The region instructions implicitly specify some stackvars or framevars,
+    % but they cannot reference lvals or rvals that involve code addresses or
+    % labels, and that is the motivation of the reason this code was originally
+    % written.
+    % More recently code generation for loop_control scopes uses this
+    % predicate, but it is not likly to be used with rbmm.
+instr_rvals_and_lvals(push_region_frame(_, _), set.init, set.init).
+instr_rvals_and_lvals(region_fill_frame(_, _, IdRval, NumLval, AddrLval),
+    make_singleton_set(IdRval), list_to_set([NumLval, AddrLval])).
+instr_rvals_and_lvals(region_set_fixed_slot(_, _, ValueRval),
+    make_singleton_set(ValueRval), set.init).
+instr_rvals_and_lvals(use_and_maybe_pop_region_frame(_, _), set.init,
+    set.init).
+instr_rvals_and_lvals(store_ticket(Lval), set.init, make_singleton_set(Lval)).
+instr_rvals_and_lvals(reset_ticket(Rval, _Reason), make_singleton_set(Rval),
+    set.init).
+instr_rvals_and_lvals(discard_ticket, set.init, set.init).
+instr_rvals_and_lvals(prune_ticket, set.init, set.init).
+instr_rvals_and_lvals(mark_ticket_stack(Lval), set.init,
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(prune_tickets_to(Rval), make_singleton_set(Rval),
+    set.init).
+instr_rvals_and_lvals(incr_sp(_, _, _), set.init, set.init).
+instr_rvals_and_lvals(decr_sp(_), set.init, set.init).
+instr_rvals_and_lvals(decr_sp_and_return(_), set.init, set.init).
+instr_rvals_and_lvals(foreign_proc_code(_, Cs, _, _, _, _, _, _, _, _),
+        list_to_set(Rvals), list_to_set(Lvals)) :-
+    foreign_proc_components_get_rvals_and_lvals(Cs, Rvals, Lvals).
+instr_rvals_and_lvals(init_sync_term(Lval, _, _), set.init,
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(fork_new_child(Lval, _), set.init,
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(join_and_continue(Lval, _), set.init,
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(lc_create_loop_control(_, Lval), set.init,
+    make_singleton_set(Lval)).
+instr_rvals_and_lvals(lc_wait_free_slot(Rval, Lval, _),
+    make_singleton_set(Rval), make_singleton_set(Lval)).
+instr_rvals_and_lvals(lc_spawn_off(LCRval, LCSRval, _),
+    list_to_set([LCRval, LCSRval]), set.init).
+instr_rvals_and_lvals(lc_join_and_terminate(LCRval, LCSRval),
+    list_to_set([LCRval, LCSRval]), set.init).
+
+    % Determine all the rvals and lvals referenced by a list of instructions.
+    %
+instrs_rvals_and_lvals(Instrs, Rvals, Lvals) :-
+    foldl2(instrs_rvals_and_lvals_acc, Instrs, set.init, Rvals,
+        set.init, Lvals).
+
+:- pred instrs_rvals_and_lvals_acc(instruction::in,
+    set(rval)::in, set(rval)::out, set(lval)::in, set(lval)::out) is det.
+
+instrs_rvals_and_lvals_acc(llds_instr(Uinstr, _), !Rvals, !Lvals) :-
+    instr_rvals_and_lvals(Uinstr, NewRvals, NewLvals),
+    % The accumulator is the first argument since that suits the performance
+    % charicteristics of set.union.
+    set.union(!.Rvals, NewRvals, !:Rvals),
+    set.union(!.Lvals, NewLvals, !:Lvals).
+
+    % Extract the rvals and lvals from the foreign_proc_components.
+    %
+:- pred foreign_proc_components_get_rvals_and_lvals(
+    list(foreign_proc_component)::in,
+    list(rval)::out, list(lval)::out) is det.
+
+foreign_proc_components_get_rvals_and_lvals([], [], []).
+foreign_proc_components_get_rvals_and_lvals([Comp | Comps], !:Rvals, !:Lvals) :-
+    foreign_proc_components_get_rvals_and_lvals(Comps, !:Rvals, !:Lvals),
+    foreign_proc_component_get_rvals_and_lvals(Comp, !Rvals, !Lvals).
+
+    % Extract the rvals and lvals from the foreign_proc_component
+    % and add them to the list.
+    %
+:- pred foreign_proc_component_get_rvals_and_lvals(foreign_proc_component::in,
+    list(rval)::in, list(rval)::out, list(lval)::in, list(lval)::out) is det.
+
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_inputs(Inputs),
+        !Rvals, !Lvals) :-
+    NewRvals = foreign_proc_inputs_get_rvals(Inputs),
+    list.append(NewRvals, !Rvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_outputs(Outputs),
+        !Rvals, !Lvals) :-
+    NewLvals = foreign_proc_outputs_get_lvals(Outputs),
+    list.append(NewLvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_user_code(_, _, _),
+        !Rvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_raw_code(_, _, _, _),
+        !Rvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_fail_to(_),
+        !Rvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_alloc_id(_),
+        !Rvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_noop,
+        !Rvals, !Lvals).
+
+    % Extract the rvals from the foreign_proc_input.
+    %
+:- func foreign_proc_inputs_get_rvals(list(foreign_proc_input)) = list(rval).
+
+foreign_proc_inputs_get_rvals([]) = [].
+foreign_proc_inputs_get_rvals([Input | Inputs]) = [Rval | Rvals] :-
+    Input = foreign_proc_input(_Name, _VarType, _IsDummy, _OrigType, Rval,
+        _, _),
+    Rvals = foreign_proc_inputs_get_rvals(Inputs).
+
+    % Extract the lvals from the foreign_proc_output.
+    %
+:- func foreign_proc_outputs_get_lvals(list(foreign_proc_output)) = list(lval).
+
+foreign_proc_outputs_get_lvals([]) = [].
+foreign_proc_outputs_get_lvals([Output | Outputs]) = [Lval | Lvals] :-
+    Output = foreign_proc_output(Lval, _VarType, _IsDummy, _OrigType,
+        _Name, _, _),
+    Lvals = foreign_proc_outputs_get_lvals(Outputs).
+
+%-----------------------------------------------------------------------------%
 :- end_module ll_backend.code_util.
 %-----------------------------------------------------------------------------%
Index: compiler/dep_par_conj.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/dep_par_conj.m,v
retrieving revision 1.60
diff -u -p -b -r1.60 dep_par_conj.m
--- compiler/dep_par_conj.m	19 Oct 2011 01:08:28 -0000	1.60
+++ compiler/dep_par_conj.m	21 Oct 2011 00:11:11 -0000
@@ -480,11 +480,21 @@ maybe_sync_dep_par_conj(Conjuncts, GoalI
     ( set_of_var.is_empty(SharedVars) ->
         % Independant parallel conjunctions can somtimes be re-ordered to
         % generate faster code.
+        module_info_get_globals(ModuleInfo0, Globals),
+        globals.lookup_bool_option(Globals, par_loop_control, ParLoopControl),
+        (
+            ParLoopControl = no,
         reorder_indep_par_conj(PredProcId, VarTypes0, InstMap, Conjuncts,
             GoalInfo, NewGoal, ModuleInfo0, ModuleInfo),
         !:SyncInfo = sync_info(ModuleInfo, IgnoreVars, AllowSomePathsOnly,
             VarSet0, VarTypes0, PredProcId, TSStringTable0)
     ;
+            ParLoopControl = yes,
+            % Don't swap the conjuncts, parallel loop control can do a better
+            % job of optimizing this code.
+            NewGoal = hlds_goal(conj(parallel_conj, Conjuncts), GoalInfo)
+        )
+    ;
         sync_dep_par_conj(ModuleInfo0, AllowSomePathsOnly, SharedVars,
             Conjuncts, GoalInfo, NewGoal, InstMap,
             VarSet0, VarSet, VarTypes0, VarTypes, TSStringTable0,
@@ -1953,7 +1963,7 @@ maybe_specialize_call_and_goals(RevGoals
                 % signalled or waited variable.  If so, add `get' goals after
                 % the transformed goal to make sure the variable is bound.
                 PushedPairs = PushedSignalPairs ++ PushedWaitPairs,
-                list.filter(should_add_get_goal(NonLocals, RevGoals1),
+                list.filter(should_add_get_goal(NonLocals, FwdGoals1),
                     PushedPairs, PushedPairsNeedGets),
                 VarTypes = !.SpecInfo ^ spec_vartypes,
                 list.map(make_get_goal(!.SpecInfo ^ spec_module_info, VarTypes),
@@ -2225,7 +2235,7 @@ number_future_args(ArgNo, [Arg | Args], 
 :- pred should_add_get_goal(set_of_progvar::in, list(hlds_goal)::in,
     future_var_pair::in) is semidet.
 
-should_add_get_goal(NonLocals, RevGoals, future_var_pair(_, Var)) :-
+should_add_get_goal(NonLocals, FwdGoals, future_var_pair(_, Var)) :-
     (
         % If the variable is in the nonlocals set of the entire conjunction
         % then we need to add a get goal, because that means that a goal
@@ -2239,7 +2249,7 @@ should_add_get_goal(NonLocals, RevGoals,
         % we're adding a get_future call that does not make sense.  I'm
         % assuming that only free -> ground instantiation state changes are
         % allowed for these variables.
-        member(Goal, RevGoals),
+        member(Goal, FwdGoals),
         GoalNonLocals = goal_get_nonlocals(Goal),
         set_of_var.contains(GoalNonLocals, Var)
     ).
Index: compiler/live_vars.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/live_vars.m,v
retrieving revision 1.151
diff -u -p -b -r1.151 live_vars.m
--- compiler/live_vars.m	19 Oct 2011 01:08:29 -0000	1.151
+++ compiler/live_vars.m	21 Oct 2011 00:11:11 -0000
@@ -417,12 +417,22 @@ build_live_sets_in_goal_2(GoalExpr0, Goa
                 !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
             par_stack_vars_get_stackvars(!.ParStackVars, InnerStackVars),
 
+            NeedInParConjSet = LCStackVars `set_of_var.union` InnerStackVars,
             NeedInParConj =
-                need_in_par_conj(LCStackVars `set_of_var.union` InnerStackVars),
+                need_in_par_conj(NeedInParConjSet),
             record_par_conj(NeedInParConj, AllocData,
                 GoalInfo0, GoalInfo, !StackAlloc),
 
-            par_stack_vars_end_loop_control(OuterParStackVars, !ParStackVars),
+            % NeedInParConjSet says live, any calls between now and the
+            % recursive call must include this set in the set of stack
+            % variables.
+            WouldDieSet = set_of_var.difference(NeedInParConjSet, !.Liveness),
+            !:Liveness = set_of_var.union(!.Liveness, WouldDieSet),
+            % WouldDieSet are variables that would normally die if this where
+            % not a parallel goal, but we only want them to die after the
+            % recursive call.
+            par_stack_vars_end_loop_control(WouldDieSet, OuterParStackVars,
+                !ParStackVars),
 
             GoalExpr = scope(Reason, SubGoal)
         ;
@@ -492,14 +502,16 @@ build_live_sets_in_goal_2(GoalExpr0, Goa
             % then the recursive call is a barrier for loop control, we have to
             % ensure that spawned off computations use distinct stack slots
             % from one another and the code up to and including this call.
-            par_stack_vars_recursive_call(MaybeNeedLC, !ParStackVars),
+            par_stack_vars_recursive_call(MaybeNeedLC, DelayDeathSet,
+                !ParStackVars),
             (
                 MaybeNeedLC = yes(NeedLC),
                 at_recursive_call_for_loop_control(NeedLC, AllocData,
                     !StackAlloc)
             ;
                 MaybeNeedLC = no
-            )
+            ),
+            !:Liveness = set_of_var.difference(!.Liveness, DelayDeathSet)
         ;
             true
         )
@@ -829,6 +841,11 @@ record_par_conj(NeedInParConj, AllocData
                 % scopes.
                 list(set_of_progvar),
 
+                % The set of variables whose death we must delay until after
+                % the recursive call.  they may still be using their slots on
+                % our stack frame.
+                set_of_progvar,
+
                 % Accumulating set of variables that need stack slots between a
                 % loop control scope and either another loop control scope or a
                 % recursive call.
@@ -880,13 +897,14 @@ par_stack_vars_end_parallel_conjunction(
             `set_of_var.union` (LiveSet `set_of_var.difference` OuterNonLocals),
         ParStackVars = loop_control_scope(OuterNonLocals, StackVars)
     ;
-        OuterParStackVars = after_loop_control_scope(StackVarsList, StackVars0),
+        OuterParStackVars = after_loop_control_scope(StackVarsList,
+            WouldDieSet, StackVars0),
         % In this case we don't have access to an OuterNonLocals set, so this
         % is a conservative approximation.
         StackVars = StackVars0 `set_of_var.union` InnerStackVars
             `set_of_var.union` LiveSet,
         ParStackVars =
-            after_loop_control_scope(StackVarsList, StackVars)
+            after_loop_control_scope(StackVarsList, WouldDieSet, StackVars)
     ).
 
 :- pred par_stack_vars_start_loop_control(set_of_progvar::in,
@@ -896,7 +914,7 @@ par_stack_vars_start_loop_control(NonLoc
         loop_control_scope(NonLocals, set_of_var.init)) :-
     (
         ( ParStackVars0 = not_in_parallel_context
-        ; ParStackVars0 = after_loop_control_scope(_, _)
+        ; ParStackVars0 = after_loop_control_scope(_, _, _)
         )
     ;
         ( ParStackVars0 = parallel_conjunction(_, _, _)
@@ -906,19 +924,24 @@ par_stack_vars_start_loop_control(NonLoc
             "Loop control scope found in other parallel context")
     ).
 
-:- pred par_stack_vars_end_loop_control(parallel_stackvars::in,
+:- pred par_stack_vars_end_loop_control(set_of_progvar::in,
+    parallel_stackvars::in,
     parallel_stackvars::in, parallel_stackvars::out) is det.
 
-par_stack_vars_end_loop_control(OldParStackVars, ParStackVars0, ParStackVars) :-
+par_stack_vars_end_loop_control(NewWouldDieSet, OldParStackVars, ParStackVars0,
+        ParStackVars) :-
     par_stack_vars_get_stackvars(ParStackVars0, NewStackVars),
     (
         OldParStackVars = not_in_parallel_context,
-        ParStackVars = after_loop_control_scope([NewStackVars], set_of_var.init)
+        ParStackVars = after_loop_control_scope([NewStackVars], NewWouldDieSet,
+            set_of_var.init)
     ;
-        OldParStackVars = after_loop_control_scope(StackVarsList, StackVarsAcc),
+        OldParStackVars = after_loop_control_scope(StackVarsList, WouldDieSet0,
+            StackVarsAcc),
+        WouldDieSet = WouldDieSet0 `set_of_var.union` NewWouldDieSet,
         ParStackVars =
             after_loop_control_scope([NewStackVars | StackVarsList],
-                StackVarsAcc)
+                WouldDieSet, StackVarsAcc)
     ;
         ( OldParStackVars = parallel_conjunction(_, _, _)
         ; OldParStackVars = loop_control_scope(_, _)
@@ -936,7 +959,7 @@ par_stack_vars_get_stackvars(parallel_co
         StackVars) :-
     StackVars = set_of_var.union_list(StackVarss).
 par_stack_vars_get_stackvars(loop_control_scope(_, StackVars), StackVars).
-par_stack_vars_get_stackvars(after_loop_control_scope(_, StackVars),
+par_stack_vars_get_stackvars(after_loop_control_scope(_, _, StackVars),
     StackVars).
 
 :- pred par_stack_vars_accumulate_stack_vars(set_of_progvar::in,
@@ -954,8 +977,8 @@ par_stack_vars_accumulate_stack_vars(New
         loop_control_scope(NonLocals, AccStackVars)) :-
     AccStackVars = AccStackVars0 `set_of_var.union` NewStackVars.
 par_stack_vars_accumulate_stack_vars(NewStackVars,
-        after_loop_control_scope(LocalStackVars, AccStackVars0),
-        after_loop_control_scope(LocalStackVars, AccStackVars)) :-
+        after_loop_control_scope(LocalStackVars, WouldDieSet, AccStackVars0),
+        after_loop_control_scope(LocalStackVars, WouldDieSet, AccStackVars)) :-
     AccStackVars = AccStackVars0 `set_of_var.union` NewStackVars.
 
 :- pred par_stack_vars_get_nonlocals(parallel_stackvars::in,
@@ -964,7 +987,7 @@ par_stack_vars_accumulate_stack_vars(New
 par_stack_vars_get_nonlocals(not_in_parallel_context, set_of_var.init).
 par_stack_vars_get_nonlocals(parallel_conjunction(NonLocals, _, _), NonLocals).
 par_stack_vars_get_nonlocals(loop_control_scope(NonLocals, _), NonLocals).
-par_stack_vars_get_nonlocals(after_loop_control_scope(_, _), set_of_var.init).
+par_stack_vars_get_nonlocals(after_loop_control_scope(_, _, _), set_of_var.init).
 
 :- pred par_stack_vars_next_par_conjunct(
     parallel_stackvars::in, parallel_stackvars::out) is det.
@@ -979,19 +1002,22 @@ par_stack_vars_next_par_conjunct(!ParSta
     ).
 
 :- pred par_stack_vars_recursive_call(maybe(need_for_loop_control)::out,
-    parallel_stackvars::in, parallel_stackvars::out) is det.
+    set_of_progvar::out, parallel_stackvars::in, parallel_stackvars::out)
+    is det.
 
-par_stack_vars_recursive_call(MaybeNeedLC, !ParStackVars) :-
+par_stack_vars_recursive_call(MaybeNeedLC, DelayDeathSet, !ParStackVars) :-
     (
         ( !.ParStackVars = not_in_parallel_context
         ; !.ParStackVars = parallel_conjunction(_, _, _)
         ),
-        MaybeNeedLC = no
+        MaybeNeedLC = no,
+        DelayDeathSet = set_of_var.init
     ;
         !.ParStackVars = loop_control_scope(_, _),
         unexpected($module, $pred, "recursive call in loop control scope")
     ;
-        !.ParStackVars = after_loop_control_scope(StackVarsList0, StackVars),
+        !.ParStackVars = after_loop_control_scope(StackVarsList0, DelayDeathSet,
+            StackVars),
         StackVarsList = [StackVars | StackVarsList0],
         cartesian_product_list(StackVarsList, NonoverlapSets),
         MaybeNeedLC = yes(need_for_loop_control(NonoverlapSets)),
Index: compiler/opt_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/opt_util.m,v
retrieving revision 1.180
diff -u -p -b -r1.180 opt_util.m
--- compiler/opt_util.m	17 Oct 2011 04:31:30 -0000	1.180
+++ compiler/opt_util.m	21 Oct 2011 00:11:11 -0000
@@ -324,6 +324,7 @@
 :- import_module check_hlds.type_util.
 :- import_module hlds.hlds_llds.
 :- import_module hlds.special_pred.
+:- import_module ll_backend.code_util.
 :- import_module ll_backend.exprn_aux.
 :- import_module parse_tree.prog_data.
 
@@ -1266,8 +1267,8 @@ can_use_livevals(lc_join_and_terminate(_
 instr_labels(Instr, Labels, CodeAddrs) :-
     instr_labels_2(Instr, Labels0, CodeAddrs1),
     instr_rvals_and_lvals(Instr, Rvals, Lvals),
-    exprn_aux.rval_list_addrs(Rvals, CodeAddrs2, _),
-    exprn_aux.lval_list_addrs(Lvals, CodeAddrs3, _),
+    exprn_aux.rval_list_addrs(to_sorted_list(Rvals), CodeAddrs2, _),
+    exprn_aux.lval_list_addrs(to_sorted_list(Lvals), CodeAddrs3, _),
     CodeAddrs = CodeAddrs1 ++ CodeAddrs2 ++ CodeAddrs3,
     find_label_code_addrs(CodeAddrs, Labels0, Labels).
 
@@ -1505,154 +1506,6 @@ foreign_proc_labels(MaybeFixedLabel, May
         MaybeDefLabel = no
     ).
 
-    % Determine all the rvals and lvals referenced by an instruction.
-    %
-:- pred instr_rvals_and_lvals(instr::in, list(rval)::out, list(lval)::out)
-    is det.
-
-instr_rvals_and_lvals(comment(_), [], []).
-instr_rvals_and_lvals(livevals(_), [], []).
-instr_rvals_and_lvals(block(_, _, Instrs), Labels, CodeAddrs) :-
-    instr_list_rvals_and_lvals(Instrs, Labels, CodeAddrs).
-instr_rvals_and_lvals(assign(Lval,Rval), [Rval], [Lval]).
-instr_rvals_and_lvals(keep_assign(Lval,Rval), [Rval], [Lval]).
-instr_rvals_and_lvals(llcall(_, _, _, _, _, _), [], []).
-instr_rvals_and_lvals(mkframe(_, _), [], []).
-instr_rvals_and_lvals(label(_), [], []).
-instr_rvals_and_lvals(goto(_), [], []).
-instr_rvals_and_lvals(computed_goto(Rval, _), [Rval], []).
-instr_rvals_and_lvals(arbitrary_c_code(_, _, _), [], []).
-instr_rvals_and_lvals(if_val(Rval, _), [Rval], []).
-instr_rvals_and_lvals(save_maxfr(Lval), [], [Lval]).
-instr_rvals_and_lvals(restore_maxfr(Lval), [], [Lval]).
-instr_rvals_and_lvals(incr_hp(Lval, _, _, SizeRval, _, _, MaybeRegionRval,
-        MaybeReuse), Rvals, Lvals) :-
-    some [!Rvals, !Lvals] (
-        !:Rvals = [SizeRval],
-        !:Lvals = [Lval],
-        (
-            MaybeRegionRval = yes(RegionRval),
-            !:Rvals = [RegionRval | !.Rvals]
-        ;
-            MaybeRegionRval = no
-        ),
-        (
-            MaybeReuse = llds_reuse(ReuseRval, MaybeFlagLval),
-            !:Rvals = [ReuseRval | !.Rvals],
-            (
-                MaybeFlagLval = yes(FlagLval),
-                !:Lvals = [FlagLval | !.Lvals]
-            ;
-                MaybeFlagLval = no
-            )
-        ;
-            MaybeReuse = no_llds_reuse
-        ),
-        Rvals = !.Rvals,
-        Lvals = !.Lvals
-    ).
-instr_rvals_and_lvals(mark_hp(Lval), [], [Lval]).
-instr_rvals_and_lvals(restore_hp(Rval), [Rval], []).
-instr_rvals_and_lvals(free_heap(Rval), [Rval], []).
-    % The region instructions implicitly specify some stackvars or framevars,
-    % but they cannot reference lvals or rvals that involve code addresses or
-    % labels, and that is the motivation of the only current invoker of
-    % instr_rvals_and_lvals.
-instr_rvals_and_lvals(push_region_frame(_, _), [], []).
-instr_rvals_and_lvals(region_fill_frame(_, _, IdRval, NumLval, AddrLval),
-    [IdRval], [NumLval, AddrLval]).
-instr_rvals_and_lvals(region_set_fixed_slot(_, _, ValueRval),
-    [ValueRval], []).
-instr_rvals_and_lvals(use_and_maybe_pop_region_frame(_, _), [], []).
-instr_rvals_and_lvals(store_ticket(Lval), [], [Lval]).
-instr_rvals_and_lvals(reset_ticket(Rval, _Reason), [Rval], []).
-instr_rvals_and_lvals(discard_ticket, [], []).
-instr_rvals_and_lvals(prune_ticket, [], []).
-instr_rvals_and_lvals(mark_ticket_stack(Lval), [], [Lval]).
-instr_rvals_and_lvals(prune_tickets_to(Rval), [Rval], []).
-instr_rvals_and_lvals(incr_sp(_, _, _), [], []).
-instr_rvals_and_lvals(decr_sp(_), [], []).
-instr_rvals_and_lvals(decr_sp_and_return(_), [], []).
-instr_rvals_and_lvals(foreign_proc_code(_, Cs, _, _, _, _, _, _, _, _),
-        Rvals, Lvals) :-
-    foreign_proc_components_get_rvals_and_lvals(Cs, Rvals, Lvals).
-instr_rvals_and_lvals(init_sync_term(Lval, _, _), [], [Lval]).
-instr_rvals_and_lvals(fork_new_child(Lval, _), [], [Lval]).
-instr_rvals_and_lvals(join_and_continue(Lval, _), [], [Lval]).
-instr_rvals_and_lvals(lc_create_loop_control(_, Lval), [], [Lval]).
-instr_rvals_and_lvals(lc_wait_free_slot(Rval, Lval, _), [Rval], [Lval]).
-instr_rvals_and_lvals(lc_spawn_off(LCRval, LCSRval, _), [LCRval, LCSRval], []).
-instr_rvals_and_lvals(lc_join_and_terminate(LCRval, LCSRval),
-    [LCRval, LCSRval], []).
-
-    % Extract the rvals and lvals from the foreign_proc_components.
-    %
-:- pred foreign_proc_components_get_rvals_and_lvals(
-    list(foreign_proc_component)::in,
-    list(rval)::out, list(lval)::out) is det.
-
-foreign_proc_components_get_rvals_and_lvals([], [], []).
-foreign_proc_components_get_rvals_and_lvals([Comp | Comps], !:Rvals, !:Lvals) :-
-    foreign_proc_components_get_rvals_and_lvals(Comps, !:Rvals, !:Lvals),
-    foreign_proc_component_get_rvals_and_lvals(Comp, !Rvals, !Lvals).
-
-    % Extract the rvals and lvals from the foreign_proc_component
-    % and add them to the list.
-    %
-:- pred foreign_proc_component_get_rvals_and_lvals(foreign_proc_component::in,
-    list(rval)::in, list(rval)::out, list(lval)::in, list(lval)::out) is det.
-
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_inputs(Inputs),
-        !Rvals, !Lvals) :-
-    NewRvals = foreign_proc_inputs_get_rvals(Inputs),
-    list.append(NewRvals, !Rvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_outputs(Outputs),
-        !Rvals, !Lvals) :-
-    NewLvals = foreign_proc_outputs_get_lvals(Outputs),
-    list.append(NewLvals, !Lvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_user_code(_, _, _),
-        !Rvals, !Lvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_raw_code(_, _, _, _),
-        !Rvals, !Lvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_fail_to(_),
-        !Rvals, !Lvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_alloc_id(_),
-        !Rvals, !Lvals).
-foreign_proc_component_get_rvals_and_lvals(foreign_proc_noop,
-        !Rvals, !Lvals).
-
-    % Extract the rvals from the foreign_proc_input.
-    %
-:- func foreign_proc_inputs_get_rvals(list(foreign_proc_input)) = list(rval).
-
-foreign_proc_inputs_get_rvals([]) = [].
-foreign_proc_inputs_get_rvals([Input | Inputs]) = [Rval | Rvals] :-
-    Input = foreign_proc_input(_Name, _VarType, _IsDummy, _OrigType, Rval,
-        _, _),
-    Rvals = foreign_proc_inputs_get_rvals(Inputs).
-
-    % Extract the lvals from the foreign_proc_output.
-    %
-:- func foreign_proc_outputs_get_lvals(list(foreign_proc_output)) = list(lval).
-
-foreign_proc_outputs_get_lvals([]) = [].
-foreign_proc_outputs_get_lvals([Output | Outputs]) = [Lval | Lvals] :-
-    Output = foreign_proc_output(Lval, _VarType, _IsDummy, _OrigType,
-        _Name, _, _),
-    Lvals = foreign_proc_outputs_get_lvals(Outputs).
-
-    % Determine all the rvals and lvals referenced by a list of instructions.
-    %
-:- pred instr_list_rvals_and_lvals(list(instruction)::in,
-    list(rval)::out, list(lval)::out) is det.
-
-instr_list_rvals_and_lvals([], [], []).
-instr_list_rvals_and_lvals([llds_instr(Uinstr, _) | Instrs], Rvals, Lvals) :-
-    instr_rvals_and_lvals(Uinstr, HeadRvals, HeadLvals),
-    instr_list_rvals_and_lvals(Instrs, TailRvals, TailLvals),
-    Rvals = HeadRvals ++ TailRvals,
-    Lvals = HeadLvals ++ TailLvals.
-
 instr_list_labels([], [], []).
 instr_list_labels([llds_instr(Uinstr, _) | Instrs], Labels, CodeAddrs) :-
     instr_labels(Uinstr, HeadLabels, HeadCodeAddrs),
Index: compiler/par_conj_gen.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/par_conj_gen.m,v
retrieving revision 1.52
diff -u -p -b -r1.52 par_conj_gen.m
--- compiler/par_conj_gen.m	20 Oct 2011 04:37:43 -0000	1.52
+++ compiler/par_conj_gen.m	21 Oct 2011 00:11:11 -0000
@@ -119,6 +119,7 @@
 :- import_module libs.options.
 :- import_module ll_backend.code_gen.
 :- import_module ll_backend.code_info.
+:- import_module ll_backend.code_util.
 :- import_module ll_backend.continuation_info.
 :- import_module ll_backend.exprn_aux.
 :- import_module ll_backend.llds_out.
@@ -347,71 +348,17 @@ generate_lc_spawn_off(Goal, LCVar, LCSVa
     best_variable_location_det(!.CI, LCVar, LCVarLocn),
     best_variable_location_det(!.CI, LCSVar, LCSVarLocn),
 
-    (
-        UseParentStack = lc_use_parent_stack_frame,
-        CopyCode = cord.empty
-    ;
-        UseParentStack = lc_create_frame_on_child_stack,
-        list.map(get_variable_slot(!.CI), InputVars, InputStackSlots),
-
-        % XXX We only know the size of the stack frame for certain
-        % after we have finished generating code for the procedure.
-        %
-        % There are several different ways we can set the size of the first
-        % stack frame on the stack of the child context.
-        %
-        % The simplest way is to make a hopefully conservative estimate of the
-        % size of the current procedure's stack frame. If necessary, we can
-        % ENSURE that this estimate is conservative by testing whether it is at
-        % the end of generate_proc_code, and if not, recording the actual stack
-        % frame size in the code_info and redoing the code generation for the
-        % entire procedure, this time using the recorded size instead of the
-        % estimate.
-        %
-        % A second way is to move this code AFTER we have computed GoalCode,
-        % collect all the stackvar references in it, and base the size of the
-        % child stack frame on the highest numbered stackvar reference in
-        % there.
-        %
-        % The third way is the same as the second, except it compresses
-        % out any stack slots that are not used by GoalCode. This would require
-        % remapping all the stackvar references in GoalCode, as well as
-        % applying the compression map during the creation of CopyStr.
-        %
-        % For now, we use the first method without the fallback. This should
-        % be sufficient for the correctness of our benchmark programs, and
-        % *shouldn't* lose too much efficiency.
-
-        FrameSize = 100,
-        copy_slots_to_child_stack(FrameSize, LCVarLocn, LCSVarLocn,
-            InputStackSlots, CopyStr),
-        AffectsLiveness = proc_does_not_affect_liveness,
-        LiveLvalsInfo = live_lvals_info(
-            set.list_to_set([LCVarLocn, LCSVarLocn | InputStackSlots])),
-        CopyUinstr = arbitrary_c_code(AffectsLiveness, LiveLvalsInfo,
-            CopyStr),
-        CopyInstr = llds_instr(CopyUinstr, ""),
-        CopyCode = singleton(CopyInstr)
-    ),
-
-    % Create the call to spawn_off.
-    remember_position(!.CI, PositionBeforeSpawnOff),
-
     get_next_label(SpawnOffLabel, !CI),
     SpawnUinstr = lc_spawn_off(lval(LCVarLocn), lval(LCSVarLocn),
         SpawnOffLabel),
     SpawnInstr = llds_instr(SpawnUinstr, ""),
     SpawnOffCode = singleton(SpawnInstr),
-    % XXX This snapshot seems unnecessary; it is guaranteed to be the same
-    % as the previous snapshot.
     remember_position(!.CI, PositionAfterSpawnOff),
 
     % Code to spawn off.
     LabelUinstr = label(SpawnOffLabel),
     LabelInstr = llds_instr(LabelUinstr, "Label for spawned off code"),
     LabelCode = singleton(LabelInstr),
-    % XXX This reset seems unnecessary; no code
-    reset_to_position(PositionBeforeSpawnOff, !CI),
 
     % We don't need to clear all the registers, all the variables except for
     % LC and LCS are considered to be on the stack.
@@ -422,19 +369,57 @@ generate_lc_spawn_off(Goal, LCVar, LCSVa
     % We expect that the join_and_terminate call is already in Goal.
     SpawnedOffCode0 = LabelCode ++ GoalCode,
 
+    reset_to_position(PositionAfterSpawnOff, !CI),
+
     (
         UseParentStack = lc_use_parent_stack_frame,
-        replace_stack_vars_by_parent_sv(SpawnedOffCode0, SpawnedOffCode)
+        replace_stack_vars_by_parent_sv(SpawnedOffCode0, SpawnedOffCode),
+        CopyCode = cord.empty,
+
+        % Mark the output values as available in registers, code inserted after
+        % the recursive call expects to be able to read them.  Because they're
+        % gaurnteed to be placed in distinct stack slots it's okay to produce
+        % them a little early - really they could be produced from any point
+        % after spawn_off until the barrier in the base case.
+
+        % This module has a find_outputs predicate, but I can't see how set
+        % difference wouldn't work.
+        OutputVars = set_of_var.difference(NonLocalsSet, KnownVarsSet),
+        place_all_outputs(set_of_var.to_sorted_list(OutputVars), !CI)
     ;
         UseParentStack = lc_create_frame_on_child_stack,
-        SpawnedOffCode = SpawnedOffCode0
+        list.map(get_variable_slot(!.CI), InputVars, InputStackSlots),
 
         % XXX: We could take this opportunity to remove gaps in the stack
         % frame as discussed in our meetings.
-        % sorry($module, $pred, "unimplemented")
-    ),
+        instr_list_max_stack_ref(SpawnedOffCode0, MaxStackRef),
+        SpawnedOffCode = SpawnedOffCode0,
 
-    reset_to_position(PositionAfterSpawnOff, !CI),
+        % We only know the size of the stack frame for certain after we have
+        % finished generating code for the procedure.
+        %
+        % There are several different ways we can set the size of the first
+        % stack frame on the stack of the child context.
+        %
+        % We have choosen to implement this by collecting all the stackvar
+        % references in SpawnedOffCode0, and base the size of the child stack
+        % frame on the highest numbered stackvar reference in there.
+        %
+        % We could also compress out any stack slots that are not used by
+        % GoalCode. This would require remapping all the stackvar references in
+        % GoalCode, as well as applying the compression map during the creation
+        % of CopyStr.
+        FrameSize = MaxStackRef,
+        copy_slots_to_child_stack(FrameSize, LCVarLocn, LCSVarLocn,
+            InputStackSlots, CopyStr),
+        AffectsLiveness = proc_does_not_affect_liveness,
+        LiveLvalsInfo = live_lvals_info(
+            set.list_to_set([LCVarLocn, LCSVarLocn | InputStackSlots])),
+        CopyUinstr = arbitrary_c_code(AffectsLiveness, LiveLvalsInfo,
+            CopyStr),
+        CopyInstr = llds_instr(CopyUinstr, ""),
+        CopyCode = singleton(CopyInstr)
+    ),
 
     % The spawned off code is written into the procedure separately.
     add_out_of_line_code(SpawnedOffCode, !CI),
@@ -601,6 +586,29 @@ replace_stack_vars_by_parent_sv_lval(Lva
 
 %-----------------------------------------------------------------------------%
 
+:- pred instr_list_max_stack_ref(cord(instruction)::in, int::out) is det.
+
+instr_list_max_stack_ref(Instrs, MaxRef) :-
+    instrs_rvals_and_lvals(cord.list(Instrs), RVals, LVals0),
+    LValsInRvalsLists = map(lvals_in_rval, to_sorted_list(RVals)),
+    LValsSets = map(set, LValsInRvalsLists),
+    LVals = set.union_list(LValsSets) `set.union` LVals0,
+    set.fold(max_stack_ref_acc, LVals, 0, MaxRef).
+
+:- pred max_stack_ref_acc(lval::in, int::in, int::out) is det.
+
+max_stack_ref_acc(LVal, Max0, Max) :-
+    (
+        LVal = stackvar(N),
+        N > Max0
+    ->
+        Max = N
+    ;
+        Max = Max0
+    ).
+
+%-----------------------------------------------------------------------------%
+
 :- pred find_outputs(list(prog_var)::in, instmap::in, instmap::in,
     module_info::in, list(prog_var)::in, list(prog_var)::out) is det.
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20111021/732edd3f/attachment.sig>


More information about the reviews mailing list