[m-rev.] for review: parallel execution mechanism (2/2)

Peter Wang wangp at students.csse.unimelb.edu.au
Tue Sep 12 17:13:03 AEST 2006


Index: compiler/options.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/options.m,v
retrieving revision 1.526
diff -u -r1.526 options.m
--- compiler/options.m	10 Sep 2006 23:39:05 -0000	1.526
+++ compiler/options.m	11 Sep 2006 05:07:15 -0000
@@ -1767,6 +1767,7 @@
 long_option("bits-per-word",        bits_per_word).
 long_option("bytes-per-word",       bytes_per_word).
 long_option("conf-low-tag-bits",    conf_low_tag_bits).
+long_option("sync-term-size",       sync_term_size).
 long_option("unboxed-float",        unboxed_float).
 long_option("unboxed-enums",        unboxed_enums).
 long_option("unboxed-no-tag-types", unboxed_no_tag_types).
Index: compiler/par_conj_gen.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/par_conj_gen.m,v
retrieving revision 1.28
diff -u -r1.28 par_conj_gen.m
--- compiler/par_conj_gen.m	22 Aug 2006 05:04:01 -0000	1.28
+++ compiler/par_conj_gen.m	11 Sep 2006 05:04:48 -0000
@@ -7,7 +7,7 @@
 %---------------------------------------------------------------------------%
 %
 % File: par_conj.m.
-% Main authors: conway.
+% Main authors: conway, wangp.
 %
 % The predicates of this module generate code for parallel conjunctions.
 %
@@ -30,9 +30,8 @@
 %     predictable termination properties.
 %     Parallel conjunction does not of itself suggest any information
 %     about which order two goals should be executed, however if
-%     coroutining (not currently implemented) is being used, then the
-%     data dependencies between the two goals will constrain the order
-%     of execution at runtime.
+%     coroutining is being used, then the data dependencies between
+%     the two goals will constrain the order of execution at runtime.
 %
 %   [Mode correctness]
 %   - `,'/2 has a *sequential* behaviour `A, B' proves `A' *then*
@@ -47,23 +46,13 @@
 %     and-parallelism), but an and-parallel goal may use bindings made
 %     in conjoined goals which may lead to coroutining.
 %
-% The current implementation only supports independent and-parallelism.
+% The current implementation mainly supports independent and-parallelism.
 % The syntax for parallel conjunction is `&'/2 which behaves like `,'/2
 % in that sequences get flattened (ie A & (B & C) <=> (A & B) & C).
+% A subset of dependent and-parallelism is supported (see dep_par_conj.m).
 %
-% Type checking works exactly the same for parallel conjunction as it does
-% for sequential conjunction.
-%
-% Mode analysis schedules a parallel conjunction if all the conjuncts can
-% be scheduled independently, and they bind disjoint sets of variables
-% (type-nodes). This is done by mode checking each conjunct with the same
-% initial instmap and `locking' (as is done for the nonlocal variables of a
-% negation[1]) any variables that get bound in that conjunct before
-% recursively processing the rest of the parallel conjunction. At the end of
-% the conjunction the final instmaps from the conjuncts are merged by unifying
-% them. Since the variable `locking' ensures that the variables bound by each
-% conjunct are distinct from those bound by the other conjuncts, the
-% unification of the instmaps is guaranteed to succeed.
+% Type checking and mode analysis work exactly the same for parallel
+% conjunction as for sequential conjunction.
 %
 % In principle, the determinism of a parallel conjunction is derived from
 % its conjuncts in the same way as the determinism of a conjunction but
@@ -71,24 +60,19 @@
 % conjunction, determinism analysis works by inferring the determinism of
 % each conjunct and reporting an error if it is not a model_det determinism.
 %
-% We conservatively require that any variable that is nonlocal to more
-% than one parallel conjunct become shared at the start of the parallel
-% conjunction. This avoids problems where one conjunct has a use in a
-% di mode and another in a ui mode. This would introduce an implicit
-% dependency between the two conjuncts, which at present is illegal,
-% since parallel conjunction is currently *independent* parallel
-% conjunction only.
-%
 % The code generated for a parallel conjunction consists of a piece of
 % initialization code which creates a term on the heap to be used for
 % controlling the synchronization of the conjuncts and the code for the
-% conjuncts each proceeded by a command to start the conjunct as a new
-% thread of execution (except the last which executes in the "parent"
-% thread), and each succeeded by a command that signals that the execution
-% of the conjunct has completed and terminates the thread (except for
-% the "parent" thread which suspends till all the other parallel conjuncts
-% have terminated, when it will be woken up). The synchronization terms
-% are referred to in the code as 'sync_term's.
+% conjuncts.  The synchronization terms are referred to in the code as
+% 'sync_term's.  Conjuncts are executed "left to right".  At the start of
+% the i'th conjunct is a command to "spark" the i+1'th conjunct, i.e.
+% record enough information to begin executing the next conjunct either
+% in parallel, or to return to it later when the current conjunct ends.
+% At the end of each conjunct is a command that checks if all the
+% conjuncts of the parallel conjunction have been executed and completed.
+% If not, we begin execution of the next conjunct as recorded in the
+% spark.  If so, the parallel conjunction is complete.  (In reality it's
+% a lot more complicated.)
 %
 % The runtime support for parallel conjunction is documented in the runtime
 % directory in mercury_context.{c,h}.
@@ -129,6 +113,7 @@
 :- import_module ll_backend.code_info.
 :- import_module ll_backend.code_util.
 :- import_module ll_backend.continuation_info.
+:- import_module ll_backend.exprn_aux.
 :- import_module parse_tree.prog_data.
 
 :- import_module bool.
@@ -139,6 +124,7 @@
 :- import_module pair.
 :- import_module set.
 :- import_module string.
+:- import_module unit.
 
 %---------------------------------------------------------------------------%
 
@@ -152,10 +138,37 @@
         CodeModel = model_non,
         sorry(this_file, "nondet parallel conjunction not implemented")
     ),
+
     code_info.get_globals(!.CI, Globals),
     globals.lookup_int_option(Globals, sync_term_size, STSize),
+
+    % When entering a parallel conjunctions at the shallowest level in
+    % the procedure, we have to set the parent_sp register to the value
+    % of the sp register, and restore it when the parallel conjunction
+    % finishes.
+    code_info.get_par_conj_depth(!.CI, Depth),
+    (if Depth = 0 then
+        code_info.acquire_temp_slot(lval(parent_sp), ParentSpSlot, !CI),
+        MaybeSetParentSpCode = node([
+            assign(ParentSpSlot, lval(parent_sp))
+                - "save the old parent stack pointer",
+            assign(parent_sp, lval(sp))
+                - "set the parent stack pointer"
+        ]),
+        MaybeRestoreParentSpCode = node([
+            assign(parent_sp, lval(ParentSpSlot))
+                - "restore old parent stack pointer"
+        ]),
+        MaybeReleaseParentSpSlot = yes(ParentSpSlot)
+    else
+        MaybeSetParentSpCode = empty,
+        MaybeRestoreParentSpCode = empty,
+        MaybeReleaseParentSpSlot = no
+    ),
+
     code_info.get_known_variables(!.CI, Vars),
     code_info.save_variables_on_stack(Vars, SaveCode, !CI),
+
     goal_info_get_code_gen_nonlocals(GoalInfo, Nonlocals),
     set.to_sorted_list(Nonlocals, Variables),
     code_info.get_instmap(!.CI, Initial),
@@ -163,89 +176,199 @@
     instmap.apply_instmap_delta(Initial, Delta, Final),
     code_info.get_module_info(!.CI, ModuleInfo),
     find_outputs(Variables, Initial, Final, ModuleInfo, [], Outputs),
+
     list.length(Goals, NumGoals),
     code_info.acquire_reg(reg_r, RegLval, !CI),
-    code_info.acquire_temp_slot(sync_term, SyncSlot, !CI),
-    code_info.acquire_temp_slot(lval(sp), SpSlot, !CI),
-    MakeTerm = node([
-        assign(SpSlot, lval(sp))
-            - "save the parent stack pointer",
+    code_info.acquire_persistent_temp_slot(sync_term, SyncSlot, !CI),
+    (if SyncSlot = stackvar(SlotNum) then
+        ParentSyncSlot = parent_stackvar(SlotNum)
+    else
+        unexpected(this_file, "generate_par_conj")
+    ),
+
+    MakeSyncTermCode = node([
         % The may_not_use_atomic here is conservative.
         incr_hp(RegLval, no, no, const(llconst_int(STSize)),
-            "synchronization vector", may_not_use_atomic_alloc)
-            - "allocate a synchronization vector",
+            "sync term", may_not_use_atomic_alloc)
+            - "allocate a sync term",
         init_sync_term(RegLval, NumGoals)
             - "initialize sync term",
         assign(SyncSlot, lval(RegLval))
-            - "store the sync-term on the stack"
+            - "store the sync term on the stack"
     ]),
     code_info.release_reg(RegLval, !CI),
+
+    code_info.set_par_conj_depth(Depth+1, !CI),
+    code_info.get_next_label(EndLabel, !CI),
     code_info.clear_all_registers(no, !CI),
-    generate_det_par_conj_2(Goals, 0, SyncSlot, SpSlot, Initial, no,
+    generate_det_par_conj_2(Goals, ParentSyncSlot, EndLabel, Initial, no,
         GoalCode, !CI),
-    code_info.release_temp_slot(SyncSlot, !CI),
-    Code = tree(tree(SaveCode, MakeTerm), GoalCode),
+    code_info.set_par_conj_depth(Depth, !CI),
+
+    EndLabelCode = node([
+        label(EndLabel)
+            - "end of parallel conjunction"
+    ]),
+    Code = tree_list([
+        MaybeSetParentSpCode, SaveCode, MakeSyncTermCode,
+        GoalCode, EndLabelCode, MaybeRestoreParentSpCode
+    ]),
+
+    % We can't release the sync slot right now, in case we are in a
+    % nested parallel conjunction.  Consider:
+    %
+    %   (
+    %       (A & B)   % inner1
+    %   &
+    %       (C & D)   % inner2
+    %   )
+    %
+    % If inner1 released its sync slot now then it might end up being reused
+    % by inner2.  But inner1 and inner2 could be executing simultaneously.
+    % In general we can't release the sync slot of any parallel conjunction
+    % until we leave the shallowest parallel conjunction, i.e. at depth 0.
+    % For now we only release the sync slots of parallel conjunctions at the
+    % top level.
+    %
+    % XXX release sync slots of nested parallel conjunctions
+    %
+    (if Depth = 0 then
+        code_info.release_persistent_temp_slot(SyncSlot, !CI)
+    else
+        true
+    ),
+    (
+        MaybeReleaseParentSpSlot = yes(ParentSpSlot1),
+        code_info.release_temp_slot(ParentSpSlot1, !CI)
+    ;
+        MaybeReleaseParentSpSlot = no
+    ),
     code_info.clear_all_registers(no, !CI),
     place_all_outputs(Outputs, !CI).
 
-:- pred generate_det_par_conj_2(list(hlds_goal)::in, int::in,
-    lval::in, lval::in, instmap::in, branch_end::in, code_tree::out,
+:- pred generate_det_par_conj_2(list(hlds_goal)::in,
+    lval::in, label::in, instmap::in, branch_end::in, code_tree::out,
     code_info::in, code_info::out) is det.
 
-generate_det_par_conj_2([], _N, _SyncTerm, _SpSlot, _Initial, _, empty, !CI).
-generate_det_par_conj_2([Goal | Goals], N, SyncTerm, SpSlot,
+generate_det_par_conj_2([], _ParentSyncTerm, _EndLabel,
+        _Initial, _, empty, !CI).
+generate_det_par_conj_2([Goal | Goals], ParentSyncTerm, EndLabel,
         Initial, MaybeEnd0, Code, !CI) :-
     code_info.remember_position(!.CI, StartPos),
-    code_info.get_next_label(ThisConjunct, !CI),
-    code_info.get_next_label(NextConjunct, !CI),
-    code_gen.generate_goal(model_det, Goal, ThisGoalCode, !CI),
+    code_gen.generate_goal(model_det, Goal, ThisGoalCode0, !CI),
+    replace_stack_vars_by_parent_sv(ThisGoalCode0, ThisGoalCode),
+
     code_info.get_stack_slots(!.CI, AllSlots),
     code_info.get_known_variables(!.CI, Variables),
     set.list_to_set(Variables, LiveVars),
     map.select(AllSlots, LiveVars, StoreMap0),
     StoreMap = map.map_values(key_stack_slot_to_abs_locn, StoreMap0),
     code_info.generate_branch_end(StoreMap, MaybeEnd0, MaybeEnd,
-        SaveCode, !CI),
-    Goal = _GoalExpr - GoalInfo,
-    goal_info_get_instmap_delta(GoalInfo, Delta),
-    instmap.apply_instmap_delta(Initial, Delta, Final),
-    code_info.get_module_info(!.CI, ModuleInfo),
-    find_outputs(Variables, Initial, Final, ModuleInfo, [], TheseOutputs),
-    copy_outputs(!.CI, TheseOutputs, SpSlot, CopyCode),
+        SaveCode0, !CI),
+    replace_stack_vars_by_parent_sv(SaveCode0, SaveCode),
+
     (
         Goals = [_ | _],
+        code_info.get_next_label(NextConjunct, !CI),
         code_info.reset_to_position(StartPos, !CI),
-        code_info.get_total_stackslot_count(!.CI, NumSlots),
         ForkCode = node([
-            fork(ThisConjunct, NextConjunct, NumSlots)
-                - "fork off a child",
-            label(ThisConjunct)
-                - "child thread"
+            fork(NextConjunct)
+                - "fork off a child"
         ]),
         JoinCode = node([
-            join_and_terminate(SyncTerm)
+            join_and_continue(ParentSyncTerm, EndLabel)
                 - "finish",
             label(NextConjunct)
                 - "start of the next conjunct"
         ])
     ;
         Goals = [],
-        code_info.get_next_label(ContLab, !CI),
         ForkCode = empty,
         JoinCode = node([
-            join_and_continue(SyncTerm, ContLab)
-                - "sync with children then continue",
-            label(ContLab)
-                - "end of parallel conjunction"
+            join_and_continue(ParentSyncTerm, EndLabel)
+                - "finish"
         ])
     ),
-    ThisCode = tree_list([ForkCode, ThisGoalCode, SaveCode, CopyCode,
-        JoinCode]),
-    N1 = N + 1,
-    generate_det_par_conj_2(Goals, N1, SyncTerm, SpSlot, Initial, MaybeEnd,
+    ThisCode = tree_list([ForkCode, ThisGoalCode, SaveCode, JoinCode]),
+    generate_det_par_conj_2(Goals, ParentSyncTerm, EndLabel, Initial, MaybeEnd,
         RestCode, !CI),
     Code = tree(ThisCode, RestCode).
 
+%-----------------------------------------------------------------------------%
+
+    % In the code of parallel conjuncts we have to refer to stack slots in
+    % the procedure's stack frame via the `parent_sp' register instead of the
+    % usual `sp' register, as the conjunct could be running in a different
+    % context.
+    %
+:- pred replace_stack_vars_by_parent_sv(code_tree::in, code_tree::out) is det.
+:- pred replace_stack_vars_by_parent_sv_instrs(list(instruction)::in,
+    list(instruction)::out) is det.
+:- pred replace_stack_vars_by_parent_sv_lval(lval::in, lval::out,
+    unit::in, unit::out) is det.
+
+replace_stack_vars_by_parent_sv(!Code) :-
+    tree.map(replace_stack_vars_by_parent_sv_instrs, !Code).
+
+replace_stack_vars_by_parent_sv_instrs(!Instrs) :-
+    list.map_foldl(
+        transform_lval_in_instr(replace_stack_vars_by_parent_sv_lval),
+        !Instrs, unit, _).
+
+replace_stack_vars_by_parent_sv_lval(Lval0, Lval, !Acc) :-
+    TransformRval = replace_stack_vars_by_parent_sv_lval,
+    (
+        ( Lval0 = stackvar(SlotNum)
+        ; Lval0 = parent_stackvar(SlotNum)
+        ),
+        Lval = parent_stackvar(SlotNum)
+    ;
+        ( Lval0 = reg(_Type, _RegNum)
+        ; Lval0 = succip
+        ; Lval0 = maxfr
+        ; Lval0 = curfr
+        ; Lval0 = hp
+        ; Lval0 = sp
+        ; Lval0 = parent_sp
+        ; Lval0 = temp(_Type, _TmpNum)
+        ; Lval0 = framevar(_SlotNum)
+        ; Lval0 = lvar(_Var)
+        ; Lval0 = global_var_ref(_GlobalVarName)
+        ),
+        Lval = Lval0
+    ;
+        Lval0 = succip_slot(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = succip_slot(Rval)
+    ;
+        Lval0 = redoip_slot(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = redoip_slot(Rval)
+    ;
+        Lval0 = redofr_slot(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = redofr_slot(Rval)
+    ;
+        Lval0 = succfr_slot(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = succfr_slot(Rval)
+    ;
+        Lval0 = prevfr_slot(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = prevfr_slot(Rval)
+    ;
+        Lval0 = field(Tag, Rval1, Rval2),
+        transform_lval_in_rval(TransformRval, Rval1, Rval3, !Acc),
+        transform_lval_in_rval(TransformRval, Rval2, Rval4, !Acc),
+        Lval = field(Tag, Rval3, Rval4)
+    ;
+        Lval0 = mem_ref(Rval0),
+        transform_lval_in_rval(TransformRval, Rval0, Rval, !Acc),
+        Lval = mem_ref(Rval)
+    ).
+
+%-----------------------------------------------------------------------------%
+
 :- pred find_outputs(list(prog_var)::in, instmap::in, instmap::in,
     module_info::in, list(prog_var)::in, list(prog_var)::out) is det.
 
@@ -260,45 +383,6 @@
     ),
     find_outputs(Vars, Initial, Final, ModuleInfo, !Outputs).
 
-    % XXX at the moment we are copying back too much.  Conjuncts which
-    % are only consumers of variable X should not be copying it back.
-    %
-    % Also, variables which are shared (i.e. we allocate a future for
-    % it) do not need copying back if we take the address of the
-    % shared variable and store it in the future, then write
-    % to that address on `signal' or the last `wait'.
-    %
-:- pred copy_outputs(code_info::in, list(prog_var)::in, lval::in,
-    code_tree::out) is det.
-
-copy_outputs(_, [], _, empty).
-copy_outputs(CI, [Var | Vars], SpSlot, Code) :-
-    code_info.get_variable_slot(CI, Var, SrcSlot),
-    ( SrcSlot = stackvar(SlotNum) ->
-        (
-            code_info.get_module_info(CI, ModuleInfo),
-            code_info.variable_type(CI, Var) = Type,
-            is_dummy_argument_type(ModuleInfo, Type)
-        ->
-            % Don't copy dummy values.
-            ThisCode = empty
-        ;
-            % The stack pointer points to the last used word on the stack.
-            % We want MR_sp[-0] = MR_sv(1), MR_sp[-1] = MR_sv(2), etc.
-            NegSlotNum = (1 - SlotNum),
-            DestSlot = field(yes(0), lval(SpSlot),
-                const(llconst_int(NegSlotNum))),
-            VarName = code_info.variable_to_string(CI, Var),
-            Msg = "copy result " ++ VarName ++ " to parent stackframe",
-            ThisCode = node([assign(DestSlot, lval(SrcSlot)) - Msg])
-        ),
-        Code = tree(ThisCode, RestCode),
-        copy_outputs(CI, Vars, SpSlot, RestCode)
-    ;
-        unexpected(this_file,
-            "copy_outputs: par conj in model non procedure!")
-    ).
-
 :- pred place_all_outputs(list(prog_var)::in, code_info::in, code_info::out)
     is det.
 
Index: compiler/reassign.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/reassign.m,v
retrieving revision 1.19
diff -u -r1.19 reassign.m
--- compiler/reassign.m	20 Aug 2006 05:01:33 -0000	1.19
+++ compiler/reassign.m	11 Sep 2006 05:04:48 -0000
@@ -295,7 +295,7 @@
         !:RevInstrs = [Instr0 | !.RevInstrs],
         clobber_dependents(Target, !KnownContentsMap, !DepLvalMap)
     ;
-        Uinstr0 = fork(_, _, _),
+        Uinstr0 = fork(_),
         !:RevInstrs = [Instr0 | !.RevInstrs],
         % Both the parent and the child thread jump to labels specified
         % by the fork instruction, so the value of !:KnownContentsMap doesn't
@@ -304,13 +304,6 @@
         !:KnownContentsMap = map.init,
         !:DepLvalMap = map.init
     ;
-        Uinstr0 = join_and_terminate(_),
-        !:RevInstrs = [Instr0 | !.RevInstrs],
-        % The value of KnownContentsMap doesn't really matter since this
-        % instruction terminates the execution of this thread.
-        !:KnownContentsMap = map.init,
-        !:DepLvalMap = map.init
-    ;
         Uinstr0 = join_and_continue(_, _),
         !:RevInstrs = [Instr0 | !.RevInstrs],
         % Other threads may modify any lval.
Index: compiler/stack_layout.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/stack_layout.m,v
retrieving revision 1.121
diff -u -r1.121 stack_layout.m
--- compiler/stack_layout.m	22 Aug 2006 05:04:07 -0000	1.121
+++ compiler/stack_layout.m	11 Sep 2006 05:04:48 -0000
@@ -1424,6 +1424,9 @@
 represent_lval(stackvar(Num), Word) :-
     expect(Num > 0, this_file, "represent_lval: bad stackvar"),
     make_tagged_word(lval_stackvar, Num, Word).
+represent_lval(parent_stackvar(Num), Word) :-
+    expect(Num > 0, this_file, "represent_lval: bad parent_stackvar"),
+    make_tagged_word(lval_parent_stackvar, Num, Word).
 represent_lval(framevar(Num), Word) :-
     expect(Num > 0, this_file, "represent_lval: bad framevar"),
     make_tagged_word(lval_framevar, Num, Word).
@@ -1437,6 +1440,8 @@
     make_tagged_word(lval_hp, 0, Word).
 represent_lval(sp, Word) :-
     make_tagged_word(lval_sp, 0, Word).
+represent_lval(parent_sp, Word) :-
+    make_tagged_word(lval_parent_sp, 0, Word).
 
 represent_lval(temp(_, _), _) :-
     unexpected(this_file, "continuation live value stored in temp register").
@@ -1484,7 +1489,9 @@
     ;       lval_curfr
     ;       lval_hp
     ;       lval_sp
-    ;       lval_indirect.
+    ;       lval_indirect
+    ;       lval_parent_sp
+    ;       lval_parent_stackvar.
 
 :- pred locn_type_code(locn_type::in, int::out) is det.
 
@@ -1498,6 +1505,8 @@
 locn_type_code(lval_hp,       7).
 locn_type_code(lval_sp,       8).
 locn_type_code(lval_indirect, 9).
+locn_type_code(lval_parent_sp,       10).
+locn_type_code(lval_parent_stackvar, 11).
 
     % This number of tag bits must be able to encode all values of
     % locn_type_code.
@@ -1553,6 +1562,12 @@
 represent_lval_as_byte(sp, Byte) :-
     locn_type_code(lval_sp, Val),
     make_tagged_byte(3, Val, Byte).
+represent_lval_as_byte(parent_sp, Byte) :-
+    locn_type_code(lval_parent_sp, Val),
+    make_tagged_byte(3, Val, Byte).
+represent_lval_as_byte(parent_stackvar(Num), Byte) :-
+    expect(Num > 0, this_file, "represent_lval_as_byte: bad parent_stackvar"),
+    make_tagged_byte(1, Num, Byte).
 
 :- pred make_tagged_byte(int::in, int::in, int::out) is det.
 
Index: compiler/use_local_vars.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/use_local_vars.m,v
retrieving revision 1.26
diff -u -r1.26 use_local_vars.m
--- compiler/use_local_vars.m	22 Aug 2006 05:04:14 -0000	1.26
+++ compiler/use_local_vars.m	11 Sep 2006 05:04:48 -0000
@@ -520,9 +520,7 @@
     ;
         Uinstr0 = init_sync_term(_, _)
     ;
-        Uinstr0 = fork(_, _, _)
-    ;
-        Uinstr0 = join_and_terminate(_)
+        Uinstr0 = fork(_)
     ;
         Uinstr0 = join_and_continue(_, _)
     ;
Index: compiler/var_locn.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/var_locn.m,v
retrieving revision 1.43
diff -u -r1.43 var_locn.m
--- compiler/var_locn.m	22 Aug 2006 05:04:14 -0000	1.43
+++ compiler/var_locn.m	11 Sep 2006 05:04:48 -0000
@@ -1927,6 +1927,7 @@
     (
         ( Lval0 = reg(_, _)
         ; Lval0 = stackvar(_)
+        ; Lval0 = parent_stackvar(_)
         ; Lval0 = framevar(_)
         ; Lval0 = global_var_ref(_)
         ; Lval0 = succip
@@ -1934,6 +1935,7 @@
         ; Lval0 = curfr
         ; Lval0 = hp
         ; Lval0 = sp
+        ; Lval0 = parent_sp
         ),
         Lval = Lval0,
         Code = empty
@@ -2176,6 +2178,7 @@
 
 is_root_lval(reg(reg_r, _)).
 is_root_lval(stackvar(_)).
+is_root_lval(parent_stackvar(_)).
 is_root_lval(framevar(_)).
 
 %----------------------------------------------------------------------------%
Index: library/par_builtin.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/library/par_builtin.m,v
retrieving revision 1.5
diff -u -r1.5 par_builtin.m
--- library/par_builtin.m	15 Aug 2006 04:19:38 -0000	1.5
+++ library/par_builtin.m	10 Sep 2006 09:45:39 -0000
@@ -109,7 +109,7 @@
     GC_REGISTER_FINALIZER(Future, MR_finalize_future, NULL, NULL, NULL);
   #endif
 
-    Future->signalled = 0;
+    Future->signalled = MR_FALSE;
     Future->suspended = NULL;
     Future->value = 0;
 
@@ -154,7 +154,6 @@
         MR_UNLOCK(&(Future->lock), ""future.wait"");
     } else {
         MR_Context *ctxt;
-        MercuryThreadList *new_element;
 
         /*
         ** The address of the future can be lost when we resume so save it on
@@ -163,39 +162,21 @@
         MR_incr_sp(1);
         MR_sv(1) = (MR_Word) Future;
 
-        ctxt = MR_ENGINE(MR_eng_this_context);
-
-        /*
-        ** Mark the current context as being owned by this thread to prevent it
-        ** from being resumed by another thread. Specifically we don't want the
-        ** 'main' context to be resumed by any thread other than the primordial
-        ** thread, because after the primordial thread finishes executing the
-        ** main program it has to clean up the Mercury runtime.
-        **
-        ** XXX this solution seems too heavy for the problem at hand
-        */
-        MR_ENGINE(MR_eng_c_depth)++;
-
-        new_element = MR_GC_NEW(MercuryThreadList);
-        new_element->thread = ctxt->MR_ctxt_owner_thread;
-        new_element->next = MR_ENGINE(MR_eng_saved_owners);
-        MR_ENGINE(MR_eng_saved_owners) = new_element;
-
-        ctxt->MR_ctxt_owner_thread = MR_ENGINE(MR_eng_owner_thread);
-
         /*
         ** Save this context and put it on the list of suspended contexts for
         ** this future.
         */
+        ctxt = MR_ENGINE(MR_eng_this_context);
         MR_save_context(ctxt);
+
         ctxt->MR_ctxt_resume = MR_ENTRY(mercury__par_builtin__wait_resume);
         ctxt->MR_ctxt_next = Future->suspended;
         Future->suspended = ctxt;
 
         MR_UNLOCK(&(Future->lock), ""future.wait"");
-        MR_runnext();
 
-        assert(0);
+        MR_ENGINE(MR_eng_this_context) = NULL;
+        MR_runnext();
     }
 
 #else
@@ -235,27 +216,7 @@
         Future = (MR_Future *) MR_sv(1);
         MR_decr_sp(1);
 
-        assert(Future->signalled == 1);
-
-        /* Restore the owning thread in the current context. */
-        assert(MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread
-            == MR_ENGINE(MR_eng_owner_thread));
-        MR_ENGINE(MR_eng_c_depth)--;
-        {
-            MercuryThreadList *tmp;
-            MercuryThread val;
-
-            tmp = MR_ENGINE(MR_eng_saved_owners);
-            if (tmp != NULL)
-            {
-                val = tmp->thread;
-                MR_ENGINE(MR_eng_saved_owners) = tmp->next;
-                MR_GC_free(tmp);
-            } else {
-                val = 0;
-            }
-            MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread = val;
-        }
+        assert(Future->signalled);
 
         /* Return to the caller of par_builtin.wait. */
         MR_r1 = Future->value;
@@ -298,7 +259,7 @@
 "
 #if (!defined MR_HIGHLEVEL_CODE) && (defined MR_THREAD_SAFE)
 
-    assert(Future->signalled == 1);
+    assert(Future->signalled);
     Value = Future->value;
 
 #else
@@ -316,6 +277,7 @@
 #if (!defined MR_HIGHLEVEL_CODE) && (defined MR_THREAD_SAFE)
 
     MR_Context *ctxt;
+    MR_Context *next;
 
     MR_LOCK(&(Future->lock), ""future.signal"");
 
@@ -323,26 +285,24 @@
     ** If the same future is passed twice to a procedure then it
     ** could be signalled twice, but the value must be the same.
     */
-    if (Future->signalled != 0) {
-        assert(Future->signalled == 1);
+    if (Future->signalled) {
         assert(Future->value == Value);
     } else {
-        Future->signalled++;
+        Future->signalled = MR_TRUE;
         Future->value = Value;
     }
 
     /* Schedule all the contexts which are blocking on this future. */
     ctxt = Future->suspended;
     while (ctxt != NULL) {
-        MR_schedule(ctxt);
-        ctxt = ctxt->MR_ctxt_next;
+        next = ctxt->MR_ctxt_next;
+        MR_schedule_context(ctxt);  /* clobbers MR_ctxt_next */
+        ctxt = next;
     }
     Future->suspended = NULL;
 
     MR_UNLOCK(&(Future->lock), ""future.signal"");
 
-    assert(Future->signalled == 1);
-
 #else
 
     MR_fatal_error(""internal error: par_builtin.signal"");
Index: runtime/mercury_context.c
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.c,v
retrieving revision 1.47
diff -u -r1.47 mercury_context.c
--- runtime/mercury_context.c	5 Jul 2006 03:00:43 -0000	1.47
+++ runtime/mercury_context.c	11 Sep 2006 05:06:57 -0000
@@ -33,10 +33,20 @@
 #include "mercury_reg_workarounds.h"    /* for `MR_fd*' stuff */
 
 static  void            MR_init_context_maybe_generator(MR_Context *c,
-                            MR_Generator *gen);
+                            const char *id, MR_Generator *gen);
 
+/*---------------------------------------------------------------------------*/
+
+/*
+** The run queue and spark queue are protected and signalled with the
+** same lock and condition variable.
+*/
 MR_Context              *MR_runqueue_head;
 MR_Context              *MR_runqueue_tail;
+#ifndef MR_HIGHLEVEL_CODE
+  MR_Spark              *MR_spark_queue_head;
+  MR_Spark              *MR_spark_queue_tail;
+#endif
 #ifdef  MR_THREAD_SAFE
   MercuryLock           MR_runqueue_lock;
   MercuryCond           MR_runqueue_cond;
@@ -59,6 +69,11 @@
   static MercuryLock    free_context_list_lock;
 #endif
 
+int MR_num_idle_engines = 0;
+int MR_num_outstanding_contexts_and_sparks = 0;
+
+/*---------------------------------------------------------------------------*/
+
 void
 MR_init_thread_stuff(void)
 {
@@ -87,8 +102,9 @@
 #endif
 }
 
-void 
-MR_init_context(MR_Context *c, const char *id, MR_Generator *gen)
+static void 
+MR_init_context_maybe_generator(MR_Context *c, const char *id,
+    MR_Generator *gen)
 {
     c->MR_ctxt_id = id;
     c->MR_ctxt_next = NULL;
@@ -100,29 +116,29 @@
 #ifndef MR_HIGHLEVEL_CODE
     c->MR_ctxt_succip = MR_ENTRY(MR_do_not_reached);
 
-    if (c->MR_ctxt_detstack_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_detstack_zone);
-    } else if (gen != NULL) {
-        c->MR_ctxt_detstack_zone = MR_create_zone("gen_detstack",
-            0, MR_gen_detstack_size, MR_next_offset(),
-            MR_gen_detstack_zone_size, MR_default_handler);
-    } else {
-        c->MR_ctxt_detstack_zone = MR_create_zone("detstack",
-            0, MR_detstack_size, MR_next_offset(),
-            MR_detstack_zone_size, MR_default_handler);
+    if (c->MR_ctxt_detstack_zone == NULL) {
+        if (gen != NULL) {
+            c->MR_ctxt_detstack_zone = MR_create_zone("gen_detstack",
+                    0, MR_gen_detstack_size, MR_next_offset(),
+                    MR_gen_detstack_zone_size, MR_default_handler);
+        } else {
+            c->MR_ctxt_detstack_zone = MR_create_zone("detstack",
+                    0, MR_detstack_size, MR_next_offset(),
+                    MR_detstack_zone_size, MR_default_handler);
+        }
     }
     c->MR_ctxt_sp = c->MR_ctxt_detstack_zone->MR_zone_min;
 
-    if (c->MR_ctxt_nondetstack_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_nondetstack_zone);
-    } else if (gen != NULL) {
-        c->MR_ctxt_nondetstack_zone = MR_create_zone("gen_nondetstack",
-            0, MR_gen_nonstack_size, MR_next_offset(),
-            MR_gen_nonstack_zone_size, MR_default_handler);
-    } else {
-        c->MR_ctxt_nondetstack_zone = MR_create_zone("nondetstack",
-            0, MR_nondstack_size, MR_next_offset(),
-            MR_nondstack_zone_size, MR_default_handler);
+    if (c->MR_ctxt_nondetstack_zone == NULL) {
+        if (gen != NULL) {
+            c->MR_ctxt_nondetstack_zone = MR_create_zone("gen_nondetstack",
+                    0, MR_gen_nonstack_size, MR_next_offset(),
+                    MR_gen_nonstack_zone_size, MR_default_handler);
+        } else {
+            c->MR_ctxt_nondetstack_zone = MR_create_zone("nondetstack",
+                    0, MR_nondstack_size, MR_next_offset(),
+                    MR_nondstack_zone_size, MR_default_handler);
+        }
     }
     /*
     ** Note that maxfr and curfr point to the last word in the frame,
@@ -147,27 +163,21 @@
             "generator and stack_copy");
     }
 
-    if (c->MR_ctxt_genstack_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_genstack_zone);
-    } else {
+    if (c->MR_ctxt_genstack_zone == NULL) {
         c->MR_ctxt_genstack_zone = MR_create_zone("genstack", 0,
             MR_genstack_size, MR_next_offset(),
             MR_genstack_zone_size, MR_default_handler);
     }
     c->MR_ctxt_gen_next = 0;
 
-    if (c->MR_ctxt_cutstack_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_cutstack_zone);
-    } else {
+    if (c->MR_ctxt_cutstack_zone == NULL) {
         c->MR_ctxt_cutstack_zone = MR_create_zone("cutstack", 0,
             MR_cutstack_size, MR_next_offset(),
             MR_cutstack_zone_size, MR_default_handler);
     }
     c->MR_ctxt_cut_next = 0;
 
-    if (c->MR_ctxt_pnegstack_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_pnegstack_zone);
-    } else {
+    if (c->MR_ctxt_pnegstack_zone == NULL) {
         c->MR_ctxt_pnegstack_zone = MR_create_zone("pnegstack", 0,
             MR_pnegstack_size, MR_next_offset(),
             MR_pnegstack_zone_size, MR_default_handler);
@@ -178,6 +188,10 @@
   #ifdef MR_USE_MINIMAL_MODEL_OWN_STACKS
     c->MR_ctxt_owner_generator = gen;
   #endif /* MR_USE_MINIMAL_MODEL_OWN_STACKS */
+
+    c->MR_ctxt_parent_sp = NULL;
+    c->MR_ctxt_spark_stack = NULL;
+
 #endif /* !MR_HIGHLEVEL_CODE */
 
 #ifdef MR_USE_TRAIL
@@ -185,9 +199,7 @@
         MR_fatal_error("MR_init_context_maybe_generator: generator and trail");
     }
 
-    if (c->MR_ctxt_trail_zone != NULL) {
-        MR_reset_redzone(c->MR_ctxt_trail_zone);
-    } else {
+    if (c->MR_ctxt_trail_zone == NULL) {
         c->MR_ctxt_trail_zone = MR_create_zone("trail", 0,
             MR_trail_size, MR_next_offset(),
             MR_trail_zone_size, MR_default_handler);
@@ -214,6 +226,9 @@
     MR_Context  *c;
 
     MR_LOCK(&free_context_list_lock, "create_context");
+
+    MR_num_outstanding_contexts_and_sparks++;
+
     if (free_context_list == NULL) {
         MR_UNLOCK(&free_context_list_lock, "create_context i");
         c = MR_GC_NEW(MR_Context);
@@ -230,14 +245,30 @@
         MR_UNLOCK(&free_context_list_lock, "create_context ii");
     }
 
-    MR_init_context(c, id, gen);
+    MR_init_context_maybe_generator(c, id, gen);
     return c;
 }
 
 void 
 MR_destroy_context(MR_Context *c)
 {
+    MR_assert(c);
+
+#ifndef MR_HIGHLEVEL_CODE
+    MR_assert(c->MR_ctxt_spark_stack == NULL);
+#endif
+
+    /* XXX not sure if this is an overall win yet */
+#if 0 && defined(MR_CONSERVATIVE_GC) && !defined(MR_HIGHLEVEL_CODE)
+    /* Clear stacks to prevent retention of data. */
+    MR_clear_zone_for_GC(c->MR_ctxt_detstack_zone,
+        c->MR_ctxt_detstack_zone->MR_zone_min);
+    MR_clear_zone_for_GC(c->MR_ctxt_nondetstack_zone,
+        c->MR_ctxt_nondetstack_zone->MR_zone_min);
+#endif /* defined(MR_CONSERVATIVE_GC) && !defined(MR_HIGHLEVEL_CODE) */
+
     MR_LOCK(&free_context_list_lock, "destroy_context");
+    MR_num_outstanding_contexts_and_sparks--;
     c->MR_ctxt_next = free_context_list;
     free_context_list = c;
     MR_UNLOCK(&free_context_list_lock, "destroy_context");
@@ -324,7 +355,7 @@
                 && FD_ISSET(pctxt->fd, &ex_set))
             )
         {
-            MR_schedule(pctxt->context);
+            MR_schedule_context(pctxt->context);
         }
     }
 
@@ -338,10 +369,10 @@
 }
 
 void
-MR_schedule(MR_Context *ctxt)
+MR_schedule_context(MR_Context *ctxt)
 {
+    MR_LOCK(&MR_runqueue_lock, "schedule_context");
     ctxt->MR_ctxt_next = NULL;
-    MR_LOCK(&MR_runqueue_lock, "schedule");
     if (MR_runqueue_tail) {
         MR_runqueue_tail->MR_ctxt_next = ctxt;
         MR_runqueue_tail = ctxt;
@@ -361,11 +392,45 @@
         MR_BROADCAST(&MR_runqueue_cond);
     }
 #endif
-    MR_UNLOCK(&MR_runqueue_lock, "schedule");
+    MR_UNLOCK(&MR_runqueue_lock, "schedule_context");
 }
 
 #ifndef MR_HIGHLEVEL_CODE
 
+void
+MR_schedule_spark_globally(MR_Spark *spark)
+{
+    MR_LOCK(&MR_runqueue_lock, "schedule_spark_globally");
+    if (MR_spark_queue_tail) {
+        MR_spark_queue_tail->MR_spark_next = spark;
+        MR_spark_queue_tail = spark;
+    } else {
+        MR_spark_queue_head = spark;
+        MR_spark_queue_tail = spark;
+    }
+    MR_num_outstanding_contexts_and_sparks++;
+  #ifdef MR_THREAD_SAFE
+    MR_SIGNAL(&MR_runqueue_cond);
+  #endif
+    MR_UNLOCK(&MR_runqueue_lock, "schedule_spark_globally");
+}
+
+void
+MR_schedule_spark_locally(MR_Spark *spark)
+{
+    MR_Context  *ctxt;
+
+    ctxt = MR_ENGINE(MR_eng_this_context);
+
+    /* 
+    ** Only the engine running the context is allowed to access
+    ** the context's spark stack, so no locking is required here.
+    */
+    spark->MR_spark_next = ctxt->MR_ctxt_spark_stack;
+    ctxt->MR_ctxt_spark_stack = spark;
+}
+
+
 MR_define_extern_entry(MR_do_runnext);
 
 MR_BEGIN_MODULE(scheduler_module)
@@ -377,14 +442,24 @@
 {
     MR_Context      *tmp;
     MR_Context      *prev;
+    MR_Spark        *spark;
     unsigned        depth;
     MercuryThread   thd;
 
+    /*
+    ** If this engine is holding onto a context, the context should not be in
+    ** the middle of running some code.
+    */
+    assert(MR_ENGINE(MR_eng_this_context) == NULL ||
+            MR_ENGINE(MR_eng_this_context)->MR_ctxt_spark_stack == NULL);
+
     depth = MR_ENGINE(MR_eng_c_depth);
     thd = MR_ENGINE(MR_eng_owner_thread);
 
     MR_LOCK(&MR_runqueue_lock, "MR_do_runnext (i)");
 
+    MR_num_idle_engines++;
+
     while (1) {
         if (MR_exit_now == MR_TRUE) {
             /*
@@ -395,6 +470,8 @@
             MR_UNLOCK(&MR_runqueue_lock, "MR_do_runnext (ii)");
             MR_destroy_thread(MR_cur_engine());
         }
+
+        /* Search for a ready context which we can handle. */
         tmp = MR_runqueue_head;
         /* XXX check pending io */
         prev = NULL;
@@ -402,17 +479,27 @@
             if ((depth > 0 && tmp->MR_ctxt_owner_thread == thd) ||
                 (tmp->MR_ctxt_owner_thread == (MercuryThread) NULL))
             {
-                break;
+                MR_num_idle_engines--;
+                goto ReadyContext;
             }
             prev = tmp;
             tmp = tmp->MR_ctxt_next;
         }
-        if (tmp != NULL) {
-            break;
+
+        /* Check if the spark queue is nonempty. */
+        spark = MR_spark_queue_head;
+        if (spark != NULL) {
+            MR_num_idle_engines--;
+            MR_num_outstanding_contexts_and_sparks--;
+            goto ReadySpark;
         }
+
+        /* Nothing to do, go back to sleep. */
         MR_WAIT(&MR_runqueue_cond, &MR_runqueue_lock);
     }
-    MR_ENGINE(MR_eng_this_context) = tmp;
+
+  ReadyContext:
+
     if (prev != NULL) {
         prev->MR_ctxt_next = tmp->MR_ctxt_next;
     } else {
@@ -422,11 +509,39 @@
         MR_runqueue_tail = prev;
     }
     MR_UNLOCK(&MR_runqueue_lock, "MR_do_runnext (iii)");
-    MR_load_context(MR_ENGINE(MR_eng_this_context));
-    MR_GOTO(MR_ENGINE(MR_eng_this_context)->MR_ctxt_resume);
+
+    /* Discard whatever unused context we may have and switch to tmp. */
+    if (MR_ENGINE(MR_eng_this_context) != NULL) {
+        MR_destroy_context(MR_ENGINE(MR_eng_this_context));
+    }
+    MR_ENGINE(MR_eng_this_context) = tmp;
+    MR_load_context(tmp);
+    MR_GOTO(tmp->MR_ctxt_resume);
+
+  ReadySpark:
+
+    MR_spark_queue_head = spark->MR_spark_next;
+    if (MR_spark_queue_tail == spark) {
+        MR_spark_queue_tail = NULL;
+    }
+    MR_UNLOCK(&MR_runqueue_lock, "MR_do_runnext (iii)");
+
+    /* Grab a new context if we haven't got one then begin execution. */
+    if (MR_ENGINE(MR_eng_this_context) == NULL) {
+        MR_ENGINE(MR_eng_this_context) = MR_create_context("from spark", NULL);
+        MR_load_context(MR_ENGINE(MR_eng_this_context));
+    }
+    MR_parent_sp = spark->MR_spark_parent_sp;
+    MR_GOTO(spark->MR_spark_resume);
 }
 #else /* !MR_THREAD_SAFE */
 {
+    /*
+    ** We don't support actually putting things in the global spark queue
+    ** in these grades.
+    */
+    assert(MR_spark_queue_head == NULL);
+
     if (MR_runqueue_head == NULL && MR_pending_contexts == NULL) {
         MR_fatal_error("empty runqueue!");
     }
Index: runtime/mercury_context.h
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.h,v
retrieving revision 1.30
diff -u -r1.30 mercury_context.h
--- runtime/mercury_context.h	8 Mar 2006 08:35:38 -0000	1.30
+++ runtime/mercury_context.h	10 Sep 2006 13:37:23 -0000
@@ -23,10 +23,10 @@
 ** by `context_ptr'. The context contains no rN or fN registers - all
 ** registers are "context save" (by analogy to caller-save).
 **
-** When a new context is created information is passed to the new context
-** on the stack. The top stackframe of the current context is copied to
-** become the first det stackframe in the new process. (XXX this will need
-** fixing eventually to include the nondet frame as well.)
+** When a new context is created information is passed to and from the
+** new context via the stack frame of the procedure that originated the
+** parallel conjunction. The code of a parallel conjunct has access
+** to the procedure's stack frame via the `parent_sp' register.
 **
 ** Contexts can migrate transparently between multiple Posix threads.
 **
@@ -101,6 +101,9 @@
 ** pnegstack_zone   The possibly_negated_context stack zone for this context.
 ** pneg_next        The saved pneg_next for this context.
 **
+** parent_sp        The saved parent_sp for this context.
+** spark_stack      The sparks generated by this context, in a stack.
+**
 ** trail_zone       The trail zone for this context.
 ** trail_ptr        The saved MR_trail_ptr for this context.
 ** ticket_counter   The saved MR_ticket_counter for this context.
@@ -114,6 +117,8 @@
 */
 
 typedef struct MR_Context_Struct MR_Context;
+typedef struct MR_Spark_Struct MR_Spark;
+
 struct MR_Context_Struct {
     const char          *MR_ctxt_id;
     MR_Context          *MR_ctxt_next; 
@@ -146,6 +151,9 @@
   #ifdef MR_USE_MINIMAL_MODEL_OWN_STACKS
     MR_Generator        *MR_ctxt_owner_generator;
   #endif /* MR_USE_MINIMAL_MODEL_OWN_STACKS */
+
+    MR_Word             *MR_ctxt_parent_sp;
+    MR_Spark            *MR_ctxt_spark_stack;
 #endif /* !MR_HIGHLEVEL_CODE */
 
 #ifdef  MR_USE_TRAIL
@@ -162,10 +170,41 @@
 };
 
 /*
+** A spark contains just enough information to begin execution of a parallel
+** conjunct.  Sparks are allocated on the heap, and can be stored in a
+** context's spark stack, or in the global spark queue.  In the former case, a
+** spark will eventually be executed on the same context (same detstack, etc.)
+** as the code that generated the spark.  In the latter case the spark can be
+** picked up and executed by any idle engine in a different context.
+**
+** In the current implementation a spark is put on the global spark queue if,
+** at the time a fork instruction is reached, we think the spark has a chance
+** of being picked up for execution by an idle engine.  Otherwise the spark
+** goes on the context's spark stack.  This is an irrevocable decision.
+** A future possibility is to allow idle engines to steal work from the cold
+** end of some context's spark stack.
+*/
+#ifndef MR_HIGHLEVEL_CODE
+struct MR_Spark_Struct {
+    MR_Spark            *MR_spark_next;
+    MR_Code             *MR_spark_resume;
+    MR_Word             *MR_spark_parent_sp;
+};
+#endif
+
+/*
 ** The runqueue is a linked list of contexts that are runnable.
+** The spark_queue is a linked list of sparks that are runnable.
+** We keep them separate to prioritise contexts (which are mainly
+** computations which have already started) over sparks (which are
+** computations which have not begun).
 */
 extern      MR_Context  *MR_runqueue_head;
 extern      MR_Context  *MR_runqueue_tail;
+#ifndef MR_HIGHLEVEL_CODE
+  extern    MR_Spark    *MR_spark_queue_head;
+  extern    MR_Spark    *MR_spark_queue_tail;
+#endif
 #ifdef  MR_THREAD_SAFE
   extern    MercuryLock MR_runqueue_lock;
   extern    MercuryCond MR_runqueue_cond;
@@ -207,6 +246,32 @@
 #endif
 
 /*
+** The number of engines waiting for work.
+** We don't protect it with a separate lock, but updates to it are made while
+** holding the MR_runqueue_lock.  Reads are made without the lock.  We may need
+** to use atomic instructions or memory fences on some architectures.
+*/
+extern  int         MR_num_idle_engines;
+
+/*
+** The number of contexts that are not in the free list (i.e. are executing or
+** suspended) plus the number of sparks in the spark queue.  We count those
+** sparks as they can quickly accumulate on the spark queue before any of them
+** are taken up for execution.  Once they do get taken up, many contexts would
+** need to be allocated to execute them.  Sparks not on the spark queue are
+** currently guaranteed to be executed on their originating context so won't
+** cause allocation of more contexts.
+**
+** What we are mainly interested in here is preventing too many contexts from
+** being allocated, as each context is quite large and we can quickly run out
+** of memory.  Another problem is due to the context free list and conservative
+** garbage collection: every context ever allocated will be scanned.  (Getting
+** the garbage collector not to scan contexts on the free list should be
+** possible though.)
+*/
+extern  int         MR_num_outstanding_contexts_and_sparks;
+
+/*
 ** Initializes a context structure, and gives it the given id. If gen is
 ** non-NULL, the context is for the given generator.
 */
@@ -246,7 +311,19 @@
 ** Append the given context onto the end of the run queue.
 */
 
-extern  void        MR_schedule(MR_Context *ctxt);
+extern  void        MR_schedule_context(MR_Context *ctxt);
+
+#ifndef MR_HIGHLEVEL_CODE
+/*
+** Append the given spark onto the end of the spark queue.
+*/
+extern  void        MR_schedule_spark_globally(MR_Spark *spark);
+
+/*
+** Push the given spark onto the hot end of the context's spark stack.
+*/
+extern  void        MR_schedule_spark_locally(MR_Spark *spark);
+#endif /* !MR_HIGHLEVEL_CODE */
 
 #ifndef MR_HIGHLEVEL_CODE
   MR_declare_entry(MR_do_runnext);
@@ -264,33 +341,31 @@
 
 #ifndef MR_HIGHLEVEL_CODE
   /*
-  ** fork_new_context(MR_Code *child, MR_Code *parent, int numslots):
-  ** create a new context to execute the code at `child', and
-  ** copy the topmost `numslots' from the current stackframe.
-  ** The new context gets put on the runqueue, and the current
-  ** context resumes at `parent'.
+  ** fork_new_child(MR_Code *child);
+  ** create a new spark to execute the code at `child'.  The new spark is put
+  ** on the global spark queue or the context-local spark stack.  The current
+  ** context resumes at `parent'.  MR_parent_sp must already be set
+  ** appropriately before this instruction is executed.
   */
-  #define MR_fork_new_context(child, parent, numslots)          \
+  #define MR_fork_new_child(child)                              \
     do {                                                        \
-        MR_Context  *f_n_c_context;                             \
-        int         fork_new_context_i;                         \
+        MR_Spark *fnc_spark;                                    \
                                                                 \
-        f_n_c_context = MR_create_context("fork_new_context", NULL); \
-        MR_IF_MR_THREAD_SAFE(                                   \
-            f_n_c_context->MR_ctxt_owner_thread = (MercuryThread) NULL; \
-        )                                                       \
-        for (fork_new_context_i = (numslots);                   \
-            fork_new_context_i > 0;                             \
-            fork_new_context_i--)                               \
-        {                                                       \
-            f_n_c_context->MR_ctxt_sp++;                        \
-            *(f_n_c_context->MR_ctxt_sp) = MR_stackvar(fork_new_context_i); \
+        fnc_spark = MR_GC_NEW(MR_Spark);                        \
+        fnc_spark->MR_spark_resume = (child);                   \
+        fnc_spark->MR_spark_parent_sp = MR_parent_sp;           \
+        if (MR_fork_globally_criteria) {                        \
+            MR_schedule_spark_globally(fnc_spark);              \
+        } else {                                                \
+            MR_schedule_spark_locally(fnc_spark);               \
         }                                                       \
-        f_n_c_context->MR_ctxt_resume = (child);                \
-        MR_schedule(f_n_c_context);                             \
-        MR_GOTO(parent);                                        \
     } while (0)
-#endif /* MR_HIGHLEVEL_CODE */
+
+  #define MR_fork_globally_criteria                             \
+    (MR_num_idle_engines != 0 &&                                \
+    MR_num_outstanding_contexts_and_sparks < MR_max_outstanding_contexts)
+
+#endif /* !MR_HIGHLEVEL_CODE */
 
 #ifndef MR_CONSERVATIVE_GC
 
@@ -379,6 +454,9 @@
                 MR_cut_next = load_context_c->MR_ctxt_cut_next;               \
                 MR_pneg_next = load_context_c->MR_ctxt_pneg_next;             \
             )                                                                 \
+            MR_IF_THREAD_SAFE(                                                \
+                MR_parent_sp = load_context_c->MR_ctxt_parent_sp;             \
+            )                                                                 \
         )                                                                     \
         MR_IF_USE_TRAIL(                                                      \
             MR_trail_zone = load_context_c->MR_ctxt_trail_zone;               \
@@ -427,6 +505,9 @@
                 save_context_c->MR_ctxt_cut_next = MR_cut_next;               \
                 save_context_c->MR_ctxt_pneg_next = MR_pneg_next;             \
             )                                                                 \
+            MR_IF_THREAD_SAFE(                                                \
+                save_context_c->MR_ctxt_parent_sp = MR_parent_sp;             \
+            )                                                                 \
         )                                                                     \
         MR_IF_USE_TRAIL(                                                      \
             save_context_c->MR_ctxt_trail_zone = MR_trail_zone;               \
@@ -462,15 +543,22 @@
         MR_save_hp_in_context(save_context_c);                                \
     } while (0)
 
+/*
+** If you change MR_Sync_Term_Struct you need to update configure.in.
+*/
 typedef struct MR_Sync_Term_Struct MR_SyncTerm;
 struct MR_Sync_Term_Struct {
   #ifdef MR_THREAD_SAFE
     MercuryLock     MR_st_lock;
   #endif
+    MR_Context      *MR_st_orig_context;
     int             MR_st_count;
     MR_Context      *MR_st_parent;
 };
 
+#define MR_is_orig_context(ctxt, st)    \
+    ((ctxt) == (st)->MR_st_orig_context)
+
 #define MR_init_sync_term(sync_term, nbranches)                     \
     do {                                                            \
         MR_SyncTerm *st;                                            \
@@ -480,48 +568,89 @@
         MR_IF_THREAD_SAFE(                                          \
             pthread_mutex_init(&(st->MR_st_lock), MR_MUTEX_ATTR);   \
         )                                                           \
+        st->MR_st_orig_context = MR_ENGINE(MR_eng_this_context);    \
         st->MR_st_count = (nbranches);                              \
         st->MR_st_parent = NULL;                                    \
     } while (0)
 
-#define MR_join_and_terminate(sync_term)                            \
-    do {                                                            \
-        MR_SyncTerm *st;                                            \
-                                                                    \
-        st = (MR_SyncTerm *) sync_term;                             \
-        MR_assert(st != NULL);                                      \
-        MR_LOCK(&(st->MR_st_lock), "terminate");                    \
-        (st->MR_st_count)--;                                        \
-        if (st->MR_st_count == 0) {                                 \
-            MR_assert(st->MR_st_parent != NULL);                    \
-            MR_UNLOCK(&(st->MR_st_lock), "terminate i");            \
-            MR_schedule(st->MR_st_parent);                          \
-        } else {                                                    \
-            MR_assert(st->MR_st_count > 0);                         \
-            MR_UNLOCK(&(st->MR_st_lock), "terminate ii");           \
-        }                                                           \
-        MR_destroy_context(MR_ENGINE(MR_eng_this_context));         \
-        MR_runnext();                       \
+#define MR_join_and_continue(sync_term, join_label)                           \
+    do {                                                                      \
+        MR_SyncTerm *st;                                                      \
+                                                                              \
+        st = (MR_SyncTerm *) sync_term;                                       \
+        MR_assert(st != NULL);                                                \
+        MR_LOCK(&(st->MR_st_lock), "continue");                               \
+        MR_assert(st->MR_st_count > 0);                                       \
+        (st->MR_st_count)--;                                                  \
+        if (st->MR_st_count == 0) {                                           \
+            if (MR_is_orig_context(MR_ENGINE(MR_eng_this_context), st)) {     \
+                /*                                                            \
+                ** This context originated this parallel conjunction and      \
+                ** all the branches have finished so jump to the join label.  \
+                */                                                            \
+                MR_assert(st->MR_st_parent == NULL);                          \
+                MR_UNLOCK(&(st->MR_st_lock), "continue i");                   \
+                MR_GOTO(join_label);                                          \
+            } else {                                                          \
+                /*                                                            \
+                ** This context didn't originate this parallel conjunction.   \
+                ** We're the last branch to finish.  The originating          \
+                ** context should be suspended waiting for us to finish,      \
+                ** so wake it up.                                             \
+                */                                                            \
+                MR_assert(st->MR_st_parent != NULL);                          \
+                st->MR_st_parent->MR_ctxt_resume = join_label;                \
+                MR_schedule_context(st->MR_st_parent);                        \
+                MR_UNLOCK(&(st->MR_st_lock), "continue ii");                  \
+                MR_runnext();                                                 \
+            }                                                                 \
+        } else {                                                              \
+            MR_join_and_continue_2(st);                                       \
+        }                                                                     \
     } while (0)
 
-#define MR_join_and_continue(sync_term, where_to)                   \
-    do {                                                            \
-        MR_SyncTerm *st;                                            \
-                                                                    \
-        st = (MR_SyncTerm *) sync_term;                             \
-        MR_assert(st != NULL);                                      \
-        MR_LOCK(&(st->MR_st_lock), "continue");                     \
-        (st->MR_st_count)--;                                        \
-        if (st->MR_st_count == 0) {                                 \
-            MR_UNLOCK(&(st->MR_st_lock), "continue i");             \
-            MR_GOTO((where_to));                                    \
-        }                                                           \
-        MR_assert(st->MR_st_count > 0);                             \
-        MR_save_context(MR_ENGINE(MR_eng_this_context));            \
-        MR_ENGINE(MR_eng_this_context)->MR_ctxt_resume = (where_to);\
-        st->MR_st_parent = MR_ENGINE(MR_eng_this_context);          \
-        MR_UNLOCK(&(st->MR_st_lock), "continue ii");                \
-        MR_runnext();                                               \
-    } while (0)
+#define MR_join_and_continue_2(st)                                            \
+    do {                                                                      \
+        MR_Context  *ctxt;                                                    \
+        MR_Spark    *spark;                                                   \
+                                                                              \
+        ctxt = MR_ENGINE(MR_eng_this_context);                                \
+        spark = ctxt->MR_ctxt_spark_stack;                                    \
+        if (spark && (spark->MR_spark_parent_sp == MR_parent_sp)) {           \
+            /*                                                                \
+            ** The spark at the top of the stack is due to the same parallel  \
+            ** conjunction that we've just been executing. We can immediately \
+            ** execute the next branch of the same parallel conjunction in    \
+            ** the current context.                                           \
+            */                                                                \
+            MR_UNLOCK(&(st->MR_st_lock), "continue_2 i");                     \
+            ctxt->MR_ctxt_spark_stack = spark->MR_spark_next;                 \
+            MR_GOTO(spark->MR_spark_resume);                                  \
+        } else {                                                              \
+            /*                                                                \
+            ** The spark stack is empty or the next spark is from a different \
+            ** parallel conjunction to the one we've been executing.  Either  \
+            ** way, there's nothing more we can do with this context right    \
+            ** now.                                                           \
+            **                                                                \
+            ** If this context originated the parallel conjunction we've been \
+            ** executing, the rest of the parallel conjunction must have been \
+            ** put on the global spark queue to be executed in other          \
+            ** contexts.  This context will need to be resumed once the       \
+            ** parallel conjunction is completed, so suspend the context and  \
+            ** save the address in the sync term.                             \
+            **                                                                \
+            ** Finally look for other work.                                   \
+            */                                                                \
+            if (MR_is_orig_context(MR_ENGINE(MR_eng_this_context), st)) {     \
+                MR_save_context(MR_ENGINE(MR_eng_this_context));              \
+                MR_assert(st->MR_st_parent == NULL);                          \
+                st->MR_st_parent = ctxt;                                      \
+                MR_ENGINE(MR_eng_this_context) = NULL;                        \
+            }                                                                 \
+            MR_UNLOCK(&(st->MR_st_lock), "continue_2 ii");                    \
+            MR_runnext();                                                     \
+        }                                                                     \
+    } while (0)                                                               \
 
 #endif /* not MERCURY_CONTEXT_H */
Index: runtime/mercury_engine.c
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_engine.c,v
retrieving revision 1.51
diff -u -r1.51 mercury_engine.c
--- runtime/mercury_engine.c	28 Sep 2005 08:31:05 -0000	1.51
+++ runtime/mercury_engine.c	11 Sep 2006 00:40:14 -0000
@@ -142,10 +142,10 @@
 #endif
 
 	/*
-	** Finally, allocate an initial context (Mercury thread)
-	** in the engine and initialize the per-context stuff.
+	** Don't allocate a context for this engine until it is actually
+	** needed.
 	*/
-	eng->MR_eng_this_context = MR_create_context("main", NULL);
+	eng->MR_eng_this_context = NULL;
 }
 
 /*---------------------------------------------------------------------------*/
@@ -156,7 +156,9 @@
 	** XXX there are lots of other resources in MercuryEngine that
 	** might need to be finalized.  
 	*/
-	MR_destroy_context(eng->MR_eng_this_context);
+	if (eng->MR_eng_this_context) {
+		MR_destroy_context(eng->MR_eng_this_context);
+	}
 }
 
 /*---------------------------------------------------------------------------*/
@@ -475,19 +477,18 @@
 	*/
 #ifdef	MR_THREAD_SAFE
 	MR_ENGINE(MR_eng_c_depth)++;
-{
-	MercuryThreadList *new_element;
-
-	new_element = MR_GC_NEW(MercuryThreadList);
-	new_element->thread =
-		MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread;
-	new_element->next = MR_ENGINE(MR_eng_saved_owners);
-	MR_ENGINE(MR_eng_saved_owners) = new_element;
-}
+	if (MR_ENGINE(MR_eng_this_context)) {
+		MercuryThreadList *new_element;
 
-	MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread =
-		MR_ENGINE(MR_eng_owner_thread);
+		new_element = MR_GC_NEW(MercuryThreadList);
+		new_element->thread =
+			MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread;
+		new_element->next = MR_ENGINE(MR_eng_saved_owners);
+		MR_ENGINE(MR_eng_saved_owners) = new_element;
 
+		MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread =
+			MR_ENGINE(MR_eng_owner_thread);
+	}
 #endif
 
 	/*
@@ -504,25 +505,25 @@
 	** the current context.
 	*/
 #ifdef	MR_THREAD_SAFE
-
-	assert(MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread
-		== MR_ENGINE(MR_eng_owner_thread));
-	MR_ENGINE(MR_eng_c_depth)--;
-{
-	MercuryThreadList *tmp;
-	MercuryThread val;
-
-	tmp = MR_ENGINE(MR_eng_saved_owners);
-	if (tmp != NULL)
-	{
-		val = tmp->thread;
-		MR_ENGINE(MR_eng_saved_owners) = tmp->next;
-		MR_GC_free(tmp);
-	} else {
-		val = 0;
+	if (MR_ENGINE(MR_eng_this_context)) {
+		MercuryThreadList *tmp;
+		MercuryThread val;
+
+		assert(MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread
+			== MR_ENGINE(MR_eng_owner_thread));
+		MR_ENGINE(MR_eng_c_depth)--;
+
+		tmp = MR_ENGINE(MR_eng_saved_owners);
+		if (tmp != NULL)
+		{
+			val = tmp->thread;
+			MR_ENGINE(MR_eng_saved_owners) = tmp->next;
+			MR_GC_free(tmp);
+		} else {
+			val = 0;
+		}
+		MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread = val;
 	}
-	MR_ENGINE(MR_eng_this_context)->MR_ctxt_owner_thread = val;
-}
 #endif
 
 	MR_debugmsg1("in label `engine_done', locals at %p\n", locals);
Index: runtime/mercury_engine.h
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_engine.h,v
retrieving revision 1.41
diff -u -r1.41 mercury_engine.h
--- runtime/mercury_engine.h	5 Sep 2006 04:23:21 -0000	1.41
+++ runtime/mercury_engine.h	10 Sep 2006 09:20:00 -0000
@@ -359,6 +359,9 @@
     MR_Word             *MR_eng_sol_hp;
     MR_Word             *MR_eng_global_hp;
 #endif
+#ifdef  MR_THREAD_SAFE
+    MR_Word             *MR_eng_parent_sp;
+#endif
     MR_Context          *MR_eng_this_context;
     MR_Context          MR_eng_context;
 #ifdef  MR_USE_MINIMAL_MODEL_OWN_STACKS
Index: runtime/mercury_regs.c
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_regs.c,v
retrieving revision 1.6
diff -u -r1.6 mercury_regs.c
--- runtime/mercury_regs.c	7 Jul 2004 07:11:15 -0000	1.6
+++ runtime/mercury_regs.c	10 Sep 2006 12:28:11 -0000
@@ -69,6 +69,8 @@
 	printf(MR_COUNT_FORMAT, MR_num_uses[MR_PNEG_NEXT_SLOT]);
 	printf("MR_pneg_stack");
 	printf(MR_COUNT_FORMAT, MR_num_uses[MR_PNEG_STACK_SLOT]);
+	printf("MR_parent_sp");
+	printf(MR_COUNT_FORMAT, MR_num_uses[MR_PARENT_SP_SLOT]);
 }
 #endif	/* MR_MEASURE_REGISTER_USAGE */
 
@@ -126,6 +128,8 @@
 	printf(MR_VERIFY_FORMAT, MR_PNEG_NEXT_SLOT);
 	printf("MR_pneg_stack");
 	printf(MR_VERIFY_FORMAT, MR_PNEG_STACK_SLOT);
+	printf("MR_parent_sp");
+	printf(MR_VERIFY_FORMAT, MR_PARENT_SP_SLOT);
 }
 #endif	/* MR_VERIFY_FAKE_REGISTERS */
 
Index: runtime/mercury_regs.h
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_regs.h,v
retrieving revision 1.27
diff -u -r1.27 mercury_regs.h
--- runtime/mercury_regs.h	17 Mar 2006 03:25:54 -0000	1.27
+++ runtime/mercury_regs.h	10 Sep 2006 13:01:05 -0000
@@ -228,8 +228,14 @@
 **	MR_pneg_next		if minimal model is enabled
 **	MR_pneg_stack		if minimal model is enabled
 **
+**	MR_parent_sp		if parallelism is enabled
+**
 **   The number of these registers is given by MR_NUM_SPECIAL_GLOBAL_REG.
 **
+**   XXX in parallel grade we cannot use global variables for these registers.
+**   They need to be fields in the Mercury engine structure.  We already
+**   do this for MR_parent_sp, but incompletely for a few of the others.
+**
 ** The Mercury abstract machine registers layer also provides MR_virtual_r(),
 ** MR_virtual_succip, MR_virtual_hp, etc., which are similar to mr<N>,
 ** MR_succip, MR_hp, etc. except that they always map to the underlying
@@ -239,7 +245,7 @@
 #define MR_MAX_REAL_R_REG		32
 #define MR_MAX_VIRTUAL_R_REG		1024
 #define	MR_NUM_SPECIAL_MAYBE_REAL_REG	6
-#define MR_NUM_SPECIAL_GLOBAL_REG	13
+#define MR_NUM_SPECIAL_GLOBAL_REG	14
 #define MR_NUM_SPECIAL_REG		(MR_NUM_SPECIAL_MAYBE_REAL_REG + \
 					MR_NUM_SPECIAL_GLOBAL_REG)
 
@@ -501,6 +507,8 @@
 #define	MR_PNEG_STACK_SLOT		(MR_FIRST_UNREAL_SLOT + 11)
 #define	MR_PNEG_NEXT_SLOT		(MR_FIRST_UNREAL_SLOT + 12)
 
+#define	MR_PARENT_SP_SLOT		(MR_FIRST_UNREAL_SLOT + 13)
+
 #define	MR_FIRST_UNREAL_R_SLOT		(MR_FIRST_UNREAL_SLOT + \
 					MR_NUM_SPECIAL_GLOBAL_REG)
 
@@ -554,6 +562,9 @@
 #define MR_pneg_stack		MR_count_usage(MR_CUT_STACK_SLOT,	\
 					MR_pneg_stack_var)
 
+#define MR_parent_sp		MR_count_usage(MR_PARENT_SP_SLOT,	\
+					MR_ENGINE(MR_eng_parent_sp))
+
 #define MR_saved_succip_word(save_area)		(save_area[MR_SI_SLOT])
 #define MR_saved_hp_word(save_area)		(save_area[MR_HP_SLOT])
 #define MR_saved_sp_word(save_area)		(save_area[MR_SP_SLOT])
@@ -579,6 +590,8 @@
 #define MR_saved_pneg_next_word(save_area)	(save_area[MR_PNEG_NEXT_SLOT])
 #define MR_saved_pneg_stack_word(save_area)	(save_area[MR_PNEG_STACK_SLOT])
 
+#define MR_saved_parent_sp_word(save_area)	(save_area[MR_PARENT_SP_SLOT])
+
 #define MR_saved_succip(save_area)					\
 	((MR_Code *) MR_saved_succip_word(save_area))
 #define MR_saved_hp(save_area)						\
@@ -619,6 +632,9 @@
 #define MR_saved_pneg_stack(save_area)					\
 	((MR_PNegStackFrame *) MR_saved_pneg_stack_word(save_area))
 
+#define MR_saved_parent_sp(save_area)					\
+	((MR_Word *) MR_saved_parent_sp_word(save_area))
+
 /*
 ** MR_virtual_reg_value(n) accesses the underlying fake_reg for general
 ** register n, while MR_virtual_reg_assign assigns to it.
@@ -666,6 +682,8 @@
 #define MR_virtual_pneg_next		MR_saved_pneg_next(MR_fake_reg)
 #define MR_virtual_pneg_stack		MR_saved_pneg_stack(MR_fake_reg)
 
+#define MR_virtual_parent_sp		MR_saved_parent_sp(MR_fake_reg)
+
 #ifdef	MR_USE_TRAIL
   #define MR_save_trail_registers()					\
 	do {								\
@@ -717,6 +735,21 @@
   #define MR_restore_mm_registers()	((void) 0)
 #endif
 
+#if !defined(MR_HIGHLEVEL_CODE) && defined(MR_THREAD_SAFE)
+  #define MR_save_par_registers()                                       \
+	do {                                                            \
+		MR_saved_parent_sp_word(MR_fake_reg) = (MR_Word)        \
+			MR_parent_sp;                                   \
+	} while (0)
+  #define MR_restore_par_registers()					\
+	do {								\
+		MR_parent_sp = MR_virtual_parent_sp;			\
+	} while (0)
+#else
+  #define MR_save_par_registers()	((void) 0)
+  #define MR_restore_par_registers()	((void) 0)
+#endif
+
 /*
 ** The MR_save_registers() macro copies the physical machine registers
 ** and the global variables holding special purpose abstract machine registers
@@ -741,6 +774,7 @@
 			MR_global_hp;					\
 		MR_save_trail_registers();				\
 		MR_save_mm_registers();					\
+		MR_save_par_registers();				\
 	} while (0)
 
 #define MR_restore_registers() 						\
@@ -752,6 +786,7 @@
 		MR_global_hp = MR_virtual_global_hp;			\
 		MR_restore_trail_registers();				\
 		MR_restore_mm_registers();				\
+		MR_restore_par_registers();				\
 	} while (0)
 
 /*
Index: runtime/mercury_stacks.h
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_stacks.h,v
retrieving revision 1.53
diff -u -r1.53 mercury_stacks.h
--- runtime/mercury_stacks.h	25 Nov 2005 05:40:47 -0000	1.53
+++ runtime/mercury_stacks.h	10 Sep 2006 07:51:39 -0000
@@ -111,6 +111,7 @@
 #define MR_based_stackvar(base_sp, n)   ((base_sp)[1 - (n)])
 #define MR_stackvar(n)                  MR_based_stackvar(MR_sp, (n))
 #define MR_sv(n)                        MR_stackvar(n)
+#define MR_parent_sv(n)                 MR_based_stackvar(MR_parent_sp, (n))
 
 #define MR_incr_sp(n)                                                         \
     do {                                                                      \
Index: runtime/mercury_thread.c
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_thread.c,v
retrieving revision 1.25
diff -u -r1.25 mercury_thread.c
--- runtime/mercury_thread.c	5 Jul 2006 03:00:43 -0000	1.25
+++ runtime/mercury_thread.c	10 Sep 2006 06:03:25 -0000
@@ -105,7 +105,6 @@
     MR_restore_registers();
 #endif
     MR_load_engine_regs(MR_cur_engine());
-    MR_load_context(MR_ENGINE(MR_eng_this_context));
 
     MR_save_registers();
 
@@ -125,6 +124,11 @@
             return MR_FALSE;
 
         case MR_use_now :
+            if (MR_ENGINE(MR_eng_this_context) == NULL) {
+                MR_ENGINE(MR_eng_this_context) =
+                    MR_create_context("init_thread", NULL);
+            }
+            MR_load_context(MR_ENGINE(MR_eng_this_context));
             return MR_TRUE;
         
         default:
Index: runtime/mercury_wrapper.c
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_wrapper.c,v
retrieving revision 1.165
diff -u -r1.165 mercury_wrapper.c
--- runtime/mercury_wrapper.c	28 Aug 2006 10:13:14 -0000	1.165
+++ runtime/mercury_wrapper.c	10 Sep 2006 13:15:31 -0000
@@ -166,6 +166,10 @@
 /* primary cache size to optimize for, in bytes */
 size_t      MR_pcache_size =            8192;
 
+/* soft limits on the number of contexts we can create */
+MR_Unsigned MR_max_contexts_per_thread = 2;
+MR_Unsigned MR_max_outstanding_contexts;
+
 /* file names for mdb's debugger I/O streams */
 const char  *MR_mdb_in_filename = NULL;
 const char  *MR_mdb_out_filename = NULL;
@@ -571,6 +575,9 @@
         for (i = 1 ; i < MR_num_threads ; i++) {
             MR_create_thread(NULL);
         }
+        while (MR_num_idle_engines < MR_num_threads-1) {
+            /* busy wait until the worker threads are ready */
+        }
     }
   #endif /* ! MR_THREAD_SAFE */
 #endif /* ! MR_HIGHLEVEL_CODE */
@@ -664,7 +671,7 @@
     */
     MR_restore_regs_from_mem(c_regs);
 
-} /* end runtime_mercury_init() */
+} /* end mercury_runtime_init() */
 
 #ifdef MR_CONSERVATIVE_GC
 
@@ -1028,6 +1035,7 @@
     MR_GEN_DETSTACK_REDZONE_SIZE_KWORDS,
     MR_GEN_NONSTACK_REDZONE_SIZE,
     MR_GEN_NONSTACK_REDZONE_SIZE_KWORDS,
+    MR_MAX_CONTEXTS_PER_THREAD,
     MR_MDB_TTY,
     MR_MDB_IN,
     MR_MDB_OUT,
@@ -1098,6 +1106,7 @@
     { "gen-nonstack-zone-size",         1, 0, MR_GEN_NONSTACK_REDZONE_SIZE },
     { "gen-nonstack-zone-size-kwords",
             1, 0, MR_GEN_NONSTACK_REDZONE_SIZE_KWORDS },
+    { "max-contexts-per-thread",        1, 0, MR_MAX_CONTEXTS_PER_THREAD },
     { "mdb-tty",                        1, 0, MR_MDB_TTY },
     { "mdb-in",                         1, 0, MR_MDB_IN },
     { "mdb-out",                        1, 0, MR_MDB_OUT },
@@ -1424,6 +1433,14 @@
                 MR_gen_nonstack_zone_size = size * sizeof(MR_Word);
                 break;
 
+            case MR_MAX_CONTEXTS_PER_THREAD:
+                if (sscanf(MR_optarg, "%lu", &size) != 1) {
+                    usage();
+                }
+
+                MR_max_contexts_per_thread = size;
+                break;
+
             case 'i':
             case MR_MDB_IN:
                 MR_mdb_in_filename = MR_copy_string(MR_optarg);
@@ -1720,6 +1737,8 @@
         }
     }
 
+    MR_max_outstanding_contexts = MR_max_contexts_per_thread * MR_num_threads;
+
     if (MR_lld_print_min > 0 || MR_lld_start_name != NULL) {
         MR_lld_print_enabled = 0;
     }
@@ -2301,17 +2320,14 @@
         MR_table_report_statistics(stdout);
     }
 
-#ifndef MR_HIGHLEVEL_CODE
-  #ifdef MR_THREAD_SAFE
+#if !defined(MR_HIGHLEVEL_CODE) && defined(MR_THREAD_SAFE)
+    MR_LOCK(&MR_runqueue_lock, "exit_now");
     MR_exit_now = MR_TRUE;
     pthread_cond_broadcast(&MR_runqueue_cond);
+    MR_UNLOCK(&MR_runqueue_lock, "exit_now");
 
     assert(MR_primordial_thread == pthread_self());
     MR_primordial_thread = (MercuryThread) 0;
-
-    /* XXX seems to be needed or short programs may have no output */
-    fflush(stdout);
-  #endif
 #endif
 
 #if 0 /* XXX the following code breaks on Win32 */
Index: runtime/mercury_wrapper.h
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_wrapper.h,v
retrieving revision 1.74
diff -u -r1.74 mercury_wrapper.h
--- runtime/mercury_wrapper.h	7 Aug 2006 06:21:32 -0000	1.74
+++ runtime/mercury_wrapper.h	10 Sep 2006 05:45:56 -0000
@@ -225,6 +225,15 @@
 /* heap expansion factor for accurate GC (see mercury_accurate_gc.c) */
 extern  double		MR_heap_expansion_factor;
 
+/* number of outstanding contexts we can create per thread (soft limit) */
+extern	MR_Unsigned	MR_contexts_per_thread;
+
+/*
+** number of outstanding contexts we can create 
+** (MR_contexts_per_thread * MR_num_threads)
+*/
+extern	MR_Unsigned	MR_max_outstanding_contexts;
+
 /* file names for the mdb debugging streams */
 extern	const char	*MR_mdb_in_filename;
 extern	const char	*MR_mdb_out_filename;
Index: scripts/Mercury.config.in
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/scripts/Mercury.config.in,v
retrieving revision 1.15
diff -u -r1.15 Mercury.config.in
--- scripts/Mercury.config.in	18 Aug 2006 06:50:37 -0000	1.15
+++ scripts/Mercury.config.in	8 Sep 2006 11:13:45 -0000
@@ -109,6 +109,7 @@
 		--conf-low-tag-bits "@LOW_TAG_BITS@" \
 		--bits-per-word "@BITS_PER_WORD@" \
 		--bytes-per-word "@BYTES_PER_WORD@" \
+		--sync-term-size "@SYNC_TERM_SIZE@" \
 		--dotnet-library-version "@MS_DOTNET_LIBRARY_VERSION@" \
 		@HAVE_DELAY_SLOT@ \
 		@HAVE_BOXED_FLOATS@ \
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list