[m-rev.] for review: parallel execution mechanism (1/2)

Peter Wang wangp at students.csse.unimelb.edu.au
Tue Sep 12 17:11:40 AEST 2006


Estimated hours taken: 100
Branches: main

This patch changes the parallel execution mechanism in the low level backend.
The main idea is that, even in programs with only moderate parallelism, we
won't have enough processors to exploit it all.  We should try to reduce the
cost in the common case, i.e. when a parallel conjunction gets executed
sequentially.  This patch does two things along those lines:

(1) Instead of unconditionally executing all parallel conjuncts (but the last)
in separate Mercury contexts, we allow a context to continue execution of the
next conjunct of a parallel conjunction if it has just finished executing the
previous conjunct.  This saves on allocating unnecessary contexts, which can
be a big reduction in memory usage.

Notice we also try to execute conjuncts left-to-right so as to minimise the
need to suspend contexts when there are dependencies between conjuncts.
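
The idea in (1) can be pictured with a small C sketch.  It is only an
illustration under assumed names: `Context', `fork_next', `join_next' and
`run_par_conj' are hypothetical and are not the runtime's functions.  A single
context records the next conjunct as a spark when it starts a conjunct and
picks that spark up itself when it finishes, so the conjuncts run left to
right without any extra contexts being allocated.

```c
#include <assert.h>

#define MAX_CONJUNCTS 16

/* Hypothetical model of one context (must be zero-initialised). */
typedef struct {
    int sparks[MAX_CONJUNCTS];  /* deferred conjuncts, used as a stack */
    int num_sparks;
    int trace[MAX_CONJUNCTS];   /* order in which conjuncts were run */
    int trace_len;
} Context;

/* "fork": remember the next conjunct instead of creating a new context. */
static void fork_next(Context *ctx, int next_conjunct)
{
    assert(ctx->num_sparks < MAX_CONJUNCTS);
    ctx->sparks[ctx->num_sparks++] = next_conjunct;
}

/* "join": return the conjunct to continue with, or -1 if none is left. */
static int join_next(Context *ctx)
{
    if (ctx->num_sparks == 0) {
        return -1;
    }
    return ctx->sparks[--ctx->num_sparks];
}

/* Run a parallel conjunction of num_conjuncts conjuncts in one context. */
static void run_par_conj(Context *ctx, int num_conjuncts)
{
    int current = 0;

    assert(num_conjuncts >= 1 && num_conjuncts <= MAX_CONJUNCTS);
    while (current >= 0) {
        if (current + 1 < num_conjuncts) {
            fork_next(ctx, current + 1);
        }
        ctx->trace[ctx->trace_len++] = current;  /* "execute" the conjunct */
        current = join_next(ctx);
    }
}
```

Calling `run_par_conj' on a zeroed `Context' with three conjuncts records them
in the order 0, 1, 2 and leaves the spark stack empty.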

(2) Conjuncts that *are* executed in parallel still need separate contexts.
We used to pass variable bindings to those conjuncts by flushing input
variable values to stack slots and copying the procedure's stack frame to the
new context.  When the conjunct finished, we would copy new variable bindings
back to stack slots in the original context.

What happens now is that we don't do any copying back and forth.  We introduce
a new abstract machine register `parent_sp' which points to the location of
the stack pointer at the time that a parallel conjunction began.  In parallel
conjuncts we refer to all stack slots via the `parent_sp' pointer, since we
could be running on a different context altogether and `sp' would be pointing
into a new detstack.  Since parallel conjuncts now share the procedure's stack
frame, we have to allocate stack slots such that all parallel conjuncts in a
procedure that could be executing simultaneously have distinct sets of stack
slots.  We currently use the simplest possible strategy, i.e. don't allow
variables in parallel conjuncts to reuse stack slots.
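
A rough C sketch of the addressing scheme described above.  The names
(`Context', `stackvar', `parent_stackvar', `run_conjunct_in_child') and the
"slot n is n words below the pointer" layout are assumptions made for
illustration; they are not the runtime's actual macros.  The point is only
that a conjunct running on a different context writes its result straight
into the original frame through `parent_sp', so nothing has to be copied back.

```c
#include <assert.h>
#include <stddef.h>

typedef long Word;

/* Hypothetical model of a context with its own det stack. */
typedef struct {
    Word  detstack[64];
    Word *sp;         /* top of this context's own det stack */
    Word *parent_sp;  /* sp of the frame that began the parallel conjunction */
} Context;

/* Give the context a current frame of frame_slots words (at most 64). */
static void context_init(Context *ctx, int frame_slots)
{
    assert(frame_slots >= 0 && frame_slots <= 64);
    ctx->sp = ctx->detstack + frame_slots;
    ctx->parent_sp = NULL;
}

/* Slot n relative to the context's own stack pointer. */
static Word *stackvar(Context *ctx, int n)
{
    return ctx->sp - n;
}

/* Slot n relative to the saved parent stack pointer. */
static Word *parent_stackvar(Context *ctx, int n)
{
    return ctx->parent_sp - n;
}

/*
** A conjunct running in a child context binds an output variable.
** It addresses the slot via parent_sp, i.e. in the original frame,
** not in the child's own (fresh) det stack.
*/
static void run_conjunct_in_child(Context *child, Word *parent_sp,
    int out_slot, Word value)
{
    child->parent_sp = parent_sp;
    *parent_stackvar(child, out_slot) = value;
}
```

After the child runs, the original context can read the binding from its own
frame with `stackvar' using the same slot number.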

Note: in effect parent_sp is a frame pointer which is only set for and used by
the code of parallel conjuncts.  We don't call it a frame pointer as it can be
confused with "frame variables", which have to do with the nondet stack.


compiler/code_info.m:
	Add functionality to keep track of how deep inside of nested parallel
	conjunctions the code generator is.

	Add functionality to acquire and release "persistent" temporary stack
	slots.  Unlike normal temporary stack slots, these don't get implicitly
	released when the code generator's location-dependent state is reset.

	Conform to additions of `parent_sp' and parent stack variables.

compiler/exprn_aux.m:
	Generalise the `substitute_lval_in_*' predicates to
	`transform_lval_in_*' predicates.  Instead of performing a fixed
	substitution, these take a higher order predicate which performs some
	operation on each lval.  Redefine the substitution predicates in terms
	of the transformation predicates.
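
The shape of this generalisation, sketched in C rather than Mercury.  All
names here are hypothetical, lvals are modelled as plain integers, and the
Mercury closure plus accumulator is approximated by a function pointer plus a
`void *' state argument.  The traversal knows nothing about substitution;
substitution is just one transformation passed to it.

```c
#include <assert.h>
#include <stddef.h>

typedef int Lval;   /* hypothetical stand-in for the real lval type */

/* A transformation maps one lval to another and may update its state. */
typedef Lval (*TransformLval)(Lval lval, void *state);

/* Generic traversal: apply the transformation to every lval in turn. */
static void transform_lvals(TransformLval transform, Lval *lvals, size_t n,
    void *state)
{
    size_t i;

    for (i = 0; i < n; i++) {
        lvals[i] = transform(lvals[i], state);
    }
}

/* State for the substitution transformation. */
typedef struct {
    Lval old_lval;
    Lval new_lval;
    int  count;     /* number of substitutions performed */
} Subst;

static Lval substitute_one(Lval lval, void *state)
{
    Subst *subst = state;

    if (lval == subst->old_lval) {
        subst->count++;
        return subst->new_lval;
    }
    return lval;
}

/* Substitution redefined in terms of the generic traversal. */
static int substitute_lvals(Lval old_lval, Lval new_lval, Lval *lvals,
    size_t n)
{
    Subst subst;

    subst.old_lval = old_lval;
    subst.new_lval = new_lval;
    subst.count = 0;
    transform_lvals(substitute_one, lvals, n, &subst);
    return subst.count;
}
```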

	Conform to changes in `fork', `join_and_terminate' and
	`join_and_continue' instructions.

	Conform to additions of `parent_sp' and parent stack variables.

	Remove `substitute_rval_in_args' and `substitute_rval_in_arg' which
	were unused.

compiler/live_vars.m:
	Introduce a new type `parallel_stackvars' which is threaded through
	`build_live_sets_in_goal'.  We accumulate the sets of variables which
	are assigned stack slots in each parallel conjunct.  At the end of
	processing a parallel conjunction, use this information to force
	variables which are assigned stack slots to have distinct slots.

compiler/llds.m:
	Change the semantics of the `fork' instruction.  It now takes a single
	argument: the label of the next conjunct after the current one.  The
	instruction now "sparks" the next conjunct, which is either run in a
	different context (possibly in parallel, on another Mercury engine) or
	queued to be executed in the current context after the current
	conjunct has finished.

	Change the semantics of the `join_and_continue' instruction.  This
	instruction now serves to end all parallel conjuncts, not just the
	last one in a parallel conjunction.

	Remove the `join_and_terminate' instruction (no longer used).

	Add the new abstract machine register `parent_sp'.

	Introduce "parent stack slots", which are the same as normal stack
	slots but relative to the `parent_sp' register.

compiler/par_conj_gen.m:
	Change the code generated for parallel conjunctions.  That is:

	- use the new `fork' instruction at the beginning of a parallel
	  conjunct;

	- use the `join_and_continue' instruction at the end of all parallel
	  conjuncts;

	- keep track of how deep the code generator currently is in parallel
	  conjunctions;

	- set and restore the `parent_sp' register when entering a non-nested
	  parallel conjunction;

	- after generating the code of a parallel conjunct, replace all
	  references to stack slots by parent stack slots;

	- remove code to copy back output variables when a parallel conjunct
	  finishes.

	Update some comments.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Add the type `MR_Spark'.  Sparks are allocated on the heap and contain
	enough information to begin execution of a single parallel conjunct.

	Add globals `MR_spark_queue_head' and `MR_spark_queue_tail'.  These
	are pointers to the start and end of a global queue of sparks.  Idle
	engines can pick up work from this queue in the same way that they can
	pick up work from the global context queue (the "run queue").

	Add new fields to the MR_Context structure.  `MR_ctxt_parent_sp' is a
	saved copy of the `parent_sp' register for when the context is
	suspended.  `MR_ctxt_spark_stack' is a stack of sparks that we decided
	not to put on the global spark queue.

	Update `MR_load_context' and `MR_save_context' to save and restore
	`MR_ctxt_parent_sp'.

	Add the counters `MR_num_idle_engines' and
	`MR_num_outstanding_contexts_and_sparks'.  These are used to decide,
	when a `fork' instruction is reached, whether a spark should be put on
	the global spark queue (with potential for parallelism but also more
	overhead) or on the calling context's spark stack (no parallelism and
	less overhead).
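
A minimal C sketch of how such a decision could look.  The exact criterion is
an assumption (an idle engine exists and the soft limit has not been reached),
as are all the names below, which merely echo the counters described above;
none of this is the runtime's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical spark: just a label and a link. */
typedef struct Spark {
    int           conjunct_label;
    struct Spark *next;
} Spark;

/* Hypothetical scheduler state, one global queue and one local stack. */
typedef struct {
    Spark *global_head;      /* global spark queue (FIFO) */
    Spark *global_tail;
    Spark *local_stack;      /* calling context's spark stack (LIFO) */
    int    num_idle_engines;
    int    num_outstanding;  /* outstanding contexts and sparks */
    int    max_outstanding;  /* soft limit */
} Scheduler;

/* Assumed criterion: someone is idle and we are under the soft limit. */
static int should_fork_globally(const Scheduler *s)
{
    return s->num_idle_engines > 0
        && s->num_outstanding < s->max_outstanding;
}

/* Returns 1 if the spark went on the global queue, 0 if kept locally. */
static int fork_spark(Scheduler *s, Spark *spark)
{
    if (should_fork_globally(s)) {
        spark->next = NULL;
        if (s->global_tail != NULL) {
            s->global_tail->next = spark;
        } else {
            s->global_head = spark;
        }
        s->global_tail = spark;
        s->num_outstanding++;
        return 1;
    }

    /* Cheap path: the current context will run this conjunct itself. */
    spark->next = s->local_stack;
    s->local_stack = spark;
    return 0;
}
```

With no idle engines every spark stays on the local stack; once an engine is
idle, sparks go to the global queue until the soft limit is reached.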

	Rename `MR_init_context' to `MR_init_context_maybe_generator'.  When
	initialising contexts, don't reset redzones of already allocated
	stacks.  It seems to be unnecessary (and the reset implementation is
	buggy anyway, though it's fine on Linux).

	Rename `MR_schedule' to `MR_schedule_context'.  Add new functions
	`MR_schedule_spark_globally' and `MR_schedule_spark_locally'.

	In `MR_do_runnext', add code for idle engines to get work from the
	global spark queue.  Resuming contexts are prioritised over sparks.

	Rename `MR_fork_new_context' to `MR_fork_new_child'.  Change the
	definitions of `MR_fork_new_child' and `MR_join_and_continue' as per
	the new behaviour of the `fork' and `join_and_continue' instructions.
	Delete `MR_join_and_terminate'.

	Add a new field `MR_st_orig_context' to the MR_SyncTerm structure to
	record which context originated the parallel conjunction instance
	represented by a MR_SyncTerm instance, and update `MR_init_sync_term'.
	This is needed by the new behaviour of `MR_join_and_continue'.

	Update some comments.

runtime/mercury_engine.h:
runtime/mercury_regs.c:
runtime/mercury_regs.h:
runtime/mercury_stacks.h:
	Add the abstract machine register `parent_sp' and code to copy it to
	and from the fake_reg array.

	Add a macro `MR_parent_sv' to access stack slots via `parent_sp'.

	Add `MR_eng_parent_sp' to the MercuryEngine structure.

runtime/mercury_wrapper.c:
runtime/mercury_wrapper.h:
	Add Mercury runtime option `--max-contexts-per-thread' which is saved
	in the global variable `MR_max_contexts_per_thread'.  The number
	`MR_max_outstanding_contexts' is derived from this.  It sets a soft
	limit on the number of sparks we put in the global spark queue,
	relative to the number of threads we are running.  We don't want to
	put too many sparks on the global queue if there are plenty of ready
	contexts or sparks already on the global queues, as they are likely to
	result in new contexts being allocated.

	When initially creating worker engines, wait until all the worker
	engines have acknowledged that they are idle before continuing.  This
	is mainly so programs (especially benchmarks and test cases) with only
	a few fork instructions near the beginning of the program don't
	execute the forks before any worker engines are ready, resulting in no
	parallelism.

runtime/mercury_engine.c:
runtime/mercury_thread.c:
	Don't allocate a context at the time a Mercury engine is created.  An
	engine only needs a new context when it is about to pick up a spark.

configure.in:
compiler/options.m:
scripts/Mercury.config.in:
	Update to reflect the extra field in MR_SyncTerm.

	Add the option `--sync-term-size' and actually make use of the sync
	term size calculated during configuration.

compiler/code_util.m:
compiler/continuation_info.m:
compiler/dupelim.m:
compiler/dupproc.m:
compiler/global_data.m:
compiler/hlds_llds.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/stack_layout.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
	Conform to changes in `fork', `join_and_terminate' and
	`join_and_continue' instructions.

	Conform to additions of `parent_sp' and parent stack variables.

	XXX not sure about the changes in stack_layout.m

library/par_builtin.m:
	Conform to changes in the runtime system.


Index: configure.in
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/configure.in,v
retrieving revision 1.471
diff -u -r1.471 configure.in
--- configure.in	10 Sep 2006 23:38:53 -0000	1.471
+++ configure.in	11 Sep 2006 05:07:10 -0000
@@ -1665,6 +1665,7 @@
 	int main() {
 		struct {
 			pthread_mutex_t lock;
+			void		*orig_context;
 			int		count;
 			void		*parent;
 		} x;
Index: compiler/code_info.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/code_info.m,v
retrieving revision 1.330
diff -u -r1.330 code_info.m
--- compiler/code_info.m	6 Sep 2006 04:02:55 -0000	1.330
+++ compiler/code_info.m	11 Sep 2006 05:04:47 -0000
@@ -167,6 +167,14 @@
     %
 :- pred set_instmap(instmap::in, code_info::in, code_info::out) is det.
 
+    % The depth of nested parallel conjunctions.
+    %
+:- pred get_par_conj_depth(code_info::in, int::out) is det.
+
+    % Set the depth of nested parallel conjunctions.
+    %
+:- pred set_par_conj_depth(int::in, code_info::in, code_info::out) is det.
+
     % The number of the last local label allocated.
     %
 :- pred get_label_counter(code_info::in, counter::out) is det.
@@ -252,6 +260,11 @@
 :- pred set_temp_content_map(map(lval, slot_contents)::in,
     code_info::in, code_info::out) is det.
 
+:- pred get_persistent_temps(code_info::in, set(lval)::out) is det.
+
+:- pred set_persistent_temps(set(lval)::in,
+    code_info::in, code_info::out) is det.
+
 :- pred set_closure_layouts(list(layout_data)::in,
     code_info::in, code_info::out) is det.
 
@@ -363,8 +376,14 @@
                                     % fields below. Any keys in that map which
                                     % are not in this set are free for reuse.
 
-                fail_info           :: fail_info
+                fail_info           :: fail_info,
                                     % Information about how to manage failures.
+
+                par_conj_depth      :: int
+                                    % How deep in a nested parallel conjunction
+                                    % we are. This is zero at the beginning of
+                                    % a procedure and increments as we enter
+                                    % parallel conjunctions.
             ).
 
 :- type code_info_persistent
@@ -399,6 +418,11 @@
                                     % slot contains after the end of the
                                     % branched control structure.
 
+                persistent_temps    :: set(lval),
+                                    % Stack slot locations that should not be
+                                    % released even when the code generator
+                                    % resets its location-dependent state.
+
                 closure_layout_seq :: counter,
 
                 closure_layouts     :: list(layout_data),
@@ -461,6 +485,7 @@
     DummyFailInfo = fail_info(ResumePoints, resume_point_unknown,
         may_be_different, not_inside_non_condition, Hijack),
     map.init(TempContentMap),
+    set.init(PersistentTemps),
     set.init(TempsInUse),
     set.init(Zombies),
     map.init(LayoutMap),
@@ -502,7 +527,8 @@
             Zombies,
             VarLocnInfo,
             TempsInUse,
-            DummyFailInfo   % init_fail_info will override this dummy value
+            DummyFailInfo,  % init_fail_info will override this dummy value
+            0               % nested parallel conjunction depth
         ),
         code_info_persistent(
             counter.init(1),
@@ -510,6 +536,7 @@
             LayoutMap,
             0,
             TempContentMap,
+            PersistentTemps,
             counter.init(1),
             [],
             -1,
@@ -557,11 +584,13 @@
 get_var_locn_info(CI, CI ^ code_info_loc_dep ^ var_locn_info).
 get_temps_in_use(CI, CI ^ code_info_loc_dep ^ temps_in_use).
 get_fail_info(CI, CI ^ code_info_loc_dep ^ fail_info).
+get_par_conj_depth(CI, CI ^ code_info_loc_dep ^ par_conj_depth).
 get_label_counter(CI, CI ^ code_info_persistent ^ label_num_src).
 get_succip_used(CI, CI ^ code_info_persistent ^ store_succip).
 get_layout_info(CI, CI ^ code_info_persistent ^ label_info).
 get_max_temp_slot_count(CI, CI ^ code_info_persistent ^ stackslot_max).
 get_temp_content_map(CI, CI ^ code_info_persistent ^ temp_contents).
+get_persistent_temps(CI, CI ^ code_info_persistent ^ persistent_temps).
 get_closure_seq_counter(CI, CI ^ code_info_persistent ^ closure_layout_seq).
 get_closure_layouts(CI, CI ^ code_info_persistent ^ closure_layouts).
 get_max_reg_in_use_at_trace(CI, CI ^ code_info_persistent ^ max_reg_used).
@@ -579,12 +608,14 @@
 set_var_locn_info(EI, CI, CI ^ code_info_loc_dep ^ var_locn_info := EI).
 set_temps_in_use(TI, CI, CI ^ code_info_loc_dep ^ temps_in_use := TI).
 set_fail_info(FI, CI, CI ^ code_info_loc_dep ^ fail_info := FI).
+set_par_conj_depth(N, CI, CI ^ code_info_loc_dep ^ par_conj_depth := N).
 set_label_counter(LC, CI, CI ^ code_info_persistent ^ label_num_src := LC).
 set_succip_used(SU, CI, CI ^ code_info_persistent ^ store_succip := SU).
 set_layout_info(LI, CI, CI ^ code_info_persistent ^ label_info := LI).
 set_max_temp_slot_count(TM, CI,
     CI ^ code_info_persistent ^ stackslot_max := TM).
 set_temp_content_map(CM, CI, CI ^ code_info_persistent ^ temp_contents := CM).
+set_persistent_temps(PT, CI, CI ^ code_info_persistent ^ persistent_temps := PT).
 set_closure_seq_counter(CLS, CI,
     CI ^ code_info_persistent ^ closure_layout_seq := CLS).
 set_closure_layouts(CG, CI, CI ^ code_info_persistent ^ closure_layouts := CG).
@@ -1034,9 +1065,14 @@
 remember_position(CI, position_info(CI)).
 
 reset_to_position(position_info(PosCI), CurCI, NextCI) :-
-    PosCI  = code_info(_, LocDep, _),
-    CurCI  = code_info(Static, _, Persistent),
-    NextCI = code_info(Static, LocDep, Persistent).
+    PosCI   = code_info(_, LocDep, _),
+    CurCI   = code_info(Static, _, Persistent),
+    NextCI0 = code_info(Static, LocDep, Persistent),
+
+    get_persistent_temps(NextCI0, PersistentTemps),
+    get_temps_in_use(NextCI0, TempsInUse0),
+    set.union(PersistentTemps, TempsInUse0, TempsInUse),
+    set_temps_in_use(TempsInUse, NextCI0, NextCI).
 
 reset_resume_known(BranchStart, !CI) :-
     BranchStart = position_info(BranchStartCI),
@@ -3352,6 +3388,8 @@
     ;
         AbsLocn = abs_stackvar(_)
     ;
+        AbsLocn = abs_parent_stackvar(_)
+    ;
         AbsLocn = abs_framevar(_)
     ).
 
@@ -3437,6 +3475,7 @@
     ;
         (
             ( Lval = stackvar(N)
+            ; Lval = parent_stackvar(N)
             ; Lval = framevar(N)
             ),
             N < 0
@@ -3714,6 +3753,18 @@
     %
 :- pred release_temp_slot(lval::in, code_info::in, code_info::out) is det.
 
+    % Acquire a stack slot for storing a temporary. The stack slot is not
+    % implicitly released when the code generator resets its location-dependent
+    % state. The slot_contents description is for accurate gc.
+    %
+:- pred acquire_persistent_temp_slot(slot_contents::in, lval::out,
+    code_info::in, code_info::out) is det.
+
+    % Release a persistent stack slot acquired earlier for a temporary value.
+    %
+:- pred release_persistent_temp_slot(lval::in, code_info::in, code_info::out)
+    is det.
+
     % Return the lval of the stack slot in which the given variable
     % is stored. Aborts if the variable does not have a stack slot
     % an assigned to it.
@@ -3757,6 +3808,18 @@
     set.delete(TempsInUse0, StackVar, TempsInUse),
     set_temps_in_use(TempsInUse, !CI).
 
+acquire_persistent_temp_slot(Item, StackVar, !CI) :-
+    acquire_temp_slot(Item, StackVar, !CI),
+    get_persistent_temps(!.CI, PersistentTemps0),
+    set.insert(PersistentTemps0, StackVar, PersistentTemps),
+    set_persistent_temps(PersistentTemps, !CI).
+
+release_persistent_temp_slot(StackVar, !CI) :-
+    release_temp_slot(StackVar, !CI),
+    get_persistent_temps(!.CI, PersistentTemps0),
+    set.delete(PersistentTemps0, StackVar, PersistentTemps),
+    set_persistent_temps(PersistentTemps, !CI).
+
 %---------------------------------------------------------------------------%
 
 get_variable_slot(CI, Var, Slot) :-
@@ -3791,6 +3854,9 @@
         L = det_slot(N),
         int.max(N, !Max)
     ;
+        L = parent_det_slot(N),
+        int.max(N, !Max)
+    ;
         L = nondet_slot(N),
         int.max(N, !Max)
     ),
Index: compiler/code_util.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/code_util.m,v
retrieving revision 1.174
diff -u -r1.174 code_util.m
--- compiler/code_util.m	22 Aug 2006 05:03:40 -0000	1.174
+++ compiler/code_util.m	11 Sep 2006 05:04:47 -0000
@@ -383,6 +383,7 @@
 
 lvals_in_lval(reg(_, _), []).
 lvals_in_lval(stackvar(_), []).
+lvals_in_lval(parent_stackvar(_), []).
 lvals_in_lval(framevar(_), []).
 lvals_in_lval(succip, []).
 lvals_in_lval(maxfr, []).
@@ -399,6 +400,7 @@
     lvals_in_rval(Rval, Lvals).
 lvals_in_lval(hp, []).
 lvals_in_lval(sp, []).
+lvals_in_lval(parent_sp, []).
 lvals_in_lval(field(_, Rval1, Rval2), Lvals1 ++ Lvals2) :-
     lvals_in_rval(Rval1, Lvals1),
     lvals_in_rval(Rval2, Lvals2).
Index: compiler/continuation_info.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/continuation_info.m,v
retrieving revision 1.78
diff -u -r1.78 continuation_info.m
--- compiler/continuation_info.m	22 Aug 2006 05:03:41 -0000	1.78
+++ compiler/continuation_info.m	11 Sep 2006 05:04:47 -0000
@@ -878,11 +878,13 @@
 live_value_type(lval(redoip_slot(_)), live_value_unwanted).
 live_value_type(lval(succip_slot(_)), live_value_unwanted).
 live_value_type(lval(sp), live_value_unwanted).
+live_value_type(lval(parent_sp), live_value_unwanted).
 live_value_type(lval(lvar(_)), live_value_unwanted).
 live_value_type(lval(field(_, _, _)), live_value_unwanted).
 live_value_type(lval(temp(_, _)), live_value_unwanted).
 live_value_type(lval(reg(_, _)), live_value_unwanted).
 live_value_type(lval(stackvar(_)), live_value_unwanted).
+live_value_type(lval(parent_stackvar(_)), live_value_unwanted).
 live_value_type(lval(framevar(_)), live_value_unwanted).
 live_value_type(lval(mem_ref(_)), live_value_unwanted). % XXX
 live_value_type(lval(global_var_ref(_)), live_value_unwanted).
Index: compiler/dupelim.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/dupelim.m,v
retrieving revision 1.83
diff -u -r1.83 dupelim.m
--- compiler/dupelim.m	20 Aug 2006 05:01:25 -0000	1.83
+++ compiler/dupelim.m	11 Sep 2006 05:04:47 -0000
@@ -416,20 +416,16 @@
         Instr1 = decr_sp_and_return(_),
         Instr = Instr1
     ;
-        Instr1 = fork(_, _, _),
+        Instr1 = fork(_),
         Instr = Instr1
     ;
         Instr1 = init_sync_term(Lval1, N),
         standardize_lval(Lval1, Lval),
         Instr = init_sync_term(Lval, N)
     ;
-        Instr1 = join_and_terminate(Lval1),
+        Instr1 = join_and_continue(Lval1, Label),
         standardize_lval(Lval1, Lval),
-        Instr = join_and_terminate(Lval)
-    ;
-        Instr1 = join_and_continue(Lval1, N),
-        standardize_lval(Lval1, Lval),
-        Instr = join_and_continue(Lval, N)
+        Instr = join_and_continue(Lval, Label)
     ;
         Instr1 = pragma_c(_, _, _, _, _, _, _, _, _),
         Instr = Instr1
@@ -447,8 +443,10 @@
         ; Lval0 = curfr
         ; Lval0 = hp
         ; Lval0 = sp
+        ; Lval0 = parent_sp
         ; Lval0 = temp(_, _)
         ; Lval0 = stackvar(_)
+        ; Lval0 = parent_stackvar(_)
         ; Lval0 = framevar(_)
         ; Lval0 = succip_slot(_)
         ; Lval0 = redoip_slot(_)
@@ -735,10 +733,9 @@
         ; Instr1 = decr_sp(_)
         ; Instr1 = decr_sp_and_return(_)
         ; Instr1 = pragma_c(_, _, _, _, _, _, _, _, _)
-        ; Instr1 = fork(_, _, _)
+        ; Instr1 = fork(_)
         ; Instr1 = init_sync_term(_, _)
         ; Instr1 = join_and_continue(_, _)
-        ; Instr1 = join_and_terminate(_)
         ),
         ( Instr1 = Instr2 ->
             MaybeInstr = yes(Instr1)
@@ -781,6 +778,10 @@
         Lval2 = Lval1,
         Lval = Lval1
     ;
+        Lval1 = parent_sp,
+        Lval2 = Lval1,
+        Lval = Lval1
+    ;
         Lval1 = temp(_, _),
         Lval2 = Lval1,
         Lval = Lval1
@@ -789,6 +790,10 @@
         Lval2 = Lval1,
         Lval = Lval1
     ;
+        Lval1 = parent_stackvar(_),
+        Lval2 = Lval1,
+        Lval = Lval1
+    ;
         Lval1 = framevar(_),
         Lval2 = Lval1,
         Lval = Lval1
Index: compiler/dupproc.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/dupproc.m,v
retrieving revision 1.12
diff -u -r1.12 dupproc.m
--- compiler/dupproc.m	22 Aug 2006 05:03:43 -0000	1.12
+++ compiler/dupproc.m	11 Sep 2006 05:04:47 -0000
@@ -268,17 +268,13 @@
         Instr = decr_sp_and_return(_),
         StdInstr = Instr
     ;
-        Instr = fork(Child, Parent, NumSlots),
+        Instr = fork(Child),
         standardize_label(Child, StdChild, DupProcMap),
-        standardize_label(Parent, StdParent, DupProcMap),
-        StdInstr = fork(StdChild, StdParent, NumSlots)
+        StdInstr = fork(StdChild)
     ;
         Instr = init_sync_term(_, _),
         StdInstr = Instr
     ;
-        Instr = join_and_terminate(_),
-        StdInstr = Instr
-    ;
         Instr = join_and_continue(Lval, Label),
         standardize_label(Label, StdLabel, DupProcMap),
         StdInstr = join_and_continue(Lval, StdLabel)
Index: compiler/exprn_aux.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/exprn_aux.m,v
retrieving revision 1.75
diff -u -r1.75 exprn_aux.m
--- compiler/exprn_aux.m	22 Aug 2006 05:03:44 -0000	1.75
+++ compiler/exprn_aux.m	11 Sep 2006 05:04:47 -0000
@@ -49,6 +49,20 @@
 :- mode args_contain_rval(in, in) is semidet.
 :- mode args_contain_rval(in, out) is nondet.
 
+    % transform_lval_in_instr(Transform, !Instr, !Acc):
+    %
+    % Transform all lvals in !.Instr with the predicate Transform.
+    % An accumulator is threaded through.
+    %
+:- pred transform_lval_in_instr(transform_lval(T)::in(transform_lval),
+    instruction::in, instruction::out, T::in, T::out) is det.
+
+:- pred transform_lval_in_rval(transform_lval(T)::in(transform_lval),
+    rval::in, rval::out, T::in, T::out) is det.
+
+:- type transform_lval(T)   == pred(lval, lval, T, T).
+:- inst transform_lval      == (pred(in, out, in, out) is det).
+
     % substitute_lval_in_instr(OldLval, NewLval, !Instr, !SubstCount):
     %
     % Substitute all occurrences of OldLval in !.Instr with NewLval.
@@ -283,7 +297,9 @@
 vars_in_lval(curfr, []).
 vars_in_lval(hp, []).
 vars_in_lval(sp, []).
+vars_in_lval(parent_sp, []).
 vars_in_lval(stackvar(_SlotNum), []).
+vars_in_lval(parent_stackvar(_SlotNum), []).
 vars_in_lval(framevar(_SlotNum), []).
 vars_in_lval(succip_slot(Rval), Vars) :-
     vars_in_rval(Rval, Vars).
@@ -316,23 +332,15 @@
 
 %-----------------------------------------------------------------------------%
 
-substitute_lval_in_lval(OldLval, NewLval, Lval0, Lval) :-
-    substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval,
-        0, _SubstCount).
-
-substitute_lval_in_rval(OldLval, NewLval, Rval0, Rval) :-
-    substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval,
-        0, _SubstCount).
-
-substitute_lval_in_instr(OldLval, NewLval, Instr0, Instr, !N) :-
+transform_lval_in_instr(Transform, Instr0, Instr, !Acc) :-
     Instr0 = Uinstr0 - Comment,
-    substitute_lval_in_uinstr(OldLval, NewLval, Uinstr0, Uinstr, !N),
+    transform_lval_in_uinstr(Transform, Uinstr0, Uinstr, !Acc),
     Instr = Uinstr - Comment.
 
-:- pred substitute_lval_in_uinstr(lval::in, lval::in,
-    instr::in, instr::out, int::in, int::out) is det.
+:- pred transform_lval_in_uinstr(transform_lval(T)::in(transform_lval),
+    instr::in, instr::out, T::in, T::out) is det.
 
-substitute_lval_in_uinstr(OldLval, NewLval, Uinstr0, Uinstr, !N) :-
+transform_lval_in_uinstr(Transform, Uinstr0, Uinstr, !Acc) :-
     (
         ( Uinstr0 = comment(_Comment)
         ; Uinstr0 = llcall(_, _, _, _, _, _)
@@ -344,130 +352,123 @@
         ; Uinstr0 = incr_sp(_, _)
         ; Uinstr0 = decr_sp(_)
         ; Uinstr0 = decr_sp_and_return(_)
-        ; Uinstr0 = fork(_, _, _)
+        ; Uinstr0 = fork(_)
         ),
         Uinstr = Uinstr0
     ;
         Uinstr0 = livevals(LvalSet0),
         set.to_sorted_list(LvalSet0, Lvals0),
-        list.map_foldl(substitute_lval_in_lval_count(OldLval, NewLval),
-            Lvals0, Lvals, !N),
+        list.map_foldl(Transform, Lvals0, Lvals, !Acc),
         set.list_to_set(Lvals, LvalSet),
         Uinstr = livevals(LvalSet)
     ;
         Uinstr0 = block(TempR, TempF, Instrs0),
-        list.map_foldl(substitute_lval_in_instr(OldLval, NewLval),
-            Instrs0, Instrs, !N),
+        list.map_foldl(transform_lval_in_instr(Transform),
+            Instrs0, Instrs, !Acc),
         Uinstr = block(TempR, TempF, Instrs)
     ;
         Uinstr0 = assign(Lval0, Rval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        Transform(Lval0, Lval, !Acc),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = assign(Lval, Rval)
     ;
         Uinstr0 = computed_goto(Rval0, Labels),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = computed_goto(Rval, Labels)
     ;
         Uinstr0 = arbitrary_c_code(Code, LiveLvals0),
-        substitute_lval_in_live_lval_info(OldLval, NewLval,
-            LiveLvals0, LiveLvals, !N),
+        transform_lval_in_live_lval_info(Transform, LiveLvals0, LiveLvals,
+            !Acc),
         Uinstr = arbitrary_c_code(Code, LiveLvals)
     ;
         Uinstr0 = if_val(Rval0, CodeAddr),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = if_val(Rval, CodeAddr)
     ;
         Uinstr0 = save_maxfr(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = save_maxfr(Lval)
     ;
         Uinstr0 = restore_maxfr(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = restore_maxfr(Lval)
     ;
         Uinstr0 = incr_hp(Lval0, MaybeTag, MO, Rval0, TypeCtor, MayUseAtomic),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        Transform(Lval0, Lval, !Acc),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = incr_hp(Lval, MaybeTag, MO, Rval, TypeCtor, MayUseAtomic)
     ;
         Uinstr0 = mark_hp(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = mark_hp(Lval)
     ;
         Uinstr0 = restore_hp(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = restore_hp(Rval)
     ;
         Uinstr0 = free_heap(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = free_heap(Rval)
     ;
         Uinstr0 = store_ticket(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = store_ticket(Lval)
     ;
         Uinstr0 = reset_ticket(Rval0, Reason),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = reset_ticket(Rval, Reason)
     ;
         Uinstr0 = mark_ticket_stack(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = mark_ticket_stack(Lval)
     ;
         Uinstr0 = prune_tickets_to(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         Uinstr = prune_tickets_to(Rval)
 %   ;
 %       % discard_tickets_to(_) is used only in hand-written code
 %       Uinstr0 = discard_tickets_to(Rval0),
-%       substitute_lval_in_rval(OldLval, NewLval, Rval0, Rval, !N),
+%       transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
 %       Uinstr = discard_tickets_to(Rval)
     ;
         Uinstr0 = pragma_c(Decls, Components0, MayCallMercury,
             MaybeLabel1, MaybeLabel2, MaybeLabel3, MaybeLabel4,
             ReferStackSlot, MayDupl),
-        list.map_foldl(substitute_lval_in_component(OldLval, NewLval),
-            Components0, Components, !N),
+        list.map_foldl(transform_lval_in_component(Transform),
+            Components0, Components, !Acc),
         Uinstr = pragma_c(Decls, Components, MayCallMercury,
             MaybeLabel1, MaybeLabel2, MaybeLabel3, MaybeLabel4,
             ReferStackSlot, MayDupl)
     ;
         Uinstr0 = init_sync_term(Lval0, BranchCount),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = init_sync_term(Lval, BranchCount)
     ;
-        Uinstr0 = join_and_terminate(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
-        Uinstr = join_and_terminate(Lval)
-    ;
         Uinstr0 = join_and_continue(Lval0, Label),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Uinstr = join_and_continue(Lval, Label)
     ).
 
-:- pred substitute_lval_in_component(lval::in, lval::in,
-    pragma_c_component::in, pragma_c_component::out, int::in, int::out) is det.
+:- pred transform_lval_in_component(transform_lval(T)::in(transform_lval),
+    pragma_c_component::in, pragma_c_component::out, T::in, T::out) is det.
 
-substitute_lval_in_component(OldLval, NewLval,
-        Component0, Component, !N) :-
+transform_lval_in_component(Transform, Component0, Component, !Acc) :-
     (
         Component0 = pragma_c_inputs(Inputs0),
-        list.map_foldl(substitute_lval_in_pragma_c_input(OldLval, NewLval),
-            Inputs0, Inputs, !N),
+        list.map_foldl(transform_lval_in_pragma_c_input(Transform),
+            Inputs0, Inputs, !Acc),
         Component = pragma_c_inputs(Inputs)
     ;
         Component0 = pragma_c_outputs(Outputs0),
-        list.map_foldl(substitute_lval_in_pragma_c_output(OldLval, NewLval),
-            Outputs0, Outputs, !N),
+        list.map_foldl(transform_lval_in_pragma_c_output(Transform),
+            Outputs0, Outputs, !Acc),
         Component = pragma_c_outputs(Outputs)
     ;
         Component0 = pragma_c_user_code(_, _),
         Component = Component0
     ;
         Component0 = pragma_c_raw_code(Code, CanBranchAway, LvalSet0),
-        substitute_lval_in_live_lval_info(OldLval, NewLval,
-            LvalSet0, LvalSet, !N),
+        transform_lval_in_live_lval_info(Transform, LvalSet0, LvalSet, !Acc),
         Component = pragma_c_raw_code(Code, CanBranchAway, LvalSet)
     ;
         Component0 = pragma_c_fail_to(_),
@@ -477,93 +478,100 @@
         Component = Component0
     ).
 
-:- pred substitute_lval_in_live_lval_info(lval::in, lval::in,
-    c_code_live_lvals::in, c_code_live_lvals::out, int::in, int::out) is det.
+:- pred transform_lval_in_live_lval_info(transform_lval(T)::in(transform_lval),
+    c_code_live_lvals::in, c_code_live_lvals::out, T::in, T::out) is det.
 
-substitute_lval_in_live_lval_info(_OldLval, _NewLval,
-        no_live_lvals_info, no_live_lvals_info, !N).
-substitute_lval_in_live_lval_info(OldLval, NewLval,
-        live_lvals_info(LvalSet0), live_lvals_info(LvalSet), !N) :-
+transform_lval_in_live_lval_info(_,
+        no_live_lvals_info, no_live_lvals_info, !Acc).
+transform_lval_in_live_lval_info(Transform,
+        live_lvals_info(LvalSet0), live_lvals_info(LvalSet), !Acc) :-
     Lvals0 = set.to_sorted_list(LvalSet0),
-    list.map_foldl(substitute_lval_in_lval_count(OldLval, NewLval),
-        Lvals0, Lvals, !N),
+    list.map_foldl(Transform, Lvals0, Lvals, !Acc),
     set.list_to_set(Lvals, LvalSet).
 
-:- pred substitute_lval_in_pragma_c_input(lval::in, lval::in,
-    pragma_c_input::in, pragma_c_input::out, int::in, int::out) is det.
+:- pred transform_lval_in_pragma_c_input(transform_lval(T)::in(transform_lval),
+    pragma_c_input::in, pragma_c_input::out, T::in, T::out) is det.
 
-substitute_lval_in_pragma_c_input(OldLval, NewLval, Out0, Out,
-        !N) :-
+transform_lval_in_pragma_c_input(Transform, Out0, Out, !Acc) :-
     Out0 = pragma_c_input(Name, VarType, IsDummy, OrigType, Rval0,
         MaybeForeign, BoxPolicy),
-    substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+    transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
     Out = pragma_c_input(Name, VarType, IsDummy, OrigType, Rval,
         MaybeForeign, BoxPolicy).
 
-:- pred substitute_lval_in_pragma_c_output(lval::in, lval::in,
-    pragma_c_output::in, pragma_c_output::out, int::in, int::out) is det.
+:- pred transform_lval_in_pragma_c_output(transform_lval(T)::in(transform_lval),
+    pragma_c_output::in, pragma_c_output::out, T::in, T::out) is det.
 
-substitute_lval_in_pragma_c_output(OldLval, NewLval, Out0, Out, !N) :-
+transform_lval_in_pragma_c_output(Transform, Out0, Out, !Acc) :-
     Out0 = pragma_c_output(Lval0, VarType, IsDummy, OrigType, Name,
         MaybeForeign, BoxPolicy),
-    substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+    Transform(Lval0, Lval, !Acc),
     Out = pragma_c_output(Lval, VarType, IsDummy, OrigType, Name,
         MaybeForeign, BoxPolicy).
 
-:- pred substitute_lval_in_rval_count(lval::in, lval::in,
-    rval::in, rval::out, int::in, int::out) is det.
-
-substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N) :-
+transform_lval_in_rval(Transform, Rval0, Rval, !Acc) :-
     (
         Rval0 = lval(Lval0),
-        substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval, !N),
+        Transform(Lval0, Lval, !Acc),
         Rval = lval(Lval)
     ;
         Rval0 = var(_Var),
         Rval = Rval0
     ;
         Rval0 = mkword(Tag, Rval1),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval1, Rval2, !N),
+        transform_lval_in_rval(Transform, Rval1, Rval2, !Acc),
         Rval = mkword(Tag, Rval2)
     ;
         Rval0 = const(_Const),
         Rval = Rval0
     ;
         Rval0 = unop(Unop, Rval1),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval1, Rval2, !N),
+        transform_lval_in_rval(Transform, Rval1, Rval2, !Acc),
         Rval = unop(Unop, Rval2)
     ;
         Rval0 = binop(Binop, Rval1, Rval2),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval1, Rval3, !N),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval2, Rval4, !N),
+        transform_lval_in_rval(Transform, Rval1, Rval3, !Acc),
+        transform_lval_in_rval(Transform, Rval2, Rval4, !Acc),
         Rval = binop(Binop, Rval3, Rval4)
     ;
         Rval0 = mem_addr(MemRef0),
-        substitute_lval_in_mem_ref(OldLval, NewLval, MemRef0, MemRef, !N),
+        transform_lval_in_mem_ref(Transform, MemRef0, MemRef, !Acc),
         Rval = mem_addr(MemRef)
     ).
 
-:- pred substitute_lval_in_mem_ref(lval::in, lval::in,
-    mem_ref::in, mem_ref::out, int::in, int::out) is det.
+:- pred transform_lval_in_mem_ref(transform_lval(T)::in(transform_lval),
+    mem_ref::in, mem_ref::out, T::in, T::out) is det.
 
-substitute_lval_in_mem_ref(OldLval, NewLval, MemRef0, MemRef, !N) :-
+transform_lval_in_mem_ref(Transform, MemRef0, MemRef, !Acc) :-
     (
         MemRef0 = stackvar_ref(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         MemRef = stackvar_ref(Rval)
     ;
         MemRef0 = framevar_ref(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !Acc),
         MemRef = framevar_ref(Rval)
     ;
         MemRef0 = heap_ref(BaseRval0, Tag, FieldRval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, BaseRval0, BaseRval,
-            !N),
-        substitute_lval_in_rval_count(OldLval, NewLval, FieldRval0, FieldRval,
-            !N),
+        transform_lval_in_rval(Transform, BaseRval0, BaseRval, !Acc),
+        transform_lval_in_rval(Transform, FieldRval0, FieldRval, !Acc),
         MemRef = heap_ref(BaseRval, Tag, FieldRval)
     ).
 
+%-----------------------------------------------------------------------------%
+
+substitute_lval_in_instr(OldLval, NewLval, Instr0, Instr, !N) :-
+    transform_lval_in_instr(substitute_lval_in_lval_count(OldLval, NewLval),
+        Instr0, Instr, !N).
+
+substitute_lval_in_lval(OldLval, NewLval, Lval0, Lval) :-
+    substitute_lval_in_lval_count(OldLval, NewLval, Lval0, Lval,
+        0, _SubstCount).
+
+substitute_lval_in_rval(OldLval, NewLval, Rval0, Rval) :-
+    transform_lval_in_rval(substitute_lval_in_lval_count(OldLval, NewLval),
+        Rval0, Rval, 0, _SubstCount).
+
 :- pred substitute_lval_in_lval_count(lval::in, lval::in,
     lval::in, lval::out, int::in, int::out) is det.
 
@@ -579,6 +587,7 @@
     lval::in, lval::out, int::in, int::out) is det.
 
 substitute_lval_in_lval_count_2(OldLval, NewLval, Lval0, Lval, !N) :-
+    Transform = substitute_lval_in_lval_count(OldLval, NewLval),
     (
         ( Lval0 = reg(_Type, _RegNum)
         ; Lval0 = succip
@@ -586,8 +595,10 @@
         ; Lval0 = curfr
         ; Lval0 = hp
         ; Lval0 = sp
+        ; Lval0 = parent_sp
         ; Lval0 = temp(_Type, _TmpNum)
         ; Lval0 = stackvar(_SlotNum)
+        ; Lval0 = parent_stackvar(_SlotNum)
         ; Lval0 = framevar(_SlotNum)
         ; Lval0 = lvar(_Var)
         ; Lval0 = global_var_ref(_GlobalVarName)
@@ -595,56 +606,35 @@
         Lval = Lval0
     ;
         Lval0 = succip_slot(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = succip_slot(Rval)
     ;
         Lval0 = redoip_slot(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = redoip_slot(Rval)
     ;
         Lval0 = redofr_slot(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = redofr_slot(Rval)
     ;
         Lval0 = succfr_slot(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = succfr_slot(Rval)
     ;
         Lval0 = prevfr_slot(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = prevfr_slot(Rval)
     ;
         Lval0 = field(Tag, Rval1, Rval2),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval1, Rval3, !N),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval2, Rval4, !N),
+        transform_lval_in_rval(Transform, Rval1, Rval3, !N),
+        transform_lval_in_rval(Transform, Rval2, Rval4, !N),
         Lval = field(Tag, Rval3, Rval4)
     ;
         Lval0 = mem_ref(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
+        transform_lval_in_rval(Transform, Rval0, Rval, !N),
         Lval = mem_ref(Rval)
     ).
 
-:- pred substitute_lval_in_args(lval::in, lval::in,
-    list(maybe(rval))::in, list(maybe(rval))::out, int::in, int::out) is det.
-
-substitute_lval_in_args(_OldLval, _NewLval, [], [], !N).
-substitute_lval_in_args(OldLval, NewLval, [M0 | Ms0], [M | Ms], !N) :-
-    substitute_lval_in_arg(OldLval, NewLval, M0, M, !N),
-    substitute_lval_in_args(OldLval, NewLval, Ms0, Ms, !N).
-
-:- pred substitute_lval_in_arg(lval::in, lval::in,
-    maybe(rval)::in, maybe(rval)::out, int::in, int::out) is det.
-
-substitute_lval_in_arg(OldLval, NewLval, MaybeRval0, MaybeRval, !N) :-
-    (
-        MaybeRval0 = yes(Rval0),
-        substitute_lval_in_rval_count(OldLval, NewLval, Rval0, Rval, !N),
-        MaybeRval = yes(Rval)
-    ;
-        MaybeRval0 = no,
-        MaybeRval = MaybeRval0
-    ).
-
 substitute_rval_in_rval(OldRval, NewRval, Rval0, Rval) :-
     ( Rval0 = OldRval ->
         Rval = NewRval
@@ -706,8 +696,10 @@
         ; Lval0 = curfr
         ; Lval0 = hp
         ; Lval0 = sp
+        ; Lval0 = parent_sp
         ; Lval0 = temp(_, _)
         ; Lval0 = stackvar(_)
+        ; Lval0 = parent_stackvar(_)
         ; Lval0 = framevar(_)
         ; Lval0 = global_var_ref(_)
         ; Lval0 = lvar(_)
@@ -744,27 +736,6 @@
         Lval = mem_ref(Rval)
     ).
 
-:- pred substitute_rval_in_args(rval::in, rval::in,
-    list(maybe(rval))::in, list(maybe(rval))::out) is det.
-
-substitute_rval_in_args(_OldRval, _NewRval, [], []).
-substitute_rval_in_args(OldRval, NewRval, [M0 | Ms0], [M | Ms]) :-
-    substitute_rval_in_arg(OldRval, NewRval, M0, M),
-    substitute_rval_in_args(OldRval, NewRval, Ms0, Ms).
-
-:- pred substitute_rval_in_arg(rval::in, rval::in,
-    maybe(rval)::in, maybe(rval)::out) is det.
-
-substitute_rval_in_arg(OldRval, NewRval, MaybeRval0, MaybeRval) :-
-    (
-        MaybeRval0 = yes(Rval0),
-        substitute_rval_in_rval(OldRval, NewRval, Rval0, Rval),
-        MaybeRval = yes(Rval)
-    ;
-        MaybeRval0 = no,
-        MaybeRval = MaybeRval0
-    ).
-
 %-----------------------------------------------------------------------------%
 
 substitute_vars_in_rval([], !Rval).
@@ -886,6 +857,7 @@
 
 lval_addrs(reg(_Type, _RegNum), [], []).
 lval_addrs(stackvar(_SlotNum), [], []).
+lval_addrs(parent_stackvar(_SlotNum), [], []).
 lval_addrs(framevar(_SlotNum), [], []).
 lval_addrs(succip, [], []).
 lval_addrs(maxfr, [], []).
@@ -902,6 +874,7 @@
     rval_addrs(Rval, CodeAddrs, DataAddrs).
 lval_addrs(hp, [], []).
 lval_addrs(sp, [], []).
+lval_addrs(parent_sp, [], []).
 lval_addrs(field(_Tag, Rval1, Rval2), CodeAddrs, DataAddrs) :-
     rval_addrs(Rval1, CodeAddrs1, DataAddrs1),
     rval_addrs(Rval2, CodeAddrs2, DataAddrs2),
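
The helpers in the diff above replace the counting substitute_lval_in_*
family with transform_lval_in_* predicates, which take the per-lval action as
a closure and thread an accumulator of any type T.  A minimal sketch of that
pattern follows.  It uses toy lval and rval types, and the definitions of
transform_lval(T) and its inst are assumptions, since they do not appear in
this part of the diff; substitute_count is a hypothetical stand-in for
substitute_lval_in_lval_count.

```mercury
:- module transform_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.

    % Toy stand-ins for the LLDS types; the real ones have many more functors.
:- type lval
    --->    reg(int)
    ;       stackvar(int).

:- type rval
    --->    lval(lval)
    ;       const(int)
    ;       binop(rval, rval).

    % Assumed definitions: a transformation maps one lval to another
    % while updating an accumulator of type T.
:- type transform_lval(T) == pred(lval, lval, T, T).
:- inst transform_lval == (pred(in, out, in, out) is det).

:- pred transform_lval_in_rval(transform_lval(T)::in(transform_lval),
    rval::in, rval::out, T::in, T::out) is det.

transform_lval_in_rval(Transform, Rval0, Rval, !Acc) :-
    (
        Rval0 = lval(Lval0),
        Transform(Lval0, Lval, !Acc),
        Rval = lval(Lval)
    ;
        Rval0 = const(_),
        Rval = Rval0
    ;
        Rval0 = binop(RvalA0, RvalB0),
        transform_lval_in_rval(Transform, RvalA0, RvalA, !Acc),
        transform_lval_in_rval(Transform, RvalB0, RvalB, !Acc),
        Rval = binop(RvalA, RvalB)
    ).

    % Hypothetical instance: replace OldLval by NewLval and count the
    % number of replacements in the accumulator.
:- pred substitute_count(lval::in, lval::in, lval::in, lval::out,
    int::in, int::out) is det.

substitute_count(OldLval, NewLval, Lval0, Lval, !N) :-
    ( Lval0 = OldLval ->
        Lval = NewLval,
        !:N = !.N + 1
    ;
        Lval = Lval0
    ).

main(!IO) :-
    Rval0 = binop(lval(stackvar(1)), binop(const(7), lval(stackvar(1)))),
    % Partially applying substitute_count yields a transform_lval(int).
    transform_lval_in_rval(substitute_count(stackvar(1), reg(3)),
        Rval0, Rval, 0, Count),
    io.write(Rval, !IO),
    io.nl(!IO),
    io.write_int(Count, !IO),
    io.nl(!IO).
```

Both occurrences of stackvar(1) are rewritten to reg(3), so the accumulator
ends at 2.  A caller that wants a different side result only has to supply a
different closure and accumulator type; the traversal itself stays unchanged.
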
Index: compiler/global_data.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/global_data.m,v
retrieving revision 1.23
diff -u -r1.23 global_data.m
--- compiler/global_data.m	4 Sep 2006 01:47:29 -0000	1.23
+++ compiler/global_data.m	11 Sep 2006 05:04:47 -0000
@@ -948,8 +948,7 @@
         ; Instr0 = decr_sp(_)
         ; Instr0 = decr_sp_and_return(_)
         ; Instr0 = init_sync_term(_, _)
-        ; Instr0 = fork(_, _, _)
-        ; Instr0 = join_and_terminate(_)
+        ; Instr0 = fork(_)
         ; Instr0 = join_and_continue(_, _)
         ),
         Instr = Instr0
@@ -1008,8 +1007,10 @@
         ; Lval0 = curfr
         ; Lval0 = hp
         ; Lval0 = sp
+        ; Lval0 = parent_sp
         ; Lval0 = temp(_, _)
         ; Lval0 = stackvar(_)
+        ; Lval0 = parent_stackvar(_)
         ; Lval0 = framevar(_)
         ; Lval0 = succip_slot(_)
         ; Lval0 = redoip_slot(_)
Index: compiler/hlds_llds.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/hlds_llds.m,v
retrieving revision 1.15
diff -u -r1.15 hlds_llds.m
--- compiler/hlds_llds.m	22 Aug 2006 05:03:47 -0000	1.15
+++ compiler/hlds_llds.m	11 Sep 2006 05:04:48 -0000
@@ -31,6 +31,7 @@
 
 :- type stack_slot
     --->    det_slot(int)
+    ;       parent_det_slot(int)
     ;       nondet_slot(int).
 
     % Maps variables to their stack slots.
@@ -41,6 +42,7 @@
     --->    any_reg
     ;       abs_reg(int)
     ;       abs_stackvar(int)
+    ;       abs_parent_stackvar(int)
     ;       abs_framevar(int).
 
 :- type abs_follow_vars_map ==  map(prog_var, abs_locn).
@@ -704,9 +706,11 @@
 %-----------------------------------------------------------------------------%
 
 stack_slot_num(det_slot(N)) = N.
+stack_slot_num(parent_det_slot(N)) = N.
 stack_slot_num(nondet_slot(N)) = N.
 
 stack_slot_to_abs_locn(det_slot(N)) = abs_stackvar(N).
+stack_slot_to_abs_locn(parent_det_slot(N)) = abs_parent_stackvar(N).
 stack_slot_to_abs_locn(nondet_slot(N)) = abs_framevar(N).
 
 key_stack_slot_to_abs_locn(_, Slot) =
@@ -715,6 +719,8 @@
 abs_locn_to_string(any_reg) = "any_reg".
 abs_locn_to_string(abs_reg(N)) = "r" ++ int_to_string(N).
 abs_locn_to_string(abs_stackvar(N)) = "stackvar" ++ int_to_string(N).
+abs_locn_to_string(abs_parent_stackvar(N)) =
+    "parent_stackvar" ++ int_to_string(N).
 abs_locn_to_string(abs_framevar(N)) = "framevar" ++ int_to_string(N).
 
 %-----------------------------------------------------------------------------%
Index: compiler/jumpopt.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/jumpopt.m,v
retrieving revision 1.92
diff -u -r1.92 jumpopt.m
--- compiler/jumpopt.m	22 Aug 2006 05:03:49 -0000	1.92
+++ compiler/jumpopt.m	11 Sep 2006 05:04:48 -0000
@@ -756,16 +756,12 @@
         % for the last time.
         unexpected(this_file, "instr_list: block")
     ;
-        Uinstr0 = fork(Child0, Parent0, NumSlots),
+        Uinstr0 = fork(Child0),
         short_label(Instrmap, Child0, Child),
-        short_label(Instrmap, Parent0, Parent),
-        (
-            Child = Child0,
-            Parent = Parent0
-        ->
+        ( Child = Child0 ->
             NewRemain = usual_case
         ;
-            Uinstr = fork(Child, Parent, NumSlots),
+            Uinstr = fork(Child),
             Comment = Comment0 ++ " (redirect)",
             Instr = Uinstr - Comment,
             NewRemain = specified([Instr], Instrs0)
@@ -802,7 +798,6 @@
         ; Uinstr0 = incr_hp(_, _, _, _, _, _)
         ; Uinstr0 = restore_hp(_)
         ; Uinstr0 = init_sync_term(_, _)
-        ; Uinstr0 = join_and_terminate(_)
         ),
         NewRemain = usual_case
     ),
@@ -1053,8 +1048,10 @@
 jumpopt.short_labels_lval(_, curfr, curfr).
 jumpopt.short_labels_lval(_, hp, hp).
 jumpopt.short_labels_lval(_, sp, sp).
+jumpopt.short_labels_lval(_, parent_sp, parent_sp).
 jumpopt.short_labels_lval(_, temp(T, N), temp(T, N)).
 jumpopt.short_labels_lval(_, stackvar(N), stackvar(N)).
+jumpopt.short_labels_lval(_, parent_stackvar(N), parent_stackvar(N)).
 jumpopt.short_labels_lval(_, framevar(N), framevar(N)).
 jumpopt.short_labels_lval(_, global_var_ref(Var), global_var_ref(Var)).
 jumpopt.short_labels_lval(Instrmap, succip_slot(Rval0), succip_slot(Rval)) :-
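
The live_vars.m diff that follows threads a parallel_stackvars value through
the traversal: each call adds its stack variables, minus the nonlocals, to
the current conjunct's set, and each finished conjunct pushes that set onto a
list.  A rough sketch of the bookkeeping, with strings standing in for
prog_vars; par_vars, note_call and end_conjunct are hypothetical names used
only for illustration.

```mercury
:- module par_slots_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module set.
:- import_module string.

    % Same shape as parallel_stackvars, with strings in place of prog_vars.
:- type par_vars
    --->    par_vars(
                set(string),        % nonlocal to the parallel conjunction
                list(set(string)),  % sets for the finished conjuncts
                set(string)         % accumulating set, current conjunct
            ).

    % Record the variables that must be on the stack across one call.
:- pred note_call(set(string)::in, par_vars::in, par_vars::out) is det.

note_call(ForwardVars, ParVars0, ParVars) :-
    ParVars0 = par_vars(Nonlocals, Prev, Acc0),
    Acc = Acc0 `set.union` (ForwardVars `set.difference` Nonlocals),
    ParVars = par_vars(Nonlocals, Prev, Acc).

    % Close off the current conjunct and start an empty set for the next.
:- pred end_conjunct(par_vars::in, par_vars::out) is det.

end_conjunct(par_vars(Nonlocals, Prev, Cur),
    par_vars(Nonlocals, [Cur | Prev], set.init)).

main(!IO) :-
    PV0 = par_vars(set.list_to_set(["X"]), [], set.init),
    note_call(set.list_to_set(["X", "A"]), PV0, PV1),
    end_conjunct(PV1, PV2),
    note_call(set.list_to_set(["X", "B"]), PV2, PV3),
    end_conjunct(PV3, PV4),
    PV4 = par_vars(Nonlocals, PerConjunct, _),
    % Every variable in this union needs its own distinct stack slot.
    Need = Nonlocals `set.union` set.union_list(PerConjunct),
    io.write_string(string.join_list(", ", set.to_sorted_list(Need)), !IO),
    io.nl(!IO).
```

Here the shared variable X is kept once in the nonlocal set, while A and B
land in the per-conjunct sets.  The final union holds A, B and X, which
corresponds to the "safe but suboptimal" choice of giving every such variable
a distinct slot.
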
Index: compiler/live_vars.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/live_vars.m,v
retrieving revision 1.125
diff -u -r1.125 live_vars.m
--- compiler/live_vars.m	22 Aug 2006 05:03:50 -0000	1.125
+++ compiler/live_vars.m	11 Sep 2006 05:04:48 -0000
@@ -65,6 +65,7 @@
 :- import_module check_hlds.mode_util.
 :- import_module hlds.arg_info.
 :- import_module hlds.code_model.
+:- import_module hlds.goal_util.
 :- import_module hlds.hlds_data.
 :- import_module hlds.hlds_goal.
 :- import_module hlds.hlds_llds.
@@ -80,6 +81,22 @@
 :- import_module map.
 :- import_module pair.
 
+    % Information about which variables in a parallel conjunction need stack
+    % slots.
+    %
+:- type parallel_stackvars
+    --->    parallel_stackvars(
+                set(prog_var),
+                    % Variables nonlocal to the parallel conjunction which need
+                    % their own stack slots.
+                list(set(prog_var)),
+                    % Variables local to parallel conjuncts prior to the
+                    % current conjunct which need stack slots.
+                set(prog_var)
+                    % Accumulating set of variables local to the current
+                    % parallel conjunct which need stack slots.
+            ).
+
 %-----------------------------------------------------------------------------%
 
 % The stack_slots structure (map(prog_var, lval)) is threaded through the
@@ -88,6 +105,20 @@
 
 build_live_sets_in_goal(Goal0 - GoalInfo0, Goal - GoalInfo, ResumeVars0,
         AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+    ParStackVars0 = parallel_stackvars(set.init, [], set.init),
+    build_live_sets_in_goal(Goal0 - GoalInfo0, Goal - GoalInfo, ResumeVars0,
+            AllocData, !StackAlloc, !Liveness, !NondetLiveness,
+            ParStackVars0, _ParStackVars).
+
+:- pred build_live_sets_in_goal(hlds_goal::in, hlds_goal::out,
+    set(prog_var)::in, alloc_data::in, T::in, T::out,
+    set(prog_var)::in, set(prog_var)::out,
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
+
+build_live_sets_in_goal(Goal0 - GoalInfo0, Goal - GoalInfo, ResumeVars0,
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     goal_info_get_pre_deaths(GoalInfo0, PreDeaths),
     goal_info_get_pre_births(GoalInfo0, PreBirths),
     goal_info_get_post_deaths(GoalInfo0, PostDeaths),
@@ -126,7 +157,7 @@
     ),
 
     build_live_sets_in_goal_2(Goal0, Goal, GoalInfo1, GoalInfo, ResumeVars1,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness),
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
 
     ( goal_is_atomic(Goal0) ->
         true
@@ -157,35 +188,64 @@
     hlds_goal_info::in, hlds_goal_info::out,
     set(prog_var)::in, alloc_data::in, T::in, T::out,
     set(prog_var)::in, set(prog_var)::out,
-    set(prog_var)::in, set(prog_var)::out) is det <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
 build_live_sets_in_goal_2(conj(ConjType, Goals0), conj(ConjType, Goals),
         GoalInfo0, GoalInfo, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness) :-
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     (
         ConjType = plain_conj,
         GoalInfo = GoalInfo0,
         build_live_sets_in_conj(Goals0, Goals, ResumeVars0, AllocData,
-            !StackAlloc, !Liveness, !NondetLiveness)
+            !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars)
     ;
         ConjType = parallel_conj,
-        goal_info_get_code_gen_nonlocals(GoalInfo0, NonLocals),
-        set.union(NonLocals, !.Liveness, LiveSet),
-        % Since each parallel conjunct may be run on a different Mercury engine
-        % to the current engine, we must save all the variables that are live
+        !.ParStackVars = parallel_stackvars(OuterNonLocals,
+            OuterLocalStackVars, OuterAccStackVars0),
+
+        % Since each parallel conjunct may be run in a different Mercury context
+        % to the current context, we must save all the variables that are live
         % or nonlocal to the parallel conjunction. Nonlocal variables that are
         % currently free, but are bound inside one of the conjuncts need a
         % stackslot because they are passed out by reference to that stackslot.
-        NeedInParConj = need_in_par_conj(LiveSet),
-        record_par_conj(NeedInParConj, GoalInfo0, GoalInfo, !StackAlloc),
+        goal_info_get_code_gen_nonlocals(GoalInfo0, NonLocals),
+        set.union(NonLocals, !.Liveness, LiveSet),
+
+        InnerNonLocals = LiveSet `set.union` OuterNonLocals,
+        InnerParStackVars0 = parallel_stackvars(InnerNonLocals, [], set.init),
         build_live_sets_in_par_conj(Goals0, Goals, ResumeVars0, AllocData,
-            !StackAlloc, !.Liveness, !Liveness, !NondetLiveness)
+            !StackAlloc, !Liveness, !NondetLiveness,
+            InnerParStackVars0, InnerParStackVars),
+        InnerParStackVars = parallel_stackvars(_, InnerStackVars, _),
+
+        % This is safe but suboptimal.  It causes all variables which need
+        % stack slots in a parallel conjunction to have distinct stack slots.
+        % Variables local to a single conjunct could share stack slots, as
+        % long as the _sets_ of stack slots allocated to different parallel
+        % conjuncts are distinct.
+        NeedInParConj = need_in_par_conj(InnerNonLocals `set.union`
+            set.union_list(InnerStackVars)),
+        record_par_conj(NeedInParConj, GoalInfo0, GoalInfo, !StackAlloc),
+
+        % All the local variables which needed stack slots in the parallel
+        % conjuncts (InnerStackVars) become part of the accumulating set of
+        % variables that have stack slots.  Variables which are not local to
+        % the parallel conjunction but are needed in it also become part of
+        % the accumulating set.
+        OuterAccStackVars = OuterAccStackVars0
+            `set.union` set.union_list(InnerStackVars)
+            `set.union` (LiveSet `set.difference` OuterNonLocals),
+        !:ParStackVars = parallel_stackvars(OuterNonLocals,
+            OuterLocalStackVars, OuterAccStackVars)
     ).
 
 build_live_sets_in_goal_2(disj(Goals0), disj(Goals), GoalInfo, GoalInfo,
-        ResumeVars0, AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        ResumeVars0, AllocData, !StackAlloc, !Liveness, !NondetLiveness,
+        !ParStackVars) :-
     build_live_sets_in_disj(Goals0, Goals, GoalInfo, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness),
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
     (
         Goals = [First | _],
         First = _ - FirstGoalInfo,
@@ -231,33 +291,37 @@
 
 build_live_sets_in_goal_2(switch(Var, CanFail, Cases0),
         switch(Var, CanFail, Cases), GoalInfo, GoalInfo, ResumeVars0,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     build_live_sets_in_cases(Cases0, Cases, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness).
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars).
 
 build_live_sets_in_goal_2(if_then_else(Vars, Cond0, Then0, Else0),
         if_then_else(Vars, Cond, Then, Else), GoalInfo, GoalInfo,
         ResumeVars0, AllocData, !StackAlloc,
-        Liveness0, Liveness, NondetLiveness0, NondetLiveness) :-
+        Liveness0, Liveness, NondetLiveness0, NondetLiveness, !ParStackVars) :-
     build_live_sets_in_goal(Cond0, Cond, ResumeVars0, AllocData, !StackAlloc,
-        Liveness0, LivenessCond, NondetLiveness0, NondetLivenessCond),
+        Liveness0, LivenessCond, NondetLiveness0, NondetLivenessCond,
+        !ParStackVars),
     build_live_sets_in_goal(Then0, Then, ResumeVars0, AllocData, !StackAlloc,
-        LivenessCond, _LivenessThen, NondetLivenessCond, NondetLivenessThen),
+        LivenessCond, _LivenessThen, NondetLivenessCond, NondetLivenessThen,
+        !ParStackVars),
     build_live_sets_in_goal(Else0, Else, ResumeVars0, AllocData, !StackAlloc,
-        Liveness0, Liveness, NondetLiveness0, NondetLivenessElse),
+        Liveness0, Liveness, NondetLiveness0, NondetLivenessElse,
+        !ParStackVars),
     set.union(NondetLivenessThen, NondetLivenessElse, NondetLiveness).
 
 build_live_sets_in_goal_2(negation(Goal0), negation(Goal), GoalInfo, GoalInfo,
-        ResumeVars0, AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        ResumeVars0, AllocData, !StackAlloc, !Liveness, !NondetLiveness,
+        !ParStackVars) :-
     build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness).
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars).
 
 build_live_sets_in_goal_2(scope(Reason, Goal0), scope(Reason, Goal),
         GoalInfo, GoalInfo, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness) :-
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     NondetLiveness0 = !.NondetLiveness,
     build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-        !StackAlloc, !Liveness, !NondetLiveness),
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
 
     % If the "some" goal cannot succeed more than once, then execution cannot
     % backtrack into the inner goal once control has left it. Therefore the
@@ -272,7 +336,7 @@
     ).
 
 build_live_sets_in_goal_2(Goal, Goal, GoalInfo0, GoalInfo, ResumeVars0,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     Goal = generic_call(GenericCall, ArgVars, Modes, _Det),
     ( GenericCall = cast(_) ->
         GoalInfo = GoalInfo0
@@ -284,11 +348,11 @@
         arg_info.partition_generic_call_args(ModuleInfo, ArgVars,
             Types, Modes, _InVars, OutVars, _UnusedVars),
         build_live_sets_in_call(OutVars, GoalInfo0, GoalInfo, ResumeVars0,
-            AllocData, !StackAlloc, !.Liveness, !NondetLiveness)
+            AllocData, !StackAlloc, !.Liveness, !NondetLiveness, !ParStackVars)
     ).
 
 build_live_sets_in_goal_2(Goal, Goal, GoalInfo0, GoalInfo, ResumeVars0,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     Goal = plain_call(PredId, ProcId, ArgVars, Builtin, _, _),
     ModuleInfo = AllocData ^ module_info,
     CallerProcInfo = AllocData ^ proc_info,
@@ -300,11 +364,13 @@
         GoalInfo = GoalInfo0
     ;
         build_live_sets_in_call(OutVars, GoalInfo0, GoalInfo,
-            ResumeVars0, AllocData, !StackAlloc, !.Liveness, !NondetLiveness)
+            ResumeVars0, AllocData, !StackAlloc, !.Liveness, !NondetLiveness,
+            !ParStackVars)
     ).
 
 build_live_sets_in_goal_2(Goal, Goal, GoalInfo, GoalInfo,
-        _ResumeVars0, _AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        _ResumeVars0, _AllocData, !StackAlloc, !Liveness, !NondetLiveness,
+        !ParStackVars) :-
     Goal = unify(_, _, _, Unification, _),
     ( Unification = complicated_unify(_, _, _) ->
         unexpected(this_file, "build_live_sets_in_goal_2: complicated_unify")
@@ -313,7 +379,7 @@
     ).
 
 build_live_sets_in_goal_2(Goal, Goal, GoalInfo0, GoalInfo, ResumeVars0,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     Goal = call_foreign_proc(Attributes, PredId, ProcId, Args, _, _, _),
     ModuleInfo = AllocData ^ module_info,
     CallerProcInfo = AllocData ^ proc_info,
@@ -340,10 +406,11 @@
         % that may be needed at an enclosing resumption point.
 
         build_live_sets_in_call(OutVars, GoalInfo0, GoalInfo,
-            ResumeVars0, AllocData, !StackAlloc, !.Liveness, !NondetLiveness)
+            ResumeVars0, AllocData, !StackAlloc, !.Liveness, !NondetLiveness,
+            !ParStackVars)
     ).
 
-build_live_sets_in_goal_2(shorthand(_), _,_,_,_,_,_,_,_,_,_,_) :-
+build_live_sets_in_goal_2(shorthand(_), _,_,_,_,_,_,_,_,_,_,_,_,_) :-
     % these should have been expanded out by now
     unexpected(this_file, "build_live_sets_in_goal_2: unexpected shorthand").
 
@@ -357,11 +424,12 @@
     %
 :- pred build_live_sets_in_call(set(prog_var)::in, hlds_goal_info::in,
     hlds_goal_info::out, set(prog_var)::in, alloc_data::in, T::in, T::out,
-    set(prog_var)::in, set(prog_var)::in, set(prog_var)::out) is det
-    <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
 build_live_sets_in_call(OutVars, GoalInfo0, GoalInfo, ResumeVars0, AllocData,
-        !StackAlloc, Liveness, !NondetLiveness) :-
+        !StackAlloc, Liveness, !NondetLiveness, !ParStackVars) :-
 
     set.difference(Liveness, OutVars, ForwardVars0),
 
@@ -392,69 +460,90 @@
         set.union(!.NondetLiveness, ForwardVars, !:NondetLiveness)
     ;
         true
-    ).
+    ),
+
+    % Within a parallel conjunction, the stack slots needed by one conjunct
+    % must not be reused by any other conjunct.  We keep track of which
+    % variables have been allocated stack slots in each conjunct.
+
+    !.ParStackVars = parallel_stackvars(Nonlocals, ParallelVars, AccVars0),
+    AccVars = AccVars0 `set.union` (ForwardVars `set.difference` Nonlocals),
+    !:ParStackVars = parallel_stackvars(Nonlocals, ParallelVars, AccVars).
 
 %-----------------------------------------------------------------------------%
 
 :- pred build_live_sets_in_conj(list(hlds_goal)::in, list(hlds_goal)::out,
     set(prog_var)::in, alloc_data::in, T::in, T::out,
     set(prog_var)::in, set(prog_var)::out,
-    set(prog_var)::in, set(prog_var)::out) is det <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
-build_live_sets_in_conj([], [], _, _, !StackAlloc, !Liveness, !NondetLiveness).
+build_live_sets_in_conj([], [], _, _, !StackAlloc, !Liveness, !NondetLiveness,
+        !ParStackVars).
 build_live_sets_in_conj([Goal0 | Goals0], [Goal | Goals], ResumeVars0,
-        AllocData, !StackAlloc, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars) :-
     (
         Goal0 = _ - GoalInfo,
         goal_info_get_instmap_delta(GoalInfo, InstMapDelta),
         instmap_delta_is_unreachable(InstMapDelta)
     ->
         build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-            !StackAlloc, !Liveness, !NondetLiveness),
+            !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
         Goals = [] % XXX was Goals = Goal0
     ;
         build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-            !StackAlloc, !Liveness, !NondetLiveness),
+            !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars),
         build_live_sets_in_conj(Goals0, Goals, ResumeVars0, AllocData,
-            !StackAlloc, !Liveness, !NondetLiveness)
+            !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars)
     ).
 
 %-----------------------------------------------------------------------------%
 
 :- pred build_live_sets_in_par_conj(list(hlds_goal)::in, list(hlds_goal)::out,
     set(prog_var)::in, alloc_data::in, T::in, T::out,
-    set(prog_var)::in, set(prog_var)::in, set(prog_var)::out,
-    set(prog_var)::in, set(prog_var)::out) is det <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::out,
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
 build_live_sets_in_par_conj([], [], _, _,
-        !StackAlloc, _Liveness0, !Liveness, !NondetLiveness).
+        !StackAlloc, Liveness, Liveness, !NondetLiveness,
+        ParStackVars, ParStackVars).
 build_live_sets_in_par_conj([Goal0 | Goals0], [Goal | Goals], ResumeVars0,
-        AllocData, !StackAlloc, Liveness0, !Liveness, !NondetLiveness) :-
+        AllocData, !StackAlloc, Liveness0, Liveness, !NondetLiveness,
+        ParStackVars0, ParStackVars) :-
     build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-        !StackAlloc, Liveness0, Liveness1, !NondetLiveness),
-    set.union(Liveness1, !Liveness),
+        !StackAlloc, Liveness0, Liveness, !NondetLiveness,
+        ParStackVars0, ParStackVars1),
+    ParStackVars1 = parallel_stackvars(Nonlocals, PrevSets1, CurSet1),
+    ParStackVars2 = parallel_stackvars(Nonlocals, [CurSet1 | PrevSets1],
+        set.init),
     build_live_sets_in_par_conj(Goals0, Goals, ResumeVars0, AllocData,
-        !StackAlloc, Liveness0, !Liveness, !NondetLiveness).
+        !StackAlloc, Liveness0, _Liveness1, !NondetLiveness,
+        ParStackVars2, ParStackVars).
 
 %-----------------------------------------------------------------------------%
 
 :- pred build_live_sets_in_disj(list(hlds_goal)::in, list(hlds_goal)::out,
     hlds_goal_info::in, set(prog_var)::in, alloc_data::in,
     T::in, T::out, set(prog_var)::in, set(prog_var)::out,
-    set(prog_var)::in, set(prog_var)::out) is det <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
 build_live_sets_in_disj([], [], _, _, _,
-        !StackAlloc, !Liveness, !NondetLiveness).
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars).
 build_live_sets_in_disj([Goal0 | Goals0], [Goal | Goals],
         DisjGoalInfo, ResumeVars0, AllocData, !StackAlloc,
-        Liveness0, Liveness, NondetLiveness0, NondetLiveness) :-
+        Liveness0, Liveness, NondetLiveness0, NondetLiveness, !ParStackVars) :-
     Goal = _ - GoalInfo,
     build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
         !StackAlloc, Liveness0, Liveness,
-        NondetLiveness0, NondetLiveness1),
+        NondetLiveness0, NondetLiveness1, !ParStackVars),
     build_live_sets_in_disj(Goals0, Goals, DisjGoalInfo, ResumeVars0,
         AllocData, !StackAlloc, Liveness0, _Liveness2,
-        NondetLiveness0, NondetLiveness2),
+        NondetLiveness0, NondetLiveness2, !ParStackVars),
     goal_info_get_code_model(DisjGoalInfo, DisjCodeModel),
     ( DisjCodeModel = model_non ->
         % NondetLiveness should be a set of prog_var sets. Instead of taking
@@ -479,17 +568,22 @@
 :- pred build_live_sets_in_cases(list(case)::in, list(case)::out,
     set(prog_var)::in, alloc_data::in, T::in, T::out,
     set(prog_var)::in, set(prog_var)::out,
-    set(prog_var)::in, set(prog_var)::out) is det <= stack_alloc_info(T).
+    set(prog_var)::in, set(prog_var)::out,
+    parallel_stackvars::in, parallel_stackvars::out)
+    is det <= stack_alloc_info(T).
 
 build_live_sets_in_cases([], [], _, _,
-        !StackAlloc, !Liveness, !NondetLiveness).
+        !StackAlloc, !Liveness, !NondetLiveness, !ParStackVars).
 build_live_sets_in_cases([case(Cons, Goal0) | Cases0],
         [case(Cons, Goal) | Cases], ResumeVars0, AllocData,
-        !StackAlloc, Liveness0, Liveness, NondetLiveness0, NondetLiveness) :-
+        !StackAlloc, Liveness0, Liveness, NondetLiveness0, NondetLiveness,
+        !ParStackVars) :-
     build_live_sets_in_goal(Goal0, Goal, ResumeVars0, AllocData,
-        !StackAlloc, Liveness0, Liveness, NondetLiveness0, NondetLiveness1),
+        !StackAlloc, Liveness0, Liveness, NondetLiveness0, NondetLiveness1,
+        !ParStackVars),
     build_live_sets_in_cases(Cases0, Cases, ResumeVars0, AllocData,
-        !StackAlloc, Liveness0, _Liveness2, NondetLiveness0, NondetLiveness2),
+        !StackAlloc, Liveness0, _Liveness2, NondetLiveness0, NondetLiveness2,
+        !ParStackVars),
     set.union(NondetLiveness1, NondetLiveness2, NondetLiveness).
 
 %-----------------------------------------------------------------------------%
Index: compiler/livemap.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/livemap.m,v
retrieving revision 1.79
diff -u -r1.79 livemap.m
--- compiler/livemap.m	20 Aug 2006 05:01:27 -0000	1.79
+++ compiler/livemap.m	11 Sep 2006 05:04:48 -0000
@@ -284,9 +284,7 @@
     ;
         Uinstr0 = init_sync_term(_, _)
     ;
-        Uinstr0 = fork(_, _, _)
-    ;
-        Uinstr0 = join_and_terminate(_)
+        Uinstr0 = fork(_)
     ;
         Uinstr0 = join_and_continue(_, _)
     ;
Index: compiler/llds.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/llds.m,v
retrieving revision 1.339
diff -u -r1.339 llds.m
--- compiler/llds.m	4 Sep 2006 01:47:31 -0000	1.339
+++ compiler/llds.m	11 Sep 2006 05:04:48 -0000
@@ -481,24 +481,17 @@
             % documentation in par_conj_gen.m and runtime/mercury_context.{c,h}
             % for further information about synchronisation terms.)
 
-    ;       fork(label, label, int)
-            % Create a new context. fork(Child, Parent, NumSlots) creates
-            % a new thread which will start executing at Child. After spawning
-            % execution in the child, control branches to Parent. NumSlots is
-            % the number of stack slots that need to be copied to the child's
-            % stack (see comments in runtime/mercury_context.{h,c}).
-
-    ;       join_and_terminate(lval)
-            % Signal that this thread of execution has finished in the current
-            % parallel conjunction, then terminate it. The synchronisation term
-            % is specified by the given lval. (See the documentation in
-            % par_conj_gen.m and runtime/mercury_context.{c,h} for further
-            % information about synchronisation terms.)
+    ;       fork(label)
+            % Create a new spark. fork(Child) creates a spark that will begin
+            % execution at Child. Control continues at the next instruction.
 
     ;       join_and_continue(lval, label).
             % Signal that this thread of execution has finished in the current
-            % parallel conjunction, then branch to the given label. The
-            % synchronisation term is specified by the given lval.
+            % parallel conjunct.  How to proceed at the end of a parallel
+            % conjunct is quite involved; see runtime/mercury_context.{c,h}.
+            % The synchronisation term is specified by the given lval.
+            % The label gives the address of the code following the parallel
+            % conjunction.
 
 :- type nondet_frame_info
     --->    temp_frame(
@@ -777,7 +770,15 @@
             % Virtual machine register holding the heap pointer.
 
     ;       sp
-            % Virtual machine register point to the top of det stack.
+            % Virtual machine register pointing to the top of det stack.
+
+    ;       parent_sp
+            % Virtual machine register holding the value that sp had when
+            % the enclosing parallel conjunction began. It is set only at the
+            % beginning of a parallel conjunction (and restored afterwards).
+            % Parallel conjuncts which refer to stack slots use this register
+            % instead of sp, as they could be running in a different context,
+            % where sp would be pointing into a different det stack.
 
     ;       temp(reg_type, int)
             % A local temporary register. These temporary registers are
@@ -795,6 +796,11 @@
             % current value of `sp'. These are used in both det and semidet
             % code. Stackvar slot numbers start at 1.
 
+    ;       parent_stackvar(int)
+            % A det stack slot. The number is the offset relative to the
+            % value of `parent_sp'. These are used only in the code
+            % of parallel conjuncts. Stackvar slot numbers start at 1.
+
     ;       framevar(int)
             % A nondet stack slot. The reference is relative to the current
             % value of `curfr'. These are used in nondet code. Framevar slot
@@ -1105,6 +1111,7 @@
 %-----------------------------------------------------------------------------%
 
 stack_slot_to_lval(det_slot(N)) = stackvar(N).
+stack_slot_to_lval(parent_det_slot(N)) = parent_stackvar(N).
 stack_slot_to_lval(nondet_slot(N)) = framevar(N).
 
 key_stack_slot_to_lval(_, Slot) =
@@ -1113,12 +1120,15 @@
 abs_locn_to_lval_or_any_reg(any_reg) = loa_any_reg.
 abs_locn_to_lval_or_any_reg(abs_reg(N)) = loa_lval(reg(reg_r, N)).
 abs_locn_to_lval_or_any_reg(abs_stackvar(N)) = loa_lval(stackvar(N)).
+abs_locn_to_lval_or_any_reg(abs_parent_stackvar(N))
+    = loa_lval(parent_stackvar(N)).
 abs_locn_to_lval_or_any_reg(abs_framevar(N)) = loa_lval(framevar(N)).
 
 abs_locn_to_lval(any_reg) = _ :-
     unexpected(this_file, "abs_locn_to_lval: any_reg").
 abs_locn_to_lval(abs_reg(N)) = reg(reg_r, N).
 abs_locn_to_lval(abs_stackvar(N)) = stackvar(N).
+abs_locn_to_lval(abs_parent_stackvar(N)) = parent_stackvar(N).
 abs_locn_to_lval(abs_framevar(N)) = framevar(N).
 
 key_abs_locn_to_lval(_, AbsLocn) =
@@ -1146,9 +1156,11 @@
 lval_type(curfr, data_ptr).
 lval_type(hp, data_ptr).
 lval_type(sp, data_ptr).
+lval_type(parent_sp, data_ptr).
 lval_type(temp(RegType, _), Type) :-
     register_type(RegType, Type).
 lval_type(stackvar(_), word).
+lval_type(parent_stackvar(_), word).
 lval_type(framevar(_), word).
 lval_type(succip_slot(_), code_ptr).
 lval_type(redoip_slot(_), code_ptr).
Index: compiler/llds_out.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/llds_out.m,v
retrieving revision 1.291
diff -u -r1.291 llds_out.m
--- compiler/llds_out.m	5 Sep 2006 04:23:21 -0000	1.291
+++ compiler/llds_out.m	11 Sep 2006 05:04:48 -0000
@@ -1778,9 +1778,9 @@
     ->
         set_tree234.insert(ContLabel, !ContLabelSet)
     ;
-        Instr = fork(Label1, Label2, _)
+        Instr = fork(Label1)
     ->
-        set_tree234.insert_list([Label1, Label2], !ContLabelSet)
+        set_tree234.insert(Label1, !ContLabelSet)
     ;
         Instr = block(_, _, Block)
     ->
@@ -1977,11 +1977,8 @@
     list.foldl2(output_pragma_c_component_decls, Comps, !DeclSet, !IO).
 output_instr_decls(_, init_sync_term(Lval, _), !DeclSet, !IO) :-
     output_lval_decls(Lval, !DeclSet, !IO).
-output_instr_decls(_, fork(Child, Parent, _), !DeclSet, !IO) :-
-    output_code_addr_decls(label(Child), !DeclSet, !IO),
-    output_code_addr_decls(label(Parent), !DeclSet, !IO).
-output_instr_decls(_, join_and_terminate(Lval), !DeclSet, !IO) :-
-    output_lval_decls(Lval, !DeclSet, !IO).
+output_instr_decls(_, fork(Child), !DeclSet, !IO) :-
+    output_code_addr_decls(label(Child), !DeclSet, !IO).
 output_instr_decls(_, join_and_continue(Lval, Label), !DeclSet, !IO) :-
     output_lval_decls(Lval, !DeclSet, !IO),
     output_code_addr_decls(label(Label), !DeclSet, !IO).
@@ -2549,18 +2546,9 @@
     io.write_int(N, !IO),
     io.write_string(");\n", !IO).
 
-output_instruction(fork(Child, Parent, Lval), _, !IO) :-
-    io.write_string("\tMR_fork_new_context(", !IO),
+output_instruction(fork(Child), _, !IO) :-
+    io.write_string("\tMR_fork_new_child(", !IO),
     output_label_as_code_addr(Child, !IO),
-    io.write_string(", ", !IO),
-    output_label_as_code_addr(Parent, !IO),
-    io.write_string(", ", !IO),
-    io.write_int(Lval, !IO),
-    io.write_string(");\n", !IO).
-
-output_instruction(join_and_terminate(Lval), _, !IO) :-
-    io.write_string("\tMR_join_and_terminate(", !IO),
-    output_lval(Lval, !IO),
     io.write_string(");\n", !IO).
 
 output_instruction(join_and_continue(Lval, Label), _, !IO) :-
@@ -3528,6 +3516,7 @@
         !IO).
 output_lval_decls_format(reg(_, _), _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(stackvar(_), _, _, !N, !DeclSet, !IO).
+output_lval_decls_format(parent_stackvar(_), _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(framevar(_), _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(succip, _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(maxfr, _, _, !N, !DeclSet, !IO).
@@ -3554,6 +3543,7 @@
         !IO).
 output_lval_decls_format(hp, _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(sp, _, _, !N, !DeclSet, !IO).
+output_lval_decls_format(parent_sp, _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(lvar(_), _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(temp(_, _), _, _, !N, !DeclSet, !IO).
 output_lval_decls_format(mem_ref(Rval), FirstIndent, LaterIndent,
@@ -5090,6 +5080,15 @@
     io.write_string("MR_sv(", !IO),
     io.write_int(N, !IO),
     io.write_string(")", !IO).
+output_lval(parent_stackvar(N), !IO) :-
+    ( N =< 0 ->
+        unexpected(this_file, "parent stack var out of range")
+    ;
+        true
+    ),
+    io.write_string("MR_parent_sv(", !IO),
+    io.write_int(N, !IO),
+    io.write_string(")", !IO).
 output_lval(framevar(N), !IO) :-
     ( N =< 0 ->
         unexpected(this_file, "frame var out of range")
@@ -5103,6 +5102,8 @@
     io.write_string("MR_succip", !IO).
 output_lval(sp, !IO) :-
     io.write_string("MR_sp", !IO).
+output_lval(parent_sp, !IO) :-
+    io.write_string("MR_parent_sp", !IO).
 output_lval(hp, !IO) :-
     io.write_string("MR_hp", !IO).
 output_lval(maxfr, !IO) :-
@@ -5183,6 +5184,15 @@
     io.write_string("MR_sv(", !IO),
     io.write_int(N, !IO),
     io.write_string(")", !IO).
+output_lval_for_assign(parent_stackvar(N), word, !IO) :-
+    ( N =< 0 ->
+        unexpected(this_file, "parent stack var out of range")
+    ;
+        true
+    ),
+    io.write_string("MR_parent_sv(", !IO),
+    io.write_int(N, !IO),
+    io.write_string(")", !IO).
 output_lval_for_assign(framevar(N), word, !IO) :-
     ( N =< 0 ->
         unexpected(this_file, "frame var out of range")
@@ -5196,6 +5206,8 @@
     io.write_string("MR_succip_word", !IO).
 output_lval_for_assign(sp, word, !IO) :-
     io.write_string("MR_sp_word", !IO).
+output_lval_for_assign(parent_sp, data_ptr, !IO) :-
+    io.write_string("MR_parent_sp", !IO).
 output_lval_for_assign(hp, word, !IO) :-
     io.write_string("MR_hp_word", !IO).
 output_lval_for_assign(maxfr, word, !IO) :-
@@ -5308,6 +5320,8 @@
     "MR_fv(" ++ int_to_string(N) ++ ")".
 lval_to_string(stackvar(N)) =
     "MR_sv(" ++ int_to_string(N) ++ ")".
+lval_to_string(parent_stackvar(N)) =
+    "MR_parent_sv(" ++ int_to_string(N) ++ ")".
 lval_to_string(reg(RegType, RegNum)) =
     "reg(" ++ reg_to_string(RegType, RegNum) ++ ")".
 
@@ -5397,6 +5411,9 @@
         Slot = det_slot(SlotNum),
         StackStr = "sv"
     ;
+        Slot = parent_det_slot(SlotNum),
+        StackStr = "parent_sv"
+    ;
         Slot = nondet_slot(SlotNum),
         StackStr = "fv"
     ),
Index: compiler/middle_rec.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/middle_rec.m,v
retrieving revision 1.122
diff -u -r1.122 middle_rec.m
--- compiler/middle_rec.m	22 Aug 2006 05:03:53 -0000	1.122
+++ compiler/middle_rec.m	11 Sep 2006 05:04:48 -0000
@@ -536,9 +536,7 @@
     find_used_registers_components(Components, !Used).
 find_used_registers_instr(init_sync_term(Lval, _), !Used) :-
     find_used_registers_lval(Lval, !Used).
-find_used_registers_instr(fork(_, _, _), !Used).
-find_used_registers_instr(join_and_terminate(Lval), !Used) :-
-    find_used_registers_lval(Lval, !Used).
+find_used_registers_instr(fork(_), !Used).
 find_used_registers_instr(join_and_continue(Lval, _), !Used) :-
     find_used_registers_lval(Lval, !Used).
 
Index: compiler/opt_debug.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/opt_debug.m,v
retrieving revision 1.175
diff -u -r1.175 opt_debug.m
--- compiler/opt_debug.m	4 Sep 2006 01:47:33 -0000	1.175
+++ compiler/opt_debug.m	11 Sep 2006 05:04:48 -0000
@@ -242,6 +242,8 @@
     dump_reg(Type, Num).
 dump_lval(stackvar(N)) =
     "sv" ++ int_to_string(N).
+dump_lval(parent_stackvar(N)) =
+    "parent_sv" ++ int_to_string(N).
 dump_lval(framevar(N)) =
     "fv" ++ int_to_string(N).
 dump_lval(succip) = "succip".
@@ -259,6 +261,7 @@
     "succip_slot(" ++ dump_rval(R) ++ ")".
 dump_lval(hp) = "hp".
 dump_lval(sp) = "sp".
+dump_lval(parent_sp) = "parent_sp".
 dump_lval(field(MT, N, F)) = Str :-
     (
         MT = yes(T),
@@ -773,15 +776,10 @@
     ;
         Instr = init_sync_term(Lval, N),
         Str = "init_sync_term(" ++ dump_lval(Lval) ++ ", "
-            ++ int_to_string(N) ++")"
+            ++ int_to_string(N) ++ ")"
     ;
-        Instr = fork(Child, Parent, NumSlots),
-        Str = "fork(" ++ dump_label(ProcLabel, Child) ++ ", "
-            ++ dump_label(ProcLabel, Parent) ++ ", "
-            ++ int_to_string(NumSlots) ++ ")"
-    ;
-        Instr = join_and_terminate(Lval),
-        Str = "join_and_terminate(" ++ dump_lval(Lval) ++ ")"
+        Instr = fork(Child),
+        Str = "fork(" ++ dump_label(ProcLabel, Child) ++ ")"
     ;
         Instr = join_and_continue(Lval, Label),
         Str = "join(" ++ dump_lval(Lval) ++ ", "
Index: compiler/opt_util.m
===================================================================
RCS file: /home/mercury/mercury1/repository/mercury/compiler/opt_util.m,v
retrieving revision 1.153
diff -u -r1.153 opt_util.m
--- compiler/opt_util.m	22 Aug 2006 05:04:01 -0000	1.153
+++ compiler/opt_util.m	11 Sep 2006 05:04:48 -0000
@@ -642,6 +642,7 @@
 
 lval_refers_stackvars(reg(_, _)) = no.
 lval_refers_stackvars(stackvar(_)) = yes.
+lval_refers_stackvars(parent_stackvar(_)) = yes.
 lval_refers_stackvars(framevar(_)) = yes.
 lval_refers_stackvars(succip) = no.
 lval_refers_stackvars(maxfr) = no.
@@ -653,6 +654,7 @@
 lval_refers_stackvars(succip_slot(_)) = yes.
 lval_refers_stackvars(hp) = no.
 lval_refers_stackvars(sp) = no.
+lval_refers_stackvars(parent_sp) = no.
 lval_refers_stackvars(field(_, Rval, FieldNum)) =
     bool.or(
         rval_refers_stackvars(Rval),
@@ -861,12 +863,9 @@
         Uinstr = init_sync_term(Lval, _),
         Refers = lval_refers_stackvars(Lval)
     ;
-        Uinstr = fork(_, _, _),
+        Uinstr = fork(_),
         Refers = yes
     ;
-        Uinstr = join_and_terminate(Lval),
-        Refers = lval_refers_stackvars(Lval)
-    ;
         Uinstr = join_and_continue(Lval, _),
         Refers = lval_refers_stackvars(Lval)
     ).
@@ -1005,8 +1004,7 @@
 can_instr_branch_away(decr_sp(_), no).
 can_instr_branch_away(decr_sp_and_return(_), yes).
 can_instr_branch_away(init_sync_term(_, _), no).
-can_instr_branch_away(fork(_, _, _), yes).
-can_instr_branch_away(join_and_terminate(_), no).
+can_instr_branch_away(fork(_), no).
 can_instr_branch_away(join_and_continue(_, _), yes).
 can_instr_branch_away(pragma_c(_, Comps, _, _, _, _, _, _, _), BranchAway) :-
     can_components_branch_away(Comps, BranchAway).
@@ -1082,8 +1080,7 @@
 can_instr_fall_through(decr_sp(_), yes).
 can_instr_fall_through(decr_sp_and_return(_), no).
 can_instr_fall_through(init_sync_term(_, _), yes).
-can_instr_fall_through(fork(_, _, _), no).
-can_instr_fall_through(join_and_terminate(_), no).
+can_instr_fall_through(fork(_), yes).
 can_instr_fall_through(join_and_continue(_, _), no).
 can_instr_fall_through(pragma_c(_, _, _, _, _, _, _, _, _), yes).
 
@@ -1129,8 +1126,7 @@
 can_use_livevals(decr_sp(_), no).
 can_use_livevals(decr_sp_and_return(_), yes).
 can_use_livevals(init_sync_term(_, _), no).
-can_use_livevals(fork(_, _, _), no).
-can_use_livevals(join_and_terminate(_), no).
+can_use_livevals(fork(_), no).
 can_use_livevals(join_and_continue(_, _), no).
 can_use_livevals(pragma_c(_, _, _, _, _, _, _, _, _), no).
 
@@ -1198,8 +1194,7 @@
     % so late that this predicate should never be invoked on such instructions.
     unexpected(this_file, "instr_labels_2: decr_sp_and_return").
 instr_labels_2(init_sync_term(_, _), [], []).
-instr_labels_2(fork(Child, Parent, _), [Child, Parent], []).
-instr_labels_2(join_and_terminate(_), [], []).
+instr_labels_2(fork(Child), [Child], []).
 instr_labels_2(join_and_continue(_, Label), [Label], []).
 instr_labels_2(pragma_c(_, _, _, MaybeFixLabel, MaybeLayoutLabel,
         MaybeOnlyLayoutLabel, MaybeSubLabel, _, _), Labels, []) :-
@@ -1257,8 +1252,7 @@
     % See the comment in instr_labels_2.
     unexpected(this_file, "possible_targets: decr_sp_and_return").
 possible_targets(init_sync_term(_, _), [], []).
-possible_targets(fork(Child, Parent, _), [Child, Parent], []).
-possible_targets(join_and_terminate(_), [], []).
+possible_targets(fork(_Child), [], []).
 possible_targets(join_and_continue(_, L), [L], []).
 possible_targets(pragma_c(_, _, _, MaybeFixedLabel, MaybeLayoutLabel,
         _, MaybeSubLabel, _, _), Labels, []) :-
@@ -1329,8 +1323,7 @@
 instr_rvals_and_lvals(decr_sp(_), [], []).
 instr_rvals_and_lvals(decr_sp_and_return(_), [], []).
 instr_rvals_and_lvals(init_sync_term(Lval, _), [], [Lval]).
-instr_rvals_and_lvals(fork(_, _, _), [], []).
-instr_rvals_and_lvals(join_and_terminate(Lval), [], [Lval]).
+instr_rvals_and_lvals(fork(_), [], []).
 instr_rvals_and_lvals(join_and_continue(Lval, _), [], [Lval]).
 instr_rvals_and_lvals(pragma_c(_, Cs, _, _, _, _, _, _, _),
         Rvals, Lvals) :-
@@ -1475,9 +1468,7 @@
 count_temps_instr(decr_sp_and_return(_), !R, !F).
 count_temps_instr(init_sync_term(Lval, _), !R, !F) :-
     count_temps_lval(Lval, !R, !F).
-count_temps_instr(fork(_, _, _), !R, !F).
-count_temps_instr(join_and_terminate(Lval), !R, !F) :-
-    count_temps_lval(Lval, !R, !F).
+count_temps_instr(fork(_), !R, !F).
 count_temps_instr(join_and_continue(Lval, _), !R, !F) :-
     count_temps_lval(Lval, !R, !F).
 count_temps_instr(pragma_c(_, _, _, _, _, _, _, _, _), !R, !F).
@@ -1580,8 +1571,7 @@
         ; Uinstr = save_maxfr(_)
         ; Uinstr = restore_maxfr(_)
         ; Uinstr = init_sync_term(_, _)     % This is a safe approximation.
-        ; Uinstr = fork(_, _, _)            % This is a safe approximation.
-        ; Uinstr = join_and_terminate(_)    % This is a safe approximation.
+        ; Uinstr = fork(_)                  % This is a safe approximation.
         ; Uinstr = join_and_continue(_, _)  % This is a safe approximation.
         ),
         Touch = yes
@@ -1614,6 +1604,7 @@
 
 touches_nondet_ctrl_lval(reg(_, _), no).
 touches_nondet_ctrl_lval(stackvar(_), no).
+touches_nondet_ctrl_lval(parent_stackvar(_), no).
 touches_nondet_ctrl_lval(framevar(_), no).
 touches_nondet_ctrl_lval(succip, no).
 touches_nondet_ctrl_lval(maxfr, yes).
@@ -1625,6 +1616,7 @@
 touches_nondet_ctrl_lval(succip_slot(_), yes).
 touches_nondet_ctrl_lval(hp, no).
 touches_nondet_ctrl_lval(sp, no).
+touches_nondet_ctrl_lval(parent_sp, no).
 touches_nondet_ctrl_lval(field(_, Rval1, Rval2), Touch) :-
     touches_nondet_ctrl_rval(Rval1, Touch1),
     touches_nondet_ctrl_rval(Rval2, Touch2),
@@ -1688,6 +1680,7 @@
 
 lval_access_rvals(reg(_, _), []).
 lval_access_rvals(stackvar(_), []).
+lval_access_rvals(parent_stackvar(_), []).
 lval_access_rvals(framevar(_), []).
 lval_access_rvals(succip, []).
 lval_access_rvals(maxfr, []).
@@ -1699,6 +1692,7 @@
 lval_access_rvals(succfr_slot(Rval), [Rval]).
 lval_access_rvals(hp, []).
 lval_access_rvals(sp, []).
+lval_access_rvals(parent_sp, []).
 lval_access_rvals(field(_, Rval1, Rval2), [Rval1, Rval2]).
 lval_access_rvals(temp(_, _), []).
 lval_access_rvals(lvar(_), _) :-
@@ -1963,15 +1957,9 @@
 replace_labels_instr(incr_sp(Size, Msg), _, _, incr_sp(Size, Msg)).
 replace_labels_instr(decr_sp(Size), _, _, decr_sp(Size)).
 replace_labels_instr(decr_sp_and_return(Size), _, _, decr_sp_and_return(Size)).
-replace_labels_instr(init_sync_term(T, N), _, _,
-        init_sync_term(T, N)).
-replace_labels_instr(fork(Child0, Parent0, SlotCount), Replmap, _,
-        fork(Child, Parent, SlotCount)) :-
-    replace_labels_label(Child0, Replmap, Child),
-    replace_labels_label(Parent0, Replmap, Parent).
-replace_labels_instr(join_and_terminate(Lval0), Replmap, _,
-        join_and_terminate(Lval)) :-
-    replace_labels_lval(Lval0, Replmap, Lval).
+replace_labels_instr(init_sync_term(T, N), _, _, init_sync_term(T, N)).
+replace_labels_instr(fork(Child0), Replmap, _, fork(Child)) :-
+    replace_labels_label(Child0, Replmap, Child).
 replace_labels_instr(join_and_continue(Lval0, Label0),
         Replmap, _, join_and_continue(Lval, Label)) :-
     replace_labels_label(Label0, Replmap, Label),
@@ -2041,6 +2029,7 @@
 
 replace_labels_lval(reg(RegType, RegNum), _, reg(RegType, RegNum)).
 replace_labels_lval(stackvar(N), _, stackvar(N)).
+replace_labels_lval(parent_stackvar(N), _, parent_stackvar(N)).
 replace_labels_lval(framevar(N), _, framevar(N)).
 replace_labels_lval(succip, _, succip).
 replace_labels_lval(maxfr, _, maxfr).
@@ -2057,6 +2046,7 @@
     replace_labels_rval(Rval0, ReplMap, Rval).
 replace_labels_lval(hp, _, hp).
 replace_labels_lval(sp, _, sp).
+replace_labels_lval(parent_sp, _, parent_sp).
 replace_labels_lval(field(Tag, Base0, Offset0), ReplMap,
         field(Tag, Base, Offset)) :-
     replace_labels_rval(Base0, ReplMap, Base),