[m-rev.] for review: parallel execution mechanism (1/2)
Julien Fischer
juliensf at csse.unimelb.edu.au
Wed Sep 20 15:06:03 AEST 2006
On Tue, 12 Sep 2006, Peter Wang wrote:
> Estimated hours taken: 100
> Branches: main
>
> This patch changes the parallel execution mechanism in the low level backend.
> The main idea is that, even in programs with only moderate parallelism, we
> won't have enough processors to exploit it all. We should try to reduce the
> cost in the common case, i.e. when a parallel conjunction gets executed
> sequentially. This patch does two things along those lines:
>
> (1) Instead of unconditionally executing all parallel conjuncts (but the last)
> in separate Mercury contexts, we allow a context to continue execution of the
> next conjunct of a parallel conjunction if it has just finished executing the
> previous conjunct. This saves on allocating unnecessary contexts, which can
> be a big reduction in memory usage.
>
> Notice we also try to execute conjuncts left-to-right so as to minimise the
> need to suspend contexts when there are dependencies between conjuncts.
Delete "Notice".
> (2) Conjuncts that *are* executed in parallel still need separate contexts.
> We used to pass variable bindings to those conjuncts by flushing input
> variable values to stack slots and copying the procedure's stack frame to the
> new context. When the conjunct finished, we would copy new variable bindings
> back to stack slots in the original context.
>
> What happens now is that we don't do any copying back and forth. We introduce
> a new abstract machine register `parent_sp' which points to the location of
> the stack pointer at the time that a parallel conjunction began. In parallel
> conjuncts we refer to all stack slots via the `parent_sp' pointer, since we
> could be running on a different context altogether and `sp' would be pointing
> into a new detstack. Since parallel conjuncts now share the procedure's stack
> frame, we have to allocate stack slots such that all parallel conjuncts in a
> procedure that could be executing simultaneously have distinct sets of stack
> slots. We currently use the simplest possible strategy, i.e. don't allow
> variables in parallel conjuncts to reuse stack slots.
>
> Note: in effect parent_sp is a frame pointer which is only set for and used by
> the code of parallel conjuncts. We don't call it a frame pointer as it can be
> confused with "frame variables" have to do with the nondet stack.
That last sentence doesn't parse.  Perhaps: "... as it could be
confused with "frame variables", which have to do with the nondet
stack."
> compiler/code_info.m:
> Add functionality to keep track of how deep inside of nested parallel
> conjunctions the code generator is.
>
> Add functionality to acquire and release "persistent" temporary stack
> slots. Unlike normal temporary stack slots, these don't get implicitly
> released when the code generator's location-dependent state is reset.
>
> Conform to additions of `parent_sp' and parent stack variables.
>
> compiler/exprn_aux.m:
> Generalise the `substitute_lval_in_*' predicates by
> `transform_lval_in_*' predicates. Instead of performing a fixed
> substitution, these take a higher order predicate which performs some
> operation on each lval. Redefine the substitution predicates in terms
> of the transformation predicates.
>
> Conform to changes in `fork', `join_and_terminate' and
> `join_and_continue' instructions.
>
> Conform to additions of `parent_sp' and parent stack variables.
>
> Remove `substitute_rval_in_args' and `substitute_rval_in_arg' which
> were unused.
>
> compiler/live_vars.m:
> Introduce a new type `parallel_stackvars' which is threaded through
> `build_live_sets_in_goal'. We accumulate the sets of variables which
> are assigned stack slots in each parallel conjunct. At the end of
> processing a parallel conjunction, use this information to force
> variables which are assigned stack slots to have distinct slots.
>
> compiler/llds.m:
> Change the semantics of the `fork' instruction. It now takes a single
> argument: the label of the next conjunct after the current one. The
> instruction now "sparks" the next conjunct, which will run either in a
> different context (possibly in parallel, on another Mercury engine) or
> in the current context after the current conjunct has finished.
>
> Change the semantics of the `join_and_continue' instruction. This
> instruction now serves to end all parallel conjuncts, not just the
> last one in a parallel conjunction.
>
> Remove the `join_and_terminate' instruction (no longer used).
>
> Add the new abstract machine register `parent_sp'.
>
> Introduce "parent stack slots", which are the same as normal stack
> slots but relative to the `parent_sp' register.
s/the same as/similar to/
> compiler/par_conj_gen.m:
> Change the code generated for parallel conjunctions. That is:
>
> - use the new `fork' instruction at the beginning of a parallel
> conjunct;
>
> - use the `join_and_continue' instruction at the end of all parallel
> conjuncts;
>
> - keep track of how deep the code generator currently is in parallel
> conjunctions;
>
> - set and restore the `parent_sp' register when entering a non-nested
> parallel conjunction;
>
> - after generating the code of a parallel conjunct, replace all
> references to stack slots by parent stack slots;
>
> - remove code to copy back output variables when a parallel conjunct
> finishes.
>
> Update some comments.
>
> runtime/mercury_context.c:
> runtime/mercury_context.h:
> Add the type `MR_Spark'. Sparks are allocated on the heap and contain
> enough information to begin execution of a single parallel conjunct.
>
> Add globals `MR_spark_queue_head' and `MR_spark_queue_tail'. These
> are pointers to the start and end of a global queue of sparks. Idle
> engines can pick up work from this queue in the same way that they can
> pick up work from the global context queue (the "run queue").
>
> Add new fields to the MR_Context structure. `MR_ctxt_parent_sp' is a
> saved copy of the `parent_sp' register for when the context is
> suspended. `MR_ctxt_spark_stack' is a stack of sparks that we decided
> not to put on the global spark queue.
>
> Update `MR_load_context' and `MR_save_context' to save and restore
> `MR_ctxt_parent_sp'.
>
> Add the counters `MR_num_idle_engines' and
> `MR_num_outstanding_contexts_and_sparks'. These are used to decide,
> when a `fork' instruction is reached, whether a spark should be put on
> the global spark queue (with potential for parallelism but also more
> overhead) or on the calling context's spark stack (no parallelism and
> less overhead).
>
> Rename `MR_init_context' to `MR_init_context_maybe_generator'. When
> initialising contexts, don't reset redzones of already allocated
> stacks. It seems to be unnecessary (and the reset implementation is
> buggy anyway, though it's fine on Linux).
>
> Rename `MR_schedule' to `MR_schedule_context'. Add new functions
> `MR_schedule_spark_globally' and `MR_schedule_spark_locally'.
>
> In `MR_do_runnext', add code for idle engines to get work from the
> global spark queue. Resuming contexts are prioritised over sparks.
>
> Rename `MR_fork_new_context' to `MR_fork_new_child'. Change the
> definitions of `MR_fork_new_child' and `MR_join_and_continue' as per
> the new behaviour of the `fork' and `join_and_continue' instructions.
> Delete `MR_join_and_terminate'.
>
> Add a new field `MR_st_orig_context' to the MR_SyncTerm structure to
> record which context originated the parallel conjunction instance
> represented by a MR_SyncTerm instance, and update `MR_init_sync_term'.
> This is needed by the new behaviour of `MR_join_and_continue'.
>
> Update some comments.
>
> runtime/mercury_engine.h:
> runtime/mercury_regs.c:
> runtime/mercury_regs.h:
> runtime/mercury_stacks.h:
> Add the abstract machine register `parent_sp' and code to copy it to
> and from the fake_reg array.
>
> Add a macro `MR_parent_sv' to access stack slots via `parent_sp'.
>
> Add `MR_eng_parent_sp' to the MercuryEngine structure.
>
> runtime/mercury_wrapper.c:
> runtime/mercury_wrapper.h:
> Add Mercury runtime option `--max-contexts-per-thread' which is saved
> in the global variable `MR_max_contexts_per_thread'. The number
> `MR_max_outstanding_contexts' is derived from this. It sets a soft
> limit on the number of sparks we put in the global spark queue,
> relative to the number of threads we are running. We don't want to
> put too many sparks on the global queue if there are plenty of ready
> contexts or sparks already on the global queues, as they are likely to
> result in new contexts being allocated.
>
> When initially creating worker engines, wait until all the worker
> engines have acknowledged that they are idle before continuing. This
> is mainly so programs (especially benchmarks and test cases) with only
> a few fork instructions near the beginning of the program don't
> execute the forks before any worker engines are ready, resulting in no
> parallelism.
>
> runtime/mercury_engine.c:
> runtime/mercury_thread.c:
> Don't allocate a context at the time a Mercury engine is created. An
> engine only needs a new context when it is about to pick up a spark.
>
> configure.in:
> compiler/options.m:
> scripts/Mercury.config.in:
> Update to reflect the extra field in MR_SyncTerm.
>
> Add the option `--sync-term-size' and actually make use of the result
> the sync term size calculated during configuration.
>
> compiler/code_util.m:
> compiler/continuation_info.m:
> compiler/dupelim.m:
> compiler/dupproc.m:
> compiler/global_data.m:
> compiler/hlds_llds.m:
> compiler/jumpopt.m:
> compiler/livemap.m:
> compiler/llds_out.m:
> compiler/middle_rec.m:
> compiler/opt_debug.m:
> compiler/opt_util.m:
> compiler/reassign.m:
> compiler/stack_layout.m:
> compiler/use_local_vars.m:
> compiler/var_locn.m:
> Conform to changes in `fork', `join_and_terminate' and
> `join_and_continue' instructions.
>
> Conform to additions of `parent_sp' and parent stack variables.
>
> XXX not sure about the changes in stack_layout.m
>
> library/par_builtin.m:
> Conform to changes in the runtime system.
...
> Index: compiler/code_info.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/code_info.m,v
> retrieving revision 1.330
> diff -u -r1.330 code_info.m
> --- compiler/code_info.m 6 Sep 2006 04:02:55 -0000 1.330
> +++ compiler/code_info.m 11 Sep 2006 05:04:47 -0000
> @@ -167,6 +167,14 @@
> %
> :- pred set_instmap(instmap::in, code_info::in, code_info::out) is det.
>
> + % The depth of nested parallel conjunctions.
> + %
> +:- pred get_par_conj_depth(code_info::in, int::out) is det.
That should say something like: return current nesting depth for
parallel conjunctions.
> @@ -252,6 +260,11 @@
> :- pred set_temp_content_map(map(lval, slot_contents)::in,
> code_info::in, code_info::out) is det.
>
> +:- pred get_persistent_temps(code_info::in, set(lval)::out) is det.
> +
> +:- pred set_persistent_temps(set(lval)::in,
> + code_info::in, code_info::out) is det.
> +
> :- pred set_closure_layouts(list(layout_data)::in,
> code_info::in, code_info::out) is det.
>
> @@ -363,8 +376,14 @@
> % fields below. Any keys in that map which
> % are not in this set are free for reuse.
>
> - fail_info :: fail_info
> + fail_info :: fail_info,
> % Information about how to manage failures.
> +
> + par_conj_depth :: int
> + % How deep in a nested parallel conjunction
> + % we are. This is zero at the beginning of
> + % a procedure and increments as we enter
> + % parallel conjunctions.
s/increments/incremented/
> ).
>
> :- type code_info_persistent
> @@ -399,6 +418,11 @@
> % slot contains after the end of the
> % branched control structure.
>
> + persistent_temps :: set(lval),
> + % Stack slot locations that should not be
> + % released event when the code generator
s/event/even/
> + % resets its location-dependent state.
> +
> closure_layout_seq :: counter,
>
> closure_layouts :: list(layout_data),
...
> @@ -1034,9 +1065,14 @@
> remember_position(CI, position_info(CI)).
>
> reset_to_position(position_info(PosCI), CurCI, NextCI) :-
> - PosCI = code_info(_, LocDep, _),
> - CurCI = code_info(Static, _, Persistent),
> - NextCI = code_info(Static, LocDep, Persistent).
> + PosCI = code_info(_, LocDep, _),
> + CurCI = code_info(Static, _, Persistent),
> + NextCI0 = code_info(Static, LocDep, Persistent),
> +
> + get_persistent_temps(NextCI0, PersistentTemps),
> + get_temps_in_use(NextCI0, TempsInUse0),
> + set.union(PersistentTemps, TempsInUse0, TempsInUse),
> + set_temps_in_use(TempsInUse, NextCI0, NextCI).
>
> reset_resume_known(BranchStart, !CI) :-
> BranchStart = position_info(BranchStartCI),
> @@ -3352,6 +3388,8 @@
> ;
> AbsLocn = abs_stackvar(_)
> ;
> + AbsLocn = abs_parent_stackvar(_)
> + ;
> AbsLocn = abs_framevar(_)
> ).
>
...
> Index: compiler/llds.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/llds.m,v
> retrieving revision 1.339
> diff -u -r1.339 llds.m
> --- compiler/llds.m 4 Sep 2006 01:47:31 -0000 1.339
> +++ compiler/llds.m 11 Sep 2006 05:04:48 -0000
> @@ -481,24 +481,17 @@
> % documentation in par_conj_gen.m and runtime/mercury_context.{c,h}
> % for further information about synchronisation terms.)
>
> - ; fork(label, label, int)
> - % Create a new context. fork(Child, Parent, NumSlots) creates
> - % a new thread which will start executing at Child. After spawning
> - % execution in the child, control branches to Parent. NumSlots is
> - % the number of stack slots that need to be copied to the child's
> - % stack (see comments in runtime/mercury_context.{h,c}).
> -
> - ; join_and_terminate(lval)
> - % Signal that this thread of execution has finished in the current
> - % parallel conjunction, then terminate it. The synchronisation term
> - % is specified by the given lval. (See the documentation in
> - % par_conj_gen.m and runtime/mercury_context.{c,h} for further
> - % information about synchronisation terms.)
> + ; fork(label)
> + % Create a new spark. fork(Child) creates a spark that will begin
> + % execution at Child. Control continues at the next instruction.
>
> ; join_and_continue(lval, label).
> % Signal that this thread of execution has finished in the current
> - % parallel conjunction, then branch to the given label. The
> - % synchronisation term is specified by the given lval.
> + % parallel conjunct. How to proceed at the end of a parallel
> + % conjunct is quite involved; see runtime/mercury_context.{c,h}.
I suggest:
For details of how we proceed at the end of a parallel conjunct
see runtime/mercury_context.{c,h}.
> + % The synchronisation term is specified by the given lval.
> + % The label gives the address of the code following the parallel
> + % conjunction.
>
> :- type nondet_frame_info
> ---> temp_frame(
> @@ -777,7 +770,15 @@
> % Virtual machine register holding the heap pointer.
>
> ; sp
> - % Virtual machine register point to the top of det stack.
> + % Virtual machine register pointing to the top of det stack.
> +
> + ; parent_sp
> + % Virtual machine register pointing to the top of the det stack.
> + % This is only set at the beginning of a parallel conjunction (and
> + % restored afterwards). Parallel conjuncts which refer to stack
> + % slots use this register instead of sp, as they could be running
> + % in a different context, where sp would be pointing into a
> + % different det stack.
>
> ; temp(reg_type, int)
> % A local temporary register. These temporary registers are
> @@ -795,6 +796,11 @@
> % current value of `sp'. These are used in both det and semidet
> % code. Stackvar slot numbers start at 1.
>
> + ; parent_stackvar(int)
> + % A det stack slot. The number is the offset relative to the
> + % value of `parent_sp'. These are used only in the code
> + % of parallel conjuncts. Stackvar slot numbers start at 1.
> +
> ; framevar(int)
> % A nondet stack slot. The reference is relative to the current
> % value of `curfr'. These are used in nondet code. Framevar slot
To be continued ...
Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------