[m-rev.] for review: improve parallel execution mechanism
Julien Fischer
juliensf at csse.unimelb.edu.au
Tue Sep 4 19:15:37 AEST 2007
On Tue, 4 Sep 2007, Peter Wang wrote:
> Branches: main
>
> Make the parallel conjunction execution mechanism more efficient.
>
> 1. Don't allocate sync terms on the heap. Sync terms are now allocated in
> the stack frame of the procedure call which originates a parallel
> conjunction.
>
> 2. Don't allocate individual sparks on the heap. Sparks are now stored in
> preallocated, growing arrays using an algorithm that doesn't use locks.
preallocated, growing arrays?
> 3. Don't have one mutex per sync term. Just use one mutex to protect
> concurrent accesses to all sync terms (it's is rarely needed anyway). This
> makes sync terms smaller and saves initialising a mutex for each parallel
> conjunction encountered.
>
> 4. We don't bother to acquire the global sync term lock if we know a parallel
> conjunction couldn't be executing in parallel. In a highly parallel program,
> the majority of parallel conjunctions will be executed sequentially so
> protecting the sync terms from concurrent accesses is unnecessary.
>
>
> par_fib(39) is ~8.4 times faster (user time) on my laptop (Linux 2.6, x86_64),
Did you try any other benchmarks?
> which is ~3.5 as slow as sequential execution.
How much of that remaining overhead is down to GC?
> configure.in:
> Update the configuration for a changed MR_SyncTerm structure.
>
> compiler/llds.m:
> Make the fork instruction take a second argument, which is the base
> stack slot of the sync term.
>
> Rename it to fork_new_child to match the macro name in the runtime.
>
> compiler/par_conj_gen.m:
> Change the generated code for parallel conjunctions to allocate sync
> terms on the stack and to pass the sync term to fork_new_child.
>
> compiler/dupelim.m:
> compiler/dupproc.m:
> compiler/exprn_aux.m:
> compiler/global_data.m:
> compiler/jumpopt.m:
> compiler/livemap.m:
> compiler/llds_out.m:
> compiler/llds_to_x86_64.m:
> compiler/middle_rec.m:
> compiler/opt_debug.m:
> compiler/opt_util.m:
> compiler/reassign.m:
> compiler/use_local_vars.m:
> Conform to the change in the fork instruction.
>
> compiler/liveness.m:
> compiler/proc_gen.m:
> Disable use of the parallel conjunction operator in the compiler as
> older versions of the compiler will generate code incompatible with
> the new runtime.
I take it you plan to add them back after this change has bootchecked?
> runtime/mercury_context.c:
> runtime/mercury_context.h:
> Remove the next pointer field from MR_Spark as it's no longer needed.
>
> Remove the mutex from MR_SyncTerm. Add a field to record if a spark
> belonging to the sync term was scheduled globally, i.e. if the
> parallel conjunction might be executed in parallel.
>
> Define MR_SparkDeque and MR_SparkArray.
>
> Use MR_SparkDeques to hold per-context sparks and global sparks.
>
> Change the abstract machine instructions MR_init_sync_term,
> MR_fork_new_child, MR_join_and_continue as per the main change log.
>
> Use a preprocessor macro MR_LL_PARALLEL_CONJ as a shorthand for
> !MR_HIGHLEVEL_CODE && MR_THREAD_SAFE.
>
> Take the opportunity to clean things up a bit.
>
> runtime/mercury_wsdeque.c:
> runtime/mercury_wsdeque.h:
> New files containing an implementation of work-stealing deques. We
> don't do work stealing yet but we use the underlying data structure.
>
> runtime/mercury_atomic.h:
> New file to contain atomic operations. Currently it just contains
> compare-and-swap for gcc/x86_64, gcc/x86 and gcc-4.1.
I think mercury_atomic_ops.h would be a better name.
...
> Index: ./compiler/par_conj_gen.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/par_conj_gen.m,v
> retrieving revision 1.35
> diff -u -r1.35 par_conj_gen.m
> --- ./compiler/par_conj_gen.m 7 Aug 2007 07:10:01 -0000 1.35
> +++ ./compiler/par_conj_gen.m 4 Sep 2007 07:30:35 -0000
> @@ -74,11 +74,10 @@
> % finished, causes the code following the parallel conjunction to execute in
> % the context that originated the parallel conjunction. If the originating
> % context can't execute the next conjunct and the parallel conjunction isn't
> -% finished, it must suspend and store its address in the sync term. When a
> -% non-originating context later finds that the parallel conjunction _is_
> -% finished, it will then cause the originating context to resume execution
> -% at the join point. Please see the implementation of MR_join_and_continue()
> -% for the details.
> +% finished, it must suspend. When a non-originating context later finds that
> +% the parallel conjunction _is_ finished, it will then cause the originating
> +% context to resume execution at the join point. Please see the
> +% implementation of MR_join_and_continue() for the details.
> %
> % The runtime support for parallel conjunction is documented in the runtime
> % directory in mercury_context.{c,h}.
...
> --- ./compiler/liveness.m 7 Aug 2007 07:09:57 -0000 1.156
> +++ ./compiler/liveness.m 4 Sep 2007 07:30:35 -0000
> @@ -241,7 +241,7 @@
> list.split_list(1000, PredIds, HeadPredIds, TailPredIds)
> then
> ( detect_liveness_preds_parallel_3(HeadPredIds, HLDS0, !HLDS)
> - & detect_liveness_preds_parallel_2(TailPredIds, HLDS0, !HLDS)
> + , detect_liveness_preds_parallel_2(TailPredIds, HLDS0, !HLDS)
> )
> else
> detect_liveness_preds_parallel_3(PredIds, HLDS0, !HLDS)
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/proc_gen.m,v
> retrieving revision 1.22
> diff -u -r1.22 proc_gen.m
> --- ./compiler/proc_gen.m 13 Aug 2007 03:01:43 -0000 1.22
> +++ ./compiler/proc_gen.m 4 Sep 2007 07:30:35 -0000
> @@ -171,7 +171,7 @@
> list.map_foldl(generate_pred_code_par(ModuleInfo0),
> PredIdsA, PredProceduresA, GlobalData0, GlobalDataA),
> list.condense(PredProceduresA, ProceduresA)
> - &
> + ,
> list.condense(ListsOfPredIdsB, PredIdsB),
> GlobalData1 = bump_type_num_counter(GlobalData0, type_num_skip),
> list.map_foldl(generate_pred_code_par(ModuleInfo0),
Add an XXX comment mentioning that this should be a parallel
conjunction. (And in the module above.)
> Index: ./compiler/llds.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/llds.m,v
> retrieving revision 1.352
> diff -u -r1.352 llds.m
> --- ./compiler/llds.m 20 Aug 2007 03:35:57 -0000 1.352
> +++ ./compiler/llds.m 4 Sep 2007 07:30:35 -0000
> @@ -546,16 +546,19 @@
> % duplicated by jump optimization.
>
> ; init_sync_term(lval, int)
> - % Initialize a synchronization term. The first argument contains
> - % the lvalue into which we will store the synchronization term,
> - % and the second argument indicates how many branches we expect
> - % to join at the end of the parallel conjunction. (See the
> - % documentation in par_conj_gen.m and runtime/mercury_context.{c,h}
> - % for further information about synchronisation terms.)
> -
> - ; fork(label)
> - % Create a new spark. fork(Child) creates spark, to begin execution
> - % at Child. Control continues at the next instruction.
> + % Initialize a synchronization term, which is a continuous number
> + % of slots on the detstack. The first argument contains the base
> + % address of the synchronization term. The second argument
> + % indicates how many branches we expect to join at the end of the
> + % parallel conjunction. (See the documentation in par_conj_gen.m
> + % and runtime/mercury_context.{c,h} for further information about
> + % synchronisation terms.)
> +
> + ; fork_new_child(lval, label)
> + % Create a new spark. fork(SyncTerm, Child) creates spark, to begin
> + % execution at Child, where SyncTerm contains the base address of
> + % the synchronisation term. Control continues at the next
> + % instruction.
>
> ; join_and_continue(lval, label).
> % Signal that this thread of execution has finished in the current
> Index: ./runtime/mercury_context.h
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.h,v
> retrieving revision 1.43
> diff -u -r1.43 mercury_context.h
> --- ./runtime/mercury_context.h 10 May 2007 05:24:16 -0000 1.43
> +++ ./runtime/mercury_context.h 4 Sep 2007 07:30:35 -0000
> @@ -59,6 +59,14 @@
> #include "mercury_goto.h" /* for MR_GOTO() */
> #include "mercury_conf.h" /* for MR_CONSERVATIVE_GC */
>
> +#if !defined(MR_HIGHLEVEL_CODE) && defined(MR_THREAD_SAFE)
> + /*
> + ** Whether we are in a grade which supports the low-level parallel
> + ** conjunction execution mechanism.
> + */
> + #define MR_LL_PARALLEL_CONJ 1
> +#endif
mercury_conf_param.h would be a more appropriate place to define that
IMO.
...
> Index: ./runtime/mercury_context.c
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.c,v
> retrieving revision 1.56
> diff -u -r1.56 mercury_context.c
> --- ./runtime/mercury_context.c 1 May 2007 01:11:42 -0000 1.56
> +++ ./runtime/mercury_context.c 4 Sep 2007 07:30:35 -0000
> @@ -40,17 +40,24 @@
> /*
> ** The run queue and spark queue are protected and signalled with the
> ** same lock and condition variable.
> +**
> +** The single sync term lock is used to prevent races in MR_join_and_continue.
> +** The holder of the sync term lock may acquire the runqueue lock but not vice
> +** versa. (We could also have one sync term lock per context, and make
> +** MR_join_and_continue acquire the sync term lock of the context that
> +** originated the parallel conjunction, but contention of the single lock
contention for the single lock
...
> Index: ./runtime/mercury_wsdeque.h
> ===================================================================
> RCS file: ./runtime/mercury_wsdeque.h
> diff -N ./runtime/mercury_wsdeque.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ ./runtime/mercury_wsdeque.h 4 Sep 2007 07:30:35 -0000
> @@ -0,0 +1,158 @@
> +/*
> +** vim: ts=4 sw=4 expandtab
> +*/
> +/*
> +** Copyright (C) 2007 The University of Melbourne.
> +** This file may only be copied under the terms of the GNU Library General
> +** Public License - see the file COPYING.LIB in the Mercury distribution.
> +*/
> +
> +#ifndef MERCURY_WSDEQUE_H
> +#define MERCURY_WSDEQUE_H
> +
> +#ifdef MR_LL_PARALLEL_CONJ
> +
> +#include "mercury_atomic.h"
> +
> +/* XXX should experiment with these */
> +#define MR_INITIAL_GLOBAL_SPARK_QUEUE_SIZE 4
> +#define MR_INITIAL_LOCAL_SPARK_DEQUE_SIZE 8
> +
> +/*---------------------------------------------------------------------------*/
> +
> +/* See mercury_context.h for the definition of MR_SparkDeque. */
> +
> +struct MR_SparkArray_Struct {
> + MR_Integer MR_sa_max; /* power of two - 1 */
> + volatile MR_Spark MR_sa_segment[1]; /* really MR_sa_max + 1 */
> +};
> +
> +/*
> +** MR_sa_element(Array, Pos)
> +** Index into Array modulo its size, i.e. treating it as a circular array.
> +**
> +** MR_sa_max is a power of two - 1 so that we can use a bitwise AND operation
> +** operation instead of modulo when indexing into the array, which makes a
> +** significant difference.
> +*/
> +#define MR_sa_element(arr, pos) (arr->MR_sa_segment[pos & arr->MR_sa_max])
Place parentheses around `pos' and `arr' in the body of that macro.
...
> Index: ./runtime/mercury_atomic.h
> ===================================================================
> RCS file: ./runtime/mercury_atomic.h
> diff -N ./runtime/mercury_atomic.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ ./runtime/mercury_atomic.h 4 Sep 2007 07:30:35 -0000
> @@ -0,0 +1,80 @@
> +/*
> +** vim:ts=4 sw=4 expandtab
> +*/
> +/*
> +** Copyright (C) 2007 The University of Melbourne.
> +** This file may only be copied under the terms of the GNU Library General
> +** Public License - see the file COPYING.LIB in the Mercury distribution.
> +*/
> +
> +/*
> +** mercury_atomic.h - defines atomic operations.
> +*/
> +
> +#ifndef MERCURY_ATOMIC_H
> +#define MERCURY_ATOMIC_H
> +
> +/*
> +** If the value at addr is equal to old, assign new to addr and return true.
> +** Otherwise return false.
> +*/
> +MR_EXTERN_INLINE MR_bool
> +MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
> + MR_Integer new_val) ;
> +
> +/*---------------------------------------------------------------------------*/
> +
> +#if defined(__GNUC__) && defined(__x86_64__)
> +
> + MR_EXTERN_INLINE MR_bool
> + MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
> + MR_Integer new_val)
> + {
> + char result;
> +
> + __asm__ __volatile__(
> + "lock; cmpxchgq %3, %0; setz %1"
> + : "=m"(*addr), "=q"(result)
> + : "m"(*addr), "r" (new_val), "a"(old)
> + : "memory"
> + );
> + return (int) result;
> + }
> +
> +#elif defined(__GNUC__) && defined(__i386__)
> +
> + /* Really 486 or better. */
> + MR_EXTERN_INLINE MR_bool
> + MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
> + MR_Integer new_val)
> + {
> + char result;
> +
> + __asm__ __volatile__(
> + "lock; cmpxchgl %3, %0; setz %1"
> + : "=m"(*addr), "=q"(result)
> + : "m"(*addr), "r" (new_val), "a"(old)
> + : "memory");
> + return (int) result;
> + }
> +
> +#elif __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)
> +
> + /* gcc 4.1 and above have builtin atomic operations. */
> +
> + MR_EXTERN_INLINE MR_bool
> + MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
> + MR_Integer new_val)
> + {
> + return __sync_bool_compare_and_swap(addr, old, new_val);
> + }
Is there a reason to prefer the inline assembler version on x86(_64)s
if gcc 4.1 is available?
...
> +/*
> +** If we don't have definitions available for this compiler or architecture
> +** then we will get a link error in low-level .par grades.
So this kills of the lowlevel .par grades on powerpc for now?
> No other grades
> +** currently require on atomic ops.
Delete "on".
That looks okay otherwise, but please bootcheck in few grades in order
to make sure that you've got all the #ifdefs in the right spots
(hlc.par.gc is one such grade.)
Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the reviews
mailing list