[m-rev.] for review: parallel execution mechanism (2/2)

Julien Fischer juliensf at csse.unimelb.edu.au
Wed Sep 20 17:22:15 AEST 2006
Previous message: [m-rev.] for review: parallel execution mechanism (2/2)
Next message: [m-rev.] for review: parallel execution mechanism (2/2)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 12 Sep 2006, Peter Wang wrote:

> Index: compiler/par_conj_gen.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/compiler/par_conj_gen.m,v
> retrieving revision 1.28
> diff -u -r1.28 par_conj_gen.m
> --- compiler/par_conj_gen.m	22 Aug 2006 05:04:01 -0000	1.28
> +++ compiler/par_conj_gen.m	11 Sep 2006 05:04:48 -0000
> @@ -7,7 +7,7 @@
> %---------------------------------------------------------------------------%
> %
> % File: par_conj.m.
> -% Main authors: conway.
> +% Main authors: conway, wangp.
> %
> % The predicates of this module generate code for parallel conjunctions.
> %
> @@ -30,9 +30,8 @@
> %     predictable termination properties.
> %     Parallel conjunction does not of itself suggest any information
> %     about which order two goals should be executed, however if
> -%     coroutining (not currently implemented) is being used, then the
> -%     data dependencies between the two goals will constrain the order
> -%     of execution at runtime.
> +%     coroutining is being used, then the data dependencies between
> +%     the two goals will constrain the order of execution at runtime.
> %
> %   [Mode correctness]
> %   - `,'/2 has a *sequential* behaviour `A, B' proves `A' *then*
> @@ -47,23 +46,13 @@
> %     and-parallelism), but an and-parallel goal may use bindings made
> %     in conjoined goals which may lead to coroutining.
> %
> -% The current implementation only supports independent and-parallelism.
> +% The current implementation mainly supports independent and-parallelism.
> % The syntax for parallel conjunction is `&'/2 which behaves like `,'/2
> % in that sequences get flattened (ie A & (B & C) <=> (A & B) & C).
> +% A subset of dependent and-parallelism is supported (see dep_par_conj.m).
> %

How about:

 	The current implementation supports independent and-parallism
 	and a subset of dependent and-parallism (see dep_par_conj.m).

> % The code generated for a parallel conjunction consists of a piece of
> % initialization code which creates a term on the heap to be used for
> % controlling the synchronization of the conjuncts and the code for the
> -% conjuncts each proceeded by a command to start the conjunct as a new
> -% thread of execution (except the last which executes in the "parent"
> -% thread), and each succeeded by a command that signals that the execution
> -% of the conjunct has completed and terminates the thread (except for
> -% the "parent" thread which suspends till all the other parallel conjuncts
> -% have terminated, when it will be woken up). The synchronization terms
> -% are referred to in the code as 'sync_term's.
> +% conjuncts.  The synchronization terms are referred to in the code as
> +% 'sync_term's.  Conjuncts are executed "left to right".  At the start of
> +% the i'th conjunct is a command to "spark" the i+1'th conjunct, i.e.
> +% record enough information to begin executing the next conjunct either
> +% in parallel, or to return to it later when the current conjunct ends.

s/later when/after/

> +% At the end of each conjunct is a command that checks if all the
> +% conjuncts of the parallel conjunction have been executed and completed.
> +% If not, we begin execution of the next conjunct as recorded in the
> +% spark.  If so, the parallel conjunction is complete.  (In reality it's
> +% a lot more complicated.)

Unfortunately the Mercury developers (particularly those who will
have to maintain this piece of code) need to know about reality -
how/why is it more complicated?

...

> Index: runtime/mercury_context.c
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.c,v
> retrieving revision 1.47
> diff -u -r1.47 mercury_context.c
> --- runtime/mercury_context.c	5 Jul 2006 03:00:43 -0000	1.47
> +++ runtime/mercury_context.c	11 Sep 2006 05:06:57 -0000
> @@ -33,10 +33,20 @@
> #include "mercury_reg_workarounds.h"    /* for `MR_fd*' stuff */
>
> static  void            MR_init_context_maybe_generator(MR_Context *c,
> -                            MR_Generator *gen);
> +                            const char *id, MR_Generator *gen);
>
> +/*---------------------------------------------------------------------------*/
> +
> +/*
> +** The run queue and spark queue are protected and signalled with the
> +** same lock and condition variable.
> +*/
> MR_Context              *MR_runqueue_head;
> MR_Context              *MR_runqueue_tail;
> +#ifndef MR_HIGHLEVEL_CODE
> +  MR_Spark              *MR_spark_queue_head;
> +  MR_Spark              *MR_spark_queue_tail;
> +#endif
> #ifdef  MR_THREAD_SAFE
>   MercuryLock           MR_runqueue_lock;
>   MercuryCond           MR_runqueue_cond;
> @@ -59,6 +69,11 @@
>   static MercuryLock    free_context_list_lock;
> #endif
>
> +int MR_num_idle_engines = 0;
> +int MR_num_outstanding_contexts_and_sparks = 0;


Is there any reason why these need to be signed ints?

> +
> +/*---------------------------------------------------------------------------*/
> +
> void
> MR_init_thread_stuff(void)
> {
> @@ -87,8 +102,9 @@
> #endif
> }
>
> -void
> -MR_init_context(MR_Context *c, const char *id, MR_Generator *gen)
> +static void
> +MR_init_context_maybe_generator(MR_Context *c, const char *id,
> +    MR_Generator *gen)
> {
>     c->MR_ctxt_id = id;
>     c->MR_ctxt_next = NULL;
> @@ -100,29 +116,29 @@
> #ifndef MR_HIGHLEVEL_CODE
>     c->MR_ctxt_succip = MR_ENTRY(MR_do_not_reached);
>
> -    if (c->MR_ctxt_detstack_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_detstack_zone);


I'm not certain that removing the calls to MR_reset_redzone is the right
thing to be do here.  (I don't know enough about this code to be able to
say so for sure.)


> -    } else if (gen != NULL) {
> -        c->MR_ctxt_detstack_zone = MR_create_zone("gen_detstack",
> -            0, MR_gen_detstack_size, MR_next_offset(),
> -            MR_gen_detstack_zone_size, MR_default_handler);
> -    } else {
> -        c->MR_ctxt_detstack_zone = MR_create_zone("detstack",
> -            0, MR_detstack_size, MR_next_offset(),
> -            MR_detstack_zone_size, MR_default_handler);
> +    if (c->MR_ctxt_detstack_zone == NULL) {
> +        if (gen != NULL) {
> +            c->MR_ctxt_detstack_zone = MR_create_zone("gen_detstack",
> +                    0, MR_gen_detstack_size, MR_next_offset(),
> +                    MR_gen_detstack_zone_size, MR_default_handler);
> +        } else {
> +            c->MR_ctxt_detstack_zone = MR_create_zone("detstack",
> +                    0, MR_detstack_size, MR_next_offset(),
> +                    MR_detstack_zone_size, MR_default_handler);
> +        }
>     }
>     c->MR_ctxt_sp = c->MR_ctxt_detstack_zone->MR_zone_min;
>
> -    if (c->MR_ctxt_nondetstack_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_nondetstack_zone);
> -    } else if (gen != NULL) {
> -        c->MR_ctxt_nondetstack_zone = MR_create_zone("gen_nondetstack",
> -            0, MR_gen_nonstack_size, MR_next_offset(),
> -            MR_gen_nonstack_zone_size, MR_default_handler);
> -    } else {
> -        c->MR_ctxt_nondetstack_zone = MR_create_zone("nondetstack",
> -            0, MR_nondstack_size, MR_next_offset(),
> -            MR_nondstack_zone_size, MR_default_handler);
> +    if (c->MR_ctxt_nondetstack_zone == NULL) {
> +        if (gen != NULL) {
> +            c->MR_ctxt_nondetstack_zone = MR_create_zone("gen_nondetstack",
> +                    0, MR_gen_nonstack_size, MR_next_offset(),
> +                    MR_gen_nonstack_zone_size, MR_default_handler);
> +        } else {
> +            c->MR_ctxt_nondetstack_zone = MR_create_zone("nondetstack",
> +                    0, MR_nondstack_size, MR_next_offset(),
> +                    MR_nondstack_zone_size, MR_default_handler);
> +        }
>     }
>     /*
>     ** Note that maxfr and curfr point to the last word in the frame,
> @@ -147,27 +163,21 @@
>             "generator and stack_copy");
>     }
>
> -    if (c->MR_ctxt_genstack_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_genstack_zone);
> -    } else {
> +    if (c->MR_ctxt_genstack_zone == NULL) {
>         c->MR_ctxt_genstack_zone = MR_create_zone("genstack", 0,
>             MR_genstack_size, MR_next_offset(),
>             MR_genstack_zone_size, MR_default_handler);
>     }
>     c->MR_ctxt_gen_next = 0;
>
> -    if (c->MR_ctxt_cutstack_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_cutstack_zone);
> -    } else {
> +    if (c->MR_ctxt_cutstack_zone == NULL) {
>         c->MR_ctxt_cutstack_zone = MR_create_zone("cutstack", 0,
>             MR_cutstack_size, MR_next_offset(),
>             MR_cutstack_zone_size, MR_default_handler);
>     }
>     c->MR_ctxt_cut_next = 0;
>
> -    if (c->MR_ctxt_pnegstack_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_pnegstack_zone);
> -    } else {
> +    if (c->MR_ctxt_pnegstack_zone == NULL) {
>         c->MR_ctxt_pnegstack_zone = MR_create_zone("pnegstack", 0,
>             MR_pnegstack_size, MR_next_offset(),
>             MR_pnegstack_zone_size, MR_default_handler);
> @@ -178,6 +188,10 @@
>   #ifdef MR_USE_MINIMAL_MODEL_OWN_STACKS
>     c->MR_ctxt_owner_generator = gen;
>   #endif /* MR_USE_MINIMAL_MODEL_OWN_STACKS */
> +
> +    c->MR_ctxt_parent_sp = NULL;
> +    c->MR_ctxt_spark_stack = NULL;
> +
> #endif /* !MR_HIGHLEVEL_CODE */
>
> #ifdef MR_USE_TRAIL
> @@ -185,9 +199,7 @@
>         MR_fatal_error("MR_init_context_maybe_generator: generator and trail");
>     }
>
> -    if (c->MR_ctxt_trail_zone != NULL) {
> -        MR_reset_redzone(c->MR_ctxt_trail_zone);
> -    } else {
> +    if (c->MR_ctxt_trail_zone == NULL) {
>         c->MR_ctxt_trail_zone = MR_create_zone("trail", 0,
>             MR_trail_size, MR_next_offset(),
>             MR_trail_zone_size, MR_default_handler);
> @@ -214,6 +226,9 @@
>     MR_Context  *c;
>
>     MR_LOCK(&free_context_list_lock, "create_context");
> +
> +    MR_num_outstanding_contexts_and_sparks++;
> +
>     if (free_context_list == NULL) {
>         MR_UNLOCK(&free_context_list_lock, "create_context i");
>         c = MR_GC_NEW(MR_Context);
> @@ -230,14 +245,30 @@
>         MR_UNLOCK(&free_context_list_lock, "create_context ii");
>     }
>
> -    MR_init_context(c, id, gen);
> +    MR_init_context_maybe_generator(c, id, gen);
>     return c;
> }
>

,,,


> Index: runtime/mercury_context.h
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_context.h,v
> retrieving revision 1.30
> diff -u -r1.30 mercury_context.h
> --- runtime/mercury_context.h	8 Mar 2006 08:35:38 -0000	1.30
> +++ runtime/mercury_context.h	10 Sep 2006 13:37:23 -0000
> @@ -23,10 +23,10 @@
> ** by `context_ptr'. The context contains no rN or fN registers - all
> ** registers are "context save" (by analogy to caller-save).
> **
> -** When a new context is created information is passed to the new context
> -** on the stack. The top stackframe of the current context is copied to
> -** become the first det stackframe in the new process. (XXX this will need
> -** fixing eventually to include the nondet frame as well.)
> +** When a new context is created information is passed to and from the
> +** new context via the stack frame of the procedure that originated the
> +** parallel conjunction. The code of a parallel conjunct has access
> +** to the procedure's stack frame via the `parent_sp' register.
> **
> ** Contexts can migrate transparently between multiple Posix threads.
> **
> @@ -101,6 +101,9 @@
> ** pnegstack_zone   The possibly_negated_context stack zone for this context.
> ** pneg_next        The saved pneg_next for this context.
> **
> +** parent_sp        The saved parent_sp for this context.
> +** spark_stack      The sparks generated by this context, in a stack.
> +**
> ** trail_zone       The trail zone for this context.
> ** trail_ptr        The saved MR_trail_ptr for this context.
> ** ticket_counter   The saved MR_ticket_counter for this context.
> @@ -114,6 +117,8 @@
> */
>
> typedef struct MR_Context_Struct MR_Context;
> +typedef struct MR_Spark_Struct MR_Spark;
> +
> struct MR_Context_Struct {
>     const char          *MR_ctxt_id;
>     MR_Context          *MR_ctxt_next;
> @@ -146,6 +151,9 @@
>   #ifdef MR_USE_MINIMAL_MODEL_OWN_STACKS
>     MR_Generator        *MR_ctxt_owner_generator;
>   #endif /* MR_USE_MINIMAL_MODEL_OWN_STACKS */
> +
> +    MR_Word             *MR_ctxt_parent_sp;
> +    MR_Spark            *MR_ctxt_spark_stack;
> #endif /* !MR_HIGHLEVEL_CODE */
>
> #ifdef  MR_USE_TRAIL
> @@ -162,10 +170,41 @@
> };
>
> /*
> +** A spark contains just enough information to begin execution of a parallel
> +** conjunct.  Sparks are allocated on the heap, and can be stored in a
> +** context's spark stack, or in the global spark queue.  In the former case, a
> +** spark will eventually be executed on the same context (same detstack, etc.)

s/on the/by the/


> +** as the code that generated the spark.  In the latter case the spark can be
> +** picked up and executed by any idle engine in a different context.
> +**
> +** In the current implementation a spark is put on the global spark queue if,
> +** at the time a fork instruction is reached, we think the spark has a chance

s/at/by/


> +** of being picked up for execution by an idle engine.  Otherwise the spark
> +** goes on the context's spark stack.  This is an irrevocable decision.

 	At the moment this is an irrevocable ...

> +** A future possibility is to allow idle engines to steal work from the cold
> +** end of some context's spark stack.
> +*/
> +#ifndef MR_HIGHLEVEL_CODE
> +struct MR_Spark_Struct {
> +    MR_Spark            *MR_spark_next;
> +    MR_Code             *MR_spark_resume;
> +    MR_Word             *MR_spark_parent_sp;
> +};
> +#endif
> +
> +/*
> ** The runqueue is a linked list of contexts that are runnable.
> +** The spark_queue is a linked list of sparks that are runnable.
> +** We keep them separate to prioritise contexts (which are mainly
> +** computations which have already started) over sparks (which are
> +** computations which have not begun).
> */
> extern      MR_Context  *MR_runqueue_head;
> extern      MR_Context  *MR_runqueue_tail;
> +#ifndef MR_HIGHLEVEL_CODE
> +  extern    MR_Spark    *MR_spark_queue_head;
> +  extern    MR_Spark    *MR_spark_queue_tail;
> +#endif
> #ifdef  MR_THREAD_SAFE
>   extern    MercuryLock MR_runqueue_lock;
>   extern    MercuryCond MR_runqueue_cond;
> @@ -207,6 +246,32 @@
> #endif
>
> /*
> +** The number of engines waiting for work.
> +** We don't protect it with a separate lock, but updates to it are made while
> +** holding the MR_runqueue_lock.  Reads are made without the lock.  We may need
> +** to use atomic instructions or memory fences on some architectures.
> +*/

That last sentence should be an XXX.

> +extern  int         MR_num_idle_engines;
> +
> +/*
> +** The number of contexts that are not in the free list (i.e. are executing or
> +** suspended) plus the number of sparks in the spark queue.  We count those
> +** sparks as they can quickly accumulate on the spark queue before any of them
> +** are taken up for execution.  Once they do get taken up, many contexts would
> +** need to be allocated to execute them.  Sparks not on the spark queue are
> +** currently guaranteed to be executed on their originating context so won't
> +** cause allocation of more contexts.
> +**
> +** What we are mainly interested in here is preventing too many contexts from
> +** being allocated, as each context is quite large and we can quickly run out
> +** of memory.  Another problem is due to the context free list and conservative
> +** garbage collection: every context ever allocated will be scanned.  (Getting
> +** the garbage collector not to scan contexts on the free list should be
> +** possible though.)
> +*/
> +extern  int         MR_num_outstanding_contexts_and_sparks;
> +
> +/*
> ** Initializes a context structure, and gives it the given id. If gen is
> ** non-NULL, the context is for the given generator.
> */

...

> Index: runtime/mercury_wrapper.h
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/runtime/mercury_wrapper.h,v
> retrieving revision 1.74
> diff -u -r1.74 mercury_wrapper.h
> --- runtime/mercury_wrapper.h	7 Aug 2006 06:21:32 -0000	1.74
> +++ runtime/mercury_wrapper.h	10 Sep 2006 05:45:56 -0000
> @@ -225,6 +225,15 @@
> /* heap expansion factor for accurate GC (see mercury_accurate_gc.c) */
> extern  double		MR_heap_expansion_factor;
>
> +/* number of outstanding contexts we can create per thread (soft limit) */
> +extern	MR_Unsigned	MR_contexts_per_thread;

Please fix the formatting of that comment, e.g.

 	/*
 	** The number of outstanding contexts we can create per thread
 	** (soft limit).
 	*/

> +/*
> +** number of outstanding contexts we can create
> +** (MR_contexts_per_thread * MR_num_threads)
> +*/
> +extern	MR_Unsigned	MR_max_outstanding_contexts;

Likewise (and in a few other spots as well).

Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
Previous message: [m-rev.] for review: parallel execution mechanism (2/2)
Next message: [m-rev.] for review: parallel execution mechanism (2/2)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list