[m-rev.] for review: Support dynamic creation of Mercury engines in low-level C parallel grades.

Paul Bone paul at bone.id.au
Wed Jun 25 14:29:42 AEST 2014


On Wed, Jun 25, 2014 at 11:47:37AM +1000, Peter Wang wrote:
> On Tue, 24 Jun 2014 19:42:50 +1000, Paul Bone <paul at bone.id.au> wrote:
> > > 
> > > 	Rename MR_try_wake_an_engine to MR_try_wake_ws_engine
> > > 	and restrict it to work-stealing engines.
> > 
> > What if a context that has an exclusive engine becomes runnable, how do you
> > wake it?  I suggest instead parameterizing MR_try_wake_an_engine.  Is it not
> > valid to send MR_ENGINE_ACTION_CONTEXT to an exclusive engine?  For example
> > if the exclusive context blocked on a thread.channel or something?
> 
> try_wake_engine does that, which is different from MR_try_wake_ws_engine.
> 

I remember more clearly now: MR_try_wake_ws_engine (and its former name)
was about indiscriminately waking some engine, whereas exclusive engines
are always woken specifically, so this makes sense.
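
For what it's worth, here's a rough sketch of that distinction; the helper
names below are hypothetical, not the runtime's actual API:

    /*
    ** Hypothetical sketch.  A work-stealing wakeup doesn't care which
    ** engine answers, so it scans for any idle work-stealing engine.
    ** An exclusive engine is only ever woken by its id, for example to
    ** deliver its own context (MR_ENGINE_ACTION_CONTEXT).
    */
    static MR_bool
    sketch_try_wake_ws_engine(void)
    {
        MR_EngineId i;

        for (i = 0; i <= MR_highest_engine_id; i++) {
            if (engine_is_work_stealing(i) && try_wake_engine_by_id(i)) {
                return MR_TRUE;     /* any idle ws engine will do */
            }
        }
        return MR_FALSE;
    }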

> > > 	Rename MR_shutdown_all_engines to MR_shutdown_ws_engines
> > > 	and restrict it to work-stealing engines.
> > 
> > I was initially confused by this; I didn't realize that these engines
> > would shut down by themselves when they finished their work.  The
> > changelog should reflect this.
> 
> Added a comment. 
> > 
> > > 	Make do_work_steal fail the attempt for exclusive engines.
> > > 	There is one call where this might happen.
> > 
> > Where?  I couldn't find / didn't see it.
> > 
> 
> In MR_do_sleep, case ETIMEDOUT.  Perhaps it can't happen?
> I need to understand the states better.

IIRC ETIMEDOUT (from sem_timedwait) is handled so that we can have polling
behaviour for work stealing instead of notification.  I think it's only
valid when we use a timed wait (in which case work-stealing notification is
disabled).  This is controlled by a define that testers can use to tune
between the pros and cons of these two strategies.

> I'll make MR_do_sleep test for a work-stealing engine before calling
> do_work_steal.

That makes more sense.
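
Concretely, I imagine the polling variant looking something like this
sketch (the prototype for do_work_steal is assumed, and the interval is
made up):

    #include <errno.h>
    #include <semaphore.h>
    #include <time.h>

    /* Assumed prototype; the runtime's do_work_steal may differ. */
    extern MR_bool  do_work_steal(void);

    static void
    sketch_sleep_then_poll(sem_t *wake_sem, MR_bool is_ws_engine)
    {
        struct timespec abs_timeout;

        clock_gettime(CLOCK_REALTIME, &abs_timeout);
        abs_timeout.tv_sec += 1;    /* poll interval, e.g. one second */

        if (sem_timedwait(wake_sem, &abs_timeout) == 0) {
            return;                 /* woken explicitly; handle the action */
        }
        if (errno == ETIMEDOUT && is_ws_engine) {
            (void) do_work_steal(); /* timed out: poll the spark deques */
        }
    }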

> > > 	Add notes to MR_attempt_steal_spark.  Its behaviour is unchanged.
> > > 
> > > 	Replace a call to MR_destroy_thread by MR_finalize_thread_engine.
> > > 
> > > 	Delete MR_num_exited_engines.  It was unused.
> > > 
> > > runtime/mercury_thread.c:
> > > runtime/mercury_thread.h:
> > > 	Delete MR_next_engine_id and MR_next_engine_id_lock.
> > > 	Engine ids can now be recycled so they are not enough.
> > 
> > This didn't parse well.
> 
> More explanation?
> 
> 	Delete MR_next_engine_id and MR_next_engine_id_lock.  Because
> 	engines can be created and destroyed dynamically, and engine ids
> 	index into fixed-sized arrays, we cannot allocate engine ids by
> 	just incrementing a counter.

I suggest reversing the parts of that description.

We can no longer allocate engine ids by incrementing a counter, as engine
ids may be reused so that we can keep using fixed-size arrays.  Therefore
we've removed MR_next_engine_id and MR_next_engine_id_lock.
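
In other words, allocation has to find a free slot rather than bump a
counter.  A minimal sketch, reusing the names from the patch (the
"no free id" sentinel is hypothetical):

    static MR_EngineId
    sketch_allocate_engine_id(MercuryEngine *eng)
    {
        MR_EngineId id;

        MR_LOCK(&MR_all_engine_bases_lock, "sketch_allocate_engine_id");
        for (id = 0; id < MR_max_engines; id++) {
            if (MR_all_engine_bases[id] == NULL) {
                /* Claim the first free slot; this id may previously have
                ** belonged to an engine that has since shut down. */
                MR_all_engine_bases[id] = eng;
                if (id > MR_highest_engine_id) {
                    MR_highest_engine_id = id;
                }
                MR_UNLOCK(&MR_all_engine_bases_lock,
                    "sketch_allocate_engine_id");
                return id;
            }
        }
        MR_UNLOCK(&MR_all_engine_bases_lock, "sketch_allocate_engine_id");
        return SKETCH_NO_FREE_ENGINE_ID;    /* hypothetical sentinel */
    }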

> > > +static void
> > > +MR_shutdown_engine_for_threads(MercuryEngine *eng)
> > > +{
> > > +  #ifndef MR_HIGHLEVEL_CODE
> > > +    MR_EngineId id = eng->MR_eng_id;
> > > +
> > > +    MR_LOCK(&MR_all_engine_bases_lock, "MR_shutdown_engine_for_threads");
> > > +
> > > +    assert(MR_all_engine_bases[id] == eng);
> > > +    MR_all_engine_bases[id] = NULL;
> > > +
> > > +    if (MR_highest_engine_id == id) {
> > > +        int i;
> > > +        for (i = id - 1; i >= 0; i--) {
> > > +            if (MR_all_engine_bases[i] != NULL) {
> > > +                MR_highest_engine_id = (MR_EngineId) i;
> > > +                break;
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +    assert(MR_spark_deques[id] == eng->MR_eng_spark_deque);
> > > +    MR_spark_deques[id] = NULL;
> > > +
> > > +    MR_UNLOCK(&MR_all_engine_bases_lock, "MR_shutdown_engine_for_threads");
> > > +  #endif
> > > +}
> > > +#endif /* MR_THREAD_SAFE */
> > 
> > So we leave the engine sleep sync data slot allocated?  Is that deliberate?
> > Or maybe I've forgotten something since I wrote that code.
> 
> engine_sleep_sync_data slots are always initialised.  When the engine
> shuts down the slot should be left in an acceptable state for the next
> engine to occupy that slot - or, that was the intention.  I'll check it
> and maybe add an assertion.

Okay thanks.
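
An assertion at shutdown might be as simple as this sketch (the slot's
field and state names here are entirely hypothetical):

    /*
    ** Hypothetical sketch: assert that the slot is back in the state a
    ** freshly initialised slot would have, so the next engine assigned
    ** this id can occupy it without re-initialisation.
    */
    static void
    sketch_check_sleep_sync_slot(MR_EngineId id)
    {
        assert(engine_sleep_sync_data[id].state == SLOT_STATE_IDLE);
    }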

> > > diff --git a/runtime/mercury_wrapper.c b/runtime/mercury_wrapper.c
> > > index f9c09ca..82aa750 100644
> > > --- a/runtime/mercury_wrapper.c
> > > +++ b/runtime/mercury_wrapper.c
> > > @@ -323,15 +323,10 @@ static  char        *MR_mem_usage_report_prefix = NULL;
> > >  
> > >  static  int         MR_num_output_args = 0;
> > >  
> > > -/*
> > > -** This is initialized to zero. If it is still zero after configuration of the
> > > -** runtime but before threads are started, then we set it to the number of
> > > -** processors on the system (if support is available to detect this).
> > > -** Otherwise, we fall back to 1.
> > > -*/
> > > -MR_Unsigned         MR_num_threads = 0;
> > > +#ifdef MR_LL_PARALLEL_CONJ
> > > +MR_Unsigned         MR_num_ws_engines = 0;
> > > +MR_Unsigned         MR_max_engines = 8192;
> > 
> > That's a lot of pthreads.  However, if you're not using them all, then
> > the only cost is memory for three arrays that hold null pointers:
> > 3 arrays * 8 bytes * 8192 slots = 192 KB.  It doesn't seem like much,
> > but, and I choose these words specifically, 1024 pthreads ought to be
> > enough for anybody.
> 
> Yeah, I don't know when you would need that many.  Out of interest, I
> can already comfortably create 8000 engines in asm_fast.par.gc.stseg
> (doing practically nothing, of course).  top says the process is taking
> about 5 GB RES.  With --detstack-size 1 --nondetstack-size 1 it's only
> 474 MB RES.
> 
> But I'll drop it down.

Impressive.  I expected you to hit some kind of OS limit, or at least a
policy limit (such as an rlimit).


> > > diff --git a/runtime/mercury_wrapper.h b/runtime/mercury_wrapper.h
> > > index 4f7ce48..3a4a50c 100644
> > > --- a/runtime/mercury_wrapper.h
> > > +++ b/runtime/mercury_wrapper.h
> > > @@ -258,7 +258,7 @@ extern MR_Unsigned          MR_contexts_per_thread;
> > >  
> > >  /*
> > >  ** The number of outstanding contexts we can create
> > > -** (MR_contexts_per_thread * MR_num_threads).
> > > +** (MR_contexts_per_thread * MR_num_ws_engines).
> > >  */
> > >  extern MR_Unsigned          MR_max_outstanding_contexts;
> > >  
> > 
> > That raises a question.  Are contexts created for thread.spawn_native
> > counted towards this limit?
> 
> Yes, because it is a (crude) attempt to limit memory usage.

Okay, that's fair enough.  It should be mentioned in a comment somewhere,
though; probably here.
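
Something like this sketch, near wherever the limit is enforced, would do
(the counter name MR_num_outstanding_contexts is assumed):

    /*
    ** Sketch only.  Contexts created for thread.spawn_native go through
    ** the same test, so MR_max_outstanding_contexts doubles as a crude
    ** cap on memory usage.
    */
    static MR_bool
    sketch_can_create_context(void)
    {
        return MR_num_outstanding_contexts < MR_max_outstanding_contexts;
    }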


-- 
Paul Bone


