[m-rev.] for review: Support dynamic creation of Mercury engines in low-level C parallel grades.
Paul Bone
paul at bone.id.au
Wed Jun 25 14:29:42 AEST 2014
On Wed, Jun 25, 2014 at 11:47:37AM +1000, Peter Wang wrote:
> On Tue, 24 Jun 2014 19:42:50 +1000, Paul Bone <paul at bone.id.au> wrote:
> > >
> > > Rename MR_try_wake_an_engine to MR_try_wake_ws_engine
> > > and restrict it to work-stealing engines.
> >
> > What if a context that has an exclusive engine becomes runnable, how do you
> > wake it? I suggest instead parameterizing MR_try_wake_an_engine. Is it not
> > valid to send MR_ENGINE_ACTION_CONTEXT to an exclusive engine? For example
> > if the exclusive context blocked on a thread.channel or something?
>
> try_wake_engine does that, which is different from MR_try_wake_ws_engine.
>
I remember more clearly now: MR_try_wake_ws_engine (and its former name) was
about indiscriminately waking an engine, but exclusive engines are always
woken for a specific reason, so this makes sense.
> > > Rename MR_shutdown_all_engines to MR_shutdown_ws_engines
> > > and restrict it to work-stealing engines.
> >
> > I was initially confused by this, I didn't realize that these engines would
> > shut down by themselves when they finished their work. The changelog should
> > reflect this.
>
> Added a comment.
> >
> > > Make do_work_steal fail the attempt for exclusive engines.
> > > There is one call where this might happen.
> >
> > Where? I couldn't find / didn't see it.
> >
>
> In MR_do_sleep, case ETIMEDOUT. Perhaps it can't happen?
> I need to understand the states better.
IIRC ETIMEDOUT (from sem_timedwait) is handled so that we can have a polling
behaviour for work stealing instead of notification. I think it's only valid
when we use a timed wait (in which case work-stealing notification is
disabled). This is a define that testers can use to tune between the pros
and cons of these two strategies.
> I'll make MR_do_sleep test for a work-stealing engine before calling
> do_work_steal.
That makes more sense.
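A rough sketch of what that guard might look like in the timeout branch
(function and type names here are illustrative stand-ins, not the actual
MR_do_sleep code): when the timed wait expires with ETIMEDOUT, only a
work-stealing engine should poll for sparks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef enum { ENGINE_TYPE_WS, ENGINE_TYPE_EXCLUSIVE } EngineType;

/* Hypothetical stand-in for do_work_steal: an exclusive engine must
** never take a spark, so the attempt simply fails for it.
*/
static bool
do_work_steal(EngineType type)
{
    if (type == ENGINE_TYPE_EXCLUSIVE) {
        return false;   /* Exclusive engines never steal sparks. */
    }
    return true;        /* (Real code would probe other deques here.) */
}

/* Sketch of the sleep loop's timeout branch: test for a work-stealing
** engine before attempting the steal, as suggested above.
*/
static bool
handle_sleep_result(int wait_result, EngineType type)
{
    if (wait_result == ETIMEDOUT && type == ENGINE_TYPE_WS) {
        return do_work_steal(type);
    }
    return false;
}
```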
> > > Add notes to MR_attempt_steal_spark. Its behaviour is unchanged.
> > >
> > > Replace a call to MR_destroy_thread by MR_finalize_thread_engine.
> > >
> > > Delete MR_num_exited_engines. It was unused.
> > >
> > > runtime/mercury_thread.c:
> > > runtime/mercury_thread.h:
> > > Delete MR_next_engine_id and MR_next_engine_id_lock.
> > > Engine ids can now be recycled so they are not enough.
> >
> > This didn't parse well.
>
> More explanation?
>
> Delete MR_next_engine_id and MR_next_engine_id_lock. Because
> engines can be created and destroyed dynamically, and engine ids
> index into fixed-sized arrays, we cannot allocate engine ids by
> just incrementing a counter.
I suggest reversing the parts of that description.
We can no longer allocate engine ids by incrementing a counter, as engine ids
may be reused so that we can use fixed-size arrays. Therefore we've removed
MR_next_engine_id and MR_next_engine_id_lock.
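The scheme being described could be sketched like this: a simplified,
single-array model with hypothetical names, ignoring the real locking
macros, where a NULL slot means the id is free and ids are recycled rather
than minted from a counter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENGINES 8

static void *engine_bases[MAX_ENGINES];     /* NULL = slot free */

/* Allocate the lowest free engine id, or -1 if all slots are in use.
** Because ids index fixed-size arrays, freed ids must be reusable.
*/
static int
allocate_engine_id(void *eng)
{
    int i;
    for (i = 0; i < MAX_ENGINES; i++) {
        if (engine_bases[i] == NULL) {
            engine_bases[i] = eng;
            return i;
        }
    }
    return -1;
}

static void
release_engine_id(int id)
{
    engine_bases[id] = NULL;
}
```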
> > > +static void
> > > +MR_shutdown_engine_for_threads(MercuryEngine *eng)
> > > +{
> > > + #ifndef MR_HIGHLEVEL_CODE
> > > + MR_EngineId id = eng->MR_eng_id;
> > > +
> > > + MR_LOCK(&MR_all_engine_bases_lock, "MR_shutdown_engine_for_threads");
> > > +
> > > + assert(MR_all_engine_bases[id] == eng);
> > > + MR_all_engine_bases[id] = NULL;
> > > +
> > > + if (MR_highest_engine_id == id) {
> > > + int i;
> > > + for (i = id - 1; i >= 0; i--) {
> > > + if (MR_all_engine_bases[i] != NULL) {
> > > + MR_highest_engine_id = (MR_EngineId) i;
> > > + break;
> > > + }
> > > + }
> > > + }
> > > +
> > > + assert(MR_spark_deques[id] == eng->MR_eng_spark_deque);
> > > + MR_spark_deques[id] = NULL;
> > > +
> > > + MR_UNLOCK(&MR_all_engine_bases_lock, "MR_shutdown_engine_for_threads");
> > > + #endif
> > > +}
> > > +#endif /* MR_THREAD_SAFE */
> >
> > So we leave the engine sleep sync data slot allocated? Is that deliberate?
> > Or maybe I've forgotten something since I wrote that code.
>
> engine_sleep_sync_data slots are always initialised. When the engine
> shuts down the slot should be left in an acceptable state for the next
> engine to occupy that slot - or, that was the intention. I'll check it
> and maybe add an assertion.
Okay thanks.
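For what it's worth, the highest-id recomputation in the patch above can be
exercised in isolation with a simplified standalone model of the same
downward scan (the -1 sentinel for "no lower slot occupied" is my addition
for the sketch; the real code simply leaves the old value in place):

```c
#include <assert.h>
#include <stddef.h>

#define N_SLOTS 8

/* Mirror of the scan in MR_shutdown_engine_for_threads: after clearing
** slot `id`, walk downwards to find the new highest occupied slot.
*/
static int
recompute_highest(void **bases, int highest, int id)
{
    bases[id] = NULL;
    if (highest == id) {
        int i;
        for (i = id - 1; i >= 0; i--) {
            if (bases[i] != NULL) {
                return i;
            }
        }
        return -1;  /* Hypothetical sentinel: no slot occupied. */
    }
    return highest;
}
```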
> > > diff --git a/runtime/mercury_wrapper.c b/runtime/mercury_wrapper.c
> > > index f9c09ca..82aa750 100644
> > > --- a/runtime/mercury_wrapper.c
> > > +++ b/runtime/mercury_wrapper.c
> > > @@ -323,15 +323,10 @@ static char *MR_mem_usage_report_prefix = NULL;
> > >
> > > static int MR_num_output_args = 0;
> > >
> > > -/*
> > > -** This is initialized to zero. If it is still zero after configuration of the
> > > -** runtime but before threads are started, then we set it to the number of
> > > -** processors on the system (if support is available to detect this).
> > > -** Otherwise, we fall back to 1.
> > > -*/
> > > -MR_Unsigned MR_num_threads = 0;
> > > +#ifdef MR_LL_PARALLEL_CONJ
> > > +MR_Unsigned MR_num_ws_engines = 0;
> > > +MR_Unsigned MR_max_engines = 8192;
> >
> > That's a lot of pthreads. However if you're not using them all then the
> > only cost is memory for three arrays that hold null pointers. 3*8*8192 =
> > 192KB. It doesn't seem like much but, and I choose these words
> > specifically, 1024 pthreads ought to be enough for anybody.
>
> Yeah, I don't know when you would need that many. Out of interest, I
> can already comfortably create 8000 engines in asm_fast.par.gc.stseg
> (doing practically nothing, of course). top says the process is taking
> about 5 GB RES. With --detstack-size 1 --nondetstack-size 1 it's only
> 474 MB RES.
>
> But I'll drop it down.
Impressive. I expected to hit some kind of OS limit or at least policy
(like an rlimit thing).
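The 192 KB figure quoted above checks out for a 64-bit system: three
pointer-sized arrays with one slot per possible engine. A quick sanity
check, assuming 8-byte pointers (the array count and sizes are from the
discussion, not measured):

```c
/* Three runtime arrays (engine bases, spark deques, sleep-sync
** pointers), each holding one 8-byte pointer per possible engine.
*/
static unsigned long
engine_array_bytes(unsigned long max_engines)
{
    return 3UL * 8UL * max_engines;
}
```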
> > > diff --git a/runtime/mercury_wrapper.h b/runtime/mercury_wrapper.h
> > > index 4f7ce48..3a4a50c 100644
> > > --- a/runtime/mercury_wrapper.h
> > > +++ b/runtime/mercury_wrapper.h
> > > @@ -258,7 +258,7 @@ extern MR_Unsigned MR_contexts_per_thread;
> > >
> > > /*
> > > ** The number of outstanding contexts we can create
> > > -** (MR_contexts_per_thread * MR_num_threads).
> > > +** (MR_contexts_per_thread * MR_num_ws_engines).
> > > */
> > > extern MR_Unsigned MR_max_outstanding_contexts;
> > >
> >
> > That raises a question. Are contexts created for thread.spawn_native
> > counted towards this limit?
>
> Yes, because it is a (crude) attempt to limit memory usage.
Okay, that's fair enough. It should be noted in a comment somewhere,
probably right here.
--
Paul Bone