[m-dev.] thread.spawn_native

Paul Bone paul at bone.id.au
Fri Jun 13 12:41:18 AEST 2014


On Thu, Jun 12, 2014 at 06:15:57PM +1000, Peter Wang wrote:
> On Thu, 12 Jun 2014 15:49:49 +1000, Paul Bone <paul at bone.id.au> wrote:
> > 
> > So my main objection to spawn_native is that it requires creating and
> > destroying Mercury engines at runtime, which is difficult.  However,
> > spawning IO workers is much less of a problem.  So I propose that we add
> > support for IO workers to the runtime, and add library code so that the
> > programmer can manage IO workers for libraries with these requirements.
> > 
> >     :- type io_worker_thread.
> > 
> >     :- pred spawn_io_worker(io_worker_thread::out, io::di, io::uo) is det.
> > 
> >     :- pred end_io_worker(io_worker_thread::in, io::di, io::uo) is det.
> > 
> > This allows the programmer to manipulate IO workers like any other
> > variable; they can be used as part of the representation of a library's state.
> > 
> > Then we add something for foreign calls, maybe a pragma or a foreign call
> > attribute, so that the compiler and runtime system know that the foreign
> > call should be executed by the IO worker.  The compiler and runtime system
> > are responsible for passing the data to the IO worker thread, and returning
> > it from that thread and waking the Mercury context when the foreign call
> > returns.
> > 
> > This should be much easier than safely adding spawn_native.
> > 
> 
> Interesting idea.
> Where does the compiler get the IO worker variable from?

The runtime system spawns the thread and gives the user a handle to it.

So io_worker_thread literally represents the worker thread.

> > However allowing the number of Mercury engines to change at runtime
> > (spawn_native) has another benefit that I'd forgotten until now.  It allows
> > Mercury, given information from the OS, to adjust its demands on the
> > system's parallel processing resources.  When Apple announced Grand Central
> > Dispatch, it caused Zoltan and me to think about how multiple programs,
> > such as Mercury programs, interact when running on the same hardware.  If you have
> > four processors and two CPU-bound Mercury programs, the Mercury programs by
> > default try to use four processors because that's how many are detected.
> > However, it'd be a lot more efficient if the programs didn't share the same
> > four processors but instead used two processors each (two native threads
> > each), so that there are fewer context switches.  These kinds of adjustments
> > are best made at runtime as situations change.  As far as we know no OS
> > provides this kind of information so this isn't a practical concern right
> > now.  However it is another reason why allowing on-the-fly creation and
> > destruction of Mercury engines might be a good thing.
> > 
> > Given that, I think that removing this restriction and implementing
> > spawn_native may be easier than the combination of IO workers, asynchronous
> > IO, library support for IO workers and a foreign call annotation for context
> > pinning.
> > 
> > What are your goals WRT timeliness?  Depending on when you need it I can
> > review the RTS code and remove this restriction.
> 
> We have hlc.par.gc so there's no hurry.
> 
> I started playing with it today.  My plan was vaguely this:
> 
> Separate Mercury engines into two kinds: a fixed number of "common"
> engines created at startup which can perform work-stealing, and a
> dynamic number of "exclusive" engines which do not steal work but only
> execute code for a single context.
> 
> spawn_native creates an exclusive engine and associated context.
> Exclusive engines are assigned higher engine ids than all the common
> engines.
> 
> Prevent common engines from stealing work from exclusive
> engines/contexts.  Parallel conjunctions should still work, except that
> they only execute sequentially in exclusive contexts.  Good enough for
> real programs ;)

*grumble*

> Preallocate longer arrays for MR_spark_deques and engine_sleep_sync_data
> (etc?), up to some limit.  The lower slots would be permanently taken by
> common engines, the higher slots claimed dynamically by exclusive engines.
> This imposes an arbitrary limit on the number of engines that can be
> created by spawn_native.  It should be okay in practice, but maybe we
> can do better.

We can try dynamically allocating these and protecting them with a lock (or
at least protecting the higher slots).  If that doesn't slow things down
noticeably, it'd be better to avoid the limit altogether.

A reasonable trade off is to allow this to be allocated when the program
starts but make it configurable using the MERCURY_OPTIONS environment
variable.  Then at least it's not hard-coded.  This has the same trade-offs
for users as the stack sizes.

> Maybe we just need to allocate address space without
> committing?

Do you mean map some anonymous memory and not write to it until we need it?
The arrays aren't likely to be that big, so I don't see the benefit.

> Add a new annotation for foreign procs.  A call to a foreign proc with
> that annotation must be executed on the exclusive engine that the
> context originated on.  (`may_not_migrate' basically)

And/or expose the IO workers via the library.

> Allow common engines to steal from exclusive engines (respecting the
> annotation).  This is no doubt fiddly, but at least the global arrays
> aren't being reallocated at runtime.  The fixed number of work-stealing
> engines may help.
> 
> What do you think?  It sounds like you had something more ambitious in
> mind.

Generally I really don't like the idea of having two kinds of Mercury
engines; I'd prefer to have just one kind.  IO workers would be an addition
on top of that: they do not execute Mercury code, only foreign code, e.g.
foreign code that we cannot prevent from blocking, or that needs its own
exclusive IO worker because some state is attached to that thread.

I'm changing my mind about keeping the number of Mercury engines fixed, as
I suggested in my previous e-mail: I now think that allowing the number of
engines to vary could be useful.  Regarding global structures, we could
allow the maximum number of engines to be specified at runtime using
MERCURY_OPTIONS, along with a minimum or even a default number of engines.
This means that the arrays don't need resizing, but we do need to check
that slots in them are not NULL.  This can probably also be made lock-free,
including work stealing.

So this, with the right foreign code annotations, solves the problem of
particular libraries keeping their state in thread-local storage; it can
also be used for blocking IO.  However, for blocking IO I'd prefer to use
non-blocking calls (O_NONBLOCK, see write(2)) and check whether they return
EWOULDBLOCK or EAGAIN.  If they do, the engine should suspend the context
and wake it once the file descriptor is ready to retry the operation
(see select(2) or libevent/libev).  This allows the engine to execute some
other context (use the CPU) while it waits for the IO call.  The goal is
that parallel Mercury programs that do significant amounts of IO should not
disturb other CPU or IO tasks (in other contexts), without breaking the way
people write programs.

Adding IO workers is, as you said, "an escape hatch" for when O_NONBLOCK is
not supported, either by the OS or by some library involved.

I'm not sure if I've explained this well or left out anything.  Let me know
if something isn't clear.

Thanks.


-- 
Paul Bone


