[m-dev.] thread.spawn_native

Peter Wang novalazy at gmail.com
Thu Jun 12 18:15:57 AEST 2014


On Thu, 12 Jun 2014 15:49:49 +1000, Paul Bone <paul at bone.id.au> wrote:
> 
> So my main objection to spawn_native is that it requires creating and
> destroying Mercury engines at runtime, which is difficult.  However spawning
> IO workers is much less of a problem.  So I propose that we add support for
> IO workers
> to the runtime, and add library code so that the programmer can manage IO
> workers for libraries with these requirements.
> 
>     :- type io_worker_thread.
> 
>     :- pred spawn_io_worker(io_worker_thread::out, io::di, io::uo) is det.
> 
>     :- pred end_io_worker(io_worker_thread::in, io::di, io::uo) is det.
> 
> This allows the programmer to manipulate IO workers like any other value;
> for example, an IO worker can form part of the representation of a
> library's state.
> 
> Then we add something for foreign calls, maybe a pragma or a foreign call
> attribute, so that the compiler and runtime system know that the foreign
> call should be executed by the IO worker.  The compiler and runtime system
> are responsible for passing the data to the IO worker thread, and returning
> it from that thread and waking the Mercury context when the foreign call
> returns.
> 
> This should be much easier than safely adding spawn_native.
> 

Interesting idea.
Where does the compiler get the IO worker variable from?

> However allowing the number of Mercury engines to change at runtime
> (spawn_native) has another benefit that I'd forgotten until now.  It allows
> Mercury, given information from the OS, to adjust its demands on the
> system's parallel processing resources.  When Apple announced Grand Central
> Dispatch, it caused Zoltan and me to think about how multiple parallel
> Mercury programs interact when running on the same hardware.  If you have
> four processors and two CPU-bound Mercury programs, each program will by
> default try to use all four processors, because that's how many are detected.
> However, it'd be a lot more efficient if the programs didn't share the same
> four processors but instead used two processors each (two native threads
> each), so that there are fewer context switches.  These kinds of adjustments
> are best made at runtime as situations change.  As far as we know, no OS
> provides this kind of information, so this isn't a practical concern right
> now.  However it is another reason why allowing on-the-fly creation and
> destruction of Mercury engines might be a good thing.
> 
> Given that, I think that removing this restriction and implementing
> spawn_native may be easier than the combination of IO workers, asynchronous
> IO, library support for IO workers and a foreign call annotation for context
> pinning.
> 
> What are your goals WRT timeliness?  Depending on when you need it I can
> review the RTS code and remove this restriction.

We have hlc.par.gc, so there's no hurry.

I started playing with it today.  My plan was vaguely this:

Separate Mercury engines into two kinds: a fixed number of "common"
engines created at startup which can perform work-stealing, and a
dynamic number of "exclusive" engines which do not steal work but only
execute code for a single context.

spawn_native creates an exclusive engine and associated context.
Exclusive engines are assigned higher engine ids than all the common
engines.

Prevent common engines from stealing work from exclusive
engines/contexts.  Parallel conjunctions should still work, except that
they only execute sequentially in exclusive contexts.  Good enough for
real programs ;)

Preallocate longer arrays for MR_spark_deques and engine_sleep_sync_data
(etc?), up to some limit.  The lower slots would be permanently taken by
common engines, the higher slots claimed dynamically by exclusive engines.
This imposes an arbitrary limit on the number of engines that can be
created by spawn_native.  It should be okay in practice, but maybe we
can do better.  Maybe we just need to allocate address space without
committing?

Add a new annotation for foreign procs.  A call to a foreign proc with
that annotation must be executed on the exclusive engine that the
context originated on.  (`may_not_migrate' basically)

Allow common engines to steal from exclusive engines (respecting the
annotation).  This is no doubt fiddly, but at least the global arrays
aren't being reallocated at runtime.  The fixed number of work-stealing
engines may help.

What do you think?  It sounds like you had something more ambitious in
mind.

Peter


