[m-dev.] thread.spawn_native
Peter Wang
novalazy at gmail.com
Fri Jun 13 14:20:18 AEST 2014
On Fri, 13 Jun 2014 12:41:18 +1000, Paul Bone <paul at bone.id.au> wrote:
> On Thu, Jun 12, 2014 at 06:15:57PM +1000, Peter Wang wrote:
> > On Thu, 12 Jun 2014 15:49:49 +1000, Paul Bone <paul at bone.id.au> wrote:
> > >
> > > So my main objection to spawn_native is that it requires creating and
> > > destroying Mercury engines at runtime, which is difficult. However spawning IO workers
> > > is much less of a problem. So I propose that we add support for IO workers
> > > to the runtime, and add library code so that the programmer can manage IO
> > > workers for libraries with these requirements.
> > >
> > > :- type io_worker_thread.
> > >
> > > :- pred spawn_io_worker(io_worker_thread::out, io::di, io::uo) is det.
> > >
> > > :- pred end_io_worker(io_worker_thread::in, io::di, io::uo) is det.
> > >
> > > This allows the programmer to manipulate IO workers like any other variable,
> > > they can use them in part of the representation of a library's state.
> > >
> > > Then we add something for foreign calls, maybe a pragma or a foreign call
> > > attribute, so that the compiler and runtime system know that the foreign
> > > call should be executed by the IO worker. The compiler and runtime system
> > > are responsible for passing the data to the IO worker thread, and returning
> > > it from that thread and waking the Mercury context when the foreign call
> > > returns.
> > >
> > > This should be much easier than safely adding spawn_native.
> > >
> >
> > Interesting idea.
> > Where does the compiler get the IO worker variable from?
>
> The runtime system spawns the thread and gives the user a handle to it.
>
> So io_worker_thread literally represents the worker thread.
>
But how do you use the running io_worker_thread?
I assume there is an optional io_worker_thread assigned to each Mercury
context, and for a call to an annotated foreign proc p(Input, Output)
the compiler would transform that to something like:
    get_io_worker_thread(MaybeWorker),
    (
        MaybeWorker = yes(Worker),
        post(Worker, p_indirect(Input), Future),
        wait(Worker, Future, Output)
        % The engine can switch to another context during wait.
    ;
        MaybeWorker = no,
        p(Input, Output)
    )
Or does the user write that explicitly? Obviously that would be the
place to start.
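For concreteness, here is a minimal sketch (plain C with pthreads, not
Mercury RTS code) of the post()/wait() pattern above: the caller hands a
request to a dedicated worker thread and blocks on a per-request
"future" until the result arrives. All names (worker, future, post,
wait_future, sample_foreign_call) are illustrative, not existing API.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct future {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             done;
    int             result;
} future;

typedef struct request {
    int           (*fn)(int);   /* stands in for p_indirect */
    int             arg;
    future         *fut;
    struct request *next;
} request;

typedef struct worker {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    request        *head, *tail;
    pthread_t       tid;
} worker;

static void *worker_main(void *arg)
{
    worker *w = arg;
    for (;;) {
        /* Block until a request is queued, then dequeue it. */
        pthread_mutex_lock(&w->mu);
        while (w->head == NULL)
            pthread_cond_wait(&w->cv, &w->mu);
        request *r = w->head;
        w->head = r->next;
        if (w->head == NULL)
            w->tail = NULL;
        pthread_mutex_unlock(&w->mu);

        int res = r->fn(r->arg);            /* run the foreign call */

        /* Fill in the future and wake the waiting caller. */
        pthread_mutex_lock(&r->fut->mu);
        r->fut->result = res;
        r->fut->done = 1;
        pthread_cond_signal(&r->fut->cv);
        pthread_mutex_unlock(&r->fut->mu);
        free(r);
    }
    return NULL;
}

void spawn_io_worker(worker *w)
{
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->head = w->tail = NULL;
    pthread_create(&w->tid, NULL, worker_main, w);
}

void post(worker *w, int (*fn)(int), int arg, future *fut)
{
    request *r = malloc(sizeof *r);
    r->fn = fn;  r->arg = arg;  r->fut = fut;  r->next = NULL;
    pthread_mutex_init(&fut->mu, NULL);
    pthread_cond_init(&fut->cv, NULL);
    fut->done = 0;
    pthread_mutex_lock(&w->mu);
    if (w->tail != NULL) w->tail->next = r; else w->head = r;
    w->tail = r;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
}

int wait_future(future *fut)
{
    pthread_mutex_lock(&fut->mu);
    while (!fut->done)
        pthread_cond_wait(&fut->cv, &fut->mu);  /* a real engine would
                                                   switch contexts here */
    pthread_mutex_unlock(&fut->mu);
    return fut->result;
}

/* Sample "foreign proc" standing in for the annotated call p/2. */
int sample_foreign_call(int x) { return x * 2; }
```

The key difference from the real thing: where this sketch blocks the OS
thread in pthread_cond_wait, a Mercury engine would instead suspend the
context and pick up other work.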
It seems difficult to have IO workers call back into Mercury code.
I guess it could work: if a foreign exported Mercury procedure is called
within an IO worker thread then it creates a new Mercury context to
perform that call, and puts it on the run queue. Until the result is
available, the IO worker must be able to service more IO requests, to
arbitrary depth. When the Mercury callback returns it signals the IO worker
with the result, so the IO worker can return to the foreign proc.
Eventually the foreign proc returns, then the IO worker can return the
result to the original Mercury context that requested the call.
It doesn't really seem easier any more.
> > > However allowing the number of Mercury engines to change at runtime
> > > (spawn_native) has another benefit that I'd forgotten until now. It allows
> > > Mercury, given information from the OS, to adjust its demands on the
> > > system's parallel processing resources. When Apple announced Grand Central
> > > Dispatch, it prompted Zoltan and me to think about how multiple programs,
> > > such as Mercury programs, interact when running on the same hardware. If you have
> > > four processors and two CPU-bound Mercury programs, the Mercury programs by
> > > default try to use four processors because that's how many are detected.
> > > However, it'd be a lot more efficient if the programs didn't share the same
> > > four processors but instead used two processors each (two native threads
> > > each), so that there are fewer context switches. These kinds of adjustments
> > > are best made at runtime as situations change. As far as we know no OS
> > > provides this kind of information so this isn't a practical concern right
> > > now. However it is another reason why allowing on-the-fly creation and
> > > destruction of Mercury engines might be a good thing.
> > >
> > > Given that, I think that removing this restriction and implementing
> > > spawn_native may be easier than the combination of IO workers, asynchronous
> > > IO, library support for IO workers and a foreign call annotation for context
> > > pinning.
> > >
> > > What are your goals WRT timeliness? Depending on when you need it I can
> > > review the RTS code and remove this restriction.
> >
> > We have hlc.par.gc so there's no hurry.
> >
> > I started playing with it today. My plan was vaguely this:
> >
> > Separate Mercury engines into two kinds: a fixed number of "common"
> > engines created at startup which can perform work-stealing, and a
> > dynamic number of "exclusive" engines which do not steal work but only
> > execute code for a single context.
> >
> > spawn_native creates an exclusive engine and associated context.
> > Exclusive engines are assigned higher engine ids than all the common
> > engines.
> >
> > Prevent common engines from stealing work from exclusive
> > engines/contexts. Parallel conjunctions should still work, except that
> > they only execute sequentially in exclusive contexts. Good enough for
> > real programs ;)
>
> *grumble*
>
> > Preallocate longer arrays for MR_spark_deques and engine_sleep_sync_data
> > (etc?), up to some limit. The lower slots would be permanently taken by
> > common engines, the higher slots claimed dynamically by exclusive engines.
> > This imposes an arbitrary limit on the number of engines that can be
> > created by spawn_native. It should be okay in practice, but maybe we
> > can do better.
>
> We can try dynamically allocating these and protecting them (or at least
> protecting the higher slots). If it doesn't slow things down noticeably,
> it'd be better without this limit.
>
> A reasonable trade off is to allow this to be allocated when the program
> starts but make it configurable using the MERCURY_OPTIONS environment
> variable. Then at least it's not hard-coded. This has the same trade-offs
> for users as the stack sizes.
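A startup-configurable limit of that kind might look like the following
sketch; the variable name MERCURY_MAX_EXCLUSIVE_ENGINES and the default
of 64 are invented for illustration, not an existing MERCURY_OPTIONS
tunable.

```c
#include <stdlib.h>

#define DEFAULT_MAX_EXCLUSIVE_ENGINES 64

/* Read the preallocation limit from the environment at startup,
 * falling back to a compiled-in default.  Malformed or nonpositive
 * values are ignored rather than treated as errors. */
int max_exclusive_engines(void)
{
    const char *s = getenv("MERCURY_MAX_EXCLUSIVE_ENGINES");
    if (s != NULL) {
        long n = strtol(s, NULL, 10);
        if (n > 0)
            return (int)n;
    }
    return DEFAULT_MAX_EXCLUSIVE_ENGINES;
}
```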
Right.
>
> > Maybe we just need to allocate address space without
> > committing?
>
> Do you mean map some anonymous memory and not write to it until we need it?
> The arrays aren't likely to be that big so I don't see the benefit.
>
Probably.
> > Add a new annotation for foreign procs. A call to a foreign proc with
> > that annotation must be executed on the exclusive engine that the
> > context originated on. (`may_not_migrate' basically)
>
> And/or expose the IO workers via the library.
>
> > Allow common engines to steal from exclusive engines (respecting the
> > annotation). This is no doubt fiddly, but at least the global arrays
> > aren't being reallocated at runtime. The fixed number of work-stealing
> > engines may help.
> >
> > What do you think? It sounds like you had something more ambitious in
> > mind.
>
> Generally I really don't like the idea that there are two kinds of Mercury
> engine; I'd prefer to have only one kind of Mercury engine, plus IO
> workers, which do not execute Mercury code, only foreign code: e.g.
> foreign code that we cannot prevent from blocking, or that needs its own
> exclusive IO worker because some state is attached to that thread.
I don't really understand the objection to multiple kinds of engine.
The work-stealing engines are supposed to be CPU-bound so you get
(almost) full CPU utilisation already. Any more engines you add would
be, preferably, IO-bound.
Peter