[m-dev.] thread.spawn_native

Paul Bone paul at bone.id.au
Sat Jun 14 00:20:29 AEST 2014


On Fri, Jun 13, 2014 at 02:20:18PM +1000, Peter Wang wrote:
> On Fri, 13 Jun 2014 12:41:18 +1000, Paul Bone <paul at bone.id.au> wrote:
> > On Thu, Jun 12, 2014 at 06:15:57PM +1000, Peter Wang wrote:
> > > On Thu, 12 Jun 2014 15:49:49 +1000, Paul Bone <paul at bone.id.au> wrote:
> > > > 
> > > > So my main objection to spawn_native is that it involves creating and
> > > > destroying Mercury engines at runtime, which is difficult.  However,
> > > > spawning IO workers is much less of a problem.  So I propose that we add
> > > > support for IO workers to the runtime, and add library code so that the
> > > > programmer can manage IO workers for libraries with these requirements.
> > > > 
> > > >     :- type io_worker_thread.
> > > > 
> > > >     :- pred spawn_io_worker(io_worker_thread::out, io::di, io::uo) is det.
> > > > 
> > > >     :- pred end_io_worker(io_worker_thread::in, io::di, io::uo) is det.
> > > > 
> > > > This allows the programmer to manipulate IO workers like any other variable,
> > > > they can use them in part of the representation of a library's state.
> > > > 
> > > > Then we add something for foreign calls, maybe a pragma or a foreign call
> > > > attribute, so that the compiler and runtime system know that the foreign
> > > > call should be executed by the IO worker.  The compiler and runtime system
> > > > are responsible for passing the data to the IO worker thread, and returning
> > > > it from that thread and waking the Mercury context when the foreign call
> > > > returns.
> > > > 
> > > > This should be much easier than safely adding spawn_native.
> > > > 
> > > 
> > > Interesting idea.
> > > Where does the compiler get the IO worker variable from?
> > 
> > The runtime system spawns the thread and gives the user a handle to it.
> > 
> > So io_worker_thread literally represents the worker thread.
> > 
> 
> But how do you use the running io_worker_thread?
> 
> I assume there is an optional io_worker_thread assigned to each Mercury
> context, and for a call to an annotated foreign proc p(Input, Output)
> the compiler would transform that to something like:
> 
>     get_io_worker_thread(MaybeWorker),
>     (
> 	MaybeWorker = yes(Worker),
> 	post(Worker, p_indirect(Input), Future),
> 	wait(Worker, Future, Output)
> 	% The engine can switch to another context during wait.
>     ;
> 	MaybeWorker = no,
> 	p(Input, Output)
>     )
> 
> Or does the user write that explicitly?  Obviously that would be the
> place to start.

This is the part that I'm not quite sure about, so any feedback would be
great.  My intention is that the user always writes this explicitly,
they're always in charge of how these threads (the ones they create anyway)
are used.  (The RTS may also create IO workers, but the user doesn't see
those ones.)

The user is responsible for passing around their io_worker_thread.  And they
might want to use it to call some foreign code foo.

    :- pred foo(io_worker_thread::in, Arg1::Mode1, ..., ArgN::ModeN) is det.

They call it like this, passing the thread as an extra argument:

    foo(MyWorker, Param1, ..., ParamN)

foo is defined using foreign code:

    :- pragma foreign_proc("C", foo(Arg1::Mode1, ..., ArgN::ModeN),
        [thread_safe, will_not_call_mercury, other_attributes,
         use_io_worker_thread],
    "
        ...
    ".

The compiler observes the use_io_worker_thread annotation and executes the
code using the thread whose handle was passed in foo's first argument.  We
can talk about some of the specifics, for example should the thread appear
in the IO list?  Should the thread be named in a variable? eg:

    :- pragma foreign_proc("C", foo(Thread::in, Arg1::Mode1, ..., ArgN::ModeN),
        [thread_safe, will_not_call_mercury, other_attributes,
         use_io_worker_thread(Thread)],
    "
        ...
    ".
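
To make the runtime side of this a bit more concrete, here is a rough C
sketch of a single IO worker with a one-slot mailbox, assuming plain
pthreads.  Every name in it (io_worker, spawn_io_worker, io_worker_call,
end_io_worker, example_add_one) is hypothetical and only mirrors the
predicates proposed above; none of it is an existing Mercury runtime API.
In this sketch the caller simply blocks its OS thread until the result is
ready, whereas a real implementation would suspend the Mercury context and
let the engine run something else.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch only: not part of the Mercury runtime. */
typedef void *(*io_job_fn)(void *);

typedef enum { SLOT_IDLE, SLOT_POSTED, SLOT_RUNNING, SLOT_DONE } slot_state;

typedef struct {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  changed;    /* signalled on every state change */
    slot_state      state;
    bool            shutdown;
    io_job_fn       fn;         /* the foreign code to run */
    void           *input;
    void           *output;
} io_worker;

/* Body of the worker thread: run posted jobs until asked to shut down. */
static void *io_worker_main(void *arg)
{
    io_worker *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (w->state != SLOT_POSTED && !w->shutdown)
            pthread_cond_wait(&w->changed, &w->lock);
        if (w->state != SLOT_POSTED)
            break;              /* shutdown requested, nothing pending */
        w->state = SLOT_RUNNING;
        io_job_fn fn = w->fn;
        void *input = w->input;
        pthread_mutex_unlock(&w->lock);
        void *output = fn(input);   /* may block; always runs on this thread */
        pthread_mutex_lock(&w->lock);
        w->output = output;
        w->state = SLOT_DONE;
        pthread_cond_broadcast(&w->changed);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Returns 0 on success, like pthread_create. */
int spawn_io_worker(io_worker *w)
{
    w->state = SLOT_IDLE;
    w->shutdown = false;
    w->fn = NULL;
    w->input = NULL;
    w->output = NULL;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
    return pthread_create(&w->thread, NULL, io_worker_main, w);
}

/* Post a job to the worker and wait for its result. */
void *io_worker_call(io_worker *w, io_job_fn fn, void *input)
{
    assert(fn != NULL);
    pthread_mutex_lock(&w->lock);
    while (w->state != SLOT_IDLE)       /* wait for the slot to be free */
        pthread_cond_wait(&w->changed, &w->lock);
    w->fn = fn;
    w->input = input;
    w->state = SLOT_POSTED;
    pthread_cond_broadcast(&w->changed);
    while (w->state != SLOT_DONE)       /* a real RTS would switch contexts */
        pthread_cond_wait(&w->changed, &w->lock);
    void *output = w->output;
    w->state = SLOT_IDLE;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    return output;
}

/* Ask the worker to exit and wait for it; no calls may be in flight. */
void end_io_worker(io_worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->shutdown = true;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->changed);
    pthread_mutex_destroy(&w->lock);
}

/* Stand-in for the body of a foreign_proc such as foo. */
static void *example_add_one(void *arg)
{
    int *n = arg;
    *n += 1;
    return n;
}
```

A call to foo(MyWorker, ...) would then correspond roughly to
io_worker_call(&worker, example_add_one, &n), with the foreign code always
running on the worker's own thread.  That is the property needed by
libraries that attach state to a particular thread.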

Previously I wondered if this should be some other kind of pragma or special
goal either attached to foo's declaration or the call into foo:

        % foo's definition.
        %
    :- pred foo(Arg1::Mode1, ..., ArgN::ModeN) is det.

        % bar is foo's direct caller.
        %
    :- pred bar(io_worker_thread::in, ...) is det.

    bar(MyWorker, ...) :-
        ...,
        execute_using_worker(MyWorker) (
            foo(Arg1, ..., ArgN)
        ),
        ....

But now I don't think so.  If foo is defined in a different module than bar,
then this requires breaking the abstraction between foo and bar.  It exposes
implementation details of foo, e.g. that it is foreign code.  bar shouldn't
be required to know whether foo is implemented directly in foreign code or
uses its thread argument when it calls some other code (or not at all).

I'd be happy to hear any other ideas.


> It seems difficult to have IO workers call back into Mercury code.  
> I guess it could work: if a foreign exported Mercury procedure is called
> within an IO worker thread then it creates a new Mercury context to
> perform that call, and puts it on the run queue.  Until the result is
> available, the IO worker must be able to service more IO requests, to
> arbitrary depth.  When Mercury callback returns it signals the IO worker
> with the result, so the IO worker can return to the foreign proc.
> Eventually the foreign proc returns, then the IO worker can return the
> result to the original Mercury context that requested the call.
> 
> It doesn't really seem easier any more.

If we implement IO workers and non-blocking IO, then this might be a good
idea regardless of whether we make IO workers available to programmers
through the library.

I don't have a lot of confidence in the ability of the current mechanism for
may_call_mercury foreign code to avoid blocking Mercury engines, or even
deadlocking the system.  Introducing IO workers and using them for
may_call_mercury code may be good anyway.  But yes, it's going to be tedious
to implement.


> > Generally I really don't like the idea that there are two kinds of Mercury
> > engines.  I'd prefer to just have one kind of Mercury engine only.  With the
> > addition of IO workers - which do not execute Mercury code, they only
> > execute foreign code, eg foreign code that we cannot prevent from blocking
> > or that needs its own exclusive IO worker because some state is attached to
> > that thread.
> 
> I don't really understand the objection to multiple kinds of engine.
> The work-stealing engines are supposed to be CPU-bound so you get
> (almost) full CPU utilisation already.  Any more engines you add would
> be, preferably, IO-bound.

What happens when a non-work-stealing engine executes a parallel
conjunction?  That conjunction executes sequentially unless you allow the
other engines to steal from it (and that's the hard part).  Getting it to
steal from the others is easy.

If we're going to the effort of removing the restriction of a fixed number
of Mercury engines, we might as well remove this specific restriction as
well.  Then (in an ideal world) Mercury runtimes can adjust the number of
engines they use as system utilisation changes.


-- 
Paul Bone


