[m-rev.] for review: Wait for work-stealing engine threads to terminate with pthread_join.
Peter Wang
novalazy at gmail.com
Tue Apr 14 12:47:39 AEST 2026
On Tue, 14 Apr 2026 12:13:46 +1000 Julien Fischer <jfischer at opturion.com> wrote:
> On Mon, 13 Apr 2026 at 17:25, Peter Wang <novalazy at gmail.com> wrote:
> >
> > Previously, we created _detached_ threads to run work-stealing engines.
> > The only reason for using detached threads instead of joinable threads
> > was that the code for thread creation was originally designed for
> > creating Mercury threads (the interface exported by the thread.m module
> > expects detached threads).
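A minimal sketch of the distinction, assuming plain POSIX threads. `engine_main` and `create_engine_thread` are hypothetical names, not the Mercury runtime's actual functions. A detached thread releases its resources by itself when it exits and cannot be waited for. A joinable thread (the pthreads default) can be reaped with `pthread_join()`:

```c
#include <pthread.h>

/* Hypothetical engine entry point; stands in for the real engine loop. */
static void *engine_main(void *arg)
{
    (void) arg;
    return NULL;
}

/*
** Create an engine thread, either detached (the old behaviour) or
** joinable. Returns 0 on success, or a pthreads error code.
*/
static int create_engine_thread(pthread_t *tid, int detached)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setdetachstate(&attr,
        detached ? PTHREAD_CREATE_DETACHED : PTHREAD_CREATE_JOINABLE);
    if (rc == 0) {
        rc = pthread_create(tid, &attr, engine_main, NULL);
    }
    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling `pthread_join()` on a thread created with `detached` set is undefined behaviour, so only threads created with `detached == 0` can be waited for.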
> >
> > When the program is about to end, the main thread notifies the engines
> > to shut down, then waits on a semaphore that is incremented when an
> > engine is shut down. But an engine can only increment the semaphore
> > BEFORE its thread terminates. That is, while the semaphore indicates
> > that the engine has shut down (no longer responding), the thread that
> > the engine was running on may continue for an indeterminate amount of time
> > before it is terminated. The main thread may think that it is safe to
> > proceed, even while some of the engine threads are still running.
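The shutdown sequence with joinable threads might look roughly like the following sketch. It uses plain pthreads, and the names, the condition-variable shutdown signal and `NUM_ENGINES` are all hypothetical, not the runtime's actual code. The key point is that `pthread_join()` returns only once the target thread has terminated. Unlike a semaphore posted by the engine itself, it cannot report completion while the thread is still running:

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_ENGINES 4   /* hypothetical engine count */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool shutdown_requested = false;
static int engines_finished = 0;

static void *engine_loop(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    while (!shutdown_requested) {
        pthread_cond_wait(&cond, &lock);
    }
    /*
    ** Per-engine cleanup would go here. In the detached design the engine
    ** posted the shutdown semaphore around this point, but the thread still
    ** had to return and run its thread-exit cleanup afterwards.
    */
    engines_finished++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start up to n engines; returns how many threads were actually created. */
static int start_engines(pthread_t tids[], int n)
{
    int started = 0;
    while (started < n) {
        /* NULL attributes: the thread is joinable by default. */
        if (pthread_create(&tids[started], NULL, engine_loop, NULL) != 0) {
            break;
        }
        started++;
    }
    return started;
}

/* Ask all engines to stop, then wait for each thread to terminate. */
static int shutdown_engines(pthread_t tids[], int n)
{
    pthread_mutex_lock(&lock);
    shutdown_requested = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++) {
        int rc = pthread_join(tids[i], NULL);
        if (rc != 0) {
            return rc;
        }
    }
    return 0;
}
```

After `shutdown_engines()` returns 0, no engine thread is still running, so it should be safe for the main thread to proceed to `exit()`.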
> >
> > I found that on a Linux/glibc system, with a statically linked
> > binary, this setup could sometimes cause an "Aborted" error message at
> > program exit (after Mercury main/2 returns).
>
> Was glibc itself statically linked in there?
Yes, the binaries are completely statically linked.
> > From backtraces, I believe the
> > problem is as described: the main thread is already in an exit() call
> > while engine threads are still performing their own cleanup, leading to
> > an abort() call.
>
> It looks like the libgcc stack unwinding code that thread 1 is executing
> cannot find some frame information (possibly the call to exit() has done
> something to it?). (Judging by the libgcc source, that is why the call
> to abort() happens.)
>
Right.
Peter