[m-rev.] for review: Wait for work-stealing engine threads to terminate with pthread_join.

Julien Fischer jfischer at opturion.com
Tue Apr 14 12:13:46 AEST 2026


On Mon, 13 Apr 2026 at 17:25, Peter Wang <novalazy at gmail.com> wrote:
>
> Previously, we created _detached_ threads to run work-stealing engines.
> The only reason for using detached threads instead of joinable threads
> was because the code for thread creation was originally designed for
> creating Mercury threads (the interface exported by the thread.m module
> expects detached threads).
>
> When the program is about to end, the main thread notifies the engines
> to shut down, then waits on a semaphore that is incremented when an
> engine is shut down. But an engine can only increment the semaphore
> BEFORE its thread terminates. That is, while the semaphore indicates
> that the engine has shut down (no longer responding), the thread that
> the engine was running on may continue for an indeterminate amount of time
> before it is terminated. The main thread may think that it is safe to
> proceed, even while some of the engine threads are still running.
>
> I found that on a Linux/glibc system, with a statically linked
> binary, this setup could sometimes cause an "Aborted" error message at
> program exit (after Mercury main/2).

Was glibc itself statically linked into that binary?

> From backtraces, I believe the
> problem is as described: the main thread is already in an exit() call
> while engine threads are still performing their own cleanup, leading to
> an abort() call.

It looks like the libgcc stack unwinding code that thread 1 is executing
cannot find some frame information (possibly the call to exit() has done
something to it?). Looking at the libgcc source, that is why the call to
abort() happens.

> The solution is to do what we should have done to begin with: run
> work-stealing engines in non-detached threads, and call pthread_join()
> to wait for engine threads to terminate before allowing the main thread
> to continue with program termination.

Agreed.
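
For reference, here is a minimal sketch of why joining closes the gap
described above. It assumes a hypothetical stand-in for an engine thread:
demo_engine_thread and run_joinable_engine are illustrative names, not
Mercury runtime functions. The point is only that pthread_join() returns
after the thread function has returned, so any work the thread does after it
stops responding has finished by then.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag standing in for "the engine thread's own cleanup ran". */
static int demo_cleanup_done = 0;

/* Hypothetical engine thread body; the real engine loop is elided. */
static void *demo_engine_thread(void *arg)
{
    (void) arg;
    /* ... the engine would run here until told to shut down ... */

    /* Work done after the engine stops responding, before the thread ends. */
    demo_cleanup_done = 1;
    return NULL;
}

/* Returns 1 if the thread's cleanup was complete when pthread_join returned. */
int run_joinable_engine(void)
{
    pthread_t tid;
    int err;

    demo_cleanup_done = 0;

    /* NULL attributes: POSIX threads are joinable by default. */
    err = pthread_create(&tid, NULL, demo_engine_thread, NULL);
    assert(err == 0);

    /*
    ** pthread_join blocks until the thread has terminated, and it also
    ** synchronizes memory, so reading the plain int afterwards is safe.
    */
    err = pthread_join(tid, NULL);
    assert(err == 0);

    return demo_cleanup_done;
}
```

A semaphore posted from inside demo_engine_thread could only say "I am about
to finish"; the join is what says "I have finished".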

> runtime/mercury_context.c:
>     Delete references to shutdown_ws_semaphore.
>
> runtime/mercury_thread.c:
> runtime/mercury_thread.h:
>     Make MR_create_worksteal_thread create a non-detached thread.
>
> runtime/mercury_wrapper.c:
>     In mercury_runtime_init, record the IDs of the threads created for
>     running work-stealing engines in an array.
>
>     In mercury_runtime_terminate, after notifying each work-stealing
>     engine to shut down, wait for the engine threads to terminate
>     by calling pthread_join().

That looks fine.
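
The shape of the change to mercury_wrapper.c, as I read the log message, is
roughly the following. This is only a sketch under my own assumptions, not
the code in the diff: DEMO_NUM_ENGINES, demo_runtime_init,
demo_runtime_terminate and the condition-variable notification are all
hypothetical stand-ins for whatever the runtime actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_NUM_ENGINES 4      /* hypothetical engine count */

/* Thread IDs recorded at init time so they can be joined at termination. */
static pthread_t        demo_engine_threads[DEMO_NUM_ENGINES];

static pthread_mutex_t  demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   demo_cond = PTHREAD_COND_INITIALIZER;
static bool             demo_shutdown_requested = false;
static int              demo_engines_finished = 0;

/* Hypothetical engine: idles until it is notified to shut down. */
static void *ws_engine_main(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&demo_lock);
    while (!demo_shutdown_requested) {
        pthread_cond_wait(&demo_cond, &demo_lock);
    }
    demo_engines_finished++;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Create the engine threads (joinable) and remember their IDs. */
void demo_runtime_init(void)
{
    for (int i = 0; i < DEMO_NUM_ENGINES; i++) {
        int err = pthread_create(&demo_engine_threads[i], NULL,
            ws_engine_main, NULL);
        assert(err == 0);
    }
}

/* Notify every engine, then wait for every thread to really terminate. */
int demo_runtime_terminate(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_shutdown_requested = true;
    pthread_cond_broadcast(&demo_cond);
    pthread_mutex_unlock(&demo_lock);

    for (int i = 0; i < DEMO_NUM_ENGINES; i++) {
        int err = pthread_join(demo_engine_threads[i], NULL);
        assert(err == 0);
    }

    /* All threads have been joined, so no lock is needed for this read. */
    return demo_engines_finished;
}
```

Calling demo_runtime_init() and then demo_runtime_terminate() returns only
once all of the engine threads are gone, which is the property the main
thread needs before it goes on to call exit().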

Julien.

