[m-rev.] for review: Wait for work-stealing engine threads to terminate with pthread_join.
Julien Fischer
jfischer at opturion.com
Tue Apr 14 12:13:46 AEST 2026
On Mon, 13 Apr 2026 at 17:25, Peter Wang <novalazy at gmail.com> wrote:
>
> Previously, we created _detached_ threads to run work-stealing engines.
> The only reason for using detached threads instead of joinable threads
> was that the code for thread creation was originally designed for
> creating Mercury threads (the interface exported by the thread.m module
> expects detached threads).
>
> When the program is about to end, the main thread notifies the engines
> to shut down, then waits on a semaphore that is incremented when an
> engine is shut down. But an engine can only increment the semaphore
> BEFORE its thread terminates. That is, while the semaphore indicates
> that the engine has shut down (no longer responding), the thread that
> the engine was running on may continue for an indeterminate amount of time
> before it is terminated. The main thread may think that it is safe to
> proceed, even while some of the engine threads are still running.
>
> I found that on a Linux/glibc system, with a statically linked
> binary, this setup could sometimes cause an "Aborted" error message at
> program exit (after Mercury main/2).
Was glibc itself statically linked in there?
> From backtraces, I believe the
> problem is as described: the main thread is already in an exit() call
> while engine threads are still performing their own cleanup, leading to
> an abort() call.
It looks like the libgcc stack unwinding code that thread 1 is executing
cannot find some frame information (possibly the call to exit() has done
something to it?). Looking at the libgcc source, that is why the call to
abort() happens.
> The solution is to do what we should have done to begin with: run
> work-stealing engines in non-detached threads, and call pthread_join()
> to wait for engine threads to terminate before allowing the main thread
> to continue with program termination.
Agreed.
> runtime/mercury_context.c:
> Delete references to shutdown_ws_semaphore.
>
> runtime/mercury_thread.c:
> runtime/mercury_thread.h:
> Make MR_create_worksteal_thread create a non-detached thread.
>
> runtime/mercury_wrapper.c:
> In mercury_runtime_init, record the IDs of the threads created for
> running work-stealing engines in an array.
>
> In mercury_runtime_terminate, after notifying each work-stealing
> engine to shut down, wait for the engine threads to terminate
> by calling pthread_join().
That looks fine.
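
For anyone reading along, the shutdown sequence described in the log message
can be sketched roughly as below. This is only an illustration, not the
runtime code: engine_main, engine_threads, shutdown_requested, NUM_ENGINES
and start_and_join_engines are all made-up names, and a simple atomic flag
stands in for the real shutdown notification.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_ENGINES 4   /* hypothetical; the real number is chosen at runtime */

/* Hypothetical stand-ins for the runtime's shutdown notification and state. */
static atomic_bool shutdown_requested;
static atomic_int  engines_finished;
static pthread_t   engine_threads[NUM_ENGINES];

/* Stand-in for an engine loop: spin until asked to shut down. */
static void *engine_main(void *arg)
{
    (void) arg;
    while (!atomic_load(&shutdown_requested)) {
        sched_yield();
    }
    atomic_fetch_add(&engines_finished, 1);
    return NULL;
}

/* Returns the number of engine threads that ran to completion. */
int start_and_join_engines(void)
{
    int i;
    int err;

    atomic_store(&shutdown_requested, false);
    atomic_store(&engines_finished, 0);

    /* Init: default attributes (NULL) give joinable threads; record each ID. */
    for (i = 0; i < NUM_ENGINES; i++) {
        err = pthread_create(&engine_threads[i], NULL, engine_main, NULL);
        assert(err == 0);
    }

    /* Terminate: notify the engines first ... */
    atomic_store(&shutdown_requested, true);

    /* ... then wait until every engine thread has actually terminated. */
    for (i = 0; i < NUM_ENGINES; i++) {
        err = pthread_join(engine_threads[i], NULL);
        assert(err == 0);
    }

    return atomic_load(&engines_finished);
}
```

The point of the sketch is that pthread_join() returns only once the target
thread has terminated, whereas a semaphore post necessarily happens while
the posting thread is still running.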
Julien.