[m-rev.] for post-commit review: document .par grades

Peter Wang novalazy at gmail.com
Wed Aug 20 13:10:50 AEST 2025


On Tue, 19 Aug 2025 23:07:53 +1000 Julien Fischer <jfischer at opturion.com> wrote:
> On Tue, 19 Aug 2025 at 22:30, Zoltan Somogyi <zoltan.somogyi at runbox.com> wrote:
> >
> >
> >
> > On Tue, 19 Aug 2025 21:41:15 +1000, Julien Fischer <jfischer at opturion.com> wrote:
> 
> > > > Because enabling parallel conjunctions comes with overheads
> > > > that we don't want to pay for programs that contain no parallel
> > > > conjunctions.
> > >
> > > Does enabling parallel conjunctions in the LLDS grades come with
> > > additional overhead, beyond that which is required to support concurrency?
> > > (Certainly, code that contains parallel conjunctions incurs some overheads,
> > > which you hopefully make back from the parallelisation, but I didn't think
> > > the overheads from parallel conjunctions were of the distributed fat type
> > > we incur for things like trailing or debugging.)
> >
> > I don't know the current situation, but when .par was introduced, it definitely
> > did come with distributed fat type overheads. The details have gone vague
> > with time, but I seem to remember that it required a register instead
> > of a global variable to point to either the engine or the current context.
> > Given the extreme lack of real machine registers we can use on x86s,
> > that hurt significantly. I just looked at the talk for my parallelism overlap
> > paper with Paul, and the table shows one benchmark running in 11 seconds
> > in a non-.par grade, and in 14.6 seconds in a .par grade (both on one CPU),
> > which is a 32% slowdown. (The slowdowns for the other two benchmarks
> > were somewhat smaller.)
> 
> I think we are talking about different things here. I know that the non .par
> grades are quicker on serial workloads than the .par grades due to the latter
> being required to store the engine address in a real register.  What I was
> getting at is does enabling parallel conjunctions enable any sort of additional
> overhead in .par grades, above and beyond what is required to support
> concurrency in those grades? I would have thought the answer would be no, as
> the system of engines and contexts used to implement both concurrency and
> parallel conjunctions is mostly the same.  (I know there are various additions
> to the runtime like sparks to improve the performance of parallel conjunctions,
> but presumably there is little to no cost to those if they are not being used.)

I remembered something. On startup, LLDS .par grades will create a bunch
of threads for Mercury engines (by default, one for each hardware thread),
in preparation for running Mercury threads created with thread.spawn,
or to execute parallel conjunctions. Those engine threads will often
never be used because the program doesn't actually spawn non-native threads
or contain parallel conjunctions.

It might be possible to defer creating Mercury engine threads until they
are actually needed, but it would need someone to make that effort.
I don't think there is any interest now.
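
To illustrate what deferring would mean, here is a minimal sketch in plain
C with pthreads. It is not taken from the Mercury runtime: every name in it
(spawn_work, create_engines, engine_count, NUM_ENGINES and so on) is made up
for the example, and the fixed count of 4 stands in for "one per hardware
thread". The only point is that the engine threads get created by
pthread_once on the first spawn, so a program that never spawns pays nothing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ENGINES 4   /* hypothetical stand-in for one per hardware thread */
#define QUEUE_CAP   64  /* hypothetical bound on queued work items */

typedef void (*work_fn)(void *);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static pthread_once_t engines_once = PTHREAD_ONCE_INIT;

static struct { work_fn fn; void *arg; } queue[QUEUE_CAP];
static size_t q_head, q_tail;   /* ring buffer indices, guarded by lock */
static int pending;             /* items queued or running, guarded by lock */
static int engines_created;     /* guarded by lock */

/* Each engine thread takes items off the queue and runs them forever. */
static void *engine_loop(void *unused)
{
    (void) unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (q_head == q_tail)
            pthread_cond_wait(&work_ready, &lock);
        work_fn fn = queue[q_head % QUEUE_CAP].fn;
        void *arg = queue[q_head % QUEUE_CAP].arg;
        q_head++;
        pthread_mutex_unlock(&lock);

        fn(arg);

        pthread_mutex_lock(&lock);
        if (--pending == 0)
            pthread_cond_broadcast(&all_done);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs at most once, and only when the first work item is spawned. */
static void create_engines(void)
{
    for (int i = 0; i < NUM_ENGINES; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, engine_loop, NULL) != 0)
            continue;           /* sketch only: skip engines that fail */
        pthread_detach(t);
        pthread_mutex_lock(&lock);
        engines_created++;
        pthread_mutex_unlock(&lock);
    }
}

/* Queue a work item, creating the engine threads on first use. */
void spawn_work(work_fn fn, void *arg)
{
    pthread_once(&engines_once, create_engines);
    pthread_mutex_lock(&lock);
    assert(q_tail - q_head < QUEUE_CAP);
    queue[q_tail % QUEUE_CAP].fn = fn;
    queue[q_tail % QUEUE_CAP].arg = arg;
    q_tail++;
    pending++;
    pthread_cond_signal(&work_ready);
    pthread_mutex_unlock(&lock);
}

/* Block until every spawned item has finished. */
void wait_idle(void)
{
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&all_done, &lock);
    pthread_mutex_unlock(&lock);
}

/* Number of engine threads created so far. */
int engine_count(void)
{
    pthread_mutex_lock(&lock);
    int n = engines_created;
    pthread_mutex_unlock(&lock);
    return n;
}

/* Example work item: atomically bump a counter. */
void add_one(void *arg)
{
    atomic_fetch_add((atomic_int *) arg, 1);
}
```

A caller would do spawn_work(add_one, &counter) a few times and then
wait_idle(). Before the first spawn_work call, engine_count() is 0. The real
runtime would have far more to worry about (engine-local state, contexts,
sparks, shutdown), which is the effort I mean.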

So, as an easier option, it *would* make sense to introduce a .mt grade
component for C grades that (unlike .par) supports multi-threading
but not non-native threads or parallel conjunctions.

Peter

