[m-rev.] for post-commit review: document .par grades
Zoltan Somogyi
zoltan.somogyi at runbox.com
Tue Aug 19 22:30:18 AEST 2025
On Tue, 19 Aug 2025 21:41:15 +1000, Julien Fischer <jfischer at opturion.com> wrote:
> On Tue, 19 Aug 2025 at 17:25, Zoltan Somogyi <zoltan.somogyi at runbox.com> wrote:
> > Or they may want concurrency, which is not the same thing.
>
> These days you are going to have to go out of your way to *not* get
> parallelism with your concurrency.
That is true if the people responsible for the platform you are talking about
have put in the work needed to make it work. I don't know for sure that we have
done that.
> > I have no dog in this fight. You and Julien should agree on something,
> > and when you do, tell me what it is, and I will document it.
>
> So, if we don't want to introduce a new grade component, I suggest:
I wasn't against it, I was just pointing out that from the users' point
of view, it would be more of a rename.
> 1. Disabling concurrency in non .par LLDS grades. As discussed, except
> for preventing philosophers from getting hungry, it's not useful in practice.
> (spawn/3 and friends would throw an exception in all non .par C grades.)
>
> 2. Keep .par as the component that enables concurrency in C grades,
> noting that in LLDS C grades it also enables support for parallel
> conjunctions (and further noting that parallel conjunctions are not supported
> at all by the MLDS backend.)
>
> Does that work for everyone?
It works for me, but as I said, I don't care that much one way or the other.
> > Because enabling parallel conjunctions comes with overheads
> > that we don't want to pay for programs that contain no parallel
> > conjunctions.
>
> Does enabling parallel conjunctions in the LLDS grades come with
> additional overhead, beyond that which is required to support concurrency?
> (Certainly, code that contains parallel conjunctions incurs some overheads,
> which you hopefully make back from the parallelisation, but I didn't think
> the overheads from parallel conjunctions were of the distributed fat type
> we incur for things like trailing or debugging.)
I don't know the current situation, but when .par was introduced, it definitely
did come with distributed fat type overheads. The details have gone vague
with time, but I seem to remember that it required a register instead
of a global variable to point to either the engine or the current context.
Given the extreme lack of real machine registers we can use on x86s,
that hurt significantly. I just looked at the talk for my parallelism overlap
paper with Paul, and the table shows one benchmark running in 11 seconds
in a non-.par grade, and in 14.6 seconds in a .par grade (both on one CPU),
which is a 32% slowdown. (The slowdowns for the other two benchmarks
were somewhat smaller.)
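As a rough check of that figure, the relative slowdown can be computed from the two timings. This is only a sketch: slowdown_percent is a hypothetical helper name, and the inputs are the rounded timings quoted above, so the result (a little under 33%) may differ slightly from what the unrounded data in the table would give.

```c
#include <assert.h>

/*
 * Relative slowdown of a run compared with a baseline, in percent.
 * Hypothetical helper for illustration only.
 */
double slowdown_percent(double baseline_secs, double slower_secs)
{
    /* A zero or negative baseline would make the ratio meaningless. */
    assert(baseline_secs > 0.0);
    return (slower_secs - baseline_secs) / baseline_secs * 100.0;
}
```

Calling slowdown_percent(11.0, 14.6) gives the overhead of the .par run relative to the non-.par run for that one benchmark.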
Of course, we are now using x64s instead of x86s, which have a different
set of usable registers. I have never dealt with our C-to-x64 interface,
but I doubt this issue would go away there either, although it may hurt
relatively less.
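To illustrate the general shape of the problem (not the actual Mercury runtime code), here is a minimal sketch of the two access patterns. All names (Engine, global_engine, engine_base, set_engine and the two count_call functions) are hypothetical, and C11 _Thread_local is used as a portable stand-in for a reserved machine register, which is an assumption made for the example. With a single engine, its fields live at a fixed address; once each thread has its own engine, every access has to go through some per-thread base pointer, and keeping that pointer cheap to reach is what costs a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for a runtime engine. */
typedef struct {
    long calls;
} Engine;

/* Non-parallel style: exactly one engine, at a fixed global address. */
Engine global_engine;

void count_call_global(void)
{
    /* The address is known at link time; no base pointer is needed. */
    global_engine.calls++;
}

/*
 * Parallel style: each thread must be able to find its own engine.
 * _Thread_local stands in here for a dedicated register (assumption).
 */
_Thread_local Engine *engine_base = NULL;

void set_engine(Engine *e)
{
    engine_base = e;
}

void count_call_per_thread(void)
{
    /* Every access is now an indirection through the base pointer. */
    assert(engine_base != NULL);
    engine_base->calls++;
}
```

A thread would call set_engine once with its own Engine before using count_call_per_thread; the extra indirection on every access is the kind of cost that is paid whether or not the program contains any parallel conjunctions.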
Zoltan.