[m-rev.] for post-commit review: document .par grades
Julien Fischer
jfischer at opturion.com
Tue Aug 19 23:07:53 AEST 2025
On Tue, 19 Aug 2025 at 22:30, Zoltan Somogyi <zoltan.somogyi at runbox.com> wrote:
>
> On Tue, 19 Aug 2025 21:41:15 +1000, Julien Fischer <jfischer at opturion.com> wrote:
> > > Because enabling parallel conjunctions comes with overheads
> > > that we don't want to pay for programs that contain no parallel
> > > conjunctions.
> >
> > Does enabling parallel conjunctions in the LLDS grades come with
> > additional overhead, beyond that which is required to support concurrency?
> > (Certainly, code that contains parallel conjunctions incurs some overheads,
> > which you hopefully make back from the parallelisation, but I didn't think
> > the overhead from parallel conjunctions was of the distributed fat type
> > we incur for things like trailing or debugging.)
>
> I don't know the current situation, but when .par was introduced, it definitely
> did come with distributed fat type overheads. The details have gone vague
> with time, but I seem to remember that it required a register instead of
> a global variable to point to either the engine or the current context.
> Given the extreme lack of real machine registers we can use on x86s,
> that hurt significantly. I just looked at the talk for my parallelism overlap
> paper with Paul, and the table shows one benchmark running in 11 seconds
> in a non.par grade, and in 14.6 seconds in a .par grade (both on one CPU),
> which is a 32% slowdown. (The slowdowns for the other two benchmarks
> were somewhat smaller.)

I think we are talking about different things here. I know that the non-.par
grades are quicker on serial workloads than the .par grades, because the latter
are required to store the engine address in a real register. What I was getting
at is: does enabling parallel conjunctions add any overhead in the .par grades
above and beyond what is required to support concurrency in those grades? I
would have thought the answer is no, as the system of engines and contexts
used to implement both concurrency and parallel conjunctions is mostly the
same. (I know there are various additions to the runtime, like sparks, to
improve the performance of parallel conjunctions, but presumably there is
little to no cost to those if they are not being used.)

Julien.