[m-rev.] post-commit review: Enable configuration of parallel conjunction reordering.
pbone at csse.unimelb.edu.au
Thu Sep 20 01:06:50 AEST 2012
For post commit review.
This is a 2 line change, but I would appriciate it if people could read
this and check that I've diagnosed the situation correctly.
Enable configuration of parallel conjunction reordering.
Before we implemented loop control we allowed independent parallel
conjunctions to be reordered to transform right parallel recursion into the
more efficient left parallel recursion (now loop control of right recursion
is faster so we use that). This change allows us to configure the
conjunction reordering which is useful for benchmarking.
The transformation is only applied when:
1) Not explicitly disabled
2) The parallel conjunction is completely independent.
3) Loop control is not enabled..
Allow the new option to be used to disable the transformation of right
parallel recursion into left parallel recursion.
Add the --no-parallel-reorder-right-recursion option.
Add missing documentation for --par-loop-control and
Move options controlling the parallel conjunction transformation from
the optimisation section to the misc section of options; we don't want
them changing if they're followed by a -O2 on the command line.
Document the new option.
Write commented out documentation for some otherwise undocumented
Uncomment the --par-loop-control options.
Move the commented out documentation for the options controlling the
parallel conjunction transformation into the misc section as we did in
Describe the --par-loop-control option ready for the 12.08 release.
This is the first time we're describing the existence of the option so
that users can experiment with it before we make it the default.
diff --git a/runtime/mercury_wsdeque.h b/runtime/mercury_wsdeque.h
index 77b0053..d3c4e4b 100644
@@ -126,6 +126,11 @@ MR_wsdeque_push_bottom(MR_SparkDeque *dq, const MR_Spark *spark)
MR_sa_element(arr, bot) = *spark;
+ ** Make sure the spark data is stored before we store the value of
+ ** bottom. We wouldn't want a thief to steal some stale data.
dq->MR_sd_bottom = bot + 1;
@@ -144,6 +149,32 @@ MR_wsdeque_pop_bottom(MR_SparkDeque *dq)
dq->MR_sd_bottom = bot;
+ ** bot must be written before we read top. If it is written after,
+ ** which may happen without the fence, then there is a race as follows.
+ ** There are two items in the deque. (bot = 3, top = 1) This deque's
+ ** owner's CPU does not immediately write bottom into memory (above).
+ ** The owner thinks that bot=2 but memory says that bot = 3). Meanwhile
+ ** two other CPUs steal work, they see two items on the deque and each
+ ** increments top with its CAS (top = 3, the deque is empty). The
+ ** original engine continues, because it saw a deque with two items it
+ ** does not do a CAS on top and therefore takes an item from the deque.
+ ** The deque had 2 items but 3 have been taken!
+ ** Thieves create a critical section between their initial read of top
+ ** (and bottom) and the CAS. Successful thieves are mutually excluded
+ ** from this section. Therefore thief 2 will not read the values for
+ ** top and bottom until after thief 1's CAS is a success (if it did read
+ ** these, its own CAS would fail). Thief 2 is guaranteed to see the
+ ** update to bot or the owner is guaranteed to see thief 1's update to
+ ** top (possibly both). This means that either thief 2 will fail or the
+ ** owner will use a CAS and a race between it and the owner will decide
+ ** the victor. Either way, only thief 1 and one of the other two
+ ** engines can take an item from the deque.
top = dq->MR_sd_top;
size = bot - top;
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 490 bytes
Desc: Digital signature
More information about the reviews