[m-rev.] For review: Enable configuration of parallel conjunction reordering.

Paul Bone pbone at csse.unimelb.edu.au
Wed Aug 8 21:56:59 AEST 2012


For review by anyone.

Enable configuration of parallel conjunction reordering.

Before we implemented loop control we allowed independent parallel
conjunctions to be reordered to transform right parallel recursion into the
more efficient left parallel recursion (now loop control of right recursion
is faster so we use that).  This change allows us to configure the
conjunction reordering which is useful for benchmarking.

The transformation is only applied when:
    1) Not explicitly disabled
    2) The parallel conjunction is completely independent.
    3) Loop control is not enabled..

compiler/dep_par_conj.m:
    Allow the new option to be used to disable the transformation of right
    parallel recursion into left parallel recursion.

compiler/options.m:
    Add the --no-parallel-reorder-right-recursion option.

    Add missing documentation for --par-loop-control and
    --par-loop-control-preserve-tail-recursion options.

    Move options controlling the parallel conjunction transformation from
    the optimisation section to the misc section of options; we don't want
    them changing if they're followed by a -O2 on the command line.

doc/user_guide.texi:
    Document the new option.

    Write commented out documentation for some otherwise undocumented
    options.

    Uncomment the --par-loop-control options.

    Move the commented out documentation for the options controlling the
    parallel conjunction transformation into the misc section as we did in
    options.m.

NEWS:
    Describe the --par-loop-control option ready for the 12.08 release.
    This is the first time we're describing the existence of the option so
    that users can experiment with it before we make it the default.

diff --git a/NEWS b/NEWS
index 5175f85..cc05de0 100644
--- a/NEWS
+++ b/NEWS
@@ -43,6 +43,31 @@ Changes to the Mercury compiler:
 * The option `--warn-non-tail-recursion' no longer requires
   `--high-level-code'.
 
+* A new transformation has been written to avoid pathological behaviour in
+  some parallel loops such as:
+
+  p(...).
+  p(...) :-
+    (
+        q(...)
+    &
+        p(...)
+    ).
+
+  p is a 'right recursive parallel loop', it contains a parallel conjunction
+  that contains a recursive call on the right hand side of a & operator.
+  Code of this form either consumed too much memory or not enough
+  parallelism depending upon configuration.
+
+  The new transformation called Loop Control and presented at POPL'12 (see
+  the Mercury papers page).  It introduces a different method of executing
+  parallel conjunctions of this form.  It can be enabled with the
+  --par-loop-control compilation option, see the users guide.  Please test
+  it out and let us know if it does or does not work for you, it will likely
+  become the default in the future.  If you do test this, consider using the
+  --no-par-reorder-right-recursion option with your control group as it
+  disables the work around we used prior to loop control.
+
 Changes to the Mercury debugger:
 
 * We have added new capabilities to the "level", "retry" and "finish" mdb
diff --git a/compiler/dep_par_conj.m b/compiler/dep_par_conj.m
index 1523079..93170d1 100644
--- a/compiler/dep_par_conj.m
+++ b/compiler/dep_par_conj.m
@@ -482,16 +482,19 @@ maybe_sync_dep_par_conj(Conjuncts, GoalInfo, NewGoal, InstMap, !SyncInfo) :-
         % generate faster code.
         module_info_get_globals(ModuleInfo0, Globals),
         globals.lookup_bool_option(Globals, par_loop_control, ParLoopControl),
+        globals.lookup_bool_option(Globals, par_reorder_right_recursion,
+            ReorderParConj),
         (
             ParLoopControl = no,
+            ReorderParConj = yes
+        ->
             reorder_indep_par_conj(PredProcId, VarTypes0, InstMap, Conjuncts,
                 GoalInfo, NewGoal, ModuleInfo0, ModuleInfo),
             !:SyncInfo = sync_info(ModuleInfo, IgnoreVars, AllowSomePathsOnly,
                 VarSet0, VarTypes0, PredProcId, TSStringTable0)
         ;
-            ParLoopControl = yes,
-            % Don't swap the conjuncts, parallel loop control can do a better
-            % job of optimizing this code.
+            % Don't swap the conjuncts, Leave it to loop control or the user
+            % has asked for it to be disabled.
             NewGoal = hlds_goal(conj(parallel_conj, Conjuncts), GoalInfo)
         )
     ;
diff --git a/compiler/options.m b/compiler/options.m
index 82cfb86..0f39868 100644
--- a/compiler/options.m
+++ b/compiler/options.m
@@ -707,9 +707,6 @@
     ;       tuple_trace_counts_file
     ;       tuple_costs_ratio
     ;       tuple_min_args
-    ;       inline_par_builtins
-    ;       always_specialize_in_dep_par_conjs
-    ;       allow_some_paths_only_waits
     ;       region_analysis
 
     % Stuff for the CTGC system (structure sharing / structure reuse).
@@ -1028,6 +1025,10 @@
     ;       distance_granularity
     ;       implicit_parallelism
     ;       feedback_file
+    ;       inline_par_builtins
+    ;       always_specialize_in_dep_par_conjs
+    ;       allow_some_paths_only_waits
+    ;       par_reorder_right_recursion
     ;       par_loop_control
     ;       par_loop_control_preserve_tail_recursion.
 
@@ -1613,9 +1614,6 @@ option_defaults_2(optimization_option, [
     tuple_trace_counts_file             -   string(""),
     tuple_costs_ratio                   -   int(100),
     tuple_min_args                      -   int(4),
-    inline_par_builtins                 -   bool(no),
-    always_specialize_in_dep_par_conjs  -   bool(no),
-    allow_some_paths_only_waits         -   bool(yes),
     region_analysis                     -   bool(no),
 
     % HLDS -> LLDS
@@ -1914,6 +1912,10 @@ option_defaults_2(miscellaneous_option, [
     distance_granularity                -   int(0),
     implicit_parallelism                -   bool(no),
     feedback_file                       -   string(""),
+    inline_par_builtins                 -   bool(no),
+    always_specialize_in_dep_par_conjs  -   bool(no),
+    allow_some_paths_only_waits         -   bool(yes),
+    par_reorder_right_recursion         -   bool(yes),
     par_loop_control                    -   bool(no),
     par_loop_control_preserve_tail_recursion - bool(no)
 ]).
@@ -2548,11 +2550,6 @@ long_option("tuple",                tuple).
 long_option("tuple-trace-counts-file",  tuple_trace_counts_file).
 long_option("tuple-costs-ratio",    tuple_costs_ratio).
 long_option("tuple-min-args",       tuple_min_args).
-long_option("inline-par-builtins",  inline_par_builtins).
-long_option("always-specialize-in-dep-par-conjs",
-                                    always_specialize_in_dep_par_conjs).
-long_option("allow-some-paths-only-waits",
-                                    allow_some_paths_only_waits).
 
 % CTGC related options.
 long_option("structure-sharing",    structure_sharing_analysis).
@@ -2906,6 +2903,13 @@ long_option("control-granularity",  control_granularity).
 long_option("distance-granularity", distance_granularity).
 long_option("implicit-parallelism", implicit_parallelism).
 long_option("feedback-file",        feedback_file).
+long_option("inline-par-builtins",  inline_par_builtins).
+long_option("always-specialize-in-dep-par-conjs",
+                                    always_specialize_in_dep_par_conjs).
+long_option("allow-some-paths-only-waits",
+                                    allow_some_paths_only_waits).
+long_option("par-reorder-right-recursion",
+                                    par_reorder_right_recursion).
 long_option("par-loop-control",     par_loop_control).
 long_option("par-loop-control-preserve-tail-recursion",
                                     par_loop_control_preserve_tail_recursion).
@@ -5198,16 +5202,6 @@ options_help_hlds_hlds_optimization -->
         % "\ttransformation will consider passing together as a",
         % "\ttuple. This is mostly to speed up the compilation",
         % "\tprocess by not pursuing (presumably) unfruitful searches.",
-% This is for measurements by implementors only.
-%       "--no-inline-par-builtins",
-%       "\tGenerate calls to the predicates of par_builtin.m, instead of",
-%       "\tbodily including their definitions as C code.",
-%       Actually, this is the default for now.
-% This is for measurements by implementors only.
-%       "--always-specialize-in-dep-par-conjs",
-%       "\tWhen the transformation for handling dependent parallel conjunctions",
-%       "\tadds waits and/or signals around a call, create a specialized",
-%       "\tversion of the called procedure, even if this is not profitable.",
 % '--region-analysis' is not documented because it is still experimental.
 %        "--region-analysis",
 %        "\tEnable the analysis for region-based memory management."
@@ -5883,7 +5877,37 @@ options_help_misc -->
 
         "--feedback-file",
         "\tUse the specified profiling feedback file which may currently",
-        "\tonly be processed for implicit parallelism."
+        "\tonly be processed for implicit parallelism.",
+% This is for measurements by implementors only.
+%       "--inline-par-builtins",
+%       "\tGenerate calls to the predicates of par_builtin.m, instead of",
+%       "\tbodily including their definitions as C code.",
+%       Actually, this is the default for now.
+% This is for measurements by implementors only.
+%       "--always-specialize-in-dep-par-conjs",
+%       "\tWhen the transformation for handling dependent parallel conjunctions",
+%       "\tadds waits and/or signals around a call, create a specialized",
+%       "\tversion of the called procedure, even if this is not profitable.",
+%       "--no-allow-some-paths-only-waits",
+%       "\tWait operations for futures used to implement parallel",
+%       "\tconjunctions are placed on some but not all branches.  This",
+%       "\toption disables this optimisation and places waits on all",
+%       "\tbranches that may succeed."
+        "--par-loop-control",
+        "\tEnable the loop control transformation described in the loop",
+        "\tcontrol paper.  This compiles certain parallel loops differently",
+        "\tto make them more efficient.  This is experimental",
+        "--par-loop-control-preserve-tail-recursion",
+        "\tWhen loop control is enabled, and tail recursion can be used,",
+        "\tthen do not use the parent computation's stack frame for",
+        "\tcommunication.  This allow the parent computation to use a tail",
+        "\tcall when recursing.",
+        "--no-par-reorder-right-recursion",
+        "\tIf loop control is disabled then independent right recursive",
+        "\tparallel conjunctions are reordered to avoid pathological",
+        "\tbehaviour.  This option will disable this reordering.  This",
+        "\toption is provided for users who which to create a control group",
+        "\tto test loop control against"
     ]).
 
 :- pred write_tabbed_lines(list(string)::in, io::di, io::uo) is det.
diff --git a/doc/user_guide.texi b/doc/user_guide.texi
index 93dbbc2..090d5fb 100644
--- a/doc/user_guide.texi
+++ b/doc/user_guide.texi
@@ -8971,13 +8971,6 @@ This information is used to reduce the overhead of minimal model tabling.
 @c tuple. This is mostly to speed up the compilation
 @c process by not pursuing (presumably) unfruitful searches.
 
- at c @sp 1
- at c @item --always-specialize-in-dep-par-conjs
- at c @findex --always-specialize-in-dep-par-conjs
- at c When the transformation for handling dependent parallel conjunctions
- at c adds waits and/or signals around a call, create a specialized
- at c version of the called procedure, even if this is not profitable.
-
 @end table
 
 @node MLDS backend (MLDS -> MLDS) optimization options
@@ -9651,17 +9644,55 @@ Use the specified profiling feedback file
 which may currently only be processed for automatic parallelism.
 
 @c @sp 1
- at c @item --loop-control
- at c @findex --loop-control
- at c Enable the loop control transformation for parallel conjunctions.
- at c This causes right-recursive parallel conjunctions to use fewer contexts while
- at c maintaining parallelism.
- at c This transformation is under development, when it is ready it will
- at c probably be enabled by default.
+ at c @item --inline-par-builtins
+ at c @findex --inline-par-builtins
+ at c Always inline procedures used for parallelisation such as futures and
+ at c barrier code rather than generating calls.
 @c
 @c @sp 1
- at c @item --no-loop-control-preserve-tail-recursion
- at c @findex --no-loop-control-preserve-tail-recursion
+ at c @item --always-specialize-in-dep-par-conjs
+ at c @findex --always-specialize-in-dep-par-conjs
+ at c When the transformation for handling dependent parallel conjunctions
+ at c adds waits and/or signals around a call, create a specialized
+ at c version of the called procedure, even if this is not profitable.
+ at c
+ at c @sp 1
+ at c @item --no-allow-some-paths-only-waits
+ at c @findex --no-allow-some-paths-only-waits
+ at c During the introduction of futures for dependent parallel conjunctions
+ at c Mercury allows that some execution paths do not always wait for a future (as
+ at c they do not need its value).
+ at c This is an optimisation that allows fewer wait operations to be placed in
+ at c the generated code.
+ at c The @samp{--no-allow-some-paths-only-waits} option disabled this
+ at c optimisation by placing waits on all execution paths.
+
+ at sp 1
+ at item --par-loop-control
+ at findex --par-loop-control
+Enable the loop control transformation for parallel conjunctions.
+This causes right-recursive parallel conjunctions to use fewer contexts while
+maintaining parallelism.
+This transformation is experimental,
+it will probably become the default one day.
+See also the paper titled "Controlling Loops in Parallel Mercury Code".
+
+ at sp 1
+ at item --par-loop-control-preserve-tail-recursion
+ at findex --par-loop-control-preserve-tail-recursion
+When using either loop control or normal parallel conjunction execution the
+parent computation's stack frame is used to communcate values to and from
+the child computations,
+this normally prohibits tail call optimisation.
+In certain cases (usually cases where the equivilent sequential code would
+be tail recursive)
+we can preserve cail recursion by communicating through the child's stack
+frame,
+allowing the parent to make a tail-call which would otherwise clobber its
+stack frame.
+This option is experimental,
+it may become the default one day.
+
 @c Do not attempt to preserve tail recursion in the loop control transformation.
 @c This option causes all code spawned off using loop control to access it's
 @c parent stack frame through the parent stack pointer.
@@ -9671,6 +9702,19 @@ which may currently only be processed for automatic parallelism.
 @c non tail recursive code.
 @c It is intended for developers only.
 
+ at sp 1
+ at item --no-par-reorder-right-recursion
+ at findex --no-par-reorder-right-recursion
+A parallel conjunction where the first conjunct does not contain a recursive
+call and any other conjunct does is less efficient and can consume a large
+number of contexts.
+To avoid this the compiler will attempt to re-order independent
+right-recursive parallel conjunctions into left recursive ones, unless loop
+control is being used.
+This option disables this optimisation allowing developers to test right
+recursive parallel code.
+See also @samp{--par-loop-control}.
+
 @end table
 
 @node Target code compilation options
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20120808/912d2e80/attachment.sig>


More information about the reviews mailing list