[m-rev.] for post-commit review: Automatic Parallelisation improvements.

Paul Bone pbone at csse.unimelb.edu.au
Wed Aug 4 12:24:03 AEST 2010
Previous message: [m-rev.] for possible review: frameopt performance fix
Next message: [m-rev.] diff: support more grade tests in trace goals
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Implement a linear alternative to the exponential algorithm that determines how
best to parallelise a conjunction.

Made other performance improvements.

mdbcomp/feedback.m:
    Add a field to the candidate_parallel_conjunction_params structure giving
    the preference of algorithm.

    Simplify the parallel exec metrics type here.  It is now used only to
    summarise information that has already been calculated.  The original code
    has been moved into deep_profiler/measurements.m

    Add a field to the candidate_par_conjunction structure giving the index
    within the conjunction of the first goal in the partition.  This is used
    for pretty-printing parallelisation reports.

    Incremented the feedback format version number.

deep_profiler/measurements.m:
    Move the original parallel exec metrics type and code here from
    mdbcomp/feedback.m

deep_profiler/create_report.m:
    Avoid a performance issue by memoizing create_proc_var_use_dump_report
    which is called by the analysis for the same procedure (at different
    dynamic call sites) many times.  In simple cases this more than doubled the
    execution time, in more complicated cases it should perform even better.

    Conform to changes in coverage.m

deep_profiler/mdprof_fb.automatic_parallelism.m:
    Implement the linear algorithm for parallelising a conjunction.

    Since we don't to parallelism specialisation don't try to parallelise the
    same procedure more than once.  This should avoid some performance problems
    but I haven't tested it.

    If it is impossible to generate an independent parallelisation generate a
    dependent one and then report it as something we cannot parallelise.  This
    can help programmers write more independent code.

    Use directed graphs rather than lookup maps to track dependencies.  This
    simplifies some code as the digraph standard library module already has
    code to compute reverse graphs and transitive closures of the graphs.

    Since there are now two parallelisation algorithms; code common to both of
    them has been factored out.

    The objective function used by the branch and bound search has been
    modified to take into account the overheads of parallel execution.  It is:
        minimise(ParTime + ParOverheads X 2.0)
    This way we allow the overheads to increase by 1csc provided that it
    reduces ParTime by more than 2csc.  (csc = call sequence counts)

    When pretty-printing parallelisation reports print each goal in the
    parallelised conjunction with it's new goal path.  This makes debugging
    easier for large procedures.

    Fix a bug where the goal path of scope goals was calculated incorrectly,
    this lead to a thrown exception in the coverage analysis code when it used
    the goalpath to lookup the call site of a call.

deep_profiler/mdprof_feedback.m:
    Support a new command line option for choosing which algorithm to use.
    Additionally the linear algorithm will be used if the problem is above a
    certain size and the exponential algorithm was chosen.  This can be
    configured including the fallback threshold.

    Print the user's choice of algorithm as part of the candidate parallel
    conjunctions report.

deep_profiler/message.m:
    Add an extra log message type for exceptions thrown during auto
    parallelisation.

deep_profiler/program_representation_utils.m:
    The goal_rep pretty printer now prints the goal path for each goal.

deep_profiler/coverage.m:
    procrep_annotate_with_coverage now catches and returns exceptions in a
    maybe_error result.

deep_profiler/cliques.m:
    Copy predicates from the standard library into cliques.m to prevent the
    lack of tail recursion from blowing the stack in some cases.  (cliques.m is
    compiled with --trace minimum).

deep_profiler/callgraph.m:
    Copy list.foldl from the standard library into callgraph.m and transform it
    so that it is less likely to smash the stack in non tail-recursive grades.

deep_profiler/read_profile.m:
    Transform read_nodes so that it is less likely to smash the stack in non
    tail-recursive grades.

deep_profiler/Mercury.options:
    Removed old options that where used to work around a bug.  The bug still
    exists but the work-around moved into the compiler long ago.

Index: deep_profiler/Mercury.options
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/Mercury.options,v
retrieving revision 1.15
diff -u -p -b -r1.15 Mercury.options
--- deep_profiler/Mercury.options	14 Jul 2010 00:40:22 -0000	1.15
+++ deep_profiler/Mercury.options	3 Aug 2010 04:39:41 -0000
@@ -6,35 +6,9 @@
 # Mercury.options - module-specific flags for Mmake and `mmc --make'.
 #-----------------------------------------------------------------------------#
 
-# --optimize-duplicate-calls interacts badly with the current definition
-# of the mode `array_uo'. `array_uo' is supposed to be a unique output,
-# but is currently defined as plain, nonunique output. This allows the
-# duplicate call optimization to merge two calls that generate separate arrays,
-# even though those arrays will be updated destructively later on.
-#
-# At the moment, this causes the compiler to miscompile topological_sort/2
-# in cliques.m, which (after inlining dfs_graph) has two seemingly identical
-# calls to dense_bitset__init, whose output has mode `array_uo'.
-#
-# We therefore turn off this optimization in all modules that may deal
-# with arrays. This should not cause any performance problem, since there
-# are no known duplicate calls in any performance critical predicates.
-#
-# XXX These flags should be deleted when the problem is fixed.
-
-MCFLAGS-array_util =	--no-optimize-duplicate-calls --trace minimum
-MCFLAGS-callgraph = 	--no-optimize-duplicate-calls
-MCFLAGS-canonical =	--no-optimize-duplicate-calls
-MCFLAGS-cliques = 	--no-optimize-duplicate-calls --trace minimum
-MCFLAGS-dense_bitset = 	--no-optimize-duplicate-calls 
-MCFLAGS-exclude = 	--no-optimize-duplicate-calls
-MCFLAGS-html_format = 	--no-optimize-duplicate-calls
-MCFLAGS-measurements = 	--no-optimize-duplicate-calls
-MCFLAGS-profile = 	--no-optimize-duplicate-calls
-MCFLAGS-query = 	--no-optimize-duplicate-calls
-MCFLAGS-read_profile = 	--no-optimize-duplicate-calls --trace minimum
-MCFLAGS-startup = 	--no-optimize-duplicate-calls
-MCFLAGS-top_procs = 	--no-optimize-duplicate-calls
+MCFLAGS-array_util = --trace minimum
+MCFLAGS-cliques = --trace minimum
+MCFLAGS-read_profile = --trace minimum
 
 # Uncomment this to debug the automatic parallelism code.
 #MCFLAGS-branch_and_bound = \
Index: deep_profiler/callgraph.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/callgraph.m,v
retrieving revision 1.11
diff -u -p -b -r1.11 callgraph.m
--- deep_profiler/callgraph.m	18 Aug 2008 02:14:51 -0000	1.11
+++ deep_profiler/callgraph.m	4 Aug 2010 00:49:03 -0000
@@ -55,9 +55,39 @@ find_cliques(InitDeep, BottomUpPDPtrCliq
     % because the list may be very long and map runs out of stack space, and
     % we want the final list in reverse order anyway because the propagation
     % algorithm works bottom up.
-    list.foldl(accumulate_pdptr_lists, TopDownPDICliqueList,
+    callgraph.foldl(accumulate_pdptr_lists, TopDownPDICliqueList,
         [], BottomUpPDPtrCliqueList).
 
+    % This version of foldl is safer when tail recursion isn't available.
+    %
+:- pred foldl(pred(X, A, A), list(X), A, A).
+:- mode foldl(pred(in, in, out) is det, in, in, out) is det.
+
+foldl(P, !.L, !A) :-
+    foldl_2(100000, P, !L, !A),
+    (
+        !.L = []
+    ;
+        !.L = [_ | _],
+        callgraph.foldl(P, !.L, !A)
+    ).
+
+:- pred foldl_2(int, pred(X, A, A), list(X), list(X), A, A).
+:- mode foldl_2(in, pred(in, in, out) is det, in, out, in, out) is det.
+
+foldl_2(Depth, P, !Xs, !A) :-
+    ( Depth > 0 ->
+        (
+            !.Xs = []
+        ;
+            !.Xs = [X | !:Xs],
+            P(X, !A),
+            foldl_2(Depth - 1, P, !Xs, !A)
+        )
+    ;
+        true
+    ).
+
 :- pred accumulate_pdptr_lists(set(int)::in, list(list(proc_dynamic_ptr))::in,
     list(list(proc_dynamic_ptr))::out) is det.
 
Index: deep_profiler/cliques.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/cliques.m,v
retrieving revision 1.10
diff -u -p -b -r1.10 cliques.m
--- deep_profiler/cliques.m	30 Apr 2010 13:09:54 -0000	1.10
+++ deep_profiler/cliques.m	3 Aug 2010 04:39:41 -0000
@@ -129,7 +129,7 @@ topological_sort(Graph, Cliques) :-
 
     Visit = dense_bitset.init,
     tsort(Dfs, InvGraph, Visit, [], Cliques0),
-    list.reverse(Cliques0, Cliques),
+    reverse(Cliques0, [], Cliques),
 
     trace [compiletime(flag("tsort")), io(!IO)] (
         io.nl(!IO),
@@ -139,6 +139,16 @@ topological_sort(Graph, Cliques) :-
         io.nl(!IO)
     ).
 
+    % This is a copy of list.reverse_2, we copy it here so that it can be
+    % compiled with --trace minimum even when the version in the standard
+    % library is compiled with --trace deep.
+    %
+:- pred reverse(list(T)::in, list(T)::in, list(T)::out) is det.
+
+reverse([], L, L).
+reverse([X | Xs], L0, L) :-
+    reverse(Xs, [X | L0], L).
+
 :- pred tsort(list(int)::in, graph::in, visit::array_di, list(set(int))::in,
     list(set(int))::out) is det.
 
Index: deep_profiler/coverage.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/coverage.m,v
retrieving revision 1.3
diff -u -p -b -r1.3 coverage.m
--- deep_profiler/coverage.m	16 Apr 2009 06:15:56 -0000	1.3
+++ deep_profiler/coverage.m	3 Aug 2010 04:39:41 -0000
@@ -24,6 +24,7 @@
 :- import_module report.
 
 :- import_module map.
+:- import_module maybe.
 
 :- type coverage_info
     --->    coverage_unknown
@@ -45,25 +46,26 @@
 
 %----------------------------------------------------------------------------%
 
-
     % Annotate the program representation structure with coverage information.
     %
 :- pred procrep_annotate_with_coverage(own_prof_info::in,
     map(goal_path, call_site_perf)::in, map(goal_path, coverage_point)::in,
     map(goal_path, coverage_point)::in, proc_rep::in,
-    proc_rep(coverage_info)::out) is det.
+    maybe_error(proc_rep(coverage_info))::out) is det.
 
 %-----------------------------------------------------------------------------%
 
 :- implementation.
 
+:- import_module message.
 :- import_module program_representation_utils.
 
 :- import_module bool.
+:- import_module cord.
+:- import_module exception.
 :- import_module int.
 :- import_module io.
 :- import_module list.
-:- import_module maybe.
 :- import_module require.
 :- import_module string.
 :- import_module unit.
@@ -107,7 +109,7 @@ get_coverage_before_and_after(coverage_k
     %       view erroneous output.
     %
 procrep_annotate_with_coverage(OwnProf, CallSites, SolnsCoveragePoints,
-        BranchCoveragePoints, !ProcRep) :-
+        BranchCoveragePoints, !.ProcRep, MaybeProcRep) :-
     some [!ProcDefn, !GoalRep] (
         !:ProcDefn = !.ProcRep ^ pr_defn,
         !:GoalRep = !.ProcDefn ^ pdr_goal,
@@ -117,12 +119,24 @@ procrep_annotate_with_coverage(OwnProf, 
         Before = before_known(Calls),
         CoverageReference = coverage_reference_info(ProcLabel, CallSites,
             SolnsCoveragePoints, BranchCoveragePoints),
+        promise_equivalent_solutions [MaybeProcRep]
+        ( try []
         goal_annotate_coverage(CoverageReference, empty_goal_path,
-            Before, After, !GoalRep),
+                Before, After, !GoalRep)
+        then
         require(unify(After, after_known(Exits)),
-            "Coverage after procedure not equal with exit count of procedure"),
+                "Coverage after procedure not equal with exit count of" ++
+                " procedure"),
         !:ProcDefn = !.ProcDefn ^ pdr_goal := !.GoalRep,
-        !:ProcRep = !.ProcRep ^ pr_defn := !.ProcDefn
+            !:ProcRep = !.ProcRep ^ pr_defn := !.ProcDefn,
+            MaybeProcRep = ok(!.ProcRep)
+        catch software_error(Error) ->
+            location_to_string(0, proc(ProcLabel), ProcNameCord),
+            ProcName = string.append_list(list(ProcNameCord)),
+            MaybeProcRep = error(format(
+                "While calculating coverage for %s: %s",
+                [s(ProcName), s(Error)]))
+        )
     ).
 
     % These maps are keyed by goal_path, comparing these structures is less
Index: deep_profiler/create_report.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/create_report.m,v
retrieving revision 1.19
diff -u -p -b -r1.19 create_report.m
--- deep_profiler/create_report.m	2 Oct 2009 04:37:57 -0000	1.19
+++ deep_profiler/create_report.m	3 Aug 2010 04:39:41 -0000
@@ -1106,9 +1106,15 @@ create_procrep_coverage_report(Deep, PSP
 
                 procrep_annotate_with_coverage(Own, CallSitesMap,
                     SolnsCoveragePointMap, BranchCoveragePointMap,
-                    ProcRep0, ProcRep),
+                    ProcRep0, MaybeProcRep),
+                (
+                    MaybeProcRep = ok(ProcRep),
                 MaybeReport = ok(procrep_coverage_info(PSPtr, ProcRep))
             ;
+                    MaybeProcRep = error(Error),
+                    MaybeReport = error(Error)
+                )
+            ;
                 MaybeReport =
                     error("Program Representation doesn't contain procedure: "
                         ++ string(PSPtr))
@@ -1242,6 +1248,8 @@ create_clique_dump_report(Deep, CliquePt
         MaybeCliqueDumpInfo = error("invalid clique_ptr")
     ).
 
+:- pragma memo(create_proc_var_use_dump_report/3,
+    [specified([addr, value, output])]).
 create_proc_var_use_dump_report(Deep, PSPtr, MaybeProcVarUseDump) :-
     generic_vars_first_use(head_vars_all, Deep, PSPtr, set.init,
         MaybeProcVarUseDump).
Index: deep_profiler/mdprof_fb.automatic_parallelism.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_fb.automatic_parallelism.m,v
retrieving revision 1.10
diff -u -p -b -r1.10 mdprof_fb.automatic_parallelism.m
--- deep_profiler/mdprof_fb.automatic_parallelism.m	14 Jul 2010 00:40:22 -0000	1.10
+++ deep_profiler/mdprof_fb.automatic_parallelism.m	4 Aug 2010 01:08:37 -0000
@@ -72,6 +72,7 @@
 
 :- import_module array.
 :- import_module assoc_list.
+:- import_module digraph.
 :- import_module float.
 :- import_module io.
 :- import_module list.
@@ -81,6 +82,7 @@
 :- import_module pqueue.
 :- import_module require.
 :- import_module set.
+:- import_module set_tree234.
 :- import_module string.
 :- import_module svmap.
 :- import_module svset.
@@ -112,7 +114,7 @@ candidate_parallel_conjunctions(Params, 
     RootCliqueCost = build_cs_cost_csq(1, float(TotalCallseqs) + 1.0),
     RootParallelism = no_parallelism,
     candidate_parallel_conjunctions_clique(Params, Deep, RootCliqueCost,
-        RootParallelism, RootCliquePtr, ConjunctionsMap, Messages),
+        RootParallelism, RootCliquePtr, ConjunctionsMap, Messages, init, _),
 
     map.to_assoc_list(ConjunctionsMap, ConjunctionsAssocList0),
     map_values_only(convert_candidate_par_conjunctions_proc(
@@ -222,7 +224,8 @@ pard_goal_detail_annon_to_pard_goal_anno
 :- type pard_goals_partition
     --->    pard_goals_partition(
                 pgp_goals               :: list(pard_goal_detail),
-                pgp_partition_num       :: int
+                pgp_partition_num       :: int,
+                pgp_first_conj_num      :: int
             ).
 
 %----------------------------------------------------------------------------%
@@ -242,10 +245,12 @@ pard_goal_detail_annon_to_pard_goal_anno
 :- pred candidate_parallel_conjunctions_clique(
     candidate_par_conjunctions_params::in, deep::in, cs_cost_csq::in,
     parallelism_amount::in, clique_ptr::in, candidate_par_conjunctions::out,
-    cord(message)::out) is det.
+    cord(message)::out,
+    set_tree234(proc_static_ptr)::in, set_tree234(proc_static_ptr)::out) 
+    is det.
 
 candidate_parallel_conjunctions_clique(Opts, Deep, ParentCSCostInfo, ParentParallelism,
-        CliquePtr, Candidates, Messages) :-
+        CliquePtr, Candidates, Messages, !VisitedProcs) :-
     create_clique_report(Deep, CliquePtr, MaybeCliqueReport),
     (
         MaybeCliqueReport = ok(CliqueReport),
@@ -261,7 +266,8 @@ candidate_parallel_conjunctions_clique(O
         make_clique_proc_map(CliqueProcs, CliqueProcMap),
         candidate_parallel_conjunctions_clique_proc(Opts, Deep,
             CliqueIsRecursive, ParentCSCostInfo, ParentParallelism, set.init,
-            CliqueProcMap, CliquePtr, FirstCliqueProc, Candidates, Messages)
+            CliqueProcMap, CliquePtr, FirstCliqueProc, Candidates, Messages,
+            !VisitedProcs)
     ;
         MaybeCliqueReport = error(Error),
         error(this_file ++ Error),
@@ -342,16 +348,21 @@ make_clique_proc_map(CliqueProcs, Clique
     cs_cost_csq::in, parallelism_amount::in, set(proc_desc)::in, 
     map(proc_desc, clique_proc_report)::in, clique_ptr::in,
     clique_proc_report::in, candidate_par_conjunctions::out,
-    cord(message)::out) is det.
+    cord(message)::out,
+    set_tree234(proc_static_ptr)::in, set_tree234(proc_static_ptr)::out) 
+    is det.
 
 candidate_parallel_conjunctions_clique_proc(Opts, Deep, 
         CliqueIsRecursive, ParentCSCostInfo, ParentParallelism, ProcsAnalysed0,
-        CliqueProcMap, CliquePtr, CliqueProc, Candidates, Messages) :-
+        CliqueProcMap, CliquePtr, CliqueProc, Candidates, Messages,
+        !VisitedProcs) :-
     some [!Messages]
     (
         !:Messages = cord.empty,
+            
         CliqueProcCalls = CliqueProc ^ cpr_proc_summary ^ perf_row_calls,
-        cs_cost_to_proc_cost(ParentCSCostInfo, CliqueProcCalls, ParentCostInfo),
+        cs_cost_to_proc_cost(ParentCSCostInfo, CliqueProcCalls,
+            ParentCostInfo),
         % Determine the costs of the call sites in the procedure.
         (
             CliqueIsRecursive = clique_is_recursive,
@@ -359,26 +370,36 @@ candidate_parallel_conjunctions_clique_p
                 ParentCostInfo, RecursiveCallSiteCostMap, CSCMMessages),
             !:Messages = !.Messages ++ CSCMMessages,
             trace [compile_time(flag("debug_cpc_search")), io(!IO)] (
-                format_recursive_call_site_cost_map(RecursiveCallSiteCostMap,
-                    PrettyCostMapCord),
+                format_recursive_call_site_cost_map(
+                    RecursiveCallSiteCostMap, PrettyCostMapCord),
                 PrettyCostMap = append_list(cord.list(PrettyCostMapCord)),
-                io.format(
-                    "D: In clique %s recursive call site cost map is:\n%s\n",
+                io.format("D: In clique %s recursive call site cost map"
+                        ++ " is:\n%s\n",
                     [s(string(CliquePtr)), s(PrettyCostMap)],
-                    !IO)
+                    !IO),
+                io.flush_output(!IO)
             )
         ;
             CliqueIsRecursive = clique_is_not_recursive,
             RecursiveCallSiteCostMap = map.init
         ),
 
+        ProcDesc = CliqueProc ^ cpr_proc_summary ^ perf_row_subject,
+        PsPtr = ProcDesc ^ pdesc_ps_ptr,
+        (
+            not member(!.VisitedProcs, PsPtr)
+        ->
+            insert(PsPtr, !VisitedProcs),
+
         % Analyse this procedure for parallelism opportunities.
         candidate_parallel_conjunctions_proc(Opts, Deep, CliqueProc,
-            RecursiveCallSiteCostMap, ProcCandidates, ProcCandidatesByGoalPath,
-            ProcMessages),
-        !:Messages = !.Messages ++ ProcMessages,
-
-        ProcDesc = CliqueProc ^ cpr_proc_summary ^ perf_row_subject,
+                RecursiveCallSiteCostMap, ProcCandidates,
+                ProcCandidatesByGoalPath, ProcMessages),
+            !:Messages = !.Messages ++ ProcMessages
+        ;
+            ProcCandidates = map.init,
+            ProcCandidatesByGoalPath = multi_map.init
+        ),
 
         % Get a list of call sites
         ( CliqueProc ^ cpr_other_proc_dynamics = [_ | _] ->
@@ -395,11 +416,11 @@ candidate_parallel_conjunctions_clique_p
         % cases call candidate_parallel_conjunctions_clique_proc on the
         % procedure within this clique that they call.
         set.insert(ProcsAnalysed0, ProcDesc, ProcsAnalysed),
-        list.map2(candidate_parallel_conjunctions_call_site(Opts, Deep,
+        list.map2_foldl(candidate_parallel_conjunctions_call_site(Opts, Deep,
                 ProcsAnalysed, CliqueIsRecursive, RecursiveCallSiteCostMap,
                 CliqueProcMap, CliquePtr, ParentParallelism,
                 ProcCandidatesByGoalPath),
-            CallSiteReports, CSCandidatesList, CSMessagesList),
+            CallSiteReports, CSCandidatesList, CSMessagesList, !VisitedProcs),
       
         list.foldl(map.union(merge_candidate_par_conjs_proc),
             CSCandidatesList, map.init, CSCandidates),
@@ -433,13 +454,14 @@ merge_candidate_par_conjs_proc(A, B, Res
     parallelism_amount::in,
     multi_map(string, candidate_par_conjunction(pard_goal_detail))::in,
     clique_call_site_report::in,
-    candidate_par_conjunctions::out,
-    cord(message)::out) is det.
+    candidate_par_conjunctions::out, cord(message)::out,
+    set_tree234(proc_static_ptr)::in, set_tree234(proc_static_ptr)::out)
+    is det.
 
 candidate_parallel_conjunctions_call_site(Opts, Deep, ProcsAnalysed,
         CliqueIsRecursive, RecursiveCallSiteCostMap, CliqueProcMap, CliquePtr,
         ParentParallelism, ProcParallelismCandidates,
-        CallSiteReport, Candidates, Messages) :-
+        CallSiteReport, Candidates, Messages, !VisitedProcs) :-
     % XXX: This does not weight the callees by the probability that they will
     % be called.  This is only a problem for higher order call sites.
     CalleePerfs = CallSiteReport ^ ccsr_callee_perfs,
@@ -449,10 +471,10 @@ candidate_parallel_conjunctions_call_sit
         ParallelismProbability, SubParallelism),
     sub_computation_parallelism(ParentParallelism, ParallelismProbability,
         SubParallelism, Parallelism), 
-    list.map2(candidate_parallel_conjunctions_callee(Opts, Deep, ProcsAnalysed,
-            CliqueIsRecursive, RecursiveCallSiteCostMap, CliqueProcMap,
-            CliquePtr, CallSiteDesc, Parallelism),
-        CalleePerfs, CandidatesList, MessagesList),
+    list.map2_foldl(candidate_parallel_conjunctions_callee(Opts, Deep,
+            ProcsAnalysed, CliqueIsRecursive, RecursiveCallSiteCostMap,
+            CliqueProcMap, CliquePtr, CallSiteDesc, Parallelism),
+        CalleePerfs, CandidatesList, MessagesList, !VisitedProcs),
     list.foldl(map.union(merge_candidate_par_conjs_proc), CandidatesList, 
         map.init, Candidates),
     Messages = cord_list_to_cord(MessagesList).
@@ -520,12 +542,13 @@ parallel_amount(SeqConjs, Conj, Parallel
     clique_is_recursive::in, map(goal_path, cs_cost_csq)::in,
     map(proc_desc, clique_proc_report)::in, clique_ptr::in, call_site_desc::in,
     parallelism_amount::in, perf_row_data(clique_desc)::in,
-    candidate_par_conjunctions::out, cord(message)::out) is det.
+    candidate_par_conjunctions::out, cord(message)::out,
+    set_tree234(proc_static_ptr)::in, set_tree234(proc_static_ptr)::out) is det.
 
 candidate_parallel_conjunctions_callee(Opts, Deep, ProcsAnalysed0,
         ParentCliqueIsRecursive, RecursiveCallSiteCostMap, CliqueProcReportMap,
         ParentCliquePtr, CallSiteDesc, Parallelism, CliquePerf, Candidates,
-        Messages) :-
+        Messages, !VisitedProcs) :-
     CliqueDesc = CliquePerf ^ perf_row_subject,
     CliquePtr = CliqueDesc ^ cdesc_clique_ptr,
     CliqueEntryProc = CliqueDesc ^ cdesc_entry_member,
@@ -547,7 +570,7 @@ candidate_parallel_conjunctions_callee(O
             candidate_parallel_conjunctions_clique_proc(Opts, Deep,
                 ParentCliqueIsRecursive, CallCostInfo, Parallelism, 
                 ProcsAnalysed, CliqueProcReportMap, ParentCliquePtr,
-                CliqueProcReport, Candidates, Messages)
+                CliqueProcReport, Candidates, Messages, !VisitedProcs)
         )
     ;
         CSCost = build_cs_cost_from_perf(CliquePerf),
@@ -569,7 +592,8 @@ candidate_parallel_conjunctions_callee(O
                 io.format("D: Not entering clique: %d, " ++ 
                     "it is below the clique threashold\n  " ++
                     "It has per-call cost %f and is called %f times\n\n",
-                    [i(CliqueNum), f(CSPercallCost), f(CSCalls)], !IO)
+                    [i(CliqueNum), f(CSPercallCost), f(CSCalls)], !IO),
+                io.flush_output(!IO)
             )
         ;
             % Also check check if the desired amount of parallelism has been
@@ -584,11 +608,12 @@ candidate_parallel_conjunctions_callee(O
                 io.format("D: Not entering clique: %d, " ++
                     "no parallel resources left\n  " ++
                     "There are already %s parallel resources used\n\n",
-                    [i(CliqueNum), s(string(Parallelism))], !IO)
+                    [i(CliqueNum), s(string(Parallelism))], !IO),
+                io.flush_output(!IO)
             )
         ;
             candidate_parallel_conjunctions_clique(Opts, Deep, CSCost,
-                Parallelism, CliquePtr, Candidates, Messages)
+                Parallelism, CliquePtr, Candidates, Messages, !VisitedProcs)
         )
     ).
 
@@ -612,6 +637,9 @@ build_recursive_call_site_cost_map(Deep,
         ProcDesc = PerfRowData ^ perf_row_subject,
         proc_label_from_proc_desc(Deep, ProcDesc, ProcLabel),
 
+        % XXX: Make this ITE a switch.
+        % XXX: Handle mutual recursion for the cases that we know how to
+        % analyse.
         ( CliqueProc ^ cpr_other_proc_dynamics = [_ | _] ->
             append_message(proc(ProcLabel),
                 error_extra_proc_dynamics_in_clique_proc, !Messages)
@@ -1306,11 +1334,12 @@ ite_get_conjunctions_worth_parallelising
 
 conj_build_candidate_conjunctions(Info, Conjs, GoalPath, Messages, 
         Candidates) :-
+    ProcLabel = Info ^ ipi_proc_label,
+    Location = goal(ProcLabel, GoalPath),
+    promise_equivalent_solutions [Messages, Candidates]
     some [!Messages] 
     (
         !:Messages = cord.empty,
-        ProcLabel = Info ^ ipi_proc_label,
-        Location = goal(ProcLabel, GoalPath),
 
         map_foldl(goal_to_pard_goal(Info, GoalPath, 
             ( func(Num) = step_conj(Num) ) ), Conjs, PardGoals, 1, _),
@@ -1319,9 +1348,9 @@ conj_build_candidate_conjunctions(Info, 
             append_message(Location,
                 info_found_conjs_above_callsite_threshold(NumCostlyCalls),
                 !Messages), 
-            % We don't parallelise across non-atomic goals, so split a list of
-            % pard goals into partitions where non-atomic goals seperate the
-            % partitions.
+            % We don't parallelise across non-atomic goals, so split a list
+            % of pard goals into partitions where non-atomic goals separate
+            % the partitions.
             partition_pard_goals(Location, PardGoals, [], _, 
                 1, _NumPartitions, 0, _, [], PartitionedGoals, !Messages),
             map(pardgoals_build_candidate_conjunction(Info, Location,
@@ -1360,8 +1389,8 @@ count_costly_calls(Goal, !NumCostlyCalls
 partition_pard_goals(Location, [], !Partition, !PartitionNum, !NumCostlyCalls,
         !Partitions, !Messages) :-
     ( !.NumCostlyCalls > 1 ->
-        Partition = 
-            pard_goals_partition(reverse(!.Partition), !.PartitionNum),
+        partition_pard_goals_build_partition(!.Partition, !.PartitionNum,
+            Partition),
         !:Partitions = [ Partition | !.Partitions ]
     ;
         true     
@@ -1372,6 +1401,7 @@ partition_pard_goals(Location, [], !Part
     ;
         true
     ),
+    !:Partition = [],
     reverse(!Partitions).
 partition_pard_goals(Location, [ PG | PGs ], !Partition, !PartitionNum,
         !NumCostlyCalls, !Partitions, !Messages) :-
@@ -1392,8 +1422,8 @@ partition_pard_goals(Location, [ PG | PG
     ;
         PGType = pgt_non_atomic_goal,
         ( !.NumCostlyCalls > 1 ->
-            Partition = 
-                pard_goals_partition(reverse(!.Partition), !.PartitionNum),
+            partition_pard_goals_build_partition(!.Partition, !.PartitionNum,
+                Partition),
             !:Partitions = [ Partition | !.Partitions ]
         ;
             append_message(Location,
@@ -1407,6 +1437,28 @@ partition_pard_goals(Location, [ PG | PG
     partition_pard_goals(Location, PGs, !Partition, !PartitionNum,
         !NumCostlyCalls, !Partitions, !Messages).
 
+:- pred partition_pard_goals_build_partition(list(pard_goal_detail)::in,
+    int::in, pard_goals_partition::out) is det.
+
+partition_pard_goals_build_partition(RevGoals, PartitionNum, Partition) :-
+    reverse(RevGoals, Goals),
+    (
+        Goals = [FirstGoal | _],
+        FirstGoalPath = FirstGoal ^ goal_annotation ^ pgd_original_path,
+        (
+            step_conj(ConjNumPrime) = goal_path_get_last(FirstGoalPath)
+        ->
+            ConjNum = ConjNumPrime
+        ;
+            error(this_file ++ "Expected goal to be part of a conjunction")
+        )
+    ;
+        Goals = [],
+        error(this_file ++ "Trying to build empty goal partition")
+    ),
+    Partition = 
+        pard_goals_partition(Goals, PartitionNum, ConjNum).
+
 :- pred pardgoals_build_candidate_conjunction(implicit_parallelism_info::in,
     program_location::in, goal_path::in, pard_goals_partition::in,
     maybe(candidate_par_conjunction(pard_goal_detail))::out) is det.
@@ -1420,19 +1472,24 @@ pardgoals_build_candidate_conjunction(In
     % efficient.  However if goals within other parallel conjuncts depend on
     % them and don't depend upon the first costly call then this would make the
     % conjunction dependant when it could be independent.
-    pard_goals_partition(Goals, PartNum) = GoalsPartition,
+    pard_goals_partition(Goals, PartNum, FirstConjNum) = GoalsPartition,
     find_best_parallelisation(Info, Location, PartNum, Goals,
-        MaybeParallelisation),
-    (
-        MaybeParallelisation = no,
-        MaybeCandidate = no
-    ;
-        MaybeParallelisation = yes(bp_parallel_execution(GoalsBefore, ParConjs,
-            GoalsAfter, IsDependent, Metrics)),
+        BestParallelisation),
+    ParalleliseDepConjs = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+    BestParallelisation = bp_parallel_execution(GoalsBefore, ParConjs,
+        GoalsAfter, IsDependent, Metrics),
         Speedup = parallel_exec_metrics_get_speedup(Metrics),
         Candidate = candidate_par_conjunction(goal_path_to_string(GoalPath),
-            PartNum, IsDependent, GoalsBefore, ParConjs, GoalsAfter, Metrics),
-        ( Speedup > 1.0 ->
+        PartNum, FirstConjNum, IsDependent, GoalsBefore, ParConjs, GoalsAfter,
+        Metrics),
+    (
+        Speedup > 1.0,
+        (
+            ParalleliseDepConjs = do_not_parallelise_dep_conjs
+        =>
+            IsDependent = conjuncts_are_independent
+        )
+    ->
             MaybeCandidate = yes(Candidate)
         ;
             MaybeCandidate = no,
@@ -1455,9 +1512,9 @@ pardgoals_build_candidate_conjunction(In
                 create_candidate_parallel_conj_report(VarTable,
                     ProcLabel, FBCandidate, Report),
                 io.write_string("Not parallelising conjunction, " ++ 
-                    "insufficient speedup:\n", !IO),
-                io.write_string(append_list(cord.list(Report)), !IO)
-            )
+                "insufficient speedup or too dependant:\n", !IO),
+            io.write_string(append_list(cord.list(Report)), !IO),
+            io.flush_output(!IO)
         )
     ).
 
@@ -1509,52 +1566,42 @@ find_costly_call(Goals, GoalsBefore0, Re
                 bp_par_exec_metrics     :: parallel_exec_metrics
             ).
 
-:- type incomplete_parallelisation
-    --->    incomplete_parallelisation(
-                ip_goals_before         :: list(pard_goal_detail),
-                ip_par_conjs            :: list(seq_conj(pard_goal_detail)),
-                ip_par_exec_overlap     :: parallel_execution_overlap,
-                ip_par_exec_metrics     :: parallel_exec_metrics_incomplete
-            ).
-
 :- pred find_best_parallelisation(implicit_parallelism_info::in, 
     program_location::in, int::in, 
-    list(pard_goal_detail)::in, maybe(best_parallelisation)::out) is det.
+    list(pard_goal_detail)::in, best_parallelisation::out) is det.
 
 find_best_parallelisation(Info, Location, PartNum, Goals,
-        MaybeBestParallelisation) :-
-    preprocess_conjunction(Goals, PreprocessedGoals),
-    PreprocessedGoals = goals_for_parallelisation(GoalGroups, 
-        DependencyMaps, CostlyCallsIndexes, NumCalls),
-    ( last(CostlyCallsIndexes, LastCostlyCallIndexPrime) ->
-        LastCostlyCallIndex = LastCostlyCallIndexPrime
+        BestParallelisation) :-
+    % Decide which algorithm to use.
+    ConjunctionSize = length(Goals),
+    choose_algorithm(Info, ConjunctionSize, Algorithm),
+    
+    preprocess_conjunction(Goals, Algorithm, PreprocessedGoals),
+    (
+        Algorithm = bpa_complete_bnb(_),
+        find_best_parallelisation_complete_bnb(Info, Location, PartNum,
+            PreprocessedGoals, BestParallelisation)
     ;
-        error(this_file ++ "no costly calls found")
-    ),
-    branch_and_bound(
-        generate_parallelisations(Info, Location, PartNum, LastCostlyCallIndex, 
-            NumCalls, DependencyMaps, GoalGroups),
-        parallelisation_get_par_time,
-        Solutions, Profile),
+        Algorithm = bpa_greedy,
+        find_best_parallelisation_greedy(Info, Location, PartNum,
+            PreprocessedGoals, BestParallelisation)
+    ).
     
-    trace [compile_time(flag("debug_branch_and_bound")), io(!IO)] (
-        io.format("D: Find best parallelisation for:\n%s\n",
-            [s(LocationString)], !IO),
-        location_to_string(1, Location, LocationCord),
-        string.append_list(list(LocationCord), LocationString),
-        io.format("D: Problem size: %d Solutions: %d\n",
-            [i(length(GoalGroups)), i(count(Solutions))], !IO),
-        io.format("D: Branch and bound profile: %s\n\n",
-            [s(string(Profile))], !IO)
-    ),
+:- pred choose_algorithm(implicit_parallelism_info::in,
+    int::in, best_par_algorithm::out) is det.
     
+choose_algorithm(Info, ConjunctionSize, Algorithm) :-
+    Algorithm0 = Info ^ ipi_opts ^ cpcp_best_par_alg,
     ( 
-        promise_equivalent_solutions [BestParallelisation]
-        member(BestParallelisation, Solutions) 
-    ->
-        MaybeBestParallelisation = yes(BestParallelisation)
+        Algorithm0 = bpa_complete_bnb(Limit),
+        ( Limit \= 0, Limit < ConjunctionSize ->
+            Algorithm = bpa_greedy
     ;
-        MaybeBestParallelisation = no
+            Algorithm = Algorithm0
+        )
+    ;
+        Algorithm0 = bpa_greedy,
+        Algorithm = Algorithm0
     ).
 
 :- type goal_group(T)
@@ -1564,6 +1611,25 @@ find_best_parallelisation(Info, Location
 :- inst gg_singleton
     --->    gg_singleton(ground, ground).
 
+:- pred gg_get_details(goal_group(T)::in, int::out, 
+    list(pard_goal_detail)::out, T::out) is det.
+
+gg_get_details(gg_singleton(Goal, P), 1, [Goal], P).
+gg_get_details(gg_group(Num, Goals, P), Num, Goals, P).
+
+:- func new_group(list(pard_goal_detail), T) = goal_group(T).
+
+new_group([], _) = _ :-
+    error(this_file ++ "Can not construct empty goal group").
+new_group([G | Gs], P) = GoalGroup :-
+    (
+        Gs = [],
+        GoalGroup = gg_singleton(G, P)
+    ;
+        Gs = [_ | _],
+        GoalGroup = gg_group(length(Gs)+1, [G | Gs], P)
+    ).
+
 % NOTE: These commented out types are relevant for some work that hasn't been
 % done.  They will either be used or removed in a future change.
 
@@ -1602,20 +1668,30 @@ find_best_parallelisation(Info, Location
     --->    goals_for_parallelisation(
                 gfp_groups                  :: 
                     list(goal_group(goal_classification)),
+                gfp_goals                   ::
+                    list(pard_goal_detail),
 
-                gfp_dependency_maps         :: dependency_maps,
+                gfp_dependency_graphs       :: dependency_graphs,
                 gfp_costly_call_indexes     :: list(int),
                 gfp_num_calls               :: int
             ).
 
+:- inst goals_for_parallelisation
+    --->    goals_for_parallelisation(
+                non_empty_list, ground, ground,
+                non_empty_list,
+                ground).
+
 :- pred preprocess_conjunction(list(pard_goal_detail)::in,
-    goals_for_parallelisation::out) is det.
+    best_par_algorithm::in,
+    goals_for_parallelisation::out(goals_for_parallelisation)) is det.
 
-preprocess_conjunction(Goals0, GoalsForParallelisation) :-
+preprocess_conjunction(Goals0, Algorithm, GoalsForParallelisation) :-
     % Phase 1: Build a dependency map.
-    build_dependency_maps(Goals0, DependencyMaps),
+    build_dependency_graphs(Goals0, DependencyGraphs),
     % Phase 2: Find the costly calls.
-    identify_costly_calls(Goals0, 1, GoalGroups, CostlyCallsIndexes),
+    identify_costly_calls(Goals0, Algorithm, 1, GoalGroups,
+        CostlyCallsIndexes),
    
     % Get the number of calls into this conjunction.
     (
@@ -1630,7 +1706,7 @@ preprocess_conjunction(Goals0, GoalsForP
     ),
 
     GoalsForParallelisation = goals_for_parallelisation(GoalGroups,
-        DependencyMaps, CostlyCallsIndexes, NumCalls).
+        Goals0, DependencyGraphs, CostlyCallsIndexes, NumCalls).
 
     % identify_costly_calls(Goals, 1, GoalGroups, SortedCostlyIndexes).
     %
@@ -1639,13 +1715,13 @@ preprocess_conjunction(Goals0, GoalsForP
     % indexes of the costly calls in the original list (starting at 1).  This
     % predicate is undefined if any of the goals in Goals are non-atomic.
     %
-:- pred identify_costly_calls(list(pard_goal_detail)::in, int::in,
-    list(goal_group(goal_classification))::out(list(gg_singleton)), 
-    list(int)::out) is det. 
-
-identify_costly_calls([], _, [], []).
-identify_costly_calls([Goal | Goals], Index, GoalGroups, Indexes) :-
-    identify_costly_calls(Goals, Index+1, GoalGroups0, Indexes0),
+:- pred identify_costly_calls(list(pard_goal_detail)::in,
+    best_par_algorithm::in, int::in,
+    list(goal_group(goal_classification))::out, list(int)::out) is det.
+
+identify_costly_calls([], _, _, [], []).
+identify_costly_calls([Goal | Goals], Alg, Index, GoalGroups, Indexes) :-
+    identify_costly_calls(Goals, Alg, Index+1, GoalGroups0, Indexes0),
     identify_costly_call(Goal, Costly),
     (
         Costly = is_costly_goal,
@@ -1659,59 +1735,165 @@ identify_costly_calls([Goal | Goals], In
         Costly = is_non_atomic_goal,
         error(this_file ++ "Unexpected pgt_non_atomic_goal")
     ),
+    (
+        Alg = bpa_greedy,
+        % Group cheap goals together.
+        (
+            GoalClassification = gc_cheap_goals,
+            GoalGroups0 = [LastGoalGroup | GoalGroups1Prime],
+            gg_get_details(LastGoalGroup, NumLastGoals, LastGoals,
+                gc_cheap_goals)
+        -> 
+            GoalGroup = gg_group(NumLastGoals+1, [Goal | LastGoals],
+                gc_cheap_goals),
+            GoalGroups1 = GoalGroups1Prime
+        ;
     GoalGroup = gg_singleton(Goal, GoalClassification),
-    GoalGroups = [GoalGroup | GoalGroups0].
+            GoalGroups1 = GoalGroups0
+        )
+    ;
+        Alg = bpa_complete_bnb(_),
+        GoalGroup = gg_singleton(Goal, GoalClassification),
+        GoalGroups1 = GoalGroups0
+    ),
+    GoalGroups = [GoalGroup | GoalGroups1].
 
-:- func parallelisation_get_par_time(best_parallelisation) = float.
+:- type incomplete_parallelisation
+    --->    incomplete_parallelisation(
+                ip_goals_before         :: list(pard_goal_detail),
+                ip_par_conjs            :: list(seq_conj(pard_goal_detail)),
+                ip_par_exec_overlap     :: parallel_execution_overlap,
+                ip_par_exec_metrics     :: parallel_exec_metrics_incomplete,
+                ip_productions_map      :: map(var_rep, float)
+            ).
 
-parallelisation_get_par_time(Parallelisation) = ParTime :-
+:- pred start_building_parallelisation(implicit_parallelism_info::in,
+    int::in, list(pard_goal_detail)::in, incomplete_parallelisation::out) 
+    is det.
+
+start_building_parallelisation(Info, NumCalls, GoalsBefore, Parallelisation)
+        :-
+    SparkCost = Info ^ ipi_opts ^ cpcp_sparking_cost,
+    SparkDelay = Info ^ ipi_opts ^ cpcp_sparking_delay,
+    ContextWakeupDelay = Info ^ ipi_opts ^ cpcp_context_wakeup_delay,
+    foldl(pardgoal_calc_cost, GoalsBefore, 0.0, CostBefore),
+    Metrics = init_empty_parallel_exec_metrics(CostBefore, NumCalls, 
+        float(SparkCost), float(SparkDelay), float(ContextWakeupDelay)),
+    Overlap = peo_empty_conjunct, 
+    Parallelisation = incomplete_parallelisation(GoalsBefore, [], Overlap,
+        Metrics, init).
+
+    % Finalise the metrics and determine if this is the best solution so
+    % far.
+    %
+:- pred finalise_parallelisation(list(pard_goal_detail)::in,
+    incomplete_parallelisation::in, best_parallelisation::out) is det.
+
+finalise_parallelisation(GoalsAfter, !Parallelisation) :-
+    !.Parallelisation = incomplete_parallelisation(GoalsBefore, Conjuncts,
+        Overlap, Metrics0, _),
+    foldl(pardgoal_calc_cost, GoalsAfter, 0.0, CostAfter),
+    Metrics = finalise_parallel_exec_metrics(Metrics0, CostAfter),
+    par_conj_overlap_is_dependant(Overlap, IsDependent),
+    !:Parallelisation = bp_parallel_execution(GoalsBefore, Conjuncts,
+        GoalsAfter, IsDependent, Metrics).
+
+%----------------------------------------------------------------------------%
+
+    % Find the best parallelisation using the branch and bound algorithm.
+    %
+:- pred find_best_parallelisation_complete_bnb(implicit_parallelism_info::in,
+    program_location::in, int::in, goals_for_parallelisation::in,
+    best_parallelisation::out) is det.
+
+find_best_parallelisation_complete_bnb(Info, Location, PartNum,
+        PreprocessedGoals, BestParallelisation) :-
+    PreprocessedGoals = goals_for_parallelisation(GoalGroups, _, 
+        DependencyMaps, CostlyCallsIndexes, NumCalls),
+    ( last(CostlyCallsIndexes, LastCostlyCallIndexPrime) ->
+        LastCostlyCallIndex = LastCostlyCallIndexPrime
+    ;
+        error(this_file ++ "no costly calls found")
+    ),
+    
+    branch_and_bound(
+        generate_parallelisations(Info, Location, PartNum, LastCostlyCallIndex, 
+            NumCalls, DependencyMaps, GoalGroups),
+        parallelisation_get_objective_value,
+        Solutions, Profile),
+    
+    trace [compile_time(flag("debug_branch_and_bound")), io(!IO)] (
+        io.format("D: Find best parallelisation for:\n%s\n",
+            [s(LocationString)], !IO),
+        location_to_string(1, Location, LocationCord),
+        string.append_list(list(LocationCord), LocationString),
+        io.format("D: Problem size: %d Solutions: %d\n",
+            [i(length(GoalGroups)), i(count(Solutions))], !IO),
+        io.format("D: Branch and bound profile: %s\n\n",
+            [s(string(Profile))], !IO),
+        io.flush_output(!IO)
+    ),
+    
+    ( 
+        promise_equivalent_solutions [BestParallelisationPrime]
+        member(BestParallelisationPrime, Solutions) 
+    ->
+        BestParallelisation = BestParallelisationPrime
+    ;
+        ParalleliseDepConjs = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+        (
+            ( ParalleliseDepConjs = parallelise_dep_conjs_overlap
+            ; ParalleliseDepConjs = parallelise_dep_conjs_naive
+            ; ParalleliseDepConjs = parallelise_dep_conjs_num_vars
+            ),
+            error(this_file ++ "Execpted at least one solution from bnb solver")
+        ;
+            ParalleliseDepConjs = do_not_parallelise_dep_conjs,
+            % Try again to get the best dependant parallelisation.  This is
+            % used for guided parallelisation.
+            TempInfo = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs := 
+                parallelise_dep_conjs_overlap,
+            find_best_parallelisation_complete_bnb(TempInfo, Location, PartNum,
+                PreprocessedGoals, BestParallelisation)
+        )
+    ).
+
+    % The objective function for the branch and bound search.  This is ParTime
+    % + ParOverheads * 2.  That is we're willing to pay 1 unit of parallel
+    % overheads to get a 2 unit improvement of parallel execution time.
+    %
+:- func parallelisation_get_objective_value(best_parallelisation) = float.
+
+parallelisation_get_objective_value(Parallelisation) = Value :-
     Metrics = Parallelisation ^ bp_par_exec_metrics,
-    ParTime = parallel_exec_metrics_get_par_time(Metrics).
+    Value = Metrics ^ pem_par_time + Metrics ^ pem_par_overheads * 2.0.
 
 :- semipure pred generate_parallelisations(implicit_parallelism_info::in,
-    program_location::in, int::in, int::in, int::in, dependency_maps::in,
+    program_location::in, int::in, int::in, int::in, dependency_graphs::in,
     list(goal_group(goal_classification))::in, 
     bnb_state(best_parallelisation)::in, best_parallelisation::out) is nondet.
 
 generate_parallelisations(Info, _Location, _PartNum, LastCostlyCallIndex,
         NumCalls, DependencyMaps, !.GoalGroups, BNBState, 
         BestParallelisation) :-
-    SparkCost = Info ^ ipi_opts ^ cpcp_sparking_cost,
-    SparkDelay = Info ^ ipi_opts ^ cpcp_sparking_delay,
-    ContextWakeupDelay = Info ^ ipi_opts ^ cpcp_context_wakeup_delay,
-
     some [!GoalNum, !Parallelisation] (
         !:GoalNum = 1,
         
         generate_parallelisations_goals_before(GoalsBeforeConj, !GoalNum,
             !GoalGroups),
-        foldl(pardgoal_calc_cost, GoalsBeforeConj, 0.0, CostBeforeConj),
-        Metrics0 = init_empty_parallel_exec_metrics(CostBeforeConj, NumCalls, 
-            float(SparkCost), float(SparkDelay), float(ContextWakeupDelay)),
-        !:Parallelisation = incomplete_parallelisation(GoalsBeforeConj, 
-            [], peo_empty_conjunct, Metrics0),
+        start_building_parallelisation(Info, NumCalls, GoalsBeforeConj,
+            !:Parallelisation),
 
         semipure generate_parallelisations_body(Info, DependencyMaps, 0,
-            map.init, BNBState, LastCostlyCallIndex, !GoalNum, !GoalGroups,
+            BNBState, LastCostlyCallIndex, !GoalNum, !GoalGroups,
             !Parallelisation),
 
         generate_parallelisations_goals_after(!.GoalNum, !.GoalGroups,
             GoalsAfterConj),
        
-        Parallelisation = !.Parallelisation
+        finalise_parallelisation(GoalsAfterConj, !.Parallelisation,
+            BestParallelisation)
     ),
-    
-    % Finalise the metrics and determine if this is the best solution so
-    % far.
-    foldl(pardgoal_calc_cost, GoalsAfterConj, 0.0, CostAfterConj),
-    Metrics1 = Parallelisation ^ ip_par_exec_metrics,
-    Metrics = finalise_parallel_exec_metrics(Metrics1, CostAfterConj),
-
-    Conjuncts = Parallelisation ^ ip_par_conjs,
-    Overlap = Parallelisation ^ ip_par_exec_overlap,
-    par_conj_overlap_is_dependant(Overlap, IsDependent),
-    BestParallelisation = bp_parallel_execution(GoalsBeforeConj, Conjuncts,
-        GoalsAfterConj, IsDependent, Metrics),
     semipure test_incomplete_solution(BNBState, BestParallelisation).
 
 :- pred generate_parallelisations_goals_before(list(pard_goal_detail)::out, 
@@ -1752,14 +1934,14 @@ generate_parallelisations_goals_after(Nu
     Goals = NewGoals ++ Goals0.
 
 :- semipure pred generate_parallelisations_body(implicit_parallelism_info::in,
-    dependency_maps::in, int::in, map(var_rep, float)::in,
-    bnb_state(best_parallelisation)::in, int::in, int::in, int::out,
+    dependency_graphs::in, int::in, bnb_state(best_parallelisation)::in,
+    int::in, int::in, int::out,
     list(goal_group(goal_classification))::in, 
     list(goal_group(goal_classification))::out,
     incomplete_parallelisation::in, incomplete_parallelisation::out) is nondet. 
 
 generate_parallelisations_body(Info, DependencyMaps, NumConjuncts0,
-        ProductionsMap0, BNBState, LastCostlyCallIndex, !GoalNum, !GoalGroups,
+        BNBState, LastCostlyCallIndex, !GoalNum, !GoalGroups,
         !Parallelisation) :-
     (
         !.GoalNum > LastCostlyCallIndex
@@ -1777,51 +1959,33 @@ generate_parallelisations_body(Info, Dep
         % calls.
         ( NumConjuncts0 = 0 => LastCostlyCallIndex >= !.GoalNum ),
         
-        Conjs0 = !.Parallelisation ^ ip_par_conjs,
-        Conjs = Conjs0 ++ [seq_conj(ParConjGoals)],
-        Metrics0 = !.Parallelisation ^ ip_par_exec_metrics,
-        Overlap0 = !.Parallelisation ^ ip_par_exec_overlap,
         ( LastCostlyCallIndex >= !.GoalNum ->
             IsInnermostConjunct = is_not_innermost_conjunct
         ;
             IsInnermostConjunct = is_innermost_conjunct
         ),
         calculate_parallel_cost_step(Info, IsInnermostConjunct, NumConjuncts0,
-            ParConjGoals, ProductionsMap0, ProductionsMap, Metrics0, Metrics, 
-            Overlap0, Overlap),
-        (
-            Overlap = peo_empty_conjunct,
-            DepVars = set.init
-        ;
-            Overlap = peo_conjunction(_, _, DepVars)
-        ),
+            ParConjGoals, !Parallelisation),
+        Overlap = !.Parallelisation ^ ip_par_exec_overlap, 
         
         % Reject parallelisations that have dependant variables if we're not
         % allowed to create such parallelisations.
         ParalleliseDepConjs = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+        par_conj_overlap_is_dependant(Overlap, IsDependent),
         (
-            ParalleliseDepConjs = do_not_parallelise_dep_conjs,
-            set.empty(DepVars)
-        ;
-            ParalleliseDepConjs = parallelise_dep_conjs
+            ParalleliseDepConjs = do_not_parallelise_dep_conjs
+        =>
+            IsDependent = conjuncts_are_independent
         ),
         
-        MetricsComplete = finalise_parallel_exec_metrics(Metrics, 0.0),
-        par_conj_overlap_is_dependant(Overlap, IsDependent),
-        Conjuncts = !.Parallelisation ^ ip_par_conjs,
-        GoalsBeforeConj = !.Parallelisation ^ ip_goals_before,
-        IncompleteParallelisation = bp_parallel_execution(GoalsBeforeConj,
-            Conjuncts, [], IsDependent, MetricsComplete),
-        semipure test_incomplete_solution(BNBState, IncompleteParallelisation),
+        % Test the corrent solution.
+        finalise_parallelisation([], !.Parallelisation, TestParallelisation),
+        semipure test_incomplete_solution(BNBState, TestParallelisation),
 
         NumConjuncts = NumConjuncts0 + 1,
-        !Parallelisation ^ ip_par_exec_overlap := Overlap,
-        !Parallelisation ^ ip_par_exec_metrics := Metrics,
-        !Parallelisation ^ ip_par_conjs := Conjs,
-
         semipure generate_parallelisations_body(Info, DependencyMaps, 
-            NumConjuncts, ProductionsMap, BNBState, LastCostlyCallIndex,
-            !GoalNum, !GoalGroups, !Parallelisation)
+            NumConjuncts, BNBState, LastCostlyCallIndex, !GoalNum, !GoalGroups,
+            !Parallelisation)
     ).
 
 :- pred generate_parallel_conjunct(
@@ -1854,6 +2018,342 @@ generate_parallel_conjunct(Goals, !GoalN
     ),
     Goals = NewGoals ++ Goals0.
 
+%----------------------------------------------------------------------------%
+
+    % The greedy algorithm for finding the best parallelisation of a
+    % conjunction.
+    %
+:- pred find_best_parallelisation_greedy(implicit_parallelism_info::in,
+    program_location::in, int::in, 
+    goals_for_parallelisation::in(goals_for_parallelisation), 
+    best_parallelisation::out) is det.
+
+find_best_parallelisation_greedy(Info, _Location, _PartNum,
+        PreprocessedGoals, !:Parallelisation) :-
+    some [!GoalGroups, !ConjNum] (
+        PreprocessedGoals = goals_for_parallelisation(!:GoalGroups, _,
+            DependencyMaps, CostlyCallIndexes, NumCalls),
+        !:ConjNum = 1,
+
+        !.GoalGroups = [ FirstGroup | !:GoalGroups ],
+        gg_get_details(FirstGroup, _, FirstGoals, FirstClassification),
+        (
+            FirstClassification = gc_cheap_goals,
+            build_goals_before_greedy(Info, DependencyMaps, CostlyCallIndexes,
+                GoalsBeforeConj, !ConjNum, FirstGoals, RemainingGoals),
+            (
+                RemainingGoals = []
+            ;
+                RemainingGoals = [_ | _],
+                !:GoalGroups = 
+                    [new_group(RemainingGoals, gc_cheap_goals) | !.GoalGroups]
+            )
+        ;
+            FirstClassification = gc_costly_goals,
+            !:GoalGroups = [ FirstGroup | !.GoalGroups ],
+            GoalsBeforeConj = []
+        ),
+        start_building_parallelisation(Info, NumCalls, GoalsBeforeConj,
+            !:Parallelisation),
+
+        build_parallel_conjuncts_greedy(Info, DependencyMaps,
+            CostlyCallIndexes, 0, [], [], LastParConj, !ConjNum,
+            !GoalGroups, !Parallelisation),
+   
+        (
+            !.GoalGroups = [LastGroup | EmptyGroups],
+            require(unify(EmptyGroups, []), "EmptyGroups \\= []"),
+            gg_get_details(LastGroup, _, LastGoals0, LastClassification),
+            require(unify(LastClassification, gc_cheap_goals), 
+                "LastClassification \\= gc_cheap_goals")
+        ;
+            !.GoalGroups = [],
+            LastGoals0 = []
+        ),
+        
+        build_goals_after_greedy(Info, !.ConjNum, LastParConj,
+            LastGoals0, LastGoals, !Parallelisation),
+        finalise_parallelisation(LastGoals, !Parallelisation)
+    ).
+
+    % build_goals_before_greedy(Info, Deps, CostlyCallIndexes,
+    %   GoalsBefore, GoalsWithin, !GoalNum, !Goals).
+    %
+    % Take GoalsBefore goals from !Goals, that should be run sequentially
+    % before the parallelisation begins.  !.GaolNum gives the index of the
+    % first goal in !.Goals.
+    %
+:- pred build_goals_before_greedy(implicit_parallelism_info::in, 
+    dependency_graphs::in, list(int)::in(non_empty_list),
+    list(pard_goal_detail)::out, int::in, int::out, 
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out) is det.
+
+build_goals_before_greedy(Info, DependencyMaps, CostlyCallIndexes,
+        GoalsBefore, !ConjNum, !Goals) :-
+    build_goals_par_before_first_call(Info, DependencyMaps, CostlyCallIndexes,
+        [], GoalsBeforeRev, [], GoalsParRev, !ConjNum, !Goals),
+    reverse(GoalsBeforeRev, GoalsBefore),
+    !:Goals = reverse(GoalsParRev) ++ !.Goals,
+    !:ConjNum = !.ConjNum - length(GoalsParRev).
+
+:- pred build_goals_par_before_first_call(implicit_parallelism_info::in,
+    dependency_graphs::in, list(int)::in(non_empty_list),
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out,
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out,
+    int::in, int::out,
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out) is det.
+
+build_goals_par_before_first_call(Info, DependencyMaps, CostlyCallIndexes,
+        !GoalsBeforeRev, !GoalsParRev, !ConjNum, !Goals) :-
+    Goals0 = !.Goals,
+    (
+        !.Goals = [Goal | !:Goals],
+        CostlyCallIndexes = [FirstCostlyCallIndex | OtherCostlyCallIndexes],
+        (
+            !.ConjNum >= FirstCostlyCallIndex
+        ->
+            % This goal is the first costly call.  This group mustn't be
+            % included in the goals before the parallel conjunction.
+            % Put it back before we return.
+            !:Goals = Goals0
+        ;
+            depends_lookup_tc(DependencyMaps, !.ConjNum, DepsTC),
+            depends_lookup(DependencyMaps, !.ConjNum, Deps),
+            (
+                (
+                    % This goal provides a direct dependency to a costly call
+                    % other than the first costly call.  It must be scheduled
+                    % before the parallel conjunction.
+                    some [Dep] (
+                        member(Dep, Deps),
+                        member(Dep, OtherCostlyCallIndexes)
+                    )
+                ;
+                    % This goal provides a dependency to a costly call other
+                    % than the first and does not provide a dependency to the
+                    % first.
+                    some [Dep] (
+                        member(Dep, DepsTC),
+                        member(Dep, OtherCostlyCallIndexes)
+                    )
+                    % XXX: Comment this out.  This means that we avoid some
+                    % dependencies at the cost of not generating some more
+                    % optimal solutions.  What we actually want here is to
+                    % sequentialize this goal if it provides a transitive
+                    % dependency to a later call that the first costly call is
+                    % _not_ on the same dependecy path.
+                    %not (
+                    %    member(FirstCostlyCallIndex, DepsTC)
+                    %)
+                )
+            ->
+                % Schedule this goal and all of !.GoalsParRev before the
+                % paralle conjunction.
+                !:GoalsBeforeRev = [Goal | !.GoalsParRev ++ !.GoalsBeforeRev],
+                !:GoalsParRev = []
+            ;
+                !:GoalsParRev = [Goal | !.GoalsParRev]
+            ),
+            !:ConjNum = !.ConjNum + 1,
+            build_goals_par_before_first_call(Info, DependencyMaps,
+                CostlyCallIndexes, !GoalsBeforeRev, !GoalsParRev, !ConjNum,
+                !Goals)
+        )
+    ;
+        !.Goals = []
+    ).
+
+:- pred build_parallel_conjuncts_greedy(implicit_parallelism_info::in,
+    dependency_graphs::in, list(int)::in, int::in, list(pard_goal_detail)::in,
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out, int::in, int::out, 
+    list(goal_group(goal_classification))::in, 
+    list(goal_group(goal_classification))::out, 
+    incomplete_parallelisation::in, incomplete_parallelisation::out) is det.
+
+build_parallel_conjuncts_greedy(Info, DependencyMaps, CostlyCallIndexes,
+        NumConjuncts0, InbetweenGoals0, !LastParConj, !GoalNum, !GoalGroups,
+        !Parallelisation) :-
+    (
+        !.GoalGroups = [GoalGroup | !:GoalGroups],
+        gg_get_details(GoalGroup, NumGoals, Goals, Classification),
+        
+        (
+            Classification = gc_cheap_goals,
+            % These cheap goals may be parallelised in between some other goals
+            % once we find a costly goal.
+            InbetweenGoals = InbetweenGoals0 ++ Goals, 
+            NumConjuncts = NumConjuncts0
+        ;
+            Classification = gc_costly_goals,
+            % Find the best construction of the most recent two parallel
+            % conjuncts.
+            (
+                !.LastParConj = [],
+                % There is no last parallel conjunction, the current goals will
+                % become the last one and the decision will be deffered.
+                !:LastParConj = InbetweenGoals0 ++ Goals,
+                InbetweenGoals = [],
+                NumConjuncts = NumConjuncts0
+            ;
+                !.LastParConj = [_ | _],
+                build_parallel_conjuncts_greedy_pair(Info,
+                    InbetweenGoals0, Goals, NumConjuncts0, !LastParConj, 
+                    !Parallelisation),
+                InbetweenGoals = [],
+                NumConjuncts = NumConjuncts0 + 1
+            )
+        ),
+        !:GoalNum = !.GoalNum + NumGoals,
+        build_parallel_conjuncts_greedy(Info, DependencyMaps, CostlyCallIndexes,
+            NumConjuncts, InbetweenGoals, !LastParConj, !GoalNum, !GoalGroups,
+            !Parallelisation)
+    ;
+        !.GoalGroups = [],
+        (
+            InbetweenGoals0 = [_ | _],
+            !:GoalGroups = [new_group(InbetweenGoals0, gc_cheap_goals)]
+        ;
+            InbetweenGoals0 = []
+        )
+    ).
+
+:- pred build_parallel_conjuncts_greedy_pair(implicit_parallelism_info::in,
+    list(pard_goal_detail)::in, list(pard_goal_detail)::in, int::in,
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out,
+    incomplete_parallelisation::in, incomplete_parallelisation::out) is det.
+
+build_parallel_conjuncts_greedy_pair(Info, InbetweenGoals, 
+        Goals, ConjunctNum, !LastParConj, !Parallelisation) :-
+    ParalleliseDepConjs = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+    branch_and_bound(
+        (semipure pred(_State::in, Parallelisation::out) is nondet :-
+            append(GoalsA, GoalsB, InbetweenGoals),
+            
+            ParConjA = !.LastParConj ++ GoalsA,
+            calculate_parallel_cost_step(Info, is_not_innermost_conjunct,
+                ConjunctNum, ParConjA, !.Parallelisation, Parallelisation0),
+            Overlap0 = Parallelisation0 ^ ip_par_exec_overlap,
+            par_conj_overlap_is_dependant(Overlap0, IsDependent0),
+            (
+                ParalleliseDepConjs = do_not_parallelise_dep_conjs
+            =>
+                IsDependent0 = conjuncts_are_independent
+            ),
+            
+            ParConjB = GoalsB ++ Goals,
+            calculate_parallel_cost_step(Info, is_not_innermost_conjunct,
+                ConjunctNum + 1, ParConjB, Parallelisation0, Parallelisation1),
+            Overlap1 = Parallelisation1 ^ ip_par_exec_overlap,
+            par_conj_overlap_is_dependant(Overlap1, IsDependent1),
+            (
+                ParalleliseDepConjs = do_not_parallelise_dep_conjs
+            =>
+                IsDependent1 = conjuncts_are_independent
+            ),
+
+            finalise_parallelisation([], Parallelisation1, Parallelisation)
+        ),
+        parallelisation_get_objective_value, Parallelisations, _),
+    (
+        promise_equivalent_solutions [BestParallelisation]
+        member(BestParallelisation, Parallelisations)
+    ->
+        ParConjsRev = reverse(BestParallelisation ^ bp_par_conjs),
+        ( 
+            ParConjsRev = 
+                [seq_conj(LastParConjPrime) | [seq_conj(SndLastParConj) | _]] 
+        ->
+            % Commit to the second last parallel conjunction.
+            calculate_parallel_cost_step(Info, is_not_innermost_conjunct,
+                ConjunctNum, SndLastParConj, !Parallelisation),         
+ 
+            % Save the last parallel conjunction as part of the next one.
+            !:LastParConj = LastParConjPrime 
+        ;
+            error(this_file ++ "Expected at least 2 parallel conjuncts")
+        )
+    ;
+        (
+            ( ParalleliseDepConjs = parallelise_dep_conjs_overlap
+            ; ParalleliseDepConjs = parallelise_dep_conjs_num_vars
+            ; ParalleliseDepConjs = parallelise_dep_conjs_naive
+            ),
+            error(this_file ++ "Execpted at least one solution from bnb solver")
+        ;
+            ParalleliseDepConjs = do_not_parallelise_dep_conjs,
+            % Try again to get the best dependant parallelisation.  This is
+            % used for guided parallelisation.
+            TempInfo = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs := 
+                parallelise_dep_conjs_overlap,
+            build_parallel_conjuncts_greedy_pair(TempInfo, InbetweenGoals,
+                Goals, ConjunctNum, !LastParConj, !Parallelisation)
+        )
+    ). 
+
+:- pred build_goals_after_greedy(implicit_parallelism_info::in, 
+    int::in, list(pard_goal_detail)::in, 
+    list(pard_goal_detail)::in, list(pard_goal_detail)::out, 
+    incomplete_parallelisation::in, incomplete_parallelisation::out) is det.
+
+build_goals_after_greedy(Info, ConjunctNum, !.LastParConj, !LastGoals,
+        !Parallelisation) :-
+    ParalleliseDepConjs = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+    
+    branch_and_bound(
+        (semipure pred(_State::in, (Parallelisation - LastGoals)::out) 
+                is nondet :-
+            append(GoalsPar, LastGoals, !.LastGoals),
+            ParConjTest = !.LastParConj ++ GoalsPar,
+            calculate_parallel_cost_step(Info, is_innermost_conjunct,
+                ConjunctNum, ParConjTest, !.Parallelisation, Parallelisation0),
+            Overlap0 = Parallelisation0 ^ ip_par_exec_overlap,
+            par_conj_overlap_is_dependant(Overlap0, IsDependent0),
+            (
+                ParalleliseDepConjs = do_not_parallelise_dep_conjs
+            =>
+                IsDependent0 = conjuncts_are_independent
+            ),
+            finalise_parallelisation([], Parallelisation0, Parallelisation)
+        ),
+        (func(Parallelisation - _) =
+            parallelisation_get_objective_value(Parallelisation)),
+        Parallelisations, _),
+    
+    (
+        promise_equivalent_solutions [BestParallelisationPair]
+        member(BestParallelisationPair, Parallelisations)
+    ->
+        BestParallelisationPair = BestParallelisation - !:LastGoals,
+        ParConjsRev = reverse(BestParallelisation ^ bp_par_conjs),
+        ( 
+            ParConjsRev = [seq_conj(LastParConjPrime) | _] 
+        ->
+            % Commit to the last parallel conjunction.
+            calculate_parallel_cost_step(Info, is_innermost_conjunct,
+                ConjunctNum, LastParConjPrime, !Parallelisation)
+        ;
+            error(this_file ++ "Expected at least 1 parallel conjunct")
+        )
+    ;
+        (
+            ( ParalleliseDepConjs = parallelise_dep_conjs_overlap
+            ; ParalleliseDepConjs = parallelise_dep_conjs_num_vars
+            ; ParalleliseDepConjs = parallelise_dep_conjs_naive
+            ),
+            error(this_file ++ "Execpted at least one solution from bnb solver")
+        ;
+            ParalleliseDepConjs = do_not_parallelise_dep_conjs,
+            % Try again to get the best dependant parallelisation.  This is
+            % used for guided parallelisation.
+            TempInfo = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs := 
+                parallelise_dep_conjs_overlap,
+            build_goals_after_greedy(TempInfo, ConjunctNum, !.LastParConj,
+                !LastGoals, !Parallelisation)
+        )
+    ). 
+
+%----------------------------------------------------------------------------%
+
     % This datastructure represents the execution of dependant conjuncts, it
     % tracks which variabes are produced and consumed.
     %
@@ -1888,7 +2388,7 @@ generate_parallel_conjunct(Goals, !GoalN
     --->    is_innermost_conjunct
     ;       is_not_innermost_conjunct.
 
-    % calculate_parallel_cost(Info, Conjunctions, IsOutermostConjunct,
+    % calculate_parallel_cost(Info, Conjunctions, IsInnermostConjunct,
     %   ParallelExecMetrics, ParallelExecOverlap, ProductionsMap, NumConjuncts).
     %
     % Analyse the parallel conjuncts and determine their likely performance.
@@ -1909,17 +2409,22 @@ generate_parallel_conjunct(Goals, !GoalN
     %
 :- pred calculate_parallel_cost_step(implicit_parallelism_info::in,
     is_innermost_conjunct::in, int::in, list(pard_goal_detail)::in, 
-    map(var_rep, float)::in, map(var_rep, float)::out,
-    parallel_exec_metrics_incomplete::in, parallel_exec_metrics_incomplete::out,
-    parallel_execution_overlap::in, parallel_execution_overlap::out) is det.
+    incomplete_parallelisation::in, incomplete_parallelisation::out) is det.
 
 calculate_parallel_cost_step(Info, IsInnermostConjunct, NumConjuncts, Goals,
-        !ProductionsMap, !Metrics, !Overlap) :-
+        !Parallelisation) :-
+    Conjs0 = !.Parallelisation ^ ip_par_conjs,
+    Conjs = Conjs0 ++ [seq_conj(Goals)],
+    !Parallelisation ^ ip_par_conjs := Conjs,
+    Algorithm = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
+   
+    ProductionsMap0 = !.Parallelisation ^ ip_productions_map,
+
     foldl(pardgoal_calc_cost, Goals, 0.0, CostB),
     foldl(pardgoal_consumed_vars_accum, Goals, set.init,
         RightConsumedVars),
     ProducedVars = 
-        set.from_sorted_list(map.sorted_keys(!.ProductionsMap)),
+        set.from_sorted_list(map.sorted_keys(ProductionsMap0)),
     Vars = set.intersect(ProducedVars, RightConsumedVars),
 
     % This conjunct will actually start after it has been sparked by
@@ -1941,6 +2446,9 @@ calculate_parallel_cost_step(Info, IsInn
         StartTime = StartTime0
     ),
 
+    (
+        Algorithm = parallelise_dep_conjs_overlap,
+
     % Get the list of variables consumed by this conjunct that will be
     % turned into futures.
     foldl3(get_consumptions_list, Goals, Vars, _, 0.0, _, 
@@ -1948,7 +2456,7 @@ calculate_parallel_cost_step(Info, IsInn
     reverse(ConsumptionsList0, ConsumptionsList),
 
     % Determine how the parallel conjuncts overlap.
-    foldl5(calculate_dependant_parallel_cost_2(Info, !.ProductionsMap), 
+        foldl5(calculate_dependant_parallel_cost_2(Info, ProductionsMap0), 
         ConsumptionsList, 0.0, LastSeqConsumeTime, 
         StartTime, LastParConsumeTime, StartTime, LastResumeTime, 
         [], RevExecution0, map.init, ConsumptionsMap),
@@ -1960,19 +2468,36 @@ calculate_parallel_cost_step(Info, IsInn
     RevExecution = [ (LastResumeTime - CostBPar) | RevExecution0 ],
     
     CostSignals = 
-        float(Info ^ ipi_opts ^ cpcp_future_signal_cost * count(Vars)),
-    !:Metrics = init_parallel_exec_metrics_incomplete(!.Metrics, CostSignals,
+            float(Info ^ ipi_opts ^ cpcp_future_signal_cost * count(Vars))
+    ;
+        ( Algorithm = parallelise_dep_conjs_naive
+        ; Algorithm = do_not_parallelise_dep_conjs
+        ; Algorithm = parallelise_dep_conjs_num_vars
+        ),
+        
+        CostBPar = StartTime + CostB,
+        Execution = [StartTime - CostBPar],
+        ConsumptionsMap = init,
+        CostSignals = 0.0
+    ),
+
+    Metrics0 = !.Parallelisation ^ ip_par_exec_metrics,
+    Metrics = init_parallel_exec_metrics_incomplete(Metrics0, CostSignals,
         CostB, CostBPar),
+    !Parallelisation ^ ip_par_exec_metrics := Metrics,
     
     % Build the productions map for our parent.  This map contains all
     % the variables produced by this code, not just that are used for
     % dependant parallelisation.
     foldl3(get_productions_map, Goals, StartTime, _, Execution, _,
-        !ProductionsMap),
+        ProductionsMap0, ProductionsMap),
+    !Parallelisation ^ ip_productions_map := ProductionsMap,
 
     DepConjExec = dependant_conjunct_execution(Execution,
-        !.ProductionsMap, ConsumptionsMap),
-    !:Overlap = peo_conjunction(!.Overlap, DepConjExec, Vars).
+        ProductionsMap, ConsumptionsMap),
+    Overlap0 = !.Parallelisation ^ ip_par_exec_overlap,
+    Overlap = peo_conjunction(Overlap0, DepConjExec, Vars),
+    !Parallelisation ^ ip_par_exec_overlap := Overlap.
 
     % calculate_dependant_parallel_cost_2(Info, ProductionsMap, 
     %   Var - SeqConsTime, !PrevSeqConsumeTime, !PrevParConsumeTime,
@@ -2060,11 +2585,19 @@ calculate_dependant_parallel_cost_2(Info
     conjuncts_are_dependant::out) is det.
 
 par_conj_overlap_is_dependant(peo_empty_conjunct, conjuncts_are_independent).
-par_conj_overlap_is_dependant(peo_conjunction(Left, _, VarSet), IsDependent) :-
-    ( set.empty(VarSet) ->
-        par_conj_overlap_is_dependant(Left, IsDependent)
+par_conj_overlap_is_dependant(peo_conjunction(Left, _, VarSet0), IsDependent) :-
+    par_conj_overlap_is_dependant(Left, IsDependent0),
+    (
+        IsDependent0 = conjuncts_are_dependant(VarSetLeft),
+        VarSet = union(VarSet0, VarSetLeft),
+        IsDependent = conjuncts_are_dependant(VarSet)
+    ;
+        IsDependent0 = conjuncts_are_independent,
+        ( set.empty(VarSet0) ->
+            IsDependent = conjuncts_are_independent
     ;
-        IsDependent = conjuncts_are_dependant
+            IsDependent = conjuncts_are_dependant(VarSet0)
+        )
     ).
 
 :- pred pardgoal_calc_cost(pard_goal_detail::in, float::in, float::out) 
@@ -2093,40 +2626,59 @@ pardgoal_calc_cost(Goal, !Cost) :-
         error(this_file ++ "unexpected non atomic goal")
     ).
 
-:- type dependency_maps
-    ---> dependency_maps(
-            % XXX: I'm pretty certain that these are transitive closures.
-            dm_depends_on           :: map(int, set(int)),
-                % This map maps from conjunct numbers to the conjunct numbers
-                % of goals that they depend upon.
-
-            dm_is_depended_on_by    :: map(int, set(int))
-                % This is the reverse dependency map.  It maps from a dependee
-                % to conjunct numbers that depend on this goal.
-         ).
-
-:- pred build_dependency_maps(list(pard_goal_detail)::in, 
-    dependency_maps::out) is det.
-
-build_dependency_maps(Goals, Maps) :-
-    length(Goals, GoalsLen),
-    % Both maps are initialised equally.
-    fold_up(insert_empty_set, 1, GoalsLen, map.init, InitialisedMap), 
-    build_dependency_map(Goals, 1, map.init, _VarDepMap, 
-        InitialisedMap, Map, InitialisedMap, RevMap),
-    Maps = dependency_maps(Map, RevMap).
-
-:- pred depends_lookup(dependency_maps::in, int::in, set(int)::out) is det.
-
-depends_lookup(DependencyMaps, GoalNum, Dependencies) :-
-    Map = DependencyMaps ^ dm_depends_on,
-    lookup(Map, GoalNum, Dependencies).
-
-:- pred depends_lookup_rev(dependency_maps::in, int::in, set(int)::out) is det.
-
-depends_lookup_rev(DependencyMaps, GoalNum, Dependencies) :-
-    RevMap = DependencyMaps ^ dm_is_depended_on_by,
-    lookup(RevMap, GoalNum, Dependencies).
+:- type dependency_graphs
+    ---> dependency_graphs(
+            dm_forward              :: digraph(int),
+            dm_forward_tc           :: digraph(int)
+        ).
+
+:- pred build_dependency_graphs(list(pard_goal_detail)::in, 
+    dependency_graphs::out) is det.
+
+build_dependency_graphs(Goals, Maps) :-
+    Graph0 = digraph.init,
+    build_dependency_graph(Goals, 1, map.init, _VarDepMap, 
+        Graph0, Graph),
+    Maps = dependency_graphs(Graph, tc(Graph)).
+
+:- pred depends_lookup(dependency_graphs::in, int::in, set(int)::out) is det.
+
+depends_lookup(DependencyGraphs, GoalNum, Deps) :-
+    graph_do_lookup(lookup_from, DependencyGraphs ^ dm_forward, GoalNum, 
+        Deps).
+
+:- pred depends_lookup_tc(dependency_graphs::in, int::in, set(int)::out) is det.
+
+depends_lookup_tc(DependencyGraphs, GoalNum, Deps) :-
+    graph_do_lookup(lookup_from, DependencyGraphs ^ dm_forward_tc, GoalNum, 
+        Deps).
+
+:- pred depends_lookup_rev(dependency_graphs::in, int::in, set(int)::out) 
+    is det.
+
+depends_lookup_rev(DependencyGraphs, GoalNum, RevDeps) :-
+    graph_do_lookup(lookup_to, DependencyGraphs ^ dm_forward, GoalNum,
+        RevDeps).
+
+:- pred depends_lookup_tc_rev(dependency_graphs::in, int::in, set(int)::out) 
+    is det.
+
+depends_lookup_tc_rev(DependencyGraphs, GoalNum, RevDeps) :-
+    graph_do_lookup(lookup_to, DependencyGraphs ^ dm_forward_tc, GoalNum,
+        RevDeps).
+
+    % Abstract code for querying a graph for a goal dependency.
+    %
+:- pred graph_do_lookup(
+    pred(digraph(int), digraph_key(int), set(digraph_key(int))),
+    digraph(int), int, set(int)).
+:- mode graph_do_lookup(
+    pred(in, in, out) is det,
+    in, in, out) is det.
+
+graph_do_lookup(Lookup, Graph, GoalNum, Deps) :-
+    Lookup(Graph, lookup_key(Graph, GoalNum), DepsKeys),
+    Deps = set(map(lookup_vertex(Graph), to_sorted_list(DepsKeys))).
 
 :- pred insert_empty_set(int::in, 
     map(int, set(int))::in, map(int, set(int))::out) is det.
@@ -2134,13 +2686,12 @@ depends_lookup_rev(DependencyMaps, GoalN
 insert_empty_set(K, !Map) :-
     svmap.det_insert(K, set.init, !Map).
 
-:- pred build_dependency_map(list(pard_goal_detail)::in, int::in, 
-    map(var_rep, set(int))::in, map(var_rep, set(int))::out,
-    map(int, set(int))::in, map(int, set(int))::out,
-    map(int, set(int))::in, map(int, set(int))::out) is det.
+:- pred build_dependency_graph(list(pard_goal_detail)::in, int::in, 
+    map(var_rep, int)::in, map(var_rep, int)::out,
+    digraph(int)::in, digraph(int)::out) is det.
 
-build_dependency_map([], _ConjNum, !VarDepMap, !Map, !RevMap).
-build_dependency_map([PG | PGs], ConjNum, !VarDepMap, !Map, !RevMap) :-
+build_dependency_graph([], _ConjNum, !VarDepMap, !Graph).
+build_dependency_graph([PG | PGs], ConjNum, !VarDepMap, !Graph) :-
     InstMapInfo = PG ^ goal_annotation ^ pgd_inst_map_info,
     
     % For each variable consumed by a goal we find out which goals instantiate
@@ -2149,29 +2700,22 @@ build_dependency_map([PG | PGs], ConjNum
     % and not those that are set.  This is safe because we only bother
     % analysing single assignment code.
     RefedVars = InstMapInfo ^ im_consumed_vars,
-    InstVars = InstMapInfo ^ im_bound_vars,
-    list.map((pred(RefedVar::in, DepConjsI::out) is det :-
-        map.search(!.VarDepMap, RefedVar, DepConjsPrime) -> 
-            DepConjsI = DepConjsPrime
-        ;
-            % If we consume a variable that we don't know the producer of, it
-            % may be a parameter to the procedure or have been produced by a
-            % goal outside of this conjunction.  Add an empty set of
-            % dependencies.
-            set.init(DepConjsI)
-        ), to_sorted_list(RefedVars), DepConjss),
-    DepConjs = union_list(DepConjss),
-    fold(transitive_map_insert(ConjNum), DepConjs, !Map),
-    fold((pred(K::in, MapI0::in, MapI::out) is det :-
-            transitive_map_insert(K, ConjNum, MapI0, MapI)
-        ), DepConjs, !RevMap), 
+    list.foldl((pred(RefedVar::in, GraphI0::in, GraphI::out) is det :-
+        map.search(!.VarDepMap, RefedVar, DepConj) ->
+            digraph.add_vertices_and_edge(DepConj, ConjNum, GraphI0, GraphI)
+        ;
+            GraphI = GraphI0
+        ), to_sorted_list(RefedVars), !Graph),
     
     % For each variable instantiated by this goal add it to the VarDepMap with
     % this goal as it's instantiator.  That is a maping from the variable to
-    % the conj num.  The var to conjnum map is not transitive.
-    fold(add_var_to_var_dep_map(ConjNum), InstVars, !VarDepMap),
+    % the conj num.
+    InstVars = InstMapInfo ^ im_bound_vars,
+    fold((pred(InstVar::in, MapI0::in, MapI::out) is det :-
+            svmap.det_insert(InstVar, ConjNum, MapI0, MapI)
+        ), InstVars, !VarDepMap),
 
-    build_dependency_map(PGs, ConjNum + 1, !VarDepMap, !Map, !RevMap).
+    build_dependency_graph(PGs, ConjNum + 1, !VarDepMap, !Graph).
 
 :- pred add_var_to_var_dep_map(int::in, var_rep::in, 
     map(var_rep, set(int))::in, map(var_rep, set(int))::out) is det. 
@@ -2578,7 +3122,7 @@ goal_to_pard_goal(Info, GoalPath0, Step,
             GoalExpr = negation_rep(SubGoal)
         ; 
             GoalExpr0 = scope_rep(SubGoal0, MaybeCut),
-            goal_to_pard_goal(Info, GoalPath, func(_) = step_neg,
+            goal_to_pard_goal(Info, GoalPath, func(_) = step_scope(MaybeCut),
                 SubGoal0, SubGoal, 1, _),
             GoalExpr = scope_rep(SubGoal, MaybeCut)
         ),
@@ -3057,70 +3601,102 @@ create_candidate_parallel_conj_proc_repo
 create_candidate_parallel_conj_report(VarTable, Proc, CandidateParConjunction,
         Report) :-
     print_proc_label_to_string(Proc, ProcString),
-    CandidateParConjunction = candidate_par_conjunction(GoalPath,
-        PartNum, IsDependant, GoalsBefore, Conjs, GoalsAfter, ParExecMetrics),
-    NumCalls = parallel_exec_metrics_get_num_calls(ParExecMetrics),
+    CandidateParConjunction = candidate_par_conjunction(GoalPathString,
+        PartNum, FirstConjNum, IsDependant, GoalsBefore, Conjs, GoalsAfter,
+        ParExecMetrics),
+    ParExecMetrics = parallel_exec_metrics(NumCalls, SeqTime, ParTime,
+        ParOverheads, FirstConjDeadTime, FutureDeadTime),
+    
     (
         IsDependant = conjuncts_are_independent,
         DependanceString = "no"
     ;
-        IsDependant = conjuncts_are_dependant,
-        DependanceString = "yes"
+        IsDependant = conjuncts_are_dependant(Vars),
+        map(lookup_var_name(VarTable), Vars, VarNames),
+        VarsString = join_list(", ", to_sorted_list(VarNames)),
+        DependanceString = format("on %s", [s(VarsString)])
     ),
-    SeqTime = parallel_exec_metrics_get_seq_time(ParExecMetrics),
-    ParTime = parallel_exec_metrics_get_par_time(ParExecMetrics),
     Speedup = parallel_exec_metrics_get_speedup(ParExecMetrics),
     TimeSaving = parallel_exec_metrics_get_time_saving(ParExecMetrics),
-    FirstConjDeadTime =
-        parallel_exec_metrics_get_first_conj_dead_time(ParExecMetrics),
-    FutureDeadTime =
-        parallel_exec_metrics_get_future_dead_time(ParExecMetrics),
-    TotalDeadTime = parallel_exec_metrics_get_total_dead_time(ParExecMetrics),
+    TotalDeadTime = FirstConjDeadTime + FutureDeadTime,
     format("      %s\n" ++
            "      Path and Partition Num: %s, %d\n" ++
            "      Dependant: %s\n" ++
            "      NumCalls: %d\n" ++
            "      SeqTime: %f\n" ++
            "      ParTime: %f\n" ++
+           "      ParOverheads: %f\n" ++
            "      Speedup: %f\n" ++
            "      Time saving: %f\n" ++
            "      First conj dead time: %f\n" ++
            "      Future dead time: %f\n" ++
            "      Total dead time: %f\n\n", 
-        [s(ProcString), s(GoalPath), i(PartNum), s(DependanceString),
-            i(NumCalls), f(SeqTime), f(ParTime), f(Speedup), f(TimeSaving), 
-            f(FirstConjDeadTime), f(FutureDeadTime), f(TotalDeadTime)],
+        [s(ProcString), s(GoalPathString), i(PartNum), s(DependanceString),
+            i(NumCalls), f(SeqTime), f(ParTime), f(ParOverheads), f(Speedup),
+            f(TimeSaving), f(FirstConjDeadTime), f(FutureDeadTime),
+            f(TotalDeadTime)],
         ReportHeaderStr),
     ReportHeader = singleton(ReportHeaderStr),
 
-    format_sequential_conjuncts(VarTable, 3, GoalsBefore, empty, 
-        ReportGoalsBefore),
-    format_parallel_conjunction(VarTable, 3, Conjs, ReportParConj),
-    format_sequential_conjuncts(VarTable, 3, GoalsAfter, empty, 
-        ReportGoalsAfter),
+    ( goal_path_from_string(GoalPathString, GoalPathPrime) ->
+        GoalPath = GoalPathPrime
+    ;
+        error(this_file ++ "couldn't parse goal path.")
+    ),
+    some [!ConjNum] (
+        !:ConjNum = FirstConjNum,
+        format_sequential_conjuncts(VarTable, 3, GoalPath, GoalsBefore, 
+            !ConjNum, empty, ReportGoalsBefore),
+        format_parallel_conjunction(VarTable, 3, GoalPath, !.ConjNum, Conjs,
+            ReportParConj),
+        !:ConjNum = !.ConjNum + 1,
+        format_sequential_conjuncts(VarTable, 3, GoalPath, GoalsAfter,
+            !ConjNum, empty, ReportGoalsAfter),
+        _ = !.ConjNum
+    ),
 
     Report = snoc(ReportHeader ++ ReportGoalsBefore ++ ReportParConj ++ 
         ReportGoalsAfter, "\n").
 
 :- pred format_parallel_conjunction(var_table::in, int::in, 
+    goal_path::in, int::in,
     list(seq_conj(pard_goal))::in, cord(string)::out) is det.
 
-format_parallel_conjunction(VarTable, Indent, Conjs, Report) :-
+format_parallel_conjunction(VarTable, Indent, GoalPath0, ConjNum, Conjs,
+        !:Report) :-
     IndentStr = indent(Indent),
-    Report0 = IndentStr ++ singleton("(\n"),
-    format_parallel_conjuncts(VarTable, Indent, Conjs, Report0, Report). 
+    !:Report = IndentStr ++ singleton("(\n"),
+    GoalPath = goal_path_add_at_end(GoalPath0, step_conj(ConjNum)),
+    format_parallel_conjuncts(VarTable, Indent, GoalPath, 1, Conjs, !Report). 
 
-:- pred format_parallel_conjuncts(var_table::in, int::in, 
-    list(seq_conj(pard_goal))::in, cord(string)::in, cord(string)::out) 
-    is det.
+:- pred format_parallel_conjuncts(var_table::in, int::in, goal_path::in, 
+    int::in, list(seq_conj(pard_goal))::in, 
+    cord(string)::in, cord(string)::out) is det.
 
-format_parallel_conjuncts(_VarTable, Indent, [], !Report) :-
+format_parallel_conjuncts(_VarTable, Indent, _GoalPath, _ConjNum0, [], 
+        !Report) :-
     IndentStr = indent(Indent),
     !:Report = snoc(!.Report ++ IndentStr, ")\n").
-format_parallel_conjuncts(VarTable, Indent, [ Conj | Conjs ], !Report) :-
+format_parallel_conjuncts(VarTable, Indent, GoalPath0, ConjNum0, 
+        [Conj | Conjs], !Report) :-
     Conj = seq_conj(Goals),
-    format_sequential_conjuncts(VarTable, Indent + 1, Goals, 
-        empty, ConjReport),
+    (
+        Goals = [],
+        error(this_file ++ " empty conjunct in parallel conjunction")
+    ;
+        Goals = [Goal | GoalsTail],
+        GoalPath = goal_path_add_at_end(GoalPath0, step_conj(ConjNum0)),
+        (
+            GoalsTail = [],
+            % A singleton conjunction gets printed as a single goal.
+            print_goal_to_strings(VarTable, Indent + 1, GoalPath, Goal,
+                ConjReport) 
+        ;
+            GoalsTail = [_ | _],
+            format_sequential_conjunction(VarTable, Indent + 1, GoalPath,
+                Goals, ConjReport)
+        )
+    ),
     !:Report = !.Report ++ ConjReport,
     (
         Conjs = []
@@ -3128,20 +3704,35 @@ format_parallel_conjuncts(VarTable, Inde
         Conjs = [_ | _],
         !:Report = snoc(!.Report ++ indent(Indent), "&\n")
     ),
-    format_parallel_conjuncts(VarTable, Indent, Conjs, !Report).
+    ConjNum = ConjNum0 + 1,
+    format_parallel_conjuncts(VarTable, Indent, GoalPath0, ConjNum, Conjs, 
+        !Report).
+
+:- pred format_sequential_conjunction(var_table::in, int::in, goal_path::in,
+    list(pard_goal)::in, cord(string)::out) is det.
+
+format_sequential_conjunction(VarTable, Indent, GoalPath, Goals, Report) :-
+    format_sequential_conjuncts(VarTable, Indent, GoalPath, Goals, 1, _,
+        empty, Report).
 
-:- pred format_sequential_conjuncts(var_table::in, int::in,
-    list(pard_goal)::in, cord(string)::in, cord(string)::out) is det.
+:- pred format_sequential_conjuncts(var_table::in, int::in, goal_path::in,
+    list(pard_goal)::in, int::in, int::out, 
+    cord(string)::in, cord(string)::out) is det.
 
-format_sequential_conjuncts(VarTable, Indent, Conjs, !Report) :-
-    % The determinism is an assumption here.
+format_sequential_conjuncts(_, _, _, [], !ConjNum, !Report).
+format_sequential_conjuncts(VarTable, Indent, GoalPath0, [Conj | Conjs],
+        !ConjNum, !Report) :-
+    GoalPath = goal_path_add_at_end(GoalPath0, step_conj(!.ConjNum)),
+    print_goal_to_strings(VarTable, Indent, GoalPath, Conj, ConjReport),
+    !:Report = !.Report ++ ConjReport,
+    !:ConjNum = !.ConjNum + 1,
     ( 
         Conjs = []
     ;
         Conjs = [_ | _],
-        Conj = goal_rep(conj_rep(Conjs), det_rep, pard_goal_non_atomic),
-        print_goal_to_strings(VarTable, Indent, Conj, ConjReport),
-        !:Report = !.Report ++ ConjReport
+        !:Report = !.Report ++ nl,
+        format_sequential_conjuncts(VarTable, Indent, GoalPath0, Conjs,
+            !ConjNum, !Report)
     ).
 
 :- instance goal_annotation(pard_goal_annotation) where [
Index: deep_profiler/mdprof_feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_feedback.m,v
retrieving revision 1.25
diff -u -p -b -r1.25 mdprof_feedback.m
--- deep_profiler/mdprof_feedback.m	14 Jul 2010 00:40:22 -0000	1.25
+++ deep_profiler/mdprof_feedback.m	4 Aug 2010 01:11:27 -0000
@@ -49,6 +49,7 @@
 :- import_module list.
 :- import_module map.
 :- import_module maybe.
+:- import_module parsing_utils.
 :- import_module require.
 :- import_module string.
 :- import_module svmap.
@@ -186,7 +187,8 @@ create_feedback_report(feedback_data_can
     Parameters = candidate_par_conjunctions_params(DesiredParallelism,
         SparkingCost, SparkingDelay, SignalCost, WaitCost, 
         ContextWakeupDelay, CliqueThreshold, CallSiteThreshold,
-        ParalleliseDepConjs),
+        ParalleliseDepConjs, BestParAlgorithm),
+    best_par_algorithm_string(BestParAlgorithm, BestParAlgorithmStr),
     ReportHeader = singleton(format("  Candidate Parallel Conjunctions:\n" ++
             "    Desired parallelism: %f\n" ++
             "    Sparking cost: %d\n" ++
@@ -197,15 +199,23 @@ create_feedback_report(feedback_data_can
             "    Clique threshold: %d\n" ++
             "    Call site threshold: %d\n" ++
             "    Parallelise dependant conjunctions: %s\n" ++
+            "    BestParallelisationAlgorithm: %s\n" ++
             "    Number of Parallel Conjunctions: %d\n" ++
             "    Parallel Conjunctions:\n\n",
         [f(DesiredParallelism), i(SparkingCost), i(SparkingDelay),
          i(SignalCost), i(WaitCost), i(ContextWakeupDelay), 
          i(CliqueThreshold), i(CallSiteThreshold), s(ParalleliseDepConjsStr),
-         i(NumConjs)])),
+         s(BestParAlgorithmStr), i(NumConjs)])),
     (
-        ParalleliseDepConjs = parallelise_dep_conjs,
-        ParalleliseDepConjsStr = "yes"
+        ParalleliseDepConjs = parallelise_dep_conjs_overlap,
+        ParalleliseDepConjsStr = "yes, use overlap calculation"
+    ;
+        ParalleliseDepConjs = parallelise_dep_conjs_num_vars,
+        ParalleliseDepConjsStr = 
+            "yes, the more shared variables then the less overlap there is"
+    ;
+        ParalleliseDepConjs = parallelise_dep_conjs_naive,
+        ParalleliseDepConjsStr = "yes, pretend they're independant"
     ;
         ParalleliseDepConjs = do_not_parallelise_dep_conjs,
         ParalleliseDepConjsStr = "no"
@@ -281,6 +291,26 @@ help_message =
                 Advise the compiler to parallelism dependant conjunctions.
                 This will become the default once the implementation is
                 complete. 
+    --implicit-parallelism-dependant-conjunctions-algorithm <option>
+                Choose the algorithm that is used to estimate the speedup for
+                dependant calculations.  The options are:
+                    overlap: Compute the 'overlap' between dependant
+                      conjunctions.
+                    num_vars: Use the number of shared variables as a proxy for
+                      the amount of overlap available.
+                    naive: Ignore dependencies.
+                The default is overlap.
+    --implicit-parallelism-best-parallelisation-algorithm <option>
+                Select which algorithm to use to find the best way to
+                parallelise a conjunction.  The options are:
+                    complete-bnb(N): A complete algorithm with a branch and
+                      bound search, this can be rather slow since it has an
+                      exponential time complexity.  This option allows a single
+                      parameter, N, Any conjunction with more than N conjuncts
+                      will be solved using the greedy algorithm instead.  If N
+                      is zero this check is disabled.
+                    greedy: A greedy algorithm with a linear time complexity.
+                The default is complete-bnb(10).
 
     The following options select specific types of feedback information
     and parameterise them:
@@ -374,7 +404,9 @@ read_deep_file(Input, Debug, MaybeDeep, 
     ;       implicit_parallelism_context_wakeup_delay
     ;       implicit_parallelism_clique_cost_threshold
     ;       implicit_parallelism_call_site_cost_threshold
-    ;       implicit_parallelism_dependant_conjunctions.
+    ;       implicit_parallelism_dependant_conjunctions
+    ;       implicit_parallelism_dependant_conjunctions_algorithm
+    ;       implicit_parallelism_best_parallelisation_algorithm.
 
 % TODO: Introduce an option to disable parallelisation of dependant
 % conjunctions, or switch to the simple calculations for independent
@@ -418,6 +450,10 @@ long("implicit-parallelism-call-site-cos
     implicit_parallelism_call_site_cost_threshold).
 long("implicit-parallelism-dependant-conjunctions",
     implicit_parallelism_dependant_conjunctions).
+long("implicit-parallelism-dependant-conjunctions-algorithm",
+    implicit_parallelism_dependant_conjunctions_algorithm).
+long("implicit-parallelism-best-parallelisation-algorithm",
+    implicit_parallelism_best_parallelisation_algorithm).
 
 :- pred defaults(option::out, option_data::out) is multi.
 
@@ -444,6 +480,10 @@ defaults(implicit_parallelism_context_wa
 defaults(implicit_parallelism_clique_cost_threshold,        int(100000)).
 defaults(implicit_parallelism_call_site_cost_threshold,     int(50000)).
 defaults(implicit_parallelism_dependant_conjunctions,       bool(no)).
+defaults(implicit_parallelism_dependant_conjunctions_algorithm,
+    string("overlap")).
+defaults(implicit_parallelism_best_parallelisation_algorithm,
+    string("complete-bnb(10)")).
 
 :- pred construct_measure(string::in, stat_measure::out) is semidet.
 
@@ -551,12 +591,42 @@ check_options(Options0, RequestedFeedbac
         lookup_bool_option(Options,
             implicit_parallelism_dependant_conjunctions,
             ParalleliseDepConjsBool),
+        lookup_string_option(Options,
+            implicit_parallelism_dependant_conjunctions_algorithm,
+            ParalleliseDepConjsString),
         (
-            ParalleliseDepConjsBool = yes,
-            ParalleliseDepConjs = parallelise_dep_conjs
+            parse_parallelise_dep_conjs_string(ParalleliseDepConjsBool,
+                ParalleliseDepConjsString, ParalleliseDepConjsPrime) 
+        ->
+            ParalleliseDepConjs = ParalleliseDepConjsPrime
         ;
-            ParalleliseDepConjsBool = no,
-            ParalleliseDepConjs = do_not_parallelise_dep_conjs
+            error(format(
+                "Couldn't parse '%s' into a parallelise dependant conjs "
+                    ++ "option",
+                [s(ParalleliseDepConjsString)]))
+        ),
+        lookup_string_option(Options,
+            implicit_parallelism_best_parallelisation_algorithm,
+            BestParAlgorithmStr),
+        parse_best_par_algorithm(BestParAlgorithmStr,
+            MaybeBestParAlgorithm),
+        (
+            MaybeBestParAlgorithm = ok(BestParAlgorithm)
+        ;
+            MaybeBestParAlgorithm = error(MaybeMessage, _Line, _Col),
+            (
+                MaybeMessage = yes(Message),
+                Error = format(
+                    "Couldn't parse %s as a best parallelsation algorithm:" ++
+                        " %s\n",
+                    [s(BestParAlgorithmStr), s(Message)])
+            ;
+                MaybeMessage = no,
+                Error = format(
+                    "Couldn't parse %s as a best parallelsation algorithm\n",
+                    [s(BestParAlgorithmStr)])
+            ),
+            error(Error)
         ),
         CandidateParallelConjunctionsOpts =
             candidate_par_conjunctions_params(DesiredParallelism, 
@@ -567,7 +637,8 @@ check_options(Options0, RequestedFeedbac
                 ContextWakeupDelay,
                 CPCCliqueThreshold,
                 CPCCallSiteThreshold,
-                ParalleliseDepConjs),
+                ParalleliseDepConjs,
+                BestParAlgorithm),
         MaybeCandidateParallelConjunctionsOpts =
             yes(CandidateParallelConjunctionsOpts)
     ;
@@ -578,6 +649,53 @@ check_options(Options0, RequestedFeedbac
         requested_feedback_info(MaybeCallsAboveThresholdSortedOpts,
             MaybeCandidateParallelConjunctionsOpts).
 
+:- pred parse_best_par_algorithm(string::in,
+    parse_result(best_par_algorithm)::out) is det.
+
+parse_best_par_algorithm(String, Result) :-
+    promise_equivalent_solutions [Result]
+    (
+        parse(String, best_par_algorithm_parser, Result)
+    ).
+        
+:- pred best_par_algorithm_parser(src::in, best_par_algorithm::out, 
+    ps::in, ps::out) is semidet.
+
+best_par_algorithm_parser(Src, Algorithm, !PS) :-
+    whitespace(Src, _, !PS),
+    (
+        keyword(idchars, "greedy", Src, _, !PS)
+    ->
+        Algorithm = bpa_greedy
+    ;
+        keyword(idchars, "complete-bnb", Src, _, !PS),
+        brackets("(", ")", int_literal, Src, N, !PS),
+        N >= 0,
+        Algorithm = bpa_complete_bnb(N)
+    ),
+    eof(Src, _, !PS).
+
+:- func idchars = string.
+
+idchars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_".
+
+:- pred best_par_algorithm_string(best_par_algorithm::in, string::out) is det.
+
+best_par_algorithm_string(bpa_greedy, "greedy").
+best_par_algorithm_string(bpa_complete_bnb(N), 
+    format("complete_bnb(%d)", [i(N)])).
+
+:- pred parse_parallelise_dep_conjs_string(bool::in, string::in, 
+    parallelise_dep_conjs::out) is semidet.
+
+parse_parallelise_dep_conjs_string(no, _, do_not_parallelise_dep_conjs).
+parse_parallelise_dep_conjs_string(yes, "overlap", 
+    parallelise_dep_conjs_overlap).
+parse_parallelise_dep_conjs_string(yes, "num_vars", 
+    parallelise_dep_conjs_num_vars).
+parse_parallelise_dep_conjs_string(yes, "naive", 
+    parallelise_dep_conjs_naive).
+
     % Adjust command line options when one option implies other options.
     %
 :- pred option_implies(option::in, option::in, bool::in,
Index: deep_profiler/measurements.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/measurements.m,v
retrieving revision 1.17
diff -u -p -b -r1.17 measurements.m
--- deep_profiler/measurements.m	9 Jan 2010 05:49:41 -0000	1.17
+++ deep_profiler/measurements.m	3 Aug 2010 04:39:41 -0000
@@ -19,6 +19,8 @@
 
 :- import_module list.
 
+:- import_module mdbcomp.
+:- import_module mdbcomp.feedback.
 :- import_module measurement_units.
 
 %-----------------------------------------------------------------------------%
@@ -176,12 +178,59 @@
     is semidet.
 
 %-----------------------------------------------------------------------------%
+
+    % Represent the metrics of part of a parallel execution.
+    %
+:- type parallel_exec_metrics_incomplete.
+    
+    % ParMetrics = init_parallel_exec_metrics_incomplete(PartMetricsA,
+    %   TimeSignal, TimeBSeq, TimeBPar) 
+    %
+    % Use this function to build parallel execution metrics for a parallel
+    % conjunction of any size greater than one.
+    %
+    % Although the parallel conjunction operator is operationally
+    % right-associative, parallel overlap due in dependant parallel
+    % conjunctions is easier to model if we consider it to be left associative.
+    % That is the conjunction ( A & B & C ) should be modeled as two conjuncts
+    % (A & B) (which is a conjunction itself and C.  This is because how C
+    % waits for variables in either A or B may depend on How B waits for
+    % variables in A.
+    %
+:- func init_parallel_exec_metrics_incomplete(parallel_exec_metrics_incomplete,
+    float, float, float) = parallel_exec_metrics_incomplete.
+
+    % StartMetrics = init_empty_parallel_exec_metrics(CostBefore, NumCalls,
+    %   SparkCost, SparkDelay, ContextWakeupDelay).
+    %
+    % Use this function to start with an empty set of metrics for an empty
+    % conjunction.  Then use init_parallel_exec_metrics_incomplete to continue
+    % adding conjuncts on the right.
+    %
+:- func init_empty_parallel_exec_metrics(float, int, float, float, float) = 
+    parallel_exec_metrics_incomplete.
+
+    % Metrics = finalise_parallel_exec_metrics(IncompleteMetrics,
+    %   CostAfterConj).
+    %
+    % Make the metrics structure complete.
+    %
+    % RightConjDelay is the delay before the conjunct to the right of & will
+    % begin executing.  & is considered to be right-associative since that's
+    % how sparks are sparked.
+    %
+:- func finalise_parallel_exec_metrics(parallel_exec_metrics_incomplete, float)
+    = parallel_exec_metrics.
+
+%-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
 :- implementation.
 
+:- import_module bool.
 :- import_module float.
 :- import_module int.
+:- import_module maybe.
 :- import_module require.
 :- import_module string.
 
@@ -665,6 +714,178 @@ exceeded_desired_parallelism(DesiredPara
 
 %----------------------------------------------------------------------------%
 
+:- type parallel_exec_metrics_incomplete
+    --->    pem_incomplete(
+                pemi_time_before_conj       :: float,
+
+                pemi_num_calls              :: int,
+
+                pemi_spark_cost             :: float,
+
+                pemi_spark_delay            :: float,
+
+                pemi_context_wakeup_delay   :: float,
+
+                pemi_internal               ::
+                        maybe(parallel_exec_metrics_internal)
+                    % If there are no internal conjuncts then the parallel
+                    % conjunction is empty.
+            ).
+
+:- type parallel_exec_metrics_internal
+    --->    pem_left_most(
+                pemi_time_seq               :: float,
+                pemi_time_par               :: float
+            )
+    ;       pem_additional(
+                pemi_time_left              :: parallel_exec_metrics_internal,
+                    % The time of the left conjunct (that may be a conjunction),
+
+                pemi_time_left_signals      :: float,
+                    % The additional cost of calling signal within the left
+                    % conjunct.
+                    % NOTE: Note that this should be added to each of the
+                    % individual conjuncts _where_ they call signal but thta is
+                    % more difficult and may not be required.  We may visit it
+                    % in the future.
+
+                pemi_time_right_seq         :: float,
+                    % The time of the right conjunct if it is running after
+                    % the left in normal sequential execution.
+
+                pemi_time_right_par         :: float
+                    % The time of the right conjunct if it is running in
+                    % parallel with the left conjunct.  It may have to stop and
+                    % wait for variables to be produced; therefore this time is
+                    % different to time_right_seq.  This time also includes
+                    % parallel execution overheads and delays.
+            ).
+
+init_parallel_exec_metrics_incomplete(Metrics0, TimeSignals, TimeBSeq, 
+        TimeBPar) = Metrics :-
+    MaybeInternal0 = Metrics0 ^ pemi_internal,
+    (
+        MaybeInternal0 = yes(Internal0),
+        Internal = pem_additional(Internal0, TimeSignals, TimeBSeq, TimeBPar)
+    ;
+        MaybeInternal0 = no,
+        Internal = pem_left_most(TimeBSeq, TimeBPar),
+        require(unify(TimeSignals, 0.0),
+            this_file ++ "TimeSignal != 0")
+    ),
+    Metrics = Metrics0 ^ pemi_internal := yes(Internal).
+
+init_empty_parallel_exec_metrics(TimeBefore, NumCalls, SparkCost, 
+        SparkDelay, ContextWakeupDelay) = 
+    pem_incomplete(TimeBefore, NumCalls, SparkCost, SparkDelay,
+        ContextWakeupDelay, no).
+
+finalise_parallel_exec_metrics(IncompleteMetrics, TimeAfter) = Metrics :-
+    IncompleteMetrics = pem_incomplete(TimeBefore, NumCalls, SparkCost,
+        SparkDelay, ContextWakeupDelay, MaybeInternal),
+    (
+        MaybeInternal = yes(Internal)
+    ;
+        MaybeInternal = no,
+        error(this_file ++ "Cannot finalise empty parallel metrics.")
+    ),
+    BeforeAndAfterTime = TimeBefore + TimeAfter,    
+
+    % Calculate par time.
+    InnerParTime = parallel_exec_metrics_internal_get_par_time(Internal),
+    FirstConjParTime = pem_get_first_conj_par_time(Internal),
+    ( FirstConjDeadTime > 0.0 ->
+        FirstConjWakeupPenalty = ContextWakeupDelay
+    ;
+        FirstConjWakeupPenalty = 0.0
+    ),
+    ParTime = InnerParTime + BeforeAndAfterTime + FirstConjWakeupPenalty,
+
+    % Calculate the sequential execution time.
+    InnerSeqTime = parallel_exec_metrics_internal_get_seq_time(Internal),
+    SeqTime = InnerSeqTime + BeforeAndAfterTime,
+    
+    % Calculate the amount of time that the first conjunct is blocked for.
+    FirstConjDeadTime = InnerParTime - FirstConjParTime,
+
+    % Calculate the amount of time that the conjunction spends blocking on
+    % futures.
+    FutureDeadTime = pem_get_future_dead_time(Internal, yes, SparkCost,
+        SparkDelay),
+
+    % Calculate the overheads of parallelisation.
+    ParOverheads = pem_get_par_overheads(Internal),
+
+    Metrics = parallel_exec_metrics(NumCalls, SeqTime, ParTime, ParOverheads,
+        FirstConjDeadTime, FutureDeadTime).
+
+    % The expected parallel execution time.
+    %
+:- func parallel_exec_metrics_internal_get_par_time(
+    parallel_exec_metrics_internal) = float.
+
+parallel_exec_metrics_internal_get_par_time(pem_left_most(_, Time)) = Time.
+parallel_exec_metrics_internal_get_par_time(pem_additional(MetricsLeft,
+        TimeLeftSignal, _, TimeRight)) = Time :-
+    TimeLeft = parallel_exec_metrics_internal_get_par_time(MetricsLeft) +
+        TimeLeftSignal,
+    Time = max(TimeLeft, TimeRight).
+
+    % The expected sequential execution time.
+    %
+:- func parallel_exec_metrics_internal_get_seq_time(
+    parallel_exec_metrics_internal) = float.
+
+parallel_exec_metrics_internal_get_seq_time(pem_left_most(Time, _)) = Time.
+parallel_exec_metrics_internal_get_seq_time(pem_additional(MetricsLeft, _,
+        TimeRight, _)) = Time :-
+    TimeLeft = parallel_exec_metrics_internal_get_seq_time(MetricsLeft),
+    Time = TimeLeft + TimeRight.
+
+    % Get the parallel execution time of the first conjunct.  This is used for
+    % calculating the first conjunct's dead time (above).
+    %
+:- func pem_get_first_conj_par_time(parallel_exec_metrics_internal) = float.
+
+pem_get_first_conj_par_time(pem_left_most(_, Time)) = Time.
+pem_get_first_conj_par_time(pem_additional(Left, LeftSignalTime0, _, _)) = 
+        Time :-
+    (
+        Left = pem_left_most(_, _),
+        LeftSignalTime = LeftSignalTime0
+    ;
+        Left = pem_additional(_, _, _, _),
+        LeftSignalTime = 0.0
+    ),
+    Time = pem_get_first_conj_par_time(Left) + LeftSignalTime.
+
+:- func pem_get_future_dead_time(parallel_exec_metrics_internal, bool,
+    float, float) = float.
+
+    % XXX: We should make this an attribute of pem_additional.
+pem_get_future_dead_time(pem_left_most(_, _), _, _, _) = 0.0.
+pem_get_future_dead_time(pem_additional(Left, _, Seq, Par), 
+        IsRightmostConj, ForkCost, ForkDelay) = DeadTime :-
+    DeadTime = ThisDeadTime + LeftDeadTime,
+    ThisDeadTime0 = Par - Seq - ForkDelay,
+    (
+        IsRightmostConj = yes,
+        ThisDeadTime = ThisDeadTime0
+    ;
+        IsRightmostConj = no,
+        ThisDeadTime = ThisDeadTime0 - ForkCost
+    ),
+    LeftDeadTime = pem_get_future_dead_time(Left, no, ForkCost, ForkDelay).
+
+:- func pem_get_par_overheads(parallel_exec_metrics_internal) = float.
+
+pem_get_par_overheads(pem_left_most(Seq, Par))= Par - Seq.
+pem_get_par_overheads(pem_additional(Left, Signals, Seq, Par)) = Overheads :-
+    Overheads = LeftOverheads + Signals + Par - Seq,
+    pem_get_par_overheads(Left) = LeftOverheads.
+
+%----------------------------------------------------------------------------%
+
 :- func this_file = string.
 
 this_file = "measurements.m".
Index: deep_profiler/message.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/message.m,v
retrieving revision 1.5
diff -u -p -b -r1.5 message.m
--- deep_profiler/message.m	30 Apr 2010 13:09:54 -0000	1.5
+++ deep_profiler/message.m	3 Aug 2010 04:39:41 -0000
@@ -138,7 +138,9 @@
 
                 % An error in the generation of a coverage_procrep report.
                 %
-    ;       error_coverage_procrep_error(string).
+    ;       error_coverage_procrep_error(string)
+    
+    ;       error_exception_thrown(string).
 
 %-----------------------------------------------------------------------------%
 
@@ -151,6 +153,9 @@
     %
 :- func nl_indent(int) = cord(string).
 
+    % A newline symbol.
+:- func nl = cord(string).
+
     % The size of an indentation level.  2 x the input.
     %
 :- func indent_size(int) = int.
@@ -244,6 +249,7 @@ message_type_to_level(error_extra_proc_d
     message_error.
 message_type_to_level(error_coverage_procrep_error(_)) =
     message_error.
+message_type_to_level(error_exception_thrown(_)) = message_error.
 
 %-----------------------------------------------------------------------------%
 
@@ -295,9 +301,14 @@ message_type_to_string(MessageType) = Co
         String = "extra proc dynamnics for a clique proc are not currenty"
             ++ " handled."
     ;
+        (
         MessageType = error_coverage_procrep_error(ErrorStr),
-        string.format("Error generating coverage procedure report: %s",
-            [s(ErrorStr)], String)
+            Template = "Error generating coverage procedure report: %s"
+        ;
+            MessageType = error_exception_thrown(ErrorStr),
+            Template = "Exception thrown: %s"
+        ),
+        string.format(Template, [s(ErrorStr)], String)
     ),
     Cord = singleton(String).
 
@@ -312,7 +323,9 @@ indent(N) = Indent :-
         Indent = snoc(indent(N - 1), "  ")
     ).
 
-nl_indent(N) = singleton("\n") ++ indent(N).
+nl_indent(N) = nl ++ indent(N).
+
+nl = singleton("\n").
 
 indent_size(N) = 2 * N.
 
Index: deep_profiler/program_representation_utils.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/program_representation_utils.m,v
retrieving revision 1.24
diff -u -p -b -r1.24 program_representation_utils.m
--- deep_profiler/program_representation_utils.m	4 Jul 2010 10:24:09 -0000	1.24
+++ deep_profiler/program_representation_utils.m	3 Aug 2010 04:39:41 -0000
@@ -50,8 +50,10 @@
     %
     % Print a goal (recursively) to a string representation.
     %
-:- pred print_goal_to_strings(var_table::in, int::in, goal_rep(GoalAnn)::in,
-    cord(string)::out) is det <= goal_annotation(GoalAnn).
+:- pred print_goal_to_strings(var_table::in, int::in, goal_path::in, 
+        goal_rep(GoalAnn)::in, cord(string)::out) is det 
+    <= goal_annotation(GoalAnn).
+
 
 %----------------------------------------------------------------------------%
 
@@ -206,7 +208,7 @@ print_proc_to_strings(ProcRep, Strings) 
     ProcLabelString = DetismString ++ cord.singleton(" ") ++ 
         cord.singleton(ProcLabelString0),
     print_args_to_strings(print_head_var, VarTable, ArgVarReps, ArgsString),
-    print_goal_to_strings(VarTable, 1, GoalRep, GoalString),
+    print_goal_to_strings(VarTable, 1, empty_goal_path, GoalRep, GoalString),
     Strings = ProcLabelString ++ ArgsString ++ cord.singleton(" :-\n") ++
         GoalString ++ nl.
 
@@ -232,35 +234,16 @@ print_proc_label_to_string(ProcLabel, St
 
 %-----------------------------------------------------------------------------%
 
-print_goal_to_strings(VarTable, Indent, GoalRep, Strings) :-
+print_goal_to_strings(VarTable, Indent, GoalPath, GoalRep, Strings) :-
     GoalRep = goal_rep(GoalExprRep, DetismRep, GoalAnnotation),
-    detism_to_string(DetismRep, DetismString0),
-    print_goal_annotation_to_strings(GoalAnnotation, GoalAnnotationString0),
-   
-    % Indicate which detisms and annotations apply to whole conjunctions.
-    ( GoalExprRep = conj_rep(_) ->
-        AnnotationPrefix = cord.singleton("% conj: ")
-    ;
-        AnnotationPrefix = cord.singleton("% ")
-    ),
-    GoalAnnotationString1 = AnnotationPrefix ++ GoalAnnotationString0,
-    DetismString = AnnotationPrefix ++ DetismString0,
-
-    % Don't print empty annotations, including their newline.
-    ( not is_empty(GoalAnnotationString0) ->
-        GoalAnnotationString = nl_indent(Indent) ++ GoalAnnotationString1
-    ;
-        GoalAnnotationString = cord.empty
-    ),
-    Strings = indent(Indent) ++ DetismString ++ 
-        GoalAnnotationString ++ 
-        nl ++ ExprString,
     (
         GoalExprRep = conj_rep(ConjGoalReps),
-        print_conj_to_strings(VarTable, Indent, ConjGoalReps, ExprString)
+        print_conj_to_strings(VarTable, Indent, GoalPath, ConjGoalReps, 
+            ExprString)
     ;
         GoalExprRep = disj_rep(DisjGoalReps),
-        print_disj_to_strings(VarTable, Indent, DisjGoalReps, no, DisjString),
+        print_disj_to_strings(VarTable, Indent, GoalPath, 1, DisjGoalReps, 
+            no, DisjString),
         ExprString = indent(Indent) ++ 
             cord.singleton("(\n") ++ DisjString ++ indent(Indent) ++
             cord.singleton(")\n")
@@ -269,15 +252,22 @@ print_goal_to_strings(VarTable, Indent, 
         lookup_var_name(VarTable, SwitchVarRep, SwitchVarName),
         string.format("%s switch on %s\n",
             [s(string(CanFail)), s(SwitchVarName)], SwitchOnString),
-        print_switch_to_strings(VarTable, Indent, CasesRep, no, SwitchString),
+        print_switch_to_strings(VarTable, Indent, GoalPath, 1, CasesRep, 
+            no, SwitchString),
         ExprString = indent(Indent) ++ cord.singleton(SwitchOnString) ++ 
             indent(Indent) ++ cord.singleton("(\n") ++ SwitchString ++ 
             indent(Indent) ++ cord.singleton(")\n")
     ;
         GoalExprRep = ite_rep(CondRep, ThenRep, ElseRep),
-        print_goal_to_strings(VarTable, Indent + 1, CondRep, CondString),
-        print_goal_to_strings(VarTable, Indent + 1, ThenRep, ThenString),
-        print_goal_to_strings(VarTable, Indent + 1, ElseRep, ElseString),
+        GoalPathCond = goal_path_add_at_end(GoalPath, step_ite_cond),
+        GoalPathThen = goal_path_add_at_end(GoalPath, step_ite_then),
+        GoalPathElse = goal_path_add_at_end(GoalPath, step_ite_else),
+        print_goal_to_strings(VarTable, Indent + 1, GoalPathCond, CondRep, 
+            CondString),
+        print_goal_to_strings(VarTable, Indent + 1, GoalPathThen, ThenRep, 
+            ThenString),
+        print_goal_to_strings(VarTable, Indent + 1, GoalPathElse, ElseRep, 
+            ElseString),
         IndentString = indent(Indent),
         ExprString = IndentString ++ cord.singleton("(\n") ++ CondString ++ 
             IndentString ++ cord.singleton("->\n") ++ ThenString ++ 
@@ -285,7 +275,9 @@ print_goal_to_strings(VarTable, Indent, 
             IndentString ++ cord.singleton(")\n")
     ;
         GoalExprRep = negation_rep(SubGoalRep),
-        print_goal_to_strings(VarTable, Indent + 1, SubGoalRep, SubGoalString),
+        SubGoalPath = goal_path_add_at_end(GoalPath, step_neg),
+        print_goal_to_strings(VarTable, Indent + 1, SubGoalPath, SubGoalRep,
+            SubGoalString),
         ExprString = indent(Indent) ++ cord.singleton("not (\n") ++ 
             SubGoalString ++ indent(Indent) ++ cord.singleton(")\n")
     ;
@@ -297,7 +289,9 @@ print_goal_to_strings(VarTable, Indent, 
             MaybeCut = scope_is_no_cut,
             CutString = cord.empty
         ),
-        print_goal_to_strings(VarTable, Indent + 1, SubGoalRep, SubGoalString),
+        SubGoalPath = goal_path_add_at_end(GoalPath, step_scope(MaybeCut)),
+        print_goal_to_strings(VarTable, Indent + 1, SubGoalPath, SubGoalRep,
+            SubGoalString),
         ExprString = indent(Indent) ++ cord.singleton("scope") ++ CutString ++ 
             cord.singleton(" (\n") ++
             SubGoalString ++ indent(Indent) ++ cord.singleton(")\n")
@@ -306,43 +300,86 @@ print_goal_to_strings(VarTable, Indent, 
             _BoundVars, AtomicGoalRep),
         print_atomic_goal_to_strings(VarTable, AtomicGoalRep, ExprString0),
         ExprString = indent(Indent) ++ ExprString0
+    ),
+    
+    detism_to_string(DetismRep, DetismString0),
+    print_goal_annotation_to_strings(GoalAnnotation, GoalAnnotationString0),
+   
+    AnnotationPrefix = cord.singleton("% "),
+    GoalPathString0 = goal_path_to_string(GoalPath),
+    ( GoalPathString0 = "" ->
+        GoalPathString1 = "root goal"
+    ;
+        GoalPathString1 = GoalPathString0
+    ),
+    GoalPathString = AnnotationPrefix ++ cord.singleton(GoalPathString1),
+    DetismString = AnnotationPrefix ++ DetismString0,
+
+    % Don't print empty annotations, including their newline.
+    ( not is_empty(GoalAnnotationString0) ->
+        ( GoalExprRep = conj_rep(_) ->
+            % If this annotation belongs to a conjunction make sure that this
+            % is clear.
+            GoalAnnotationString = indent(Indent) ++ AnnotationPrefix 
+                ++ cord.singleton("conjunction: ") ++ GoalAnnotationString0 
+                ++ nl
+        ;
+            GoalAnnotationString = indent(Indent) ++ AnnotationPrefix 
+                ++ GoalAnnotationString0 ++ nl
+        )
+    ;
+        GoalAnnotationString = cord.empty
+    ),
+    ( GoalExprRep = conj_rep(_) ->
+        % Don't print determinism information or the goal path for conjunctions.
+        Strings = GoalAnnotationString
+            ++ ExprString
+    ; 
+        Strings = indent(Indent) ++ GoalPathString ++ nl
+            ++ indent(Indent) ++ DetismString ++ nl 
+            ++ GoalAnnotationString
+            ++ ExprString
     ).
 
 :- pred print_conj_to_strings(var_table::in, int::in,
-    list(goal_rep(GoalAnn))::in, cord(string)::out) is det
+    goal_path::in, list(goal_rep(GoalAnn))::in, cord(string)::out) is det
     <= goal_annotation(GoalAnn).
 
-print_conj_to_strings(VarTable, Indent, GoalReps, Strings) :-
+print_conj_to_strings(VarTable, Indent, GoalPath, GoalReps, Strings) :-
     (
         GoalReps = [],
         Strings = cord.snoc(indent(Indent), "true\n")
     ;
         GoalReps = [_ | _],
-        print_conj_2_to_strings(VarTable, Indent, GoalReps, Strings)
+        print_conj_2_to_strings(VarTable, Indent, GoalPath, 1, GoalReps, 
+            Strings)
     ).
 
 :- pred print_conj_2_to_strings(var_table::in, int::in,
-    list(goal_rep(GoalAnn))::in, cord(string)::out) is det
-    <= goal_annotation(GoalAnn).
+    goal_path::in, int::in, list(goal_rep(GoalAnn))::in, cord(string)::out)
+    is det <= goal_annotation(GoalAnn).
 
-print_conj_2_to_strings(_, _Indent, [], cord.empty).
-print_conj_2_to_strings(VarTable, Indent, [GoalRep | GoalReps], Strings) :-
+print_conj_2_to_strings(_, _Indent, _, _, [], cord.empty).
+print_conj_2_to_strings(VarTable, Indent, GoalPath0, ConjNum, 
+        [GoalRep | GoalReps], Strings) :-
     % We use the absence of a separator to denote conjunction.
     %
     % We could try to append the comma at the end of each goal that is
     % not last in a conjunction, but that would be significant work,
     % and (at least for now) there is no real need for it.
-    print_goal_to_strings(VarTable, Indent, GoalRep, GoalString),
-    print_conj_2_to_strings(VarTable, Indent, GoalReps, ConjString),
+    GoalPath = goal_path_add_at_end(GoalPath0, step_conj(ConjNum)),
+    print_goal_to_strings(VarTable, Indent, GoalPath, GoalRep, GoalString),
+    print_conj_2_to_strings(VarTable, Indent, GoalPath0, ConjNum+1, 
+        GoalReps, ConjString),
     Strings = GoalString ++ ConjString.
 
-:- pred print_disj_to_strings(var_table::in, int::in,
-    list(goal_rep(GoalAnn))::in, bool::in, cord(string)::out) is det
-    <= goal_annotation(GoalAnn).
-
-print_disj_to_strings(_, _Indent, [], _PrintSemi, cord.empty).
-print_disj_to_strings(VarTable, Indent, [GoalRep | GoalReps], PrintSemi,
-        Strings) :-
+:- pred print_disj_to_strings(var_table::in, int::in, goal_path::in, 
+    int::in, list(goal_rep(GoalAnn))::in, bool::in, cord(string)::out) 
+    is det <= goal_annotation(GoalAnn).
+
+print_disj_to_strings(_, _Indent, _, _, [], _PrintSemi, cord.empty).
+print_disj_to_strings(VarTable, Indent, GoalPath0, DisjNum,
+        [GoalRep | GoalReps], PrintSemi, Strings) :-
     (
         PrintSemi = no,
         DelimString = cord.empty
@@ -350,17 +387,19 @@ print_disj_to_strings(VarTable, Indent, 
         PrintSemi = yes,
         DelimString = indent(Indent) ++ cord.singleton(";\n")
     ),
-    print_goal_to_strings(VarTable, Indent + 1, GoalRep, GoalString),
-    print_disj_to_strings(VarTable, Indent, GoalReps, yes, DisjString),
+    GoalPath = goal_path_add_at_end(GoalPath0, step_disj(DisjNum)),
+    print_goal_to_strings(VarTable, Indent + 1, GoalPath, GoalRep, GoalString),
+    print_disj_to_strings(VarTable, Indent, GoalPath0, DisjNum+1, GoalReps,
+        yes, DisjString),
     Strings = DelimString ++ GoalString ++ DisjString.
 
-:- pred print_switch_to_strings(var_table::in, int::in,
-    list(case_rep(GoalAnn))::in, bool::in, cord(string)::out) is det
+:- pred print_switch_to_strings(var_table::in, int::in, goal_path::in, 
+    int::in, list(case_rep(GoalAnn))::in, bool::in, cord(string)::out) is det
     <= goal_annotation(GoalAnn).
 
-print_switch_to_strings(_, _Indent, [], _PrintSemi, cord.empty).
-print_switch_to_strings(VarTable, Indent, [CaseRep | CaseReps], PrintSemi,
-        Strings) :-
+print_switch_to_strings(_, _Indent, _, _, [], _PrintSemi, cord.empty).
+print_switch_to_strings(VarTable, Indent, GoalPath0, CaseNum, 
+        [CaseRep | CaseReps], PrintSemi, Strings) :-
     (
         PrintSemi = no,
         DelimString = cord.empty
@@ -373,8 +412,10 @@ print_switch_to_strings(VarTable, Indent
         ConsIdArityString),
     list.map(print_cons_id_and_arity_to_strings(Indent + 1),
         OtherConsIdArityRep, OtherConsIdArityStrings),
-    print_goal_to_strings(VarTable, Indent + 1, GoalRep, GoalString),
-    print_switch_to_strings(VarTable, Indent, CaseReps, yes, CaseStrings),
+    GoalPath = goal_path_add_at_end(GoalPath0, step_switch(CaseNum, no)),
+    print_goal_to_strings(VarTable, Indent + 1, GoalPath, GoalRep, GoalString),
+    print_switch_to_strings(VarTable, Indent, GoalPath0, CaseNum+1, CaseReps,
+        yes, CaseStrings),
     Strings = DelimString ++ ConsIdArityString ++
         cord_list_to_cord(OtherConsIdArityStrings) ++ GoalString ++
         CaseStrings.
Index: deep_profiler/read_profile.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/read_profile.m,v
retrieving revision 1.27
diff -u -p -b -r1.27 read_profile.m
--- deep_profiler/read_profile.m	8 Sep 2009 02:37:15 -0000	1.27
+++ deep_profiler/read_profile.m	4 Aug 2010 01:13:49 -0000
@@ -225,7 +225,45 @@ init_deep(ProgName, MaxCSD, MaxCSS, MaxP
 :- pred read_nodes(initial_deep::in, maybe_error(initial_deep)::out,
     io::di, io::uo) is det.
 
-read_nodes(!.InitDeep, MaybeInitDeep, !IO) :-
+read_nodes(InitDeep0, MaybeInitDeep, !IO) :-
+    % Wrap the real function inside another loop.  This strategy ensures that
+    % this code works in grades that lack tail recursion such as debugging
+    % grades.  read_nodes_2 will return after it has exceeded a depth limit,
+    % unwinding it's stack.  The outer loop will continue as long as
+    % read_nodes_2 thinks that more work remains.
+    % The depth of 50,000 has been chosen as it is roughly less than half the
+    % stack depth that causes crashes during debugging.
+    read_nodes_2(50000, InitDeep0, MaybeInitDeep0, !IO),
+    (
+        MaybeInitDeep0 = init_deep_complete(InitDeep),
+        MaybeInitDeep = ok(InitDeep)
+    ;
+        MaybeInitDeep0 = error(Error),
+        MaybeInitDeep = error(Error)
+    ;
+        MaybeInitDeep0 = init_deep_incomplete(InitDeep1),
+        read_nodes(InitDeep1, MaybeInitDeep, !IO)
+    ).
+
+:- type maybe_init_deep_complete
+    --->    init_deep_complete(initial_deep)
+    ;       init_deep_incomplete(initial_deep)
+    ;       error(string).
+
+:- pred read_nodes_2(int::in, initial_deep::in, maybe_init_deep_complete::out,
+    io::di, io::uo) is det.
+
+read_nodes_2(Depth, !.InitDeep, MaybeInitDeep, !IO) :-
+    ( Depth < 1 ->
+        MaybeInitDeep = init_deep_incomplete(!.InitDeep)
+    ;
+        read_nodes_3(Depth - 1, !.InitDeep, MaybeInitDeep, !IO)
+    ).
+
+:- pred read_nodes_3(int::in, initial_deep::in, maybe_init_deep_complete::out,
+    io::di, io::uo) is det.
+
+read_nodes_3(Depth, !.InitDeep, MaybeInitDeep, !IO) :-
     read_byte(MaybeByte, !IO),
     (
         MaybeByte = ok(Byte),
@@ -238,7 +276,7 @@ read_nodes(!.InitDeep, MaybeInitDeep, !I
                     CSDs0 = !.InitDeep ^ init_call_site_dynamics,
                     deep_insert(CSDs0, CSDI, CallSiteDynamic, CSDs),
                     !InitDeep ^ init_call_site_dynamics := CSDs,
-                    read_nodes(!.InitDeep, MaybeInitDeep, !IO)
+                    read_nodes_2(Depth, !.InitDeep, MaybeInitDeep, !IO)
                 ;
                     MaybeCSD = error2(Error),
                     MaybeInitDeep = error(Error)
@@ -251,7 +289,7 @@ read_nodes(!.InitDeep, MaybeInitDeep, !I
                     PDs0 = !.InitDeep ^ init_proc_dynamics,
                     deep_insert(PDs0, PDI, ProcDynamic, PDs),
                     !InitDeep ^ init_proc_dynamics := PDs,
-                    read_nodes(!.InitDeep, MaybeInitDeep, !IO)
+                    read_nodes_2(Depth, !.InitDeep, MaybeInitDeep, !IO)
                 ;
                     MaybePD = error2(Error),
                     MaybeInitDeep = error(Error)
@@ -264,7 +302,7 @@ read_nodes(!.InitDeep, MaybeInitDeep, !I
                     CSSs0 = !.InitDeep ^ init_call_site_statics,
                     deep_insert(CSSs0, CSSI, CallSiteStatic, CSSs),
                     !InitDeep ^ init_call_site_statics := CSSs,
-                    read_nodes(!.InitDeep, MaybeInitDeep, !IO)
+                    read_nodes_2(Depth, !.InitDeep, MaybeInitDeep, !IO)
                 ;
                     MaybeCSS = error(Error),
                     MaybeInitDeep = error(Error)
@@ -277,14 +315,14 @@ read_nodes(!.InitDeep, MaybeInitDeep, !I
                     PSs0 = !.InitDeep ^ init_proc_statics,
                     deep_insert(PSs0, PSI, ProcStatic, PSs),
                     !InitDeep ^ init_proc_statics := PSs,
-                    read_nodes(!.InitDeep, MaybeInitDeep, !IO)
+                    read_nodes_2(Depth, !.InitDeep, MaybeInitDeep, !IO)
                 ;
                     MaybePS = error2(Error),
                     MaybeInitDeep = error(Error)
                 )
             ;
                 NextItem = deep_item_end,
-                MaybeInitDeep = ok(!.InitDeep)
+                MaybeInitDeep = init_deep_complete(!.InitDeep)
             )
         ;
             string.format("unexpected token %d", [i(Byte)], Msg),
@@ -292,7 +330,8 @@ read_nodes(!.InitDeep, MaybeInitDeep, !I
         )
     ;
         MaybeByte = eof,
-        MaybeInitDeep = ok(!.InitDeep)
+        % XXX: Shouldn't this be an error since there's a deep_item_end token?
+        MaybeInitDeep = init_deep_complete(!.InitDeep)
     ;
         MaybeByte = error(Error),
         io.error_message(Error, Msg),
Index: mdbcomp/feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/mdbcomp/feedback.m,v
retrieving revision 1.13
diff -u -p -b -r1.13 feedback.m
--- mdbcomp/feedback.m	14 Jul 2010 00:40:22 -0000	1.13
+++ mdbcomp/feedback.m	4 Aug 2010 01:14:49 -0000
@@ -26,6 +26,7 @@
 :- import_module int.
 :- import_module io.
 :- import_module list.
+:- import_module set.
 :- import_module string.
 
 %-----------------------------------------------------------------------------%
@@ -125,15 +126,44 @@
                     % The cost threshold in call sequence counts of a call site
                     % before it is considered for parallel execution.
 
-                cpcp_parallelise_dep_conjs  :: parallelise_dep_conjs
+                cpcp_parallelise_dep_conjs  :: parallelise_dep_conjs,
                     % Whether we will allow parallelisation to result in
                     % dependant parallel conjunctions.
+
+                cpcp_best_par_alg           :: best_par_algorithm
             ).
 
 :- type parallelise_dep_conjs
-    --->    parallelise_dep_conjs
+    --->    parallelise_dep_conjs_overlap
+                % Use the overlap calculation for dependant parallelism.
+
+    ;       parallelise_dep_conjs_num_vars
+                % Use the num vars approximation for how much conjuncts
+                % overlap.
+
+    ;       parallelise_dep_conjs_naive
+                % Be naive to dependant parallelism, pretend its independant.
+
     ;       do_not_parallelise_dep_conjs.
 
+    % This type is used to select which algorithm is used to find the most
+    % profitable parallelisation of a particular conjunction.
+    %
+    % TODO: The type name could be improved to make it distinct from the
+    % algorithm use use to search through the clique graph.
+    %
+:- type best_par_algorithm
+    --->    bpa_complete_bnb(
+                % Use a complete but exponential algorithm.
+                
+                int
+                    % If nonzero a conjunct with more than this many conjuncts
+                    % will be solved with the greedy algorithm instead of this
+                    % slower one.  (10 is the recommended value).
+            )
+    ;       bpa_greedy.
+                % Use a greedy and linear algorithm.
+
     % The set of candidate parallel conjunctions within a procedure.
     %
 :- type candidate_par_conjunctions_proc(GoalType)
@@ -169,6 +199,11 @@
                     % conjunction.  Partitions are separated by non atomic
                     % goals, the first partition has the number 1.
 
+                cpc_first_conj_num      :: int,
+                    % The first conjunct number in the partition.  This is only
+                    % used for pretty-printing these reports with meaningful
+                    % goal paths.
+
                 cpc_is_dependant        :: conjuncts_are_dependant,
 
                 cpc_goals_before        :: list(GoalType),
@@ -239,7 +274,7 @@
                 % compiler about _how_ to parallelise calls around it.
 
 :- type conjuncts_are_dependant
-    --->    conjuncts_are_dependant
+    --->    conjuncts_are_dependant(set(var_rep))
     ;       conjuncts_are_independent.
 
 :- pred convert_candidate_par_conjunctions_proc(pred(A, B),
@@ -258,85 +293,45 @@
 
     % Represent the metrics of a parallel execution.
     %
-:- type parallel_exec_metrics.
-
-    % Represent the metrics of part of a parallel execution.
-    %
-:- type parallel_exec_metrics_incomplete.
-    
-    % ParMetrics = init_parallel_exec_metrics_incomplete(PartMetricsA,
-    %   TimeSignal, TimeBSeq, TimeBPar) 
-    %
-    % Use this function to build parallel execution metrics for a parallel
-    % conjunction of any size greater than one.
-    %
-    % Although the parallel conjunction operator is operationally
-    % right-associative, parallel overlap due in dependant parallel
-    % conjunctions is easier to model if we consider it to be left associative.
-    % That is the conjunction ( A & B & C ) should be modeled as two conjuncts
-    % (A & B) (which is a conjunction itself and C.  This is because how C
-    % waits for variables in either A or B may depend on How B waits for
-    % variables in A.
-    %
-:- func init_parallel_exec_metrics_incomplete(parallel_exec_metrics_incomplete,
-    float, float, float) = parallel_exec_metrics_incomplete.
-
-    % StartMetrics = init_empty_parallel_exec_metrics(CostBefore, NumCalls,
-    %   SparkCost, SparkDelay, ContextWakeupDelay).
-    %
-    % Use this function to start with an empty set of metrics for an empty
-    % conjunction.  Then use init_parallel_exec_metrics_incomplete to continue
-    % adding conjuncts on the right.
-    %
-:- func init_empty_parallel_exec_metrics(float, int, float, float, float) = 
-    parallel_exec_metrics_incomplete.
-
-    % Metrics = finalise_parallel_exec_metrics(IncompleteMetrics,
-    %   CostAfterConj).
-    %
-    % Make the metrics structure complete.
-    %
-    % RightConjDelay is the delay before the conjunct to the right of & will
-    % begin executing.  & is considered to be right-associative since that's
-    % how sparks are sparked.
-    %
-:- func finalise_parallel_exec_metrics(parallel_exec_metrics_incomplete, float)
-    = parallel_exec_metrics.
-
-:- func parallel_exec_metrics_get_num_calls(parallel_exec_metrics) = int.
+:- type parallel_exec_metrics
+    --->    parallel_exec_metrics(
+                pem_num_calls               :: int,
+                    % The number of calls into this parallelisation.
 
-:- func parallel_exec_metrics_get_seq_time(parallel_exec_metrics) = float.
+                pem_seq_time                :: float,
+                    % The elapsed time of the original sequential execution.
 
-:- func parallel_exec_metrics_get_par_time(parallel_exec_metrics) = float.
+                pem_par_time                :: float,
+                    % The elapsed time of the parallel execution.
 
-    % The amount of time saved per-call. SeqTime - ParTime.
-    %
-:- func parallel_exec_metrics_get_time_saving(parallel_exec_metrics) = float.
+                pem_par_overheads           :: float,
+                    % The overheads of parallel execution.  These are already
+                    % included in pem_par_time.
+                    % Add these to pem_seq_time to get the 'time on cpu' of
+                    % this execution.
+
+                pem_first_conj_dead_time    :: float,
+                    % The amount of time the initial (left most) conjunct
+                    % spends waiting for the other conjuncts.  During this time
+                    % the context used by this conjunct must be kept alive
+                    % because it will resume executing sequential code after
+                    % the conjunct, however we know that it cannot be resumed
+                    % before it's children have completed.
+                    %
+
+                pem_future_dead_time        :: float
+                    % The amount of time all conjuncts spend blocked on the
+                    % production of futures.
+            ).
 
     % The speedup per call.  SeqTime / ParTime.  For example, a value of 2.0
     % means that this is twice as fast when parallelised.
     %
 :- func parallel_exec_metrics_get_speedup(parallel_exec_metrics) = float.
 
-    % The amount of time the initial (left most) conjunct spends waiting for
-    % the other conjuncts.  During this time the context used by this conjunct
-    % must be kept alive because it will resume executing sequential code after
-    % the conjunct, however we know that it cannot be resumed before it's
-    % children have completed.
-    %
-:- func parallel_exec_metrics_get_first_conj_dead_time(parallel_exec_metrics) =
-    float.
-
-    % The some of the amount of time spent waiting for variables to be produced
-    % throughout the whole conjunction.
-    %
-:- func parallel_exec_metrics_get_future_dead_time(parallel_exec_metrics) =
-    float.
-
-    % The sum of the above two times.
+    % The amount of time saved per-call. SeqTime - ParTime.
     %
-:- func parallel_exec_metrics_get_total_dead_time(parallel_exec_metrics) =
-    float.
+:- func parallel_exec_metrics_get_time_saving(parallel_exec_metrics) = float.
 
 %-----------------------------------------------------------------------------%
 
@@ -789,13 +784,23 @@ write_feedback_file_2(Stream, ProgName, 
 
 %-----------------------------------------------------------------------------%
 
+parallel_exec_metrics_get_speedup(PEM) = SeqTime / ParTime :-
+    SeqTime = PEM ^ pem_seq_time,
+    ParTime = PEM ^ pem_par_time.
+
+parallel_exec_metrics_get_time_saving(PEM) = SeqTime - ParTime :-
+    SeqTime = PEM ^ pem_seq_time,
+    ParTime = PEM ^ pem_par_time.
+
+%-----------------------------------------------------------------------------%
+
 :- func feedback_first_line = string.
 
 feedback_first_line = "Mercury Compiler Feedback".
 
 :- func feedback_version = string.
 
-feedback_version = "10".
+feedback_version = "11".
 
 %-----------------------------------------------------------------------------%
 %
@@ -810,220 +815,19 @@ convert_candidate_par_conjunctions_proc(
     CPCProcB = candidate_par_conjunctions_proc(VarTable, CPCB).
 
 convert_candidate_par_conjunction(Conv, CPC0, CPC) :-
-    CPC0 = candidate_par_conjunction(GoalPath, PartNum, IsDependent, 
-        GoalsBefore0, Conjs0, GoalsAfter0, Metrics),
+    CPC0 = candidate_par_conjunction(GoalPath, PartNum, FirstGoalNum,
+        IsDependent, GoalsBefore0, Conjs0, GoalsAfter0, Metrics),
     map(convert_seq_conj(Conv), Conjs0, Conjs),
     map(Conv, GoalsBefore0, GoalsBefore),
     map(Conv, GoalsAfter0, GoalsAfter),
-    CPC = candidate_par_conjunction(GoalPath, PartNum, IsDependent, 
-        GoalsBefore, Conjs, GoalsAfter, Metrics).
+    CPC = candidate_par_conjunction(GoalPath, PartNum, FirstGoalNum,
+        IsDependent, GoalsBefore, Conjs, GoalsAfter, Metrics).
 
 convert_seq_conj(Conv, seq_conj(Conjs0), seq_conj(Conjs)) :-
     map(Conv, Conjs0, Conjs).
 
 %-----------------------------------------------------------------------------%
 
-:- type parallel_exec_metrics
-    --->    parallel_exec_metrics(
-                pem_inner_metrics           :: parallel_exec_metrics_internal,
-                pem_num_calls               :: int,
-                pem_time_before_conj        :: float,
-                pem_time_after_conj         :: float,
-                pem_left_conj_cost          :: float,
-                    % The cost of calling fork() in the conjunct to the left of
-                    % a & symbol.
-                pem_right_conj_delay        :: float,
-                    % The delay before a conjunct to the right of & begins
-                    % executing.
-
-                pem_context_wakeup_delay    :: float
-            ).
-
-:- type parallel_exec_metrics_incomplete
-    --->    pem_incomplete(
-                pemi_time_before_conj       :: float,
-
-                pemi_num_calls              :: int,
-
-                pemi_spark_cost             :: float,
-
-                pemi_spark_delay            :: float,
-
-                pemi_context_wakeup_delay   :: float,
-
-                pemi_internal               ::
-                        maybe(parallel_exec_metrics_internal)
-                    % If there are no internal conjuncts then the parallel
-                    % conjunction is empty.
-            ).
-
-:- type parallel_exec_metrics_internal
-    --->    pem_left_most(
-                pemi_time_seq               :: float,
-                pemi_time_par               :: float
-            )
-    ;       pem_additional(
-                pemi_time_left              :: parallel_exec_metrics_internal,
-                    % The time of the left conjunct (that may be a conjunction),
-
-                pemi_time_left_signals      :: float,
-                    % The additional cost of calling signal within the left
-                    % conjunct.
-                    % NOTE: Note that this should be added to each of the
-                    % individual conjuncts _where_ they call signal but thta is
-                    % more difficult and may not be required.  We may visit it
-                    % in the future.
-
-                pemi_time_right_seq         :: float,
-                    % The time of the right conjunct if it is running after
-                    % the left in normal sequential execution.
-
-                pemi_time_right_par         :: float
-                    % The time of the right conjunct if it is running in
-                    % parallel with the left conjunct.  It may have to stop and
-                    % wait for variables to be produced; therefore this time is
-                    % different to time_right_seq.  This time also includes
-                    % parallel execution overheads and delays.
-            ).
-
-init_parallel_exec_metrics_incomplete(Metrics0, TimeSignals, TimeBSeq, 
-        TimeBPar) = Metrics :-
-    MaybeInternal0 = Metrics0 ^ pemi_internal,
-    (
-        MaybeInternal0 = yes(Internal0),
-        Internal = pem_additional(Internal0, TimeSignals, TimeBSeq, TimeBPar)
-    ;
-        MaybeInternal0 = no,
-        Internal = pem_left_most(TimeBSeq, TimeBPar),
-        require(unify(TimeSignals, 0.0),
-            this_file ++ "TimeSignal != 0")
-    ),
-    Metrics = Metrics0 ^ pemi_internal := yes(Internal).
-
-init_empty_parallel_exec_metrics(TimeBefore, NumCalls, SparkCost, 
-        SparkDelay, ContextWakeupDelay) = 
-    pem_incomplete(TimeBefore, NumCalls, SparkCost, SparkDelay,
-        ContextWakeupDelay, no).
-
-finalise_parallel_exec_metrics(IncompleteMetrics, TimeAfter) = Metrics :-
-    IncompleteMetrics = pem_incomplete(TimeBefore, NumCalls, SparkCost,
-        SparkDelay, ContextWakeupDelay, MaybeInternal),
-    (
-        MaybeInternal = yes(Internal)
-    ;
-        MaybeInternal = no,
-        error(this_file ++ "Cannot finalise empty parallel metrics.")
-    ),
-    Metrics = parallel_exec_metrics(Internal, NumCalls, TimeBefore, TimeAfter,
-        SparkCost, SparkDelay, ContextWakeupDelay).
-
-parallel_exec_metrics_get_num_calls(PEM) = NumCalls :-
-    NumCalls = PEM ^ pem_num_calls.
-
-parallel_exec_metrics_get_par_time(PEM) = Time :-
-    Inner = PEM ^ pem_inner_metrics,
-    InnerTime = parallel_exec_metrics_internal_get_par_time(Inner),
-    FirstConjTime = pem_get_first_conj_time(Inner),
-    BeforeAndAfterTime = PEM ^ pem_time_before_conj + PEM ^ pem_time_after_conj,
-    ( FirstConjTime < InnerTime ->
-        FirstConjWakeupPenalty = PEM ^ pem_context_wakeup_delay
-    ;
-        FirstConjWakeupPenalty = 0.0
-    ),
-    Time = InnerTime + BeforeAndAfterTime + FirstConjWakeupPenalty.
-
-parallel_exec_metrics_get_seq_time(PEM) = Time :- 
-    Inner = PEM ^ pem_inner_metrics,
-    InnerTime = parallel_exec_metrics_internal_get_seq_time(Inner),
-    BeforeAndAfterTime = PEM ^ pem_time_before_conj + PEM ^ pem_time_after_conj,
-    Time = InnerTime + BeforeAndAfterTime.
-
-parallel_exec_metrics_get_speedup(Metrics) = SeqTime / ParTime :-
-    SeqTime = parallel_exec_metrics_get_seq_time(Metrics),
-    ParTime = parallel_exec_metrics_get_par_time(Metrics).
-
-    % The expected parallel execution time.
-    %
-:- func parallel_exec_metrics_internal_get_par_time(
-    parallel_exec_metrics_internal) = float.
-
-parallel_exec_metrics_internal_get_par_time(pem_left_most(_, Time)) = Time.
-parallel_exec_metrics_internal_get_par_time(pem_additional(MetricsLeft,
-        TimeLeftSignal, _, TimeRight)) = Time :-
-    TimeLeft = parallel_exec_metrics_internal_get_par_time(MetricsLeft) +
-        TimeLeftSignal,
-    Time = max(TimeLeft, TimeRight).
-
-    % The expected sequential execution time.
-    %
-:- func parallel_exec_metrics_internal_get_seq_time(
-    parallel_exec_metrics_internal) = float.
-
-parallel_exec_metrics_internal_get_seq_time(pem_left_most(Time, _)) = Time.
-parallel_exec_metrics_internal_get_seq_time(pem_additional(MetricsLeft, _,
-        TimeRight, _)) = Time :-
-    TimeLeft = parallel_exec_metrics_internal_get_seq_time(MetricsLeft),
-    Time = TimeLeft + TimeRight.
-
-parallel_exec_metrics_get_time_saving(Metrics) = SeqTime - ParTime :-
-    SeqTime = parallel_exec_metrics_get_seq_time(Metrics),
-    ParTime = parallel_exec_metrics_get_par_time(Metrics).
-
-parallel_exec_metrics_get_first_conj_dead_time(Metrics) = DeadTime :-
-    Inner = Metrics ^ pem_inner_metrics,
-    FirstConjTime = pem_get_first_conj_time(Inner),
-    ParTime = parallel_exec_metrics_get_par_time(Metrics),
-    DeadTime = ParTime - FirstConjTime.
-
-    % Get the parallel execution time of the first conjunct.  This is used for
-    % calculating the first conjunct's dead time (above).
-    %
-:- func pem_get_first_conj_time(parallel_exec_metrics_internal) = float.
-
-pem_get_first_conj_time(pem_left_most(_, Time)) = Time.
-pem_get_first_conj_time(pem_additional(Left, LeftSignalTime0, _, _)) = Time :-
-    (
-        Left = pem_left_most(_, _),
-        LeftSignalTime = LeftSignalTime0
-    ;
-        Left = pem_additional(_, _, _, _),
-        LeftSignalTime = 0.0
-    ),
-    Time = pem_get_first_conj_time(Left) + LeftSignalTime.
-
-parallel_exec_metrics_get_future_dead_time(Metrics) = DeadTime :-
-    Inner = Metrics ^ pem_inner_metrics,
-    RightConjDelay = Metrics ^ pem_right_conj_delay,
-    LeftConjCost = Metrics ^ pem_left_conj_cost,
-    DeadTime = pem_get_future_dead_time(Inner, yes, LeftConjCost,
-        RightConjDelay).
-
-:- func pem_get_future_dead_time(parallel_exec_metrics_internal, bool,
-    float, float) = float.
-
-    % XXX: Delays may new be build into the times.
-pem_get_future_dead_time(pem_left_most(_, _), _, _, _) = 0.0.
-pem_get_future_dead_time(pem_additional(Left, _, Seq, Par), 
-        IsRightmostConj, ForkCost, ForkDelay) = DeadTime :-
-    DeadTime = ThisDeadTime + LeftDeadTime,
-    ThisDeadTime0 = Par - Seq - ForkDelay,
-    (
-        IsRightmostConj = yes,
-        ThisDeadTime = ThisDeadTime0
-    ;
-        IsRightmostConj = no,
-        ThisDeadTime = ThisDeadTime0 + ForkCost
-    ),
-    LeftDeadTime = pem_get_future_dead_time(Left, no, ForkCost, ForkDelay).
-
-parallel_exec_metrics_get_total_dead_time(Metrics) = DeadTime :-
-    DeadTime = FirstConjDeadTime + FutureDeadTime,
-    FirstConjDeadTime = 
-        parallel_exec_metrics_get_first_conj_dead_time(Metrics),
-    FutureDeadTime = parallel_exec_metrics_get_future_dead_time(Metrics).
-
-%-----------------------------------------------------------------------------%
-
 :- func this_file = string.
 
 this_file = "feedback.m: ".
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20100804/c2094454/attachment.sig>
Previous message: [m-rev.] for possible review: frameopt performance fix
Next message: [m-rev.] diff: support more grade tests in trace goals
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list