[m-rev.] For post-commit review: Implicit parallelisation work.

Paul Bone pbone at csse.unimelb.edu.au
Sat Jan 9 16:48:50 AEDT 2010


For post commit review by Zoltan.

Implicit parallelism work.

The implicit parallelism algorithm, feedback file format and therefore compiler
have been updated.  They now support parallelisation across other goals and, in
theory, parallelising three or more calls against one another.  The algorithm
is far from complete and very much experimental, it has been tested on a
modified version of icfp_2000 where it improves the runtime.  Note that
automatic parallelisation of dependant conjunctions is disabled for now.

mdbcomp/feedback.m:
    Modify deep profiling feedback data, a candidate parallel conjunct now
    contains a list of sequential conjunctions that contain other goals.
    Previously only two calls to be parallelised against one-another where
    listed.

    Document restrictions on the new candidate parallel conjunct structure that
    can't be expressed by the type system.

    Incremented the feedback file format number.

mdbcomp/program_representation.m:
    Made a semidet predicate version of empty_goal_path.
    
    Created maybe_search_var_name which returns it's result wrapped in a maybe
    structure, this is a deterministic alternative to search_var_name.  It is
    useful as an argument to list.map

deep_profiler/mdprof_feedback.m:
    When printing messages out to stderr also print the newlines between the
    messages to stderr.

deep_profiler/measurements.m:
    Re-aranged the order of arguments and added a four argument version for
    sub_computation_parallelism.

    Added a new function, some_parallelism/1, that initialises a parallelism
    amount as.

deep_profiler/message.m:
    Added extra messages.

    Pretty-print program locations using the conventional string representation
    for procedures and goal paths.  Export the predicate that does this.

deep_profiler/program_representation_utils.m:
    Export a predicate to format a procedure identifier nicely.

    Add code for calculating and manipulating inst_map_delta objects similar to
    those in the compiler.

deep_profiler/mdprof_fb.automatic_parallelism.m:
    Various code cleanups/simplifications.

    Re-worked the parallelisation algorithm, it can now parallelise across
    cheaper calls and (theoretically) handle parallel conjunctions with any
    number of conjuncts.

    Conform to new candidate parallel conjunction representation.
    
    Internally use a structure similar to the candidate parallel conjunct
    structure in feedback.m  This makes the maybe_call_conjunct structure
    obsolete, the old structure has been removed.

compiler/implicit_parallelism.m:
    Updated implicit parallelism transformation to conform to the new feedback
    file format.

compiler/goal_util.m:
    Added goal_is_atomic/2
    Modified create_conj_from_list to simply return the only goal in the list
    when the list contains exactly one goal.

library/maybe.m:
    Add a simple predicate (maybe_is_yes/2) that 'opens' a maybe and returns the result or
    fails.

NEWS:
    Announce maybe_is_yes/2

Index: NEWS
===================================================================
RCS file: /home/mercury1/repository/mercury/NEWS,v
retrieving revision 1.513
diff -u -p -b -r1.513 NEWS
--- NEWS	14 Oct 2009 05:28:24 -0000	1.513
+++ NEWS	9 Jan 2010 05:25:36 -0000
@@ -149,6 +149,8 @@ Changes to the Mercury standard library:
 * The following function has been added to the pqueue module
 	pqueue.length/1
 
+* We have added maybe_is_yes/2 semidet predicate to the maybe module.
+
 * We have changed the interface of the ops module to make lookups of operators
   more efficient.
 
Index: compiler/goal_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/goal_util.m,v
retrieving revision 1.171
diff -u -p -b -r1.171 goal_util.m
--- compiler/goal_util.m	25 Sep 2009 05:13:01 -0000	1.171
+++ compiler/goal_util.m	9 Jan 2010 01:28:07 -0000
@@ -232,6 +232,14 @@
     %
 :- pred pred_proc_ids_from_goal(hlds_goal::in, list(pred_proc_id)::out) is det.
 
+:- type goal_is_atomic
+    --->    goal_is_atomic
+    ;       goal_is_nonatomic.
+
+    % Returns whether a goal is atomic.  This is undefined for shorthand goals.
+    %
+:- pred goal_is_atomic(hlds_goal::in, goal_is_atomic::out) is det.
+
 %-----------------------------------------------------------------------------%
 
     % Convert a switch back into a disjunction. This is needed
@@ -1498,11 +1506,9 @@ create_conj(GoalA, GoalB, Type, ConjGoal
 
 create_conj_from_list(GoalsInConj, Type, ConjGoal) :-
     (
-        GoalsInConj = [ GoalA | _ ]
-    ;
-        GoalsInConj = [],
-        unexpected(this_file, "create_conj_from_list: empty conjunction")
-    ),
+        GoalsInConj = [ GoalA | GoalsTail ],
+        (
+            GoalsTail = [ _ | _ ],
     ConjGoalExpr = conj(Type, GoalsInConj),
     goal_list_nonlocals(GoalsInConj, NonLocals),
     goal_list_instmap_delta(GoalsInConj, InstMapDelta),
@@ -1512,7 +1518,15 @@ create_conj_from_list(GoalsInConj, Type,
     Context = goal_info_get_context(GoalAInfo),
     goal_info_init(NonLocals, InstMapDelta, Detism, Purity, Context,
         ConjGoalInfo),
-    ConjGoal = hlds_goal(ConjGoalExpr, ConjGoalInfo).
+            ConjGoal = hlds_goal(ConjGoalExpr, ConjGoalInfo)
+        ;
+            GoalsTail = [],
+            ConjGoal = GoalA
+        )
+    ;
+        GoalsInConj = [],
+        unexpected(this_file, "create_conj_from_list: empty conjunction")
+    ).
 
 %-----------------------------------------------------------------------------%
 
@@ -1804,6 +1818,29 @@ pred_proc_ids_from_goal(Goal, PredProcId
     P = (pred(PredProcId::out) is nondet :- goal_calls(Goal, PredProcId)),
     solutions.solutions(P, PredProcIds).
 
+goal_is_atomic(Goal, GoalIsAtomic) :-
+    GoalExpr = Goal ^ hlds_goal_expr,
+    (
+        ( GoalExpr = unify(_, _, _, _, _)
+        ; GoalExpr = plain_call(_, _, _, _, _, _)
+        ; GoalExpr = generic_call(_, _, _, _)
+        ; GoalExpr = call_foreign_proc(_, _, _, _, _, _, _)
+        ),
+        GoalIsAtomic = goal_is_atomic
+    ;
+        ( GoalExpr = conj(_, _)
+        ; GoalExpr = disj(_)
+        ; GoalExpr = switch(_, _, _)
+        ; GoalExpr = negation(_)
+        ; GoalExpr = scope(_, _)
+        ; GoalExpr = if_then_else(_, _, _, _)
+        ),
+        GoalIsAtomic = goal_is_nonatomic
+    ;
+        GoalExpr = shorthand(_),
+        unexpected(this_file, "goal_is_atomic/2: shorthand goal")
+    ).
+        
 %-----------------------------------------------------------------------------%
 
 foreign_code_uses_variable(Impl, VarName) :-
Index: compiler/implicit_parallelism.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/implicit_parallelism.m,v
retrieving revision 1.12
diff -u -p -b -r1.12 implicit_parallelism.m
--- compiler/implicit_parallelism.m	8 Sep 2009 02:43:32 -0000	1.12
+++ compiler/implicit_parallelism.m	9 Jan 2010 04:49:15 -0000
@@ -247,7 +247,7 @@ maybe_parallelise_proc(ModuleInfo, Paral
     CPCMap = ParallelismInfo ^ pi_cpc_map,
     ( map.search(CPCMap, IMProcLabel, CPC) ->
         proc_info_get_goal(ProcInfo0, Goal0),
-        TargetGoalPathString = CPC ^ goal_path,
+        TargetGoalPathString = CPC ^ cpc_goal_path,
         ( goal_path_from_string(TargetGoalPathString, TargetGoalPathPrime) ->
             TargetGoalPath = TargetGoalPathPrime
         ;
@@ -298,95 +298,275 @@ maybe_parallelise_conj(ProcInfo, CPC, Go
     GoalExpr0 = Goal0 ^ hlds_goal_expr,
     % We've reached the point indicated by the goal path, Find the
     % conjuncts that we wish to parallelise.
+    PartitionNum = CPC ^ cpc_partition_number,
+    FeedbackParConjuncts = CPC ^ cpc_conjs, 
     ( 
+        % XXX: Temporarily avoid parallelising dependant conjunctions,  This
+        % appears to trigger bugs in later stages of the compiler.
+        CPC ^ cpc_is_dependant = conjuncts_are_independent, 
         GoalExpr0 = conj(plain_conj, Conjs0),
-        index1_of_candidate_par_conjunct(ProcInfo, CPC ^ par_conjunct_a,
-            Conjs0, AIdx),
-        index1_of_candidate_par_conjunct(ProcInfo, CPC ^ par_conjunct_b,
-            Conjs0, BIdx),
-        AIdx \= BIdx,
-        % In the future there may be some goals between the two calls to
-        % parallelise.  The analysis will say how many of these goals should be
-        % run with each of the two calls, but it may be incorrect in cases
-        % where the code has changed.  Thus we need to ensure that the code is
-        % still mode-correct.
-        GoalA = list.det_index1(Conjs0, AIdx),
-        GoalB = list.det_index1(Conjs0, BIdx),
-        model_det_and_at_least_semipure(GoalA),
-        model_det_and_at_least_semipure(GoalB),
-        ( BIdx - AIdx = 1 ->
-            % The conjuncts are adjacent with GoalA occuring first.
-            FirstParGoal = GoalA,
-            SecondParGoal = GoalB,
-            MaxIdx = BIdx,
-            MinIdx = AIdx
-        ; AIdx - BIdx = 1 ->
-            FirstParGoal = GoalB,
-            SecondParGoal = GoalA,
-            MaxIdx = AIdx,
-            MinIdx = BIdx
-        ;
-            fail
-        )
+        flatten_conj(Conjs0, Conjs1),
+        find_partition(PartitionNum, Conjs1, GoalsBeforePartition,
+            yes(Partition0), GoalsAfterPartition),
+        build_par_conjuncts(ProcInfo, Partition0, FeedbackParConjuncts,
+            yes(ParConjuncts), GoalsBefore, GoalsAfter)
     ->
-        ParConjExprList = [FirstParGoal, SecondParGoal],
-        ParConjExpr = conj(parallel_conj, ParConjExprList),
-        goal_list_nonlocals(ParConjExprList, ParConjNonLocals),
-        goal_list_instmap_delta(ParConjExprList, ParConjInstmapDelta),
-        goal_list_determinism(ParConjExprList, ParConjDetism),
-        goal_list_purity(ParConjExprList, ParConjPurity),
-        goal_info_init(ParConjNonLocals, ParConjInstmapDelta, ParConjDetism,
-            ParConjPurity, ParConjInfo),
-        ParConj = hlds_goal(ParConjExpr, ParConjInfo),
-        (
-            take(MinIdx - 1, Conjs0, GoalsBeforeParPrime),
-            drop(MaxIdx, Conjs0, GoalsAfterParPrime)
-        ->
-            GoalsBeforePar = GoalsBeforeParPrime,
-            GoalsAfterPar = GoalsAfterParPrime
-        ;
-            unexpected(this_file, "Miscalculated conjunct list operations.")
-        ),
-        Conjs = GoalsBeforePar ++ [ ParConj | GoalsAfterPar ],
+        create_conj_from_list(ParConjuncts, parallel_conj, ParConjunction),
+        Conjs = GoalsBeforePartition ++ GoalsBefore ++ 
+            [ ParConjunction | GoalsAfter ++ GoalsAfterPartition ], 
         GoalExpr = conj(plain_conj, Conjs),
         MaybeGoal = yes(hlds_goal(GoalExpr, Goal0 ^ hlds_goal_info))
     ;
         MaybeGoal = no
     ).
 
-:- pred index1_of_candidate_par_conjunct(proc_info::in, 
-    candidate_par_conjunct::in, list(hlds_goal)::in, int::out) is semidet.
+:- pred find_partition(int::in, list(hlds_goal)::in, list(hlds_goal)::out,
+    maybe(list(hlds_goal))::out, list(hlds_goal)::out) is det.
 
-index1_of_candidate_par_conjunct(ProcInfo, CPC, Goals, Index) :-
-    MaybeCallee = CPC ^ callee,
+find_partition(_, [], [], no, []).
+find_partition(PartitionNum0, [Goal | Goals], GoalsBefore, MaybePartition,
+        GoalsAfter) :-
+    ( PartitionNum0 = 1 ->
+        % We've found the correct partition.
+        GoalsBefore = [],
+        find_end_of_partition(Goals, Partition, GoalsAfter),
+        MaybePartition = yes([ Goal | Partition ])
+    ;
+        goal_is_atomic(Goal, GoalIsAtomic),
+        (
+            GoalIsAtomic = goal_is_atomic,
+            PartitionNum = PartitionNum0
+        ;
+            GoalIsAtomic = goal_is_nonatomic,
+            PartitionNum = PartitionNum0 - 1
+        ),
+        find_partition(PartitionNum, Goals, GoalsBefore0, MaybePartition,
+            GoalsAfter),
+        GoalsBefore = [ Goal | GoalsBefore0 ]
+    ).
+
+:- pred find_end_of_partition(list(hlds_goal)::in, list(hlds_goal)::out, 
+    list(hlds_goal)::out) is det.
+
+find_end_of_partition([], [], []).
+find_end_of_partition([ Goal | Goals ], Partition, GoalsAfter) :-
+    goal_is_atomic(Goal, GoalIsAtomic),
+    (
+        GoalIsAtomic = goal_is_atomic,
+        find_end_of_partition(Goals, Partition0, GoalsAfter),
+        Partition = [ Goal | Partition0 ]
+    ;
+        GoalIsAtomic = goal_is_nonatomic,
+        Partition = [ Goal ],
+        GoalsAfter = Goals
+    ).
+
+:- pred build_par_conjuncts(proc_info::in, hlds_goals::in, list(seq_conj)::in, 
+    maybe(hlds_goals)::out, hlds_goals::out, hlds_goals::out) is det.
+
+build_par_conjuncts(ProcInfo, Goals, FeedbackParConjuncts0,
+        MaybeParConjuncts, GoalsBefore, GoalsAfter) :-
+    (
+        FeedbackParConjuncts0 = [ FeedbackParConjunct | FeedbackParConjuncts ],
+        FeedbackParConjunct = seq_conj(FeedbackSeqConjuncts),
+        build_seq_conjuncts(ProcInfo, Goals, FeedbackSeqConjuncts,
+            MaybeSeqConjuncts0, GoalsBefore, GoalsAfterSeqConj),
+        (
+            MaybeSeqConjuncts0 = yes(SeqConjuncts0),
+            build_par_conjuncts(ProcInfo, GoalsAfterSeqConj,
+                FeedbackParConjuncts, MaybeParConjuncts0, GoalsBefore0,
+                GoalsAfter0),
+            (
+                MaybeParConjuncts0 = yes(ParConjuncts0),
+                % GoalsBefore0 must be put into either the current parallel
+                % conjunct or the next parallel conjunct.
+                (
+                    ParConjuncts0 = [],
+                    % If there are no parallel conjuncts remaining then
+                    % schedule all these goals after the parallel conjunction,
+                    % this is a trivial conservative strategy.
+                    GoalsAfter = GoalsBefore0 ++ GoalsAfter0,
+                    SeqConjuncts = SeqConjuncts0
+                ;
+                    ParConjuncts0 = [_ | _],
+                    GoalsAfter = GoalsAfter0,
+                    SeqConjuncts = SeqConjuncts0,
+                    (
+                        GoalsBefore0 = []
+                    ;
+                        GoalsBefore0 = [_ | _],
+                        sorry(this_file, 
+                          "Extra goals found during automatic parallelisation")
+                    )
+                ),
+                create_conj_from_list(SeqConjuncts, plain_conj, ParConjunct),
+                % Mercury only supports parallelisation of deterministic code,
+                % furthermore it's just not a good idea to parallelise anything
+                % that is impure.
+                ( model_det_and_at_least_semipure(ParConjunct) ->
+                    MaybeParConjuncts = yes([ ParConjunct | ParConjuncts0 ])
+                ;
+                    MaybeParConjuncts = no
+                )
+            ;
+                MaybeParConjuncts0 = no,
+                MaybeParConjuncts = no,
+                GoalsAfter = Goals
+            )
+        ;
+            MaybeSeqConjuncts0 = no,
+            MaybeParConjuncts = no,
+            GoalsAfter = Goals
+        )
+    ;
+        FeedbackParConjuncts0 = [],
+        MaybeParConjuncts = yes([]),
+        GoalsBefore = [],
+        GoalsAfter = Goals
+    ).
+
+:- pred build_seq_conjuncts(proc_info::in, hlds_goals::in, list(inner_goal)::in,
+    maybe(hlds_goals)::out, hlds_goals::out, hlds_goals::out) is det.
+
+build_seq_conjuncts(ProcInfo, Goals, FeedbackSeqConjuncts0, MaybeSeqConjuncts,
+        GoalsBefore, GoalsAfter) :-
+    (
+        FeedbackSeqConjuncts0 = [ FeedbackSeqConjunct | FeedbackSeqConjuncts ],
+        inner_goal_match_hlds_goal(ProcInfo, FeedbackSeqConjunct, Goals,
+            MaybeGoal, GoalsBefore0, GoalsAfter0),
+        (
+            MaybeGoal = yes(Goal),
+            GoalsBefore = GoalsBefore0,
+            SeqConjunct = Goal,
+            build_seq_conjuncts(ProcInfo, GoalsAfter0, FeedbackSeqConjuncts,
+                MaybeSeqConjuncts0, GoalsBetween, GoalsAfter),
+            (
+                MaybeSeqConjuncts0 = yes(SeqConjuncts0),
+                SeqConjuncts = [ SeqConjunct | 
+                    (GoalsBetween ++ SeqConjuncts0) ],
+                MaybeSeqConjuncts = yes(SeqConjuncts)
+            ;
+                MaybeSeqConjuncts0 = no,
+                MaybeSeqConjuncts = no
+            )
+        ;
+            MaybeGoal = no,
+            MaybeSeqConjuncts = no,
+            GoalsBefore = [],
+            GoalsAfter = Goals
+        )
+    ;
+        FeedbackSeqConjuncts0 = [],
+        MaybeSeqConjuncts = yes([]),
+        GoalsBefore = [],
+        GoalsAfter = Goals
+    ).
+
+:- pred inner_goal_match_hlds_goal(proc_info::in, inner_goal::in,
+    hlds_goals::in, maybe(hlds_goal)::out, hlds_goals::out, hlds_goals::out)
+    is det.
+
+inner_goal_match_hlds_goal(_ProcInfo, _, [], no, [], []).
+inner_goal_match_hlds_goal(ProcInfo, InnerGoal, [ Goal | Goals ], MaybeGoal,
+        GoalsBefore, GoalsAfter) :-
+    proc_info_get_varset(ProcInfo, Varset),
+    GoalExpr = Goal ^ hlds_goal_expr,
+    (
+        ( GoalExpr = unify(_, _, _, _, _)
+        ; GoalExpr = call_foreign_proc(_, _, _, _, _, _, _)
+        ),
+        (
+            ( InnerGoal = ig_call(_, _, _)
+            ; InnerGoal = ig_cheap_call(_, _)
+            ),
+            Match = no
+        ;
+            InnerGoal = ig_other_atomic_goal,
+            Match = yes
+        )
+    ;
+        (
+            GoalExpr = plain_call(_, _, Args, Builtin, _, _)
+        ;
+            GoalExpr = generic_call(_, Args, _, _),
+            Builtin = not_builtin
+        ),
+        (
+            ( InnerGoal = ig_call(Callee, FbArgs, _)
+            ; InnerGoal = ig_cheap_call(Callee, FbArgs)
+            ),
+            % Try to match the callee and the vars.
+            (
+                Builtin = not_builtin,
+                (
+                    (
+                        GoalExpr = generic_call(_, _, _, _),
+                        Callee = unknown_callee
+                    ;
+                        GoalExpr = plain_call(_, _, _, _, _, SymName),
+                        Callee = named_callee(ModuleName, ProcName),
+                        match_sym_name(SymName, ModuleName, ProcName)
+                    ),
+                    map(args_match(Varset), Args, FbArgs)
+                ->
+                    Match = yes
+                ;
+                    Match = no
+                )
+            ;
+                ( Builtin = inline_builtin
+                ; Builtin = out_of_line_builtin
+                ),
+                Match = no
+            )
+        ;
+            InnerGoal = ig_other_atomic_goal,
     (
-        MaybeCallee = yes(Callee),
-        NamePt1 - NamePt2 = Callee,
-        MaybeName = yes(string_to_sym_name(NamePt1 ++ "." ++ NamePt2))
-    ;
-        MaybeCallee = no,
-        MaybeName = no
-    ),
-    CPC ^ vars = CPCArgs,
-    proc_info_get_varset(ProcInfo, VarSet),
-
-    find_index_of_match((pred(Goal::in) is semidet :-
-            Goal = hlds_goal(CallGoal, _),
-            % Some calls know the name of the callee, others dont.  match the
-            % if the name is know and what it is if known against the candidate
-            % parallel conjunct.
-            % Note: we don't (yet) allow parallelisation of foreign code,
-            (
-                CallGoal = plain_call(_, _, Args, _, _, Name),
-                MaybeName = yes(Name)
-            ; 
-                CallGoal = generic_call(_, Args, _, _),
-                MaybeName = no
-            ),
-            % Match arguments which have user defined names in the profiled
-            % program against the names in the current program.
-            list.map(args_match(VarSet), Args, CPCArgs)
-        ), Goals, 1, Index).
+                Builtin = not_builtin,
+                Match = no
+            ;
+                ( Builtin = inline_builtin
+                ; Builtin = out_of_line_builtin
+                ),
+                Match = yes 
+            )
+        )
+    ;
+        ( GoalExpr = conj(_, _)
+        ; GoalExpr = disj(_)
+        ; GoalExpr = switch(_, _, _)
+        ; GoalExpr = negation(_)
+        ; GoalExpr = scope(_, _)
+        ; GoalExpr = if_then_else(_, _, _, _)
+        ; GoalExpr = shorthand(_)
+        ),
+        unexpected(this_file, 
+            "Found non-atomic goal in inner_goal_match_hlds_goal")
+    ),
+    (
+        Match = yes,
+        MaybeGoal = yes(Goal),
+        GoalsBefore = [],
+        GoalsAfter = Goals
+    ;
+        Match = no,
+        inner_goal_match_hlds_goal(ProcInfo, InnerGoal, Goals, MaybeGoal,
+            GoalsBefore0, GoalsAfter),
+        GoalsBefore = [ Goal | GoalsBefore0 ]
+    ).
+
+:- pred match_sym_name(sym_name::in, string::in, string::in) is semidet.
+
+match_sym_name(unqualified(ProcName), _, ProcName).
+match_sym_name(qualified(SymModuleName, ProcName), ModuleName, ProcName) :-
+    ModuleNameParts = reverse(split_at_separator(unify('.'), ModuleName)),
+    match_sym_module_name(SymModuleName, ModuleNameParts).
+
+:- pred match_sym_module_name(sym_name::in, list(string)::in) is semidet.
+
+match_sym_module_name(unqualified(Name), [Name]).
+match_sym_module_name(qualified(SymName, Name), [Name | Names]) :-
+    match_sym_module_name(SymName, Names).
 
 :- pred args_match(prog_varset::in, prog_var::in, maybe(string)::in) 
     is semidet.
Index: deep_profiler/mdprof_fb.automatic_parallelism.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_fb.automatic_parallelism.m,v
retrieving revision 1.3
diff -u -p -b -r1.3 mdprof_fb.automatic_parallelism.m
--- deep_profiler/mdprof_fb.automatic_parallelism.m	16 Dec 2009 23:47:30 -0000	1.3
+++ deep_profiler/mdprof_fb.automatic_parallelism.m	9 Jan 2010 05:28:05 -0000
@@ -128,21 +128,36 @@ candidate_parallel_conjunctions(Opts, De
     = candidate_par_conjunction.
 
 internal_cpconjunction_to_cpconjunction(CPCI) = CPC :-
-    CPCI = candidate_par_conjunction_internal(GoalPath, ConjAI, ConjBI,
-        Dependence, Speedup),
-    internal_cpconjunct_to_cpconjunct(ConjAI, ConjA),
-    internal_cpconjunct_to_cpconjunct(ConjBI, ConjB),
-    CPC = candidate_par_conjunction(GoalPath, ConjA, ConjB, Dependence,
+    CPCI = candidate_par_conjunction_internal(GoalPath, PartNum, Dependance, 
+        ConjsI, Speedup),
+    map(internal_seq_conj_to_seq_conj, ConjsI, Conjs),
+    CPC = candidate_par_conjunction(GoalPath, PartNum, Dependance, Conjs,
         Speedup).
 
-:- pred internal_cpconjunct_to_cpconjunct(
-        candidate_par_conjunct_internal::in, candidate_par_conjunct::out) 
+:- pred internal_seq_conj_to_seq_conj(seq_conj_internal::in, seq_conj::out) 
     is det.
 
-internal_cpconjunct_to_cpconjunct(CPCI, CPC) :-
-    CPCI = candidate_par_conjunct_internal(Callee, Vars, CostPercall,
-        _ConjNum),
-    CPC = candidate_par_conjunct(Callee, Vars, CostPercall).
+internal_seq_conj_to_seq_conj(seq_conj_internal(ConjsI), seq_conj(Conjs)) :-
+    map(inner_goal_internal_to_inner_goal, ConjsI, Conjs).
+
+:- pred inner_goal_internal_to_inner_goal(inner_goal_internal::in, 
+    inner_goal::out) is det.
+
+inner_goal_internal_to_inner_goal(inner_goal_internal(IGT, _, _, _), IG) :-
+    (
+        IGT = igt_call(Callee, Vars, CostPercall, _, _),
+        IG = ig_call(Callee, Vars, CostPercall)
+    ;
+        IGT = igt_cheap_call(Callee, Vars, _, _),
+        IG = ig_cheap_call(Callee, Vars)
+    ;
+        IGT = igt_other_atomic_goal,
+        IG = ig_other_atomic_goal
+    ;
+        IGT = igt_non_atomic_goal,
+        error(this_file ++ 
+            "Unexpected: non atomic goal in inner_goal_internal_to_inner_goal")
+    ).
 
 %----------------------------------------------------------------------------%
 
@@ -167,38 +182,101 @@ internal_cpconjunct_to_cpconjunct(CPCI, 
                 cpci_goal_path          :: goal_path_string,
                     % The path within the procedure to this conjunction.
                 
-                cpci_par_conjunct_a     :: candidate_par_conjunct_internal,
-                cpci_par_conjunct_b     :: candidate_par_conjunct_internal,
-                    % The conjuncts worth parallelising
+                cpci_partition_number   :: int,
+                    % Used to locate the goals to be parallelised within the
+                    % conjunction.  Partitions are separated by non atomic
+                    % goals, the first partition has the number 1.
+
+                cpci_is_dependant       :: conjuncts_are_dependant,
 
-                cpci_dependence         :: conjuncts_are_dependant,
+                cpci_conjs              :: list(seq_conj_internal),
+                    % The suggested parallel expression.
 
                 cpci_speedup            :: float
             ).
 
-:- type candidate_par_conjunct_internal
-    --->    candidate_par_conjunct_internal(
-                cpci_callee              :: maybe(pair(string, string)),
-                    % If the name of the callee is known (it's not a HO call),
-                    % then store the module and symbol names here.
-                    % Note: arity and mode are not represented.
+:- type seq_conj_internal
+    --->    seq_conj_internal(
+                sci_conjs           :: list(inner_goal_internal)
+            ).
                     
-                cpci_vars                :: list(maybe(string)),
-                    % The names of variables (if used defined) given as
-                    % arguments to this call.
+    % A representation of a goal within a parallel conjunction.  We don't have
+    % to represent many types of goals or details about them, at least for now.
+    %
+:- type inner_goal_internal
+    --->    inner_goal_internal(
+                igi_ig_type             :: inner_goal_type,
+                    % The type and type-specific values of the inner goal.
                     
-                cpci_cost_percall        :: float,
-                    % The per-call cost of this call in call sequence counts.
+                igi_detism              :: detism_rep,
+                    % The determinism of the call.
                 
-                cpci_conj_num            :: int
+                igi_conj_num            :: int,
                     % The place within the conjunction that this conjunct
                     % lies.
+
+                igi_inst_map_info       :: inst_map_info
+                    % The inst map info attached to the original goal.
             ).
 
+:- type inner_goal_type 
+    --->    igt_call(
+                igtc_callee             :: callee_rep,
+                    
+                igtc_vars               :: list(maybe(string)),
+                    % The names of variables (if user defined) given as
+                    % arguments to this call.
+
+                igtc_cost_percall       :: float,
+                    % The per-call cost of this call in call sequence counts.
+             
+                igtc_args               :: list(var_mode_and_use),
+                    % The argument modes and use information.
+
+                igtc_call_site          :: clique_call_site_report
+                    % The call site report from the deep profiler.
+            )
+    ;       igt_cheap_call(
+                igtcc_callee            :: callee_rep,
+                igtcc_vars              :: list(maybe(string)),
+                igtcc_args              :: list(var_mode_and_use),
+                igtcc_cost              :: cs_cost_csq 
+            )
+    ;       igt_other_atomic_goal
+    ;       igt_non_atomic_goal.
+
+:- inst igt_call 
+    --->    igt_call(ground, ground, ground, ground, ground).
+
+:- inst igt_atomic_goal
+    --->    igt_call(ground, ground, ground, ground, ground)
+    ;       igt_cheap_call(ground, ground, ground, ground)
+    ;       igt_other_atomic_goal.
+    
+    % A variable, it's mode and it's usage in the callee.  The mode information
+    % is also summarised within the variable use information.
+    %
+:- type var_mode_and_use
+    --->    var_mode_and_use(
+                vmu_var                 :: var_rep,
+                vmu_mode                :: var_mode_rep,
+                vmu_use                 :: var_use_info
+            ).
 
 :- type candidate_par_conjunctions ==
     multi_map(string_proc_label, candidate_par_conjunction_internal).
 
+:- type inner_goals_partition
+    --->    inner_goals_partition(
+                igp_goals               :: list(inner_goal_internal),
+                igp_partition_num       :: int
+            ).
+
+%----------------------------------------------------------------------------%
+%
+% Recurse the call graph searching for parallelisation opportunities.
+%
+
     % candidate_parallel_conjunctions_clique(Opts, Deep, ParentCSCost, 
     %   ParentUsedParallelism, CliquePtr, CandidateParallelConjunctions,
     %   Messages)
@@ -393,9 +471,9 @@ candidate_parallel_conjunctions_call_sit
     CallSiteDesc = CallSiteReport ^ ccsr_call_site_summary ^ perf_row_subject,
     GoalPath = CallSiteDesc ^ csdesc_goal_path,
     parallelism_probability(ProcParallelismCandidates, GoalPath,
-        ParallelismProbability),
-    sub_computation_parallelism(ParallelismProbability, 
-        ParentParallelism, Parallelism), 
+        ParallelismProbability, SubParallelism),
+    sub_computation_parallelism(ParentParallelism, ParallelismProbability,
+        SubParallelism, Parallelism), 
     list.map2(candidate_parallel_conjunctions_callee(Opts, Deep, ProcsAnalysed,
             CliqueIsRecursive, RecursiveCallSiteCostMap, CliqueProcMap,
             CliquePtr, CallSiteDesc, Parallelism),
@@ -405,9 +483,10 @@ candidate_parallel_conjunctions_call_sit
 
 :- pred parallelism_probability(
     multi_map(goal_path_string, candidate_par_conjunction_internal)::in,
-    goal_path::in, probability::out) is det.
+    goal_path::in, probability::out, parallelism_amount::out) is det.
 
-parallelism_probability(Candidates, ConjGoalPath, ParallelismProbability) :-
+parallelism_probability(Candidates, ConjGoalPath, ParallelismProbability,
+        ParallelismAmount) :-
     ( goal_path_remove_last(ConjGoalPath, GoalPath, step_conj(ConjNum)) ->
         StringGoalPath = goal_path_to_string(GoalPath),
         ( multi_map.search(Candidates, StringGoalPath, CandidateList) ->
@@ -415,80 +494,45 @@ parallelism_probability(Candidates, Conj
                 % We assume that we can only ever be a member of one
                 % parallel conjunction.  That is that overlapping
                 % conjunctions are never recommended/seen.
-                promise_equivalent_solutions [Candidate, Conj] (
+                promise_equivalent_solutions [Amount] some [Candidate, Conj] (
                     member(Candidate, CandidateList),
-                    ( Conj = Candidate ^ cpci_par_conjunct_a
-                    ; Conj = Candidate ^ cpci_par_conjunct_b
-                    ),
-                    ConjNum = Conj ^ cpci_conj_num
+                    parallel_amount(Candidate ^ cpci_conjs, Conj, Amount),
+                    ConjNum = Conj ^ igi_conj_num
                 )
             ->
-                Dependence = Candidate ^ cpci_dependence,
-                (
-                    Dependence = conjuncts_are_dependant(_),
-                    ParallelismProbability =
-                        calculate_dependant_probability(Candidate, Conj)
-                ;
-                    Dependence = conjuncts_are_independent,
-                    ParallelismProbability = 
-                        calculate_independent_probability(Candidate, Conj) 
-                )
+                % XXX: Wait until we have the new calculations for how
+                % dependant parallel conjuncts overlap before we bother to
+                % calculate the probability of parallelisation.  For now assume
+                % a pesimistic default.
+                ParallelismProbability = certain,
+                ParallelismAmount = Amount
             ;
-                ParallelismProbability = impossible 
+                ParallelismProbability = impossible,
+                ParallelismAmount = no_parallelism
             )
         ;
             % This callsite is not within a parallel conjunction.
-            ParallelismProbability = impossible
+            ParallelismProbability = impossible,
+            ParallelismAmount = no_parallelism
         )
     ;
         % We may not be able to remove this goal if it is not directly in a
         % conjunction.  Perhaps it's directly within some branch goal or at the
         % root of a goal path.  In these cases it is not in a parallel
         % conjunction so the parallelism probability is 0.0. 
-        ParallelismProbability = impossible
+        ParallelismProbability = impossible,
+        ParallelismAmount = no_parallelism 
     ).
 
-    % Determine the probability that the conjunct executes in parallel with
-    % some other conjunct in the conjunction.  Provided that these two
-    % conjuncts are dependant.
-    %
-:- func calculate_dependant_probability(candidate_par_conjunction_internal,
-    candidate_par_conjunct_internal) = probability.
+:- pred parallel_amount(list(seq_conj_internal)::in, 
+    inner_goal_internal::out, parallelism_amount::out) is nondet.
 
-calculate_dependant_probability(Conjunction, Conj) =
-    % XXX: Implement this after the new dependant conjunction analysis
-    % proposed on Zoltan's talk (Leuven 2009).
-    calculate_independent_probability(Conjunction, Conj).
-
-    % Determine the probability that the conjunct executes in parallel with
-    % some other conjunct in the conjunction.  Provided that these two
-    % conjuncts are dependant.
-    %
-:- func calculate_independent_probability(candidate_par_conjunction_internal,
-    candidate_par_conjunct_internal) = probability.
-
-calculate_independent_probability(Conjunction, Conj) = Prob :-
-    ConjA = Conjunction ^ cpci_par_conjunct_a,
-    ConjB = Conjunction ^ cpci_par_conjunct_b,
-    ConjACPC = ConjA ^ cpci_cost_percall,
-    ConjBCPC = ConjB ^ cpci_cost_percall,
-    % Determine which conjunct's perspective we're looking from then determine
-    % the probability that the other conjunct will be executing.  If this
-    % conjunct is the smaller one then the other one will always be running,
-    % otherwise divide the cost of the other one by the cost of this one.
-    ( ConjA = Conj ->
-        ( ConjACPC < ConjBCPC ->
-            Prob = certain
-        ;
-            Prob = probable(ConjBCPC / ConjACPC)
-        )
-    ;
-        ( ConjBCPC < ConjACPC ->
-            Prob = certain
-        ;
-            Prob = probable(ConjACPC / ConjBCPC)
-        )
-    ).
+parallel_amount(SeqConjs, Conj, Parallelism) :-
+    member(SeqConj, SeqConjs),
+    SeqConj = seq_conj_internal(Conjs),
+    Parallelism0 = some_parallelism(float(length(Conjs))),
+    member(Conj, Conjs),
+    sub_computation_parallelism(Parallelism0, certain, Parallelism).
 
 :- pred candidate_parallel_conjunctions_callee(
     candidate_parallel_conjunctions_opts::in, deep::in, set(proc_desc)::in,
@@ -557,6 +601,10 @@ candidate_parallel_conjunctions_callee(O
     ).
 
 %----------------------------------------------------------------------------%
+%
+% Estimate the cost of the recursive calls under the assumption that current
+% call to this procedure had a particular cost.
+%
 
 :- pred build_recursive_call_site_cost_map(deep::in, clique_proc_report::in,
     clique_ptr::in, proc_cost_csq::in, map(goal_path, cs_cost_csq)::out,
@@ -1004,6 +1052,9 @@ build_recursive_call_site_cost_map_call_
     svmap.det_insert(GoalPath, Cost, !Map).
 
 %----------------------------------------------------------------------------%
+%
+% Search for paralleliation opportunities within a procedure.
+%
 
     % Find candidate parallel conjunctions within the given procedure.
     %
@@ -1017,6 +1068,9 @@ build_recursive_call_site_cost_map_call_
 candidate_parallel_conjunctions_proc(Opts, Deep, CliqueProc,
         RecursiveCallSiteCostMap, Candidates, CandidatesByGoalPath,
         Messages) :-
+    some [!Messages] (
+        !:Messages = cord.empty,
+
     % Lookup the proc static to find the ProcLabel.
     PerfRowData = CliqueProc ^ cpr_proc_summary,
     ProcDesc = PerfRowData ^ perf_row_subject,
@@ -1026,32 +1080,48 @@ candidate_parallel_conjunctions_proc(Opt
     ( CliqueProc ^ cpr_other_proc_dynamics = [_ | _] ->
         append_message(proc(ProcLabel), 
             error_extra_proc_dynamics_in_clique_proc,
-            cord.empty, Messages0)
+                !Messages)
     ;
-        Messages0 = cord.empty
+            true
     ),
     list.foldl(add_call_site_report_to_map,
         CallSites, map.init, CallSitesMap),
     deep_get_progrep_det(Deep, ProgRep),
     ( progrep_search_proc(ProgRep, ProcLabel, ProcRep) ->
         ProcRep ^ pr_defn = ProcDefnRep,
-        ProcDefnRep ^ pdr_goal = Goal,
+            ProcDefnRep ^ pdr_goal = Goal0,
         ProcDefnRep ^ pdr_var_table = VarTable,
         Info = implicit_parallelism_info(Deep, ProgRep, Opts,
             CallSitesMap, RecursiveCallSiteCostMap, VarTable),
-        goal_get_conjunctions_worth_parallelising(Goal, empty_goal_path, Info,
-            ProcLabel, Candidates0, _, _,
-            Messages, initial_inst_map(ProcDefnRep), _),
+            goal_annotate_with_instmap(Goal0, Goal,
+                initial_inst_map(ProcDefnRep), _FinalInstMap,
+                SeenDuplicateInstantiation, _AllVars),
+            goal_get_conjunctions_worth_parallelising(Goal, empty_goal_path,
+                Info, ProcLabel, Candidates0, MessagesA),
+            !:Messages = !.Messages ++ MessagesA,
+            (
+                SeenDuplicateInstantiation =
+                    have_not_seen_duplicate_instantiation,
         list.foldl2(build_candidate_par_conjunction_maps(ProcLabel),
             Candidates0, multi_map.init, Candidates, 
             map.init, CandidatesByGoalPath)
     ;
-        % Builtin procedures cannot be found in the program representation, and
-        % cannot be parallelised either.
+                SeenDuplicateInstantiation = seen_duplicate_instantiation,
+                Candidates = multi_map.init,
+                CandidatesByGoalPath = map.init,
+                append_message(proc(ProcLabel), 
+                    notice_duplicate_instantiation(length(Candidates0)),
+                    !Messages)
+            )
+        ;
+            % Builtin procedures cannot be found in the program representation,
+            % and cannot be parallelised either.
         Candidates = multi_map.init,
         CandidatesByGoalPath = map.init,
         append_message(proc(ProcLabel), warning_cannot_lookup_proc_defn,
-            Messages0, Messages)
+                !Messages)
+        ),
+        Messages = !.Messages
     ).
 
 :- pred build_candidate_par_conjunction_maps(string_proc_label::in,
@@ -1066,335 +1136,608 @@ build_candidate_par_conjunction_maps(Pro
     GoalPath = Candidate ^ cpci_goal_path,
     multi_map.set(!.GPMap, GoalPath, Candidate, !:GPMap).
     
-:- pred goal_get_conjunctions_worth_parallelising(goal_rep::in, goal_path::in,
-    implicit_parallelism_info::in, string_proc_label::in,
+:- pred goal_get_conjunctions_worth_parallelising(goal_rep(inst_map_info)::in, 
+    goal_path::in, implicit_parallelism_info::in, string_proc_label::in,
     list(candidate_par_conjunction_internal)::out,
-    seen_duplicate_instantiation::out, maybe_call_conjunct::out,
-    cord(message)::out, inst_map::in, inst_map::out) 
-    is det.
+    cord(message)::out) is det.
 
 goal_get_conjunctions_worth_parallelising(Goal, GoalPath, Info, ProcLabel, 
-        Candidates, SeenDuplicateInstantiation, MaybeCallConjunct,
-        Messages, !InstMap) :-
-    Goal = goal_rep(GoalExpr, Detism, _),
+        Candidates, Messages) :-
+    Goal = goal_rep(GoalExpr, _, _),
     (
         (
             GoalExpr = conj_rep(Conjuncts),
-            conj_get_conjunctions_worth_parallelising(Conjuncts, GoalPath,
-                Info, ProcLabel, Candidates, SeenDuplicateInstantiation,
-                Messages, !InstMap) 
+            conj_get_conjunctions_worth_parallelising(Conjuncts, GoalPath, 1,
+                Info, ProcLabel, CandidatesA, MessagesA),
+            conj_build_candidate_conjunctions(Conjuncts, GoalPath, 
+                Info, ProcLabel, MessagesB, CandidatesB),
+            Messages = MessagesA ++ MessagesB,
+            Candidates = CandidatesA ++ CandidatesB
         ;
             GoalExpr = disj_rep(Disjuncts),
             disj_get_conjunctions_worth_parallelising(Disjuncts, GoalPath, 1,
-                Info, ProcLabel, Candidates, SeenDuplicateInstantiation,
-                Messages, !InstMap)
+                Info, ProcLabel, Candidates, Messages)
         ;
             GoalExpr = switch_rep(_, _, Cases),
             switch_case_get_conjunctions_worth_parallelising(Cases, GoalPath, 1,
-                Info, ProcLabel, Candidates, SeenDuplicateInstantiation,
-                Messages, !InstMap)
+                Info, ProcLabel, Candidates, Messages)
         ;
             GoalExpr = ite_rep(Cond, Then, Else),
             ite_get_conjunctions_worth_parallelising(Cond, Then, Else,
-                GoalPath, Info, ProcLabel, Candidates,
-                SeenDuplicateInstantiation, Messages, !InstMap)
+                GoalPath, Info, ProcLabel, Candidates, Messages)
         ;
             GoalExpr = scope_rep(SubGoal, MaybeCut),
             ScopeGoalPath = 
                 goal_path_add_at_end(GoalPath, step_scope(MaybeCut)),
             goal_get_conjunctions_worth_parallelising(SubGoal, ScopeGoalPath,
-                Info, ProcLabel, Candidates, SeenDuplicateInstantiation, _,
-                Messages, !InstMap) 
+                Info, ProcLabel, Candidates, Messages) 
         ;
             GoalExpr = negation_rep(SubGoal),
             NegGoalPath = goal_path_add_at_end(GoalPath, step_neg),
             goal_get_conjunctions_worth_parallelising(SubGoal, NegGoalPath,
-                Info, ProcLabel, Candidates, SeenDuplicateInstantiation, _,
-                Messages, !InstMap) 
-        ),
-        % TODO: Parallelising conjunctions like 
-        %   ( call1(A, B) , not call2(C, D) )
-        % may be easy to do when writing the compiler's part of the code, if so
-        % then MaybeCallAboveThreshold should probably be set from some of
-        % these non-atomic goals based on goals within them.
-        MaybeCallConjunct = non_atomic_goal 
-    ;
-        GoalExpr = atomic_goal_rep(_, _, BoundVars, AtomicGoal),
-        InstMapBeforeCall = !.InstMap,
-        % The binding of a variable may depend on any number of other
-        % variables, and recursively the variables that those depended-on
-        % variables depend upon.  
-        % Except that variables involved in control flow (switched on vars,
-        % vars in ITE conds) however this never comes up as for-now we only
-        % consider atomic goals.
-        atomic_goal_get_vars(AtomicGoal, AtomicGoalVars0),
-        list.foldl((pred(Var::in, Set0::in, Set::out) is det :-
-                ( set.remove(Set0, Var, SetPrime) ->
-                    Set = SetPrime
-                ;
-                    Set = Set0
+                Info, ProcLabel, Candidates, Messages) 
                 )
-            ), BoundVars, AtomicGoalVars0, AtomicGoalVars),
-        inst_map_ground_vars(BoundVars, AtomicGoalVars, !InstMap,
-            SeenDuplicateInstantiation),
-        maybe_call_site_conjunct(Info, GoalPath, AtomicGoal, Detism,
-            InstMapBeforeCall, !.InstMap, BoundVars, MaybeCallConjunct),
+    ;
+        GoalExpr = atomic_goal_rep(_, _, _, _),
         Messages = cord.empty,
         Candidates = []
     ).
 
-:- pred conj_get_conjunctions_worth_parallelising(list(goal_rep)::in,
-    goal_path::in, implicit_parallelism_info::in, string_proc_label::in,
+:- pred conj_get_conjunctions_worth_parallelising(
+    list(goal_rep(inst_map_info))::in, goal_path::in, int::in,
+    implicit_parallelism_info::in, string_proc_label::in,
     list(candidate_par_conjunction_internal)::out,
-    seen_duplicate_instantiation::out, cord(message)::out, 
-    inst_map::in, inst_map::out) is det.
-
-conj_get_conjunctions_worth_parallelising(Conjs, GoalPath, Info,
-        ProcLabel, Candidates, SeenDuplicateInstantiation, Messages, 
-        !InstMap) :-
-    some [!Messages] 
-    (
-        % Note: it will be better to look at each pair of conjuncts, determine
-        % if they are parallelisable (perhaps by placing middle goals in either
-        % of the parallel conjuncts to create the optimum amount of
-        % parallelism.  This will need to have an in-order representation of
-        % goals, and for each variable seen have a tree of variables it depends
-        % upon.
-        %
-        % For now consider parallelising conjuncts that separated only by other
-        % atomic goals.
-        conj_get_conjunctions_worth_parallelising_2(Conjs, GoalPath, 1, Info,
-            ProcLabel, Candidates0, CallSiteConjuncts, 
-            SeenDuplicateInstantiation, !:Messages, !InstMap),
-         
-        build_candidate_conjunctions(Info, !.InstMap, GoalPath, ProcLabel,
-            list(CallSiteConjuncts), MessagesBCC, pqueue.init, CandidatesQueue),
-        !:Messages = !.Messages ++ MessagesBCC,
-        % Pick best candidate from queue.
-        (
-            SeenDuplicateInstantiation = have_not_seen_duplicate_instantiation,
-            ( 
-                pqueue.remove(CandidatesQueue, _, Candidate, 
-                    CandidatesQueuePrime)
-            ->
-                Candidates = [Candidate | Candidates0],
-                (
-                    pqueue.length(CandidatesQueuePrime) = Length,
-                    Length > 0
-                ->
-                    append_message(goal(ProcLabel, GoalPath),
-                        notice_extra_callpairs_in_conjunction(Length), 
-                        !Messages)
-                ;
-                    true
-                ),
-                append_message(goal(ProcLabel, GoalPath),
-                    info_found_candidate_conjunction,
-                    !Messages)
-            ;
-                Candidates = Candidates0
-            )
-        ;
-            SeenDuplicateInstantiation = seen_duplicate_instantiation,
-            Candidates = Candidates0,
-            (
-                pqueue.length(CandidatesQueue) = Length,
-                Length >= 1 
-            ->
-                append_message(goal(ProcLabel, GoalPath), 
-                    notice_duplicate_instantiation(Length), 
-                    !Messages)
-            ;
-                true
-            )
-        ),
-        Messages = !.Messages
-    ).
+    cord(message)::out) is det.
 
-:- pred conj_get_conjunctions_worth_parallelising_2(list(goal_rep)::in,
-    goal_path::in, int::in, implicit_parallelism_info::in,
-    string_proc_label::in, list(candidate_par_conjunction_internal)::out,
-    cord(maybe_call_conjunct)::out, seen_duplicate_instantiation::out,
-    cord(message)::out, inst_map::in, inst_map::out) is det.
-
-conj_get_conjunctions_worth_parallelising_2([], _, _, _, _, [], cord.empty, 
-        have_not_seen_duplicate_instantiation, cord.empty, !InstMap).
-conj_get_conjunctions_worth_parallelising_2([Conj | Conjs], GoalPath,
-        ConjunctNum, Info, ProcLabel, Candidates, CallSiteConjuncts,
-        SeenDuplicateInstantiation, Messages, !InstMap) :-
+conj_get_conjunctions_worth_parallelising([], _, _, _, _, [], cord.empty).
+conj_get_conjunctions_worth_parallelising([Conj | Conjs], GoalPath,
+        ConjunctNum, Info, ProcLabel, Candidates, Messages) :-
     ConjGoalPath = goal_path_add_at_end(GoalPath, step_conj(ConjunctNum)),
     goal_get_conjunctions_worth_parallelising(Conj, ConjGoalPath, Info,
-        ProcLabel, CandidatesHead, SeenDuplicateInstantiationHead,
-        MaybeCallSiteConjunct, MessagesHead, !InstMap), 
+        ProcLabel, CandidatesHead, MessagesHead), 
     
-    conj_get_conjunctions_worth_parallelising_2(Conjs, GoalPath, ConjunctNum+1,
-        Info, ProcLabel, CandidatesTail, CallSiteConjuncts0,
-        SeenDuplicateInstantiationTail, MessagesTail, !InstMap),
+    conj_get_conjunctions_worth_parallelising(Conjs, GoalPath, ConjunctNum+1,
+        Info, ProcLabel, CandidatesTail, MessagesTail),
 
     Candidates = CandidatesHead ++ CandidatesTail,
-    Messages = MessagesHead ++ MessagesTail,
-    CallSiteConjuncts = cord.cons(MaybeCallSiteConjunct, CallSiteConjuncts0),
-    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
-        SeenDuplicateInstantiationHead,
-        SeenDuplicateInstantiationTail).
+    Messages = MessagesHead ++ MessagesTail.
 
-:- pred disj_get_conjunctions_worth_parallelising(list(goal_rep)::in,
-    goal_path::in, int::in, implicit_parallelism_info::in, 
-    string_proc_label::in, list(candidate_par_conjunction_internal)::out,
-    seen_duplicate_instantiation::out, cord(message)::out,
-    inst_map::in, inst_map::out) is det.
+:- pred disj_get_conjunctions_worth_parallelising(
+    list(goal_rep(inst_map_info))::in, goal_path::in, int::in,
+    implicit_parallelism_info::in, string_proc_label::in,
+    list(candidate_par_conjunction_internal)::out, cord(message)::out) is det.
 
-disj_get_conjunctions_worth_parallelising([], _, _, _, _, [],
-    have_not_seen_duplicate_instantiation, cord.empty, !InstMap).
+disj_get_conjunctions_worth_parallelising([], _, _, _, _, [], cord.empty).
 disj_get_conjunctions_worth_parallelising([Disj | Disjs], GoalPath, DisjNum,
-        Info, ProcLabel, Candidates, SeenDuplicateInstantiation, 
-        Messages, InstMap0, InstMap) :-
+        Info, ProcLabel, Candidates, Messages) :-
     DisjGoalPath = goal_path_add_at_end(GoalPath, step_disj(DisjNum)),
-    HeadDetism = Disj ^ goal_detism_rep,
     goal_get_conjunctions_worth_parallelising(Disj, DisjGoalPath, Info,
-        ProcLabel, HeadCandidates, HeadSeenDuplicateInstantiation, 
-        _MaybeCallConjunct, HeadMessages, InstMap0, HeadInstMap),
+        ProcLabel, HeadCandidates, HeadMessages),
     disj_get_conjunctions_worth_parallelising(Disjs, GoalPath, DisjNum + 1,
-        Info, ProcLabel, TailCandidates, TailSeenDuplicateInstantiation,
-        TailMessages, InstMap0, TailInstMap),
+        Info, ProcLabel, TailCandidates, TailMessages),
     Candidates = HeadCandidates ++ TailCandidates,
-    Messages = HeadMessages ++ TailMessages,
-    % merge_inst_map requires the detism of goals that produce both inst maps,
-    % we can create fake values that satisfy merge_inst_map easily.
-    (
-        Disjs = [],
-        TailDetism = failure_rep
-    ;
-        Disjs = [_ | _],
-        TailDetism = det_rep
-    ),
-    InstMap = merge_inst_map(HeadInstMap, HeadDetism, TailInstMap, TailDetism),
-    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
-        HeadSeenDuplicateInstantiation,
-        TailSeenDuplicateInstantiation).
+    Messages = HeadMessages ++ TailMessages.
 
-:- pred switch_case_get_conjunctions_worth_parallelising(list(case_rep)::in,
-    goal_path::in, int::in, implicit_parallelism_info::in,
-    string_proc_label::in, list(candidate_par_conjunction_internal)::out,
-    seen_duplicate_instantiation::out, cord(message)::out, 
-    inst_map::in, inst_map::out) is det.
+:- pred switch_case_get_conjunctions_worth_parallelising(
+    list(case_rep(inst_map_info))::in, goal_path::in, int::in,
+    implicit_parallelism_info::in, string_proc_label::in,
+    list(candidate_par_conjunction_internal)::out, cord(message)::out) is det.
 
 switch_case_get_conjunctions_worth_parallelising([], _, _, _, _, [],
-    have_not_seen_duplicate_instantiation, cord.empty, !InstMap).
+        cord.empty).
 switch_case_get_conjunctions_worth_parallelising([Case | Cases], GoalPath,
-        CaseNum, Info, ProcLabel, Candidates, SeenDuplicateInstantiation,
-        Messages, InstMap0, InstMap) :-
+        CaseNum, Info, ProcLabel, Candidates, Messages) :-
     Case = case_rep(_, _, Goal),
-    HeadDetism = Goal ^ goal_detism_rep,
     CaseGoalPath = goal_path_add_at_end(GoalPath, step_switch(CaseNum, no)),
     goal_get_conjunctions_worth_parallelising(Goal, CaseGoalPath, Info,
-        ProcLabel, HeadCandidates, HeadSeenDuplicateInstantiation, 
-        _MaybeCallConjs, HeadMessages, InstMap0, HeadInstMap),
+        ProcLabel, HeadCandidates, HeadMessages),
     switch_case_get_conjunctions_worth_parallelising(Cases, GoalPath, 
         CaseNum + 1, Info, ProcLabel, TailCandidates,
-        TailSeenDuplicateInstantiation, TailMessages, InstMap0, TailInstMap),
+        TailMessages),
     Candidates = HeadCandidates ++ TailCandidates,
-    Messages = HeadMessages ++ TailMessages,
-    % merge_inst_map requires the detism of goals that produce both inst maps,
-    % we can create fake values that satisfy merge_inst_map easily.
-    (
-        Cases = [],
-        TailDetism = failure_rep
-    ;
-        Cases = [_ | _],
-        TailDetism = det_rep
-    ),
-    InstMap = merge_inst_map(HeadInstMap, HeadDetism, TailInstMap, TailDetism),
-    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
-        HeadSeenDuplicateInstantiation,
-        TailSeenDuplicateInstantiation).
+    Messages = HeadMessages ++ TailMessages.
 
-:- pred ite_get_conjunctions_worth_parallelising(goal_rep::in, goal_rep::in,
-    goal_rep::in, goal_path::in, implicit_parallelism_info::in,
-    string_proc_label::in, list(candidate_par_conjunction_internal)::out,
-    seen_duplicate_instantiation::out, cord(message)::out,
-    inst_map::in, inst_map::out) is det.
+:- pred ite_get_conjunctions_worth_parallelising(goal_rep(inst_map_info)::in,
+    goal_rep(inst_map_info)::in, goal_rep(inst_map_info)::in, goal_path::in,
+    implicit_parallelism_info::in, string_proc_label::in,
+    list(candidate_par_conjunction_internal)::out, cord(message)::out) is det.
 
 ite_get_conjunctions_worth_parallelising(Cond, Then, Else, GoalPath, Info,
-        ProcLabel, Candidates, SeenDuplicateInstantiation, Messages, !InstMap) :-
+        ProcLabel, Candidates, Messages) :-
     CondGoalPath = goal_path_add_at_end(GoalPath, step_ite_cond),
     ThenGoalPath = goal_path_add_at_end(GoalPath, step_ite_then),
     ElseGoalPath = goal_path_add_at_end(GoalPath, step_ite_else),
     goal_get_conjunctions_worth_parallelising(Cond, CondGoalPath, Info,
-        ProcLabel, CondCandidates, CondSeenDuplicateInstantiation, _,
-        CondMessages, !.InstMap, PostCondInstMap),
+        ProcLabel, CondCandidates, CondMessages),
     goal_get_conjunctions_worth_parallelising(Then, ThenGoalPath, Info, 
-        ProcLabel, ThenCandidates, ThenSeenDuplicateInstantiation, _,
-        ThenMessages, PostCondInstMap, PostThenInstMap),
+        ProcLabel, ThenCandidates, ThenMessages),
     goal_get_conjunctions_worth_parallelising(Else, ElseGoalPath, Info, 
-        ProcLabel, ElseCandidates, ElseSeenDuplicateInstantiation, _, 
-        ElseMessages, PostCondInstMap, PostElseInstMap),
+        ProcLabel, ElseCandidates, ElseMessages),
     Candidates = CondCandidates ++ ThenCandidates ++ ElseCandidates,
+    Messages = CondMessages ++ ThenMessages ++ ElseMessages.
+
+    % At the end of every conjunction we call this predicate to check the list
+    % of calls we've found and make any parallelisation decisions.
+    %
+:- pred conj_build_candidate_conjunctions(list(goal_rep(inst_map_info))::in, 
+    goal_path::in, implicit_parallelism_info::in, string_proc_label::in,
+    cord(message)::out, list(candidate_par_conjunction_internal)::out)
+    is det.
+
+conj_build_candidate_conjunctions(Conjs, GoalPath, Info, ProcLabel, Messages,
+        Candidates) :-
+    some [!Messages] 
+    (
+        !:Messages = cord.empty,
+        Location = goal(ProcLabel, GoalPath),
+
+        conj_to_inner_goal_list(Conjs, GoalPath, 1, Info, [], InnerGoals, 0,
+            NumCostlyCalls),
+        ( NumCostlyCalls > 1 -> 
+            append_message(Location,
+                info_found_conjs_above_callsite_threshold(NumCostlyCalls),
+                !Messages), 
+            build_dependency_maps(InnerGoals, DependencyMaps),
+            % We don't parallelise across non-atomic goals, so split a list of
+            % inner goals into partitions where non-atomic goals seperate the
+            % partitions.
+            partition_inner_goals(Location, InnerGoals, [], _, 
+                1, _NumPartitions, 0, _, [], PartitionedInnerGoals, !Messages),
+            map(innergoals_build_candidate_conjunction(Info, 
+                    DependencyMaps, Location, GoalPath), 
+                PartitionedInnerGoals, MaybeCandidates),
+            filter_map(maybe_is_yes, MaybeCandidates, Candidates),
+            append_message(Location,
+                info_found_n_conjunctions_with_positive_speedup(
+                    length(Candidates)), !Messages)
+        ;
+            Candidates = []
+        ),
+        Messages = !.Messages
+    ).
+
+:- pred partition_inner_goals(program_location::in, 
+    list(inner_goal_internal)::in,
+    list(inner_goal_internal)::in, list(inner_goal_internal)::out,
+    int::in, int::out, int::in, int::out,
+    list(inner_goals_partition)::in, list(inner_goals_partition)::out,
+    cord(message)::in, cord(message)::out) is det.
+
+partition_inner_goals(Location, [], !Partition, !PartitionNum, !NumCostlyCalls,
+        !Partitions, !Messages) :-
+    ( !.NumCostlyCalls > 1 ->
+        Partition = 
+            inner_goals_partition(reverse(!.Partition), !.PartitionNum),
+        !:Partitions = [ Partition | !.Partitions ]
+    ;
+        true     
+    ),
+    ( !.PartitionNum \= 1 ->
+        append_message(Location,
+            info_split_conjunction_into_partitions(!.PartitionNum), !Messages)
+    ;
+        true
+    ),
+    reverse(!Partitions).
+partition_inner_goals(Location, [ IG | IGs ], !Partition, !PartitionNum,
+        !NumCostlyCalls, !Partitions, !Messages) :-
+    IGType = IG ^ igi_ig_type,
+    (
+        (
+            IGType = igt_call(_, _, _, _, _),
+            !:NumCostlyCalls = !.NumCostlyCalls + 1
+        ;
+            IGType = igt_cheap_call(_, _, _, _)
+        ;
+            IGType = igt_other_atomic_goal
+        ),
+        !:Partition = [ IG | !.Partition ]
+    ;
+        IGType = igt_non_atomic_goal,
+        ( !.NumCostlyCalls > 1 ->
+            Partition = 
+                inner_goals_partition(reverse(!.Partition), !.PartitionNum),
+            !:Partitions = [ Partition | !.Partitions ]
+        ;
+            append_message(Location,
+                notice_partition_does_not_have_costly_calls(!.PartitionNum,
+                    !.NumCostlyCalls), !Messages)
+        ),
+        !:PartitionNum = !.PartitionNum + 1,
+        !:NumCostlyCalls = 0,
+        !:Partition = [] 
+    ),
+    partition_inner_goals(Location, IGs, !Partition, !PartitionNum,
+        !NumCostlyCalls, !Partitions, !Messages).
+
+:- pred innergoals_build_candidate_conjunction(implicit_parallelism_info::in,
+    dependency_maps::in, program_location::in, goal_path::in,
+    inner_goals_partition::in, maybe(candidate_par_conjunction_internal)::out)
+    is det.
+
+innergoals_build_candidate_conjunction(_Info, DependencyMaps, Location, GoalPath,
+        InnerGoalsPartition, MaybeCandidate) :-
+    % Setting up the first parallel conjunct is a different algorithm to the
+    % latter ones, at this point we have the option of moving goals from before
+    % the first costly call to either before or during the parallel
+    % conjunction.  Executing them during the parallel conjunction can be more
+    % efficient.  However if goals within other parallel conjuncts depend on
+    % them and don't depend upon the first costly call then this would make the
+    % conjunction dependant when it could be independent.
+    inner_goals_partition(InnerGoalsList, PartNum) = InnerGoalsPartition,
+    InnerGoals = cord.from_list(InnerGoalsList),
+    find_costly_call(InnerGoals, cord.empty, GoalsBefore0, MaybeFirstCall,
+        GoalsDuringAfter),
+    (
+        MaybeFirstCall = yes(FirstCall),
+        FirstCallType = FirstCall ^ igi_ig_type,
+        (
+            FirstCallType = igt_call(_, _, _, _, FirstCallCallSite)
+        ;
+            ( FirstCallType = igt_cheap_call(_, _, _, _)
+            ; FirstCallType = igt_other_atomic_goal
+            ; FirstCallType = igt_non_atomic_goal
+            ),
+            location_to_string(Location, LocationString),
+            error(format(
+                "%sFirst call in conjunction is not a call in %s partition %d",
+                [s(this_file), s(LocationString), i(PartNum)]))
+        )
+    ;
+        MaybeFirstCall = no,
+        location_to_string(Location, LocationString),
+        error(format(
+            "%sCouldn't find first call in conjunction in %s partition %d",
+            [s(this_file), s(LocationString), i(PartNum)]))
+
+    ),
+    build_first_seq_conjunction(DependencyMaps, 
+        FirstCall ^ igi_conj_num, length(GoalsBefore0),
+        length(GoalsDuringAfter), GoalsBefore0,
+        cord.singleton(FirstCall), FirstSeqConjunction0, GoalsBefore),
+    ( get_first(FirstSeqConjunction0, FirstConj) ->
+        FirstConjNumOfFirstParConjunct = FirstConj ^ igi_conj_num
+    ;
+        error(this_file ++ "Found empty first parallel conjunct")
+    ),
+    innergoals_build_par_conjs(GoalsDuringAfter, DependencyMaps,
+        FirstConjNumOfFirstParConjunct, FirstSeqConjunction0, GoalsAfter,
+        cord.empty, ParConjsCord, 0.0, ParConjCost, 
+        conjuncts_are_independent, IsDependant),
+    
+    % Calculate Speedup.
+    foldl_pred(innergoal_calc_cost, InnerGoals, 0.0, SequentialCost),
+    foldl_pred(innergoal_calc_cost, GoalsBefore, 0.0, GoalsBeforeCost),
+    foldl_pred(innergoal_calc_cost, GoalsAfter, 0.0, GoalsAfterCost),
+    ParallelCost = GoalsBeforeCost + GoalsAfterCost + ParConjCost,
+    NumCalls = FirstCallCallSite ^ ccsr_call_site_summary ^ perf_row_calls,
+    Speedup = (SequentialCost - ParallelCost) * float(NumCalls), 
     (
-        CondSeenDuplicateInstantiation = have_not_seen_duplicate_instantiation,
-        ThenSeenDuplicateInstantiation = have_not_seen_duplicate_instantiation,
-        ElseSeenDuplicateInstantiation = have_not_seen_duplicate_instantiation
+        length(ParConjsCord) > 1, 
+        Speedup > 0.0
     ->
-        SeenDuplicateInstantiation = have_not_seen_duplicate_instantiation
+        ParConjs = list(ParConjsCord),
+        MaybeCandidate = yes(candidate_par_conjunction_internal(
+            goal_path_to_string(GoalPath), PartNum, IsDependant, ParConjs,
+            Speedup))
+    ;
+        MaybeCandidate = no
+    ).
+
+:- pred build_first_seq_conjunction(dependency_maps::in, 
+    int::in, int::in, int::in, cord(inner_goal_internal)::in, 
+    cord(inner_goal_internal)::in, cord(inner_goal_internal)::out,
+    cord(inner_goal_internal)::out) is det. 
+
+build_first_seq_conjunction(DependencyMaps, CallConjNum, NumGoalsBefore,
+        NumGoalsAfter, Goals0, !FirstSeqConjunction, GoalsBefore) :-
+    % Move over goals in reverse order.
+    ( split_last(Goals0, Goals, Goal) ->
+        ConjNum = Goal ^ igi_conj_num,
+        depends_lookup_rev(DependencyMaps, ConjNum, GoalRevDeps),
+        depends_lookup_rev(DependencyMaps, CallConjNum, CallRevDeps),
+        (
+            % If later goals depend on this goal but not on the first call.
+            member(LaterGoalNum, (CallConjNum+1)..(CallConjNum+NumGoalsAfter)),
+            (
+                member(LaterGoalNum, GoalRevDeps)
+            =>
+                not member(LaterGoalNum, CallRevDeps)
+            )
+        ->
+            % Then this goal and all those before it must be in GoalsBefore.
+            % This can be pessimistic but we're not allowed to re-order goals.
+            GoalsBefore = Goals0
+        ;
+            % This goal may be parallelised as part of the first conjunction.
+            !:FirstSeqConjunction = cons(Goal, !.FirstSeqConjunction),
+            build_first_seq_conjunction(DependencyMaps, CallConjNum,
+                NumGoalsBefore, NumGoalsAfter, Goals, !FirstSeqConjunction,
+                GoalsBefore)
+        )
+    ;
+        GoalsBefore = cord.empty
+    ).
+
+:- pred innergoals_build_par_conjs(cord(inner_goal_internal)::in,
+    dependency_maps::in, int::in,
+    cord(inner_goal_internal)::in, cord(inner_goal_internal)::out,
+    cord(seq_conj_internal)::in, cord(seq_conj_internal)::out,
+    float::in, float::out, 
+    conjuncts_are_dependant::in, conjuncts_are_dependant::out) is det.
+
+innergoals_build_par_conjs(Goals0, DependencyMaps,
+        FirstConjNumOfFirstParConjunct, CurSeqConjunction0, GoalsAfter,
+        !ParConjs, !Cost, !ConjsAreDependant) :-
+    % Find the next costly call.
+    find_costly_call(Goals0, cord.empty, GoalsBefore, MaybeNextCall, 
+        Goals),
+    (
+        MaybeNextCall = yes(NextCall),
+        
+        % XXX: This seems to work, but it's terrible, we need to implement the
+        % algorithm discussed in Zoltan's lecture slides and use my work to
+        % compute when things may be produced and consumed in dependant
+        % parallel conjunctions. -pbone
+        % XXX: At the very least we need to compute !Cost to reduce false
+        % positives.
+        CurSeqConjunction = CurSeqConjunction0,
+        NewSeqConjunction = snoc(GoalsBefore, NextCall),
+      
+        % Has the conjunction become dependant.
+        map_pred(ig_get_conj_num, CurSeqConjunction, CurSeqConjNumsCord),
+        CurSeqConjNumsSet = set(list(CurSeqConjNumsCord)),
+        map_pred(ig_get_conj_num, NewSeqConjunction, NewSeqConjNumsCord),
+        (
+            !.ConjsAreDependant = conjuncts_are_independent,
+            member(NewSeqConjNum, NewSeqConjNumsCord),
+            depends_lookup(DependencyMaps, NewSeqConjNum, Dependencies),
+            intersect(Dependencies, CurSeqConjNumsSet, Intersection),
+            not set.empty(Intersection)
+        ->
+            !:ConjsAreDependant = conjuncts_are_dependant
     ;
-        SeenDuplicateInstantiation = seen_duplicate_instantiation
+            true
     ),
-    Messages = CondMessages ++ ThenMessages ++ ElseMessages,
-    ThenDetism = Then ^ goal_detism_rep,
-    ElseDetism = Else ^ goal_detism_rep,
-    !:InstMap = merge_inst_map(PostThenInstMap, ThenDetism, 
-        PostElseInstMap, ElseDetism).
 
-    % This type represents a goal, if the goal is a call extra information used
-    % for parallelising it with another call is provided.
-    %
-    % This is similar to candidate_parallel_conjunct, except that it stores
-    % information that's used for the rest of the implicit parallelism
-    % analysis.  It must contain information for the following.
-    %
-    %   - Average cost information for this call site,
-    %
-    %   - Enough information to resolve the procedure call, (detism and
-    %     argument modes).
-    %
-    %   - Information that can be used to determine if this is part of a
-    %     dependant conjunction, and its role in the dependant conjunction.
-    %
-    %   - Enough information so that a candidate_par_conjuct structure can be
-    %     constructed from it and the deep profiling info.
-    %
-:- type maybe_call_conjunct 
-    --->    call(
-                mccc_callee             :: maybe(pair(string, string)),
-                mccc_detism             :: detism_rep,
-                mccc_args               :: list(var_mode_and_use),
-                mccc_call_site          :: clique_call_site_report
+        SeqConjunction = seq_conj_internal(list(CurSeqConjunction)),
+        !:ParConjs = snoc(!.ParConjs, SeqConjunction),
+        innergoals_build_par_conjs(Goals, DependencyMaps,
+            FirstConjNumOfFirstParConjunct, NewSeqConjunction, GoalsAfter,
+            !ParConjs, !Cost, !ConjsAreDependant)
+    ;
+        MaybeNextCall = no,
+        ( cord.get_first(CurSeqConjunction0, FirstGoalInLastConjunct) ->
+            FirstConjNumOfLastParConjunct = 
+                FirstGoalInLastConjunct ^ igi_conj_num
+        ; 
+            error(this_file ++ " empty parallel conjunct")
+        ),
+        % Because this is the last parallel conjunction we have the
+        % option of putting some remaining goals into the parallel conjunction
+        % as part of the last parallel conjunct.
+        sorted_list_to_set(
+            FirstConjNumOfFirstParConjunct..(FirstConjNumOfLastParConjunct-1),
+            GoalsInOtherParConjuncts),
+        build_last_seq_conjunction(Goals0, DependencyMaps,
+            GoalsInOtherParConjuncts, GoalsAfter, 
+            cord.empty, GoalsInLastConjunct),
+        SeqConjunction = seq_conj_internal(list(
+            CurSeqConjunction0 ++ GoalsInLastConjunct)),
+        !:ParConjs = snoc(!.ParConjs, SeqConjunction)
+    ).
+
+:- pred find_costly_call(cord(inner_goal_internal)::in,
+    cord(inner_goal_internal)::in, cord(inner_goal_internal)::out,
+    maybe(inner_goal_internal)::out,
+    cord(inner_goal_internal)::out) is det.
+
+find_costly_call(Goals, !GoalsBefore, MaybeCall, GoalsAfter) :-
+    ( head_tail(Goals, Goal, GoalsTail) ->
+        GoalType = Goal ^ igi_ig_type,
+        ( 
+            GoalType = igt_call(_, _, _, _, _),
+            MaybeCall = yes(Goal),
+            GoalsAfter = GoalsTail
+        ;
+            ( GoalType = igt_cheap_call(_, _, _, _)
+            ; GoalType = igt_other_atomic_goal
+            ),
+            !:GoalsBefore = snoc(!.GoalsBefore, Goal),
+            find_costly_call(GoalsTail, !GoalsBefore, MaybeCall, GoalsAfter)
+        ;
+            GoalType = igt_non_atomic_goal,
+            error(this_file ++ "Found non-atomic goal")
+        )
+    ;
+        MaybeCall = no,
+        GoalsAfter = cord.empty
+    ).
+
+:- pred build_last_seq_conjunction(cord(inner_goal_internal)::in, 
+    dependency_maps::in, set(int)::in, cord(inner_goal_internal)::out,
+    cord(inner_goal_internal)::in, cord(inner_goal_internal)::out) is det.
+
+build_last_seq_conjunction(Goals0, DependencyMaps, GoalsInOtherParConjuncts, 
+        GoalsAfter, !GoalsInLastConjunct) :-
+    ( head_tail(Goals0, Goal, Goals) ->
+        ConjNum = Goal ^ igi_conj_num,
+        depends_lookup(DependencyMaps, ConjNum, Dependencies),
+        intersect(Dependencies, GoalsInOtherParConjuncts, Intersection),
+        (
+            % The goal is dependant apon a goal in a previous parallel conjunct.
+            not set.empty(Intersection) => 
+            % But not dependant upon a goal in the current parallel conjunct.
+            (
+                map_pred(ig_get_conj_num, !.GoalsInLastConjunct, 
+                    ConjNumsInLastConjunct),
+                intersect(Dependencies, set(list(ConjNumsInLastConjunct)),
+                    Intersection2),
+                set.empty(Intersection2)
             )
-    ;       trivial_atomic_goal(
-                mccag_detism            :: detism_rep,
-                mccag_bound_vars        :: list(var_rep)
+        ->
+            % This goal and all those after it must be placed after the
+            % conjunction.
+            GoalsAfter = Goals0
+        ;
+            % This goal does not depend on goals in other parallel conjuncts,
+            % we can parallelise it here without introducing an extra future.
+            !:GoalsInLastConjunct = snoc(!.GoalsInLastConjunct, Goal),
+            build_last_seq_conjunction(Goals, DependencyMaps,
+                GoalsInOtherParConjuncts, GoalsAfter, !GoalsInLastConjunct)
             )
-    ;       non_atomic_goal.
+    ;
+        GoalsAfter = cord.empty
+    ).
 
-:- inst call
-    --->    call(ground, ground, ground, ground).
+:- pred ig_get_conj_num(inner_goal_internal::in, int::out) is det.
 
-    % A variable, it's mode and it's usage in the callee.  The mode information
-    % is also summarised within the variable use information.
-    %
-:- type var_mode_and_use
-    --->    var_mode_and_use(
-                vmu_var                 :: var_rep,
-                vmu_mode                :: var_mode_rep,
-                vmu_use                 :: var_use_info
+ig_get_conj_num(IG, IG ^ igi_conj_num).
+
+:- pred innergoal_calc_cost(inner_goal_internal::in, float::in, float::out) 
+    is det.
+
+innergoal_calc_cost(Goal, !Cost) :-
+    GoalType = Goal ^ igi_ig_type,
+    (
+        GoalType = igt_call(_, _, CostPercall, _, _),
+        !:Cost = !.Cost + CostPercall
+    ; 
+        GoalType = igt_cheap_call(_, _, _, Cost),
+        ( cs_cost_get_calls(Cost) > 0.0 ->
+            !:Cost = !.Cost + cs_cost_get_percall(Cost)
+        ;
+            % Goals that are never called have no cost
+            true
+        )
+    ;
+        GoalType = igt_other_atomic_goal
+    ;
+        GoalType = igt_non_atomic_goal,
+        error(this_file ++ "unexpected non atomic goal")
             ).
 
-:- pred maybe_call_site_conjunct(implicit_parallelism_info::in, goal_path::in,
-    atomic_goal_rep::in, detism_rep::in, inst_map::in, inst_map::in,
-    list(var_rep)::in, maybe_call_conjunct::out) is det.
+:- type dependency_maps
+    ---> dependency_maps(
+            dm_depends_on           :: map(int, set(int)),
+                % This map maps from conjunct numbers to the conjunct numbers
+                % of goals that they depend upon.
+
+            dm_is_depended_on_by    :: map(int, set(int))
+                % This is the reverse dependency map.  It maps from a dependee
+                % to conjunct numbers that depend on this goal.
+         ).
+
+:- pred build_dependency_maps(list(inner_goal_internal)::in, 
+    dependency_maps::out) is det.
+
+build_dependency_maps(InnerGoals, Maps) :-
+    length(InnerGoals, InnerGoalsLen),
+    % Both maps are initialised equally.
+    fold_up(insert_empty_set, 1, InnerGoalsLen, map.init, InitialisedMap), 
+    build_dependency_map(InnerGoals, 1, map.init, _VarDepMap, 
+        InitialisedMap, Map, InitialisedMap, RevMap),
+    Maps = dependency_maps(Map, RevMap).
+
+:- pred depends_lookup(dependency_maps::in, int::in, set(int)::out) is det.
+
+depends_lookup(DependencyMaps, GoalNum, Dependencies) :-
+    Map = DependencyMaps ^ dm_depends_on,
+    lookup(Map, GoalNum, Dependencies).
+
+:- pred depends_lookup_rev(dependency_maps::in, int::in, set(int)::out) is det.
+
+depends_lookup_rev(DependencyMaps, GoalNum, Dependencies) :-
+    RevMap = DependencyMaps ^ dm_is_depended_on_by,
+    lookup(RevMap, GoalNum, Dependencies).
+
+:- pred insert_empty_set(int::in, 
+    map(int, set(int))::in, map(int, set(int))::out) is det.
+
+insert_empty_set(K, !Map) :-
+    svmap.det_insert(K, set.init, !Map).
+
+:- pred build_dependency_map(list(inner_goal_internal)::in, int::in, 
+    map(var_rep, set(int))::in, map(var_rep, set(int))::out,
+    map(int, set(int))::in, map(int, set(int))::out,
+    map(int, set(int))::in, map(int, set(int))::out) is det.
+
+build_dependency_map([], _ConjNum, !VarDepMap, !Map, !RevMap).
+build_dependency_map([IG | IGs], ConjNum, !VarDepMap, !Map, !RevMap) :-
+    InstMapInfo = IG ^ igi_inst_map_info,
+    
+    % For each variable consumed by a goal we find out which goals instantiate
+    % that variable and add them as it's dependencies along with their
+    % dependencies.  NOTE: We only consider variables that are read
+    % and not those that are set.  This is safe because we only bother
+    % analysing single assignment code.
+    AllVars = InstMapInfo ^ im_all_vars,
+    set.difference(AllVars, InstVars, RefedVars), 
+    list.map((pred(RefedVar::in, DepConjsI::out) is det :-
+        map.search(!.VarDepMap, RefedVar, DepConjsPrime) -> 
+            DepConjsI = DepConjsPrime
+        ;
+            % If we consume a variable that we don't know the producer of, it
+            % may be a parameter to the procedure or have been produced by a
+            % goal outside of this conjunction.  Add an empty set of
+            % dependencies.
+            set.init(DepConjsI)
+        ), to_sorted_list(RefedVars), DepConjss),
+    DepConjs = union_list(DepConjss),
+    fold(transitive_map_insert(ConjNum), DepConjs, !Map),
+    fold((pred(K::in, MapI0::in, MapI::out) is det :-
+            transitive_map_insert(K, ConjNum, MapI0, MapI)
+        ), DepConjs, !RevMap), 
+    
+    % For each variable instantiated by this goal add it to the VarDepMap with
+    % this goal as it's instantiator.  That is a maping from the variable to
+    % the conj num.  The var to conjnum map is not transitive.
+    calc_inst_map_delta(InstMapInfo ^ im_before, InstMapInfo ^ im_after, 
+        InstMapDelta),
+    inst_map_delta_get_var_set(InstMapDelta, InstVars),
+    fold(add_var_to_var_dep_map(ConjNum), InstVars, !VarDepMap),
+
+    build_dependency_map(IGs, ConjNum + 1, !VarDepMap, !Map, !RevMap).
+
+:- pred add_var_to_var_dep_map(int::in, var_rep::in, 
+    map(var_rep, set(int))::in, map(var_rep, set(int))::out) is det. 
+
+add_var_to_var_dep_map(ConjNum, Var, !VarDepMap) :-
+    ( map.search(!.VarDepMap, Var, ConjNums0) ->
+        % This is a multiple instantiation.
+        svmap.det_update(Var, insert(ConjNums0, ConjNum), !VarDepMap)
+    ;
+        singleton_set(ConjNums, ConjNum),
+        svmap.det_insert(Var, ConjNums, !VarDepMap)
+    ).
+
+    % Check if it is appropriate to parallelise this call.  That is it must be
+    % model_det and have a cost above the call site cost threshold.
+    %
+:- pred can_parallelise_call(implicit_parallelism_info::in,
+    detism_rep::in, clique_call_site_report::in) is semidet.
+
+can_parallelise_call(Info, Detism, CallSiteReport) :-
+    ( Detism = det_rep
+    ; Detism = cc_multidet_rep ),
+    CallSiteCost = get_call_site_cost(Info, CallSiteReport),
+    ( cs_cost_get_calls(CallSiteCost) > 0.0 ->
+        % This is conditional so that we can gauretee that it never causes a
+        % divide by zero error,
+        PercallCost = cs_cost_get_percall(CallSiteCost),
+        PercallCost > float(Info ^ ipi_opts ^ cpc_call_site_threshold)
+    ;
+        fail 
+    ).
 
-maybe_call_site_conjunct(Info, GoalPath, AtomicGoal, Detism,
-        InstMapBefore, InstMapAfter, BoundVars, MaybeCallConjunct) :-
+:- pred maybe_costly_call(implicit_parallelism_info::in, goal_path::in,
+    atomic_goal_rep::in, detism_rep::in, inst_map_info::in,
+    inner_goal_type::out(igt_atomic_goal)) is det.
+
+maybe_costly_call(Info, GoalPath, AtomicGoal, Detism,
+        InstMapInfo, InnerGoalType) :-
+    InstMapBefore = InstMapInfo ^ im_before,
+    InstMapAfter = InstMapInfo ^ im_after,
     (
         ( AtomicGoal = unify_construct_rep(_, _, _)
         ; AtomicGoal = unify_deconstruct_rep(_, _, _)
@@ -1408,7 +1751,7 @@ maybe_call_site_conjunct(Info, GoalPath,
         ; AtomicGoal = builtin_call_rep(_, _, _)
         ; AtomicGoal = event_call_rep(_, _)
         ),
-        MaybeCallConjunct = trivial_atomic_goal(Detism, BoundVars) 
+        InnerGoalType = igt_other_atomic_goal 
     ;
         ( AtomicGoal = higher_order_call_rep(_, Args)
         ; AtomicGoal = method_call_rep(_, _, Args)
@@ -1418,10 +1761,10 @@ maybe_call_site_conjunct(Info, GoalPath,
             ( AtomicGoal = higher_order_call_rep(_, _)
             ; AtomicGoal = method_call_rep(_, _, _)
             ),
-            MaybeCallee = no
+            Callee = unknown_callee 
         ; 
             AtomicGoal = plain_call_rep(ModuleName, CalleeName, _),
-            MaybeCallee = yes(ModuleName - CalleeName)
+            Callee = named_callee(ModuleName, CalleeName)
         ),
         map.lookup(Info ^ ipi_call_sites, GoalPath, CallSite),
         % Lookup var use information.
@@ -1450,8 +1793,18 @@ maybe_call_site_conjunct(Info, GoalPath,
                         VarUseInfo)
                 ), Args, VarModeAndUses)
         ),
-        MaybeCallConjunct = 
-            call(MaybeCallee, Detism, VarModeAndUses, CallSite)
+        map(maybe_search_var_name(Info ^ ipi_var_table), Args, Vars),
+
+        ( can_parallelise_call(Info, Detism, CallSite) ->
+            CostPercall = cs_cost_get_percall(get_call_site_cost(Info,
+                CallSite)),
+            InnerGoalType = 
+                igt_call(Callee, Vars, CostPercall, VarModeAndUses, CallSite)
+        ;
+            CallSiteCost = get_call_site_cost(Info, CallSite),
+            InnerGoalType = igt_cheap_call(Callee, Vars, VarModeAndUses,
+                CallSiteCost)
+        )
     ).
 
 :- pred var_get_mode(inst_map::in, inst_map::in, var_rep::in, var_mode_rep::out)
@@ -1462,145 +1815,58 @@ var_get_mode(InstMapBefore, InstMapAfter
     inst_map_get(InstMapAfter, VarRep, InstAfter, _),
     VarModeRep = var_mode_rep(InstBefore, InstAfter).
 
-    % Note: this runs in quadratic time.
+    % Transform a conjunction of goals into a list of inner goals..
     %
-:- pred build_candidate_conjunctions(implicit_parallelism_info::in,
-    inst_map::in, goal_path::in, string_proc_label::in, 
-    list(maybe_call_conjunct)::in, cord(message)::out,
-    pqueue(float, candidate_par_conjunction_internal)::in, 
-    pqueue(float, candidate_par_conjunction_internal)::out) is det.
-
-build_candidate_conjunctions(_, _, _, _, [], cord.empty, !Candidates).
-build_candidate_conjunctions(Info, InstMap, GoalPath, ProcLabel,
-        [MaybeCall | MaybeCalls], Messages, !Candidates) :-
-    (
-        MaybeCall = call(_, _, _, CallSiteReport),
-        CallSiteCost = get_call_site_cost(Info, CallSiteReport),
+    % The results are returned in the order that they appear.
+    %
+:- pred conj_to_inner_goal_list(list(goal_rep(inst_map_info))::in,
+    goal_path::in, int::in, implicit_parallelism_info::in,
+    list(inner_goal_internal)::in, list(inner_goal_internal)::out,
+    int::in, int::out) is det.
+
+conj_to_inner_goal_list([], _, _, _, !InnerGoals, !NumCostlyCalls) :-
+    list.reverse(!InnerGoals).
+
+conj_to_inner_goal_list([Goal | Goals], GoalPath0, ConjNum, Info,
+        !InnerGoals, !NumCostlyCalls) :-
+    Goal = goal_rep(GoalExpr, Detism, InstMapInfo),
         (
-            ( cs_cost_get_calls(CallSiteCost) > 0.0 ->
-                PercallCost = cs_cost_get_percall(CallSiteCost),
-                PercallCost > float(Info ^ ipi_opts ^ cpc_call_site_threshold)
-            ;
-                fail 
-            )
-        ->
-            % This conjunction is a call and is expensive enough to
-            % parallelise, find some later conjunct to parallelise against it.
-            build_candidate_conjunctions_2(Info, InstMap, GoalPath, ProcLabel,
-                MaybeCall, cord.empty, MaybeCalls, Messages0, !Candidates)
-            % XXX: pick the most expensive non-overlapping candidates from the
-            % result.
+        ( GoalExpr = conj_rep(_)
+        ; GoalExpr = disj_rep(_)
+        ; GoalExpr = switch_rep(_, _, _)
+        ; GoalExpr = ite_rep(_, _, _)
+        ; GoalExpr = negation_rep(_)
+        ; GoalExpr = scope_rep(_, _)
+        ),
+        % XXX: We my consider lifting calls out of non-atomic goals so that
+        % they can be parallelised,  or parallelising the whole non-atomic
+        % goal.
+        InnerGoalType = igt_non_atomic_goal
         ;
-            Messages0 = cord.empty,
-            trace [compile_time(flag("debug_cpc_search")), io(!IO)]
-                io.format("D: Call too cheap: %s %s %s\n", 
-                    [s(string(ProcLabel)), 
-                     s(goal_path_to_string(CallSiteReport 
-                        ^ ccsr_call_site_summary 
-                        ^ perf_row_subject 
-                        ^ csdesc_goal_path)),
-                     s(string(CallSiteCost))], !IO)
-        )
-    ;
-        MaybeCall = non_atomic_goal,
-        Messages0 = cord.empty
-    ;
-        MaybeCall = trivial_atomic_goal(_, _),
-        Messages0 = cord.empty
-    ),
-    build_candidate_conjunctions(Info, InstMap, GoalPath, ProcLabel, MaybeCalls,
-        MessagesTail, !Candidates),
-    Messages = Messages0 ++ MessagesTail.
-
-:- pred build_candidate_conjunctions_2(implicit_parallelism_info::in,
-    inst_map::in, goal_path::in, string_proc_label::in, 
-    maybe_call_conjunct::in(call), cord(maybe_call_conjunct)::in,
-    list(maybe_call_conjunct)::in, cord(message)::out,
-    pqueue(float, candidate_par_conjunction_internal)::in, 
-    pqueue(float, candidate_par_conjunction_internal)::out) is det.
-
-build_candidate_conjunctions_2(_, _, _, _, _, _, [], cord.empty, !Candidates).
-build_candidate_conjunctions_2(Info, InstMap, GoalPath, ProcLabel, CallA,
-        IntermediateGoals, [MaybeCall | MaybeCalls], Messages, !Candidates) :-
-    (
-        some [!Messages]
+        GoalExpr = atomic_goal_rep(_Context, _Line, _BoundVars, AtomicGoal),
+        GoalPath = goal_path_add_at_end(GoalPath0, step_conj(ConjNum)),
+        maybe_costly_call(Info, GoalPath, AtomicGoal, Detism,
+            InstMapInfo, InnerGoalType),
         (
-            MaybeCall = call(_, _, _, CallSiteReport),
-            !:Messages = cord.empty,
-            CallB = MaybeCall,
-            Cost = cs_cost_get_percall(
-                get_call_site_cost(Info, CallSiteReport)),
-            ( Cost > float(Info ^ ipi_opts ^ cpc_call_site_threshold) ->
-                % This conjunct is a call and is expensive enough to
-                % parallelise.
-                are_conjuncts_dependant(CallA, CallB, InstMap, Dependence),
-                (
-                    Dependence = conjuncts_are_dependant(DepVars),
-                    compute_optimal_dependant_parallelisation(Info, 
-                        CallA, CallB, DepVars, IntermediateGoals, InstMap,
-                        CPCA, CPCB, Speedup, CODPMessages),
-                    !:Messages = !.Messages ++ CODPMessages
-                ;
-                    Dependence = conjuncts_are_independent,
-                    compute_independent_parallelisation_speedup(Info, 
-                        CallA, CallB, CPCA, CPCB, Speedup)
-                ),
-                % XXX: This threshold should be configurable.
-                ( Speedup > 0.0 ->
-                    ( length(IntermediateGoals) = 0 -> 
-                        GoalPathString = goal_path_to_string(GoalPath),
-                        Candidate = candidate_par_conjunction_internal(
-                            GoalPathString, CPCA, CPCB, Dependence, Speedup),
-                        % So that the candidates with the greatest speedup are
-                        % removed first multiply speedup by -1.0.
-                        pqueue.insert(!.Candidates, Speedup * -1.0, Candidate,
-                            !:Candidates)
+            InnerGoalType = igt_call(_, _, _, _, _),
+            !:NumCostlyCalls = !.NumCostlyCalls + 1
                     ;
-                        append_message(goal(ProcLabel, GoalPath),
-                            notice_candidate_callpairs_not_adjacent,
-                            !Messages)
-                    )
+            InnerGoalType = igt_cheap_call(_, _, _, _)
                 ;
-                    true
+            InnerGoalType = igt_other_atomic_goal
                 )
-            ;
-                % Don't recurse here, we don't parallelise over call goals.
-                append_message(goal(ProcLabel, GoalPath),
-                    notice_cannot_parallelise_over_cheap_call_goal, !Messages)
             ),
-            Messages = !.Messages
-        )
-    ;
-        MaybeCall = trivial_atomic_goal(_, _),
-        build_candidate_conjunctions_2(Info, InstMap, GoalPath, ProcLabel, 
-            CallA, cord.snoc(IntermediateGoals, MaybeCall), MaybeCalls,
-            Messages, !Candidates)
-    ;
-        MaybeCall = non_atomic_goal,
-        % Don't recurse in this case, we don't parallelise over non-atomic
-        % goals yet.
-        append_message(goal(ProcLabel, GoalPath),
-            notice_cannot_parallelise_over_nonatomic_goal,
-            cord.empty, Messages)
-    ).
-
-:- pred are_conjuncts_dependant(maybe_call_conjunct::in(call),
-    maybe_call_conjunct::in(call), inst_map::in, conjuncts_are_dependant::out)
-    is det.
-
-are_conjuncts_dependant(CallA, CallB, InstMap, Dependence) :-
-    % Conjuncts are dependant if there exists an input variable in CallB that
-    % is made ground by CallA or depends upon a variable made ground by CallA.
-    list.foldl(add_output_var_to_set, CallA ^ mccc_args, 
-        set.init, CallAOutputs),
-    list.foldl(are_conjuncts_dependant_var(CallAOutputs, InstMap), 
-        CallB ^ mccc_args, set.init, DepVars),
-    ( set.empty(DepVars) ->
-        Dependence = conjuncts_are_independent
-    ;
-        Dependence = conjuncts_are_dependant(DepVars)
-    ).
+    InnerGoal = inner_goal_internal(InnerGoalType, Detism, ConjNum,
+        InstMapInfo),
+    !:InnerGoals = [InnerGoal | !.InnerGoals],
+    conj_to_inner_goal_list(Goals, GoalPath0, ConjNum+1, Info, 
+        !InnerGoals, !NumCostlyCalls).
 
+    % are_conjuncts_dependant(CallOutputs, InstMap, VarModeAndUse, !DepVars),
+    %
+    % Determine if a variables depends on an output from an earlier call either
+    % directly or indirectly.  If it does add it to the DepVars set.
+    %
 :- pred are_conjuncts_dependant_var(set(var_rep)::in, inst_map::in,
     var_mode_and_use::in, set(var_rep)::in, set(var_rep)::out) is det.
 
@@ -1658,119 +1924,6 @@ get_call_site_cost(Info, CallSite) = Cos
         Cost = build_cs_cost_csq(Calls, float(TotalCost))
     ).
 
-:- pred compute_independent_parallelisation_speedup(
-    implicit_parallelism_info::in, 
-    maybe_call_conjunct::in(call), maybe_call_conjunct::in(call),
-    candidate_par_conjunct_internal::out,
-    candidate_par_conjunct_internal::out,
-    float::out) is det.
-
-compute_independent_parallelisation_speedup(Info, CallA, CallB, 
-        CPCA, CPCB, Speedup) :-
-    CostA = cs_cost_get_percall(
-        get_call_site_cost(Info, CallA ^ mccc_call_site)),
-    CostB = cs_cost_get_percall(
-        get_call_site_cost(Info, CallB ^ mccc_call_site)),
-    SequentialCost = CostA + CostB,
-    ParallelCost = max(CostA, CostB) + 
-        float(Info ^ ipi_opts ^ cpc_sparking_cost),
-    Speedup = SequentialCost - ParallelCost,
-    call_site_conj_to_candidate_par_conjunct(Info, CallA, CPCA),
-    call_site_conj_to_candidate_par_conjunct(Info, CallB, CPCB).
-
-:- pred compute_optimal_dependant_parallelisation(
-    implicit_parallelism_info::in,
-    maybe_call_conjunct::in(call), maybe_call_conjunct::in(call),
-    set(var_rep)::in, cord(maybe_call_conjunct)::in, inst_map::in,
-    candidate_par_conjunct_internal::out,
-    candidate_par_conjunct_internal::out,
-    float::out, cord(message)::out) is det.
-
-compute_optimal_dependant_parallelisation(Info, CallA, CallB,
-        DepVars, _IntermediateGoals, InstMap, CPCA, CPCB,
-        Speedup, Messages) :-
-    CostA = cs_cost_get_percall(
-        get_call_site_cost(Info, CallA ^ mccc_call_site)),
-    CostB = cs_cost_get_percall(
-        get_call_site_cost(Info, CallB ^ mccc_call_site)),
-    SequentialCost = CostA + CostB,
-    ( singleton_set(DepVars, DepVar) ->
-        % Only parallelise conjunctions with a single dependant variable for
-        % now.
-        ( get_var_use_from_args(CallB ^ mccc_args, DepVar, DepVarConsume) ->
-            DepVarConsume = var_use_info(CostUntilConsume, _),
-            (
-                get_var_use_from_args(CallA ^ mccc_args, DepVar,
-                    DepVarProduce)
-            ->
-                DepVarProduce = var_use_info(CostUntilProduction, _),
-                CostBeforeProduction = 
-                    cost_until_to_cost_since_start(CostUntilProduction, CostA),
-                CostBeforeConsume = 
-                    cost_until_to_cost_since_start(CostUntilConsume, CostB),
-                CostAfterConsume = 
-                    cost_until_to_cost_before_end(CostUntilConsume, CostB)
-            ;
-                inst_map_get_var_deps(InstMap, DepVar, DepVarDeps),
-                set.fold(get_var_use_add_to_queue(CallA ^ mccc_args), 
-                    DepVarDeps, pqueue.init, ProduceDepVarQueue),
-                ( 
-                    pqueue.remove(ProduceDepVarQueue, _,
-                        CostUntilProductionPrime, _)
-                ->
-                    CostUntilProduction = CostUntilProductionPrime
-                ;
-                    error(this_file ++ 
-                        "Expected to find at least one dependant variable")
-                ),
-                CostBeforeProduction0 = 
-                    cost_until_to_cost_since_start(CostUntilProduction, CostA),
-                CostAfterProduction0 = 
-                    cost_until_to_cost_before_end(CostUntilProduction, CostA),
-                CostBeforeConsume0 = 
-                    cost_until_to_cost_since_start(CostUntilConsume, CostB),
-                CostAfterConsume0 = 
-                    cost_until_to_cost_before_end(CostUntilConsume, CostB),
-                % Compare time before consume vs time after production, the
-                % lesser one should have the unifications added to it.  This
-                % maximises the amount of parallelism.
-                ( CostBeforeConsume0 > CostAfterProduction0 ->
-                    CostBeforeProduction = CostBeforeProduction0,
-                    CostBeforeConsume = CostA,
-                    CostAfterConsume = 0.0
-                ;
-                    CostBeforeProduction = 0.0,
-                    CostBeforeConsume = CostA - CostAfterConsume0,
-                    CostAfterConsume = CostAfterConsume0 
-                )
-            ),
-            SparkingCost = float(Info ^ ipi_opts ^ cpc_sparking_cost),
-            LockingCost = float(Info ^ ipi_opts ^ cpc_locking_cost),
-            ParallelCost = max(CostA, 
-                    max(CostBeforeProduction, CostBeforeConsume) 
-                        + CostAfterConsume) 
-                + SparkingCost + LockingCost,
-            Speedup = SequentialCost - ParallelCost
-        ;
-            error("Dependant var not in consumer's arguments")
-        ),
-        Messages = cord.empty
-    ;
-        % Post a notice saying that we tried to parallelise this but gave up.
-        CallSiteDesc = 
-            CallA ^ mccc_call_site ^ ccsr_call_site_summary ^ perf_row_subject,
-        PSPtr = CallSiteDesc ^ csdesc_container,
-        deep_lookup_proc_statics(Info ^ ipi_deep, PSPtr, ProcStatic),
-        ProcLabel = ProcStatic ^ ps_id,
-        GoalPath = CallSiteDesc ^ csdesc_goal_path,
-        append_message(goal(ProcLabel, GoalPath), 
-            notice_callpair_has_more_than_one_dependant_var,
-            cord.empty, Messages),
-        Speedup = -1.0
-    ),
-    call_site_conj_to_candidate_par_conjunct(Info, CallA, CPCA),
-    call_site_conj_to_candidate_par_conjunct(Info, CallB, CPCB).
-
 :- pred get_var_use_from_args(list(var_mode_and_use)::in, var_rep::in, 
     var_use_info::out) is semidet.
 
@@ -1798,36 +1951,205 @@ get_var_use_add_to_queue(VarsModeAndUse,
         true
     ).
 
-:- pred call_site_conj_to_candidate_par_conjunct(
-    implicit_parallelism_info::in, maybe_call_conjunct::in(call),
-    candidate_par_conjunct_internal::out) is det.
-
-call_site_conj_to_candidate_par_conjunct(Info, Call, CPC) :-
-    Call = call(MaybeCallee, _Detism, Args, CliqueCallSiteReport),
-    VarTable = Info ^ ipi_var_table,
-    list.map(var_mode_use_to_var_in_par_conj(VarTable), Args, Vars),
-    Cost = cs_cost_get_percall(get_call_site_cost(Info, CliqueCallSiteReport)),
-    GoalPath = CliqueCallSiteReport ^ ccsr_call_site_summary ^ perf_row_subject 
-        ^ csdesc_goal_path,
-    ( step_conj(ConjNumPrime) = goal_path_get_last(GoalPath) ->
-        ConjNum = ConjNumPrime
+%----------------------------------------------------------------------------%
+%
+% Annotate a goal with instantiation information.
+%
+
+    % XXX: This is poor, we need to associate an inst map delta with each goal
+    % rather than a pair of inst maps, without this some information cannot be
+    % recovered when trying to build the inst map, for example duplicate
+    % instantiations can't be recovered properly.
+:- type inst_map_info
+    --->    inst_map_info(
+                im_before           :: inst_map,
+                    % The inst map before this goal is executed.
+
+                im_after            :: inst_map,
+                    % The inst map after this goal was executed.
+
+                im_all_vars         :: set(var_rep)
+                    % Variables referenced by this goal, both consumed and
+                    % produced.
+            ).
+
+    % Note: It may be useful to add other annotations such as goal path or cost 
+    % information.
+    %
+    % SeenDuplicateInstiation is used to assert that we're analysing single
+    % assignment code only.
+    %
+    % Vars is the set of variables used by this goal, both consumed and
+    % produced.
+    %
+:- pred goal_annotate_with_instmap(goal_rep::in, goal_rep(inst_map_info)::out, 
+    inst_map::in, inst_map::out, seen_duplicate_instantiation::out, 
+    set(var_rep)::out) is det.
+
+goal_annotate_with_instmap(Goal0, Goal, !InstMap, SeenDuplicateInstantiation,
+        Vars) :-
+    Goal0 = goal_rep(GoalExpr0, Detism, _),
+    InstMapBefore = !.InstMap,
+    (
+        GoalExpr0 = conj_rep(Conjs0),
+        conj_annotate_with_instmap(Conjs0, Conjs, !InstMap, 
+            SeenDuplicateInstantiation, Vars),
+        GoalExpr = conj_rep(Conjs)
+    ;
+        GoalExpr0 = disj_rep(Disjs0),
+        disj_annotate_with_instmap(Disjs0, Disjs, !InstMap,
+            SeenDuplicateInstantiation, Vars),
+        GoalExpr = disj_rep(Disjs)
+    ;
+        GoalExpr0 = switch_rep(Var, CanFail, Cases0),
+        switch_annotate_with_instmap(Cases0, Cases, !InstMap,
+            SeenDuplicateInstantiation, Vars0),
+        set.insert(Vars0, Var, Vars),
+        GoalExpr = switch_rep(Var, CanFail, Cases)
+    ;
+        GoalExpr0 = ite_rep(Cond0, Then0, Else0),
+        ite_annotate_with_instmap(Cond0, Cond, Then0, Then, Else0, Else,
+            !InstMap, SeenDuplicateInstantiation, Vars),
+        GoalExpr = ite_rep(Cond, Then, Else)
+    ;
+        GoalExpr0 = scope_rep(Subgoal0, MaybeCut),
+        goal_annotate_with_instmap(Subgoal0, Subgoal, !InstMap, 
+            SeenDuplicateInstantiation, Vars),
+        GoalExpr = scope_rep(Subgoal, MaybeCut)
+    ;
+        GoalExpr0 = negation_rep(Subgoal0),
+        % A negated goal cannot affect instantiation.
+        goal_annotate_with_instmap(Subgoal0, Subgoal, !.InstMap, 
+            _InstMap, SeenDuplicateInstantiation, Vars),
+        GoalExpr = negation_rep(Subgoal)
     ;
-        error(this_file ++ 
-            "Couldn't determine conjunct number of when creating a " ++ 
-            "candidate_par_conjunct")
+        GoalExpr0 = atomic_goal_rep(File, Line, BoundVars, AtomicGoal),
+        % The binding of a variable may depend on any number of other
+        % variables, and recursively the variables that those depended-on
+        % variables depend upon.  
+        % XXX: This doesn't include variables that can affect control flow and
+        % therefore the values of other variables, this includes variables
+        % referenced from conditions in ITE goals, and variables switched-on.
+        % We may get away with this as our new system for determining
+        % goal-dependance takes these into account.
+        atomic_goal_get_vars(AtomicGoal, Vars),
+        BoundVarsSet = set.from_list(BoundVars),
+        set.difference(Vars, BoundVarsSet, RefedVars),
+        inst_map_ground_vars(BoundVars, RefedVars, !InstMap,
+            SeenDuplicateInstantiation),
+        GoalExpr = atomic_goal_rep(File, Line, BoundVars, AtomicGoal) 
     ),
-    CPC = candidate_par_conjunct_internal(MaybeCallee, Vars, Cost, ConjNum).
+    InstMapAfter = !.InstMap,
+    InstMapInfo = inst_map_info(InstMapBefore, InstMapAfter, Vars),
+    Goal = goal_rep(GoalExpr, Detism, InstMapInfo).
+
+:- pred conj_annotate_with_instmap(list(goal_rep)::in,
+    list(goal_rep(inst_map_info))::out, inst_map::in, inst_map::out,
+    seen_duplicate_instantiation::out, set(var_rep)::out) is det.
+
+conj_annotate_with_instmap([], [], !InstMap,
+    have_not_seen_duplicate_instantiation, set.init).
+conj_annotate_with_instmap([Conj0 | Conjs0], [Conj | Conjs], !InstMap, 
+        SeenDuplicateInstantiation, Vars) :-
+    goal_annotate_with_instmap(Conj0, Conj, !InstMap,
+        SeenDuplicateInstantiationHead, VarsHead),
+    conj_annotate_with_instmap(Conjs0, Conjs, !InstMap,
+        SeenDuplicateInstantiationTail, VarsTail),
+    set.union(VarsTail, VarsHead, Vars),
+    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
+        SeenDuplicateInstantiationHead,
+        SeenDuplicateInstantiationTail).
 
-:- pred var_mode_use_to_var_in_par_conj(var_table::in, var_mode_and_use::in,
-    maybe(string)::out) is det.
+:- pred disj_annotate_with_instmap(list(goal_rep)::in,
+    list(goal_rep(inst_map_info))::out, inst_map::in, inst_map::out,
+    seen_duplicate_instantiation::out, set(var_rep)::out) is det.
+
+disj_annotate_with_instmap([], [], !InstMap,
+        have_not_seen_duplicate_instantiation, set.init).
+disj_annotate_with_instmap([Disj0 | Disjs0], [Disj | Disjs], InstMap0, InstMap,
+        SeenDuplicateInstantiation, Vars) :-
+    HeadDetism = Disj0 ^ goal_detism_rep,
+    goal_annotate_with_instmap(Disj0, Disj, InstMap0, InstMapHead,
+        SeenDuplicateInstantiationHead, VarsHead),
+    disj_annotate_with_instmap(Disjs0, Disjs, InstMap0, InstMapTail,
+        SeenDuplicateInstantiationTail, VarsTail),
 
-var_mode_use_to_var_in_par_conj(VarTable, var_mode_and_use(Var, _, _),
-        MaybeName) :-
-    ( search_var_name(VarTable, Var, Name) ->
-        MaybeName = yes(Name)
+    set.union(VarsTail, VarsHead, Vars),
+
+    % merge_inst_map requires the detism of goals that produce both inst maps,
+    % we can create fake values that satisfy merge_inst_map easily.
+    % XXX: Consider inferring determinism as another simple analysis.
+    (
+        Disjs = [],
+        TailDetism = failure_rep
     ;
-        MaybeName = no
-    ).
+        Disjs = [_ | _],
+        TailDetism = det_rep
+    ),
+    InstMap = merge_inst_map(InstMapHead, HeadDetism, InstMapTail, TailDetism),
+    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
+        SeenDuplicateInstantiationHead,
+        SeenDuplicateInstantiationTail).
+
+:- pred switch_annotate_with_instmap(list(case_rep)::in, 
+    list(case_rep(inst_map_info))::out, inst_map::in, inst_map::out,
+    seen_duplicate_instantiation::out, set(var_rep)::out) is det.
+
+switch_annotate_with_instmap([], [], !InstMap,
+        have_not_seen_duplicate_instantiation, set.init).
+switch_annotate_with_instmap([Case0 | Cases0], [Case | Cases], 
+        InstMap0, InstMap, SeenDuplicateInstantiation, Vars) :-
+    Case0 = case_rep(ConsIdArity, ExtraConsIdAritys, Goal0),
+    HeadDetism = Goal0 ^ goal_detism_rep,
+    goal_annotate_with_instmap(Goal0, Goal, InstMap0, InstMapHead,
+        SeenDuplicateInstantiationHead, VarsHead),
+    Case = case_rep(ConsIdArity, ExtraConsIdAritys, Goal),
+    switch_annotate_with_instmap(Cases0, Cases, InstMap0, InstMapTail,
+        SeenDuplicateInstantiationTail, VarsTail),
+    set.union(VarsTail, VarsHead, Vars),
+    % merge_inst_map requires the detism of goals that produce both inst maps,
+    % we can create fake values that satisfy merge_inst_map easily.
+    (
+        Cases = [],
+        TailDetism = failure_rep
+    ;
+        Cases = [_ | _],
+        TailDetism = det_rep
+    ),
+    InstMap = merge_inst_map(InstMapHead, HeadDetism, InstMapTail, TailDetism),
+    SeenDuplicateInstantiation = merge_seen_duplicate_instantiation(
+        SeenDuplicateInstantiationHead,
+        SeenDuplicateInstantiationTail).
+
+:- pred ite_annotate_with_instmap(goal_rep::in, goal_rep(inst_map_info)::out,
+    goal_rep::in, goal_rep(inst_map_info)::out,
+    goal_rep::in, goal_rep(inst_map_info)::out,
+    inst_map::in, inst_map::out, 
+    seen_duplicate_instantiation::out, set(var_rep)::out) is det.
+
+ite_annotate_with_instmap(Cond0, Cond, Then0, Then, Else0, Else, InstMap0, InstMap,
+        SeenDuplicateInstantiation, Vars) :-
+    goal_annotate_with_instmap(Cond0, Cond, InstMap0, InstMapAfterCond,
+        SeenDuplicateInstantiationCond, VarsCond),
+    goal_annotate_with_instmap(Then0, Then, InstMapAfterCond, InstMapAfterThen,
+        SeenDuplicateInstantiationThen, VarsThen),
+    goal_annotate_with_instmap(Else0, Else, InstMap0, InstMapAfterElse,
+        SeenDuplicateInstantiationElse, VarsElse),
+    (
+        SeenDuplicateInstantiationCond = have_not_seen_duplicate_instantiation,
+        SeenDuplicateInstantiationThen = have_not_seen_duplicate_instantiation,
+        SeenDuplicateInstantiationElse = have_not_seen_duplicate_instantiation
+    ->
+        SeenDuplicateInstantiation = have_not_seen_duplicate_instantiation
+    ;
+        SeenDuplicateInstantiation = seen_duplicate_instantiation
+    ),
+    set.union(VarsCond, VarsThen, VarsCondThen),
+    set.union(VarsCondThen, VarsElse, Vars),
+    ThenDetism = Then ^ goal_detism_rep,
+    ElseDetism = Else ^ goal_detism_rep,
+    InstMap = merge_inst_map(InstMapAfterThen, ThenDetism, 
+        InstMapAfterElse, ElseDetism).
 
 %----------------------------------------------------------------------------%
 %
@@ -1985,6 +2307,52 @@ build_cs_cost_from_perf(Perf) = CSCost :
     Calls = Perf ^ perf_row_calls,
     CSCost = build_cs_cost_csq(Calls, float(TotalCSQ)).
 
+:- pred transitive_map_insert(T::in, T::in, 
+    map(T, set(T))::in, map(T, set(T))::out) is det.
+
+transitive_map_insert(K, V, !Map) :-
+    ( map.search(!.Map, K, Vs0) ->
+        ( member(V, Vs0) ->
+            true
+        ;
+            insert(Vs0, V, Vs1),
+            ( map.search(!.Map, V, VsTransitive) ->
+                union(Vs1, VsTransitive, Vs)
+            ;
+                Vs = Vs1
+            ),
+            svmap.det_update(K, Vs, !Map)
+        )
+    ;
+        ( map.search(!.Map, V, VsTransitive) ->
+            insert(VsTransitive, V, Vs)
+        ;
+            singleton_set(Vs, V)
+        ),
+        svmap.det_insert(K, Vs, !Map)
+    ).
+
+    % split_on(Pred, List, SplitLists).
+    %
+    % Split a list on items for which Pred is true, the delimiting items are
+    % not returned in the result, empty sections are not removed.
+    %
+    % split_on(unify(1), [2, 3, 1, 2, 6, 1, 1, 3, 1], 
+    %   [[2, 3], [2, 6], [], 3, []])
+    %
+:- pred split_on(pred(T), list(T), list(list(T))).
+:- mode split_on(pred(in) is semidet, in, out(non_empty_list)) is det.
+
+split_on(_Pred, [], [[]]).
+split_on(Pred, [X | Xs], SplitLists) :-
+    split_on(Pred, Xs, SplitLists0),
+    ( Pred(X) ->
+        SplitLists = [ [] | SplitLists0 ]
+    ;
+        SplitLists0 = [SplitList | T], 
+        SplitLists = [ [ X | SplitList ] | T ]
+    ).
+
 %-----------------------------------------------------------------------------%
 :- end_module mdprof_fb.automatic_parallelism.
 %-----------------------------------------------------------------------------%
Index: deep_profiler/mdprof_feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_feedback.m,v
retrieving revision 1.18
diff -u -p -b -r1.18 mdprof_feedback.m
--- deep_profiler/mdprof_feedback.m	17 Mar 2009 06:27:07 -0000	1.18
+++ deep_profiler/mdprof_feedback.m	8 Jan 2010 23:48:44 -0000
@@ -111,7 +111,7 @@ main(!IO) :-
                             ( message_level_to_int(Level) =< VerbosityLevel ->
                                 message_to_string(Message, MessageStr),
                                 io.write_string(Stderr, MessageStr, IO0, IO1),
-                                io.nl(IO1, IO)
+                                io.nl(Stderr, IO1, IO)
                             ;
                                 IO = IO0
                             )
Index: deep_profiler/measurements.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/measurements.m,v
retrieving revision 1.16
diff -u -p -b -r1.16 measurements.m
--- deep_profiler/measurements.m	16 Dec 2009 23:47:30 -0000	1.16
+++ deep_profiler/measurements.m	8 Jan 2010 23:48:44 -0000
@@ -146,17 +146,26 @@
 
 :- func no_parallelism = parallelism_amount.
 
-    % sub_computation_parallelism(ChildRunnableProb, 
-    %   ParentParallelism, ChildParallelism).
+:- func some_parallelism(float) = parallelism_amount.
+
+    % sub_computation_parallelism(ParentParallelism, ChildRunnableProb, 
+    %   ChildParallelism, Parallelism).
+    % sub_computation_parallelism(ParentParallelism, ChildRunnableProb, 
+    %   Parallelism).
+    %
+    % Compute the total parallelism seen during the child's execution if the
+    % parent is executed in a context with parallelism and the child it's self
+    % has some amount of parallelism and there is a probability of
+    % ChildRunnableProb that the child will be executed.
     %
-    % When control passes from a parent to a child a parallel thread is invoked
-    % that parallelises a job against another.  ParentParallelism is the amount
-    % of contention for CPUs in the parent and ChildParallelism is the amount
-    % of contention for CPUs in either child given that the other child has a
-    % ChildRunnableProb chance of being on CPU or in the run queue during the
-    % execution of the first child.
+    % In the three argument version we assume that the child has no parallelism.
+    % In this version it's useful to think of the ChildRunnableProb as the
+    % chance a forked off child (of a pair of two) will be 'runnable' during
+    % the execution of it's sibling.
     %
-:- pred sub_computation_parallelism(probability::in, parallelism_amount::in,
+:- pred sub_computation_parallelism(parallelism_amount::in, probability::in, 
+    parallelism_amount::in, parallelism_amount::out) is det.
+:- pred sub_computation_parallelism(parallelism_amount::in, probability::in, 
     parallelism_amount::out) is det.
 
     % exceeded_desired_parallelism(DesiredParallelism, Parallelism)
@@ -173,6 +182,7 @@
 
 :- import_module float.
 :- import_module int.
+:- import_module require.
 :- import_module string.
 
 %----------------------------------------------------------------------------%
@@ -629,16 +639,36 @@ Cost0 / Denom = Cost :-
 
 no_parallelism = parallelism_amount(1.0).
 
-sub_computation_parallelism(Prob, ParentParallelism, ChildParallelism) :-
+some_parallelism(Num) = parallelism_amount(Num) :-
+    ( Num < 1.0 ->
+        error(this_file ++ "some_parallelism/1+1: " ++ 
+            "Parallelism amount cannot ever be less than 1.0")
+    ;
+        true
+    ).
+
+sub_computation_parallelism(ParentParallelism, Prob, ChildParallelism,
+        Parallelism) :-
     probability_to_float(Prob) = ProbFloat,
     ParentParallelism = parallelism_amount(ParLikely),
-    CldLikely = ParLikely + ProbFloat,
-    ChildParallelism = parallelism_amount(CldLikely).
+    ChildParallelism = parallelism_amount(ChildLikely),
+    Likely = ParLikely + (ProbFloat * ChildLikely),
+    Parallelism = parallelism_amount(Likely).
+
+sub_computation_parallelism(ParentParallelism, Prob, Parallelism) :-
+    sub_computation_parallelism(ParentParallelism, Prob, no_parallelism,
+        Parallelism).
 
 exceeded_desired_parallelism(DesiredParallelism, Parallelism) :-
     Parallelism = parallelism_amount(LikelyParallelism),
     DesiredParallelism < LikelyParallelism.
 
 %----------------------------------------------------------------------------%
+
+:- func this_file = string.
+
+this_file = "measurements.m".
+
+%----------------------------------------------------------------------------%
 :- end_module measurements.
 %----------------------------------------------------------------------------%
Index: deep_profiler/message.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/message.m,v
retrieving revision 1.3
diff -u -p -b -r1.3 message.m
--- deep_profiler/message.m	16 Apr 2009 06:15:56 -0000	1.3
+++ deep_profiler/message.m	9 Jan 2010 05:19:36 -0000
@@ -59,6 +59,8 @@
 
 :- pred message_to_string(message::in, string::out) is det.
 
+:- pred location_to_string(program_location::in, string::out) is det.
+
 :- pred append_message(program_location::in, message_type::in,
     cord(message)::in, cord(message)::out) is det.
 
@@ -72,6 +74,22 @@
                 % A candidate parallel conjunction has been found.
     --->    info_found_candidate_conjunction
                 
+                % There are a number of conjuncts containing calls above the
+                % configured call site threshold, we're considering them for
+                % parallelisation against one another.
+                %
+    ;       info_found_conjs_above_callsite_threshold(int)
+
+                % The conjunction being consdered for parallelisation had to be
+                % split into several 'partitions' because it contains some non
+                % atomic goals, this can limit the amount of parallelism
+                % available.
+    ;       info_split_conjunction_into_partitions(int)
+
+                % There are N conjunctions whose speedup due to parallelisation
+                % is positive.
+    ;       info_found_n_conjunctions_with_positive_speedup(int)
+
                 % This occurs when a variable is instantiated twice in a
                 % procedure body (different instantiation states are used).  We
                 % don't bother parallelising such procedures.
@@ -82,32 +100,21 @@
                     % parallelised.
             )
 
-                % Extra call pairs could have been parallelised but weren't in
-                % this implementation.
-                %
-    ;       notice_extra_callpairs_in_conjunction(
-                int
-                    % The number of call pairs that were not parallelised.
-            )
-   
-                % A pair of calls that we want to parallelise are separated by
-                % some other goal.
-                %
-    ;       notice_candidate_callpairs_not_adjacent
-
-                % As above, except that the goal in between is a call goal.
-                %
-    ;       notice_cannot_parallelise_over_cheap_call_goal
-        
-                % As above, except that the goal in between is non-atomic.
-                %
-    ;       notice_cannot_parallelise_over_nonatomic_goal
-            
                 % A pair of calls that could be parallelised have many
                 % dependant variables.  We don't yet calculate the speedup in
                 % these situations.
+                %
     ;       notice_callpair_has_more_than_one_dependant_var
                 
+                % A partition does not enough enough costly calls (>1) and
+                % could not be parallelised, we could have parallelised them if
+                % we could parallelise over non-atomic code.
+                % 
+                % The parameters are the partition number and the number of
+                % costly calls found.
+                %
+    ;       notice_partition_does_not_have_costly_calls(int, int)
+
                 % Couldn't find the proc defn in the progrep data, maybe the
                 % procedure is built-in.
                 %
@@ -135,6 +142,8 @@
 
 :- import_module list.
 
+:- import_module program_representation_utils.
+
 %-----------------------------------------------------------------------------%
 
 message_get_level(message(_, Type)) =
@@ -143,13 +152,28 @@ message_get_level(message(_, Type)) =
 %-----------------------------------------------------------------------------%
 
 message_to_string(message(Location, MessageType), String) :-
-    LocationString = string(Location),
+    location_to_string(Location, LocationString),
     Level = message_type_to_level(MessageType),
     LevelString = message_level_to_string(Level),
     MessageStr = message_type_to_string(MessageType),
     string.format("%s: In %s: %s",
         [s(LevelString), s(LocationString), s(MessageStr)], String).
 
+location_to_string(proc(ProcLabel), String) :-
+    print_proc_label_to_string(ProcLabel, ProcLabelString),
+    format("procedure %s", [s(ProcLabelString)], String). 
+location_to_string(goal(ProcLabel, GoalPath), String) :-
+    print_proc_label_to_string(ProcLabel, ProcLabelString),
+    ( empty_goal_path(GoalPath) ->
+        GoalPathString = "root goal"
+    ;
+        GoalPathString = goal_path_to_string(GoalPath)
+    ),
+    format("goal %s in procedure %s", 
+        [s(GoalPathString), s(ProcLabelString)], String).
+location_to_string(clique(clique_ptr(Id)), String) :-
+    format("clique %d", [i(Id)], String).
+
 %-----------------------------------------------------------------------------%
 
 append_message(Location, MessageType, !Messages) :-
@@ -159,7 +183,6 @@ append_message(Location, MessageType, !M
 %-----------------------------------------------------------------------------%
 
 :- func message_level_to_string(message_level) = string.
-
 message_level_to_string(message_info) = "Info".
 message_level_to_string(message_notice) = "Notice".
 message_level_to_string(message_warning) = "Warning".
@@ -178,17 +201,16 @@ message_level_to_int(message_error) = 1.
 
 message_type_to_level(info_found_candidate_conjunction) =
     message_info.
+message_type_to_level(info_found_conjs_above_callsite_threshold(_)) =
+    message_info.
+message_type_to_level(info_found_n_conjunctions_with_positive_speedup(_)) =
+    message_info.
+message_type_to_level(info_split_conjunction_into_partitions(_)) = message_info.
 message_type_to_level(notice_duplicate_instantiation(_)) = message_notice.
-message_type_to_level(notice_extra_callpairs_in_conjunction(_)) =
-    message_notice.
-message_type_to_level(notice_candidate_callpairs_not_adjacent) = 
-    message_notice.
-message_type_to_level(notice_cannot_parallelise_over_cheap_call_goal) =
-    message_notice.
-message_type_to_level(notice_cannot_parallelise_over_nonatomic_goal) =
-    message_notice.
 message_type_to_level(notice_callpair_has_more_than_one_dependant_var) =
     message_notice.
+message_type_to_level(notice_partition_does_not_have_costly_calls(_, _)) =
+    message_notice.
 message_type_to_level(warning_cannot_lookup_proc_defn) = message_warning.
 message_type_to_level(warning_cannot_compute_procrep_coverage_fallback(_)) =
     message_warning.
@@ -206,32 +228,35 @@ message_type_to_string(MessageType) = St
         MessageType = info_found_candidate_conjunction, 
         String = "Found candidate conjunction"
     ;
+        (
+            MessageType = info_found_conjs_above_callsite_threshold(Num),
+            MessageStr = "Found %d conjuncts above callsite threashold"
+        ;
+            MessageType = info_found_n_conjunctions_with_positive_speedup(Num),
+            MessageStr = "Found %d conjunctions with a positive speedup due"
+                ++ " to parallelisation"
+        ;
+            MessageType = info_split_conjunction_into_partitions(Num),
+            MessageStr = "Split conjunction into %d partitions, this may reduce"
+                ++ " parallelism"
+        ),
+        string.format(MessageStr, [i(Num)], String)
+    ;
         MessageType = notice_duplicate_instantiation(CandidateConjuncts), 
         string.format(
             "%d conjunctions not parallelised: Seen duplicate instantiations",
             [i(CandidateConjuncts)], String)
     ;
-        MessageType = notice_extra_callpairs_in_conjunction(NumCPCs),
-        string.format(
-            "%d potential call pairs not parallelised in this conjunction",
-            [i(NumCPCs)], String)
-    ;
-        MessageType = notice_candidate_callpairs_not_adjacent,
-        String = "Two callpairs are difficult to parallelise because they are"
-            ++ " not adjacent"
-    ;
-        MessageType = notice_cannot_parallelise_over_cheap_call_goal,
-        String = "Parallelising expensive call goals with cheap call goals"
-            ++ " between them is not supported"
-    ;
-        MessageType = notice_cannot_parallelise_over_nonatomic_goal, 
-        String = "Parallelising call goals with non-atomic goals between them"
-            ++ " is not supported"
-    ;
         MessageType = notice_callpair_has_more_than_one_dependant_var,
         String = "Parallelising call pairs that have more than one dependant"
             ++ " variable is not yet supported."
     ;
+        MessageType = notice_partition_does_not_have_costly_calls(PartNum,
+            NumCalls),
+        string.format("Partition %d has only %d costly calls and cannot be"
+                ++ " parallelised", 
+            [i(PartNum), i(NumCalls)], String)
+    ;
         MessageType = warning_cannot_lookup_proc_defn,
         String = "Could not look up proc defn, perhaps this procedure is"
             ++ " built-in"
Index: deep_profiler/program_representation_utils.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/program_representation_utils.m,v
retrieving revision 1.21
diff -u -p -b -r1.21 program_representation_utils.m
--- deep_profiler/program_representation_utils.m	6 Nov 2008 05:47:38 -0000	1.21
+++ deep_profiler/program_representation_utils.m	8 Jan 2010 23:48:44 -0000
@@ -42,6 +42,10 @@
 :- pred print_proc_to_strings(proc_rep(GoalAnn)::in, cord(string)::out) is det
     <= goal_annotation(GoalAnn).
 
+    % Print a proc label to a string.
+    %
+:- pred print_proc_label_to_string(string_proc_label::in, string::out) is det.
+
 %----------------------------------------------------------------------------%
 
 :- typeclass goal_annotation(T) where [
@@ -118,6 +122,33 @@
 
 %----------------------------------------------------------------------------%
 
+    % A difference between too inst maps.  This lists the variables that are
+    % instantiated by a particular goal.
+    %
+:- type inst_map_delta.
+
+    % Get the set of variables that are instantiated by this inst map.
+    %
+:- pred inst_map_delta_get_var_set(inst_map_delta::in, set(var_rep)::out)
+    is det.
+
+    % The empty inst_map_delta.  Nothing is instantiated.
+    %
+:- pred empty_inst_map_delta(inst_map_delta::out) is det.
+:- func empty_inst_map_delta = inst_map_delta.
+
+    % calc_inst_map_delta(InstMapBefore, InstMapAfter, InstMapDelta)
+    %
+    % Calculate the difference between two inst maps.
+    %
+    % InstMapAfter is InstMapBefore after the variables in InstMapDelta have
+    % been instantiated.
+    %
+:- pred calc_inst_map_delta(inst_map::in, inst_map::in, inst_map_delta::out) 
+    is det.
+
+%----------------------------------------------------------------------------%
+
     % Retrieve a set of all the vars involved with this atomic goal.
     %
 :- pred atomic_goal_get_vars(atomic_goal_rep::in, set(var_rep)::out) is det.
@@ -160,16 +191,16 @@ accumulate_print_proc_to_strings(_, Proc
 print_proc_to_strings(ProcRep, Strings) :-
     ProcRep = proc_rep(ProcLabel, ProcDefnRep),
     ProcDefnRep = proc_defn_rep(ArgVarReps, GoalRep, VarTable, Detism),
-    print_proc_label_to_strings(Detism, ProcLabel, ProcLabelString),
+    print_proc_label_to_string(ProcLabel, ProcLabelString0),
+    detism_to_string(Detism, DetismString),
+    ProcLabelString = DetismString ++ cord.singleton(" ") ++ 
+        cord.singleton(ProcLabelString0),
     print_args_to_strings(print_head_var, VarTable, ArgVarReps, ArgsString),
     print_goal_to_strings(VarTable, 1, GoalRep, GoalString),
     Strings = ProcLabelString ++ ArgsString ++ cord.singleton(" :-\n") ++
         GoalString ++ nl.
 
-:- pred print_proc_label_to_strings(detism_rep::in, string_proc_label::in,
-    cord(string)::out) is det.
-
-print_proc_label_to_strings(Detism, ProcLabel, Strings) :-
+print_proc_label_to_string(ProcLabel, String) :-
     (
         ProcLabel = str_ordinary_proc_label(PredFunc, DeclModule, _DefModule,
             Name, Arity, Mode),
@@ -180,16 +211,14 @@ print_proc_label_to_strings(Detism, Proc
             PredFunc = pf_function,
             PF = "func"
         ),
-        string.format(" %s %s.%s/%d-%d",
+        string.format("%s %s.%s/%d-%d",
             [s(PF), s(DeclModule), s(Name), i(Arity), i(Mode)], String)
     ;
         ProcLabel = str_special_proc_label(TypeName, TypeModule, _DefModule,
             Name, Arity, Mode),
-        string.format(" %s for %s.%s/%d-%d",
+        string.format("%s for %s.%s/%d-%d",
             [s(Name), s(TypeModule), s(TypeName), i(Arity), i(Mode)], String)
-    ),
-    detism_to_string(Detism, DetismString),
-    Strings = DetismString ++ cord.singleton(String).
+    ).
 
 %-----------------------------------------------------------------------------%
 
@@ -563,10 +592,10 @@ modulerep_search_proc(ModuleRep, ProcLab
 
 :- type inst_map
     --->    inst_map(
-                map(var_rep, inst_rep),
+                im_inst_map         :: map(var_rep, inst_rep),
                     % The actual inst map.
 
-                map(var_rep, set(var_rep))
+                im_var_dep_map      :: map(var_rep, set(var_rep))
                     % A tree describing dependencies between bound variables.
             ).
 
@@ -704,6 +733,33 @@ inst_map_get_var_deps_2(VarToDepVars, Va
 
 %----------------------------------------------------------------------------%
 
+:- type inst_map_delta
+    --->    inst_map_delta(set(var_rep)).
+
+inst_map_delta_get_var_set(inst_map_delta(Vars), Vars).
+
+empty_inst_map_delta(inst_map_delta(Vars)) :-
+    set.init(Vars).
+empty_inst_map_delta = InstMap :-
+    empty_inst_map_delta(InstMap).
+
+calc_inst_map_delta(Before, After, inst_map_delta(DeltaVars)) :-
+    AfterVars = map.sorted_keys(After ^ im_inst_map),
+    filter((pred(Var::in) is semidet :-
+            (
+                map.search(Before ^ im_inst_map, Var, BeforeInst)
+            ->
+                BeforeInst = ir_free_rep
+            ;
+                % If we couldn't find the variable then it was free, It may
+                % have been in the head of the procedure.
+                true
+            )
+        ), AfterVars, DeltaVarsList),
+    DeltaVars = sorted_list_to_set(DeltaVarsList).
+
+%----------------------------------------------------------------------------%
+
 atomic_goal_get_vars(AtomicGoal, Vars) :-
     (
         ( AtomicGoal = unify_construct_rep(Var, _, VarsL)
Index: library/maybe.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/maybe.m,v
retrieving revision 1.3
diff -u -p -b -r1.3 maybe.m
--- library/maybe.m	19 Apr 2006 05:17:53 -0000	1.3
+++ library/maybe.m	8 Jan 2010 23:48:44 -0000
@@ -84,6 +84,12 @@
 :- mode map_fold2_maybe(pred(in, out, in, out, di, uo) is det,
     in, out, in, out, di, uo) is det.
 
+    % maybe_is_yes(yes(X), X).
+    %
+    % This is useful as an argument to list.filter_map
+    %
+:- pred maybe_is_yes(maybe(T)::in, T::out) is semidet.
+
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
@@ -109,6 +115,8 @@ map_fold2_maybe(_, no, no, !A, !B).
 map_fold2_maybe(P, yes(T0), yes(T), !A, !B) :-
     P(T0, T, !A, !B).
 
+maybe_is_yes(yes(X), X).
+
 %-----------------------------------------------------------------------------%
 :- end_module maybe.
 %-----------------------------------------------------------------------------%
Index: mdbcomp/feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/mdbcomp/feedback.m,v
retrieving revision 1.6
diff -u -p -b -r1.6 feedback.m
--- mdbcomp/feedback.m	12 Mar 2009 03:33:42 -0000	1.6
+++ mdbcomp/feedback.m	9 Jan 2010 05:26:57 -0000
@@ -27,8 +27,6 @@
 :- import_module io.
 :- import_module list.
 :- import_module maybe.
-:- import_module pair.
-:- import_module set.
 :- import_module string.
 
 %-----------------------------------------------------------------------------%
@@ -107,38 +105,89 @@
     %
 :- type candidate_par_conjunction
     --->    candidate_par_conjunction(
-                goal_path       :: goal_path_string,
+                cpc_goal_path           :: goal_path_string,
                     % The path within the procedure to this conjunuction.
                 
-                par_conjunct_a  :: candidate_par_conjunct,
-                par_conjunct_b  :: candidate_par_conjunct,
-                    % The conjuncts worth parallelising
+                cpc_partition_number    :: int,
+                    % Used to locate the goals to be parallelised within the
+                    % conjunction.  Partitions are separated by non atomic
+                    % goals, the first partition has the number 1.
 
-                dependence      :: conjuncts_are_dependant,
+                cpc_is_dependant        :: conjuncts_are_dependant,
 
-                speedup         :: float
+                cpc_conjs               :: list(seq_conj),
+                    % A list of parallel conjuncts, each is a sequential
+                    % conjunction of inner goals.  All inner goals that are
+                    % seen in the program presentation must be stored here
+                    % unless they are to be scheduled before or after the
+                    % sequential conjunction.  If these conjuncts are flattened
+                    % the inner goals will appear in the same order as the
+                    % program representation.  By maintaining these two rules
+                    % the compiler and analysis tools can use similar
+                    % algorithms to construct the same parallel conjunction
+                    % from the same program representation/HLDS structure.
+
+                cpc_speedup             :: float
+                    % Speedup is absolute so different speedups for different
+                    % peices of code can be compared.  That is the formula used
+                    % in many text books is _not_ used.  Instead we use:
+                    %
+                    %  Speedup = (SequentialTime - ParallelTime) * NumCalls
+                    %
+                    % This way speedup directly measures the benifit of
+                    % parallelisation, Any costs of parallel execution are
+                    % assumed to be in ParallelTime.
+            ).
+
+:- type seq_conj
+    --->    seq_conj(
+                sc_conjs            :: list(inner_goal)
+            ).
+
+:- type callee_rep
+    --->    unknown_callee
+                % An unknown callee such as a higher order or method call.
+                
+    ;       named_callee(
+                % A known callee.  note that arrity and mode are not stored at
+                % all.
+               
+                nc_module_name  :: string,
+                nc_proc_name    :: string
             ).
 
-:- type candidate_par_conjunct
-    --->    candidate_par_conjunct(
-                callee                  :: maybe(pair(string, string)),
-                    % If the name of the callee is known (it's not a HO call),
-                    % then store the module and symbol names here.
-                    % Note: arity and mode are not represented.
+    % A representation of a goal within a parallel conjunction.  We don't have
+    % to represent many types of goals or details about them, at least for now.
+    %
+:- type inner_goal
+    --->    ig_call(
+                % This is a call that we're considering parallelising.  It has
+                % a significant enough cost to be considered for
+                % parallelisation.
+                
+                igc_callee                  :: callee_rep,
                     
-                vars                    :: list(maybe(string)),
+                igc_vars                    :: list(maybe(string)),
                     % The names of variables (if used defined) given as
                     % arguments to this call.
                     
-                cost_percall            :: float
+                igc_cost_percall            :: float
                     % The per-call cost of this call in call sequence counts.
-                   
-            ).
+            )
+    ;       ig_cheap_call(
+                % This call is to cheap to be considered for parallelisation,
+                % we track it in the feedback information to help inform the
+                % compiler about _how_ to parallelise calls around it.
+                
+                igcc_callee                  :: callee_rep,
+                igcc_vars                    :: list(maybe(string))
+                    % As above.
+            )
+    ;       ig_other_atomic_goal.
+                % Some other (cheap) atomic goal.
 
 :- type conjuncts_are_dependant
-    --->    conjuncts_are_dependant(
-                dependant_vars          :: set(var_rep)
-            )
+    --->    conjuncts_are_dependant
     ;       conjuncts_are_independent.
 
 %-----------------------------------------------------------------------------%
@@ -550,7 +599,7 @@ feedback_first_line = "Mercury Compiler 
 
 :- func feedback_version = string.
 
-feedback_version = "4".
+feedback_version = "5".
 
 %-----------------------------------------------------------------------------%
 :- end_module mdbcomp.feedback.
Index: mdbcomp/program_representation.m
===================================================================
RCS file: /home/mercury1/repository/mercury/mdbcomp/program_representation.m,v
retrieving revision 1.48
diff -u -p -b -r1.48 program_representation.m
--- mdbcomp/program_representation.m	11 Jun 2009 08:28:31 -0000	1.48
+++ mdbcomp/program_representation.m	8 Jan 2010 23:48:44 -0000
@@ -377,6 +377,9 @@
     %
 :- pred search_var_name(var_table::in, var_rep::in, string::out) is semidet.
 
+:- pred maybe_search_var_name(var_table::in, var_rep::in, maybe(string)::out)
+    is det.
+
     % If the given atomic goal behaves like a call in the sense that it
     % generates events as ordinary calls do, then return the list of variables
     % that are passed as arguments.
@@ -479,6 +482,9 @@
     % The empty goal path.
     %
 :- func empty_goal_path = goal_path.
+:- pred empty_goal_path(goal_path).
+:- mode empty_goal_path(out) is det.
+:- mode empty_goal_path(in) is semidet.
 
     % A singleton goal path.
     %
@@ -832,7 +838,10 @@ goal_rep_type = type_of(_ : goal_rep).
                 gp_steps    :: list(goal_path_step)
             ).
 
-empty_goal_path = goal_path([]).
+empty_goal_path = Empty :-
+    empty_goal_path(Empty).
+
+empty_goal_path(goal_path([])).
 
 singleton_goal_path(Step) = goal_path([Step]).
 
@@ -1056,6 +1065,13 @@ lookup_var_name(VarTable, VarRep, String
 search_var_name(VarTable, VarRep, String) :-
     map.search(VarTable, VarRep, String).
 
+maybe_search_var_name(VarTable, VarRep, MaybeString) :-
+    ( search_var_name(VarTable, VarRep, String) ->
+        MaybeString = yes(String)
+    ;
+        MaybeString = no
+    ).
+
 %-----------------------------------------------------------------------------%
 
 :- pred read_file_as_bytecode(string::in, io.res(bytecode)::out,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20100109/312170fd/attachment.sig>


More information about the reviews mailing list