[m-rev.] diff: Make our implementation match the algorithms in the overlap paper.

Paul Bone pbone at csse.unimelb.edu.au
Sat Jan 22 15:28:23 AEDT 2011


Various fixes, mostly to measuring the overlap between dependent parallel
conjunctions.  These fixes ensure that our implementation now matches the
algorithms in our paper.

Benchmarking can now begin for the paper.

deep_profiler/mdprof_fb.automatic_parallelism.m:
    Remove build_candidate_par_conjunction_maps since it is no longer called.

    Fix a bug where candidate procedures were generated with no candidate
    conjunctions in them.

    Fix a bug where !ConjNum was not incremented in a loop; this caused
    SparkDelay to be applied incorrectly when calculating the cost of a
    parallel conjunction.

    Account for the cost of calling signal in the right place when calculating
    the cost of a parallel conjunction.

    Conform to changes in measurements.m.

deep_profiler/mdprof_feedback.m:
    Add the command line option for the barrier cost during parallel execution.

deep_profiler/measurements.m:
    The incomplete parallel exec metrics structure now tracks dead time due
    to futures explicitly.  Previously this was calculated from other values.

    Conform to the parallel execution time calculations in
    mdprof_fb.automatic_parallelism.m.  Each conjunct's start is delayed by:
        SparkDelay * (ConjNum - 1)
    so the first conjunct (ConjNum = 1) incurs no delay.
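
    As an illustrative sketch only (not part of this diff; the module and
    function names below are hypothetical), this delay could be computed as:

        % Illustrative sketch only; not part of this change.
        :- module spark_delay_sketch.
        :- interface.

        :- func conjunct_delay(float, int) = float.

        :- implementation.
        :- import_module float.
        :- import_module int.

            % conjunct_delay(SparkDelay, ConjNum) = Delay:
            % conjunct 1 incurs no delay; each later conjunct is delayed by
            % one spark delay per conjunct to its left.
            %
        conjunct_delay(SparkDelay, ConjNum) = SparkDelay * float(ConjNum - 1).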

    Fix signal costs: they are now stored with the conjunct that incurred
    them rather than with the one that waited on the variable.  This also
    prevents them from being counted more than once.

    Add support for the new parallel execution overhead 'barrier cost'.

mdbcomp/feedback.automatic_parallelism.m:
    Add support for the new parallel execution overhead 'barrier cost'.

    Modify the parallel execution metrics so that the different overheads
    are accounted for separately.
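
    To illustrate how these separated overheads combine into the total
    reported overhead (a sketch only; module and function names are
    hypothetical, but the formula mirrors the new code in measurements.m):

        % Illustrative sketch only; not part of this change.
        :- module par_overheads_sketch.
        :- interface.

        :- func total_overheads(int, float, float, float, float) = float.

        :- implementation.
        :- import_module float.
        :- import_module int.

            % total_overheads(NumConjs, SparkCost, BarrierCost, SignalCosts,
            %   WaitCosts) = Overheads.
            %
            % All but the first conjunct needs a spark, and every conjunct
            % executes the barrier code once at the end of the conjunction.
            %
        total_overheads(NumConjs, SparkCost, BarrierCost, SignalCosts,
                WaitCosts) = Overheads :-
            SparkCosts = float(NumConjs - 1) * SparkCost,
            BarrierCosts = float(NumConjs) * BarrierCost,
            Overheads = SparkCosts + BarrierCosts + SignalCosts + WaitCosts.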

    Change a comment to clarify how the range of goals in the push_goal
    type should be interpreted.

mdbcomp/feedback.m:
    Increment feedback_version.

Index: deep_profiler/mdprof_fb.automatic_parallelism.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_fb.automatic_parallelism.m,v
retrieving revision 1.36
diff -u -p -b -r1.36 mdprof_fb.automatic_parallelism.m
--- deep_profiler/mdprof_fb.automatic_parallelism.m	21 Jan 2011 10:41:00 -0000	1.36
+++ deep_profiler/mdprof_fb.automatic_parallelism.m	22 Jan 2011 04:02:01 -0000
@@ -708,10 +708,17 @@ candidate_parallel_conjunctions_proc(Opt
                 (
                     SeenDuplicateInstantiation =
                         have_not_seen_duplicate_instantiation,
+                    (
+                        Candidates0 = [],
+                        Candidates = map.init
+                    ;
+                        Candidates0 = [_ | _],
                     merge_pushes_for_proc(Pushes, MergedPushes),
-                    CandidateProc = candidate_par_conjunctions_proc(VarTable,
-                        MergedPushes, Candidates0),
-                    svmap.set(ProcLabel, CandidateProc, map.init, Candidates)
+                        CandidateProc = candidate_par_conjunctions_proc(
+                            VarTable, MergedPushes, Candidates0),
+                        map.det_insert(map.init, ProcLabel, CandidateProc,
+                            Candidates)
+                    )
                 ;
                     SeenDuplicateInstantiation = seen_duplicate_instantiation,
                     Candidates = map.init,
@@ -1804,7 +1811,8 @@ find_best_parallelisation_complete_bnb(I
 
 parallelisation_get_objective_value(Parallelisation) = Value :-
     Metrics = Parallelisation ^ bp_par_exec_metrics,
-    Value = Metrics ^ pem_par_time + Metrics ^ pem_par_overheads * 2.0.
+    Value = Metrics ^ pem_par_time +
+        parallel_exec_metrics_get_overheads(Metrics) * 2.0.
 
 :- impure pred generate_parallelisations(implicit_parallelism_info::in,
     best_par_algorithm_simple::in, goals_for_parallelisation::in,
@@ -2099,6 +2107,7 @@ test_dependance(Info, CostData) :-
 
 :- type parallelisation_cost_data
     ---> parallelisation_cost_data(
+                pcd_shared_vars         :: set(var_rep),
                 pcd_par_exec_overlap    :: parallel_execution_overlap,
                 pcd_par_exec_metrics    :: parallel_exec_metrics_incomplete,
                 pcd_productions_map     :: map(var_rep, float)
@@ -2157,6 +2166,40 @@ ip_get_num_goals_middle(Incomplete) = La
     FirstParGoal = Incomplete ^ ip_first_par_goal,
     LastParGoal = Incomplete ^ ip_last_par_goal.
 
+:- func ip_calc_sharedvars_set(incomplete_parallelisation) = set(var_rep).
+
+ip_calc_sharedvars_set(Incomplete) = SharedVars :-
+    ParConjs = ip_get_par_conjs(Incomplete),
+    foldl2(build_sharedvars_set, ParConjs, set.init, _, set.init, SharedVars).
+
+:- pred build_sharedvars_set(seq_conj(pard_goal_detail)::in,
+    set(var_rep)::in, set(var_rep)::out,
+    set(var_rep)::in, set(var_rep)::out) is det.
+
+build_sharedvars_set(seq_conj(Conjs), !BoundVars, !SharedVars) :-
+    foldl2(conj_produced_and_consumed_vars, Conjs, set.init, ProducedVars,
+        set.init, ConsumedVars),
+    % The new shared vars are previously bound variables that are consumed in
+    % this conjunct.  This must be calculated before !BoundVars is updated.
+    SharedVars = intersect(!.BoundVars, ConsumedVars),
+    !:SharedVars = union(!.SharedVars, SharedVars),
+    !:BoundVars = union(!.BoundVars, ProducedVars).
+
+    % Build sets of produced and consumed vars for a conjunct in a conjunction.
+    % Use with foldl to build these sets up for the whole conjunction.  At the
+    % end of a conjunction there may be variables in the intersection of the
+    % sets; that is okay: those variables are produced early in the
+    % conjunction and consumed later in the conjunction.
+    %
+:- pred conj_produced_and_consumed_vars(pard_goal_detail::in,
+    set(var_rep)::in, set(var_rep)::out,
+    set(var_rep)::in, set(var_rep)::out) is det.
+
+conj_produced_and_consumed_vars(Conj, !Produced, !Consumed) :-
+    InstMapInfo = Conj ^ goal_annotation ^ pgd_inst_map_info,
+    !:Produced = union(!.Produced, InstMapInfo ^ im_bound_vars),
+    !:Consumed = union(!.Consumed, InstMapInfo ^ im_consumed_vars).
+
 :- pred start_building_parallelisation(implicit_parallelism_info::in,
     goals_for_parallelisation::in,
     incomplete_parallelisation::out) is det.
@@ -2187,7 +2230,7 @@ finalise_parallelisation(Incomplete, Bes
         MaybeCostData = no,
         unexpected($module, $pred, "parallelisation has no cost data")
     ),
-    CostData = parallelisation_cost_data(Overlap, Metrics0, _),
+    CostData = parallelisation_cost_data(_, Overlap, Metrics0, _),
 
     Metrics = finalise_parallel_exec_metrics(Metrics0),
     par_conj_overlap_is_dependent(Overlap, IsDependent),
@@ -2275,13 +2318,15 @@ calculate_parallel_cost(CostData, !Paral
     Opts = Info ^ ipi_opts,
     SparkCost = Opts ^ cpcp_sparking_cost,
     SparkDelay = Opts ^ cpcp_sparking_delay,
+    BarrierCost = Opts ^ cpcp_barrier_cost,
     ContextWakeupDelay = Opts ^ cpcp_context_wakeup_delay,
     Metrics0 = init_empty_parallel_exec_metrics(CostBeforePercall,
         CostAfterPercall, NumCalls, float(SparkCost), float(SparkDelay),
-        float(ContextWakeupDelay)),
+        float(BarrierCost), float(ContextWakeupDelay)),
     Overlap0 = peo_empty_conjunct,
 
-    CostData0 = parallelisation_cost_data(Overlap0, Metrics0, init),
+    SharedVars = ip_calc_sharedvars_set(!.Parallelisation),
+    CostData0 = parallelisation_cost_data(SharedVars, Overlap0, Metrics0, init),
     NumMiddleGoals = ip_get_num_goals_middle(!.Parallelisation),
     foldl3(calculate_parallel_cost_step(Info, NumMiddleGoals), ParConj, 1, _,
         0, _, CostData0, CostData),
@@ -2316,7 +2361,8 @@ maybe_calc_sequential_cost(GetMaybe, Set
 
 calculate_parallel_cost_step(Info, NumMiddleGoals, Conjunct, !ConjNum,
         !NumGoals, !CostData) :-
-    !.CostData = parallelisation_cost_data(Overlap0, Metrics0, PM0),
+    !.CostData = parallelisation_cost_data(SharedVars, Overlap0, Metrics0,
+        PM0),
     !:NumGoals = !.NumGoals + length(Conjuncts),
     ( !.NumGoals = NumMiddleGoals ->
         IsLastConjunct = is_last_par_conjunct
@@ -2324,26 +2370,28 @@ calculate_parallel_cost_step(Info, NumMi
         IsLastConjunct = not_last_par_conjunct
     ),
     Conjunct = seq_conj(Conjuncts),
-    calculate_parallel_cost_step(Info, IsLastConjunct, Conjuncts, !ConjNum,
-        PM0, PM, Overlap0, Overlap, Metrics0, Metrics),
-    !:CostData = parallelisation_cost_data(Overlap, Metrics, PM).
+    calculate_parallel_cost_step(Info, SharedVars, IsLastConjunct, Conjuncts,
+        !ConjNum, PM0, PM, Overlap0, Overlap, Metrics0, Metrics),
+    !:CostData = parallelisation_cost_data(SharedVars, Overlap, Metrics, PM).
 
 :- pred calculate_parallel_cost_step(implicit_parallelism_info::in,
-    is_last_par_conjunct::in, list(pard_goal_detail)::in, int::in, int::out,
-    map(var_rep, float)::in, map(var_rep, float)::out,
+    set(var_rep)::in, is_last_par_conjunct::in, list(pard_goal_detail)::in,
+    int::in, int::out, map(var_rep, float)::in, map(var_rep, float)::out,
     parallel_execution_overlap::in, parallel_execution_overlap::out,
     parallel_exec_metrics_incomplete::in,
     parallel_exec_metrics_incomplete::out) is det.
 
-calculate_parallel_cost_step(Info, IsLastConjunct, Conjunct, !ConjNum,
-        !ProductionsMap, !Overlap, !Metrics) :-
+calculate_parallel_cost_step(Info, AllSharedVars, IsLastConjunct, Conjunct,
+        !ConjNum, !ProductionsMap, !Overlap, !Metrics) :-
     Algorithm = Info ^ ipi_opts ^ cpcp_parallelise_dep_conjs,
 
     Calls = parallel_exec_metrics_get_num_calls(!.Metrics),
     conj_calc_cost(Conjunct, Calls, CostB0),
     CostB = goal_cost_get_percall(CostB0),
-    foldl(pardgoal_consumed_vars_accum, Conjunct, set.init,
-        RightConsumedVars),
+    foldl2(conj_produced_and_consumed_vars, Conjunct,
+        set.init, RightProducedVars0, set.init, RightConsumedVars0),
+    RightProducedVars = intersect(RightProducedVars0, AllSharedVars),
+    RightConsumedVars = intersect(RightConsumedVars0, AllSharedVars),
     ProducedVars =
         set.from_sorted_list(map.sorted_keys(!.ProductionsMap)),
     Vars = set.intersect(ProducedVars, RightConsumedVars),
@@ -2358,60 +2406,73 @@ calculate_parallel_cost_step(Info, IsLas
     % the additional cost of sparking them.
     (
         IsLastConjunct = not_last_par_conjunct,
-        SparkCost = Info ^ ipi_opts ^ cpcp_sparking_cost,
-        StartTime = StartTime0 + float(SparkCost)
+        SparkCost = float(Info ^ ipi_opts ^ cpcp_sparking_cost)
     ;
         IsLastConjunct = is_last_par_conjunct,
-        StartTime = StartTime0
+        SparkCost = 0.0
     ),
+    StartTime = StartTime0 + SparkCost,
 
     (
         Algorithm = parallelise_dep_conjs_overlap,
 
         % Get the list of variables consumed by this conjunct
         % that will be turned into futures.
-        foldl3(get_consumptions_list, Conjunct, Vars, _, 0.0, _,
-            [], ConsumptionsList0),
-        reverse(ConsumptionsList0, ConsumptionsList),
+        foldl4(get_consumptions_and_productions_list, Conjunct, Vars, _,
+            RightProducedVars, _, 0.0, _,
+            [], ConsumptionsAndProductionsList0),
+        reverse(ConsumptionsAndProductionsList0,
+            ConsumptionsAndProductionsList),
 
         % Determine how the parallel conjuncts overlap.
         foldl5(calculate_dependent_parallel_cost_2(Info, !.ProductionsMap),
-            ConsumptionsList, 0.0, LastSeqConsumeTime,
+            ConsumptionsAndProductionsList, 0.0, LastSeqConsumeTime,
             StartTime, LastParConsumeTime, StartTime, LastResumeTime,
             [], RevExecution0, map.init, ConsumptionsMap),
 
         % Calculate the point at which this conjunct finishes execution
         % and complete the RevExecutions structure..
         reverse(RevExecution, Execution),
-        CostBPar = LastParConsumeTime + (CostB - LastSeqConsumeTime),
-        RevExecution = [ (LastResumeTime - CostBPar) | RevExecution0 ],
+        CostBParElapsed = LastParConsumeTime + (CostB - LastSeqConsumeTime),
+        RevExecution = [ (LastResumeTime - CostBParElapsed) | RevExecution0 ],
 
-        CostSignals =
-            float(Info ^ ipi_opts ^ cpcp_future_signal_cost * count(Vars))
+        CostSignals = float(Info ^ ipi_opts ^ cpcp_future_signal_cost *
+            count(RightProducedVars)),
+        CostWaits = float(Info ^ ipi_opts ^ cpcp_future_wait_cost *
+            count(Vars)),
+        calc_cost_and_dead_time(Execution, CostBPar, DeadTime)
     ;
         ( Algorithm = parallelise_dep_conjs_naive
         ; Algorithm = do_not_parallelise_dep_conjs
         ; Algorithm = parallelise_dep_conjs_num_vars
         ),
 
-        CostBPar = StartTime + CostB,
-        Execution = [StartTime - CostBPar],
+        CostBPar = CostB + SparkCost,
+        Execution = [StartTime - (StartTime + CostB)],
         ConsumptionsMap = init,
-        CostSignals = 0.0
+        CostSignals = 0.0,
+        CostWaits = 0.0,
+        DeadTime = 0.0
     ),
 
+    % CostB    - the cost of B if it were to be executed in sequence.
+    % CostBPar - CostB plus the overheads of parallel execution (not including
+    %            the dead time).
+    % DeadTime - The time that B spends blocked on other computations.
+    % XXX: Need to account for SparkDelay here.
     !:Metrics = init_parallel_exec_metrics_incomplete(!.Metrics, CostSignals,
-        CostB, CostBPar),
+        CostWaits, CostB, CostBPar, DeadTime),
 
     % Build the productions map for the next conjunct. This map contains
     % all the variables produced by this code, not just that are used for
     % dependent parallelisation.
-    foldl3(get_productions_map, Conjunct, StartTime, _, Execution, _,
-        !ProductionsMap),
+    foldl3(get_productions_map(RightProducedVars), Conjunct, StartTime, _,
+        Execution, _, !ProductionsMap),
 
     DepConjExec = dependent_conjunct_execution(Execution,
         !.ProductionsMap, ConsumptionsMap),
-    !:Overlap = peo_conjunction(!.Overlap, DepConjExec, Vars).
+    !:Overlap = peo_conjunction(!.Overlap, DepConjExec, Vars),
+    !:ConjNum = !.ConjNum + 1.
 
     % calculate_dependent_parallel_cost_2(Info, ProductionsMap,
     %   Var - SeqConsTime, !PrevSeqConsumeTime, !PrevParConsumeTime,
@@ -2424,8 +2485,9 @@ calculate_parallel_cost_step(Info, IsLas
     %
     % * Var: The current variable under consideration.
     %
-    % * SeqConsTime: The consumption time of the Var during sequential
-    %   execution.
+    % * SeqConsTime: The type of event for this variable in this conjunct and
+    %   the time at which it occurs.  The variable is either consumed or
+    %   produced by this conjunct.
     %
     % * !PrevSeqConsumeTime: Accumulates the time of the previous consumption
     %   during sequential execution, or if there is none it represents the
@@ -2447,16 +2509,37 @@ calculate_parallel_cost_step(Info, IsLas
     % * !ConsumptionsMap: Accumuates a map of variable consumptions.
     %
 :- pred calculate_dependent_parallel_cost_2(implicit_parallelism_info::in,
-    map(var_rep, float)::in, pair(var_rep, float)::in, float::in, float::out,
-    float::in, float::out, float::in, float::out,
+    map(var_rep, float)::in, pair(var_rep, production_or_consumption)::in,
+    float::in, float::out, float::in, float::out, float::in, float::out,
     assoc_list(float, float)::in, assoc_list(float, float)::out,
     map(var_rep, float)::in, map(var_rep, float)::out) is det.
 
-calculate_dependent_parallel_cost_2(Info, ProductionsMap, Var - SeqConsTime,
+calculate_dependent_parallel_cost_2(Info, ProductionsMap, Var - SeqEventTime,
         !PrevSeqConsumeTime, !PrevParConsumeTime, !ResumeTime,
         !RevExecution, !ConsumptionsMap) :-
-    map.lookup(ProductionsMap, Var, ProdTime),
+    (
+        SeqEventTime = consumption(SeqConsTime),
+        calculate_dependent_parallel_cost_consumption(Info, ProductionsMap,
+            Var - SeqConsTime, !PrevSeqConsumeTime, !PrevParConsumeTime,
+            !ResumeTime, !RevExecution, !ConsumptionsMap)
+    ;
+        SeqEventTime = production(SeqProdTime),
+        calculate_dependent_parallel_cost_production(Info, SeqProdTime,
+            !PrevSeqConsumeTime, !PrevParConsumeTime, !ResumeTime,
+            !RevExecution, !ConsumptionsMap)
+    ).
+
+:- pred calculate_dependent_parallel_cost_consumption(
+    implicit_parallelism_info::in, map(var_rep, float)::in,
+    pair(var_rep, float)::in, float::in, float::out,
+    float::in, float::out, float::in, float::out,
+    assoc_list(float, float)::in, assoc_list(float, float)::out,
+    map(var_rep, float)::in, map(var_rep, float)::out) is det.
 
+calculate_dependent_parallel_cost_consumption(Info, ProductionsMap,
+        Var - SeqConsTime, !PrevSeqConsumeTime, !PrevParConsumeTime,
+        !ResumeTime, !RevExecution, !ConsumptionsMap) :-
+    map.lookup(ProductionsMap, Var, ProdTime),
     % Consider (P & Q):
     %
     % Q cannot consume the variable until P produces it. Also Q cannot consume
@@ -2495,6 +2578,22 @@ calculate_dependent_parallel_cost_2(Info
 
     svmap.det_insert(Var, ParConsTime, !ConsumptionsMap).
 
+:- pred calculate_dependent_parallel_cost_production(
+    implicit_parallelism_info::in, float::in, float::in, float::out,
+    float::in, float::out, float::in, float::out,
+    assoc_list(float, float)::in, assoc_list(float, float)::out,
+    map(var_rep, float)::in, map(var_rep, float)::out) is det.
+
+calculate_dependent_parallel_cost_production(Info,
+        SeqProdTime, !PrevSeqConsumeTime, !PrevParConsumeTime,
+        !ResumeTime, !RevExecution, !ConsumptionsMap) :-
+    SignalCost = float(Info ^ ipi_opts ^ cpcp_future_signal_cost),
+
+    ParProdTime = !.PrevParConsumeTime +
+        (SeqProdTime - !.PrevSeqConsumeTime) + SignalCost,
+    !:PrevSeqConsumeTime = SeqProdTime,
+    !:PrevParConsumeTime = ParProdTime.
+
 :- pred par_conj_overlap_is_dependent(parallel_execution_overlap::in,
     conjuncts_are_dependent::out) is det.
 
@@ -2610,13 +2709,15 @@ build_dependency_graph([PG | PGs], ConjN
     %
     % Build a map of variable productions in Goals.
     %
-:- pred get_productions_map(pard_goal_detail::in, float::in, float::out,
+:- pred get_productions_map(set(var_rep)::in, pard_goal_detail::in,
+    float::in, float::out,
     assoc_list(float, float)::in, assoc_list(float, float)::out,
     map(var_rep, float)::in, map(var_rep, float)::out) is det.
 
-get_productions_map(Goal, !Time, !Executions, !Map) :-
+get_productions_map(Vars, Goal, !Time, !Executions, !Map) :-
     InstMapInfo = Goal ^ goal_annotation ^ pgd_inst_map_info,
-    BoundVars = InstMapInfo ^ im_bound_vars,
+    BoundVars0 = InstMapInfo ^ im_bound_vars,
+    BoundVars = intersect(BoundVars0, Vars),
     adjust_time_for_waits(!Time, !Executions),
     fold(var_production_time_to_map(!.Time, Goal), BoundVars, !Map),
     !:Time = !.Time + goal_cost_get_percall(Goal ^ goal_annotation ^ pgd_cost).
@@ -2677,6 +2778,26 @@ adjust_time_for_waits_2(LastEnd, !Time, 
 
 adjust_time_for_waits_epsilon = 0.0001.
 
+    % Calculate the time spent during execution and the time spent between
+    % executions (dead time).
+    %
+:- pred calc_cost_and_dead_time(assoc_list(float, float)::in, float::out,
+    float::out) is det.
+
+calc_cost_and_dead_time([], 0.0, 0.0).
+calc_cost_and_dead_time([Start - Stop | Executions], !:Time, DeadTime) :-
+    !:Time = Stop - Start,
+    calc_cost_and_dead_time_2(Executions, Stop, !Time, 0.0, DeadTime).
+
+:- pred calc_cost_and_dead_time_2(assoc_list(float, float)::in, float::in,
+    float::in, float::out, float::in, float::out) is det.
+
+calc_cost_and_dead_time_2([], _, !Time, !DeadTime).
+calc_cost_and_dead_time_2([Start - Stop | Executions], LastStop, !Time, !DeadTime) :-
+    !:Time = !.Time + Stop - Start,
+    !:DeadTime = !.DeadTime + Start - LastStop,
+    calc_cost_and_dead_time_2(Executions, Stop, !Time, !DeadTime).
+
     % var_production_time_to_map(TimeBefore, Goal, Var, !Map).
     %
     % Find the latest production time of Var in Goal, and add TimeBefore + the
@@ -2690,41 +2811,95 @@ var_production_time_to_map(TimeBefore, G
     var_first_use_time(find_production, TimeBefore, Goal, Var, Time),
     svmap.det_insert(Var, Time, !Map).
 
+    % Either a production or consumption time.  Consumptions should sort before
+    % productions.
+    %
+:- type production_or_consumption
+    --->    consumption(float)
+    ;       production(float).
+
     % foldl(get_consumptions_list(Vars), Goals, 0.0, _, [], RevConsumptions),
     %
     % Compute the order and time of variable consumptions in goals.
     %
-:- pred get_consumptions_list(pard_goal_detail::in,
+:- pred get_consumptions_and_productions_list(pard_goal_detail::in,
+    set(var_rep)::in, set(var_rep)::out,
     set(var_rep)::in, set(var_rep)::out, float::in, float::out,
-    assoc_list(var_rep, float)::in, assoc_list(var_rep, float)::out) is det.
+    assoc_list(var_rep, production_or_consumption)::in,
+    assoc_list(var_rep, production_or_consumption)::out) is det.
 
-get_consumptions_list(Goal, !Vars, !Time, !List) :-
+get_consumptions_and_productions_list(Goal, !ConsumedVars, !ProducedVars,
+        !Time, !List) :-
     InstMapInfo = Goal ^ goal_annotation ^ pgd_inst_map_info,
+
     AllConsumptionVars = InstMapInfo ^ im_consumed_vars,
-    ConsumptionVars = intersect(!.Vars, AllConsumptionVars),
+    ConsumptionVars = intersect(!.ConsumedVars, AllConsumptionVars),
     map(var_consumptions(!.Time, Goal),
         ConsumptionVars, ConsumptionTimesSet0),
+    !:ConsumedVars = difference(!.ConsumedVars, ConsumptionVars),
     % Since we re-sort the list we don't need a sorted one to start with,
     % but the set module doesn't export a "to_list" predicate. (Getting
     % a sorted list has no cost since the set is a sorted list internally).
     set.to_sorted_list(ConsumptionTimesSet0, ConsumptionTimes0),
-    CompareTimes = (
-        pred((_ - TimeA)::in, (_ - TimeB)::in, Result::out) is det :-
+    list.sort(compare_times, ConsumptionTimes0, ConsumptionTimes),
+
+    AllProductionVars = InstMapInfo ^ im_bound_vars,
+    ProductionVars = intersect(!.ProducedVars, AllProductionVars),
+    map(var_productions(!.Time, Goal),
+        ProductionVars, ProductionTimesSet0),
+    !:ProducedVars = difference(!.ProducedVars, ProductionVars),
+    set.to_sorted_list(ProductionTimesSet0, ProductionTimes0),
+    list.sort(compare_times, ProductionTimes0, ProductionTimes),
+
+    merge_consumptions_and_productions(ConsumptionTimes, ProductionTimes,
+        ConsumptionAndProductionTimes),
+    !:List = ConsumptionAndProductionTimes ++ !.List,
+    !:Time = !.Time + goal_cost_get_percall(Goal ^ goal_annotation ^ pgd_cost).
+
+:- pred compare_times(pair(A, float)::in, pair(A, float)::in,
+    comparison_result::out) is det.
+
+compare_times(_ - TimeA, _ - TimeB, Result) :-
             % Note that the Time arguments are swapped, this list must be
             % produced in latest to earliest order.
-            compare(Result, TimeB, TimeA)
-    ),
-    list.sort(CompareTimes, ConsumptionTimes0, ConsumptionTimes),
-    !:List = ConsumptionTimes ++ !.List,
-    !:Vars = difference(!.Vars, ConsumptionVars),
-    !:Time = !.Time + goal_cost_get_percall(Goal ^ goal_annotation ^ pgd_cost).
+    compare(Result, TimeB, TimeA).
+
+:- pred merge_consumptions_and_productions(
+    assoc_list(var_rep, float)::in, assoc_list(var_rep, float)::in,
+    assoc_list(var_rep, production_or_consumption)::out) is det.
+
+merge_consumptions_and_productions([], [], []).
+merge_consumptions_and_productions([],
+        [Var - Time | Prods0], [Var - production(Time) | Prods]) :-
+    merge_consumptions_and_productions([], Prods0, Prods).
+merge_consumptions_and_productions([Var - Time | Cons0], [],
+        [Var - consumption(Time) | Cons]) :-
+    merge_consumptions_and_productions(Cons0, [], Cons).
+merge_consumptions_and_productions(Cons@[ConsVar - ConsTime | Cons0],
+        Prods@[ProdVar - ProdTime | Prods0], [ProdOrCons | ProdsAndCons]) :-
+    ( ProdTime < ConsTime ->
+        % Order earlier events first.
+        ProdOrCons = ProdVar - production(ProdTime),
+        merge_consumptions_and_productions(Cons, Prods0, ProdsAndCons)
+    ;
+        % In this branch either the consumption occurs first or the events
+        % occur at the same time, in which case we order consumptions first.
+        ProdOrCons = ConsVar - consumption(ConsTime),
+        merge_consumptions_and_productions(Cons0, Prods, ProdsAndCons)
+    ).
 
 :- pred var_consumptions(float::in, pard_goal_detail::in, var_rep::in,
-    pair.pair(var_rep, float)::out) is det.
+    pair(var_rep, float)::out) is det.
 
 var_consumptions(TimeBefore, Goal, Var, Var - Time) :-
     var_first_use_time(find_consumption, TimeBefore, Goal, Var, Time).
 
+:- pred var_productions(float::in, pard_goal_detail::in, var_rep::in,
+    pair(var_rep, float)::out) is det.
+
+var_productions(TimeBefore, Goal, Var, Var - Time) :-
+    var_first_use_time(find_production, TimeBefore, Goal, Var, Time).
+
 :- type find_production_or_consumption
     --->    find_production
     ;       find_consumption.
@@ -3746,7 +3921,9 @@ create_candidate_parallel_conj_report(Va
         MaybePushGoal, FirstConjNum, IsDependent, GoalsBefore, GoalsBeforeCost,
         Conjs, GoalsAfter, GoalsAfterCost, ParExecMetrics),
     ParExecMetrics = parallel_exec_metrics(NumCalls, SeqTime, ParTime,
-        ParOverheads, FirstConjDeadTime, FutureDeadTime),
+        SparkCost, BarrierCost, SignalsCost, WaitsCost, FirstConjDeadTime,
+        FutureDeadTime),
+    ParOverheads = parallel_exec_metrics_get_overheads(ParExecMetrics),
     (
         IsDependent = conjuncts_are_independent,
         DependanceString = "no"
@@ -3786,7 +3963,11 @@ create_candidate_parallel_conj_report(Va
         "      NumCalls: %s\n" ++
         "      SeqTime: %s\n" ++
         "      ParTime: %s\n" ++
-        "      ParOverheads: %s\n" ++
+        "      SparkCost: %s\n" ++
+        "      BarrierCost: %s\n" ++
+        "      SignalsCost: %s\n" ++
+        "      WaitsCost: %s\n" ++
+        "      ParOverheads total: %s\n" ++
         "      Speedup: %s\n" ++
         "      Time saving: %s\n" ++
         "      First conj dead time: %s\n" ++
@@ -3796,6 +3977,10 @@ create_candidate_parallel_conj_report(Va
          s(commas(NumCalls)),
          s(two_decimal_fraction(SeqTime)),
          s(two_decimal_fraction(ParTime)),
+         s(two_decimal_fraction(SparkCost)),
+         s(two_decimal_fraction(BarrierCost)),
+         s(two_decimal_fraction(SignalsCost)),
+         s(two_decimal_fraction(WaitsCost)),
          s(two_decimal_fraction(ParOverheads)),
          s(four_decimal_fraction(Speedup)),
          s(two_decimal_fraction(TimeSaving)),
Index: deep_profiler/mdprof_feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/mdprof_feedback.m,v
retrieving revision 1.32
diff -u -p -b -r1.32 mdprof_feedback.m
--- deep_profiler/mdprof_feedback.m	15 Dec 2010 06:30:33 -0000	1.32
+++ deep_profiler/mdprof_feedback.m	22 Jan 2011 04:02:01 -0000
@@ -178,9 +178,9 @@ create_feedback_report(feedback_data_can
         Parameters, Conjs), Report) :-
     NumConjs = length(Conjs),
     Parameters = candidate_par_conjunctions_params(DesiredParallelism,
-        IntermoduleVarUse, SparkingCost, SparkingDelay, SignalCost, WaitCost,
-        ContextWakeupDelay, CliqueThreshold, CallSiteThreshold,
-        ParalleliseDepConjs, BestParAlgorithm),
+        IntermoduleVarUse, SparkingCost, SparkingDelay, BarrierCost,
+        SignalCost, WaitCost, ContextWakeupDelay, CliqueThreshold,
+        CallSiteThreshold, ParalleliseDepConjs, BestParAlgorithm),
     best_par_algorithm_string(BestParAlgorithm, BestParAlgorithmStr),
     ReportHeader = singleton(format(
         "  Candidate Parallel Conjunctions:\n" ++
@@ -188,6 +188,7 @@ create_feedback_report(feedback_data_can
         "    Intermodule var use: %s\n" ++
         "    Sparking cost: %d\n" ++
         "    Sparking delay: %d\n" ++
+        "    Barrier cost: %d\n" ++
         "    Future signal cost: %d\n" ++
         "    Future wait cost: %d\n" ++
         "    Context wakeup delay: %d\n" ++
@@ -201,6 +202,7 @@ create_feedback_report(feedback_data_can
          s(string(IntermoduleVarUse)),
          i(SparkingCost),
          i(SparkingDelay),
+         i(BarrierCost),
          i(SignalCost),
          i(WaitCost),
          i(ContextWakeupDelay),
@@ -276,6 +278,9 @@ help_message =
                 The time taken from the time a spark is created until the spark
                 is executed by another processor, assuming that there is a free
                 processor.
+    --implicit-parallelism-barrier-cost <value>
+                The cost of executing the barrier code at the end of each
+                parallel conjunct.
     --implicit-parallelism-future-signal-cost <value>
                 The cost of the signal() call for the producer of a shared
                 variable, measured in the profiler's call sequence counts.
@@ -410,6 +415,7 @@ read_deep_file(Input, Debug, MaybeDeep, 
     ;       implicit_parallelism_intermodule_var_use
     ;       implicit_parallelism_sparking_cost
     ;       implicit_parallelism_sparking_delay
+    ;       implicit_parallelism_barrier_cost
     ;       implicit_parallelism_future_signal_cost
     ;       implicit_parallelism_future_wait_cost
     ;       implicit_parallelism_context_wakeup_delay
@@ -460,6 +466,8 @@ long("implicit-parallelism-sparking-dela
     implicit_parallelism_sparking_delay).
 long("implicit-parallelism-future-signal-cost",
     implicit_parallelism_future_signal_cost).
+long("implicit-parallelism-barrier-cost",
+    implicit_parallelism_barrier_cost).
 long("implicit-parallelism-future-wait-cost",
     implicit_parallelism_future_wait_cost).
 long("implicit-parallelism-context-wakeup-delay",
@@ -495,6 +503,7 @@ defaults(desired_parallelism,           
 defaults(implicit_parallelism_intermodule_var_use,          bool(no)).
 defaults(implicit_parallelism_sparking_cost,                int(100)).
 defaults(implicit_parallelism_sparking_delay,               int(1000)).
+defaults(implicit_parallelism_barrier_cost,                 int(100)).
 defaults(implicit_parallelism_future_signal_cost,           int(100)).
 defaults(implicit_parallelism_future_wait_cost,             int(250)).
 defaults(implicit_parallelism_context_wakeup_delay,         int(1000)).
@@ -600,6 +609,8 @@ check_options(Options0, RequestedFeedbac
             SparkingCost),
         lookup_int_option(Options, implicit_parallelism_sparking_delay,
             SparkingDelay),
+        lookup_int_option(Options, implicit_parallelism_barrier_cost,
+            BarrierCost),
         lookup_int_option(Options, implicit_parallelism_future_signal_cost,
             FutureSignalCost),
         lookup_int_option(Options, implicit_parallelism_future_wait_cost,
@@ -656,6 +667,7 @@ check_options(Options0, RequestedFeedbac
                 IntermoduleVarUse,
                 SparkingCost,
                 SparkingDelay,
+                BarrierCost,
                 FutureSignalCost,
                 FutureWaitCost,
                 ContextWakeupDelay,
Index: deep_profiler/measurements.m
===================================================================
RCS file: /home/mercury1/repository/mercury/deep_profiler/measurements.m,v
retrieving revision 1.27
diff -u -p -b -r1.27 measurements.m
--- deep_profiler/measurements.m	21 Jan 2011 10:41:00 -0000	1.27
+++ deep_profiler/measurements.m	22 Jan 2011 04:16:12 -0000
@@ -276,17 +276,17 @@
     % variables in A.
     %
 :- func init_parallel_exec_metrics_incomplete(parallel_exec_metrics_incomplete,
-    float, float, float) = parallel_exec_metrics_incomplete.
+    float, float, float, float, float) = parallel_exec_metrics_incomplete.
 
     % StartMetrics = init_empty_parallel_exec_metrics(CostBefore, CostAfter,
-    %   NumCalls, SparkCost, SparkDelay, ContextWakeupDelay).
+    %   NumCalls, SparkCost, SparkDelay, BarrierCost, ContextWakeupDelay).
     %
     % Use this function to start with an empty set of metrics for an empty
     % conjunction.  Then use init_parallel_exec_metrics_incomplete to continue
     % adding conjuncts on the right.
     %
 :- func init_empty_parallel_exec_metrics(float, float, int, float, float,
-    float) = parallel_exec_metrics_incomplete.
+    float, float) = parallel_exec_metrics_incomplete.
 
     % Metrics = finalise_parallel_exec_metrics(IncompleteMetrics).
     %
@@ -998,6 +998,8 @@ exceeded_desired_parallelism(DesiredPara
 
                 pemi_spark_delay            :: float,
 
+                pemi_barrier_cost           :: float,
+
                 pemi_context_wakeup_delay   :: float,
 
                 % If there are no internal conjuncts then the parallel
@@ -1008,58 +1010,70 @@ exceeded_desired_parallelism(DesiredPara
 
 :- type parallel_exec_metrics_internal
     --->    pem_left_most(
-                pemi_time_seq               :: float,
-                pemi_time_par               :: float
+                pemi_time_left_seq          :: float,
+                pemi_time_left_par          :: float,
+                pemi_time_left_signals      :: float
+
+                % While the leftmost conjunct does have dead time, it is not
+                % possible to calculate it until we know the parallel
+                % execution time of the whole conjunction; therefore it is not
+                % included here, but in parallel_exec_metrics_incomplete.
             )
     ;       pem_additional(
                 % The time of the left conjunct (that may be a conjunction).
                 pemi_time_left              :: parallel_exec_metrics_internal,
 
-                % The additional cost of calling signal within the left
-                % conjunct.
-                % NOTE: Note that this should be added to each of the
-                % individual conjuncts _where_ they call signal but thta is
-                % more difficult and may not be required.  We may visit it
-                % in the future.
-                pemi_time_left_signals      :: float,
+                % The additional cost of calling signal during this conjunct.
+                pemi_time_signals           :: float,
+
+                % The additional cost of calling wait during this conjunct.
+                pemi_time_waits             :: float,
 
                 % The time of the right conjunct if it is running after
                 % the left in normal sequential execution.
                 pemi_time_right_seq         :: float,
 
-                % The time of the right conjunct if it is running in
-                % parallel with the left conjunct.  It may have to stop and
-                % wait for variables to be produced; therefore this time is
-                % different to time_right_seq.  This time also includes
-                % parallel execution overheads and delays.
-                pemi_time_right_par         :: float
+                % The time of the right conjunct if it is running in parallel
+                % with the left conjunct.  Overheads are included in this value
+                % so it will usually be larger than time_right_seq.
+                pemi_time_right_par         :: float,
+
+                % The dead time of this conjunct.  This is the time that the
+                % context will be blocked on futures.  It does not include the
+                % spark delay because the context may not exist for most of
+                % that time.
+                pemi_time_right_dead        :: float
             ).
 
-init_parallel_exec_metrics_incomplete(Metrics0, TimeSignals, TimeBSeq,
-        TimeBPar) = Metrics :-
+init_parallel_exec_metrics_incomplete(Metrics0, TimeSignals, TimeWaits,
+        TimeBSeq, TimeBPar, TimeBDead) = Metrics :-
     MaybeInternal0 = Metrics0 ^ pemi_internal,
     (
         MaybeInternal0 = yes(Internal0),
-        Internal = pem_additional(Internal0, TimeSignals, TimeBSeq, TimeBPar)
+        Internal = pem_additional(Internal0, TimeSignals, TimeWaits, TimeBSeq,
+            TimeBPar, TimeBDead)
     ;
         MaybeInternal0 = no,
-        Internal = pem_left_most(TimeBSeq, TimeBPar),
-        ( unify(TimeSignals, 0.0) ->
+        Internal = pem_left_most(TimeBSeq, TimeBPar, TimeSignals),
+        (
+            TimeBDead = 0.0,
+            TimeWaits = 0.0
+        ->
             true
         ;
-            unexpected($module, $pred, "TimeSignal != 0")
+            unexpected($module, $pred, "TimeWaits != 0 or TimeBDead != 0")
         )
     ),
     Metrics = Metrics0 ^ pemi_internal := yes(Internal).
 
 init_empty_parallel_exec_metrics(TimeBefore, TimeAfter, NumCalls, SparkCost,
-        SparkDelay, ContextWakeupDelay) =
+        SparkDelay, BarrierCost, ContextWakeupDelay) =
     pem_incomplete(TimeBefore, TimeAfter, NumCalls, SparkCost, SparkDelay,
-        ContextWakeupDelay, no).
+        BarrierCost, ContextWakeupDelay, no).
 
 finalise_parallel_exec_metrics(IncompleteMetrics) = Metrics :-
     IncompleteMetrics = pem_incomplete(TimeBefore, TimeAfter, NumCalls,
-        SparkCost, SparkDelay, ContextWakeupDelay, MaybeInternal),
+        SparkCost, SparkDelay, BarrierCost, ContextWakeupDelay, MaybeInternal),
     (
         MaybeInternal = yes(Internal)
     ;
@@ -1069,8 +1083,9 @@ finalise_parallel_exec_metrics(Incomplet
     BeforeAndAfterTime = TimeBefore + TimeAfter,
 
     % Calculate par time.
-    InnerParTime = parallel_exec_metrics_internal_get_par_time(Internal),
-    FirstConjParTime = pem_get_first_conj_par_time(Internal),
+    NumConjuncts = parallel_exec_metrics_internal_get_num_conjs(Internal),
+    InnerParTime = parallel_exec_metrics_internal_get_par_time(Internal,
+        SparkDelay, NumConjuncts),
     ( FirstConjDeadTime > 0.0 ->
         FirstConjWakeupPenalty = ContextWakeupDelay
     ;
@@ -1083,32 +1098,51 @@ finalise_parallel_exec_metrics(Incomplet
     SeqTime = InnerSeqTime + BeforeAndAfterTime,
 
     % Calculate the amount of time that the first conjunct is blocked for.
+    FirstConjParTime = pem_get_first_conj_par_time(Internal),
     FirstConjDeadTime = InnerParTime - FirstConjParTime,
 
     % Calculate the amount of time that the conjunction spends blocking on
     % futures.
-    FutureDeadTime = pem_get_future_dead_time(Internal, yes, SparkCost,
-        SparkDelay),
+    FutureDeadTime = pem_get_future_dead_time(Internal),
 
     % Calculate the overheads of parallelisation.
-    ParOverheads = pem_get_par_overheads(Internal),
-
-    Metrics = parallel_exec_metrics(NumCalls, SeqTime, ParTime, ParOverheads,
-        FirstConjDeadTime, FutureDeadTime).
+    %
+    % These are already included in ParTime; we don't need to add them, just
+    % calculate what they would be for reporting.
+    SparkCosts = float(NumConjuncts - 1) * SparkCost,
+    BarrierCosts = float(NumConjuncts) * BarrierCost,
+    SignalCosts = pem_get_signal_costs(Internal),
+    WaitCosts = pem_get_wait_costs(Internal),
+
+    Metrics = parallel_exec_metrics(NumCalls, SeqTime, ParTime, SparkCosts,
+        BarrierCosts, SignalCosts, WaitCosts, FirstConjDeadTime,
+        FutureDeadTime).
 
 parallel_exec_metrics_get_num_calls(Pem) =
     Pem ^ pemi_num_calls.
 
-    % The expected parallel execution time.
+:- func parallel_exec_metrics_internal_get_num_conjs(
+    parallel_exec_metrics_internal) = int.
+
+parallel_exec_metrics_internal_get_num_conjs(pem_left_most(_, _, _)) = 1.
+parallel_exec_metrics_internal_get_num_conjs(
+        pem_additional(Left, _, _, _, _, _)) =
+    1 + parallel_exec_metrics_internal_get_num_conjs(Left).
+
+    % The expected parallel execution time.  Because this is the elapsed
+    % execution time of the whole conjunction, it must include dead time.
     %
 :- func parallel_exec_metrics_internal_get_par_time(
-    parallel_exec_metrics_internal) = float.
+    parallel_exec_metrics_internal, float, int) = float.
 
-parallel_exec_metrics_internal_get_par_time(pem_left_most(_, Time)) = Time.
+parallel_exec_metrics_internal_get_par_time(pem_left_most(_, ParTime, _), _, _) =
+        ParTime.
 parallel_exec_metrics_internal_get_par_time(pem_additional(MetricsLeft,
-        TimeLeftSignal, _, TimeRight)) = Time :-
-    TimeLeft = parallel_exec_metrics_internal_get_par_time(MetricsLeft) +
-        TimeLeftSignal,
+        _, _, _, TimeRightPar, TimeRightDead), SparkDelay, Depth) = Time :-
+    TimeRight = TimeRightPar + TimeRightDead + SparkDelay * float(Depth - 1),
+    TimeLeft =
+        parallel_exec_metrics_internal_get_par_time(MetricsLeft, SparkDelay,
+            Depth - 1),
     Time = max(TimeLeft, TimeRight).
 
     % The expected sequential execution time.
@@ -1116,9 +1150,9 @@ parallel_exec_metrics_internal_get_par_t
 :- func parallel_exec_metrics_internal_get_seq_time(
     parallel_exec_metrics_internal) = float.
 
-parallel_exec_metrics_internal_get_seq_time(pem_left_most(Time, _)) = Time.
+parallel_exec_metrics_internal_get_seq_time(pem_left_most(Time, _, _)) = Time.
 parallel_exec_metrics_internal_get_seq_time(pem_additional(MetricsLeft, _,
-        TimeRight, _)) = Time :-
+        _, TimeRight, _, _)) = Time :-
     TimeLeft = parallel_exec_metrics_internal_get_seq_time(MetricsLeft),
     Time = TimeLeft + TimeRight.
 
@@ -1127,43 +1161,45 @@ parallel_exec_metrics_internal_get_seq_t
     %
 :- func pem_get_first_conj_par_time(parallel_exec_metrics_internal) = float.
 
-pem_get_first_conj_par_time(pem_left_most(_, Time)) = Time.
-pem_get_first_conj_par_time(pem_additional(Left, LeftSignalTime0, _, _)) =
+pem_get_first_conj_par_time(pem_left_most(_, Time, _)) = Time.
+pem_get_first_conj_par_time(pem_additional(Left, _, _, _, _, _)) =
         Time :-
-    (
-        Left = pem_left_most(_, _),
-        LeftSignalTime = LeftSignalTime0
-    ;
-        Left = pem_additional(_, _, _, _),
-        LeftSignalTime = 0.0
-    ),
-    Time = pem_get_first_conj_par_time(Left) + LeftSignalTime.
+    Time = pem_get_first_conj_par_time(Left).
 
-:- func pem_get_future_dead_time(parallel_exec_metrics_internal, bool,
-    float, float) = float.
+:- func pem_get_future_dead_time(parallel_exec_metrics_internal) = float.
 
-    % XXX: We should make this an attribute of pem_additional.
-pem_get_future_dead_time(pem_left_most(_, _), _, _, _) = 0.0.
-pem_get_future_dead_time(pem_additional(Left, _, Seq, Par),
-        IsRightmostConj, ForkCost, ForkDelay) = DeadTime :-
-    DeadTime = ThisDeadTime + LeftDeadTime,
-    ThisDeadTime0 = Par - Seq - ForkDelay,
-    (
-        IsRightmostConj = yes,
-        ThisDeadTime = ThisDeadTime0
-    ;
-        IsRightmostConj = no,
-        ThisDeadTime = ThisDeadTime0 - ForkCost
-    ),
-    LeftDeadTime = pem_get_future_dead_time(Left, no, ForkCost, ForkDelay).
+pem_get_future_dead_time(pem_left_most(_, _, _)) = 0.0.
+pem_get_future_dead_time(pem_additional(Left, _, _, _, _, RightDeadTime)) =
+        RightDeadTime + LeftDeadTime :-
+    LeftDeadTime = pem_get_future_dead_time(Left).
 
+    % Get the overheads of parallelisation.
+    %
+    % Remember that these are already represented within the parallel execution
+    % time.
+    %
 :- func pem_get_par_overheads(parallel_exec_metrics_internal) = float.
 
-pem_get_par_overheads(pem_left_most(Seq, Par))= Par - Seq.
-pem_get_par_overheads(pem_additional(Left, Signals, Seq, Par)) = Overheads :-
-    Overheads = LeftOverheads + Signals + Par - Seq,
+pem_get_par_overheads(pem_left_most(Seq, Par, _)) = Par - Seq.
+pem_get_par_overheads(pem_additional(Left, _, _, Seq, Par, _)) =
+        Overheads :-
+    Overheads = LeftOverheads + Par - Seq,
     pem_get_par_overheads(Left) = LeftOverheads.
 
+:- func pem_get_signal_costs(parallel_exec_metrics_internal) = float.
+
+pem_get_signal_costs(pem_left_most(_, _, SignalCosts)) = SignalCosts.
+pem_get_signal_costs(pem_additional(Left, SignalsR, _, _, _, _)) = Signals :-
+    Signals = SignalsR + SignalsL,
+    SignalsL = pem_get_signal_costs(Left).
+
+:- func pem_get_wait_costs(parallel_exec_metrics_internal) = float.
+
+pem_get_wait_costs(pem_left_most(_, _, _)) = 0.0.
+pem_get_wait_costs(pem_additional(Left, _, WaitsR, _, _, _)) = Waits :-
+    Waits = WaitsR + WaitsL,
+    WaitsL = pem_get_wait_costs(Left).
+
 %----------------------------------------------------------------------------%
 
 weighted_average(Weights, Values, Average) :-
Index: mdbcomp/feedback.automatic_parallelism.m
===================================================================
RCS file: /home/mercury1/repository/mercury/mdbcomp/feedback.automatic_parallelism.m,v
retrieving revision 1.14
diff -u -p -b -r1.14 feedback.automatic_parallelism.m
--- mdbcomp/feedback.automatic_parallelism.m	21 Jan 2011 06:36:51 -0000	1.14
+++ mdbcomp/feedback.automatic_parallelism.m	22 Jan 2011 04:02:01 -0000
@@ -53,6 +53,10 @@
                 % it starts being executed, measured in call sequence counts.
                 cpcp_sparking_delay         :: int,
 
+                % The cost of barrier synchronisation for each conjunct at the
+                % end of the parallel conjunction.
+                cpcp_barrier_cost           :: int,
+
                 % The costs of maintaining a lock on a single dependent
                 % variable, measured in call sequence counts. The first number
                 % gives the cost of the call to signal, and the second gives
@@ -161,7 +165,7 @@
                 % The goal path of the conjunction in which the push is done.
                 pg_goal_path    :: goal_path_string,
 
-                % The range of conjuncts to push.
+                % The range of conjuncts to push; both bounds are inclusive.
                 pg_pushee_lo    :: int,
                 pg_pushee_hi    :: int,
 
@@ -311,10 +315,12 @@
                 pem_par_time                :: float,
 
                 % The overheads of parallel execution. These are already
-                % included in pem_par_time.
-                % Add these to pem_seq_time to get the 'time on cpu' of
-                % this execution.
-                pem_par_overheads           :: float,
+                % included in pem_par_time.  Overheads are separated into
+                % different causes.
+                pem_par_overhead_xpark_cost :: float,
+                pem_par_overhead_barrier    :: float,
+                pem_par_overhead_signals    :: float,
+                pem_par_overhead_waits      :: float,
 
                 % The amount of time the initial (left most) conjunct spends
                 % waiting for the other conjuncts. During this time,
@@ -342,6 +348,12 @@
     %
 :- func parallel_exec_metrics_get_cpu_time(parallel_exec_metrics) = float.
 
+    % The overheads of parallel execution.
+    %
+    % Add these to pem_seq_time to get the 'time on cpu' of this execution.
+    %
+:- func parallel_exec_metrics_get_overheads(parallel_exec_metrics) = float.
+
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
@@ -367,7 +379,12 @@ parallel_exec_metrics_get_time_saving(PE
 
 parallel_exec_metrics_get_cpu_time(PEM) = SeqTime + Overheads :-
     SeqTime = PEM ^ pem_seq_time,
-    Overheads = PEM ^ pem_par_overheads.
+    Overheads = parallel_exec_metrics_get_overheads(PEM).
+
+parallel_exec_metrics_get_overheads(PEM) =
+        SparkCosts + BarrierCosts + SignalCosts + WaitCosts :-
+    PEM = parallel_exec_metrics(_, _, _, SparkCosts, BarrierCosts,
+        SignalCosts, WaitCosts, _, _).
 
 %-----------------------------------------------------------------------------%
 %
Index: mdbcomp/feedback.m
===================================================================
RCS file: /home/mercury1/repository/mercury/mdbcomp/feedback.m,v
retrieving revision 1.22
diff -u -p -b -r1.22 feedback.m
--- mdbcomp/feedback.m	21 Jan 2011 04:31:52 -0000	1.22
+++ mdbcomp/feedback.m	22 Jan 2011 04:02:01 -0000
@@ -535,7 +535,7 @@ feedback_first_line = "Mercury Compiler 
 
 :- func feedback_version = string.
 
-feedback_version = "17".
+feedback_version = "18".
 
 %-----------------------------------------------------------------------------%
 