[m-rev.] For review: Add more threadscope instrumentation.

Paul Bone pbone at csse.unimelb.edu.au
Fri Mar 11 18:15:01 AEDT 2011


For review by anyone:


Add more threadscope instrumentation.

This change introduces instrumentation that tracks sparks as well as parallel
conjunctions and their conjuncts.  This should hopefully give us more
information to diagnose runtime performance issues.

As of this date the ThreadScope program hasn't been updated to read or
understand these new events.

runtime/mercury_threadscope.[ch]:
    Added a function and types to register all the threadscope strings from an
    array.

    Add functions to post the new events (see below).

runtime/mercury_threadscope.c:
    Added support for 5 new threadscope events.
        Registering a string so that other messages may refer to a constant
        string.

        Marking the beginning and ends of parallel conjunctions.

        Creating a spark for a parallel conjunct.

        Finishing a parallel conjunct.

    Re-arranged event IDs, I've started allocating IDs from 38 onwards for
    general purposes and 100 onwards for mercury specific events after talking
    with Duncan Coutts.

    Trimmed excess whitespace from the end of lines.

runtime/mercury_context.h:
    Post a beginning parallel conjunction message when the sync term for the
    parallel conjunction is initialized.

    Post an event when creating a spark for a parallel conjunction.

    Add a MR_spark_id field to the MR_Spark structure, these identify sparks to
    threadscope.

runtime/mercury_context.c:
    Post threadscope messages when a spark is about to be executed.

    Post a threadscope event when a parallel conjunct is completed.

    Add a missing memory barrier.

runtime/mercury_wrapper.[ch]:
    Create a global function pointer for the code that registers strings in the
    threadscope string table, this is filled in by mkinit.

    Call this function pointer immediatly after setting up threadscope.

runtime/mercury_wsdeque.[ch]:
    Modify MR_wsdeque_pop_bottom to return the spark pointer (which points onto
    the queue) rather then returning a result through a pointer and bool if the
    operation was successful.  This pointer is safe to dereference until
    MR_wsdeque_push_bottom is used.

runtime/mercury_wsdeque.c:
    Corrected a code comment.

runtime/mercury_engine.h:
    Documented some of the fields of the engine structure that hadn't been
    documented.

    Add a next spark ID field to the engine structure.

    Change the type of the engine ID field to MR_uint_least16_t

compiler/llds.m:
    Add a third field to the init_sync_term instruction that stores the index
    into the threadscope string table of the static conjunction ID.

    Add a field to the c_file structure containing the threadscope string
    table.

compiler/layout.m:
    Added a new layout array name for the threadscope string table.

compiler/layout_out.m:
    Implement code to write out the threadscope string table.

compiler/llds_out_file.m:
    Write out the threadscope string table when writing out the c_file.

compiler/par_conj_gen.m:
    Create strings that statically identify parallel conjunctions for each
    init_sync_term LLDS instruction.  These strings are added to a table in the
    !CodeInfo and the index of the string is added to the init_sync_term
    instruction.

    Add an extra instruction after a parallel conjunction to post the message
    that the parallel conjunction has completed.

compiler/global_data.m:
    Add fields to the global data structure to represent the threadscope string
    table and its current size.

    Add predicates to update and retrieve the table.

    Handle merging of threadscope string tables in global data by allowing the
    references to the strings to be remapped.

    Refactored remapping code so that a caller such as proc_gen only needs to
    call one remapping predicate after merging global data..

compiler/code_info.m:
    Add a table of strings for use with threadscope to the code_info_persistent
    type.

    Modify the code_info_init to initialise the threadscope string table fields.

    Add a predicate to get the string table and another to update it.

compiler/proc_gen.m:
    Build the containing goal map before code generation for procedures with
    parallel conjunctions in a parallel grade.  par_conj_gen.m depends on this.

    Conform to changes in code_info.m and global_data.m

compiler/llds_out_instr.m:
    Write out the extra parameter in the init_sync_term instruction.

compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
    Conform to changes in llds.m

compiler/opt_debug.m:
    Conform to changes in layout.m

compiler/mercury_compile_llds_back_end.m:
    Fix some trailing whitespace.

util/mkinit.c:
    Build an initialisation function that registers all the strings in
    threadscope string tables.

    Correct the layout of a comment.

Index: compiler/code_info.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/code_info.m,v
retrieving revision 1.382
diff -u -p -b -r1.382 code_info.m
--- compiler/code_info.m	13 Jan 2011 00:36:51 -0000	1.382
+++ compiler/code_info.m	11 Mar 2011 06:55:53 -0000
@@ -112,7 +112,8 @@
 :- pred code_info_init(bool::in, globals::in, pred_id::in, proc_id::in,
     pred_info::in, proc_info::in, abs_follow_vars::in, module_info::in,
     static_cell_info::in, resume_point_info::out, trace_slot_info::out,
-    maybe(containing_goal_map)::in, code_info::out) is det.
+    maybe(containing_goal_map)::in, list(string)::in, int::in, code_info::out)
+    is det.
 
     % Get the globals table.
     %
@@ -464,14 +465,22 @@
 
                 cip_static_cell_info    :: static_cell_info,
 
-                cip_used_env_vars       :: set(string)
+                cip_used_env_vars           :: set(string),
+
+                % A counter and table for allocating and maintaining slots
+                % where string IDs will be placed at runtime for threadscope
+                % profiling.  The actual string IDs are allocated at runtime
+                % and their IDs are placed in an array slot which can be
+                % referred to statically.
+                cip_ts_string_table_size    :: int,
+                cip_ts_rev_string_table     :: list(string)
             ).
 
 %---------------------------------------------------------------------------%
 
 code_info_init(SaveSuccip, Globals, PredId, ProcId, PredInfo, ProcInfo,
         FollowVars, ModuleInfo, StaticCellInfo, ResumePoint, TraceSlotInfo,
-        MaybeContainingGoalMap, CodeInfo) :-
+        MaybeContainingGoalMap, TSRevStringTable, TSStringTableSize, CodeInfo) :-
     proc_info_get_initial_instmap(ProcInfo, ModuleInfo, InstMap),
     proc_info_get_liveness_info(ProcInfo, Liveness),
     CodeModel = proc_info_interface_code_model(ProcInfo),
@@ -582,7 +591,9 @@ code_info_init(SaveSuccip, Globals, Pred
             -1,
             no,
             StaticCellInfo,
-            set.init
+            set.init,
+            TSStringTableSize,
+            TSRevStringTable
         )
     ),
     init_maybe_trace_info(TraceLevel, Globals, ModuleInfo,
@@ -892,6 +903,12 @@ get_containing_goal_map_det(CI, Containi
 :- pred add_closure_layout(closure_proc_id_data::in,
     code_info::in, code_info::out) is det.
 
+:- pred add_threadscope_string(string::in, int::out,
+    code_info::in, code_info::out) is det.
+
+:- pred get_threadscope_rev_string_table(code_info::in,
+    list(string)::out, int::out) is det.
+
 :- pred add_scalar_static_cell(assoc_list(rval, llds_type)::in,
     data_id::out, code_info::in, code_info::out) is det.
 
@@ -1164,6 +1181,19 @@ add_closure_layout(ClosureLayout, !CI) :
     get_closure_layouts(!.CI, ClosureLayouts),
     set_closure_layouts([ClosureLayout | ClosureLayouts], !CI).
 
+add_threadscope_string(String, SlotNum, !CI) :-
+    Size0 = !.CI ^ code_info_persistent ^ cip_ts_string_table_size,
+    RevTable0 = !.CI ^ code_info_persistent ^ cip_ts_rev_string_table,
+    SlotNum = Size0,
+    Size = Size0 + 1,
+    RevTable = [String | RevTable0],
+    !CI ^ code_info_persistent ^ cip_ts_string_table_size := Size,
+    !CI ^ code_info_persistent ^ cip_ts_rev_string_table := RevTable.
+
+get_threadscope_rev_string_table(CI, RevTable, TableSize) :-
+    RevTable = CI ^ code_info_persistent ^ cip_ts_rev_string_table,
+    TableSize = CI ^ code_info_persistent ^ cip_ts_string_table_size.
+
 add_scalar_static_cell(RvalsTypes, DataAddr, !CI) :-
     get_static_cell_info(!.CI, StaticCellInfo0),
     global_data.add_scalar_static_cell(RvalsTypes, DataAddr,
Index: compiler/dupelim.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/dupelim.m,v
retrieving revision 1.96
diff -u -p -b -r1.96 dupelim.m
--- compiler/dupelim.m	30 Dec 2010 11:17:53 -0000	1.96
+++ compiler/dupelim.m	11 Mar 2011 06:55:53 -0000
@@ -420,9 +420,9 @@ standardize_instr(Instr0, Instr) :-
         standardize_rval(Rval0, Rval),
         Instr = prune_tickets_to(Rval)
     ;
-        Instr0 = init_sync_term(Lval0, N),
+        Instr0 = init_sync_term(Lval0, N, ConjId),
         standardize_lval(Lval0, Lval),
-        Instr = init_sync_term(Lval, N)
+        Instr = init_sync_term(Lval, N, ConjId)
     ;
         Instr0 = join_and_continue(Lval0, Label),
         standardize_lval(Lval0, Lval),
@@ -834,7 +834,7 @@ most_specific_instr(InstrA, InstrB, Mayb
         ; InstrA = decr_sp_and_return(_)
         ; InstrA = foreign_proc_code(_, _, _, _, _, _, _, _, _, _)
         ; InstrA = fork_new_child(_, _)
-        ; InstrA = init_sync_term(_, _)
+        ; InstrA = init_sync_term(_, _, _)
         ; InstrA = join_and_continue(_, _)
         ),
         ( InstrA = InstrB ->
Index: compiler/dupproc.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/dupproc.m,v
retrieving revision 1.27
diff -u -p -b -r1.27 dupproc.m
--- compiler/dupproc.m	15 Dec 2010 06:29:34 -0000	1.27
+++ compiler/dupproc.m	11 Mar 2011 06:55:53 -0000
@@ -258,7 +258,7 @@ standardize_instr(Instr, StdInstr, DupPr
         ; Instr = prune_tickets_to(_)
         ; Instr = decr_sp(_)
         ; Instr = decr_sp_and_return(_)
-        ; Instr = init_sync_term(_, _)
+        ; Instr = init_sync_term(_, _, _)
         ),
         StdInstr = Instr
     ).
Index: compiler/exprn_aux.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/exprn_aux.m,v
retrieving revision 1.91
diff -u -p -b -r1.91 exprn_aux.m
--- compiler/exprn_aux.m	30 Dec 2010 11:17:53 -0000	1.91
+++ compiler/exprn_aux.m	11 Mar 2011 06:55:53 -0000
@@ -410,9 +410,9 @@ transform_lval_in_uinstr(Transform, Uins
             MaybeLabel1, MaybeLabel2, MaybeLabel3, MaybeLabel4, MaybeLabel5,
             ReferStackSlot, MayDupl)
     ;
-        Uinstr0 = init_sync_term(Lval0, BranchCount),
+        Uinstr0 = init_sync_term(Lval0, BranchCount, ConjId),
         Transform(Lval0, Lval, !Acc),
-        Uinstr = init_sync_term(Lval, BranchCount)
+        Uinstr = init_sync_term(Lval, BranchCount, ConjId)
     ;
         Uinstr0 = join_and_continue(Lval0, Label),
         Transform(Lval0, Lval, !Acc),
Index: compiler/global_data.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/global_data.m,v
retrieving revision 1.44
diff -u -p -b -r1.44 global_data.m
--- compiler/global_data.m	30 Dec 2010 11:17:54 -0000	1.44
+++ compiler/global_data.m	11 Mar 2011 06:55:53 -0000
@@ -61,6 +61,15 @@
 :- pred global_data_get_all_closure_layouts(global_data::in,
     list(closure_proc_id_data)::out) is det.
 
+:- pred global_data_get_threadscope_string_table(global_data::in,
+    list(string)::out) is det.
+
+:- pred global_data_get_threadscope_rev_string_table(global_data::in,
+    list(string)::out, int::out) is det.
+
+:- pred global_data_set_threadscope_rev_string_table(list(string)::in, int::in,
+    global_data::in, global_data::out) is det.
+
 :- pred global_data_get_static_cell_info(global_data::in,
     static_cell_info::out) is det.
 
@@ -111,6 +120,8 @@
 :- pred bump_type_num_counter(int::in, global_data::in, global_data::out)
     is det.
 
+:- type global_data_remapping.
+
     % merge_global_datas(GlobalDataA, GlobalDataB, GlobalData, Remap)
     %
     % Merge two global data structures, where static cell information from
@@ -119,13 +130,13 @@
     % information necessary for remap_static_cell_references/3.
     %
 :- pred merge_global_datas(global_data::in, global_data::in, global_data::out,
-    static_cell_remap_info::out) is det.
+    global_data_remapping::out) is det.
 
-    % Update instructions in a C procedure that reference the static cells
-    % from the GlobalDataB that was passed to merge_global_datas/4, to
-    % reference the static cells of the merged global_data structure.
+    % Update instructions in a C procedure that reference things from
+    % GlobalDataB that was passed to merge_global_datas/4, to reference things
+    % from the merged global_data structure.
     %
-:- pred remap_static_cell_references(static_cell_remap_info::in,
+:- pred remap_references_to_global_data(global_data_remapping::in,
     c_procedure::in, c_procedure::out) is det.
 
 %-----------------------------------------------------------------------------%
@@ -166,6 +177,15 @@
                 % two closures to have the same layout.
                 gd_closure_layouts      :: list(closure_proc_id_data),
 
+                % A table for allocating and maintaining slots where string IDs
+                % will be placed at runtime for threadscope profiling.  The
+                % actual string IDs are allocated at runtime and their IDs are
+                % placed in an array slot which can be referred to statically.
+                % The size of the table is maintained for allocating offsets
+                % into it.
+                gd_ts_string_table_size     :: int,
+                gd_ts_rev_string_table      :: list(string),
+
                 % Information about all the statically allocated cells
                 % created so far.
                 gd_static_cell_info     :: static_cell_info
@@ -174,7 +194,8 @@
 global_data_init(StaticCellInfo, GlobalData) :-
     map.init(EmptyDataMap),
     map.init(EmptyLayoutMap),
-    GlobalData = global_data(EmptyDataMap, EmptyLayoutMap, [], StaticCellInfo).
+    GlobalData = global_data(EmptyDataMap, EmptyLayoutMap, [],
+        0, [], StaticCellInfo).
 
 global_data_add_new_proc_var(PredProcId, ProcVar, !GlobalData) :-
     ProcVarMap0 = !.GlobalData ^ gd_proc_var_map,
@@ -215,6 +236,20 @@ global_data_get_all_proc_layouts(GlobalD
 global_data_get_all_closure_layouts(GlobalData, ClosureLayouts) :-
     ClosureLayouts = GlobalData ^ gd_closure_layouts.
 
+global_data_get_threadscope_string_table(GlobalData, Table) :-
+    global_data_get_threadscope_rev_string_table(GlobalData, RevTable, _),
+    Table = list.reverse(RevTable).
+
+global_data_get_threadscope_rev_string_table(GlobalData,
+        RevTable, TableSize) :-
+    RevTable = GlobalData ^ gd_ts_rev_string_table,
+    TableSize = GlobalData ^ gd_ts_string_table_size.
+
+global_data_set_threadscope_rev_string_table(RevTable, TableSize,
+        !GlobalData) :-
+    !GlobalData ^ gd_ts_rev_string_table := RevTable,
+    !GlobalData ^ gd_ts_string_table_size := TableSize.
+
 global_data_get_static_cell_info(GlobalData, StaticCellInfo) :-
     StaticCellInfo = GlobalData ^ gd_static_cell_info.
 
@@ -641,18 +676,47 @@ bump_type_num_counter(Increment, !Global
     Counter = counter.init(N + Increment),
     !GlobalData ^ gd_static_cell_info ^ sci_type_counter := Counter.
 
-merge_global_datas(GlobalDataA, GlobalDataB, GlobalData, Remap) :-
+merge_global_datas(GlobalDataA, GlobalDataB, GlobalData, GlobalDataRemap) :-
     GlobalDataA = global_data(ProcVarMapA, ProcLayoutMapA, ClosureLayoutsA,
-        StaticCellInfoA),
+        TSStringSlotCounterA, TSRevStringTableA, StaticCellInfoA),
     GlobalDataB = global_data(ProcVarMapB, ProcLayoutMapB, ClosureLayoutsB,
-        StaticCellInfoB),
-    GlobalData = global_data(ProcVarMap, ProcLayoutMap, ClosureLayouts,
-        StaticCellInfo),
+        TSStringSlotCounterB, TSRevStringTableB, StaticCellInfoB),
     ProcVarMap = map.old_merge(ProcVarMapA, ProcVarMapB),
     ProcLayoutMap = map.old_merge(ProcLayoutMapA, ProcLayoutMapB),
     ClosureLayouts = ClosureLayoutsA ++ ClosureLayoutsB,
+    merge_threadscope_string_tables(TSRevStringTableA, TSStringSlotCounterA,
+        TSRevStringTableB, TSStringSlotCounterB,
+        TSRevStringTable, TSStringSlotCounter, MaybeTSStringTableRemap),
     merge_static_cell_infos(StaticCellInfoA, StaticCellInfoB, StaticCellInfo,
-        Remap).
+        StaticCellRemap),
+    GlobalData = global_data(ProcVarMap, ProcLayoutMap, ClosureLayouts,
+        TSStringSlotCounter, TSRevStringTable, StaticCellInfo),
+    GlobalDataRemap =
+        global_data_remapping(MaybeTSStringTableRemap, StaticCellRemap).
+
+    % merge_threadscope_string_tables(RevTableA, CounterA, RevTableB, CounterB,
+    %   RevTable, Counter, MaybeRemapOffset).
+    %
+    % Merge the threadscope string tables.
+    %
+    % After doing this merge the references in RevTableB may be adjusted and
+    % must be corrected by adding RemapOffset to them if MaybeRemapOffset =
+    % yes(RemapOffset).
+    %
+:- pred merge_threadscope_string_tables(list(string)::in, int::in,
+    list(string)::in, int::in,
+    list(string)::out, int::out, maybe(int)::out) is det.
+
+merge_threadscope_string_tables([], _, [], _, [], 0, no).
+merge_threadscope_string_tables([], _, [X | Xs], N, [X | Xs], N, no).
+merge_threadscope_string_tables([X | Xs], N, [], _, [X | Xs], N, no).
+merge_threadscope_string_tables(RevTableA, CounterA, RevTableB, CounterB,
+        RevTable, Counter, yes(RemapOffset)) :-
+    RevTableA = [_ | _],
+    RevTableB = [_ | _],
+    RevTable = RevTableB ++ RevTableA,
+    Counter = CounterA + CounterB,
+    RemapOffset = CounterA.
 
 :- pred merge_static_cell_infos(static_cell_info::in, static_cell_info::in,
     static_cell_info::out, static_cell_remap_info::out) is det.
@@ -889,12 +953,18 @@ remap_arg_group_value(Remap, !GroupedArg
 
 %-----------------------------------------------------------------------------%
 
-remap_static_cell_references(Remap, !Procedure) :-
+:- type global_data_remapping
+    --->    global_data_remapping(
+                gdr_maybe_ts_table_offset   :: maybe(int),
+                gdr_static_cell_remap_info  :: static_cell_remap_info
+            ).
+
+remap_references_to_global_data(Remap, !Procedure) :-
     Code0 = !.Procedure ^ cproc_code,
     list.map(remap_instruction(Remap), Code0, Code),
     !Procedure ^ cproc_code := Code.
 
-:- pred remap_instruction(static_cell_remap_info::in,
+:- pred remap_instruction(global_data_remapping::in,
     instruction::in, instruction::out) is det.
 
 remap_instruction(Remap, !Instr) :-
@@ -902,49 +972,50 @@ remap_instruction(Remap, !Instr) :-
     remap_instr(Remap, Uinstr0, Uinstr),
     !:Instr = llds_instr(Uinstr, Comment).
 
-:- pred remap_instr(static_cell_remap_info::in, instr::in, instr::out) is det.
+:- pred remap_instr(global_data_remapping::in, instr::in, instr::out) is det.
 
-remap_instr(Remap, Instr0, Instr) :-
+remap_instr(GlobalDataRemap, Instr0, Instr) :-
+    StaticCellRemap = GlobalDataRemap ^ gdr_static_cell_remap_info,
     (
         Instr0 = block(NumIntTemps, NumFloatTemps, Block0),
-        list.map(remap_instruction(Remap), Block0, Block),
+        list.map(remap_instruction(GlobalDataRemap), Block0, Block),
         Instr = block(NumIntTemps, NumFloatTemps, Block)
     ;
         Instr0 = assign(Lval, Rval0),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = assign(Lval, Rval)
     ;
         Instr0 = keep_assign(Lval, Rval0),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = keep_assign(Lval, Rval)
     ;
         Instr0 = if_val(Rval0, CodeAddr),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr  = if_val(Rval, CodeAddr)
     ;
         Instr0 = foreign_proc_code(A, Comps0, B, C, D, E, F, G, H, I),
-        list.map(remap_foreign_proc_component(Remap), Comps0, Comps),
+        list.map(remap_foreign_proc_component(StaticCellRemap), Comps0, Comps),
         Instr  = foreign_proc_code(A, Comps,  B, C, D, E, F, G, H, I)
     ;
         Instr0 = computed_goto(Rval0, MaybeLabels),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = computed_goto(Rval, MaybeLabels)
     ;
         Instr0 = save_maxfr(Lval0),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = save_maxfr(Lval)
     ;
         Instr0 = restore_maxfr(Lval0),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = restore_maxfr(Lval)
     ;
         Instr0 = incr_hp(Lval0, MaybeTag, MaybeOffset, SizeRval0, Prof,
             Atomic, MaybeRegion0, MaybeReuse0),
-        remap_lval(Remap, Lval0, Lval),
-        remap_rval(Remap, SizeRval0, SizeRval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
+        remap_rval(StaticCellRemap, SizeRval0, SizeRval),
         (
             MaybeRegion0 = yes(Region0),
-            remap_rval(Remap, Region0, Region),
+            remap_rval(StaticCellRemap, Region0, Region),
             MaybeRegion = yes(Region)
         ;
             MaybeRegion0 = no,
@@ -952,10 +1023,10 @@ remap_instr(Remap, Instr0, Instr) :-
         ),
         (
             MaybeReuse0 = llds_reuse(Reuse0, MaybeFlag0),
-            remap_rval(Remap, Reuse0, Reuse),
+            remap_rval(StaticCellRemap, Reuse0, Reuse),
             (
                 MaybeFlag0 = yes(Flag0),
-                remap_lval(Remap, Flag0, Flag),
+                remap_lval(StaticCellRemap, Flag0, Flag),
                 MaybeFlag = yes(Flag)
             ;
                 MaybeFlag0 = no,
@@ -970,15 +1041,15 @@ remap_instr(Remap, Instr0, Instr) :-
             Atomic, MaybeRegion, MaybeReuse)
     ;
         Instr0 = mark_hp(Lval0),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = mark_hp(Lval)
     ;
         Instr0 = restore_hp(Rval0),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = restore_hp(Rval)
     ;
         Instr0 = free_heap(Rval0),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = free_heap(Rval)
     ;
         Instr0 = push_region_frame(StackId, EmbeddedStackFrame),
@@ -986,41 +1057,43 @@ remap_instr(Remap, Instr0, Instr) :-
     ;
         Instr0 = region_fill_frame(FillOp, EmbeddedStackFrame, IdRval0,
             NumLval0, AddrLval0),
-        remap_rval(Remap, IdRval0, IdRval),
-        remap_lval(Remap, NumLval0, NumLval),
-        remap_lval(Remap, AddrLval0, AddrLval),
+        remap_rval(StaticCellRemap, IdRval0, IdRval),
+        remap_lval(StaticCellRemap, NumLval0, NumLval),
+        remap_lval(StaticCellRemap, AddrLval0, AddrLval),
         Instr = region_fill_frame(FillOp, EmbeddedStackFrame, IdRval,
             NumLval, AddrLval)
     ;
         Instr0 = region_set_fixed_slot(SetOp, EmbeddedStackFrame, ValueRval0),
-        remap_rval(Remap, ValueRval0, ValueRval),
+        remap_rval(StaticCellRemap, ValueRval0, ValueRval),
         Instr = region_set_fixed_slot(SetOp, EmbeddedStackFrame, ValueRval)
     ;
         Instr0 = use_and_maybe_pop_region_frame(UseOp, EmbeddedStackFrame),
         Instr = use_and_maybe_pop_region_frame(UseOp, EmbeddedStackFrame)
     ;
         Instr0 = store_ticket(Lval0),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = store_ticket(Lval)
     ;
         Instr0 = reset_ticket(Rval0, Reason),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = reset_ticket(Rval, Reason)
     ;
         Instr0 = mark_ticket_stack(Lval0),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = mark_ticket_stack(Lval)
     ;
         Instr0 = prune_tickets_to(Rval0),
-        remap_rval(Remap, Rval0, Rval),
+        remap_rval(StaticCellRemap, Rval0, Rval),
         Instr = prune_tickets_to(Rval)
     ;
-        Instr0 = init_sync_term(Lval0, NumJoins),
-        remap_lval(Remap, Lval0, Lval),
-        Instr = init_sync_term(Lval, NumJoins)
+        Instr0 = init_sync_term(Lval0, NumJoins, ConjId0),
+        remap_lval(StaticCellRemap, Lval0, Lval),
+        remap_ts_table_index(GlobalDataRemap ^ gdr_maybe_ts_table_offset,
+            ConjId0, ConjId),
+        Instr = init_sync_term(Lval, NumJoins, ConjId)
     ;
         Instr0 = join_and_continue(Lval0, Label),
-        remap_lval(Remap, Lval0, Lval),
+        remap_lval(StaticCellRemap, Lval0, Lval),
         Instr = join_and_continue(Lval, Label)
     ;
         ( Instr0 = comment(_)
@@ -1040,6 +1113,11 @@ remap_instr(Remap, Instr0, Instr) :-
         Instr = Instr0
     ).
 
+:- pred remap_ts_table_index(maybe(int)::in, int::in, int::out) is det.
+
+remap_ts_table_index(no, !ConjId).
+remap_ts_table_index(yes(Offset), ConjId, ConjId + Offset).
+
 :- pred remap_foreign_proc_component(static_cell_remap_info::in,
     foreign_proc_component::in, foreign_proc_component::out) is det.
 
Index: compiler/jumpopt.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/jumpopt.m,v
retrieving revision 1.115
diff -u -p -b -r1.115 jumpopt.m
--- compiler/jumpopt.m	15 Dec 2010 06:29:40 -0000	1.115
+++ compiler/jumpopt.m	11 Mar 2011 06:55:53 -0000
@@ -432,7 +432,7 @@ jump_opt_instr_list([Instr0 | Instrs0], 
         ; Uinstr0 = free_heap(_)
         ; Uinstr0 = incr_hp(_, _, _, _, _, _, _, _)
         ; Uinstr0 = restore_hp(_)
-        ; Uinstr0 = init_sync_term(_, _)
+        ; Uinstr0 = init_sync_term(_, _, _)
         ),
         NewRemain = usual_case
     ),
Index: compiler/layout.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/layout.m,v
retrieving revision 1.42
diff -u -p -b -r1.42 layout.m
--- compiler/layout.m	30 Oct 2009 03:33:14 -0000	1.42
+++ compiler/layout.m	11 Mar 2011 07:11:58 -0000
@@ -328,7 +328,8 @@
     ;       proc_body_bytecodes_array
     ;       proc_table_io_decl_array
     ;       proc_event_layouts_array
-    ;       proc_exec_trace_array.
+    ;       proc_exec_trace_array
+    ;       threadscope_string_table_array.
 
 %-----------------------------------------------------------------------------%
 %
Index: compiler/layout_out.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/layout_out.m,v
retrieving revision 1.114
diff -u -p -b -r1.114 layout_out.m
--- compiler/layout_out.m	13 Jan 2011 00:36:52 -0000	1.114
+++ compiler/layout_out.m	11 Mar 2011 07:11:58 -0000
@@ -58,6 +58,7 @@
     list(proc_layout_proc_static)::in,
     list(int)::in, list(int)::in, list(int)::in, list(table_io_decl_data)::in,
     list(layout_slot_name)::in, list(proc_layout_exec_trace)::in,
+    list(string)::in,
     decl_set::in, decl_set::out, io::di, io::uo) is det.
 
 %-----------------------------------------------------------------------------%
@@ -374,7 +375,7 @@ output_layout_array_defns(Info, PseudoTy
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, !DeclSet, !IO) :-
+        ProcEventLayouts, ExecTraces, TSStringTable, !DeclSet, !IO) :-
     (
         PseudoTypeInfos = []
     ;
@@ -485,6 +486,12 @@ output_layout_array_defns(Info, PseudoTy
     ;
         ExecTraces = [_ | _],
         output_exec_traces_array(Info, ExecTraces, !IO)
+    ),
+    (
+        TSStringTable = []
+    ;
+        TSStringTable = [_ | _],
+        output_threadscope_string_table_array(Info, TSStringTable, !IO)
     ).
 
 %-----------------------------------------------------------------------------%
@@ -1508,6 +1515,41 @@ eval_method_to_c_string(eval_table_io(De
     ).
 
 %-----------------------------------------------------------------------------%
+%
+% Definition of array #20: threadscope string table.
+%
+
+:- pred output_threadscope_string_table_array(llds_out_info::in,
+    list(string)::in, io::di, io::uo) is det.
+
+output_threadscope_string_table_array(Info, TSStringTable, !IO) :-
+    ModuleName = Info ^ lout_mangled_module_name,
+    list.length(TSStringTable, NumStrings),
+    Name = threadscope_string_table_array,
+    io.write_string("#ifdef MR_THREADSCOPE\n", !IO),
+    output_layout_array_name_storage_type_name(ModuleName, Name,
+        being_defined, !IO),
+    io.format("[%d] = {\n", [i(NumStrings)], !IO),
+    list.foldl2(output_threadscope_string_table_slot(Info), TSStringTable,
+        0, _, !IO),
+    io.write_string("};\n#endif\n\n", !IO).
+
+:- pred output_threadscope_string_table_slot(llds_out_info::in, string::in,
+    int::in, int::out, io::di, io::uo) is det.
+
+output_threadscope_string_table_slot(Info, String, !Slot, !IO) :-
+    AutoComments = Info ^ lout_auto_comments,
+    (
+        AutoComments = yes,
+        io.format("/* %d */ ", [i(!.Slot)], !IO)
+    ;
+        AutoComments = no
+    ),
+    io.write_string("{ ", !IO),
+    quote_and_write_string(String, !IO),
+    io.write_string(", 0},\n", !IO).
+
+%-----------------------------------------------------------------------------%
 
 output_layout_name_decl(LayoutName, !IO) :-
     output_layout_name_storage_type_name(LayoutName, not_being_defined, !IO),
@@ -1594,6 +1636,9 @@ output_layout_array_name(UseMacro, Modul
         ;
             ArrayName = proc_exec_trace_array,
             io.write_string("MR_proc_exec_traces", !IO)
+        ;
+            ArrayName = threadscope_string_table_array,
+            io.write_string("MR_threadscope_strings", !IO)
         ),
         io.write_string("(", !IO),
         io.write_string(ModuleName, !IO),
@@ -1657,6 +1702,10 @@ output_layout_array_name(UseMacro, Modul
         ;
             ArrayName = proc_exec_trace_array,
             io.write_string("mercury_data__proc_exec_traces_array__", !IO)
+        ;
+            ArrayName = threadscope_string_table_array,
+            io.write_string("mercury_data__threadscope_string_table_array__",
+                !IO)
         ),
         io.write_string(ModuleName, !IO)
     ).
@@ -1906,6 +1955,13 @@ output_layout_array_name_storage_type_na
         io.write_string("static MR_STATIC_CODE_CONST MR_ExecTrace ", !IO),
         output_layout_array_name(do_not_use_layout_macro, ModuleName,
             Name, !IO)
+    ;
+        Name = threadscope_string_table_array,
+        io.write_string(
+            "static MR_Threadscope_String ",
+            !IO),
+        output_layout_array_name(do_not_use_layout_macro, ModuleName,
+            Name, !IO)
     ).
 
 output_layout_name_storage_type_name(Name, BeingDefined, !IO) :-
@@ -2985,6 +3041,7 @@ output_layout_slots_in_vector(ModuleName
         ; ArrayName = proc_table_io_decl_array
         ; ArrayName = proc_event_layouts_array
         ; ArrayName = proc_exec_trace_array
+        ; ArrayName = threadscope_string_table_array
         ),
         output_layout_slot_addr(use_layout_macro, ModuleName, SlotName, !IO),
         io.write_string(",\n", !IO),
Index: compiler/livemap.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/livemap.m,v
retrieving revision 1.95
diff -u -p -b -r1.95 livemap.m
--- compiler/livemap.m	30 Dec 2010 11:17:55 -0000	1.95
+++ compiler/livemap.m	11 Mar 2011 06:55:53 -0000
@@ -345,7 +345,7 @@ livemap_do_build_instr(Instr0, !Instrs, 
         % that need livemaps have been run for the last time.
         unexpected(this_file, "build_livemap_instr: decr_sp_and_return")
     ;
-        Uinstr0 = init_sync_term(_, _)
+        Uinstr0 = init_sync_term(_, _, _)
     ;
         Uinstr0 = fork_new_child(_, _)
     ;
Index: compiler/llds.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds.m,v
retrieving revision 1.371
diff -u -p -b -r1.371 llds.m
--- compiler/llds.m	13 Jan 2011 00:36:52 -0000	1.371
+++ compiler/llds.m	11 Mar 2011 07:11:58 -0000
@@ -93,6 +93,7 @@
                 cfile_proc_head_var_nums    :: list(int),
                 cfile_proc_var_names        :: list(int),
                 cfile_proc_body_bytecodes   :: list(int),
+                cfile_ts_string_table       :: list(string),
                 cfile_table_io_decls        :: list(table_io_decl_data),
                 cfile_table_io_decl_map     :: map(pred_proc_id,
                                                 layout_slot_name),
@@ -608,13 +609,16 @@
             % MayBeDupl says whether this instruction may be duplicated
             % by jump optimization.
 
-    ;       init_sync_term(lval, int)
+    ;       init_sync_term(lval, int, int)
             % Initialize a synchronization term, which is a continuous number
             % of slots on the detstack.  The first argument contains the base
             % address of the synchronization term.  The second argument
             % indicates how many branches we expect to join at the end of the
-            % parallel conjunction. (See the documentation in par_conj_gen.m
-            % and runtime/mercury_context.{c,h} for further information about
+            % parallel conjunction.  The third argument is an index into the
+            % threadscope string table.  The string that it refers to
+            % identifies this parallel conjunction within the source code.
+            % (See the documentation in par_conj_gen.m and
+            % runtime/mercury_context.{c,h} for further information about
             % synchronisation terms.)
 
     ;       fork_new_child(lval, label)
Index: compiler/llds_out_file.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds_out_file.m,v
retrieving revision 1.3
diff -u -p -b -r1.3 llds_out_file.m
--- compiler/llds_out_file.m	30 Dec 2010 11:17:55 -0000	1.3
+++ compiler/llds_out_file.m	11 Mar 2011 07:11:58 -0000
@@ -158,7 +158,7 @@ output_single_c_file(Globals, CFile, Fil
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         InternalLabelToLayoutMap, EntryLabelToLayoutMap,
         CallSiteStatics, CoveragePoints, ProcStatics,
-        ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes,
+        ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TSStringTable,
         TableIoDecls, TableIoDeclMap, ProcEventLayouts, ExecTraces,
         ProcLayoutDatas, ModuleLayoutDatas, ClosureLayoutDatas,
         Modules, UserInitPredCNames, UserFinalPredCNames, ComplexityProcs),
@@ -218,14 +218,14 @@ output_single_c_file(Globals, CFile, Fil
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, !DeclSet, !IO),
+        ProcEventLayouts, ExecTraces, TSStringTable, !DeclSet, !IO),
 
     list.foldl2(output_comp_gen_c_module(Info), Modules, !DeclSet, !IO),
     list.foldl(output_user_foreign_code(Info), UserForeignCode, !IO),
     list.foldl(io.write_string, Exports, !IO),
     io.write_string("\n", !IO),
     output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
-        ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs,
+        ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs, TSStringTable,
         UserInitPredCNames, UserFinalPredCNames, !DeclSet, !IO),
     io.set_output_stream(OutputStream, _, !IO).
 
@@ -249,10 +249,10 @@ proc_gather_env_var_names([Proc | Procs]
     list(comp_gen_c_module)::in, list(rtti_data)::in,
     list(proc_layout_data)::in, list(module_layout_data)::in,
     list(complexity_proc_info)::in, list(string)::in, list(string)::in,
-    decl_set::in, decl_set::out, io::di, io::uo) is det.
+    list(string)::in, decl_set::in, decl_set::out, io::di, io::uo) is det.
 
 output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
-        ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs,
+        ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs, TSStringTable,
         InitPredNames, FinalPredNames, !DeclSet, !IO) :-
     MustInit = (pred(Module::in) is semidet :-
         module_defines_label_with_layout(Info, Module)
@@ -295,6 +295,12 @@ output_c_module_init_list(Info, ModuleNa
     io.write_string("init_complexity_procs(void);\n", !IO),
     io.write_string("#endif\n", !IO),
 
+    io.write_string("#ifdef MR_THREADSCOPE\n", !IO),
+    io.write_string("void ", !IO),
+    output_init_name(ModuleName, !IO),
+    io.write_string("init_threadscope_string_table(void);\n", !IO),
+    io.write_string("#endif\n", !IO),
+
     (
         InitPredNames = []
     ;
@@ -396,6 +402,26 @@ output_c_module_init_list(Info, ModuleNa
     io.write_string("}\n", !IO),
     io.write_string("\n#endif\n\n", !IO),
 
+    io.write_string("#ifdef MR_THREADSCOPE\n", !IO),
+    io.write_string("\nvoid ", !IO),
+    output_init_name(ModuleName, !IO),
+    io.write_string("init_threadscope_string_table(void)\n", !IO),
+    io.write_string("{\n", !IO),
+    (
+        TSStringTable = []
+    ;
+        TSStringTable = [_ | _],
+        TSStringTableSize = length(TSStringTable),
+        io.write_string("\tMR_threadscope_register_strings_array(\n", !IO),
+        io.write_string("\t\t", !IO),
+        MangledModuleName = Info ^ lout_mangled_module_name,
+        output_layout_array_name(use_layout_macro, MangledModuleName,
+            threadscope_string_table_array, !IO),
+        io.format(", %d);\n", [i(TSStringTableSize)], !IO)
+    ),
+    io.write_string("}\n", !IO),
+    io.write_string("\n#endif\n\n", !IO),
+
     (
         InitPredNames = []
     ;
Index: compiler/llds_out_instr.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds_out_instr.m,v
retrieving revision 1.4
diff -u -p -b -r1.4 llds_out_instr.m
--- compiler/llds_out_instr.m	15 Dec 2010 06:29:42 -0000	1.4
+++ compiler/llds_out_instr.m	11 Mar 2011 06:55:53 -0000
@@ -205,7 +205,7 @@ output_record_instr_decls(Info, Instr, !
         ( Instr = mark_hp(Lval)
         ; Instr = store_ticket(Lval)
         ; Instr = mark_ticket_stack(Lval)
-        ; Instr = init_sync_term(Lval, _NumBranches)
+        ; Instr = init_sync_term(Lval, _NumBranches, _ConjIdSlotNum)
         ),
         output_record_lval_decls(Info, Lval, !DeclSet, !IO)
     ;
@@ -891,11 +891,13 @@ output_instruction(Info, Instr, ProfInfo
         ),
         io.write_string("\t}\n", !IO)
     ;
-        Instr = init_sync_term(Lval, N),
+        Instr = init_sync_term(Lval, NumConjuncts, TSStringIndex),
         io.write_string("\tMR_init_sync_term(", !IO),
         output_lval_as_word(Info, Lval, !IO),
         io.write_string(", ", !IO),
-        io.write_int(N, !IO),
+        io.write_int(NumConjuncts, !IO),
+        io.write_string(", ", !IO),
+        io.write_int(TSStringIndex, !IO),
         io.write_string(");\n", !IO)
     ;
         Instr = fork_new_child(Lval, Child),
Index: compiler/llds_to_x86_64.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds_to_x86_64.m,v
retrieving revision 1.17
diff -u -p -b -r1.17 llds_to_x86_64.m
--- compiler/llds_to_x86_64.m	30 Dec 2010 11:17:55 -0000	1.17
+++ compiler/llds_to_x86_64.m	11 Mar 2011 06:55:53 -0000
@@ -431,7 +431,7 @@ instr_to_x86_64(!RegMap, decr_sp_and_ret
 instr_to_x86_64(!RegMap, foreign_proc_code(_, _, _, _, _, _, _, _, _, _),
         Instr) :-
     Instr = [x86_64_comment("<<foreign_proc_code>>")].
-instr_to_x86_64(!RegMap, init_sync_term(_, _), Instr) :-
+instr_to_x86_64(!RegMap, init_sync_term(_, _, _), Instr) :-
     Instr = [x86_64_comment("<<init_sync_term>>")].
 instr_to_x86_64(!RegMap, fork_new_child(_, _), Instr) :-
     Instr = [x86_64_comment("<<fork_new_child>>")].
Index: compiler/mercury_compile_llds_back_end.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mercury_compile_llds_back_end.m,v
retrieving revision 1.11
diff -u -p -b -r1.11 mercury_compile_llds_back_end.m
--- compiler/mercury_compile_llds_back_end.m	30 Dec 2010 11:17:56 -0000	1.11
+++ compiler/mercury_compile_llds_back_end.m	11 Mar 2011 07:11:58 -0000
@@ -592,6 +592,7 @@ llds_output_pass(HLDS, GlobalData0, Proc
     global_data_get_static_cell_info(GlobalData, StaticCellInfo),
     get_static_cells(StaticCellInfo,
         ScalarCommonCellDatas, VectorCommonCellDatas),
+    global_data_get_threadscope_string_table(GlobalData, TSStringTable),
 
     % Next we put it all together and output it to one or more C files.
     RttiDatas = TypeCtorRttiData ++ TypeClassInfoRttiData,
@@ -638,7 +639,7 @@ llds_output_pass(HLDS, GlobalData0, Proc
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         InternalLabelToLayoutMap, ProcLabelToLayoutMap,
         CallSites, CoveragePoints, ProcStatics,
-        ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes,
+        ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TSStringTable,
         TableIoDecls, TableIoDeclMap, ProcEventLayouts, ExecTraces,
         ProcLayoutDatas, ModuleLayoutDatas, ClosureLayoutDatas, ChunkedModules,
         UserInitPredCNames, UserFinalPredCNames, ComplexityProcs),
Index: compiler/middle_rec.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/middle_rec.m,v
retrieving revision 1.141
diff -u -p -b -r1.141 middle_rec.m
--- compiler/middle_rec.m	30 Dec 2010 11:17:56 -0000	1.141
+++ compiler/middle_rec.m	11 Mar 2011 06:55:53 -0000
@@ -590,7 +590,7 @@ find_used_registers_instr(Uinstr, !Used)
         ; Uinstr = mark_hp(Lval)
         ; Uinstr = store_ticket(Lval)
         ; Uinstr = mark_ticket_stack(Lval)
-        ; Uinstr = init_sync_term(Lval, _)
+        ; Uinstr = init_sync_term(Lval, _, _)
         ; Uinstr = fork_new_child(Lval, _)
         ; Uinstr = join_and_continue(Lval, _)
         ),
Index: compiler/opt_debug.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/opt_debug.m,v
retrieving revision 1.217
diff -u -p -b -r1.217 opt_debug.m
--- compiler/opt_debug.m	30 Dec 2010 11:17:56 -0000	1.217
+++ compiler/opt_debug.m	11 Mar 2011 07:11:58 -0000
@@ -566,6 +566,9 @@ dump_layout_array_name(ArrayName) = Str 
     ;
         ArrayName = proc_table_io_decl_array,
         Str = "proc_table_io_decl_array"
+    ;
+        ArrayName = threadscope_string_table_array,
+        Str = "threadscope_string_table_array"
     ).
 
 dump_layout_name(proc_layout(RttiProcLabel, _)) =
@@ -1058,9 +1061,9 @@ dump_instr(MaybeProcLabel, AutoComments,
         Instr = decr_sp_and_return(Size),
         Str = "decr_sp_and_return(" ++ int_to_string(Size) ++ ")"
     ;
-        Instr = init_sync_term(Lval, N),
+        Instr = init_sync_term(Lval, N, TSStringIndex),
         Str = "init_sync_term(" ++ dump_lval(MaybeProcLabel, Lval) ++ ", "
-            ++ int_to_string(N) ++ ")"
+            ++ int_to_string(N) ++ ", " ++ int_to_string(TSStringIndex) ++ ")"
     ;
         Instr = fork_new_child(Lval, Child),
         Str = "fork_new_child(" ++ dump_lval(MaybeProcLabel, Lval)
Index: compiler/opt_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/opt_util.m,v
retrieving revision 1.173
diff -u -p -b -r1.173 opt_util.m
--- compiler/opt_util.m	30 Dec 2010 11:17:56 -0000	1.173
+++ compiler/opt_util.m	11 Mar 2011 06:55:53 -0000
@@ -884,7 +884,7 @@ instr_refers_to_stack(llds_instr(Uinstr,
         ; Uinstr = incr_sp(_, _, _)
         ; Uinstr = decr_sp(_)
         ; Uinstr = decr_sp_and_return(_)
-        ; Uinstr = init_sync_term(_, _)
+        ; Uinstr = init_sync_term(_, _, _)
         ; Uinstr = fork_new_child(_, _)
         ; Uinstr = join_and_continue(_, _)
         ),
@@ -1108,7 +1108,7 @@ can_instr_branch_away(prune_tickets_to(_
 can_instr_branch_away(incr_sp(_, _, _)) = no.
 can_instr_branch_away(decr_sp(_)) = no.
 can_instr_branch_away(decr_sp_and_return(_)) = yes.
-can_instr_branch_away(init_sync_term(_, _)) = no.
+can_instr_branch_away(init_sync_term(_, _, _)) = no.
 can_instr_branch_away(fork_new_child(_, _)) = no.
 can_instr_branch_away(join_and_continue(_, _)) = yes.
 can_instr_branch_away(foreign_proc_code(_, Comps, _, _, _, _, _, _, _, _)) =
@@ -1187,7 +1187,7 @@ can_instr_fall_through(prune_tickets_to(
 can_instr_fall_through(incr_sp(_, _, _)) = yes.
 can_instr_fall_through(decr_sp(_)) = yes.
 can_instr_fall_through(decr_sp_and_return(_)) = no.
-can_instr_fall_through(init_sync_term(_, _)) = yes.
+can_instr_fall_through(init_sync_term(_, _, _)) = yes.
 can_instr_fall_through(fork_new_child(_, _)) = yes.
 can_instr_fall_through(join_and_continue(_, _)) = no.
 can_instr_fall_through(foreign_proc_code(_, _, _, _, _, _, _, _, _, _)) = yes.
@@ -1238,7 +1238,7 @@ can_use_livevals(prune_tickets_to(_), no
 can_use_livevals(incr_sp(_, _, _), no).
 can_use_livevals(decr_sp(_), no).
 can_use_livevals(decr_sp_and_return(_), yes).
-can_use_livevals(init_sync_term(_, _), no).
+can_use_livevals(init_sync_term(_, _, _), no).
 can_use_livevals(fork_new_child(_, _), no).
 can_use_livevals(join_and_continue(_, _), no).
 can_use_livevals(foreign_proc_code(_, _, _, _, _, _, _, _, _, _), no).
@@ -1285,7 +1285,7 @@ instr_labels_2(Uinstr, Labels, CodeAddrs
         ; Uinstr = prune_tickets_to(_)
         ; Uinstr = incr_sp(_, _, _)
         ; Uinstr = decr_sp(_)
-        ; Uinstr = init_sync_term(_, _)
+        ; Uinstr = init_sync_term(_, _, _)
         ),
         Labels = [],
         CodeAddrs = []
@@ -1373,7 +1373,7 @@ possible_targets(Uinstr, Labels, CodeAdd
         ; Uinstr = prune_tickets_to(_)
         ; Uinstr = incr_sp(_, _, _)
         ; Uinstr = decr_sp(_)
-        ; Uinstr = init_sync_term(_, _)
+        ; Uinstr = init_sync_term(_, _, _)
         ; Uinstr = fork_new_child(_, _)
         ),
         Labels = [],
@@ -1541,7 +1541,7 @@ instr_rvals_and_lvals(prune_tickets_to(R
 instr_rvals_and_lvals(incr_sp(_, _, _), [], []).
 instr_rvals_and_lvals(decr_sp(_), [], []).
 instr_rvals_and_lvals(decr_sp_and_return(_), [], []).
-instr_rvals_and_lvals(init_sync_term(Lval, _), [], [Lval]).
+instr_rvals_and_lvals(init_sync_term(Lval, _, _), [], [Lval]).
 instr_rvals_and_lvals(fork_new_child(Lval, _), [], [Lval]).
 instr_rvals_and_lvals(join_and_continue(Lval, _), [], [Lval]).
 instr_rvals_and_lvals(foreign_proc_code(_, Cs, _, _, _, _, _, _, _, _),
@@ -1719,7 +1719,7 @@ count_temps_instr(prune_tickets_to(Rval)
 count_temps_instr(incr_sp(_, _, _), !R, !F).
 count_temps_instr(decr_sp(_), !R, !F).
 count_temps_instr(decr_sp_and_return(_), !R, !F).
-count_temps_instr(init_sync_term(Lval, _), !R, !F) :-
+count_temps_instr(init_sync_term(Lval, _, _), !R, !F) :-
     count_temps_lval(Lval, !R, !F).
 count_temps_instr(fork_new_child(Lval, _), !R, !F) :-
     count_temps_lval(Lval, !R, !F).
@@ -1930,7 +1930,7 @@ touches_nondet_ctrl_instr(Uinstr) = Touc
         ; Uinstr = arbitrary_c_code(_, _, _)
         ; Uinstr = save_maxfr(_)
         ; Uinstr = restore_maxfr(_)
-        ; Uinstr = init_sync_term(_, _)     % This is a safe approximation.
+        ; Uinstr = init_sync_term(_, _, _)  % This is a safe approximation.
         ; Uinstr = fork_new_child(_, _)     % This is a safe approximation.
         ; Uinstr = join_and_continue(_, _)  % This is a safe approximation.
         ),
@@ -2481,7 +2481,7 @@ replace_labels_instr(Uinstr0, Uinstr, Re
         ),
         Uinstr = prune_tickets_to(Rval)
     ;
-        Uinstr0 = init_sync_term(Lval0, N),
+        Uinstr0 = init_sync_term(Lval0, NumConjuncts, TSStringIndex),
         (
             ReplData = yes,
             replace_labels_lval(Lval0, Lval, ReplMap)
@@ -2489,7 +2489,7 @@ replace_labels_instr(Uinstr0, Uinstr, Re
             ReplData = no,
             Lval = Lval0
         ),
-        Uinstr = init_sync_term(Lval, N)
+        Uinstr = init_sync_term(Lval, NumConjuncts, TSStringIndex)
     ;
         Uinstr0 = fork_new_child(Lval0, Child0),
         replace_labels_lval(Lval0, Lval, ReplMap),
Index: compiler/par_conj_gen.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/par_conj_gen.m,v
retrieving revision 1.42
diff -u -p -b -r1.42 par_conj_gen.m
--- compiler/par_conj_gen.m	30 Dec 2010 11:17:57 -0000	1.42
+++ compiler/par_conj_gen.m	11 Mar 2011 07:11:59 -0000
@@ -107,12 +107,15 @@
 :- import_module check_hlds.mode_util.
 :- import_module hlds.hlds_llds.
 :- import_module hlds.hlds_module.
+:- import_module hlds.hlds_out.
+:- import_module hlds.hlds_out.hlds_out_util.
 :- import_module hlds.instmap.
 :- import_module libs.globals.
 :- import_module libs.options.
 :- import_module ll_backend.code_gen.
 :- import_module ll_backend.continuation_info.
 :- import_module ll_backend.exprn_aux.
+:- import_module mdbcomp.goal_path.
 :- import_module parse_tree.prog_data.
 
 :- import_module bool.
@@ -123,6 +126,7 @@
 :- import_module maybe.
 :- import_module require.
 :- import_module set.
+:- import_module string.
 :- import_module unit.
 
 %---------------------------------------------------------------------------%
@@ -184,9 +188,10 @@ generate_par_conj(Goals, GoalInfo, CodeM
     (
         % The highest numbered slot has the lowest address.
         list.last(SyncTermSlots, SyncTermBaseSlotPrime),
-        SyncTermBaseSlotPrime = stackvar(SlotNum),
+        SyncTermBaseSlotPrime = stackvar(SlotNumPrime),
         StackId = det_stack
     ->
+        SlotNum = SlotNumPrime,
         SyncTermBaseSlot = SyncTermBaseSlotPrime,
         ParentSyncTermBaseSlot = parent_stackvar(SlotNum)
     ;
@@ -194,8 +199,9 @@ generate_par_conj(Goals, GoalInfo, CodeM
     ),
 
     NumGoals = list.length(Goals),
+    create_static_conj_id(GoalInfo, StaticConjId, !CI),
     MakeSyncTermCode = singleton(
-        llds_instr(init_sync_term(SyncTermBaseSlot, NumGoals),
+        llds_instr(init_sync_term(SyncTermBaseSlot, NumGoals, StaticConjId),
             "initialize sync term")
     ),
 
@@ -206,9 +212,11 @@ generate_par_conj(Goals, GoalInfo, CodeM
         no, GoalCode, !CI),
     set_par_conj_depth(Depth, !CI),
 
-    EndLabelCode = singleton(
-        llds_instr(label(EndLabel), "end of parallel conjunction")
-    ),
+    EndLabelCode = from_list([
+        llds_instr(label(EndLabel), "end of parallel conjunction"),
+        llds_instr(ts_finish_par_conj_instr(SlotNum, SyncTermBaseSlot),
+            "finish parallel conjunction (ThreadScope instrumentation")
+    ]),
     Code =
         MaybeSetParentSpCode ++
         SaveCode ++
@@ -296,6 +304,20 @@ generate_det_par_conj_2([Goal | Goals], 
         MaybeEnd, RestCode, !CI),
     Code = ThisCode ++ RestCode.
 
+:- func ts_finish_par_conj_instr(int, lval) = instr.
+
+ts_finish_par_conj_instr(SyncTermBaseSlot, SyncTermBaseSlotLval) =
+        foreign_proc_code([], Components, proc_will_not_call_mercury, no, no,
+            no, no, no, yes, proc_may_duplicate) :-
+    Components = [foreign_proc_raw_code(cannot_branch_away,
+        proc_does_not_affect_liveness,
+        live_lvals_info(set([SyncTermBaseSlotLval])),
+        format(Code, [i(SyncTermBaseSlot)]))],
+    Code = "#ifdef MR_THREADSCOPE
+MR_threadscope_post_stop_par_conj(&MR_sv(%d));
+#endif
+".
+
 %-----------------------------------------------------------------------------%
 
     % In the code of parallel conjuncts we have to refer to stack slots in
@@ -405,6 +427,25 @@ place_all_outputs([Var | Vars], !CI) :-
 
 %----------------------------------------------------------------------------%
 
+:- pred create_static_conj_id(hlds_goal_info::in, int::out,
+    code_info::in, code_info::out) is det.
+
+create_static_conj_id(GoalInfo, SlotNum, !CI) :-
+    get_pred_id(!.CI, PredId),
+    get_proc_id(!.CI, ProcId),
+    get_module_info(!.CI, ModuleInfo),
+    ProcString = pred_proc_id_pair_to_string(ModuleInfo, PredId, ProcId),
+
+    get_containing_goal_map_det(!.CI, ContainingGoalMap),
+    GoalId = goal_info_get_goal_id(GoalInfo),
+    GoalPath = goal_id_to_forward_path(ContainingGoalMap, GoalId),
+    GoalPathString = goal_path_to_string(GoalPath),
+
+    String = format("%s: %s", [s(ProcString), s(GoalPathString)]),
+    add_threadscope_string(String, SlotNum, !CI).
+
+%----------------------------------------------------------------------------%
+
 :- func this_file = string.
 
 this_file = "par_conj_gen.m".
Index: compiler/peephole.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/peephole.m,v
retrieving revision 1.104
diff -u -p -b -r1.104 peephole.m
--- compiler/peephole.m	30 Dec 2010 11:17:57 -0000	1.104
+++ compiler/peephole.m	11 Mar 2011 06:55:53 -0000
@@ -611,7 +611,7 @@ replace_tagged_ptr_components_in_instr(O
         ; Uinstr0 = incr_sp(_, _, _)
         ; Uinstr0 = decr_sp(_)
         ; Uinstr0 = decr_sp_and_return(_)
-        ; Uinstr0 = init_sync_term(_, _)
+        ; Uinstr0 = init_sync_term(_, _, _)
         ; Uinstr0 = fork_new_child(_, _)
         ; Uinstr0 = join_and_continue(_, _)
         ),
Index: compiler/proc_gen.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/proc_gen.m,v
retrieving revision 1.43
diff -u -p -b -r1.43 proc_gen.m
--- compiler/proc_gen.m	4 Jan 2011 02:47:49 -0000	1.43
+++ compiler/proc_gen.m	11 Mar 2011 06:55:53 -0000
@@ -180,9 +180,10 @@ generate_code_parallel(ModuleInfo0, Pred
         list.condense(PredProceduresB0, ProceduresB0)
     ),
     merge_global_datas(GlobalDataA, GlobalDataB, !:GlobalData,
-        StaticCellRemapInfo),
-    list.map(remap_static_cell_references(StaticCellRemapInfo),
+        Remap),
+    list.map(remap_references_to_global_data(Remap),
         ProceduresB0, ProceduresB),
+
     Procedures = ProceduresA ++ ProceduresB.
 
     % These numbers are rather arbitrary.
@@ -306,7 +307,18 @@ generate_proc_code(PredInfo, ProcInfo0, 
 
     module_info_get_globals(ModuleInfo, Globals),
     globals.get_trace_level(Globals, TraceLevel),
-    ( given_trace_level_is_none(TraceLevel) = no ->
+    proc_info_get_has_parallel_conj(ProcInfo1, HasParConj),
+    globals.lookup_bool_option(Globals, parallel, Parallel),
+    (
+        % Make the containing goal map availble if we need it, it is needed
+        % for tracing or for parallel conjunctions.
+        (
+            given_trace_level_is_none(TraceLevel) = no
+        ;
+            HasParConj = yes,
+            Parallel = yes
+        )
+    ->
         fill_goal_id_slots_in_proc(ModuleInfo, ContainingGoalMap,
             ProcInfo1, ProcInfo),
         MaybeContainingGoalMap = yes(ContainingGoalMap)
@@ -337,9 +349,13 @@ generate_proc_code(PredInfo, ProcInfo0, 
     % for model_det procedures, always needed for model_semi procedures, and
     % needed for model_non procedures only if we are doing execution tracing.
     global_data_get_static_cell_info(!.GlobalData, StaticCellInfo0),
+    global_data_get_threadscope_rev_string_table(!.GlobalData,
+        TSRevStringTable0, TSStringTableSize0),
+
     code_info_init(SaveSuccip, Globals, PredId, ProcId, PredInfo,
         ProcInfo, FollowVars, ModuleInfo, StaticCellInfo0,
-        OutsideResumePoint, TraceSlotInfo, MaybeContainingGoalMap, CodeInfo0),
+        OutsideResumePoint, TraceSlotInfo, MaybeContainingGoalMap,
+        TSRevStringTable0, TSStringTableSize0, CodeInfo0),
 
     % Find out the approriate context for the predicate's interface events.
     pred_info_get_clauses_info(PredInfo, ClausesInfo),
@@ -362,6 +378,11 @@ generate_proc_code(PredInfo, ProcInfo0, 
     get_static_cell_info(CodeInfo, StaticCellInfo),
     global_data_set_static_cell_info(StaticCellInfo, !GlobalData),
 
+    get_threadscope_rev_string_table(CodeInfo,
+        TSRevStringTable, TSStringTableSize),
+    global_data_set_threadscope_rev_string_table(TSRevStringTable,
+        TSStringTableSize, !GlobalData),
+
     get_created_temp_frame(CodeInfo, CreatedTempFrame),
     get_proc_trace_events(CodeInfo, ProcTraceEvents),
     % You can have user trace events even if the effective trace level is none.
Index: compiler/reassign.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/reassign.m,v
retrieving revision 1.32
diff -u -p -b -r1.32 reassign.m
--- compiler/reassign.m	30 Dec 2010 11:17:58 -0000	1.32
+++ compiler/reassign.m	11 Mar 2011 06:55:53 -0000
@@ -306,7 +306,7 @@ remove_reassign_loop([Instr0 | Instrs0],
         !:KnownContentsMap = map.init,
         !:DepLvalMap = map.init
     ;
-        Uinstr0 = init_sync_term(Target, _),
+        Uinstr0 = init_sync_term(Target, _, _),
         !:RevInstrs = [Instr0 | !.RevInstrs],
         clobber_dependents(Target, !KnownContentsMap, !DepLvalMap)
     ;
Index: compiler/use_local_vars.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/use_local_vars.m,v
retrieving revision 1.42
diff -u -p -b -r1.42 use_local_vars.m
--- compiler/use_local_vars.m	30 Dec 2010 11:17:59 -0000	1.42
+++ compiler/use_local_vars.m	11 Mar 2011 06:55:53 -0000
@@ -687,7 +687,7 @@ substitute_lval_in_instr_until_defn_2(Ol
         ; Uinstr0 = incr_sp(_, _, _)
         ; Uinstr0 = decr_sp(_)
         ; Uinstr0 = decr_sp_and_return(_)
-        ; Uinstr0 = init_sync_term(_, _)
+        ; Uinstr0 = init_sync_term(_, _, _)
         ; Uinstr0 = fork_new_child(_, _)
         ; Uinstr0 = join_and_continue(_, _)
         ; Uinstr0 = arbitrary_c_code(_, _, _)
Index: runtime/mercury_context.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.c,v
retrieving revision 1.86
diff -u -p -b -r1.86 mercury_context.c
--- runtime/mercury_context.c	8 Feb 2011 03:48:10 -0000	1.86
+++ runtime/mercury_context.c	11 Mar 2011 07:12:00 -0000
@@ -1139,6 +1139,7 @@ MR_define_entry(MR_do_runnext);
     #ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
     MR_Timer        runnext_timer;
     #endif
+
     /*
     ** If this engine is holding onto a context, the context should not be
     ** in the middle of running some code.
@@ -1323,6 +1324,9 @@ ReadySpark:
                 &MR_profile_parallel_executed_global_sparks);
     }
     #endif
+    #ifdef MR_THREADSCOPE
+    MR_threadscope_post_steal_spark(spark.MR_spark_id);
+    #endif
     MR_GOTO(spark.MR_spark_resume);
 }
   #else /* !MR_THREAD_SAFE */
@@ -1357,6 +1361,10 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
     MR_bool     jnc_last;
     MR_Context  *this_context = MR_ENGINE(MR_eng_this_context);
 
+  #ifdef MR_THREADSCOPE
+    MR_threadscope_post_stop_par_conjunct((MR_Word*)jnc_st);
+  #endif
+
     /*
     ** Atomically decrement and fetch the number of conjuncts yet to complete.
     ** If we're the last conjunct to complete (the parallel conjunction is
@@ -1399,18 +1407,19 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
             return MR_ENTRY(MR_do_runnext);
         }
     } else {
-        MR_bool     popped;
-        MR_Code     *spark_resume;
+        volatile MR_Spark *spark;
 
         /*
         ** The parallel conjunction it is not yet finished. Try to work on a
         ** spark from our local stack. The sparks on our stack are likely to
         ** cause this conjunction to be complete.
         */
-        popped = MR_wsdeque_pop_bottom(&this_context->MR_ctxt_spark_deque,
-            &spark_resume);
-        if (popped) {
-            return spark_resume;
+        spark = MR_wsdeque_pop_bottom(&this_context->MR_ctxt_spark_deque);
+        if (NULL != spark) {
+#ifdef MR_THREADSCOPE
+            MR_threadscope_post_run_spark(spark->MR_spark_id);
+#endif
+            return spark->MR_spark_resume;
         } else {
             /*
             ** If this context originated the parallel conjunction that we've
@@ -1426,9 +1435,10 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
   #endif
                 MR_save_context(this_context);
                 /*
-                ** XXX: Make sure the context gets saved before we set
-                ** the join label, use a memory barrier.
+                ** Make sure the context gets saved before we set the join
+                ** label, use a memory barrier.
                 */
+                MR_CPU_SFENCE;
                 this_context->MR_ctxt_resume = (join_label);
                 MR_ENGINE(MR_eng_this_context) = NULL;
             } else {
Index: runtime/mercury_context.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.h,v
retrieving revision 1.62
diff -u -p -b -r1.62 mercury_context.h
--- runtime/mercury_context.h	8 Feb 2011 03:48:10 -0000	1.62
+++ runtime/mercury_context.h	11 Mar 2011 07:11:59 -0000
@@ -237,6 +237,9 @@ struct MR_Spark_Struct {
     MR_SyncTerm             *MR_spark_sync_term;
     MR_Code                 *MR_spark_resume;
     MR_ThreadLocalMuts      *MR_spark_thread_local_mutables;
+#ifdef MR_THREADSCOPE
+    MR_uint_least32_t       MR_spark_id;
+#endif
 };
 
 struct MR_SparkDeque_Struct {
@@ -718,14 +721,26 @@ extern  void        MR_schedule_context(
     volatile MR_Unsigned    MR_st_count;
   };
 
-  #define MR_init_sync_term(sync_term, nbranches)                             \
+#ifdef MR_THREADSCOPE
+  #define MR_init_sync_term(sync_term, nbranches, static_conj_id)             \
     do {                                                                      \
         MR_SyncTerm *init_st = (MR_SyncTerm *) &(sync_term);                  \
                                                                               \
         init_st->MR_st_orig_context = MR_ENGINE(MR_eng_this_context);         \
         init_st->MR_st_parent_sp = MR_parent_sp;                              \
         init_st->MR_st_count = (nbranches);                                   \
+        MR_threadscope_post_start_par_conj(&(sync_term), static_conj_id);     \
     } while (0)
+#else
+  #define MR_init_sync_term(sync_term, nbranches, static_conj_id)             \
+    do {                                                                      \
+        MR_SyncTerm *init_st = (MR_SyncTerm *) &(sync_term);                  \
+                                                                              \
+        init_st->MR_st_orig_context = MR_ENGINE(MR_eng_this_context);         \
+        init_st->MR_st_parent_sp = MR_parent_sp;                              \
+        init_st->MR_st_count = (nbranches);                                   \
+    } while (0)
+#endif
 
   /*
   ** fork_new_child(MR_SyncTerm st, MR_Code *child):
@@ -735,17 +750,35 @@ extern  void        MR_schedule_context(
   ** MR_parent_sp must already be set appropriately before this instruction
   ** is executed.
   */
+#ifdef MR_THREADSCOPE
   #define MR_fork_new_child(sync_term, child)                                 \
     do {                                                                      \
         MR_Spark fnc_spark;                                                   \
         MR_SparkDeque   *fnc_deque;                                           \
+        MR_uint_least32_t   id;                                               \
                                                                               \
         fnc_spark.MR_spark_sync_term = (MR_SyncTerm*) &(sync_term);           \
         fnc_spark.MR_spark_resume = (child);                                  \
         fnc_spark.MR_spark_thread_local_mutables = MR_THREAD_LOCAL_MUTABLES;  \
+        id = MR_ENGINE(MR_eng_next_spark_id)++;                               \
+        fnc_spark.MR_spark_id = (MR_ENGINE(MR_eng_id) << 24)|(id & 0xFFFFFF); \
         fnc_deque = &MR_ENGINE(MR_eng_this_context)->MR_ctxt_spark_deque;     \
         MR_wsdeque_push_bottom(fnc_deque, &fnc_spark);                        \
+        MR_threadscope_post_sparking(&(sync_term), fnc_spark.MR_spark_id);    \
     } while (0)
+#else
+  #define MR_fork_new_child(sync_term, child)                                 \
+    do {                                                                      \
+        MR_Spark fnc_spark;                                                   \
+        MR_SparkDeque   *fnc_deque;                                           \
+                                                                              \
+        fnc_spark.MR_spark_sync_term = (MR_SyncTerm*) &(sync_term);           \
+        fnc_spark.MR_spark_resume = (child);                                  \
+        fnc_spark.MR_spark_thread_local_mutables = MR_THREAD_LOCAL_MUTABLES;  \
+        fnc_deque = &MR_ENGINE(MR_eng_this_context)->MR_ctxt_spark_deque;     \
+        MR_wsdeque_push_bottom(fnc_deque, &fnc_spark);                        \
+    } while (0)
+#endif
 
   /*
   ** This macro may be used as conditions for runtime parallelism decisions.
Index: runtime/mercury_engine.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_engine.h,v
retrieving revision 1.54
diff -u -p -b -r1.54 mercury_engine.h
--- runtime/mercury_engine.h	31 May 2010 09:41:47 -0000	1.54
+++ runtime/mercury_engine.h	10 Mar 2011 06:12:56 -0000
@@ -275,6 +275,12 @@ typedef struct {
 **
 ** global_hp    The global heap pointer for this engine.
 **
+** MR_eng_parent_sp
+**              The stack pointer of the parent contexts' stack frame from
+**              which this execution was forked.  This is used to implement
+**              parallel conjunctions and enable a parallel conjunct to read
+**              it's input from it's parent, and to write back it's output.
+**
 ** this_context Points to the "backing store" for the context currently
 **              executing on this engine.
 **
@@ -341,6 +347,18 @@ typedef struct {
 **              that engine is available. When the call into the Mercury code
 **              finishes, c_depth is decremented.
 **
+** MR_eng_cpu_clock_ticks_offset
+**              The offset to be added to the CPU's TSC to give a time relative to the start of the program.
+**
+** MR_eng_ts_buffer
+**              The buffer object used by threadscope for this engine.
+**
+** MR_eng_id    The ID of this engine which is used by threadscope.
+**
+** MR_eng_next_spark_id
+**              In threadscope grades sparks are given IDs to help us track
+**              them.  This and MR_eng_id is used to allocate unique IDs.
+**
 ** jmp_buf      The jump buffer used by library/exception.m to return to the
 **              runtime system on otherwise unhandled exceptions.
 **
@@ -400,7 +418,8 @@ typedef struct MR_mercury_engine_struct 
     */
     MR_int_least64_t                    MR_eng_cpu_clock_ticks_offset;
     struct MR_threadscope_event_buffer  *MR_eng_ts_buffer;
-    MR_Unsigned                         MR_eng_id;
+    MR_uint_least16_t                   MR_eng_id;
+    MR_uint_least32_t                   MR_eng_next_spark_id;
   #endif
 #endif
     jmp_buf             *MR_eng_jmp_buf;
Index: runtime/mercury_misc.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_misc.h,v
retrieving revision 1.30
diff -u -p -b -r1.30 mercury_misc.h
--- runtime/mercury_misc.h	10 Sep 2010 13:10:46 -0000	1.30
+++ runtime/mercury_misc.h	11 Mar 2011 06:55:53 -0000
@@ -153,6 +153,9 @@ MR_perform_registered_exception_cleanups
 #define MR_proc_exec_traces(m)                      \
     MR_PASTE2(mercury_data__proc_exec_traces_array__, m)
 
+#define MR_threadscope_strings(m)                   \
+    MR_PASTE2(mercury_data__threadscope_string_table_array__, m)
+
 /*---------------------------------------------------------------------------*/
 
 #define MR_no_var_label_layout_refs1(m, s1)             \
Index: runtime/mercury_threadscope.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_threadscope.c,v
retrieving revision 1.8
diff -u -p -b -r1.8 mercury_threadscope.c
--- runtime/mercury_threadscope.c	8 Feb 2011 03:48:10 -0000	1.8
+++ runtime/mercury_threadscope.c	11 Mar 2011 06:55:52 -0000
@@ -120,8 +120,8 @@ ENDINIT
 #define MR_TS_EVENT_STOP_THREAD          2 /* (thread, status)       */
 #define MR_TS_EVENT_THREAD_RUNNABLE      3 /* (thread)               */
 #define MR_TS_EVENT_MIGRATE_THREAD       4 /* (thread, new_cap)      */
-#define MR_TS_EVENT_RUN_SPARK            5 /* (thread)               */
-#define MR_TS_EVENT_STEAL_SPARK          6 /* (thread, victim_cap)   */
+#define MR_TS_EVENT_RUN_SPARK            5 /* (thread, spark_id)     */
+#define MR_TS_EVENT_STEAL_SPARK          6 /* (thread, victim_cap, spark_id) */
 #define MR_TS_EVENT_SHUTDOWN             7 /* ()                     */
 #define MR_TS_EVENT_THREAD_WAKEUP        8 /* (thread, other_cap)    */
 #define MR_TS_EVENT_GC_START             9 /* ()                     */
@@ -136,11 +136,34 @@ ENDINIT
 #define MR_TS_EVENT_GC_IDLE             20 /* () */
 #define MR_TS_EVENT_GC_WORK             21 /* () */
 #define MR_TS_EVENT_GC_DONE             22 /* () */
-#define MR_TS_EVENT_CALL_MAIN           23 /* () */
+
+/*
+** Duncan Coutts has reserved IDs 23-37 in a discussion via IRC.
+*/
+
+#define MR_TS_EVENT_RUNTIME_TYPE        38 /* (rt_type) */
+
+#define MR_TS_EVENT_STRING              39 /* (string, id) */
+#define MR_TS_EVENT_CALL_MAIN           40 /* () */
 #define MR_TS_EVENT_LOOKING_FOR_GLOBAL_WORK \
-                                        24 /* () */
+                                        41 /* () */
+                                        /* XXX: Is it possible to infer the
+                                         * dynamic conj id here? */
+#define MR_TS_EVENT_SPARKING            42 /* (int id, spark id) */
+
+#define MR_TS_NUM_EVENT_TAGS            43
+
+#define MR_TS_MER_EVENT_START           100
 
-#define MR_TS_NUM_EVENT_TAGS            25
+#define MR_TS_MER_EVENT_START_PAR_CONJ    100 /* (int id, memo'd string id) */
+#define MR_TS_MER_EVENT_STOP_PAR_CONJ     101 /* (int id) */
+#define MR_TS_MER_EVENT_STOP_PAR_CONJUNCT 102 /* (itn id) */
+
+#define MR_TS_NUM_MER_EVENTS            3
+
+#define MR_TS_RUNTIME_HASKELL           0
+#define MR_TS_RUNTIME_EDEN              1
+#define MR_TS_RUNTIME_MERCURY           2
 
 #if 0  /* DEPRECATED EVENTS: */
 #define EVENT_CREATE_SPARK        13 /* (cap, thread) */
@@ -203,7 +226,8 @@ typedef MR_uint_least32_t   EventlogOffs
 
 typedef struct {
     EventType   etd_event_type;
-    const char  *etd_description;
+    const char*         etd_description;
+    MR_int_least16_t    etd_size;
 } EventTypeDesc;
 
 /***************************************************************************/
@@ -215,13 +239,16 @@ static EventTypeDesc event_type_descs[] 
         ** using.  It should be given outside of a block.
         */
         MR_TS_EVENT_STARTUP,
-        "Startup (num_engines)"
+        "Startup (num_engines)",
+        2 /* MR_EngineId */
     },
     { 
         /*
         ** The last event in the log.  It should be given outside of a block.
         */
-        MR_TS_EVENT_SHUTDOWN, "Shutdown"
+        MR_TS_EVENT_SHUTDOWN,
+        "Shutdown",
+        0
     },
     {
         /*
@@ -231,28 +258,50 @@ static EventTypeDesc event_type_descs[] 
         ** Blocks _must not_ exist within other blocks.
         */
         MR_TS_EVENT_BLOCK_MARKER, 
-        "A block of events generated by a specific engine follows" 
+        "A block of events generated by a specific engine follows",
+        4 + 8 + 2 /* EnginelogOffset, Time, MR_EngineId */
     },
     {
         /*
         ** Called when a context is created or re-used.
         */
         MR_TS_EVENT_CREATE_THREAD,
-        "A context is created or re-used"
+        "A context is created or re-used",
+        4 /* MR_ContextId */
     },
     {
         /*
         ** Called from MR_schedule_context()
         */
         MR_TS_EVENT_THREAD_RUNNABLE,
-        "The context is being placed on the run queue"
+        "The context is being placed on the run queue",
+        4 /* MR_ContextId */
+    },
+    {
+        /*
+        ** The named context begun executing a spark from it's local stack.
+        */
+        MR_TS_EVENT_RUN_SPARK,
+        "Run a spark from the local stack",
+        4 + 4 /* MR_ContextId + MR_SparkId */
+    },
+    {
+        /*
+        ** The named context begun executing a spark from another engine's
+        ** stack.
+        */
+        MR_TS_EVENT_STEAL_SPARK,
+        "Run a spark stolen from another engine",
+        4 + 2 + 4 /* MR_ContextId + MR_EngineId  + SparkId */
     },
     {
         /*
         ** The named context begun executing on the engine named by the current
         ** block.
         */
-        MR_TS_EVENT_RUN_THREAD, "Run context"
+        MR_TS_EVENT_RUN_THREAD,
+        "Run context",
+        4 /* MR_ContextId */
     },
     {
         /*
@@ -260,25 +309,29 @@ static EventTypeDesc event_type_descs[] 
         ** current block.  The reason why the context stopped is given.
         */
         MR_TS_EVENT_STOP_THREAD, 
-        "Context stopped"
+        "Context stopped",
+        4 + 2 /* MR_ContextId, MR_ContextStopReason */
     },
     {
         /*
         ** This event is posted when a context is created for a spark.
         */
         MR_TS_EVENT_CREATE_SPARK_THREAD,
-        "Create a context for executing a spark"
+        "Create a context for executing a spark",
+        4 /* MR_ContextId */
     },
     {
         MR_TS_EVENT_LOG_MSG,
-        "A user-provided log message"
+        "A user-provided log messagei",
+        -1 /* Variable length */
     },
     {
         /*
         ** Start a garbage collection run
         */
         MR_TS_EVENT_GC_START,
-        "Start GC"
+        "Start GC",
+        0
     },
     {
         /*
@@ -286,6 +339,25 @@ static EventTypeDesc event_type_descs[] 
         */
         MR_TS_EVENT_GC_END,
         "Stop GC",
+        0
+    },
+    {
+        /*
+        ** The runtime system writes down which type of runtime system it is.
+        ** ie, Haskell, Eden or Mercury.
+        */
+        MR_TS_EVENT_RUNTIME_TYPE,
+        "The type of the runtime",
+        2
+    },
+    {
+        /*
+        ** The runtime system registers a string and an ID for it so that the
+        ** ID represents the string in future messages.
+        */
+        MR_TS_EVENT_STRING,
+        "Register an id->string mapping",
+        -1
     },
     {
         /*
@@ -293,50 +365,51 @@ static EventTypeDesc event_type_descs[] 
         ** parameters.
         */
         MR_TS_EVENT_CALL_MAIN,
-        "About to call main/2"
+        "About to call main/2",
+        0
     },
     {
         MR_TS_EVENT_LOOKING_FOR_GLOBAL_WORK,
-        "Engine begins looking for global work"
+        "Engine begins looking for global work",
+        0
+    },
+    {
+        MR_TS_EVENT_SPARKING,
+        "A spark is being created",
+        8 + 4 /* Dynamic Conj ID + Spark ID */
+        /*
+         * Note that the dynamic conj ID is only useful for Mercury. other
+         * implementors may want different attributes here.
+         */
+    },
+    {
+        MR_TS_MER_EVENT_START_PAR_CONJ,
+        "Start a parallel conjunction (dyn id, static id)",
+        8 + 4 /* Dynamic Conj ID + Static Conj ID */
+    },
+    {
+        MR_TS_MER_EVENT_STOP_PAR_CONJ,
+        "Stop a parallel conjunction (dyn id)",
+        8 /* Dynamic Conj ID */
+    },
+    {
+        MR_TS_MER_EVENT_STOP_PAR_CONJUNCT,
+        "Stop a parallel conjunct (dyn id)",
+        8 /* Dynamic Conj ID */
     },
     {
         /* Mark the end of this array. */
-        MR_TS_NUM_EVENT_TAGS, NULL
+        MR_TS_NUM_EVENT_TAGS, NULL, 0
     }
 };
 
-static MR_uint_least16_t event_type_sizes[] = {
-    /* MR_TS_EVENT_CREATE_THREAD */
-        4, /* MR_ContextId */
-    /* MR_TS_EVENT_RUN_THREAD */
-        4, /* MR_ContextId */
-    /* MR_TS_EVENT_STOP_THREAD */
-        4 + 2, /* MR_ContextId, MR_ContextStopReason */
-    /* MR_TS_EVENT_THREAD_RUNNABLE */
-        4, /* MR_ContextId */
-    0, 0, 0,
-    /* MR_TS_EVENT_SHUTDOWN */          
-        0,
-    0,
-    /* MR_TS_EVENT_GC_START */
-        0,
-    /* MR_TS_EVENT_GC_END */
-        0,
-    0, 0, 0, 0,
-    /* MR_TS_EVENT_CREATE_SPARK_THREAD */
-        4, /* MR_ContextId */
-    /* MR_TS_EVENT_LOG_MSG */
-        -1, /* Variable size event */
-    /* MR_TS_EVENT_STARTUP */
-        2, /* MR_EngineId */
-    /* MR_TS_EVENT_BLOCK_MARKER */
-        4 + 8 + 2, /* EnginelogOffset, Time, MR_EngineId */
-    0, 0, 0, 0,
-    /* MR_TS_EVENT_CALL_MAIN */
-        0,
-    /* MR_TS_EVENT_LOOKING_FOR_GLOBAL_WORK */
-        0,
-};
+/*
+** These tables are filled in when the header of the log file is written.
+** While they can be inferred from the event_type_desc structure they allow for
+** constant time lookup.
+*/
+static MR_int_least16_t event_type_sizes[MR_TS_NUM_EVENT_TAGS];
+static MR_int_least16_t event_type_sizes_mercury[MR_TS_NUM_MER_EVENTS];
 
 static FILE* MR_threadscope_output_file = NULL;
 static char* MR_threadscope_output_filename;
@@ -356,8 +429,30 @@ static Timedelta        MR_global_offset
 
 static struct MR_threadscope_event_buffer global_buffer;
 
+/*
+** An ID that may be allocated to the next string to be registered.
+*/
+static MR_TS_StringId   MR_next_string_id = 0;
+
 /***************************************************************************/
 
+MR_STATIC_INLINE MR_int_least16_t
+event_type_size(EventType event_type) {
+    MR_int_least16_t size;
+
+    if (event_type < MR_TS_NUM_EVENT_TAGS) {
+        size = event_type_sizes[event_type];
+    } else if ((event_type < (MR_TS_MER_EVENT_START + MR_TS_NUM_MER_EVENTS))
+            && (event_type >= MR_TS_MER_EVENT_START)) {
+        size = event_type_sizes_mercury[event_type - MR_TS_MER_EVENT_START];
+    } else {
+        fprintf(stderr, "Unknown event type %d\n", event_type);
+        abort();
+    }
+
+    return size;
+}
+
 /*
 ** Is there enough room for this statically sized event in the current engine's
 ** buffer _and_ enough room for the block marker event.
@@ -367,8 +462,8 @@ enough_room_for_event(
         struct MR_threadscope_event_buffer *buffer,
         EventType event_type) 
 {
-    return (buffer->MR_tsbuffer_pos + event_type_sizes[event_type] +
-                event_type_sizes[MR_TS_EVENT_BLOCK_MARKER] +
+    return (buffer->MR_tsbuffer_pos + event_type_size(event_type) +
+                event_type_size(MR_TS_EVENT_BLOCK_MARKER) +
                 ((2 + 8) * 2)) /* (EventType, Time) * 2 */
             < MR_TS_BUFFERSIZE; 
 }
@@ -379,8 +474,8 @@ enough_room_for_variable_size_event(
         MR_Unsigned length)
 {
     return (buffer->MR_tsbuffer_pos + length + 
-                event_type_sizes[MR_TS_EVENT_BLOCK_MARKER] +
-                ((2 + 8) * 2) + 2) /* (EventType, Time) * 2 + StringLength */
+                event_type_size(MR_TS_EVENT_BLOCK_MARKER) +
+                (2 + 8) * 2) /* (EventType, Time) * 2 */
             < MR_TS_BUFFERSIZE;
 }
 
@@ -519,6 +614,27 @@ MR_STATIC_INLINE void put_stop_reason(
     put_be_uint16(buffer, reason);
 }
 
+MR_STATIC_INLINE void put_string_id(
+        struct MR_threadscope_event_buffer *buffer,
+        MR_TS_StringId id)
+{
+    put_be_uint32(buffer, id);
+}
+
+MR_STATIC_INLINE void put_par_conj_dynamic_id(
+        struct MR_threadscope_event_buffer *buffer,
+        MR_Word* id)
+{
+    put_be_uint64(buffer, (MR_Word)id);
+}
+
+MR_STATIC_INLINE void put_spark_id(
+        struct MR_threadscope_event_buffer *buffer,
+        MR_SparkId spark_id)
+{
+    put_be_uint32(buffer, spark_id);
+}
+
 /***************************************************************************/
 
 static struct MR_threadscope_event_buffer* 
@@ -537,6 +653,9 @@ static void
 put_event_type(struct MR_threadscope_event_buffer *buffer, 
     EventTypeDesc *event_type);
 
+static MR_TS_StringId
+MR_threadscope_register_string(const char *string);
+
 static MR_bool 
 flush_event_buffer(struct MR_threadscope_event_buffer *buffer); 
 
@@ -600,10 +719,17 @@ MR_setup_threadscope(void) 
     MR_open_output_file_and_write_prelude();
     
     /*
+    ** Put the runtime type event in the buffer.
+    */
+    put_event_header(&global_buffer, MR_TS_EVENT_RUNTIME_TYPE, 0);
+    put_be_uint16(&global_buffer, MR_TS_RUNTIME_MERCURY);
+
+    /*
     ** Put the startup event in the buffer.
     */
     put_event_header(&global_buffer, MR_TS_EVENT_STARTUP, 0);
     put_engine_id(&global_buffer, (MR_EngineId)MR_num_threads);
+
     flush_event_buffer(&global_buffer);
 }
 
@@ -625,6 +751,7 @@ MR_threadscope_setup_engine(MercuryEngin
     );
     MR_LOCK(&MR_next_engine_id_lock, "MR_get_next_engine_id");
     eng->MR_eng_id = MR_next_engine_id++;
+    eng->MR_eng_next_spark_id = 0;
     MR_UNLOCK(&MR_next_engine_id_lock, "MR_get_next_engine_id");
 
     if (eng->MR_eng_id == 0) {
@@ -953,6 +1080,83 @@ MR_threadscope_post_stop_context(MR_Cont
 }
 
 void
+MR_threadscope_post_run_spark(MR_SparkId spark_id)
+{
+    struct MR_threadscope_event_buffer  *buffer;
+    MR_Context                          *context;
+
+    buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+    context = MR_thread_engine_base->MR_eng_this_context;
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_EVENT_RUN_SPARK)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_EVENT_RUN_SPARK,
+        get_current_time_nanosecs());
+    put_context_id(buffer, context->MR_ctxt_num_id);
+    put_spark_id(buffer, spark_id);
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_steal_spark(MR_SparkId spark_id)
+{
+    struct MR_threadscope_event_buffer  *buffer;
+    MR_Context                          *context;
+    unsigned                            engine_id;
+
+    buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+    context = MR_thread_engine_base->MR_eng_this_context;
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_EVENT_STEAL_SPARK)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_EVENT_STEAL_SPARK,
+        get_current_time_nanosecs());
+    put_context_id(buffer, context->MR_ctxt_num_id);
+    /*
+    ** The engine that created the spark (which may not be whom it was stolen
+    ** from if different work-stealking algorithms are implemented) can be
+    ** derrived from the spark id.
+    */
+    engine_id = (spark_id & 0xFF000000) >> 24;
+    put_be_uint16(buffer, engine_id);
+    put_spark_id(buffer, spark_id);
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_sparking(MR_Word* dynamic_conj_id, MR_SparkId spark_id)
+{
+    struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_EVENT_SPARKING)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_EVENT_SPARKING,
+        get_current_time_nanosecs());
+    put_par_conj_dynamic_id(buffer, dynamic_conj_id);
+    put_spark_id(buffer, spark_id);
+
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
 MR_threadscope_post_calling_main(void) {
     struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
     
@@ -987,11 +1191,112 @@ MR_threadscope_post_looking_for_global_w
 }
 
 void
+MR_threadscope_post_start_par_conj(MR_Word* dynamic_id,
+        MR_TS_StringId static_id) {
+    struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_MER_EVENT_START_PAR_CONJ)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_MER_EVENT_START_PAR_CONJ,
+        get_current_time_nanosecs());
+    put_par_conj_dynamic_id(buffer, dynamic_id);
+    put_string_id(buffer, static_id);
+
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_stop_par_conj(MR_Word *dynamic_id) {
+    struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_MER_EVENT_STOP_PAR_CONJ)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_MER_EVENT_STOP_PAR_CONJ,
+        get_current_time_nanosecs());
+    put_par_conj_dynamic_id(buffer, dynamic_id);
+
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_stop_par_conjunct(MR_Word *dynamic_id) {
+    struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+    MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+    if (!enough_room_for_event(buffer, MR_TS_MER_EVENT_STOP_PAR_CONJUNCT)) {
+        flush_event_buffer(buffer);
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    } else if (!block_is_open(buffer)) {
+        open_block(buffer, MR_ENGINE(MR_eng_id));
+    }
+
+    put_event_header(buffer, MR_TS_MER_EVENT_STOP_PAR_CONJUNCT,
+        get_current_time_nanosecs());
+    put_par_conj_dynamic_id(buffer, dynamic_id);
+
+    MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+/*
+** Register a string for use in future messages.
+*/
+static MR_TS_StringId
+MR_threadscope_register_string(const char *string)
+{
+    MR_TS_StringId id;
+    unsigned length;
+
+    length = strlen(string);
+
+    /*
+    ** +2 for the event length.
+    ** +4 for the string id.
+    */
+    if (!enough_room_for_variable_size_event(&global_buffer, strlen(string)
+            + 2 + 4)) {
+        flush_event_buffer(&global_buffer);
+    }
+
+    put_event_header(&global_buffer, MR_TS_EVENT_STRING, 0);
+    id = MR_next_string_id++;
+    put_be_uint16(&global_buffer, length + 4);
+    put_raw_string(&global_buffer, string, length);
+    put_string_id(&global_buffer, id);
+
+    return id;
+}
+
+void
+MR_threadscope_register_strings_array(MR_Threadscope_String *array,
+        unsigned size) {
+    unsigned i;
+
+    for (i = 0; i < size; i++) {
+        array[i].MR_tsstring_id =
+            MR_threadscope_register_string(array[i].MR_tsstring_string);
+    }
+
+    flush_event_buffer(&global_buffer);
+}
+
+void
 MR_threadscope_post_log_msg(const char *message) {
     struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
 
     MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
-    if (!enough_room_for_variable_size_event(buffer, strlen(message))) {
+    if (!enough_room_for_variable_size_event(buffer, strlen(message) + 2)) {
         flush_event_buffer(buffer),
         open_block(buffer, MR_ENGINE(MR_eng_id));
     } else if (!block_is_open(buffer)) {
@@ -1082,14 +1387,33 @@ MR_close_output_file(void)
 }
 
 static void
-put_event_type(struct MR_threadscope_event_buffer *buffer, EventTypeDesc *event_type)
+put_event_type(struct MR_threadscope_event_buffer *buffer,
+    EventTypeDesc *event_type_desc)
 {
+    MR_int_least16_t    size;
+    EventType           event_type;
+
+    /*
+     * This also fills in our tables of event sizes.
+     */
+    event_type = event_type_desc->etd_event_type;
+    size = event_type_desc->etd_size;
+    if (event_type < MR_TS_NUM_EVENT_TAGS) {
+        event_type_sizes[event_type] = size;
+    } else if ((event_type < (MR_TS_MER_EVENT_START + MR_TS_NUM_MER_EVENTS))
+            && (event_type >= MR_TS_MER_EVENT_START)) {
+        event_type_sizes_mercury[event_type - MR_TS_MER_EVENT_START] = size;
+    } else {
+        fprintf(stderr, "Unknown event type %d\n", event_type);
+        abort();
+    }
+
     put_be_uint32(buffer, MR_TS_EVENT_ET_BEGIN);
 
-    put_be_uint16(buffer, event_type->etd_event_type);
-    put_be_int16(buffer, event_type_sizes[event_type->etd_event_type]);
+    put_be_uint16(buffer, event_type);
+    put_be_int16(buffer, size);
 
-    put_string_size32(buffer, event_type->etd_description);
+    put_string_size32(buffer, event_type_desc->etd_description);
 
     /* There is no extended data in any of our events */
     put_be_uint32(buffer, 0);
Index: runtime/mercury_threadscope.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_threadscope.h,v
retrieving revision 1.5
diff -u -p -b -r1.5 mercury_threadscope.h
--- runtime/mercury_threadscope.h	17 Feb 2010 02:37:45 -0000	1.5
+++ runtime/mercury_threadscope.h	11 Mar 2011 06:55:52 -0000
@@ -39,6 +39,13 @@ typedef struct MR_threadscope_event_buff
 typedef MR_uint_least16_t   MR_EngineId;
 typedef MR_uint_least16_t   MR_ContextStopReason;
 typedef MR_Integer          MR_ContextId;
+typedef MR_uint_least32_t   MR_TS_StringId;
+typedef MR_uint_least32_t   MR_SparkId;
+
+typedef struct MR_Threadscope_String {
+    const char*     MR_tsstring_string;
+    MR_TS_StringId  MR_tsstring_id;
+} MR_Threadscope_String;
 
 /*
 ** This must be called by the primordial thread before starting any other
@@ -122,6 +129,26 @@ extern void
 MR_threadscope_post_stop_context(MR_ContextStopReason reason);
 
 /*
+** This message says we're about to execute a spark from our local stack.
+*/
+extern void
+MR_threadscope_post_run_spark(MR_SparkId spark_id);
+
+/*
+** This message says that we're about to execute a spark that was stolen from
+** another's stack.
+*/
+extern void
+MR_threadscope_post_steal_spark(MR_SparkId spark_id);
+
+/*
+** This message says that a spark is being created for the given computation.
+** The spark's ID is given as an argument.
+*/
+extern void
+MR_threadscope_post_sparking(MR_Word* dynamic_conj_id, MR_SparkId spark_id);
+
+/*
 ** Post this message just before invoking the main/2 predicate.
 */
 extern void
@@ -134,6 +161,31 @@ extern void
 MR_threadscope_post_looking_for_global_work(void);
 
 /*
+** Post this message before a parallel conjunction starts.
+*/
+extern void
+MR_threadscope_post_start_par_conj(MR_Word* dynamic_id, MR_TS_StringId static_id);
+
+/*
+** Post this message after a parallel conjunction stops.
+*/
+extern void
+MR_threadscope_post_stop_par_conj(MR_Word* dynamic_id);
+
+/*
+** Post this message when a parallel conjunct calls the bariier code.
+*/
+extern void
+MR_threadscope_post_stop_par_conjunct(MR_Word* dynamic_id);
+
+/*
+** Register all the strings in an array and save their IDs in the array.
+*/
+extern void
+MR_threadscope_register_strings_array(MR_Threadscope_String *array,
+    unsigned size);
+
+/*
 ** Post a user-defined log message.
 */
 extern void
Index: runtime/mercury_wrapper.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.c,v
retrieving revision 1.216
diff -u -p -b -r1.216 mercury_wrapper.c
--- runtime/mercury_wrapper.c	26 Jan 2011 16:08:07 -0000	1.216
+++ runtime/mercury_wrapper.c	11 Mar 2011 07:11:59 -0000
@@ -410,6 +410,9 @@ void    (*MR_address_of_init_modules_com
 void    (*MR_address_of_write_out_proc_statics)(FILE *deep_fp,
             FILE *procrep_fp);
 #endif
+#ifdef  MR_THREADSCOPE
+void    (*MR_address_of_init_modules_threadscope_string_table)(void);
+#endif
 void    (*MR_address_of_init_modules_required)(void);
 void    (*MR_address_of_final_modules_required)(void);
 
@@ -650,6 +653,12 @@ mercury_runtime_init(int argc, char **ar
     ** Pin the primordial thread, if thread pinning is configured.
     */
     MR_setup_threadscope();
+
+    /*
+    ** Setup the threadscope string tables before the standard library is
+    ** initalised or engines are created.
+    */
+    (*MR_address_of_init_modules_threadscope_string_table)();
   #endif
 
     MR_all_engine_bases = MR_GC_malloc(sizeof(MercuryEngine*)*MR_num_threads);
Index: runtime/mercury_wrapper.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.h,v
retrieving revision 1.84
diff -u -p -b -r1.84 mercury_wrapper.h
--- runtime/mercury_wrapper.h	7 Oct 2010 23:38:44 -0000	1.84
+++ runtime/mercury_wrapper.h	11 Mar 2011 07:11:59 -0000
@@ -110,6 +110,9 @@ extern	void		(*MR_address_of_init_module
 extern	void		(*MR_address_of_write_out_proc_statics)(FILE *deep_fp,
 				FILE *procrep_fp);
 #endif
+#ifdef  MR_THREADSCOPE
+extern  void        (*MR_address_of_init_modules_threadscope_string_table)(void);
+#endif
 extern	void		(*MR_address_of_init_modules_required)(void);
 extern	void		(*MR_address_of_final_modules_required)(void);
 
Index: runtime/mercury_wsdeque.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wsdeque.c,v
retrieving revision 1.2
diff -u -p -b -r1.2 mercury_wsdeque.c
--- runtime/mercury_wsdeque.c	20 Mar 2010 10:15:51 -0000	1.2
+++ runtime/mercury_wsdeque.c	11 Mar 2011 06:23:40 -0000
@@ -17,12 +17,7 @@
 ** access one end of the queue (the bottom) while other threads can only pop
 ** elements from the other end (the top).
 **
-** We haven't implemented work stealing yet so the data structure is currently
-** only used as a stack (for context-local sparks) and as a queue (for the
-** global spark queue).
-**
-** NOTE: we need to insert memory barriers in the right places once we do start
-** work stealing.
+** XXX: we need to insert memory barriers in the right places.
 */
 
 #include "mercury_imp.h"
Index: runtime/mercury_wsdeque.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wsdeque.h,v
retrieving revision 1.4
diff -u -p -b -r1.4 mercury_wsdeque.h
--- runtime/mercury_wsdeque.h	7 Oct 2010 23:38:45 -0000	1.4
+++ runtime/mercury_wsdeque.h	11 Mar 2011 07:11:59 -0000
@@ -56,9 +56,8 @@ extern  MR_bool MR_wsdeque_is_empty(cons
 ** Push a spark on the bottom of the deque.  Must only be called by the owner
 ** of the deque.  The deque may grow as necessary.
 */
-MR_INLINE
-void            MR_wsdeque_push_bottom(MR_SparkDeque *dq,
-                    const MR_Spark *spark);
+MR_INLINE void
+MR_wsdeque_push_bottom(MR_SparkDeque *dq, const MR_Spark *spark);
 
 /*
 ** Same as MR_wsdeque_push_bottom but assume that there is enough space
@@ -68,11 +67,12 @@ extern  void    MR_wsdeque_putback_botto
                     const MR_Spark *spark);
 
 /*
-** Pop a spark off the bottom of the deque.  Must only be called by
-** the owner of the deque.  Returns true if successful.
+** Pop a spark off the bottom of the deque.  Must only be called by the owner
+** of the deque.  The pointer returned here can be used until the next call to
+** a MR_wsdeque function, at which point it's memory may have been overwritten.
 */
-MR_INLINE MR_bool
-MR_wsdeque_pop_bottom(MR_SparkDeque *dq, MR_Code **ret_spark_resume);
+MR_INLINE volatile MR_Spark*
+MR_wsdeque_pop_bottom(MR_SparkDeque *dq);
 
 /*
 ** Attempt to steal a spark from the top of the deque.
@@ -82,7 +82,8 @@ MR_wsdeque_pop_bottom(MR_SparkDeque *dq,
 **   0 if the deque is empty or
 **  -1 if the steal was aborted due to a concurrent steal or pop_bottom.
 */
-extern  int     MR_wsdeque_steal_top(MR_SparkDeque *dq, MR_Spark *ret_spark);
+extern  int
+MR_wsdeque_steal_top(MR_SparkDeque *dq, MR_Spark *ret_spark);
 
 /*
 ** Take a spark from the top of the deque, assuming there are no concurrent
@@ -129,14 +130,15 @@ MR_wsdeque_push_bottom(MR_SparkDeque *dq
     dq->MR_sd_bottom = bot + 1;
 }
 
-MR_INLINE MR_bool
-MR_wsdeque_pop_bottom(MR_SparkDeque *dq, MR_Code **ret_spark_resume)
+MR_INLINE volatile MR_Spark*
+MR_wsdeque_pop_bottom(MR_SparkDeque *dq)
 {
     MR_Integer              bot;
     MR_Integer              top;
     MR_Integer              size;
     volatile MR_SparkArray  *arr;
     MR_bool                 success;
+    volatile MR_Spark       *spark;
 
     bot = dq->MR_sd_bottom;
     arr = dq->MR_sd_active_array;
@@ -148,18 +150,18 @@ MR_wsdeque_pop_bottom(MR_SparkDeque *dq,
 
     if (size < 0) {
         dq->MR_sd_bottom = top;
-        return MR_FALSE;
+        return NULL;
     }
 
-    (*ret_spark_resume) = MR_sa_element(arr, bot).MR_spark_resume;
+    spark = &MR_sa_element(arr, bot);
     if (size > 0) {
-        return MR_TRUE;
+        return spark;
     }
 
     /* size = 0 */
     success = MR_compare_and_swap_int(&dq->MR_sd_top, top, top + 1);
     dq->MR_sd_bottom = top + 1;
-    return success;
+    return success ? spark : NULL;
 }
 
 MR_INLINE int
Index: util/mkinit.c
===================================================================
RCS file: /home/mercury1/repository/mercury/util/mkinit.c,v
retrieving revision 1.122
diff -u -p -b -r1.122 mkinit.c
--- util/mkinit.c	13 Feb 2010 07:29:10 -0000	1.122
+++ util/mkinit.c	11 Mar 2011 07:11:59 -0000
@@ -32,7 +32,6 @@
 **          in particular the predicates make_init_obj/7 and
 **          make_standalone_interface/3.
 **      - util/mkinit_erl.c
-**
 */
 
 /*---------------------------------------------------------------------------*/
@@ -72,6 +71,9 @@ static const char if_need_term_size[] =
 static const char if_need_deep_prof[] =
     "#if defined(MR_DEEP_PROFILING)\n";
 
+static const char if_need_threadscope[] =
+    "#if defined(MR_THREADSCOPE)\n";
+
 typedef enum
 {
     TASK_OUTPUT_INIT_PROG = 0,
@@ -86,8 +88,9 @@ typedef enum
     PURPOSE_DEBUGGER = 2,
     PURPOSE_COMPLEXITY = 3,
     PURPOSE_PROC_STATIC = 4,
-    PURPOSE_REQ_INIT = 5,
-    PURPOSE_REQ_FINAL = 6
+    PURPOSE_THREADSCOPE_STRING_TABLE = 5,
+    PURPOSE_REQ_INIT = 6,
+    PURPOSE_REQ_FINAL = 7
 } Purpose;
 
 const char  *main_func_name[] =
@@ -97,6 +100,7 @@ const char  *main_func_name[] =
     "init_modules_debugger",
     "init_modules_complexity_procs",
     "write_out_proc_statics",
+    "init_modules_threadscope_string_table",
     "init_modules_required",
     "final_modules_required"
 };
@@ -108,6 +112,7 @@ const char  *module_suffix[] =
     "init_debugger",
     "init_complexity_procs",
     "write_out_proc_statics",
+    "init_threadscope_string_table",
     "",
     "",
 };
@@ -120,6 +125,7 @@ const char  *init_suffix[] =
     "_complexity",
     "write_out_proc_statics",
     "",
+    "",
     ""
 };
 
@@ -130,6 +136,7 @@ const char  *bunch_function_guard[] =
     if_need_to_init,
     if_need_term_size,
     if_need_deep_prof,
+    if_need_threadscope,
     NULL,
     NULL,
 };
@@ -141,6 +148,7 @@ const char  *main_func_guard[] =
     NULL,
     if_need_term_size,
     if_need_deep_prof,
+    if_need_threadscope,
     NULL,
     NULL,
 };
@@ -153,6 +161,7 @@ const char  *main_func_body_guard[] =
     NULL,
     NULL,
     NULL,
+    NULL,
     NULL
 };
 
@@ -164,6 +173,7 @@ const char  *main_func_arg_defn[] =
     "void",
     "FILE *deep_fp, FILE *procrep_fp",
     "void",
+    "void",
     "void"
 };
 
@@ -175,6 +185,7 @@ const char  *main_func_arg_decl[] =
     "void",
     "FILE *, FILE *",
     "void",
+    "void",
     "void"
 };
 
@@ -186,6 +197,7 @@ const char  *main_func_arg[] =
     "",
     "deep_fp, procrep_fp",
     "",
+    "",
     ""
 };
 
@@ -398,6 +410,10 @@ static const char mercury_funcs2[] =
     "   MR_address_of_write_out_proc_statics =\n"
     "       write_out_proc_statics;\n"
     "#endif\n"
+    "#ifdef MR_THREADSCOPE\n"
+    "   MR_address_of_init_modules_threadscope_string_table =\n"
+    "       init_modules_threadscope_string_table;\n"
+    "#endif\n"
     "   MR_address_of_init_modules_required = init_modules_required;\n"
     "   MR_address_of_final_modules_required = final_modules_required;\n"
     "#ifdef MR_RECORD_TERM_SIZES\n"
@@ -692,6 +708,10 @@ output_init_program(void)
         std_and_special_modules, std_module_next + special_module_next);
     output_main_init_function(PURPOSE_PROC_STATIC, num_bunches);
 
+    num_bunches = output_sub_init_functions(PURPOSE_THREADSCOPE_STRING_TABLE,
+        std_modules, std_module_next);
+    output_main_init_function(PURPOSE_THREADSCOPE_STRING_TABLE, num_bunches);
+
     num_bunches = output_sub_init_functions(PURPOSE_REQ_INIT,
         req_init_modules, req_init_module_next);
     output_main_init_function(PURPOSE_REQ_INIT, num_bunches);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20110311/3725e3da/attachment.sig>


More information about the reviews mailing list