[m-rev.] for post-commit review: Fix compatibility issues for the low-level C parallel grades.

Paul Bone pbone at csse.unimelb.edu.au
Sat Mar 20 21:15:14 AEDT 2010

For post commit review by anyone.  I'm committing this to the main branch now.
I'll let our release manager (Julien) review it or wait for it to be reviewed
before pushing it onto the 10.04 branch.


Branches: main, 10.04

Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
parallel and threadscope grades.

We had been using types with the wrong signedness when calling atomic operations.
GCC 4.x also picked up an error where #elif was used instead of #else.

While testing these changes on a 32bit system more bugs were found on the i386
architecture and on AMD brand processors.

    Add unsigned variants of the following atomic operations:

    Add a signed variant for compare and swap.

    Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
    the end of the name.
    Use volatile storage in the MR_Stats structure.
    A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
    must use 64bit values.  Therefore 64bit values in the MR_Stats structure
    are now protected by a lock on 32bit machines.
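
    The locking scheme described above can be sketched roughly as follows.
    This is only an illustration: stats_counter, stats_init and stats_add
    are hypothetical names, the real MR_Stats layout is not shown in this
    log, and the sketch assumes GCC's __sync builtins and POSIX threads
    rather than the runtime's own atomic operations.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit field of a statistics structure. */
typedef struct {
    volatile uint64_t sum;
#if UINTPTR_MAX == 0xffffffffu
    pthread_mutex_t lock;       /* only needed on 32-bit machines */
#endif
} stats_counter;

static void stats_init(stats_counter *s)
{
    s->sum = 0;
#if UINTPTR_MAX == 0xffffffffu
    pthread_mutex_init(&s->lock, NULL);
#endif
}

static void stats_add(stats_counter *s, uint64_t delta)
{
#if UINTPTR_MAX == 0xffffffffu
    /* 32-bit: serialise updates to the 64-bit value with a lock. */
    pthread_mutex_lock(&s->lock);
    s->sum += delta;
    pthread_mutex_unlock(&s->lock);
#else
    /* 64-bit: a native atomic add is available. */
    __sync_fetch_and_add(&s->sum, delta);
#endif
}
```

    Callers use stats_add() the same way on both word sizes; only the
    implementation differs.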

    Fix a typo in the i386 version of MR_atomic_dec_and_is_zero_uint().

    AMD CPUs do not conform to Intel's specification for being able to
    extract the CPU clock speed from the brand string.  When we cannot
    determine the CPU's clock speed then we write out threadscope
    timestamps in raw clock cycles rather than nanoseconds.
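
    A minimal sketch of that fallback, assuming the clock speed is held
    as a frequency in Hz with 0 meaning "unknown".  The function name and
    that convention are illustrative and not taken from the runtime.

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds when the clock speed is known.
 * clock_hz == 0 is this sketch's convention for "unknown", in which
 * case the raw cycle count is returned unchanged.
 */
static uint64_t tsc_to_timestamp(uint64_t cycles, uint64_t clock_hz)
{
    uint64_t secs;
    uint64_t rem;

    if (clock_hz == 0) {
        return cycles;
    }
    /* Split into whole seconds and a remainder so that the
     * multiplication does not overflow for realistic clock speeds. */
    secs = cycles / clock_hz;
    rem = cycles % clock_hz;
    return secs * 1000000000ULL + (rem * 1000000000ULL) / clock_hz;
}
```

    For example, 1500 cycles at 3GHz is 500 nanoseconds, while with an
    unknown clock speed the value 1500 is written out as is.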

    On i386 machines the ebx register is used to implement PIC code;
    however, the CPUID instruction also uses it to output information.
    Save this register on C's stack while we issue CPUID and retrieve
    the result from ebx.
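
    The idea can be sketched with GCC inline assembler as below.  This is
    not the runtime's code: cpuid_leaf and cpu_vendor are hypothetical
    helpers, and the sketch assumes a GCC-compatible compiler and a CPU
    that supports CPUID.  On i386 it pushes ebx, issues CPUID, copies the
    ebx result into esi and then restores ebx.  On x86_64 it uses ebx
    directly.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill regs[0..3] with eax, ebx, ecx, edx for the given CPUID leaf.
 * Returns 1 on x86 and x86_64, 0 on other architectures.
 */
static int cpuid_leaf(uint32_t leaf, uint32_t regs[4])
{
#if defined(__GNUC__) && defined(__i386__)
    __asm__ volatile(
        "pushl %%ebx\n\t"           /* ebx may hold the PIC base */
        "cpuid\n\t"
        "movl %%ebx, %%esi\n\t"     /* copy the ebx result out */
        "popl %%ebx"                /* restore ebx */
        : "=a"(regs[0]), "=S"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
        : "0"(leaf), "2"(0u));
    return 1;
#elif defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "cpuid"
        : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
        : "0"(leaf), "2"(0u));
    return 1;
#else
    (void) leaf;
    (void) regs;
    return 0;
#endif
}

/* Leaf 0 returns the 12-byte vendor string in ebx, edx, ecx order. */
static int cpu_vendor(char out[13])
{
    uint32_t r[4];

    if (!cpuid_leaf(0, r)) {
        out[0] = '\0';
        return 0;
    }
    memcpy(out, &r[1], 4);
    memcpy(out + 4, &r[3], 4);
    memcpy(out + 8, &r[2], 4);
    out[12] = '\0';
    return 1;
}
```

    Only register operands are used in the i386 variant, so the
    temporary change to the stack pointer does not affect operand
    addressing.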

    We now pass native machine sized values to the inline assembler code
    that implements RDTSC and RDTSCP.
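
    As an illustration of using native machine sized values, the sketch
    below reads the time stamp counter into two unsigned long variables,
    which are 32 bits wide on i386 and 64 bits wide on x86_64, and then
    combines them.  read_tsc is a hypothetical name and the runtime's
    actual assembler is not shown in this log.

```c
#include <stdint.h>

/*
 * Read the time stamp counter.  Returns 1 and stores the value in *out
 * on x86 and x86_64, or returns 0 (with *out set to 0) elsewhere.
 */
static int read_tsc(uint64_t *out)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned long lo;   /* native word size: matches eax or rax */
    unsigned long hi;   /* native word size: matches edx or rdx */

    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    /* RDTSC returns the low half in eax and the high half in edx. */
    *out = ((uint64_t) (uint32_t) hi << 32) | (uint64_t) (uint32_t) lo;
    return 1;
#else
    *out = 0;
    return 0;
#endif
}
```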

    Fix commenting style in some places.

    Fix some incorrect C preprocessor code for conditional compilation.

    Increment binary compatibility number.  This should have been done in a
    prior change when the MR_runnext macro changed which broke binary
    compatibility in the parallel low-level C grades.

    In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
    remaining before the conjunction is complete.

    Record raw cpu clock ticks rather than milliseconds when we don't
    know the processor's clock speed.

    Conform to changes in mercury_atomic_ops.h

Index: compiler/options.m
RCS file: /home/mercury1/repository/mercury/compiler/options.m,v
retrieving revision 1.668
diff -u -p -b -r1.668 options.m
--- compiler/options.m	11 Feb 2010 04:36:09 -0000	1.668
+++ compiler/options.m	23 Feb 2010 03:10:27 -0000
@@ -4172,7 +4172,12 @@ options_help_compilation_model -->
         "\tEnable experimental complexity analysis for the predicates",
         "\tlisted in the given file.",
         "\tThis option is supported for the C back-end, with",
-        "\t--no-highlevel-code."
+        "\t--no-highlevel-code.",
+        "--threadscope\t\t(grade modifier: `.threadscope')",
+        "\tEnable support for profiling parallel execution.",
+        "\tThis option is supported in the low-level C back-end parallel",
+        "\tgrades on x86 and x86_64 processors."
     io.write_string("      Miscellaneous optional features\n"),
@@ -4206,6 +4211,9 @@ options_help_compilation_model -->
         "\tAs above, but use a dynamically sized trail that is composed",
         "\tof small segments.  This can help to avoid trail exhaustion",
         "\tat the cost of increased execution time.",
+        "--parallel\t\t(grade modifier: `.par')",
+        "\tEnable parallel execution support.",
+        "\tThis option is only supported for the C back-ends.",
         "--maybe-thread-safe {yes, no}",
         "\tSpecify how to treat the `maybe_thread_safe' foreign code",
         "\tattribute.  `yes' means that a foreign procedure with the",
Index: doc/reference_manual.texi
RCS file: /home/mercury1/repository/mercury/doc/reference_manual.texi,v
retrieving revision 1.438
diff -u -p -b -r1.438 reference_manual.texi
--- doc/reference_manual.texi	14 Jan 2010 02:27:58 -0000	1.438
+++ doc/reference_manual.texi	23 Feb 2010 04:42:22 -0000
@@ -672,6 +672,17 @@ This is an abbreviation for @samp{not (s
 A conjunction.
 @var{Goal1} and @var{Goal2} must be valid goals.
+@item @code{@var{Goal1} & @var{Goal2}}
+A parallel conjunction.
+This has the same declarative semantics as the normal conjunction.
+Operationally, implementations may execute @var{Goal1} & @var{Goal2}
+in parallel with one-another.
+Implementations may also start the parallel execution of these goals
+in any order.
+It is a compilation error for @var{Goal1} or @var{Goal2} to have a
+determinism other than @samp{det} or @samp{cc_multi}.  
+@xref{Determinism categories}.
 @item @code{@var{Goal1} ; @var{Goal2}}
 where @var{Goal1} is not of the form @samp{Goal1a -> Goal1b}:
 a disjunction.
Index: doc/user_guide.texi
RCS file: /home/mercury1/repository/mercury/doc/user_guide.texi,v
retrieving revision 1.603
diff -u -p -b -r1.603 user_guide.texi
--- doc/user_guide.texi	4 Feb 2010 02:20:46 -0000	1.603
+++ doc/user_guide.texi	23 Feb 2010 04:40:20 -0000
@@ -5588,6 +5588,8 @@ then a progress message will be displaye
                                     program with mprof.
 * Using mdprof::                    How to analyze the time and/or memory
                                     performance of a program with mdprof.
+* Using threadscope::               How to analyse the parallel
+                                    execution of a program with threadscope.
 * Profiling and shared libraries::  Profiling dynamically linked executables.
 @end menu
@@ -5597,6 +5599,7 @@ then a progress message will be displaye
 @cindex Measuring performance
 @cindex Optimization
 @cindex Efficiency
+@cindex Parallel performance
 To obtain the best trade-off between productivity and efficiency,
 programmers should not spend too much time optimizing their code
@@ -5616,19 +5619,34 @@ that associates a lot more context with 
 but not both at the same time;
 @samp{mdprof} can profile both time and space at the same time.
+The parallel execution of Mercury programs can be analyzed with a third
+profiler called @samp{threadscope}.
+@samp{threadscope} allows programmers to visualise CPU utilization,
+as well as garbage collection, task granularity and the management of
+parallel tasks.
+The @samp{threadscope} tool is not included with the Melbourne Mercury
+See @url{http://research.microsoft.com/en-us/projects/threadscope/, 
+Threadscope: Performance Tuning Parallel Haskell Programs}.
 @node Building profiled applications
 @section Building profiled applications
 @cindex Building profiled applications
 @pindex mprof
 @pindex mdprof
+@pindex threadscope
 @cindex Time profiling
 @cindex Heap profiling
 @cindex Memory profiling
 @cindex Allocation profiling
 @cindex Deep profiling
+@cindex Threadscope profiling
+@cindex Parallel runtime profiling
+@findex --parallel
+@findex --threadscope
 To enable profiling, your program must be built with profiling enabled.
-The two different profilers require different support,
+The three different profilers require different support,
 and thus you must choose which one to enable when you build your program.
 @itemize @bullet
@@ -5644,6 +5662,10 @@ pass the @samp{--memory-profiling} optio
 To build your program with deep profiling enabled (for @samp{mdprof}),
 pass the @samp{--deep-profiling} option to @samp{mmc},
 @samp{mgnuc} and @samp{ml}.
+@item
+To build your program with threadscope profiling enabled (for @samp{threadscope}),
+pass the @samp{--parallel --threadscope} options to @samp{mmc},
+@samp{mgnuc} and @samp{ml}.
 @end itemize
 If you are using Mmake,
@@ -5653,7 +5675,7 @@ e.g.@: by adding the line @samp{GRADEFLA
 (For more information about the different grades,
 see @ref{Compilation model options}.)
-Enabling profiling has several effects.
+Enabling @samp{mprof} or @samp{mdprof} profiling has several effects.
 First, it causes the compiler to generate slightly modified code,
 which counts the number of times each predicate or function is called,
 and for every call, records the caller and callee.
@@ -5667,6 +5689,13 @@ Third, if you enable graph profiling,
 the compiler will generate for each source file
 the static call graph for that file in @samp{@var{module}.prof}.
+Enabling @samp{threadscope} profiling causes the compiler to build the project
+against a different runtime system.
+This runtime system logs events relevant to parallel execution.
+@samp{threadscope} support uses special x86 and x86_64 instructions to access the
+processor's time stamp counter.
+Therefore it is not supported on other architectures.
 @node Creating profiles
 @section Creating profiles
 @cindex Profiling
@@ -5701,6 +5730,10 @@ will use two of those files (@file{Prof.
 and a two others: @file{Prof.MemoryWords} and @file{Prof.MemoryCells}.
 Executables compiled with @samp{--deep-profiling}
 save profiling data in a single file, @file{Deep.data}.
+Executables compiled with @samp{--parallel --threadscope}
+save profiling data in a single file with the same name as the program being
+profiled and the extension @samp{.eventlog}, for example
+@file{my_program.eventlog}.
 It is also possible to combine @samp{mprof} profiling results
 from multiple runs of your program.
@@ -5715,7 +5748,7 @@ when running your program with @samp{mpr
 If this happens, just run it again --- the problem occurs only very rarely.
 The same vulnerability does not occur with @samp{mdprof} profiling.
-With both profilers,
+With the @samp{mprof} and @samp{mdprof} profilers,
 you can control whether time profiling measures
 real (elapsed) time, user time plus system time, or user time only,
 by including the options @samp{-Tr}, @samp{-Tp}, or @samp{-Tv} respectively
@@ -6092,6 +6125,36 @@ all		map
 internal	set
 @end example
+@node Using threadscope
+@section Using threadscope
+@pindex threadscope
+@cindex Threadscope profiling
+@cindex Parallel execution profiling
+The @samp{threadscope} tools are not distributed with Mercury.
+The tools are written in Haskell and work with GHC 6.10.
+@samp{threadscope} has a number of dependencies in the form of Haskell
+libraries; many of these will be provided with GHC or packaged for/by
+your operating system.
+These are: @samp{array}, @samp{binary}, @samp{cairo},
+@samp{containers}, @samp{filepath}, @samp{ghc-events}, @samp{glade},
+@samp{gtk}, @samp{mtl}.
+The @samp{cairo}, @samp{gtk} and @samp{glade} modules are provided by
+the @samp{gtk2hs} package.
+@samp{ghc-events} is not packaged by most operating systems at this stage.
+It can be retrieved from
+@url{http://hackage.haskell.org/package/ghc-events, hackage}.
+threadscope itself can also be retrieved from
+@url{http://hackage.haskell.org/package/threadscope, hackage}.
+Information about how to install Haskell packages can be found
+@url{http://haskell.org/haskellwiki/Cabal/How_to_install_a_Cabal_package, here}.
+Once @samp{threadscope} is installed it can be used to view @file{*.eventlog}
+profiles either by using the menu in @samp{threadscope}'s
+user interface
+or by executing @samp{threadscope} and giving the filename on the command line.
 @node Profiling and shared libraries
 @section Profiling and shared libraries
 @pindex mprof
@@ -7314,7 +7377,7 @@ The set of aspects and their alternative
 @cindex .decldebug (grade modifier)
 @c @cindex .ssdebug (grade modifier)
 @cindex .par (grade modifier)
-@c @cindex .threadscope (grade modifier)
+@cindex .threadscope (grade modifier)
 @cindex prof (grade modifier)
 @cindex memprof (grade modifier)
 @cindex profdeep (grade modifier)
@@ -7327,7 +7390,7 @@ The set of aspects and their alternative
 @cindex decldebug (grade modifier)
 @c @cindex ssdebug (grade modifier)
 @cindex par (grade modifier)
-@c @cindex threadscope (grade modifier)
+@cindex threadscope (grade modifier)
 @table @asis
 @item What target language to use, what data representation to use, and (for C) what combination of GNU C extensions to use:
 @samp{none}, @samp{reg}, @samp{jump}, @samp{asm_jump},
@@ -7360,10 +7423,10 @@ small segments: @samp{stseg} (the defaul
 @item Whether to use a thread-safe version of the runtime environment:
 @samp{par} (the default is a non-thread-safe environment).
-@c @item Whether to include support for profile the execution of parallel
-@c programs:
-@c @samp{threadscope} (the default is no support for profiling parallel
-@c execution).
+@item Whether to include support for profiling the execution of parallel
+programs:
+@samp{threadscope} (the default is no support for profiling parallel
+execution).
 @c See also the @samp{--profile-parallel-execution} runtime option.
 @end table
@@ -7497,6 +7560,12 @@ and grade modifier; they are followed by
 @c @item @samp{.ssdebug}
 @c @code{--ss-debug}.
+@item @samp{.par}
+@code{--parallel}.
+@item @samp{.par.threadscope}
+@code{--parallel --threadscope}.
 @end table
 @end table
@@ -7858,6 +7927,30 @@ or for backtrackable destructive update.
 This option is only supported by the C back-ends.
 @sp 1
+@item @code{--parallel}
+@findex --parallel
+@cindex Parallel evaluation
+Enable support for parallel evaluation.
+This enables runtime and code generation options necessary for taking
+advantage of a shared memory parallel computer.
+Parallel evaluation can be achieved by using either the parallel conjunction
+operator or the concurrency support provided in the @samp{thread} module of the
+standard library.
+@xref{Goals, parallel conjunction, Goals, mercury_ref, The Mercury
+Language Reference Manual}, and
+@xref{thread, the thread module, thread, mercury_library, The Mercury
+Library Reference Manual}.
+This option is only supported by the C back-ends.
+@sp 1
+@item @code{--threadscope}
+@findex --threadscope
+@cindex Threadscope profiling
+Enable support for threadscope profiling.
+This enables runtime support for profiling the parallel evaluation of
+programs, @xref{Using threadscope}.
+@sp 1
 @item @code{--maybe-thread-safe @{yes, no@}}
 @findex --maybe-thread-safe
 Specify how to treat the @samp{maybe_thread_safe} foreign code