[m-rev.] for post-commit review: Fix compatibility issues for the low-level C parallel grades.
Julien Fischer
juliensf at csse.unimelb.edu.au
Sun Mar 21 00:50:50 AEDT 2010
Hi Paul,
The diff below does not correspond to the log message -- it appears to be
the diff for the change you made that documents the .par grades.
Julien.
On Sat, 20 Mar 2010, Paul Bone wrote:
>
> For post commit review by anyone. I'm committing this to the main branch now.
> I'll let our release manager (Julien) review it or wait for it to be reviewed
> before pushing it onto the 10.04 branch.
>
> Thanks.
>
>
> Branches: main, 10.04
>
> Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
> parallel and threadscope grades.
>
> We had been using types with the wrong signedness while calling atomic operations.
> GCC 4.x also picked up an error where #elif was used instead of #else.
>
> While testing these changes on a 32bit system more bugs were found on the i386
> architecture and on AMD brand processors.
>
> runtime/mercury_atomic_ops.h:
> runtime/mercury_atomic_ops.c:
> Add unsigned variants of the following atomic operations:
> increment,
> add,
> add_and_fetch,
> dec_and_is_zero,
>
> Add a signed variant for compare and swap.
>
> Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
> the end of the name.
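For readers following the log, a decrement-and-test on an unsigned counter could look roughly like the sketch below. It assumes a GCC-compatible compiler that provides the `__sync` builtins. The name `example_atomic_dec_and_is_zero_uint` is hypothetical and only mirrors the naming scheme described above; the runtime's actual definition is not shown in this message.

```c
/*
 * Hypothetical sketch, not the Mercury runtime's implementation.
 * Atomically decrement an unsigned counter and report whether it
 * reached zero.
 */
static int
example_atomic_dec_and_is_zero_uint(volatile unsigned int *addr)
{
    /* __sync_sub_and_fetch subtracts atomically and returns the new value. */
    return __sync_sub_and_fetch(addr, 1U) == 0U;
}
```

Using an unsigned parameter type here keeps the pointer type in agreement with an unsigned counter, which is the kind of signedness mismatch the log says GCC 4.x reported.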
>
> Use volatile storage in the MR_Stats structure.
>
> A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
> must use 64bit values. Therefore 64bit values in the MR_Stats structure
> are now protected by a lock on 32bit machines.
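The locking scheme described above might be sketched as follows. This is an illustration only: `example_stats` and its functions are hypothetical names, the real layout of MR_Stats is not shown in this message, and the sketch assumes POSIX threads and GCC's `__sync` builtins.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the real MR_Stats structure.
 * The 64-bit sum is updated atomically where the machine word is
 * 64 bits wide, and under a mutex on 32-bit machines.
 */
typedef struct {
    volatile uint64_t sum;
#if UINTPTR_MAX < UINT64_MAX
    pthread_mutex_t lock;   /* only needed on 32-bit machines */
#endif
} example_stats;

static void
example_stats_init(example_stats *s)
{
    s->sum = 0;
#if UINTPTR_MAX < UINT64_MAX
    pthread_mutex_init(&s->lock, NULL);
#endif
}

static void
example_stats_add(example_stats *s, uint64_t value)
{
#if UINTPTR_MAX < UINT64_MAX
    /* 32-bit machine: serialise the 64-bit update with the lock. */
    pthread_mutex_lock(&s->lock);
    s->sum += value;
    pthread_mutex_unlock(&s->lock);
#else
    /* 64-bit machine: a single atomic add is enough. */
    __sync_fetch_and_add(&s->sum, value);
#endif
}

static uint64_t
example_stats_get(example_stats *s)
{
#if UINTPTR_MAX < UINT64_MAX
    uint64_t result;
    pthread_mutex_lock(&s->lock);
    result = s->sum;
    pthread_mutex_unlock(&s->lock);
    return result;
#else
    /* Adding zero returns the current value atomically. */
    return __sync_fetch_and_add(&s->sum, 0);
#endif
}
```

Comparing `UINTPTR_MAX` with `UINT64_MAX` is just one way to pick the code path at compile time; the runtime may well use its own configuration macros instead.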
>
> runtime/mercury_atomic_ops.h:
> Fix a typo in the i386 version of MR_atomic_dec_and_is_zero_uint().
>
> runtime/mercury_atomic_ops.c:
> AMD CPUs do not conform to Intel's specification for being able to
> extract the CPU clock speed from the brand string. When we cannot
> determine the CPU's clock speed then we write out threadscope
> timestamps in raw clock cycles rather than nanoseconds.
>
> On i386 machines the ebx register is used to implement PIC code;
> however, the CPUID instruction uses it to output information. Save
> this register on C's stack while we issue CPUID and retrieve the
> result in ebx.
>
> We now pass native machine sized values to the inline assembler code
> that implements RDTSC and RDTSCP.
>
> Fix commenting style in some places.
>
> runtime/mercury_atomic_ops.c:
> Fix some incorrect C preprocessor code for conditional compilation.
>
> runtime/mercury_grade.h:
> Increment binary compatibility number. This should have been done in a
> prior change when the MR_runnext macro changed which broke binary
> compatibility in the parallel low-level C grades.
>
> runtime/mercury_context.h:
> In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
> remaining before the conjunction is complete.
>
> runtime/mercury_threadscope.c:
> Record raw cpu clock ticks rather than milliseconds when we don't
> know the processor's clock speed.
>
> runtime/mercury_context.c:
> runtime/mercury_wsdeque.h:
> runtime/mercury_wsdeque.c:
> Conform to changes in mercury_atomic_ops.h
>
> Index: compiler/options.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/options.m,v
> retrieving revision 1.668
> diff -u -p -b -r1.668 options.m
> --- compiler/options.m 11 Feb 2010 04:36:09 -0000 1.668
> +++ compiler/options.m 23 Feb 2010 03:10:27 -0000
> @@ -4172,7 +4172,12 @@ options_help_compilation_model -->
> "\tEnable experimental complexity analysis for the predicates",
> "\tlisted in the given file.",
> "\tThis option is supported for the C back-end, with",
> - "\t--no-highlevel-code."
> + "\t--no-highlevel-code.",
> +
> + "--threadscope\t\t(grade modifier: `.threadscope')",
> + "\tEnable support for profiling parallel execution.",
> + "\tThis option is supported in the low-level C back-end parallel",
> + "\tgrades on x86 and x86_64 processors."
> ]),
>
> io.write_string(" Miscellaneous optional features\n"),
> @@ -4206,6 +4211,9 @@ options_help_compilation_model -->
> "\tAs above, but use a dynamically sized trail that is composed",
> "\tof small segments. This can help to avoid trail exhaustion",
> "\tat the cost of increased execution time.",
> + "--parallel\t\t(grade modifier: `.par')",
> + "\tEnable parallel execution support.",
> + "\tThis option is only supported for the C back-ends.",
> "--maybe-thread-safe {yes, no}",
> "\tSpecify how to treat the `maybe_thread_safe' foreign code",
> "\tattribute. `yes' means that a foreign procedure with the",
> Index: doc/reference_manual.texi
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/doc/reference_manual.texi,v
> retrieving revision 1.438
> diff -u -p -b -r1.438 reference_manual.texi
> --- doc/reference_manual.texi 14 Jan 2010 02:27:58 -0000 1.438
> +++ doc/reference_manual.texi 23 Feb 2010 04:42:22 -0000
> @@ -672,6 +672,17 @@ This is an abbreviation for @samp{not (s
> A conjunction.
> @var{Goal1} and @var{Goal2} must be valid goals.
>
> +@item @code{@var{Goal1} & @var{Goal2}}
> +A parallel conjunction.
> +This has the same declarative semantics as the normal conjunction.
> +Operationally, implementations may execute @var{Goal1} & @var{Goal2}
> +in parallel with one another.
> +Implementations may also start the parallel execution of these goals
> +in any order.
> +It is a compilation error for @var{Goal1} or @var{Goal2} to have a
> +determinism other than @samp{det} or @samp{cc_multi}.
> +@xref{Determinism categories}.
> +
> @item @code{@var{Goal1} ; @var{Goal2}}
> where @var{Goal1} is not of the form @samp{Goal1a -> Goal1b}:
> a disjunction.
> Index: doc/user_guide.texi
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/doc/user_guide.texi,v
> retrieving revision 1.603
> diff -u -p -b -r1.603 user_guide.texi
> --- doc/user_guide.texi 4 Feb 2010 02:20:46 -0000 1.603
> +++ doc/user_guide.texi 23 Feb 2010 04:40:20 -0000
> @@ -5588,6 +5588,8 @@ then a progress message will be displaye
> program with mprof.
> * Using mdprof:: How to analyze the time and/or memory
> performance of a program with mdprof.
> +* Using threadscope:: How to analyse the parallel
> + execution of a program with threadscope.
> * Profiling and shared libraries:: Profiling dynamically linked executables.
> @end menu
>
> @@ -5597,6 +5599,7 @@ then a progress message will be displaye
> @cindex Measuring performance
> @cindex Optimization
> @cindex Efficiency
> +@cindex Parallel performance
>
> To obtain the best trade-off between productivity and efficiency,
> programmers should not spend too much time optimizing their code
> @@ -5616,19 +5619,34 @@ that associates a lot more context with
> but not both at the same time;
> @samp{mdprof} can profile both time and space at the same time.
>
> +The parallel execution of Mercury programs can be analyzed with a third
> +profiler called @samp{threadscope}.
> +@samp{threadscope} allows programmers to visualise CPU utilization,
> +as well as garbage collection, task granularity and the management of
> +parallel tasks.
> +The @samp{threadscope} tool is not included with the Melbourne Mercury
> +Compiler.
> +See @url{http://research.microsoft.com/en-us/projects/threadscope/,
> +Threadscope: Performance Tuning Parallel Haskell Programs}.
> +
> @node Building profiled applications
> @section Building profiled applications
> @cindex Building profiled applications
> @pindex mprof
> @pindex mdprof
> +@pindex threadscope
> @cindex Time profiling
> @cindex Heap profiling
> @cindex Memory profiling
> @cindex Allocation profiling
> @cindex Deep profiling
> +@cindex Threadscope profiling
> +@cindex Parallel runtime profiling
> +@findex --parallel
> +@findex --threadscope
>
> To enable profiling, your program must be built with profiling enabled.
> -The two different profilers require different support,
> +The three different profilers require different support,
> and thus you must choose which one to enable when you build your program.
>
> @itemize @bullet
> @@ -5644,6 +5662,10 @@ pass the @samp{--memory-profiling} optio
> To build your program with deep profiling enabled (for @samp{mdprof}),
> pass the @samp{--deep-profiling} option to @samp{mmc},
> @samp{mgnuc} and @samp{ml}.
> +@item
> +To build your program with threadscope profiling enabled (for @samp{threadscope}),
> +pass the @samp{--parallel --threadscope} options to @samp{mmc},
> +@samp{mgnuc} and @samp{ml}.
> @end itemize
>
> If you are using Mmake,
> @@ -5653,7 +5675,7 @@ e.g.@: by adding the line @samp{GRADEFLA
> (For more information about the different grades,
> see @ref{Compilation model options}.)
>
> -Enabling profiling has several effects.
> +Enabling @samp{mprof} or @samp{mdprof} profiling has several effects.
> First, it causes the compiler to generate slightly modified code,
> which counts the number of times each predicate or function is called,
> and for every call, records the caller and callee.
> @@ -5667,6 +5689,13 @@ Third, if you enable graph profiling,
> the compiler will generate for each source file
> the static call graph for that file in @samp{@var{module}.prof}.
>
> +Enabling @samp{threadscope} profiling causes the compiler to build the project
> +against a different runtime system.
> +This runtime system logs events relevant to parallel execution.
> +@samp{threadscope} support uses special x86 and x86_64 instructions to access the
> +processor's time stamp counter.
> +Therefore it is not supported on other architectures.
> +
> @node Creating profiles
> @section Creating profiles
> @cindex Profiling
> @@ -5701,6 +5730,10 @@ will use two of those files (@file{Prof.
> and a two others: @file{Prof.MemoryWords} and @file{Prof.MemoryCells}.
> Executables compiled with @samp{--deep-profiling}
> save profiling data in a single file, @file{Deep.data}.
> +Executables compiled with @samp{--parallel --threadscope}
> +save profiling data in a single file with the same name as the program being
> +profiled and the extension @samp{.eventlog}, for example
> +@file{my_program.eventlog}.
>
> It is also possible to combine @samp{mprof} profiling results
> from multiple runs of your program.
> @@ -5715,7 +5748,7 @@ when running your program with @samp{mpr
> If this happens, just run it again --- the problem occurs only very rarely.
> The same vulnerability does not occur with @samp{mdprof} profiling.
>
> -With both profilers,
> +With the @samp{mprof} and @samp{mdprof} profilers,
> you can control whether time profiling measures
> real (elapsed) time, user time plus system time, or user time only,
> by including the options @samp{-Tr}, @samp{-Tp}, or @samp{-Tv} respectively
> @@ -6092,6 +6125,36 @@ all map
> internal set
> @end example
>
> +@node Using threadscope
> +@section Using threadscope
> +
> +@pindex threadscope
> +@cindex Threadscope profiling
> +@cindex Parallel execution profiling
> +
> +The @samp{threadscope} tools are not distributed with Mercury.
> +The tools are written in Haskell and work with GHC 6.10.
> +@samp{threadscope} has a number of dependencies in the form of Haskell
> +libraries; many of these will be provided with GHC or packaged for/by
> +your operating system.
> +These are: @samp{array}, @samp{binary}, @samp{cairo},
> +@samp{containers}, @samp{filepath}, @samp{ghc-events}, @samp{glade},
> +@samp{gtk}, @samp{mtl}.
> +The @samp{cairo}, @samp{gtk} and @samp{glade} modules are provided by
> +the @samp{gtk2hs} package.
> +@samp{ghc-events} is not packaged by most operating systems at this stage. It
> +can be retrieved from
> +@url{http://hackage.haskell.org/package/ghc-events, hackage}.
> +@samp{threadscope} itself can also be retrieved from
> +@url{http://hackage.haskell.org/package/threadscope, hackage}.
> +Information about how to install Haskell packages can be found
> +@url{http://haskell.org/haskellwiki/Cabal/How_to_install_a_Cabal_package, here}.
> +
> +Once @samp{threadscope} is installed it can be used to view @file{*.eventlog}
> +profiles either by using the menu in @samp{threadscope}'s
> +user interface
> +or by executing @samp{threadscope} and giving the filename on the command line.
> +
> @node Profiling and shared libraries
> @section Profiling and shared libraries
> @pindex mprof
> @@ -7314,7 +7377,7 @@ The set of aspects and their alternative
> @cindex .decldebug (grade modifier)
> @c @cindex .ssdebug (grade modifier)
> @cindex .par (grade modifier)
> -@c @cindex .threadscope (grade modifier)
> +@cindex .threadscope (grade modifier)
> @cindex prof (grade modifier)
> @cindex memprof (grade modifier)
> @cindex profdeep (grade modifier)
> @@ -7327,7 +7390,7 @@ The set of aspects and their alternative
> @cindex decldebug (grade modifier)
> @c @cindex ssdebug (grade modifier)
> @cindex par (grade modifier)
> -@c @cindex threadscope (grade modifier)
> +@cindex threadscope (grade modifier)
> @table @asis
> @item What target language to use, what data representation to use, and (for C) what combination of GNU C extensions to use:
> @samp{none}, @samp{reg}, @samp{jump}, @samp{asm_jump},
> @@ -7360,10 +7423,10 @@ small segments: @samp{stseg} (the defaul
> @item Whether to use a thread-safe version of the runtime environment:
> @samp{par} (the default is a non-thread-safe environment).
>
> -@c @item Whether to include support for profile the execution of parallel
> -@c programs:
> -@c @samp{threadscope} (the default is no support for profiling parallel
> -@c execution).
> +@item Whether to include support for profiling the execution of parallel
> +programs:
> +@samp{threadscope} (the default is no support for profiling parallel
> +execution).
> @c See also the @samp{--profile-parallel-execution} runtime option.
>
> @end table
> @@ -7497,6 +7560,12 @@ and grade modifier; they are followed by
> @c @item @samp{.ssdebug}
> @c @code{--ss-debug}.
>
> +@item @samp{.par}
> +@code{--parallel}.
> +
> +@item @samp{.par.threadscope}
> +@code{--parallel --threadscope}.
> +
> @end table
>
> @end table
> @@ -7858,6 +7927,30 @@ or for backtrackable destructive update.
> This option is only supported by the C back-ends.
>
> @sp 1
> +@item @code{--parallel}
> +@findex --parallel
> +@cindex Parallel evaluation
> +Enable support for parallel evaluation.
> +This enables runtime and code generation options necessary for taking
> +advantage of a shared memory parallel computer.
> +Parallel evaluation can be achieved by using either the parallel conjunction
> +operator or the concurrency support provided in the @samp{thread} module of the
> +standard library.
> +@xref{Goals, parallel conjunction, Goals, mercury_ref, The Mercury
> +Language Reference Manual}, and
> +@xref{thread, the thread module, thread, mercury_library, The Mercury
> +Library Reference Manual}.
> +This option is only supported by the C back-ends.
> +
> +@sp 1
> +@item @code{--threadscope}
> +@findex --threadscope
> +@cindex Threadscope profiling
> +Enable support for threadscope profiling.
> +This enables runtime support for profiling the parallel evaluation of
> +programs. @xref{Using threadscope}.
> +
> +@sp 1
> @item @code{--maybe-thread-safe @{yes, no@}}
> @findex --maybe-thread-safe
> Specify how to treat the @samp{maybe_thread_safe} foreign code
>
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------