[m-rev.] For review: Implement support for threadscope profiling.

Paul Bone pbone at csse.unimelb.edu.au
Thu Dec 3 16:29:59 AEDT 2009


This has been committed with the revised changelog:

Support for threadscope profiling of the parallel runtime.

This change adds support for threadscope profiling of the parallel runtime in
low level C grades.  It can be enabled by compiling _all_ code with the
MR_PROFILE_PARALLEL_EXECUTION_SUPPORT C macro defined.  The runtime, libraries
and applications must all have this flag defined as it alters the MercuryEngine
and MR_Context structures.

See Don Jones Jr, Simon Marlow, Satnam Singh - Parallel Performance Tuning for
Haskell.

This change also includes:

    Smarter thread pinning (the primordial thread is pinned to the CPU that
    it is currently running on).

    The addition of callbacks from the Boehm GC to notify the runtime of
    stop-the-world garbage collections.

    Implement some userspace spin loops and conditions.  These are cheaper than
    their POSIX equivalents, do not support sleeping, and are signal handler
    safe. 

boehm_gc/alloc.h:
boehm_gc/alloc.c:
    Declare and define the new callback functions.

boehm_gc/alloc.c:
    Call the start and stop collect callbacks when we start and stop a
    stop-the-world collection.    
    
    Correct how we record the time spent collecting; it now includes
    collections that stop prematurely.

boehm_gc/pthread_stop_world.c:
    Call the pause and resume thread callbacks in each thread where the GC
    arranges for that thread to be stopped during a stop-the-world collection.
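
    As a rough illustration of the callback pattern described in the
    boehm_gc entries above, the sketch below registers start and stop hooks
    and calls them around a stand-in collection routine.  Every name in it
    is hypothetical; the real declarations are the ones added to
    boehm_gc/alloc.h, and the pause and resume thread callbacks would
    presumably follow the same shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the real declarations live in boehm_gc/alloc.h. */
typedef void (*collect_callback)(void);

static collect_callback start_collect_callback = NULL;
static collect_callback stop_collect_callback = NULL;

/* The runtime registers its hooks once, before any collection runs. */
static inline void set_collect_callbacks(collect_callback start,
    collect_callback stop)
{
    start_collect_callback = start;
    stop_collect_callback = stop;
}

/* Stand-in for the collector's stop-the-world entry point. */
static inline void stop_the_world_collect(void)
{
    if (start_collect_callback != NULL) {
        start_collect_callback();
    }
    /* ... stop the other threads, mark, sweep, restart them ... */
    if (stop_collect_callback != NULL) {
        stop_collect_callback();
    }
}

/* Example hooks that only count how often they were called. */
static int collections_started = 0;
static int collections_stopped = 0;

static inline void note_start(void) { collections_started++; }
static inline void note_stop(void)  { collections_stopped++; }
```

    A profiler would post an event from each hook instead of incrementing a
    counter.  Checking for NULL keeps the collector usable when no hooks
    have been registered.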

runtime/mercury_threadscope.c:
runtime/mercury_threadscope.h:
    New files implementing the threadscope support.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
    Rename MR_configure_profiling_timers to MR_do_cpu_feature_detection.
    
    Add a new function MR_read_cpu_tsc() to read the TSC register from the
    CPU.  This simply wraps the static MR_rdtsc function.
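
    For readers unfamiliar with the TSC, a minimal sketch of such a wrapper
    follows.  It is not the code from mercury_atomic_ops.c: the name
    read_cpu_tsc is hypothetical, the x86 path uses the compiler's __rdtsc()
    intrinsic rather than the runtime's own MR_rdtsc, and the fallback for
    other architectures is an assumption added so the sketch compiles
    everywhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
  #include <x86intrin.h>
  #define SKETCH_HAVE_TSC 1
#else
  #include <time.h>
  #define SKETCH_HAVE_TSC 0
#endif

/* Hypothetical wrapper: returns a cycle count on x86, nanoseconds elsewhere. */
static inline uint64_t read_cpu_tsc(void)
{
#if SKETCH_HAVE_TSC
    /* Compiler intrinsic that executes the RDTSC instruction. */
    return __rdtsc();
#else
    /* Fallback (an assumption of this sketch): C11 wall-clock time. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}
```

    Hiding the instruction behind an ordinary function lets callers in other
    source files take timestamps without knowing how they are produced.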

runtime/mercury_atomic_ops.h:
    Modify the C inline assembler to ensure we tell the C compiler that the
    value in the register mapped to the 'old' parameter is also an output from
    the instructions.  That is, the C compiler must not depend on the value of
    'old' being the same before and after the instruction is executed.  This
    has never been a problem in practice though.
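
    The constraint issue can be illustrated with a compare-and-swap written
    in GCC-style inline assembler.  This is a sketch under assumptions, not
    the code in mercury_atomic_ops.h: the function name cas_word is
    hypothetical, the assembler path covers x86-64 only, and other targets
    fall back to the __sync_bool_compare_and_swap builtin.  The point of
    interest is the "+a" constraint: cmpxchg overwrites the accumulator
    when the comparison fails, so 'old' must be declared as both an input
    and an output.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if *addr was 'old' and is now 'new_val'. */
static inline int cas_word(volatile long *addr, long old, long new_val)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned char ok;

    __asm__ __volatile__(
        "lock; cmpxchgq %3, %0\n\t"
        "setz %1"
        : "+m" (*addr),     /* %0: memory word, read and written        */
          "=q" (ok),        /* %1: byte register receiving the ZF flag  */
          "+a" (old)        /* %2: rax, read AND clobbered by cmpxchg   */
        : "r" (new_val)     /* %3: replacement value                    */
        : "cc", "memory");
    return ok;
#else
    /* Portable fallback using the GCC/Clang builtin. */
    return __sync_bool_compare_and_swap(addr, old, new_val);
#endif
}
```

    Had 'old' been listed as an input only, the compiler would be entitled
    to assume rax still held the expected value after the instruction.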
    
    Implement some cheap userspace mutual exclusion locks and condition
    variables.  These will be faster than pthread's mutexes when critical
    sections are short and threads are pinned to separate CPUs. 
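
    A minimal sketch of the idea, written with C11 atomics rather than the
    runtime's own primitives, is given below.  The type and function names
    are hypothetical and do not correspond to the MR_ identifiers in
    mercury_atomic_ops.h.  Waiters spin instead of sleeping, which is why
    this only pays off when critical sections are short and each thread has
    its own CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace spin lock. */
typedef struct {
    atomic_flag locked;
} spin_lock;

static inline void spin_lock_init(spin_lock *l)
{
    atomic_flag_clear(&l->locked);
}

/* Returns true if the lock was acquired without waiting. */
static inline bool spin_try_lock(spin_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
        memory_order_acquire);
}

static inline void spin_lock_acquire(spin_lock *l)
{
    while (!spin_try_lock(l)) {
        /* Busy-wait: the thread never sleeps. */
    }
}

static inline void spin_unlock(spin_lock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical one-shot "condition": waiters spin until it is signalled. */
typedef struct {
    atomic_bool signalled;
} spin_cond;

static inline void spin_cond_init(spin_cond *c)
{
    atomic_init(&c->signalled, false);
}

static inline void spin_cond_signal(spin_cond *c)
{
    atomic_store_explicit(&c->signalled, true, memory_order_release);
}

static inline void spin_cond_wait(spin_cond *c)
{
    while (!atomic_load_explicit(&c->signalled, memory_order_acquire)) {
        /* Busy-wait until another thread signals. */
    }
}
```

    Neither structure makes a system call, so acquiring an uncontended lock
    costs a single atomic instruction.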
    
runtime/mercury_context.c:
runtime/mercury_context.h:
    Add a new function for pinning the primordial thread.  If the OS supports
    sched_getcpu we use it to determine which CPU the primordial thread should
    use.  No other thread will be pinned to this CPU.
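
    The pinning step might look roughly like the following on Linux.  This
    is a sketch under assumptions: pin_to_current_cpu is a hypothetical
    name, the code relies on the glibc sched_getcpu and sched_setaffinity
    functions, and it does not show how the runtime keeps other threads off
    the chosen CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
** Hypothetical helper: pin the calling thread to the CPU it is running on.
** Returns the CPU number, or -1 if the CPU could not be determined or the
** affinity could not be set.
*/
static inline int pin_to_current_cpu(void)
{
    cpu_set_t   set;
    int         cpu;

    cpu = sched_getcpu();
    if (cpu < 0) {
        return -1;
    }

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 applies the mask to the calling thread. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return cpu;
}
```

    Pinning to the current CPU avoids migrating the primordial thread, which
    has already been running there and may have warm caches.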
    
    Add a numeric id field to each context.  This id is uniquely assigned and
    identifies each context for threadscope.
    
    MR_schedule_context posts the 'context runnable' threadscope event.
    
    MR_do_runnext has been modified to destroy engines differently: it ensures
    they clean up properly, so that their threadscope events are flushed, and
    then calls pthread_exit(0).
    
    MR_do_runnext posts events for threadscope.
    
    MR_do_join_and_continue posts events for threadscope.

runtime/mercury_engine.h:
    Add new fields to the MercuryEngine structure including a buffer of
    threadscope events, a clock offset (used to synchronize the TSC clocks) and
    a unique identifier for the engine.
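
    To make the role of these fields concrete, here is a small sketch of a
    per-engine event buffer.  It is illustrative only: the struct layout,
    the names, the tiny buffer capacity and the choice to add the clock
    offset to the raw timestamp are all assumptions, not the definitions in
    mercury_engine.h or mercury_threadscope.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_BUFFER_CAPACITY 4     /* deliberately tiny, for illustration */

typedef struct {
    uint16_t    type;       /* which event occurred */
    uint64_t    time;       /* timestamp after applying the clock offset */
} event;

/* Hypothetical per-engine profiling state. */
typedef struct {
    uint16_t    id;             /* unique identifier for the engine */
    int64_t     clock_offset;   /* correction relative to a reference clock */
    event       buffer[EVENT_BUFFER_CAPACITY];
    size_t      used;           /* events currently buffered */
    size_t      flushed;        /* events handed on so far */
} engine_events;

/* Stand-in for writing the buffer to the profile file. */
static inline void engine_flush(engine_events *e)
{
    e->flushed += e->used;
    e->used = 0;
}

/* Record one event, flushing first if the buffer is full. */
static inline void engine_post_event(engine_events *e, uint16_t type,
    uint64_t raw_tsc)
{
    if (e->used == EVENT_BUFFER_CAPACITY) {
        engine_flush(e);
    }
    e->buffer[e->used].type = type;
    /* Unsigned wraparound makes a negative offset act as a subtraction. */
    e->buffer[e->used].time = raw_tsc + (uint64_t) e->clock_offset;
    e->used++;
}
```

    Because each engine owns its buffer, posting an event needs no locking;
    only the flush has to coordinate with other engines.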

runtime/mercury_engine.c:
    Call MR_threadscope_setup_engine() and MR_threadscope_finalize_engine() for
    newly created and about-to-be-destroyed engines respectively.
    
    When the main context finishes on a thread that's not the primordial
    thread, post a 'context is yielding' message before re-scheduling the
    context on the primordial thread.

runtime/mercury_thread.c:
    Add an XXX comment about a potential problem; it is only relevant for
    programs using thread.spawn.
    
    Add calls to the TSC synchronisation code used for threadscope profiling.
    It appears that this is not necessary on modern x86 machines, so the calls
    have been commented out.
    
    Post a threadscope event when we create a new context.
    
    Don't call pthread_exit in MR_destroy_thread, we now do this in
    MR_do_runnext so that we can unlock the runqueue mutex after cleaning up.

runtime/mercury_wrapper.c:
    Conform to changes in mercury_atomic_ops.[ch].
    
    Post an event immediately before calling main to mark the beginning of the
    program in the threadscope profile.
    
    Post a "context finished" event at the end of the program.
    
    Wait until all engines have exited before cleaning up global data; this is
    important for finishing writing the threadscope data file.

configure.in:
runtime/mercury_conf.h.in:
    Test for the sched_getcpu C function and the utmpx.h header file; these
    are used for thread pinning.

runtime/Mmakefile:
    Include the mercury_threadscope.[hc] files in the list of runtime headers
    and sources respectively.
