[m-rev.] For review: Implement support for threadscope profiling.

Peter Wang novalazy at gmail.com
Wed Dec 2 12:19:23 AEDT 2009


On 2009-12-01, Paul Bone <pbone at csse.unimelb.edu.au> wrote:
> 
> For review by anyone.
> 
> Support for threadscope profiling of the parallel runtime.
> 
> This change adds support for threadscope profiling of the parallel runtime in
> low level C grades.  It can be enabled by compiling _all_ code with the
> MR_PROFILE_PARALLEL_EXECUTION_SUPPORT C macro defined.  The runtime, libraries
> and applications must all have this flag defined as it alters the MercuryEngine
> and MR_Context structures.

This might require a new grade component, later.

> runtime/mercury_context.c:
> runtime/mercury_context.h:
>     Add a new function for pinning the primordial thread.  If the OS supports
>     sched_getcpu we use it to determine which CPU the primordial thread should
>     use.  No other thread will be pinned to this CPU.
>     Add a numeric id field to each context, this id is uniquely assigned and
>     identifies each context for threadscope.

>     Move MR_load_context from a C macro to a C procedure.

Are you sure about this?

Unless you have tested on SPARC, I suggest you revert this part for now.
I just have a feeling it might be a problem with register windows.
Someone else might know for sure.

(Please use blank lines between changes.)

> Index: boehm_gc/include/gc.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/boehm_gc/include/gc.h,v
> retrieving revision 1.19
> diff -u -p -b -r1.19 gc.h
> --- boehm_gc/include/gc.h	18 Mar 2008 03:09:42 -0000	1.19
> +++ boehm_gc/include/gc.h	1 Dec 2009 00:25:02 -0000
> @@ -244,6 +244,27 @@ GC_API unsigned long GC_total_gc_time;
>  				/* so far by garbage collections. It is  */
>  				/* measured in milliseconds.		 */
>  
> +/*
> + * Callbacks for mercury to notify the runtime of certain events.
> + */
> +GC_API void (*GC_mercury_callback_start_collect)(void);
> +                /* Starting a collection */
> +GC_API void (*GC_mercury_callback_stop_collect)(void);
> +                /* Stopping a collection */
> +GC_API void (*GC_mercury_callback_pause_thread)(void);
> +                /*
> +                 * This thread is about to be paused.
> +                 *
> +                 * Use these with care!  They're called from a signal handler,
> +                 * they must NOT allocate memory and if they do locking they
> +                 * must use reentrant mutexes.  Also note that these do not
> +                 * work on OS X/Darwin, On Darwin the world is stopped in a

full stop

> Index: runtime/mercury_atomic_ops.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_atomic_ops.h,v
> retrieving revision 1.7
> diff -u -p -b -r1.7 mercury_atomic_ops.h
> --- runtime/mercury_atomic_ops.h	6 Nov 2009 05:40:24 -0000	1.7
> +++ runtime/mercury_atomic_ops.h	1 Dec 2009 05:29:18 -0000

> +
> +/*
> +** Roll our own cheap user-space mutual exclusion locks.  Blocking without
> +** spinning is not supported.  Storage for these locks should be volatile.
> +**
> +** I expect these to be faster than pthread mutexes when threads are pinned and
> +** critical sections are short.
> +*/
> +typedef MR_Unsigned MR_Us_Lock;
> +
> +#define MR_US_LOCK_INITIAL_VALUE (0)
> +
> +#define MR_US_TRY_LOCK(x) \
> +    MR_compare_and_swap_word(x, 0, 1)
> +
> +#define MR_US_SPIN_LOCK(x) \
> +    while (!MR_compare_and_swap_word(x, 0, 1)) { \
> +        MR_ATOMIC_PAUSE; \
> +    }

do while (0)

> +
> +#define MR_US_UNLOCK(x) \
> +    do { \
> +        MR_CPU_MFENCE; \
> +        *x = 0; \
> +    } while(0)

> +
> +/*
> +** Similar support for condition variables.  Again, make sure they are
> +** volatile.

Better not reference comments which are (slightly) far away.

> +**
> +** XXX: These are not atomic, A waiting thread will not see a change until
> +** sometime after the signaling thread has signaled the condition.  The same
> +** race can occur when clearing a condition.  Order of memory operations is not
> +** guaranteed either.
> +*/
> +typedef MR_Unsigned MR_Us_Cond;
> +
> +#define MR_US_COND_CLEAR(x) \
> +    do { \
> +        MR_CPU_MFENCE; \
> +        *x = 0; \
> +    } while(0)
> +
> +#define MR_US_COND_SET(x) \
> +    do { \
> +        MR_CPU_MFENCE; \
> +        *x = 1; \
> +        MR_CPU_MFENCE; \
> +    } while(0)
> +
> +#define MR_US_SPIN_COND(x) \
> +    do { \
> +        while (!(*x)) { \
> +            MR_ATOMIC_PAUSE; \
> +        } \
> +        MR_CPU_MFENCE; \
> +    } while (0)
> +

Align backslashes.

>  /*
> -** Configure the profiling stats code.  On i386 and x86_64 machines this uses
> -** CPUID to determine if the RDTSCP instruction is available and not prohibited
> -** by the OS.
> +** Do CPU feature detection, this is necessary for the profile parallel
> +** execution code and the threadscope code.

for profiling parallel code execution

> +** On i386 and x86_64 machines this uses CPUID to determine if the RDTSCP
> +** instruction is available and not prohibited by the OS.
> +** This function is idempotent.
> +** Note: We assume that all processors on a SMP machine are equivalent.
>  */

I think that is implied by SMP.

> Index: runtime/mercury_context.c
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.c,v
> retrieving revision 1.71
> diff -u -p -b -r1.71 mercury_context.c
> --- runtime/mercury_context.c	27 Nov 2009 03:51:19 -0000	1.71
> +++ runtime/mercury_context.c	1 Dec 2009 05:31:19 -0000
> @@ -96,14 +94,28 @@ static MR_Integer       MR_profile_paral
>  static MR_Integer       MR_profile_parallel_regular_context_reused = 0;
>  static MR_Integer       MR_profile_parallel_small_context_kept = 0;
>  static MR_Integer       MR_profile_parallel_regular_context_kept = 0;
> +#endif
>  
>  /*
> -** Write out the profiling data that we collect during execution.
> +** Local variables for thread pinning.
>  */
> -static void
> -MR_write_out_profiling_parallel_execution(void);
> +#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
> +    defined(MR_HAVE_SCHED_SETAFFINITY)

MR_LL_PARALLEL_CONJ implies MR_THREAD_SAFE so you can shorten these
conditionals.

> +static MercuryLock      MR_next_cpu_lock;
> +MR_bool                 MR_thread_pinning = MR_FALSE;
> +static MR_Unsigned      MR_next_cpu = 0;
> +#ifdef  MR_HAVE_SCHED_GETCPU
> +static MR_Integer       MR_primordial_threads_cpu = -1;
> +#endif
> +#endif

MR_primordial_thread_cpu (i.e. not plural)

> -#define MR_PROFILE_PARALLEL_EXECUTION_FILENAME "parallel_execution_profile.txt"
> +#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
> +    defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +/*
> +** These are used to give each context it's own unique ID.
> +*/

its

> +static MercuryLock      MR_next_context_id_lock;
> +static MR_ContextId     MR_next_context_id;
>  #endif

I didn't see these two locks being initialised, but I might have missed it.

Just a note: I prefer not to have so many trivial locks.  If contention is
unlikely to be high, I think sharing or reusing locks is better.
e.g. MR_next_context_id_lock looks redundant.  MR_create_context acquires
free_context_list_lock; you could increment MR_next_context_id then.

> Index: runtime/mercury_context.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.h,v
> retrieving revision 1.56
> diff -u -p -b -r1.56 mercury_context.h
> --- runtime/mercury_context.h	27 Nov 2009 03:51:19 -0000	1.56
> +++ runtime/mercury_context.h	1 Dec 2009 04:19:57 -0000
> @@ -256,6 +256,10 @@ struct MR_SparkDeque_Struct {
>  
>  struct MR_Context_Struct {
>      const char          *MR_ctxt_id;
> +#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
> +        defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +    MR_Unsigned         MR_ctxt_num_id;
> +#endif
>      MR_ContextSize      MR_ctxt_size;
>      MR_Context          *MR_ctxt_next;
>      MR_Code             *MR_ctxt_resume;
> @@ -434,6 +438,13 @@ extern  MR_PendingContext   *MR_pending_
>    ** instructions, even when done from within a critical section.
>    */
>    extern volatile MR_Integer    MR_num_outstanding_contexts_and_all_sparks;
> +
> +  /*
> +  ** The number of engines that have exited so far,  We can spinn on this to

Full stop and "spin".

> Index: runtime/mercury_engine.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_engine.h,v
> retrieving revision 1.50
> diff -u -p -b -r1.50 mercury_engine.h
> --- runtime/mercury_engine.h	30 Oct 2009 03:33:28 -0000	1.50
> +++ runtime/mercury_engine.h	30 Nov 2009 10:34:15 -0000
> @@ -392,6 +392,16 @@ typedef struct MR_mercury_engine_struct 
>  #ifdef  MR_THREAD_SAFE
>      MercuryThread       MR_eng_owner_thread;
>      MR_Unsigned         MR_eng_c_depth;
> +#if defined(MR_LL_PARALLEL_CONJ) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +    /*
> +    ** For each profiling event add this offset to the time so that events on
> +    ** different engines that occur at the same time have the same time in
> +    ** clock ticks.
> +    */
> +    MR_int_least64_t                    MR_eng_cpu_clock_ticks_offset;
> +    struct MR_threadscope_event_buffer  *MR_eng_ts_buffer;
> +    MR_Unsigned                         MR_eng_id;
> +#endif
>  #endif

MR_eng_id and MR_ctxt_num_id would probably be useful for other reasons,
so you could move them out of the conditionals in a later change.

> Index: runtime/mercury_thread.c
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_thread.c,v
> retrieving revision 1.36
> diff -u -p -b -r1.36 mercury_thread.c
> --- runtime/mercury_thread.c	27 Nov 2009 03:51:20 -0000	1.36
> +++ runtime/mercury_thread.c	30 Nov 2009 10:34:15 -0000
> @@ -13,6 +13,7 @@
>  #include "mercury_memory.h"
>  #include "mercury_context.h"    /* for MR_do_runnext */
>  #include "mercury_thread.h"
> +#include "mercury_threadscope.h"
>  
>  #include <stdio.h>
>  #include <errno.h>
> @@ -89,6 +90,7 @@ MR_create_thread_2(void *goal0)
>      if (goal != NULL) {
>          MR_init_thread(MR_use_now);
>          (goal->func)(goal->arg);
> +        /* XXX: We should clean up the engine here */
>      } else {
>          MR_pin_thread();
>          MR_init_thread(MR_use_later);
> @@ -129,6 +131,17 @@ MR_init_thread(MR_when_to_use when_to_us
>  
>  #ifdef  MR_THREAD_SAFE
>      MR_ENGINE(MR_eng_owner_thread) = pthread_self();
> +#if defined(MR_LL_PARALLEL_CONJ) && \
> +    defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +    /*
> +    ** TSC Synchronization is not used support is commented out.  See
> +    ** runtime/mercury_threadscope.h
> +    **

That sentence doesn't parse.

> Index: runtime/mercury_wrapper.c
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.c,v
> retrieving revision 1.202
> diff -u -p -b -r1.202 mercury_wrapper.c
> --- runtime/mercury_wrapper.c	30 Nov 2009 23:24:40 -0000	1.202
> +++ runtime/mercury_wrapper.c	1 Dec 2009 05:43:41 -0000

> @@ -614,6 +616,18 @@ mercury_runtime_init(int argc, char **ar
>      MR_ticket_high_water = 1;
>    #endif
>  #else
> +
> +#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ)
> +    MR_pin_primordial_thread();
> +#if defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +    /*
> +    ** We must setup threadscope before we setup the first engine.
> +    ** Pin the primordial thread, if thread pinning is configured.
> +    */
> +    MR_setup_threadscope();
> +#endif
> +#endif

Please indent nested #ifs.

> @@ -627,10 +641,20 @@ mercury_runtime_init(int argc, char **ar
>          int i;
>  
>          MR_exit_now = MR_FALSE;
> -        for (i = 1 ; i < MR_num_threads ; i++) {
> +        
> +        for (i = 1; i < MR_num_threads; i++) {
>              MR_create_thread(NULL);
>          }
> -        MR_pin_thread();
> +    #ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
> +    /*
> +    ** TSC Synchronization is not used support is commented out.  See
> +    ** runtime/mercury_threadscope.h

Sentence doesn't parse.

> +#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
> +        defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
> +    if (MR_ENGINE(MR_eng_ts_buffer))
> +        MR_threadscope_finalize_engine(MR_thread_engine_base);

{ }

I skipped over mercury_threadscope.[ch]

Peter
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list