[m-rev.] For review: Implement support for threadscope profiling.
Paul Bone
pbone at csse.unimelb.edu.au
Tue Dec 1 17:54:21 AEDT 2009
For review by anyone.
Support for threadscope profiling of the parallel runtime.
This change adds support for threadscope profiling of the parallel runtime in
low level C grades. It can be enabled by compiling _all_ code with the
MR_PROFILE_PARALLEL_EXECUTION_SUPPORT C macro defined. The runtime, libraries
and applications must all have this flag defined, as it alters the MercuryEngine
and MR_Context structures.
See Don Jones Jr., Simon Marlow and Satnam Singh, "Parallel Performance Tuning
for Haskell".
This change also includes:
Smarter thread pinning (the primordial thread is pinned to the CPU that
it is currently running on).
The addition of callbacks from the Boehm GC to notify the runtime of
stop the world garbage collections.
Implement some userspace spin loops and condition variables. These are
cheaper than their POSIX equivalents, do not support sleeping, and are
signal-handler safe.
boehm_gc/alloc.h:
boehm_gc/alloc.c:
Declare and define the new callback functions.
boehm_gc/alloc.c:
Call the start and stop collect callbacks when we start and stop a
stop-the-world collection.
Correct how we record the time spent collecting; it now includes
collections that stop prematurely.
boehm_gc/pthread_stop_world.c:
Call the pause and resume thread callbacks in each thread where the GC
arranges for that thread to be stopped during a stop-the-world collection.
runtime/mercury_threadscope.c:
runtime/mercury_threadscope.h:
New files implementing the threadscope support.
runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
Rename MR_configure_profiling_timers to MR_do_cpu_feature_detection.
Add a new function, MR_read_cpu_tsc(), to read the TSC register from the
CPU; this simply abstracts the static MR_rdtsc function.
runtime/mercury_atomic_ops.h:
Modify the C inline assembler to tell the C compiler that the value in
the register mapped to the 'old' parameter is also an output of the
instructions. That is, the C compiler must not assume that the value of
'old' is the same before and after the instruction is executed. This
has never been a problem in practice, though.
Implement some cheap userspace mutual exclusion locks and condition
variables. These will be faster than pthread's mutexes when critical
sections are short and threads are pinned to separate CPUs.
runtime/mercury_context.c:
runtime/mercury_context.h:
Add a new function for pinning the primordial thread. If the OS supports
sched_getcpu, we use it to determine which CPU the primordial thread is
currently running on. No other thread will be pinned to this CPU.
Add a numeric id field to each context; this id is uniquely assigned and
identifies each context for threadscope.
Move MR_load_context from a C macro to a C procedure.
MR_schedule_context posts the 'context runnable' threadscope event.
MR_do_runnext has been modified to destroy engines differently: it ensures
that they clean up properly, so that their threadscope events are flushed,
and then calls pthread_exit(0).
MR_do_runnext posts events for threadscope.
MR_do_join_and_continue posts events for threadscope.
runtime/mercury_engine.h:
Add new fields to the MercuryEngine structure, including a buffer of
threadscope events, a clock offset (used to synchronize the TSC clocks),
and a unique identifier for the engine.
runtime/mercury_engine.c:
Call MR_threadscope_setup_engine() and MR_threadscope_finalize_engine()
for newly created and about-to-be-destroyed engines respectively.
When the main context finishes on a thread that's not the primordial
thread, post a 'context is yielding' message before re-scheduling the
context on the primordial thread.
runtime/mercury_thread.c:
Added an XXX comment about a potential problem; it's only relevant for
programs using thread.spawn.
Added calls to the TSC synchronisation code used for threadscope profiling.
This appears to be unnecessary on modern x86 machines, so it has been
commented out.
Post a threadscope event when we create a new context.
Don't call pthread_exit in MR_destroy_thread, we now do this in
MR_do_runnext so that we can unlock the runqueue mutex after cleaning up.
runtime/mercury_wrapper.c:
Conform to changes in mercury_atomic_ops.[ch].
Post an event immediately before calling main to mark the beginning of the
program in the threadscope profile.
Post a "context finished" event at the end of the program.
Wait until all engines have exited before cleaning up global data; this is
important for finishing writing the threadscope data file.
configure.in:
runtime/mercury_conf.h.in:
Test for the sched_getcpu C function and the utmpx.h header file; these
are used for thread pinning.
runtime/Mmakefile:
Include the mercury_threadscope.[hc] files in the list of runtime headers
and sources respectively.
Index: configure.in
===================================================================
RCS file: /home/mercury1/repository/mercury/configure.in,v
retrieving revision 1.554
diff -u -p -b -r1.554 configure.in
--- configure.in 24 Nov 2009 23:49:44 -0000 1.554
+++ configure.in 1 Dec 2009 06:12:07 -0000
@@ -1153,7 +1153,8 @@ mercury_check_for_functions \
getpid setpgid fork execlp wait kill \
grantpt unlockpt ptsname tcgetattr tcsetattr ioctl \
access sleep opendir readdir closedir mkdir symlink readlink \
- gettimeofday setenv putenv _putenv posix_spawn sched_setaffinity
+ gettimeofday setenv putenv _putenv posix_spawn sched_setaffinity \
+ sched_getcpu
#-----------------------------------------------------------------------------#
@@ -1163,7 +1164,7 @@ MERCURY_CHECK_FOR_HEADERS( \
sys/types.h sys/stat.h fcntl.h termios.h sys/ioctl.h \
sys/stropts.h windows.h dirent.h getopt.h malloc.h \
semaphore.h pthread.h time.h spawn.h fenv.h sys/mman.h sys/sem.h \
- sched.h)
+ sched.h utmpx.h)
if test "$MR_HAVE_GETOPT_H" = 1; then
GETOPT_H_AVAILABLE=yes
Index: boehm_gc/alloc.c
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/alloc.c,v
retrieving revision 1.18
diff -u -p -b -r1.18 alloc.c
--- boehm_gc/alloc.c 18 Mar 2008 03:09:39 -0000 1.18
+++ boehm_gc/alloc.c 30 Nov 2009 10:34:15 -0000
@@ -65,6 +65,13 @@ GC_bool GC_mercury_calc_gc_time
unsigned long GC_total_gc_time = 0;
/* Measured in milliseconds. */
+void (*GC_mercury_callback_start_collect)(void) = NULL;
+void (*GC_mercury_callback_stop_collect)(void) = NULL;
+void (*GC_mercury_callback_pause_thread)(void) = NULL;
+void (*GC_mercury_callback_resume_thread)(void) = NULL;
+ /* Callbacks for mercury to notify */
+ /* the runtime of certain events */
+
#ifndef SMALL_CONFIG
int GC_incremental = 0; /* By default, stop the world. */
#endif
@@ -311,6 +318,7 @@ void GC_maybe_gc(void)
GC_bool GC_try_to_collect_inner(GC_stop_func stop_func)
{
CLOCK_TYPE start_time, current_time;
+ GC_bool result = TRUE;
if (GC_dont_gc) return FALSE;
if (GC_incremental && GC_collection_in_progress()) {
if (GC_print_stats) {
@@ -351,6 +359,9 @@ GC_bool GC_try_to_collect_inner(GC_stop_
GC_save_callers(GC_last_stack);
# endif
GC_is_full_gc = TRUE;
+ if (GC_mercury_callback_start_collect) {
+ GC_mercury_callback_start_collect();
+ }
if (!GC_stopped_mark(stop_func)) {
if (!GC_incremental) {
/* We're partially done and have no way to complete or use */
@@ -360,14 +371,15 @@ GC_bool GC_try_to_collect_inner(GC_stop_
GC_unpromote_black_lists();
} /* else we claim the world is already still consistent. We'll */
/* finish incrementally. */
- return(FALSE);
- }
+ result = FALSE;
+ } else {
GC_finish_collection();
+ }
if (GC_print_stats || GC_mercury_calc_gc_time) {
unsigned long cur_gc_time;
GET_TIME(current_time);
cur_gc_time = MS_TIME_DIFF(current_time,start_time);
- if (GC_print_stats) {
+ if (GC_print_stats && result) {
GC_log_printf("Complete collection took %lu msecs\n",
cur_gc_time);
}
@@ -375,7 +387,10 @@ GC_bool GC_try_to_collect_inner(GC_stop_
GC_total_gc_time += cur_gc_time;
}
}
- return(TRUE);
+ if (GC_mercury_callback_stop_collect) {
+ GC_mercury_callback_stop_collect();
+ }
+ return(result);
}
Index: boehm_gc/pthread_stop_world.c
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/pthread_stop_world.c,v
retrieving revision 1.4
diff -u -p -b -r1.4 pthread_stop_world.c
--- boehm_gc/pthread_stop_world.c 15 Aug 2006 04:19:27 -0000 1.4
+++ boehm_gc/pthread_stop_world.c 30 Nov 2009 10:34:15 -0000
@@ -119,7 +119,13 @@ void GC_suspend_handler_inner(ptr_t sig_
void GC_suspend_handler(int sig, siginfo_t *info, void *context)
{
int old_errno = errno;
+ if (GC_mercury_callback_pause_thread) {
+ GC_mercury_callback_pause_thread();
+ }
GC_with_callee_saves_pushed(GC_suspend_handler_inner, (ptr_t)(word)sig);
+ if (GC_mercury_callback_resume_thread) {
+ GC_mercury_callback_resume_thread();
+ }
errno = old_errno;
}
#else
@@ -128,7 +134,13 @@ void GC_suspend_handler(int sig, siginfo
void GC_suspend_handler(int sig, siginfo_t *info, void *context)
{
int old_errno = errno;
+ if (GC_mercury_callback_pause_thread) {
+ GC_mercury_callback_pause_thread();
+ }
GC_suspend_handler_inner((ptr_t)(word)sig, context);
+ if (GC_mercury_callback_resume_thread) {
+ GC_mercury_callback_resume_thread();
+ }
errno = old_errno;
}
#endif
Index: boehm_gc/include/gc.h
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/include/gc.h,v
retrieving revision 1.19
diff -u -p -b -r1.19 gc.h
--- boehm_gc/include/gc.h 18 Mar 2008 03:09:42 -0000 1.19
+++ boehm_gc/include/gc.h 1 Dec 2009 00:25:02 -0000
@@ -244,6 +244,27 @@ GC_API unsigned long GC_total_gc_time;
/* so far by garbage collections. It is */
/* measured in milliseconds. */
+/*
+ * Callbacks to notify the Mercury runtime of certain events.
+ */
+GC_API void (*GC_mercury_callback_start_collect)(void);
+ /* Starting a collection */
+GC_API void (*GC_mercury_callback_stop_collect)(void);
+ /* Stopping a collection */
+GC_API void (*GC_mercury_callback_pause_thread)(void);
+ /*
+ * This thread is about to be paused.
+ *
+ * Use these with care! They're called from a signal
+ * handler: they must NOT allocate memory, and any locking
+ * they do must use reentrant mutexes. Also note that these
+ * do not work on OS X/Darwin: there the world is stopped
+ * in a different way, and we can't easily add support for
+ * these callbacks.
+ */
+GC_API void (*GC_mercury_callback_resume_thread)(void);
+ /* This thread is about to be resumed */
+
/* Public procedures */
/* Initialize the collector. This is only required when using thread-local
Index: runtime/Mmakefile
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/Mmakefile,v
retrieving revision 1.148
diff -u -p -b -r1.148 Mmakefile
--- runtime/Mmakefile 30 Oct 2009 03:33:27 -0000 1.148
+++ runtime/Mmakefile 30 Nov 2009 10:34:15 -0000
@@ -92,6 +92,7 @@ HDRS = \
mercury_tags.h \
mercury_term_size.h \
mercury_thread.h \
+ mercury_threadscope.h \
mercury_timing.h \
mercury_trace_base.h \
mercury_trace_term.h \
@@ -199,6 +200,7 @@ CFILES = \
mercury_tabling.c \
mercury_term_size.c \
mercury_thread.c \
+ mercury_threadscope.c \
mercury_timing.c \
mercury_trace_base.c \
mercury_trace_term.c \
Index: runtime/mercury_atomic_ops.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_atomic_ops.c,v
retrieving revision 1.6
diff -u -p -b -r1.6 mercury_atomic_ops.c
--- runtime/mercury_atomic_ops.c 6 Nov 2009 05:40:24 -0000 1.6
+++ runtime/mercury_atomic_ops.c 30 Nov 2009 10:34:15 -0000
@@ -116,7 +116,7 @@ parse_freq_from_x86_brand_string(char *s
#endif /* __GNUC__ && (__i386__ || __x86_64__) */
extern void
-MR_configure_profiling_timers(void) {
+MR_do_cpu_feature_detection(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
MR_Unsigned a, b, c, d;
MR_Unsigned eflags, old_eflags;
@@ -434,6 +434,23 @@ MR_profiling_stop_timer(MR_Timer *timer,
#endif /* not __GNUC__ && (__i386__ || __x86_64__) */
}
+MR_uint_least64_t
+MR_read_cpu_tsc(void)
+{
+#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+ MR_uint_least64_t tsc;
+
+ if (MR_rdtsc_is_available == MR_TRUE) {
+ MR_rdtsc(&tsc);
+ } else {
+ tsc = 0;
+ }
+ return tsc;
+#else /* not __GNUC__ && (__i386__ || __x86_64__) */
+ return 0;
+#endif /* not __GNUC__ && (__i386__ || __x86_64__) */
+}
+
/*
** It's convenient that this instruction is the same on both i386 and x86_64
*/
Index: runtime/mercury_atomic_ops.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_atomic_ops.h,v
retrieving revision 1.7
diff -u -p -b -r1.7 mercury_atomic_ops.h
--- runtime/mercury_atomic_ops.h 6 Nov 2009 05:40:24 -0000 1.7
+++ runtime/mercury_atomic_ops.h 1 Dec 2009 05:29:18 -0000
@@ -50,8 +50,8 @@ MR_compare_and_swap_word(volatile MR_Int
char result; \
\
__asm__ __volatile__( \
- "lock; cmpxchgq %3, %0; setz %1" \
- : "=m"(*addr), "=q"(result) \
+ "lock; cmpxchgq %4, %0; setz %1" \
+ : "=m"(*addr), "=q"(result), "=a"(old) \
: "m"(*addr), "r" (new_val), "a"(old) \
); \
return (int) result; \
@@ -65,8 +65,8 @@ MR_compare_and_swap_word(volatile MR_Int
char result; \
\
__asm__ __volatile__( \
- "lock; cmpxchgl %3, %0; setz %1" \
- : "=m"(*addr), "=q"(result) \
+ "lock; cmpxchgl %4, %0; setz %1" \
+ : "=m"(*addr), "=q"(result), "=a"(old) \
: "m"(*addr), "r" (new_val), "a"(old) \
); \
return (int) result; \
@@ -307,6 +307,113 @@ MR_atomic_sub_int(volatile MR_Integer *a
#endif
+/*
+** Memory fence operations.
+*/
+#if defined(__GNUC__) && ( defined(__i386__) || defined(__x86_64__) )
+ /*
+ ** Guarantees that any stores executed before this fence are globally
+ ** visible before those after this fence.
+ */
+ #define MR_CPU_SFENCE \
+ do { \
+ __asm__ __volatile__("sfence"); \
+ } while(0)
+
+ /*
+ ** Guarantees that any loads executed before this fence are complete before
+ ** any loads after this fence.
+ */
+ #define MR_CPU_LFENCE \
+ do { \
+ __asm__ __volatile__("lfence"); \
+ } while(0)
+
+ /*
+ ** A combination of the above.
+ */
+ #define MR_CPU_MFENCE \
+ do { \
+ __asm__ __volatile__("mfence"); \
+ } while(0)
+
+#elif __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)
+
+ /*
+ ** Our memory fences are better than GCC's. GCC only implements a full
+ ** fence.
+ */
+ #define MR_CPU_MFENCE \
+ do { \
+ __sync_synchronize(); \
+ } while(0)
+ #define MR_CPU_SFENCE MR_CPU_MFENCE
+ #define MR_CPU_LFENCE MR_CPU_MFENCE
+
+#else
+
+ #error "Please implement memory fence operations for this " \
+ "compiler/architecture"
+
+#endif
+
+/*
+** Roll our own cheap user-space mutual exclusion locks. Blocking without
+** spinning is not supported. Storage for these locks should be volatile.
+**
+** I expect these to be faster than pthread mutexes when threads are pinned and
+** critical sections are short.
+*/
+typedef MR_Unsigned MR_Us_Lock;
+
+#define MR_US_LOCK_INITIAL_VALUE (0)
+
+#define MR_US_TRY_LOCK(x) \
+ MR_compare_and_swap_word(x, 0, 1)
+
+#define MR_US_SPIN_LOCK(x) \
+ while (!MR_compare_and_swap_word(x, 0, 1)) { \
+ MR_ATOMIC_PAUSE; \
+ }
+
+#define MR_US_UNLOCK(x) \
+ do { \
+ MR_CPU_MFENCE; \
+ *x = 0; \
+ } while(0)
+
+/*
+** Similar support for condition variables. Again, make sure they are
+** volatile.
+**
+** XXX: These are not atomic: a waiting thread will not see a change until
+** some time after the signaling thread has signaled the condition. The same
+** race can occur when clearing a condition. Order of memory operations is not
+** guaranteed either.
+*/
+typedef MR_Unsigned MR_Us_Cond;
+
+#define MR_US_COND_CLEAR(x) \
+ do { \
+ MR_CPU_MFENCE; \
+ *x = 0; \
+ } while(0)
+
+#define MR_US_COND_SET(x) \
+ do { \
+ MR_CPU_MFENCE; \
+ *x = 1; \
+ MR_CPU_MFENCE; \
+ } while(0)
+
+#define MR_US_SPIN_COND(x) \
+ do { \
+ while (!(*x)) { \
+ MR_ATOMIC_PAUSE; \
+ } \
+ MR_CPU_MFENCE; \
+ } while (0)
+
#endif /* MR_LL_PARALLEL_CONJ */
/*
@@ -346,16 +453,21 @@ typedef struct {
/*
** The number of CPU clock cycles per second, ie a 1GHz CPU will have a value
** of 10^9, zero if unknown.
+** This value is only available after MR_do_cpu_feature_detection() has been
+** called.
*/
extern MR_uint_least64_t MR_cpu_cycles_per_sec;
/*
-** Configure the profiling stats code. On i386 and x86_64 machines this uses
-** CPUID to determine if the RDTSCP instruction is available and not prohibited
-** by the OS.
+** Do CPU feature detection; this is necessary for the parallel execution
+** profiling code and the threadscope code.
+** On i386 and x86_64 machines this uses CPUID to determine if the RDTSCP
+** instruction is available and not prohibited by the OS.
+** This function is idempotent.
+** Note: We assume that all processors on a SMP machine are equivalent.
*/
extern void
-MR_configure_profiling_timers(void);
+MR_do_cpu_feature_detection(void);
/*
** Start and initialize a timer structure.
@@ -369,6 +481,13 @@ MR_profiling_start_timer(MR_Timer *timer
extern void
MR_profiling_stop_timer(MR_Timer *timer, MR_Stats *stats);
+/*
+** Read the CPU's TSC. This is currently only implemented for i386 and x86-64
+** systems. It returns 0 when support is not available.
+*/
+extern MR_uint_least64_t
+MR_read_cpu_tsc(void);
+
#endif /* MR_THREAD_SAFE && MR_PROFILE_PARALLEL_EXECUTION_SUPPORT */
/*---------------------------------------------------------------------------*/
Index: runtime/mercury_conf.h.in
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_conf.h.in,v
retrieving revision 1.66
diff -u -p -b -r1.66 mercury_conf.h.in
--- runtime/mercury_conf.h.in 5 Nov 2009 05:47:40 -0000 1.66
+++ runtime/mercury_conf.h.in 30 Nov 2009 10:34:15 -0000
@@ -140,6 +140,7 @@
** MR_HAVE_SYS_MMAN_H we have <sys/mman.h>
** MR_HAVE_SYS_SEM_H we have <sys/sem.h>
** MR_HAVE_SCHED_H we have <sched.h>
+** MR_HAVE_UTMPX_H we have <utmpx.h>
*/
#undef MR_HAVE_SYS_SIGINFO_H
#undef MR_HAVE_SYS_SIGNAL_H
@@ -170,6 +171,7 @@
#undef MR_HAVE_SYS_MMAN_H
#undef MR_HAVE_SYS_SEM_H
#undef MR_HAVE_SCHED_H
+#undef MR_HAVE_UTMPX_H
/*
** MR_HAVE_POSIX_TIMES is defined if we have the POSIX
@@ -268,7 +270,8 @@
** MR_HAVE__PUTENV we have the _putenv() function.
** MR_HAVE_POSIX_SPAWN we have the posix_spawn() function.
** MR_HAVE_FESETROUND we have the fesetround() function.
-** MR_HAVE_SCHED_SETAFFINITY if we have the sched_setaffinity() function.
+** MR_HAVE_SCHED_SETAFFINITY we have the sched_setaffinity() function.
+** MR_HAVE_SCHED_GETCPU we have the sched_getcpu() function (glibc specific).
*/
#undef MR_HAVE_GETPID
#undef MR_HAVE_SETPGID
@@ -331,6 +334,7 @@
#undef MR_HAVE_POSIX_SPAWN
#undef MR_HAVE_FESETROUND
#undef MR_HAVE_SCHED_SETAFFINITY
+#undef MR_HAVE_SCHED_GETCPU
/*
** We use mprotect() and signals to catch stack and heap overflows.
Index: runtime/mercury_context.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.c,v
retrieving revision 1.71
diff -u -p -b -r1.71 mercury_context.c
--- runtime/mercury_context.c 27 Nov 2009 03:51:19 -0000 1.71
+++ runtime/mercury_context.c 1 Dec 2009 05:31:19 -0000
@@ -38,11 +38,12 @@ ENDINIT
#include "mercury_memory_handlers.h"
#include "mercury_context.h"
#include "mercury_engine.h" /* for `MR_memdebug' */
+#include "mercury_threadscope.h" /* for data types and posting events */
#include "mercury_reg_workarounds.h" /* for `MR_fd*' stuff */
-static void
-MR_init_context_maybe_generator(MR_Context *c, const char *id,
- MR_GeneratorPtr gen);
+#if defined(MR_THREAD_SAFE) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+#define MR_PROFILE_PARALLEL_EXECUTION_FILENAME "parallel_execution_profile.txt"
+#endif
/*---------------------------------------------------------------------------*/
@@ -66,9 +67,6 @@ MR_Context *MR_runqueue_tai
#ifdef MR_LL_PARALLEL_CONJ
MR_SparkDeque MR_spark_queue;
MercuryLock MR_sync_term_lock;
- MR_bool MR_thread_pinning = MR_FALSE;
- static MercuryLock MR_next_cpu_lock;
- static MR_Unsigned MR_next_cpu = 0;
#endif
MR_PendingContext *MR_pending_contexts;
@@ -96,14 +94,28 @@ static MR_Integer MR_profile_paral
static MR_Integer MR_profile_parallel_regular_context_reused = 0;
static MR_Integer MR_profile_parallel_small_context_kept = 0;
static MR_Integer MR_profile_parallel_regular_context_kept = 0;
+#endif
/*
-** Write out the profiling data that we collect during execution.
+** Local variables for thread pinning.
*/
-static void
-MR_write_out_profiling_parallel_execution(void);
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY)
+static MercuryLock MR_next_cpu_lock;
+MR_bool MR_thread_pinning = MR_FALSE;
+static MR_Unsigned MR_next_cpu = 0;
+#ifdef MR_HAVE_SCHED_GETCPU
+static MR_Integer MR_primordial_threads_cpu = -1;
+#endif
+#endif
-#define MR_PROFILE_PARALLEL_EXECUTION_FILENAME "parallel_execution_profile.txt"
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+/*
+** These are used to give each context its own unique ID.
+*/
+static MercuryLock MR_next_context_id_lock;
+static MR_ContextId MR_next_context_id;
#endif
/*
@@ -123,11 +135,31 @@ int volatile MR_num_idle_engines = 0;
int volatile MR_num_outstanding_contexts_and_global_sparks = 0;
MR_Integer volatile MR_num_outstanding_contexts_and_all_sparks = 0;
+MR_Unsigned volatile MR_num_exited_engines = 0;
+
static MercuryLock MR_par_cond_stats_lock;
#endif
/*---------------------------------------------------------------------------*/
+static void
+MR_init_context_maybe_generator(MR_Context *c, const char *id,
+ MR_GeneratorPtr gen);
+
+/*
+** Write out the profiling data that we collect during execution.
+*/
+static void
+MR_write_out_profiling_parallel_execution(void);
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY)
+static void
+MR_do_pin_thread(int cpu);
+#endif
+
+/*---------------------------------------------------------------------------*/
+
void
MR_init_thread_stuff(void)
{
@@ -199,6 +231,39 @@ MR_init_thread_stuff(void)
#endif /* MR_THREAD_SAFE */
}
+/*
+** Pin the primordial thread to the CPU it is currently using (where
+** support is available).
+*/
+void
+MR_pin_primordial_thread(void)
+{
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY)
+#ifdef MR_HAVE_SCHED_GETCPU
+ /*
+ ** We don't need locking to pin the primordial thread as it is called
+ ** before any other threads exist.
+ */
+ if (MR_thread_pinning) {
+ MR_primordial_threads_cpu = sched_getcpu();
+ if (MR_primordial_threads_cpu == -1) {
+ perror("Warning: unable to determine the current CPU for "
+ "the primordial thread: ");
+ } else {
+ MR_do_pin_thread(MR_primordial_threads_cpu);
+ }
+ }
+ if (MR_primordial_threads_cpu == -1) {
+ MR_pin_thread();
+ }
+#else
+ MR_pin_thread();
+#endif
+#endif /* defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY) */
+}
+
void
MR_pin_thread(void)
{
@@ -206,16 +271,31 @@ MR_pin_thread(void)
defined(MR_HAVE_SCHED_SETAFFINITY)
MR_LOCK(&MR_next_cpu_lock, "MR_pin_thread");
if (MR_thread_pinning) {
+#if defined(MR_HAVE_SCHED_GETCPU)
+ if (MR_next_cpu == MR_primordial_threads_cpu) {
+ MR_next_cpu++;
+ }
+#endif
+ MR_do_pin_thread(MR_next_cpu);
+ MR_next_cpu++;
+ }
+ MR_UNLOCK(&MR_next_cpu_lock, "MR_pin_thread");
+#endif /* defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY) */
+}
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_HAVE_SCHED_SETAFFINITY)
+static void
+MR_do_pin_thread(int cpu)
+{
cpu_set_t cpus;
- if (MR_next_cpu < CPU_SETSIZE) {
+ if (cpu < CPU_SETSIZE) {
CPU_ZERO(&cpus);
- CPU_SET(MR_next_cpu, &cpus);
- if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) == 0)
- {
- MR_next_cpu++;
- } else {
- perror("Warning: Couldn't set CPU affinity");
+ CPU_SET(cpu, &cpus);
+ if (sched_setaffinity(0, sizeof(cpu_set_t), &cpus) == -1) {
+ perror("Warning: Couldn't set CPU affinity: ");
/*
** If this failed once it will probably fail again so we
** disable it.
@@ -224,14 +304,12 @@ MR_pin_thread(void)
}
} else {
perror("Warning: Couldn't set CPU affinity due to a static "
- "system limit");
+ "system limit: ");
MR_thread_pinning = MR_FALSE;
}
- }
- MR_UNLOCK(&MR_next_cpu_lock, "MR_pin_thread");
+}
#endif /* defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
defined(MR_HAVE_SCHED_SETAFFINITY) */
-}
void
MR_finalize_thread_stuff(void)
@@ -394,6 +472,11 @@ MR_init_context_maybe_generator(MR_Conte
c->MR_ctxt_resume_owner_thread = (MercuryThread) NULL;
c->MR_ctxt_resume_c_depth = 0;
c->MR_ctxt_saved_owners = NULL;
+ #if defined(MR_LL_PARALLEL_CONJ) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_LOCK(&MR_next_context_id_lock, "MR_create_context");
+ c->MR_ctxt_num_id = MR_next_context_id++;
+ MR_UNLOCK(&MR_next_context_id_lock, "MR_create_context");
+ #endif
#endif
#ifndef MR_HIGHLEVEL_CODE
@@ -660,6 +743,73 @@ MR_destroy_context(MR_Context *c)
}
void
+MR_load_context(MR_Context *load_context_c)
+{
+#ifndef MR_HIGHLEVEL_CODE
+ MR_succip_word = (MR_Word) load_context_c->MR_ctxt_succip;
+ MR_sp_word = (MR_Word) load_context_c->MR_ctxt_sp;
+ MR_maxfr_word = (MR_Word) load_context_c->MR_ctxt_maxfr;
+ MR_curfr_word = (MR_Word) load_context_c->MR_ctxt_curfr;
+ #ifdef MR_USE_MINIMAL_MODEL_STACK_COPY
+ MR_gen_next = load_context_c->MR_ctxt_gen_next;
+ MR_cut_next = load_context_c->MR_ctxt_cut_next;
+ MR_pneg_next = load_context_c->MR_ctxt_pneg_next;
+ #endif
+ #ifdef MR_THREAD_SAFE
+ MR_parent_sp = load_context_c->MR_ctxt_parent_sp;
+ #endif
+#endif
+#ifdef MR_USE_TRAIL
+ #ifdef MR_THREAD_SAFE
+ MR_ENGINE(MR_eng_context).MR_ctxt_trail_zone =
+ load_context_c->MR_ctxt_trail_zone;
+ #else
+ MR_trail_zone = load_context_c->MR_ctxt_trail_zone;
+ #endif
+ MR_trail_ptr = load_context_c->MR_ctxt_trail_ptr;
+ MR_ticket_counter = load_context_c->MR_ctxt_ticket_counter;
+ MR_ticket_high_water = load_context_c->MR_ctxt_ticket_high_water;
+#endif
+#ifndef MR_HIGHLEVEL_CODE
+ MR_ENGINE(MR_eng_context).MR_ctxt_detstack_zone =
+ load_context_c->MR_ctxt_detstack_zone;
+ MR_ENGINE(MR_eng_context).MR_ctxt_prev_detstack_zones =
+ load_context_c->MR_ctxt_prev_detstack_zones;
+ MR_ENGINE(MR_eng_context).MR_ctxt_nondetstack_zone =
+ load_context_c->MR_ctxt_nondetstack_zone;
+ MR_ENGINE(MR_eng_context).MR_ctxt_prev_nondetstack_zones =
+ load_context_c->MR_ctxt_prev_nondetstack_zones;
+ #ifdef MR_USE_MINIMAL_MODEL_STACK_COPY
+ MR_ENGINE(MR_eng_context).MR_ctxt_genstack_zone =
+ load_context_c->MR_ctxt_genstack_zone;
+ MR_ENGINE(MR_eng_context).MR_ctxt_cutstack_zone =
+ load_context_c->MR_ctxt_cutstack_zone;
+ MR_ENGINE(MR_eng_context).MR_ctxt_pnegstack_zone =
+ load_context_c->MR_ctxt_pnegstack_zone;
+ MR_gen_stack = (MR_GenStackFrame *)
+ MR_ENGINE(MR_eng_context).MR_ctxt_genstack_zone->
+ MR_zone_min;
+ MR_cut_stack = (MR_CutStackFrame *)
+ MR_ENGINE(MR_eng_context).MR_ctxt_cutstack_zone->
+ MR_zone_min;
+ MR_pneg_stack = (MR_PNegStackFrame *)
+ MR_ENGINE(MR_eng_context).MR_ctxt_pnegstack_zone->
+ MR_zone_min;
+ #endif
+ #ifdef MR_EXEC_TRACE_INFO_IN_CONTEXT
+ MR_trace_call_seqno = load_context_c->MR_ctxt_call_seqno;
+ MR_trace_call_depth = load_context_c->MR_ctxt_call_depth;
+ MR_trace_event_number = load_context_c->MR_ctxt_event_number;
+ #endif
+#endif /* ! MR_HIGH_LEVEL_CODE */
+ MR_set_min_heap_reclamation_point(load_context_c);
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_threadscope_post_run_context();
+#endif
+}
+
+void
MR_flounder(void)
{
MR_fatal_error("computation floundered");
@@ -771,6 +921,10 @@ MR_check_pending_contexts(MR_bool block)
void
MR_schedule_context(MR_Context *ctxt)
{
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_threadscope_post_context_runnable(ctxt);
+#endif
MR_LOCK(&MR_runqueue_lock, "schedule_context");
ctxt->MR_ctxt_next = NULL;
if (MR_runqueue_tail) {
@@ -862,8 +1016,10 @@ MR_define_entry(MR_do_runnext);
** up the Mercury runtime. It cannot exit by this route.
*/
assert(thd != MR_primordial_thread);
- MR_UNLOCK(&MR_runqueue_lock, "MR_do_runnext (ii)");
MR_destroy_thread(MR_cur_engine());
+ MR_num_exited_engines++;
+ MR_UNLOCK(&MR_runqueue_lock, "MR_do_runnext (ii)");
+ pthread_exit(0);
}
/* Search for a ready context which we can handle. */
@@ -944,13 +1100,18 @@ MR_define_entry(MR_do_runnext);
if (MR_ENGINE(MR_eng_this_context) == NULL) {
MR_ENGINE(MR_eng_this_context) = MR_create_context("from spark",
MR_CONTEXT_SIZE_SMALL, NULL);
- MR_load_context(MR_ENGINE(MR_eng_this_context));
#ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ MR_threadscope_post_create_context_for_spark(MR_ENGINE(MR_eng_this_context));
if (MR_profile_parallel_execution) {
MR_atomic_inc_int(
&MR_profile_parallel_contexts_created_for_sparks);
}
#endif
+ MR_load_context(MR_ENGINE(MR_eng_this_context));
+ } else {
+#if defined(MR_LL_PARALLEL_CONJ) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_threadscope_post_run_context();
+#endif
}
MR_parent_sp = spark.MR_spark_parent_sp;
MR_assert(MR_parent_sp != MR_sp);
@@ -992,6 +1153,11 @@ MR_END_MODULE
MR_Code*
MR_do_join_and_continue(MR_SyncTerm *jnc_st, MR_Code *join_label)
{
+ /*
+ * XXX: We should take the current TSC time here and use it to post the
+ * various 'context stopped' threadscope events. This would make the
+ * profile more accurate.
+ */
if (!jnc_st->MR_st_is_shared) {
/* This parallel conjunction has only executed sequentially. */
if (--jnc_st->MR_st_count == 0) {
@@ -1047,6 +1213,9 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
jnc_st->MR_st_orig_context->MR_ctxt_resume = join_label;
MR_schedule_context(jnc_st->MR_st_orig_context);
MR_UNLOCK(&MR_sync_term_lock, "continue ii");
+#ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ MR_threadscope_post_stop_context(MR_TS_STOP_REASON_FINISHED);
+#endif
return MR_ENTRY(MR_do_runnext);
}
} else {
@@ -1096,9 +1265,17 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
** away once we enable work-stealing. - pbone.
*/
if (jnc_ctxt == jnc_st->MR_st_orig_context) {
+#ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ MR_threadscope_post_stop_context(MR_TS_STOP_REASON_BLOCKED);
+#endif
MR_save_context(jnc_ctxt);
MR_ENGINE(MR_eng_this_context) = NULL;
+ } else {
+#ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ MR_threadscope_post_stop_context(MR_TS_STOP_REASON_FINISHED);
+#endif
}
+
/* Finally look for other work. */
MR_UNLOCK(&MR_sync_term_lock, "continue_2 ii");
return MR_ENTRY(MR_do_runnext);
Index: runtime/mercury_context.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.h,v
retrieving revision 1.56
diff -u -p -b -r1.56 mercury_context.h
--- runtime/mercury_context.h 27 Nov 2009 03:51:19 -0000 1.56
+++ runtime/mercury_context.h 1 Dec 2009 04:19:57 -0000
@@ -256,6 +256,10 @@ struct MR_SparkDeque_Struct {
struct MR_Context_Struct {
const char *MR_ctxt_id;
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_Unsigned MR_ctxt_num_id;
+#endif
MR_ContextSize MR_ctxt_size;
MR_Context *MR_ctxt_next;
MR_Code *MR_ctxt_resume;
@@ -434,6 +438,13 @@ extern MR_PendingContext *MR_pending_
** instructions, even when done from within a critical section.
*/
extern volatile MR_Integer MR_num_outstanding_contexts_and_all_sparks;
+
+ /*
+ ** The number of engines that have exited so far. We can spin on this to
+ ** make sure that our engines have exited before finalizing some global
+ ** resources.
+ */
+ extern volatile MR_Unsigned MR_num_exited_engines;
#endif /* !MR_LL_PARALLEL_CONJ */
/*---------------------------------------------------------------------------*/
@@ -468,7 +479,11 @@ extern void MR_init_thread_stuff
/*
** MR_pin_thread() pins the current thread to the next available processor ID,
** if thread pinning is enabled.
+** MR_pin_primordial_thread() is a special case for the primordial thread. It
+** should only be executed once, and only by the primordial thread _before_
+** the other threads are started.
*/
+extern void MR_pin_primordial_thread(void);
extern void MR_pin_thread(void);
/*
@@ -582,71 +597,8 @@ extern void MR_schedule_context(
#define MR_IF_NOT_HIGHLEVEL_CODE(x)
#endif
-#define MR_load_context(cptr) \
- do { \
- MR_Context *load_context_c; \
- \
- load_context_c = (cptr); \
- MR_IF_NOT_HIGHLEVEL_CODE( \
- MR_succip_word = (MR_Word) load_context_c->MR_ctxt_succip; \
- MR_sp_word = (MR_Word) load_context_c->MR_ctxt_sp; \
- MR_maxfr_word = (MR_Word) load_context_c->MR_ctxt_maxfr; \
- MR_curfr_word = (MR_Word) load_context_c->MR_ctxt_curfr; \
- MR_IF_USE_MINIMAL_MODEL_STACK_COPY( \
- MR_gen_next = load_context_c->MR_ctxt_gen_next; \
- MR_cut_next = load_context_c->MR_ctxt_cut_next; \
- MR_pneg_next = load_context_c->MR_ctxt_pneg_next; \
- ) \
- MR_IF_THREAD_SAFE( \
- MR_parent_sp = load_context_c->MR_ctxt_parent_sp; \
- ) \
- ) \
- MR_IF_USE_TRAIL( \
- MR_IF_NOT_THREAD_SAFE( \
- MR_trail_zone = load_context_c->MR_ctxt_trail_zone; \
- ) \
- MR_IF_THREAD_SAFE( \
- MR_ENGINE(MR_eng_context).MR_ctxt_trail_zone = \
- load_context_c->MR_ctxt_trail_zone; \
- ) \
- MR_trail_ptr = load_context_c->MR_ctxt_trail_ptr; \
- MR_ticket_counter = load_context_c->MR_ctxt_ticket_counter; \
- MR_ticket_high_water = load_context_c->MR_ctxt_ticket_high_water; \
- ) \
- MR_IF_NOT_HIGHLEVEL_CODE( \
- MR_ENGINE(MR_eng_context).MR_ctxt_detstack_zone = \
- load_context_c->MR_ctxt_detstack_zone; \
- MR_ENGINE(MR_eng_context).MR_ctxt_prev_detstack_zones = \
- load_context_c->MR_ctxt_prev_detstack_zones; \
- MR_ENGINE(MR_eng_context).MR_ctxt_nondetstack_zone = \
- load_context_c->MR_ctxt_nondetstack_zone; \
- MR_ENGINE(MR_eng_context).MR_ctxt_prev_nondetstack_zones = \
- load_context_c->MR_ctxt_prev_nondetstack_zones; \
- MR_IF_USE_MINIMAL_MODEL_STACK_COPY( \
- MR_ENGINE(MR_eng_context).MR_ctxt_genstack_zone = \
- load_context_c->MR_ctxt_genstack_zone; \
- MR_ENGINE(MR_eng_context).MR_ctxt_cutstack_zone = \
- load_context_c->MR_ctxt_cutstack_zone; \
- MR_ENGINE(MR_eng_context).MR_ctxt_pnegstack_zone = \
- load_context_c->MR_ctxt_pnegstack_zone; \
- MR_gen_stack = (MR_GenStackFrame *) \
- MR_ENGINE(MR_eng_context).MR_ctxt_genstack_zone-> \
- MR_zone_min; \
- MR_cut_stack = (MR_CutStackFrame *) \
- MR_ENGINE(MR_eng_context).MR_ctxt_cutstack_zone-> \
- MR_zone_min; \
- MR_pneg_stack = (MR_PNegStackFrame *) \
- MR_ENGINE(MR_eng_context).MR_ctxt_pnegstack_zone-> \
- MR_zone_min; \
- ) \
- MR_IF_EXEC_TRACE_INFO_IN_CONTEXT( \
- MR_trace_call_seqno = load_context_c->MR_ctxt_call_seqno; \
- MR_trace_call_depth = load_context_c->MR_ctxt_call_depth; \
- MR_trace_event_number = load_context_c->MR_ctxt_event_number; \
- ) \
- ) \
- MR_set_min_heap_reclamation_point(load_context_c); \
- } while (0)
+extern void
+MR_load_context(MR_Context *c);
#define MR_save_context(cptr) \
do { \
Index: runtime/mercury_engine.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_engine.c,v
retrieving revision 1.58
diff -u -p -b -r1.58 mercury_engine.c
--- runtime/mercury_engine.c 4 Apr 2007 01:09:52 -0000 1.58
+++ runtime/mercury_engine.c 30 Nov 2009 10:34:15 -0000
@@ -20,6 +20,8 @@ ENDINIT
#include "mercury_engine.h"
#include "mercury_memory_zones.h" /* for MR_create_zone() */
#include "mercury_memory_handlers.h" /* for MR_default_handler() */
+#include "mercury_threadscope.h" /* for MR_threadscope_setup_engine()
+ and event posting */
#include "mercury_dummy.h"
@@ -147,6 +149,11 @@ MR_init_engine(MercuryEngine *eng)
eng->MR_eng_c_depth = 0;
#endif
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_threadscope_setup_engine(eng);
+#endif
+
/*
** Don't allocate a context for this engine until it is actually needed.
*/
@@ -164,6 +171,13 @@ void MR_finalize_engine(MercuryEngine *e
if (eng->MR_eng_this_context) {
MR_destroy_context(eng->MR_eng_this_context);
}
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ if (eng->MR_eng_ts_buffer) {
+ MR_threadscope_finalize_engine(eng);
+ }
+#endif
}
/*---------------------------------------------------------------------------*/
@@ -513,6 +527,9 @@ MR_define_label(engine_done);
MR_GOTO_LABEL(engine_done_2);
}
+#ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ MR_threadscope_post_stop_context(MR_TS_STOP_REASON_YIELDING);
+#endif
MR_save_context(this_ctxt);
this_ctxt->MR_ctxt_resume = MR_LABEL(engine_done_2);
this_ctxt->MR_ctxt_resume_owner_thread = owner->MR_saved_owner_thread;
Index: runtime/mercury_engine.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_engine.h,v
retrieving revision 1.50
diff -u -p -b -r1.50 mercury_engine.h
--- runtime/mercury_engine.h 30 Oct 2009 03:33:28 -0000 1.50
+++ runtime/mercury_engine.h 30 Nov 2009 10:34:15 -0000
@@ -392,6 +392,16 @@ typedef struct MR_mercury_engine_struct
#ifdef MR_THREAD_SAFE
MercuryThread MR_eng_owner_thread;
MR_Unsigned MR_eng_c_depth;
+#if defined(MR_LL_PARALLEL_CONJ) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ /*
+ ** For each profiling event add this offset to the time so that events on
+ ** different engines that occur at the same time have the same time in
+ ** clock ticks.
+ */
+ MR_int_least64_t MR_eng_cpu_clock_ticks_offset;
+ struct MR_threadscope_event_buffer *MR_eng_ts_buffer;
+ MR_Unsigned MR_eng_id;
+#endif
#endif
jmp_buf *MR_eng_jmp_buf;
MR_Word *MR_eng_exception;
Index: runtime/mercury_thread.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_thread.c,v
retrieving revision 1.36
diff -u -p -b -r1.36 mercury_thread.c
--- runtime/mercury_thread.c 27 Nov 2009 03:51:20 -0000 1.36
+++ runtime/mercury_thread.c 30 Nov 2009 10:34:15 -0000
@@ -13,6 +13,7 @@
#include "mercury_memory.h"
#include "mercury_context.h" /* for MR_do_runnext */
#include "mercury_thread.h"
+#include "mercury_threadscope.h"
#include <stdio.h>
#include <errno.h>
@@ -89,6 +90,7 @@ MR_create_thread_2(void *goal0)
if (goal != NULL) {
MR_init_thread(MR_use_now);
(goal->func)(goal->arg);
+ /* XXX: We should clean up the engine here */
} else {
MR_pin_thread();
MR_init_thread(MR_use_later);
@@ -129,6 +131,17 @@ MR_init_thread(MR_when_to_use when_to_us
#ifdef MR_THREAD_SAFE
MR_ENGINE(MR_eng_owner_thread) = pthread_self();
+#if defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ /*
+    ** TSC synchronization is not used; its support is commented out. See
+ ** runtime/mercury_threadscope.h
+ **
+ if (when_to_use == MR_use_later) {
+ MR_threadscope_sync_tsc_slave();
+ }
+ */
+#endif
#endif
switch (when_to_use) {
@@ -137,6 +150,7 @@ MR_init_thread(MR_when_to_use when_to_us
MR_fatal_error("Sorry, not implemented: "
"--high-level-code and multiple engines");
#else
+ /* This call may never return */
(void) MR_call_engine(MR_ENTRY(MR_do_runnext), MR_FALSE);
#endif
MR_destroy_engine(eng);
@@ -152,6 +166,10 @@ MR_init_thread(MR_when_to_use when_to_us
MR_ENGINE(MR_eng_this_context) =
MR_create_context("init_thread",
MR_CONTEXT_SIZE_REGULAR, NULL);
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ MR_threadscope_post_create_context(MR_ENGINE(MR_eng_this_context));
+#endif
}
MR_load_context(MR_ENGINE(MR_eng_this_context));
MR_save_registers();
@@ -189,7 +207,6 @@ MR_destroy_thread(void *eng0)
{
MercuryEngine *eng = eng0;
MR_destroy_engine(eng);
- pthread_exit(0);
}
#endif
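The new mercury_threadscope.c below writes every multi-byte value in big-endian order, one byte at a time, so the eventlog is endian-independent. The core of that encoding can be sketched as follows (a simplified, standalone analogue of the file's put_be_* helpers; the name put_be_u64 is invented):

```c
#include <stdint.h>

/* Write a 64-bit value into buf in big-endian byte order, one byte at
** a time, regardless of the host's endianness. The patch's put_be_*
** helpers do the same thing via nested 32- and 16-bit writers. */
static void
put_be_u64(uint8_t *buf, uint64_t w)
{
    int i;

    for (i = 0; i < 8; i++) {
        buf[i] = (uint8_t) ((w >> (8 * (7 - i))) & 0xFF);
    }
}
```

Because each byte is extracted by shifting and masking, the same bytes land in the file on both little- and big-endian hosts, which is what lets GHC's threadscope tools read logs from any machine.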
Index: runtime/mercury_threadscope.c
===================================================================
RCS file: runtime/mercury_threadscope.c
diff -N runtime/mercury_threadscope.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ runtime/mercury_threadscope.c 1 Dec 2009 06:31:17 -0000
@@ -0,0 +1,1203 @@
+/*
+** vim: ts=4 sw=4 expandtab
+*/
+/*
+INIT mercury_sys_init_threadscope
+ENDINIT
+*/
+/*
+** Copyright (C) 2009 The University of Melbourne.
+** Copyright (C) 2008-2009 The GHC Team.
+**
+** This file may only be copied under the terms of the GNU Library General
+** Public License - see the file COPYING.LIB in the Mercury distribution.
+*/
+
+/*
+** Event log format
+**
+** The log format is designed to be extensible: old tools should be
+** able to parse (but not necessarily understand all of) new versions
+** of the format, and new tools will be able to understand old log
+** files.
+**
+** Each event has a specific format. If you add new events, give them
+** new numbers: we never re-use old event numbers.
+**
+** - The format is endian-independent: all values are represented in
+** bigendian order.
+**
+** - The format is extensible:
+**
+** - The header describes each event type and its length. Tools
+** that don't recognise a particular event type can skip those events.
+**
+** - There is room for extra information in the event type
+** specification, which can be ignored by older tools.
+**
+** - Events can have extra information added, but existing fields
+** cannot be changed. Tools should ignore extra fields at the
+** end of the event record.
+**
+** - Old event type ids are never re-used; just take a new identifier.
+**
+**
+** The format
+** ----------
+**
+** log : EVENT_HEADER_BEGIN
+** EventType*
+** EVENT_HEADER_END
+** EVENT_DATA_BEGIN
+** Event*
+** EVENT_DATA_END
+**
+** EventType :
+** EVENT_ET_BEGIN
+** Word16 -- unique identifier for this event
+** Int16 -- >=0 size of the event in bytes (minus the header)
+** -- -1 variable size
+** Word32 -- length of the next field in bytes
+** Word8* -- string describing the event
+** Word32 -- length of the next field in bytes
+** Word8* -- extra info (for future extensions)
+** EVENT_ET_END
+**
+** Event :
+** Word16 -- event_type
+** Word64 -- time (nanosecs)
+** [Word16] -- length of the rest (for variable-sized events only)
+** ... extra event-specific info ...
+**
+** All values are packed; no attempt is made to align them.
+**
+** New events must be registered with GHC. These are kept in the GHC-events
+** package.
+**
+*/
+
+#include "mercury_imp.h"
+
+#include "mercury_threadscope.h"
+
+#include "mercury_atomic_ops.h"
+
+#include <stdio.h>
+#include <string.h>
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+
+/***************************************************************************/
+
+/*
+** Markers for begin/end of the Header.
+*/
+#define MR_TS_EVENT_HEADER_BEGIN 0x68647262 /* 'h' 'd' 'r' 'b' */
+#define MR_TS_EVENT_HEADER_END 0x68647265 /* 'h' 'd' 'r' 'e' */
+
+#define MR_TS_EVENT_DATA_BEGIN 0x64617462 /* 'd' 'a' 't' 'b' */
+#define MR_TS_EVENT_DATA_END 0xffff
+
+/*
+** Markers for begin/end of the list of Event Types in the Header.
+** Header, Event Type, Begin = hetb
+** Header, Event Type, End = hete
+*/
+#define MR_TS_EVENT_HET_BEGIN 0x68657462 /* 'h' 'e' 't' 'b' */
+#define MR_TS_EVENT_HET_END 0x68657465 /* 'h' 'e' 't' 'e' */
+
+/*
+** Markers for the beginning and end of individual event types.
+*/
+#define MR_TS_EVENT_ET_BEGIN 0x65746200 /* 'e' 't' 'b' 0 */
+#define MR_TS_EVENT_ET_END 0x65746500 /* 'e' 't' 'e' 0 */
+
+/*
+** The threadscope events:
+*/
+#define MR_TS_EVENT_CREATE_THREAD 0 /* (thread) */
+#define MR_TS_EVENT_RUN_THREAD 1 /* (thread) */
+#define MR_TS_EVENT_STOP_THREAD 2 /* (thread, status) */
+#define MR_TS_EVENT_THREAD_RUNNABLE 3 /* (thread) */
+#define MR_TS_EVENT_MIGRATE_THREAD 4 /* (thread, new_cap) */
+#define MR_TS_EVENT_RUN_SPARK 5 /* (thread) */
+#define MR_TS_EVENT_STEAL_SPARK 6 /* (thread, victim_cap) */
+#define MR_TS_EVENT_SHUTDOWN 7 /* () */
+#define MR_TS_EVENT_THREAD_WAKEUP 8 /* (thread, other_cap) */
+#define MR_TS_EVENT_GC_START 9 /* () */
+#define MR_TS_EVENT_GC_END 10 /* () */
+#define MR_TS_EVENT_REQUEST_SEQ_GC 11 /* () */
+#define MR_TS_EVENT_REQUEST_PAR_GC 12 /* () */
+#define MR_TS_EVENT_CREATE_SPARK_THREAD 15 /* (spark_thread) */
+#define MR_TS_EVENT_LOG_MSG 16 /* (message ...) */
+#define MR_TS_EVENT_STARTUP 17 /* (num_capabilities) */
+#define MR_TS_EVENT_BLOCK_MARKER 18 /* (size, end_time, capability) */
+#define MR_TS_EVENT_USER_MSG 19 /* (message ...) */
+#define MR_TS_EVENT_GC_IDLE 20 /* () */
+#define MR_TS_EVENT_GC_WORK 21 /* () */
+#define MR_TS_EVENT_GC_DONE 22 /* () */
+#define MR_TS_EVENT_CALL_MAIN 23 /* () */
+
+#define MR_TS_NUM_EVENT_TAGS 24
+
+#if 0 /* DEPRECATED EVENTS: */
+#define EVENT_CREATE_SPARK 13 /* (cap, thread) */
+#define EVENT_SPARK_TO_THREAD 14 /* (cap, thread, spark_thread) */
+#endif
+
+/*
+** GHC uses 2MB per buffer. Note that the minimum buffer size is the size of
+** the largest message plus the size of the block marker message; however, it is
+** _sensible_ for the buffer to be much larger so that we make system calls
+** less often.
+*/
+#define MR_TS_BUFFERSIZE (2*1024*1024)
+#define MR_TS_FILENAME_FORMAT ("%s.eventlog")
+#define MR_TSC_SYNC_NUM_ROUNDS (10)
+#define MR_TSC_SYNC_NUM_BEST_ROUNDS (3)
+
+/* Uncomment this to enable some debugging code */
+/* #define MR_DEBUG_THREADSCOPE 1 */
+
+#if MR_DEBUG_THREADSCOPE
+#define MR_DO_THREADSCOPE_DEBUG(x) do { x; } while(0)
+#else
+#define MR_DO_THREADSCOPE_DEBUG(x)
+#endif
+
+/***************************************************************************/
+
+struct MR_threadscope_event_buffer {
+ MR_UnsignedChar MR_tsbuffer_data[MR_TS_BUFFERSIZE];
+
+ /* The current writing position in the buffer. */
+ MR_Unsigned MR_tsbuffer_pos;
+
+ /* The position of the start of the most recent block. */
+ MR_Integer MR_tsbuffer_block_open_pos;
+
+    /* A cheap userspace lock to protect the buffer, even in signal handlers. */
+ volatile MR_Us_Lock MR_tsbuffer_lock;
+};
+
+/*
+** We define some types and functions to write them. These types are set
+** carefully to match the ones that GHC uses.
+*/
+typedef MR_uint_least16_t EventType;
+typedef MR_uint_least64_t Time;
+typedef MR_int_least64_t Timedelta;
+
+/*
+** The difference between two positions in the eventlog file measured in bytes.
+*/
+typedef MR_uint_least32_t EventlogOffset;
+
+typedef struct {
+ EventType etd_event_type;
+ const char *etd_description;
+} EventTypeDesc;
+
+/***************************************************************************/
+
+static EventTypeDesc event_type_descs[] = {
+ {
+ /*
+ ** The startup event informs threadscope of the number of engines we're
+ ** using. It should be given outside of a block.
+ */
+ MR_TS_EVENT_STARTUP,
+ "Startup (num_engines)"
+ },
+ {
+ /*
+ ** The last event in the log. It should be given outside of a block.
+ */
+ MR_TS_EVENT_SHUTDOWN, "Shutdown"
+ },
+ {
+ /*
+        ** A block of events belonging to the named engine follows.
+        ** The length of this block, including the block marker message
+        ** itself, is given, as is the time at which this block finishes.
+ ** Blocks _must not_ exist within other blocks.
+ */
+ MR_TS_EVENT_BLOCK_MARKER,
+ "A block of events generated by a specific engine follows"
+ },
+ {
+ /*
+ ** Called when a context is created or re-used.
+ */
+ MR_TS_EVENT_CREATE_THREAD,
+ "A context is created or re-used"
+ },
+ {
+ /*
+ ** Called from MR_schedule_context()
+ */
+ MR_TS_EVENT_THREAD_RUNNABLE,
+ "The context is being placed on the run queue"
+ },
+ {
+ /*
+        ** The named context began executing on the engine named by the current
+ ** block.
+ */
+ MR_TS_EVENT_RUN_THREAD, "Run context"
+ },
+ {
+ /*
+ ** The named context finished executing on the engine named by the
+ ** current block. The reason why the context stopped is given.
+ */
+ MR_TS_EVENT_STOP_THREAD,
+ "Context stopped"
+ },
+ {
+ /*
+ ** This event is posted when a context is created for a spark.
+ */
+ MR_TS_EVENT_CREATE_SPARK_THREAD,
+ "Create a context for executing a spark"
+ },
+ {
+ /*
+ ** Start a garbage collection run
+ */
+ MR_TS_EVENT_GC_START,
+ "Start GC"
+ },
+ {
+ /*
+ ** Stop a garbage collection run
+ */
+ MR_TS_EVENT_GC_END,
+ "Stop GC",
+ },
+ {
+ /*
+ ** The runtime system is about to call main/2. This message has no
+ ** parameters.
+ */
+ MR_TS_EVENT_CALL_MAIN,
+ "About to call main/2"
+ },
+ {
+ /* Mark the end of this array. */
+ MR_TS_NUM_EVENT_TAGS, NULL
+ }
+};
+
+static MR_uint_least16_t event_type_sizes[] = {
+ [MR_TS_EVENT_STARTUP] = 2, /* MR_EngineId */
+ [MR_TS_EVENT_SHUTDOWN] = 0,
+ [MR_TS_EVENT_BLOCK_MARKER] = 4 + 8 + 2,
+        /* EventlogOffset, Time, MR_EngineId */
+ [MR_TS_EVENT_CREATE_THREAD] = 4, /* MR_ContextId */
+ [MR_TS_EVENT_THREAD_RUNNABLE] = 4, /* MR_ContextId */
+ [MR_TS_EVENT_RUN_THREAD] = 4, /* MR_ContextId */
+ [MR_TS_EVENT_STOP_THREAD] = 4 + 2,
+ /* MR_ContextId, MR_ContextStopReason */
+ [MR_TS_EVENT_CREATE_SPARK_THREAD] = 4, /* MR_ContextId */
+ [MR_TS_EVENT_GC_START] = 0,
+ [MR_TS_EVENT_GC_END] = 0,
+ [MR_TS_EVENT_CALL_MAIN] = 0,
+};
+
+static FILE* MR_threadscope_output_file = NULL;
+static char* MR_threadscope_output_filename;
+
+/*
+** The TSC value recorded when the primordial thread called
+** MR_setup_threadscope(). It is used retroactively to initialise the
+** MR_eng_cpu_clock_ticks_offset field in the engine structure once it is
+** created.
+*/
+static MR_uint_least64_t MR_primordial_first_tsc;
+
+static MercuryLock MR_next_engine_id_lock;
+static MR_EngineId MR_next_engine_id = 0;
+
+static Timedelta MR_global_offset;
+
+static struct MR_threadscope_event_buffer global_buffer;
+
+/***************************************************************************/
+
+/*
+** Is there enough room for this statically sized event in the current engine's
+** buffer, _and_ enough room for the block marker event?
+*/
+static __inline__ MR_bool enough_room_for_event(
+ struct MR_threadscope_event_buffer *buffer,
+ EventType event_type)
+{
+ return (buffer->MR_tsbuffer_pos + event_type_sizes[event_type] +
+ event_type_sizes[MR_TS_EVENT_BLOCK_MARKER] +
+ ((2 + 8) * 2)) /* (EventType, Time) * 2 */
+ < MR_TS_BUFFERSIZE;
+}
+
+/*
+** Is a block currently open?
+*/
+static __inline__ MR_bool block_is_open(
+ struct MR_threadscope_event_buffer *buffer)
+{
+    return buffer->MR_tsbuffer_block_open_pos != -1;
+}
+
+/*
+** Put words into the current engine's buffer in big endian order.
+*/
+static __inline__ void put_byte(
+ struct MR_threadscope_event_buffer *buffer,
+ int byte)
+{
+ buffer->MR_tsbuffer_data[buffer->MR_tsbuffer_pos++] = byte;
+}
+
+static __inline__ void put_be_int16(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_int_least16_t word)
+{
+ put_byte(buffer, (word >> 8) & 0xFF);
+ put_byte(buffer, word & 0xFF);
+}
+
+static __inline__ void put_be_uint16(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_uint_least16_t word)
+{
+ put_byte(buffer, (word >> 8) & 0xFF);
+ put_byte(buffer, word & 0xFF);
+}
+
+static __inline__ void put_be_uint32(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_uint_least32_t word)
+{
+ put_be_uint16(buffer, (word >> 16) & 0xFFFF);
+ put_be_uint16(buffer, word & 0xFFFF);
+}
+
+static __inline__ void put_be_uint64(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_uint_least64_t word)
+{
+ put_be_uint32(buffer, (word >> 32) & 0xFFFFFFFF);
+ put_be_uint32(buffer, word & 0xFFFFFFFF);
+}
+
+static __inline__ void put_string(
+ struct MR_threadscope_event_buffer *buffer,
+ const char *string)
+{
+ unsigned i, len;
+
+ len = strlen(string);
+ put_be_uint32(buffer, len);
+ for (i = 0; i < len; i++) {
+ put_byte(buffer, string[i]);
+ }
+}
+
+static __inline__ void put_timestamp(
+ struct MR_threadscope_event_buffer *buffer,
+ Time timestamp)
+{
+ put_be_uint64(buffer, timestamp);
+}
+
+static __inline__ void put_eventlog_offset(
+ struct MR_threadscope_event_buffer *buffer,
+ EventlogOffset offset)
+{
+ put_be_uint32(buffer, offset);
+}
+
+static __inline__ void put_event_header(
+ struct MR_threadscope_event_buffer *buffer,
+ EventType event_type, Time timestamp)
+{
+ put_be_uint16(buffer, event_type);
+ put_timestamp(buffer, timestamp);
+}
+
+static __inline__ void put_engine_id(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_EngineId engine_num)
+{
+ put_be_uint16(buffer, engine_num);
+}
+
+static __inline__ void put_context_id(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_ContextId context_id)
+{
+ put_be_uint32(buffer, context_id);
+}
+
+static __inline__ void put_stop_reason(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_ContextStopReason reason)
+{
+ put_be_uint16(buffer, reason);
+}
+
+/***************************************************************************/
+
+static struct MR_threadscope_event_buffer*
+MR_create_event_buffer(void);
+
+/*
+** The prelude is everything up to and including the 'DATA_BEGIN' marker.
+*/
+static void
+MR_open_output_file_and_write_prelude(void);
+
+static void
+MR_close_output_file(void);
+
+static void
+put_event_type(struct MR_threadscope_event_buffer *buffer,
+ EventTypeDesc *event_type);
+
+static MR_bool
+flush_event_buffer(struct MR_threadscope_event_buffer *buffer);
+
+static void
+maybe_close_block(struct MR_threadscope_event_buffer *buffer);
+
+static void
+open_block(struct MR_threadscope_event_buffer *buffer);
+
+static void
+start_gc_callback(void);
+static void
+stop_gc_callback(void);
+static void
+pause_thread_gc_callback(void);
+static void
+resume_thread_gc_callback(void);
+
+/***************************************************************************/
+
+static MR_uint_least64_t
+get_current_time_nanosecs(void);
+
+/***************************************************************************/
+
+void
+MR_setup_threadscope(void)
+{
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In setup threadscope thread: 0x%lx\n", pthread_self())
+ );
+ /* This value is used later when setting up the primordial engine */
+ MR_primordial_first_tsc = MR_read_cpu_tsc();
+
+ /* Setup locks. */
+ pthread_mutex_init(&MR_next_engine_id_lock, MR_MUTEX_ATTR);
+
+ /*
+ ** These variables are used for TSC synchronization which is not used. See
+ ** below.
+ **
+ pthread_mutex_init(&MR_tsc_sync_slave_lock, MR_MUTEX_ATTR);
+ MR_US_COND_CLEAR(&MR_tsc_sync_slave_entry_cond);
+ MR_US_COND_CLEAR(&MR_tsc_sync_master_entry_cond);
+ MR_US_COND_CLEAR(&MR_tsc_sync_t0);
+ MR_US_COND_CLEAR(&MR_tsc_sync_t1);
+ */
+
+ /* Configure Boehm */
+ GC_mercury_callback_start_collect = start_gc_callback;
+ GC_mercury_callback_stop_collect = stop_gc_callback;
+ GC_mercury_callback_pause_thread = pause_thread_gc_callback;
+ GC_mercury_callback_resume_thread = resume_thread_gc_callback;
+
+ /* Clear the global buffer and setup the file */
+ global_buffer.MR_tsbuffer_pos = 0;
+ global_buffer.MR_tsbuffer_block_open_pos = -1;
+ global_buffer.MR_tsbuffer_lock = MR_US_LOCK_INITIAL_VALUE;
+ MR_open_output_file_and_write_prelude();
+
+ /*
+ ** Put the startup event in the buffer.
+ */
+ put_event_header(&global_buffer, MR_TS_EVENT_STARTUP, 0);
+ put_engine_id(&global_buffer, (MR_EngineId)MR_num_threads);
+ flush_event_buffer(&global_buffer);
+}
+
+void
+MR_finalize_threadscope(void)
+{
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In finalize threadscope thread: 0x%lx\n", pthread_self())
+ );
+ flush_event_buffer(&global_buffer);
+ MR_close_output_file();
+}
+
+void
+MR_threadscope_setup_engine(MercuryEngine *eng)
+{
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In threadscope setup engine thread: 0x%lx\n", pthread_self())
+ );
+ MR_LOCK(&MR_next_engine_id_lock, "MR_get_next_engine_id");
+ eng->MR_eng_id = MR_next_engine_id++;
+ MR_UNLOCK(&MR_next_engine_id_lock, "MR_get_next_engine_id");
+
+ if (eng->MR_eng_id == 0) {
+ MR_global_offset = -MR_primordial_first_tsc;
+ }
+ eng->MR_eng_cpu_clock_ticks_offset = MR_global_offset;
+
+ eng->MR_eng_ts_buffer = MR_create_event_buffer();
+}
+
+void
+MR_threadscope_finalize_engine(MercuryEngine *eng)
+{
+ struct MR_threadscope_event_buffer *buffer = eng->MR_eng_ts_buffer;
+
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In threadscope finalize engine thread: 0x%lx\n", pthread_self())
+ );
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_SHUTDOWN)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+ put_event_header(buffer, MR_TS_EVENT_SHUTDOWN, get_current_time_nanosecs());
+
+ flush_event_buffer(buffer);
+ eng->MR_eng_ts_buffer = NULL;
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+#if 0
+/*
+** It looks like we don't need this on modern CPUs including multi-socket
+** systems (goliath). If we find systems where this is needed we can enable it
+** via a runtime check.
+*/
+/*
+** The synchronization of TSCs operates as follows:
+** The master and slave enter their functions. Both threads spin until the
+** other is ready (signaling the other before they begin spinning). Then for
+** MR_TSC_SYNC_NUM_ROUNDS: The master spins waiting for the slave. The slave
+** records its current TSC, signals the master, and spins waiting for a reply.
+** The master, upon hearing from the slave, records its TSC and then signals
+** the slave. The slave can then compute the delay in this round. The slave
+** takes the MR_TSC_SYNC_NUM_BEST_ROUNDS best (smallest) delays and computes
+** the offset as the average of the differences between the two clocks, based
+** on Cristian's algorithm (1989).
+*/
+
+typedef struct {
+ Timedelta delay;
+ Timedelta offset;
+} TimeDelayOffset;
+
+static MercuryLock MR_tsc_sync_slave_lock;
+volatile static MR_Us_Cond MR_tsc_sync_slave_entry_cond;
+volatile static MR_Us_Cond MR_tsc_sync_master_entry_cond;
+volatile static MR_Us_Cond MR_tsc_sync_t0;
+volatile static MR_Us_Cond MR_tsc_sync_t1;
+static Time MR_tsc_sync_master_time;
+
+static int
+compare_time_delay_offset_by_delay(const void *a, const void *b);
+
+void
+MR_threadscope_sync_tsc_master(void)
+{
+ unsigned i;
+
+ /*
+ ** Wait for a slave to enter.
+ */
+ MR_US_COND_SET(&MR_tsc_sync_master_entry_cond);
+ MR_US_SPIN_COND(&MR_tsc_sync_slave_entry_cond);
+ MR_US_COND_CLEAR(&MR_tsc_sync_slave_entry_cond);
+
+ for (i = 0; i < MR_TSC_SYNC_NUM_ROUNDS; i++) {
+ /*
+ ** Wait to receive the message from the slave at T0
+ */
+ MR_US_SPIN_COND(&MR_tsc_sync_t0);
+ MR_US_COND_CLEAR(&MR_tsc_sync_t0);
+
+ /*
+ ** Read our TSC and send the slave a message.
+ */
+ MR_tsc_sync_master_time = MR_read_cpu_tsc();
+ MR_US_COND_SET(&MR_tsc_sync_t1);
+ }
+
+}
+
+void
+MR_threadscope_sync_tsc_slave(void)
+{
+ unsigned i, j;
+ TimeDelayOffset delay_offset[MR_TSC_SYNC_NUM_ROUNDS];
+ Timedelta total_offset;
+ MercuryEngine *eng = MR_thread_engine_base;
+
+ /*
+ ** Only one slave may enter at a time.
+ */
+ MR_LOCK(&MR_tsc_sync_slave_lock, "MR_threadscope_sync_tsc_slave");
+
+ /*
+ ** Tell the master we're ready to begin and wait for it to tell us it's ready.
+ */
+ MR_US_COND_SET(&MR_tsc_sync_slave_entry_cond);
+ MR_US_SPIN_COND(&MR_tsc_sync_master_entry_cond);
+ MR_US_COND_CLEAR(&MR_tsc_sync_master_entry_cond);
+
+ for (i = 0; i < MR_TSC_SYNC_NUM_ROUNDS; i++) {
+ Time slave_tsc_at_t0;
+ Time slave_tsc_at_t2;
+
+ /*
+ ** Get the current time and signal that we've done so (T=0).
+ */
+ slave_tsc_at_t0 = MR_read_cpu_tsc();
+ MR_US_COND_SET(&MR_tsc_sync_t0);
+
+ /*
+        ** Wait for the master to reply; the master handles T=1, and here we
+ ** proceed to T=2.
+ */
+ MR_US_SPIN_COND(&MR_tsc_sync_t1);
+ slave_tsc_at_t2 = MR_read_cpu_tsc();
+ MR_US_COND_CLEAR(&MR_tsc_sync_t1);
+
+ /*
+ ** Here are Cristian's formulas. Delay is the round trip time,
+ ** slave_tsc_at_t0 + delay/2 is the time on the slave's clock that the
+        ** master processed the slave's message and sent its own. This is
+        ** accurate if the communication delays in each direction are
+        ** uniform, that is, if the communication latency is symmetric.
+ */
+ delay_offset[i].delay = slave_tsc_at_t2 - slave_tsc_at_t0;
+ delay_offset[i].offset =
+ MR_tsc_sync_master_time - (slave_tsc_at_t0 + delay_offset[i].delay/2);
+ }
+    /* By now the master thread will have returned to continue its normal work. */
+
+ /*
+ ** We do this debugging output while holding the lock, so that the output
+ ** is reasonable.
+ */
+ MR_DO_THREADSCOPE_DEBUG({
+        fprintf(stderr, "TSC Synchronization for thread 0x%lx\n", pthread_self());
+ for (i = 0; i < MR_TSC_SYNC_NUM_ROUNDS; i++) {
+ fprintf(stderr, "delay: %ld offset (local + global = total) %ld + %ld = %ld\n",
+ delay_offset[i].delay, delay_offset[i].offset, MR_global_offset,
+ delay_offset[i].offset + MR_global_offset);
+ }
+ });
+ MR_UNLOCK(&MR_tsc_sync_slave_lock, "MR_threadscope_sync_tsc_slave");
+
+ /*
+ ** Now to average the best offsets.
+ */
+    qsort(delay_offset, MR_TSC_SYNC_NUM_ROUNDS, sizeof(TimeDelayOffset),
+ compare_time_delay_offset_by_delay);
+    total_offset = 0;
+    for (i = 0; i < MR_TSC_SYNC_NUM_BEST_ROUNDS; i++) {
+        total_offset += delay_offset[i].offset;
+    }
+    eng->MR_eng_cpu_clock_ticks_offset = total_offset / MR_TSC_SYNC_NUM_BEST_ROUNDS + MR_global_offset;
+
+ MR_DO_THREADSCOPE_DEBUG({
+        fprintf(stderr, "TSC Synchronization offset for thread 0x%lx: %ld\n",
+ pthread_self(), eng->MR_eng_cpu_clock_ticks_offset);
+ });
+}
+
+static int
+compare_time_delay_offset_by_delay(const void *a, const void *b) {
+ TimeDelayOffset *tdo_a = (TimeDelayOffset*)a;
+ TimeDelayOffset *tdo_b = (TimeDelayOffset*)b;
+
+ if (tdo_a->delay > tdo_b->delay) {
+ return 1;
+ } else if (tdo_a->delay < tdo_b->delay) {
+ return -1;
+ } else {
+ return 0;
+ }
+}
+
+#endif
+
+/***************************************************************************/
+
+void
+MR_threadscope_post_create_context(MR_Context *context)
+{
+ struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_CREATE_THREAD)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_CREATE_THREAD, get_current_time_nanosecs());
+ put_context_id(buffer, context->MR_ctxt_num_id);
+
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_create_context_for_spark(MR_Context *context)
+{
+ struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_CREATE_SPARK_THREAD)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_CREATE_SPARK_THREAD,
+ get_current_time_nanosecs());
+ put_context_id(buffer, context->MR_ctxt_num_id);
+
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_context_runnable(MR_Context *context)
+{
+ struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_THREAD_RUNNABLE)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_THREAD_RUNNABLE, get_current_time_nanosecs());
+ put_context_id(buffer, context->MR_ctxt_num_id);
+
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+static void
+MR_threadscope_post_run_context_locked(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_Context *context)
+{
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_RUN_THREAD)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_RUN_THREAD,
+ get_current_time_nanosecs());
+    put_context_id(buffer, context->MR_ctxt_num_id);
+}
+
+void
+MR_threadscope_post_run_context(void)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+
+ context = MR_thread_engine_base->MR_eng_this_context;
+
+ if (context) {
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+ MR_threadscope_post_run_context_locked(buffer, context);
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+ }
+}
+
+static void
+MR_threadscope_post_stop_context_locked(
+ struct MR_threadscope_event_buffer *buffer,
+ MR_Context *context,
+ MR_ContextStopReason reason)
+{
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_STOP_THREAD)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_STOP_THREAD, get_current_time_nanosecs());
+ put_context_id(buffer, context->MR_ctxt_num_id);
+ put_stop_reason(buffer, reason);
+}
+
+void
+MR_threadscope_post_stop_context(MR_ContextStopReason reason)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+ context = MR_thread_engine_base->MR_eng_this_context;
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+ MR_threadscope_post_stop_context_locked(buffer, context, reason);
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+void
+MR_threadscope_post_calling_main(void) {
+ struct MR_threadscope_event_buffer *buffer = MR_ENGINE(MR_eng_ts_buffer);
+
+ MR_US_SPIN_LOCK(&(buffer->MR_tsbuffer_lock));
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_CALL_MAIN)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_CALL_MAIN, get_current_time_nanosecs());
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+}
+
+/***************************************************************************/
+
+static struct MR_threadscope_event_buffer*
+MR_create_event_buffer(void)
+{
+ struct MR_threadscope_event_buffer* buffer;
+
+ buffer = MR_GC_NEW(MR_threadscope_event_buffer_t);
+ buffer->MR_tsbuffer_pos = 0;
+ buffer->MR_tsbuffer_block_open_pos = -1;
+ buffer->MR_tsbuffer_lock = MR_US_LOCK_INITIAL_VALUE;
+
+ return buffer;
+}
+
+/***************************************************************************/
+
+static void
+MR_open_output_file_and_write_prelude(void)
+{
+ MR_Unsigned filename_len;
+ char *progname_copy;
+ char *progname_base;
+ MR_Unsigned i;
+
+ progname_copy = strdup(MR_progname);
+ progname_base = basename(progname_copy);
+
+ /*
+ ** This is an over-approximation of the amount of space needed for this
+ ** filename.
+ */
+ filename_len = strlen(progname_base) + strlen(MR_TS_FILENAME_FORMAT) + 1;
+ MR_threadscope_output_filename = MR_GC_NEW_ARRAY(char, filename_len);
+ snprintf(MR_threadscope_output_filename, filename_len,
+ MR_TS_FILENAME_FORMAT, progname_base);
+ free(progname_copy);
+ progname_copy = NULL;
+ progname_base = NULL;
+
+ MR_threadscope_output_file = fopen(MR_threadscope_output_filename, "w");
+ if (!MR_threadscope_output_file) {
+ perror(MR_threadscope_output_filename);
+ return;
+ }
+
+ put_be_uint32(&global_buffer, MR_TS_EVENT_HEADER_BEGIN);
+ put_be_uint32(&global_buffer, MR_TS_EVENT_HET_BEGIN);
+ for ( i = 0;
+ event_type_descs[i].etd_event_type != MR_TS_NUM_EVENT_TAGS;
+ i++) {
+ put_event_type(&global_buffer, &event_type_descs[i]);
+ }
+ put_be_uint32(&global_buffer, MR_TS_EVENT_HET_END);
+ put_be_uint32(&global_buffer, MR_TS_EVENT_HEADER_END);
+ put_be_uint32(&global_buffer, MR_TS_EVENT_DATA_BEGIN);
+
+ flush_event_buffer(&global_buffer);
+}
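(Reviewer aside: the filename sizing above can be sketched in isolation. A minimal, self-contained sketch, assuming a format string like "%s.eventlog" — the real MR_TS_FILENAME_FORMAT is defined elsewhere and may differ. The buffer length is an over-approximation because strlen of the format string also counts the "%s" that gets substituted.)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical format string; the real MR_TS_FILENAME_FORMAT is defined
** in mercury_threadscope.c and may differ. */
#define TS_FILENAME_FORMAT "%s.eventlog"

/* Build the output filename the same way as
** MR_open_output_file_and_write_prelude(): size the buffer with an
** over-approximation, then let snprintf write at most that many bytes. */
static char *
make_output_filename(const char *progname_base)
{
    size_t len = strlen(progname_base) + strlen(TS_FILENAME_FORMAT) + 1;
    char *name = malloc(len);

    if (name != NULL) {
        snprintf(name, len, TS_FILENAME_FORMAT, progname_base);
    }
    return name;
}
```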
+
+static void
+MR_close_output_file(void)
+{
+ if (MR_threadscope_output_file) {
+ put_be_uint16(&global_buffer, MR_TS_EVENT_DATA_END);
+ if (flush_event_buffer(&global_buffer)) {
+ if (EOF == fclose(MR_threadscope_output_file)) {
+ perror(MR_threadscope_output_filename);
+ }
+ MR_threadscope_output_file = NULL;
+ MR_threadscope_output_filename = NULL;
+ }
+ }
+}
+
+static void
+put_event_type(struct MR_threadscope_event_buffer *buffer, EventTypeDesc *event_type)
+{
+ put_be_uint32(buffer, MR_TS_EVENT_ET_BEGIN);
+
+ put_be_uint16(buffer, event_type->etd_event_type);
+ put_be_int16(buffer, event_type_sizes[event_type->etd_event_type]);
+
+ put_string(buffer, event_type->etd_description);
+
+ /* There is no extended data in any of our events */
+ put_be_uint32(buffer, 0);
+
+ put_be_uint32(buffer, MR_TS_EVENT_ET_END);
+}
+
+static MR_bool
+flush_event_buffer(struct MR_threadscope_event_buffer *buffer)
+{
+ maybe_close_block(buffer);
+
+ /*
+ ** fwrite handles locking for us, so we have no concurrent access problems.
+ */
+ if (MR_threadscope_output_file && buffer->MR_tsbuffer_pos) {
+ if (0 == fwrite(buffer->MR_tsbuffer_data, buffer->MR_tsbuffer_pos, 1,
+ MR_threadscope_output_file)) {
+ perror(MR_threadscope_output_filename);
+ MR_threadscope_output_file = NULL;
+ MR_threadscope_output_filename = NULL;
+ }
+ }
+ buffer->MR_tsbuffer_pos = 0;
+
+ return (MR_threadscope_output_filename ? MR_TRUE : MR_FALSE);
+}
+
+static void
+maybe_close_block(struct MR_threadscope_event_buffer *buffer)
+{
+ MR_Unsigned saved_pos;
+
+ if (buffer->MR_tsbuffer_block_open_pos != -1)
+ {
+ saved_pos = buffer->MR_tsbuffer_pos;
+ buffer->MR_tsbuffer_pos = buffer->MR_tsbuffer_block_open_pos +
+ sizeof(EventType) + sizeof(Time);
+ put_eventlog_offset(buffer, saved_pos - buffer->MR_tsbuffer_block_open_pos);
+ put_timestamp(buffer, get_current_time_nanosecs());
+
+ buffer->MR_tsbuffer_block_open_pos = -1;
+ buffer->MR_tsbuffer_pos = saved_pos;
+ }
+}
+
+static void
+open_block(struct MR_threadscope_event_buffer *buffer)
+{
+ maybe_close_block(buffer);
+
+ /*
+ ** Save the current position; maybe_close_block() uses it to find the
+ ** block marker that it must write into.
+ */
+ buffer->MR_tsbuffer_block_open_pos = buffer->MR_tsbuffer_pos;
+
+ put_event_header(buffer, MR_TS_EVENT_BLOCK_MARKER, get_current_time_nanosecs());
+
+ /* Skip over the next two fields; they are filled in by maybe_close_block */
+ buffer->MR_tsbuffer_pos += sizeof(EventlogOffset) + sizeof(Time);
+
+ put_engine_id(buffer, MR_ENGINE(MR_eng_id));
+}
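(Reviewer aside: open_block()/maybe_close_block() above use a reserve-then-backpatch scheme — the block marker is written with a gap for the block's eventual size, and the gap is filled in once the block is closed. A minimal self-contained sketch of the idea; all names here are invented for illustration and the real code also backpatches an end timestamp.)

```c
#include <stddef.h>
#include <stdint.h>

static unsigned char buf[64];
static size_t pos = 0;
static size_t block_open_pos = (size_t) -1;

/* Write a 32-bit value in big-endian order, as the eventlog format
** requires. */
static void
put_be_uint32_demo(uint32_t x)
{
    buf[pos++] = (x >> 24) & 0xff;
    buf[pos++] = (x >> 16) & 0xff;
    buf[pos++] = (x >> 8) & 0xff;
    buf[pos++] = x & 0xff;
}

/* Open a block: remember where it starts and leave a 4-byte gap for its
** eventual size. */
static void
open_demo_block(void)
{
    block_open_pos = pos;
    pos += 4;                       /* filled in by close_demo_block() */
}

/* Close the block: rewind to the gap, write the now-known size, and
** restore the write position. */
static uint32_t
close_demo_block(void)
{
    size_t saved_pos = pos;
    uint32_t size = (uint32_t) (saved_pos - block_open_pos);

    pos = block_open_pos;
    put_be_uint32_demo(size);
    pos = saved_pos;
    block_open_pos = (size_t) -1;
    return size;
}
```

For example, opening a block, appending two payload bytes, and closing it yields a block size of 6 (4 header bytes plus 2 payload bytes) backpatched into the gap.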
+
+static void
+start_gc_callback(void)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In gc start callback thread: 0x%lx\n", pthread_self())
+ );
+ if (MR_thread_engine_base == NULL) return;
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+ if (buffer == NULL) {
+ /* GC might be running before we're done setting up */
+ return;
+ }
+
+ if (MR_US_TRY_LOCK(&(buffer->MR_tsbuffer_lock))) {
+ context = MR_thread_engine_base->MR_eng_this_context;
+ if (context) {
+ MR_threadscope_post_stop_context_locked(buffer,
+ context, MR_TS_STOP_REASON_HEAP_OVERFLOW);
+ }
+
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_GC_START)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_GC_START,
+ get_current_time_nanosecs());
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+ }
+}
+
+static void
+stop_gc_callback(void)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In gc stop callback thread: 0x%lx\n", pthread_self())
+ );
+ if (MR_thread_engine_base == NULL) return;
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+ if (buffer == NULL) {
+ /* GC might be running before we're done setting up */
+ return;
+ }
+
+ if (MR_US_TRY_LOCK(&(buffer->MR_tsbuffer_lock))) {
+ if (!enough_room_for_event(buffer, MR_TS_EVENT_GC_END)) {
+ flush_event_buffer(buffer);
+ open_block(buffer);
+ } else if (!block_is_open(buffer)) {
+ open_block(buffer);
+ }
+
+ put_event_header(buffer, MR_TS_EVENT_GC_END, get_current_time_nanosecs());
+
+ context = MR_thread_engine_base->MR_eng_this_context;
+ if (context) {
+ MR_threadscope_post_run_context_locked(buffer, context);
+ }
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+ }
+}
+
+static void
+pause_thread_gc_callback(void)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In gc pause thread callback thread: 0x%lx\n", pthread_self())
+ );
+ if (MR_thread_engine_base == NULL) return;
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+ if (buffer == NULL) {
+ /* GC might be running before we're done setting up */
+ return;
+ }
+
+ context = MR_thread_engine_base->MR_eng_this_context;
+ if (context && MR_US_TRY_LOCK(&(buffer->MR_tsbuffer_lock))) {
+ MR_threadscope_post_stop_context_locked(buffer, context, MR_TS_STOP_REASON_YIELDING);
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+ }
+}
+
+static void
+resume_thread_gc_callback(void)
+{
+ struct MR_threadscope_event_buffer *buffer;
+ MR_Context *context;
+
+ MR_DO_THREADSCOPE_DEBUG(
+ fprintf(stderr, "In gc resume thread callback thread: 0x%lx\n", pthread_self())
+ );
+ if (MR_thread_engine_base == NULL) return;
+ buffer = MR_thread_engine_base->MR_eng_ts_buffer;
+ if (buffer == NULL) {
+ /* GC might be running before we're done setting up */
+ return;
+ }
+
+ context = MR_thread_engine_base->MR_eng_this_context;
+ if (context && MR_US_TRY_LOCK(&(buffer->MR_tsbuffer_lock))) {
+ MR_threadscope_post_run_context_locked(buffer, context);
+ MR_US_UNLOCK(&(buffer->MR_tsbuffer_lock));
+ }
+}
+
+/***************************************************************************/
+
+static MR_uint_least64_t
+get_current_time_nanosecs(void)
+{
+ MR_uint_least64_t current_tsc;
+ MercuryEngine *eng = MR_thread_engine_base;
+
+ current_tsc = MR_read_cpu_tsc();
+ return (current_tsc + eng->MR_eng_cpu_clock_ticks_offset) /
+ (MR_cpu_cycles_per_sec / 1000000000);
+}
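(Reviewer aside: the conversion in get_current_time_nanosecs() divides the offset-adjusted tick count by the number of CPU cycles per nanosecond. A sketch with made-up numbers — the 3 GHz clock rate below is an assumption for illustration. Note the integer division assumes a clock of at least 1 GHz, otherwise the divisor would be zero.)

```c
#include <stdint.h>

/* Convert an offset-adjusted TSC reading to nanoseconds, mirroring
** get_current_time_nanosecs(). cycles_per_sec plays the role of
** MR_cpu_cycles_per_sec. The integer division assumes the clock runs at
** 1 GHz or faster, otherwise cycles_per_nsec would be zero. */
static uint64_t
ticks_to_nanosecs(uint64_t tsc, uint64_t offset, uint64_t cycles_per_sec)
{
    uint64_t cycles_per_nsec = cycles_per_sec / 1000000000u;

    return (tsc + offset) / cycles_per_nsec;
}
```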
+
+/***************************************************************************/
+
+#endif /* defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT) */
+
+/* forward decls to suppress gcc warnings */
+void mercury_sys_init_threadscope_init(void);
+void mercury_sys_init_threadscope_init_type_tables(void);
+#ifdef MR_DEEP_PROFILING
+void mercury_sys_init_threadscope_write_out_proc_statics(FILE *fp);
+#endif
+
+void mercury_sys_init_threadscope_init(void)
+{
+#ifndef MR_HIGHLEVEL_CODE
+/* XXX: What does this do? Why do other modules have a call like this?
+ threadscope_module();
+*/
+#endif
+}
+
+void mercury_sys_init_threadscope_init_type_tables(void)
+{
+ /* no types to register */
+}
+
+#ifdef MR_DEEP_PROFILING
+void mercury_sys_init_threadscope_write_out_proc_statics(FILE *fp)
+{
+ /* no proc_statics to write out */
+}
+#endif
Index: runtime/mercury_threadscope.h
===================================================================
RCS file: runtime/mercury_threadscope.h
diff -N runtime/mercury_threadscope.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ runtime/mercury_threadscope.h 1 Dec 2009 06:17:07 -0000
@@ -0,0 +1,134 @@
+/*
+** vim:ts=4 sw=4 expandtab
+*/
+/*
+** Copyright (C) 2009 The University of Melbourne.
+**
+** This file may only be copied under the terms of the GNU Library General
+** Public License - see the file COPYING.LIB in the Mercury distribution.
+*/
+
+/*
+** mercury_threadscope.h - defines Mercury threadscope profiling support.
+**
+** See "Parallel Performance Tuning for Haskell" - Don Jones Jr, Simon Marlow
+** and Satnam Singh for information about threadscope.
+*/
+
+#ifndef MERCURY_THREADSCOPE_H
+#define MERCURY_THREADSCOPE_H
+
+#include "mercury_types.h" /* for MR_Word, MR_Code, etc */
+#include "mercury_engine.h"
+#include "mercury_context.h"
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+
+/*
+** Reasons why a context has been stopped. Not all of these apply to Mercury;
+** for instance, contexts don't yield.
+*/
+#define MR_TS_STOP_REASON_HEAP_OVERFLOW 1
+#define MR_TS_STOP_REASON_STACK_OVERFLOW 2
+#define MR_TS_STOP_REASON_YIELDING 3
+#define MR_TS_STOP_REASON_BLOCKED 4
+#define MR_TS_STOP_REASON_FINISHED 5
+
+typedef struct MR_threadscope_event_buffer MR_threadscope_event_buffer_t;
+
+typedef MR_uint_least16_t MR_EngineId;
+typedef MR_uint_least16_t MR_ContextStopReason;
+typedef MR_uint_least32_t MR_ContextId;
+
+/*
+** This must be called by the primordial thread before starting any other
+** threads but after the primordial thread has been pinned.
+*/
+extern void
+MR_setup_threadscope(void);
+
+extern void
+MR_finalize_threadscope(void);
+
+extern void
+MR_threadscope_setup_engine(MercuryEngine *eng);
+
+extern void
+MR_threadscope_finalize_engine(MercuryEngine *eng);
+
+#if 0
+/*
+** It looks like we don't need TSC synchronization code on modern x86(-64) CPUs
+** including multi-socket systems (tested on goliath and taura). If we find
+** systems where this is needed, we can enable it via a runtime check.
+*/
+/*
+** Synchronize a slave thread's TSC offset to the master's. The master thread
+** (with an engine) should call MR_threadscope_sync_tsc_master() for each slave
+** while each slave (with an engine) calls MR_threadscope_sync_tsc_slave().
+** All master-slave pairs must already be pinned to CPUs and have set up
+** their threadscope structures (by calling MR_threadscope_setup_engine()
+** above). Multiple slaves may call the slave function at the same time; a
+** lock ensures that only one is synchronized at a time. Only the primordial
+** thread may call MR_threadscope_sync_tsc_master().
+*/
+extern void
+MR_threadscope_sync_tsc_master(void);
+extern void
+MR_threadscope_sync_tsc_slave(void);
+#endif
+
+/*
+** Use the following functions to post messages. All messages read the
+** current engine's ID from the engine word; some messages also read the
+** current context's ID from the context loaded into the current engine.
+*/
+
+/*
+** This context has been created. The context must be passed as a parameter
+** so that it doesn't have to be the current context.
+**
+** Using the MR_Context typedef here would require the inclusion of
+** mercury_context.h, creating a circular dependency.
+*/
+extern void
+MR_threadscope_post_create_context(struct MR_Context_Struct *context);
+
+/*
+** The given context was created in order to execute a spark. It's an
+** alternative to the above event.
+*/
+extern void
+MR_threadscope_post_create_context_for_spark(struct MR_Context_Struct *ctxt);
+
+/*
+** This message says the context is now ready to run, for example because it
+** is being placed on the run queue after having been blocked.
+*/
+extern void
+MR_threadscope_post_context_runnable(struct MR_Context_Struct *context);
+
+/*
+** This message says we're now running the current context.
+*/
+extern void
+MR_threadscope_post_run_context(void);
+
+/*
+** This message says we've stopped executing the current context;
+** a reason must be provided.
+*/
+extern void
+MR_threadscope_post_stop_context(MR_ContextStopReason reason);
+
+/*
+** Post this message just before invoking the main/2 predicate.
+*/
+extern void
+MR_threadscope_post_calling_main(void);
+
+#endif /* defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT) */
+
+#endif /* not MERCURY_THREADSCOPE_H */
Index: runtime/mercury_wrapper.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.c,v
retrieving revision 1.202
diff -u -p -b -r1.202 mercury_wrapper.c
--- runtime/mercury_wrapper.c 30 Nov 2009 23:24:40 -0000 1.202
+++ runtime/mercury_wrapper.c 1 Dec 2009 05:43:41 -0000
@@ -65,6 +65,7 @@ ENDINIT
#include "mercury_memory.h" /* for MR_copy_string() */
#include "mercury_memory_handlers.h" /* for MR_default_handler */
#include "mercury_thread.h" /* for MR_debug_threads */
+#include "mercury_threadscope.h"
#if defined(MR_HAVE__SNPRINTF) && ! defined(MR_HAVE_SNPRINTF)
#define snprintf _snprintf
@@ -529,12 +530,13 @@ mercury_runtime_init(int argc, char **ar
#if defined(MR_THREAD_SAFE) && defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
/*
- ** Setup support for reading the CPU's TSC. This is currently used by
- ** profiling of the parallelism runtime but may be used by other profiling
+ ** Setup support for reading the CPU's TSC and detect the clock speed of the
+ ** processor. This is currently used by profiling of the parallelism
+ ** runtime and the threadscope support but may be used by other profiling
** or timing code. On architectures other than i386 and amd64 this is a
** no-op.
*/
- MR_configure_profiling_timers();
+ MR_do_cpu_feature_detection();
#endif
/*
@@ -614,6 +616,18 @@ mercury_runtime_init(int argc, char **ar
MR_ticket_high_water = 1;
#endif
#else
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ)
+ MR_pin_primordial_thread();
+#if defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ /*
+ ** We must set up threadscope before we set up the first engine.
+ ** The primordial thread must already be pinned, if pinning is configured.
+ */
+ MR_setup_threadscope();
+#endif
+#endif
+
/*
** Start up the Mercury engine. We don't yet know how many slots will be
** needed for thread-local mutable values so allocate the maximum number.
@@ -627,10 +641,20 @@ mercury_runtime_init(int argc, char **ar
int i;
MR_exit_now = MR_FALSE;
- for (i = 1 ; i < MR_num_threads ; i++) {
+
+ for (i = 1; i < MR_num_threads; i++) {
MR_create_thread(NULL);
}
- MR_pin_thread();
+ #ifdef MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
+ /*
+ ** TSC synchronization is not used; the support code is commented out.
+ ** See runtime/mercury_threadscope.h.
+ **
+ for (i = 1; i < MR_num_threads; i++) {
+ MR_threadscope_sync_tsc_master();
+ }
+ */
+ #endif
while (MR_num_idle_engines < MR_num_threads-1) {
/* busy wait until the worker threads are ready */
MR_ATOMIC_PAUSE;
@@ -2413,6 +2437,13 @@ mercury_runtime_main(void)
MR_setup_callback(MR_program_entry_point);
#endif
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+
+ MR_threadscope_post_calling_main();
+
+#endif
+
#ifdef MR_HIGHLEVEL_CODE
MR_do_interpreter();
#else
@@ -2421,6 +2452,13 @@ mercury_runtime_main(void)
MR_debugmsg0("Returning from MR_call_engine()\n");
#endif
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+
+ MR_threadscope_post_stop_context(MR_TS_STOP_REASON_FINISHED);
+
+#endif
+
#ifdef MR_DEEP_PROFILING
MR_current_call_site_dynamic = saved_cur_csd;
MR_current_callback_site = saved_cur_callback;
@@ -2929,6 +2967,18 @@ mercury_runtime_terminate(void)
pthread_cond_broadcast(&MR_runqueue_cond);
MR_UNLOCK(&MR_runqueue_lock, "exit_now");
+ while (MR_num_exited_engines < MR_num_threads - 1) {
+ MR_ATOMIC_PAUSE;
+ }
+
+#if defined(MR_THREAD_SAFE) && defined(MR_LL_PARALLEL_CONJ) && \
+ defined(MR_PROFILE_PARALLEL_EXECUTION_SUPPORT)
+ if (MR_ENGINE(MR_eng_ts_buffer))
+ MR_threadscope_finalize_engine(MR_thread_engine_base);
+
+ MR_finalize_threadscope();
+#endif
+
assert(MR_primordial_thread == pthread_self());
MR_primordial_thread = (MercuryThread) 0;