[m-rev.] diff: Fix poor performance on independent parallel benchmarks.

Paul Bone pbone at csse.unimelb.edu.au
Thu Oct 27 16:05:15 AEDT 2011


In some test cases all the sparks for a parallel conjunction are created up
front.  When an engine looks for a spark to run, it uses work stealing to
try to get some work.  Work stealing can fail, for example when a
compare-and-swap fails due to a race with another thread.  When this happens
the engine concludes that there is no work and goes to sleep.

Because these programs may not create any new parallel work later on, the
sleeping threads are never woken up and parallelism is lost, leading to
eventual sequential behavior.
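To illustrate how a steal attempt can fail even though work is available,
here is a minimal sketch using C11 atomics.  It is not the Mercury runtime's
spark deque; toy_deque, toy_push and toy_steal_once are hypothetical names
invented for this example.  The point is that a lost compare-and-swap race
(STEAL_RACE) is a different outcome from an empty deque (STEAL_EMPTY), and a
thief that treats both the same way may go to sleep while sparks remain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 64

/* Hypothetical, simplified spark deque: thieves take from 'top'. */
typedef struct {
    atomic_size_t top;      /* index of the next spark a thief will take */
    atomic_size_t bottom;   /* one past the last spark the owner pushed */
    int sparks[TOY_CAPACITY];
} toy_deque;

typedef enum { STEAL_OK, STEAL_EMPTY, STEAL_RACE } steal_result;

/* Owner-side push; returns false when the toy deque is full. */
static bool toy_push(toy_deque *dq, int spark)
{
    size_t bottom = atomic_load(&dq->bottom);
    size_t top = atomic_load(&dq->top);

    if (bottom - top >= TOY_CAPACITY) {
        return false;
    }
    dq->sparks[bottom % TOY_CAPACITY] = spark;
    atomic_store(&dq->bottom, bottom + 1);
    return true;
}

/*
** A single steal attempt.  STEAL_RACE means another thread changed 'top'
** between our load and our compare-and-swap, so work may still be
** available and the caller should retry rather than sleep.
*/
static steal_result toy_steal_once(toy_deque *dq, int *out)
{
    size_t top = atomic_load(&dq->top);
    size_t bottom = atomic_load(&dq->bottom);
    int spark;

    if (top >= bottom) {
        return STEAL_EMPTY;
    }
    spark = dq->sparks[top % TOY_CAPACITY];
    if (!atomic_compare_exchange_strong(&dq->top, &top, top + 1)) {
        return STEAL_RACE;
    }
    *out = spark;
    return STEAL_OK;
}
```

In a design like this, only STEAL_EMPTY would justify sleeping; whether the
failures seen in the benchmarks are of the STEAL_RACE kind is exactly what
still needs to be investigated.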

This patch provides a work-around by allowing sleeping threads to wake up
periodically.

runtime/mercury_context.c:
    Use sem_timedwait rather than sem_wait when an engine sleeps so that it
    can wake up periodically (every 2ms).  When an engine wakes up because of
    a timeout it attempts to steal work.

    This is a temporary workaround.  I don't like it because it wakes up
    threads needlessly in many cases.  I'd rather work out why these threads
    fail to steal work; it could be a failed compare-and-swap or something
    less benign.  However, this solution will be good enough in the short
    term.
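
The timed sleep described above can be sketched in isolation as follows.
This is only an illustration under stated assumptions, not the code in the
patch: deadline_after, wait_posted_or_timeout and the wake_reason enum are
hypothetical names, and a POSIX system that provides sem_timedwait is
assumed.  sem_timedwait takes an absolute CLOCK_REALTIME deadline, so the
microsecond overflow has to be carried into the seconds field, including the
case where the sum is exactly 1000000.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: absolute deadline for "now + delay_usec". */
static struct timespec deadline_after(struct timeval now, long delay_usec)
{
    struct timespec ts;
    long usec;

    assert(delay_usec >= 0);
    usec = (long) now.tv_usec + delay_usec;
    /* Carry whole seconds so that tv_nsec stays below one billion. */
    ts.tv_sec = now.tv_sec + usec / 1000000;
    ts.tv_nsec = (usec % 1000000) * 1000;
    return ts;
}

typedef enum { WAKE_POSTED, WAKE_TIMEOUT, WAKE_ERROR } wake_reason;

/*
** Hypothetical helper: sleep on 'sem' for at most delay_usec microseconds.
** The deadline is computed once, so retrying after a signal (EINTR) does
** not extend the total sleep time.
*/
static wake_reason wait_posted_or_timeout(sem_t *sem, long delay_usec)
{
    struct timeval now;
    struct timespec ts;

    gettimeofday(&now, NULL);
    ts = deadline_after(now, delay_usec);
    for (;;) {
        if (sem_timedwait(sem, &ts) == 0) {
            return WAKE_POSTED;     /* another thread posted the semaphore */
        }
        if (errno == ETIMEDOUT) {
            return WAKE_TIMEOUT;    /* caller should try to steal work */
        }
        if (errno != EINTR) {
            return WAKE_ERROR;
        }
    }
}
```

A caller in the style of this patch would use a delay of 2000 and attempt a
steal whenever WAKE_TIMEOUT is returned.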

diff --git a/runtime/mercury_context.c b/runtime/mercury_context.c
index c67e2bc..dedcf2e 100644
--- a/runtime/mercury_context.c
+++ b/runtime/mercury_context.c
@@ -1843,6 +1843,8 @@ MR_define_entry(MR_do_sleep);
     MR_EngineId engine_id = MR_ENGINE(MR_eng_id);
     unsigned action;
     int result;
+    struct timespec ts;
+    struct timeval tv;
 
     while (1) {
         engine_sleep_sync_data[engine_id].d.es_state = ENGINE_STATE_SLEEPING;
@@ -1850,9 +1852,25 @@ MR_define_entry(MR_do_sleep);
 #ifdef MR_THREADSCOPE
         MR_threadscope_post_engine_sleeping();
 #endif
-        result = MR_SEM_WAIT(
+#if defined(MR_HAVE_GETTIMEOFDAY) && defined(MR_HAVE_SEMAPHORE_H)
+        gettimeofday(&tv, NULL);
+        /* Sleep for 2ms */
+        tv.tv_usec += 2000;
+
+        if (tv.tv_usec >= 1000000) {
+            tv.tv_usec = tv.tv_usec % 1000000;
+            tv.tv_sec += 1;
+        }
+        ts.tv_sec = tv.tv_sec;
+        ts.tv_nsec = tv.tv_usec * 1000;
+        result = sem_timedwait(
             &(engine_sleep_sync_data[engine_id].d.es_sleep_semaphore),
-            "MR_do_sleep sleep_sem");
+            &ts);
+#else
+        MR_fatal_error(
+            "low-level parallel grades need gettimeofday() and "
+            "sem_timedwait()\n");
+#endif
 
         if (0 == result) {
             MR_CPU_LFENCE;
@@ -1924,6 +1942,12 @@ MR_define_entry(MR_do_sleep);
                     ** An interrupt woke the engine, go back to sleep.
                     */
                     break;
+                case ETIMEDOUT:
+                    /*
+                    ** A wait timed out, check for any sparks.
+                    */
+                    MR_MAYBE_TRAMPOLINE(do_work_steal(NULL));
+                    break;
                 default:
                     perror("sem_post");
                     abort();