[m-rev.] Fix two bugs in the parallel runtime code.

Paul Bone pbone at csse.unimelb.edu.au
Sun Oct 16 14:32:51 AEDT 2011


Fix two bugs in the parallel runtime code.

One bug was caused when the master context, in MR_lc_finish() would release the
contexts used by each of the slots.  The release code attempts to save state
from the engine back into the context, which is necessary most of the time.
However, in this case it saved state from the engine running the master
context, into other contexts, so that when they where re-used they used an
invalid stack pointer.

Another bug was found in code with recursive parallel conjunctions.  Each
context structure contains a pointer to a code location, it is used as a value
for the instruction pointer when a context is resumed.  The
MR_join_and_continue operation for parallel conjunctions uses this resume to
ensure that the master context for a parallel conjunction is only resumed if it
has become blocked and is ready to be resumed.  However the field was never
cleared before and will always contain the same value parallel conjunctions are
nested as they will all have the same resume point.  This caused the master
context to be resumed before it had fully blocked, causing it to be resumed
with an invalid state.

A potential bug was found where a field should have been volatile to prevent
the compiler from caching its value when doing so would not be safe.

Widen a couple of critical sections as they didn't quite protect against some
race conditions.  This is another potential cause of bugs.

runtime/mercury_par_builtin.[ch]:
    Make the master_context field of the loop control structure volatile so
    that the compiler doesn't cache its value.

    Make the last worker to finish take a lock earlier, to ensure that the
    master context won't be left waiting forever.

    Add a comment explaining why a context must not be saved before calling
    MR_destroy_context().

    Improve debugging code to print out the value of the stack or parent stack
    pointer, depending on the code in question.

    Make the lock in MR_lc_finish() wider, so that the lock is held when the
    code checks to see if it should block.

runtime/mercury_context.c:
    MR_destroy_context() no longer saves the context before releasing it.

    MR_destroy_context() no longer sets the MR_ctxt_resume_owner_engine field
    of the context since it's not currently used.

    MR_join_and_continue(), the barrier for parallel conjunctions, how resets
    the resume code pointer of the master context when it switches to it.

runtime/mercury_context.h:
    Described the reason why the context must be saved before it is
    destroyed/released.

runtime/mercury_context.c:
runtime/mercury_engine.c:
    Call MR_save_context() before calling MR_destroy_context()

Index: runtime/mercury_context.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.c,v
retrieving revision 1.100
diff -u -p -b -r1.100 mercury_context.c
--- runtime/mercury_context.c	13 Oct 2011 02:42:21 -0000	1.100
+++ runtime/mercury_context.c	16 Oct 2011 03:30:39 -0000
@@ -1103,18 +1103,13 @@ MR_destroy_context(MR_Context *c)
 #endif
 
     /*
-    ** Save the context first, even though we are not saving a computation
-    ** that's in progress we are saving some bookkeeping information.
-    */
-    /*
     ** TODO: When retrieving a context from the cached contexts, try to
     ** retrieve one with a matching engine ID, or give each engine a local
     ** cache of spare contexts.
-    */
 #ifdef MR_LL_PARALLEL_CONJ
     c->MR_ctxt_resume_owner_engine = MR_ENGINE(MR_eng_id);
 #endif
-    MR_save_context(c);
+    */
 
     /* XXX not sure if this is an overall win yet */
 #if 0 && defined(MR_CONSERVATIVE_GC) && !defined(MR_HIGHLEVEL_CODE)
@@ -1986,6 +1981,7 @@ prepare_engine_for_context(MR_Context *c
         MR_debug_log_message("destroying old context %p",
             MR_ENGINE(MR_eng_this_context));
     #endif
+        MR_save_context(MR_ENGINE(MR_eng_this_context));
         MR_destroy_context(MR_ENGINE(MR_eng_this_context));
     }
     MR_ENGINE(MR_eng_this_context) = context;
@@ -2217,6 +2213,12 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
             MR_threadscope_post_context_runnable(jnc_st->MR_st_orig_context);
 #endif
             prepare_engine_for_context(jnc_st->MR_st_orig_context);
+            jnc_st->MR_st_orig_context->MR_ctxt_resume = NULL;
+            /*
+            ** This field must be null before we execute MR_join_and_continue
+            ** next.
+            */
+            MR_CPU_SFENCE;
             return join_label;
         }
         return MR_ENTRY(MR_do_idle_clean_context);
Index: runtime/mercury_context.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.h,v
retrieving revision 1.70
diff -u -p -b -r1.70 mercury_context.h
--- runtime/mercury_context.h	13 Oct 2011 02:42:21 -0000	1.70
+++ runtime/mercury_context.h	16 Oct 2011 03:27:06 -0000
@@ -448,6 +448,13 @@ extern  MR_Context  *MR_create_context(c
 /*
 ** MR_destroy_context(context) returns the pointed-to context structure
 ** to the free list, and releases resources as necessary.
+**
+** VERY IMPORTANT:  Call MR_save_context() before you call
+** MR_destroy_context().  Contexts are cached and calling MR_save_context()
+** saves important book-keeping information, like the stack pointer and current
+** stack segment.  If you do not call these then an old, and since freed (or
+** re-used elsewhere) stack segment may still be referenced by the context.  If
+** that context is re-used later then it will clober another context's stack!
 */
 extern  void        MR_destroy_context(MR_Context *context);
 
Index: runtime/mercury_engine.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_engine.c,v
retrieving revision 1.67
diff -u -p -b -r1.67 mercury_engine.c
--- runtime/mercury_engine.c	12 Sep 2011 16:29:55 -0000	1.67
+++ runtime/mercury_engine.c	16 Oct 2011 03:27:06 -0000
@@ -168,6 +168,7 @@ void MR_finalize_engine(MercuryEngine *e
     ** might need to be finalized.
     */
     if (eng->MR_eng_this_context) {
+        MR_save_context(eng->MR_eng_this_context);
         MR_destroy_context(eng->MR_eng_this_context);
     }
 
Index: runtime/mercury_par_builtin.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_par_builtin.c,v
retrieving revision 1.7
diff -u -p -b -r1.7 mercury_par_builtin.c
--- runtime/mercury_par_builtin.c	9 Oct 2011 09:56:20 -0000	1.7
+++ runtime/mercury_par_builtin.c	16 Oct 2011 03:27:07 -0000
@@ -113,7 +113,8 @@ MR_lc_spawn_off_func(MR_LoopControl *lc,
     MR_LoopControlSlot *lcs = &(lc->MR_lc_slots[lcs_idx]);
 
 #if MR_DEBUG_LOOP_CONTROL
-    fprintf(stderr, "lc_spawn_off(%p, %d, %p)\n", lc, lcs_idx, code_ptr);
+    fprintf(stderr, "lc_spawn_off(%p, %d, %p) sp: %p\n",
+        lc, lcs_idx, code_ptr, MR_sp);
 #endif
     if (lcs->MR_lcs_context == NULL) {
         /*
@@ -147,10 +148,29 @@ MR_lc_join(MR_LoopControl *lc, MR_Unsign
     lc->MR_lc_free_slot_hint = lcs_idx;
     /* Ensure the slot is free before we perform the decrement. */
     MR_CPU_SFENCE;
+
+    /*
+    ** We have to determine if we're either the last of first workers to
+    ** finish.  To do this we cannot use atomic decrment since we need to do
+    ** more than one comparison, Therefore we use a CAS.
+    */
     last_worker =
         MR_atomic_dec_and_is_zero_int(&(lc->MR_lc_outstanding_workers));
 
     /*
+    ** If this is the last worker to finish, then take the lock before checking
+    ** the master context feild, otherwise we might race and end up never
+    ** resuming the master.
+    */
+    if (last_worker) {
+        MR_US_SPIN_LOCK(&(lc->MR_lc_master_context_lock));
+        /*
+        ** Don't read the master field until after we have the lock
+        */
+        MR_CPU_MFENCE;
+    }
+
+    /*
     ** If the master thread is suspended, wake it up, provided that either:
     ** - the loop has finished and this is the last worker to exit, or
     ** - the loop has not finished (so the master can create more work).
@@ -161,9 +181,16 @@ MR_lc_join(MR_LoopControl *lc, MR_Unsign
         /*
         ** Now take a lock and re-read the master context field.
         */
+        if (!last_worker) {
         MR_US_SPIN_LOCK(&(lc->MR_lc_master_context_lock));
+            /*
+            ** Don't read the master field until after we have the lock
+            */
+            MR_CPU_MFENCE;
+        }
         wakeup_context = lc->MR_lc_master_context;
         lc->MR_lc_master_context = NULL;
+        MR_CPU_SFENCE; /* Make sure the field to nULL */
         MR_US_UNLOCK(&(lc->MR_lc_master_context_lock));
         if (wakeup_context != NULL) {
 #ifdef MR_DEBUG_LOOP_CONTROL
@@ -176,6 +203,8 @@ MR_lc_join(MR_LoopControl *lc, MR_Unsign
             */
             MR_schedule_context(wakeup_context);
         }
+    } else if (last_worker) {
+        MR_US_UNLOCK(&(lc->MR_lc_master_context_lock));
     }
 }
 
Index: runtime/mercury_par_builtin.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_par_builtin.h,v
retrieving revision 1.13
diff -u -p -b -r1.13 mercury_par_builtin.h
--- runtime/mercury_par_builtin.h	14 Oct 2011 00:53:27 -0000	1.13
+++ runtime/mercury_par_builtin.h	16 Oct 2011 03:27:07 -0000
@@ -307,7 +307,7 @@ struct MR_LoopControl_Struct
 
     /* This lock protects only the next field */
     MR_THREADSAFE_VOLATILE MR_Us_Lock       MR_lc_master_context_lock;
-    MR_Context                              *MR_lc_master_context;
+    MR_THREADSAFE_VOLATILE MR_Context       *MR_lc_master_context;
 
     /* Unused atm */
     MR_THREADSAFE_VOLATILE MR_bool          MR_lc_finished;
@@ -371,7 +371,8 @@ extern MR_LoopControl   *MR_lc_create(un
 */
 #define MR_lc_finish_part1(lc, part2_label)                                 \
     MR_IF_DEBUG_LOOP_CONTORL(                                               \
-        fprintf(stderr, "lc_finish_part1(%p, %p)\n", (lc), (part2_label))); \
+        fprintf(stderr, "lc_finish_part1(%p, %p), sp: %p\n",                \
+            (lc), (part2_label), MR_sp));                                   \
                                                                             \
     do {                                                                    \
         (lc)->MR_lc_finished = MR_TRUE;                                     \
@@ -382,13 +383,13 @@ extern MR_LoopControl   *MR_lc_create(un
         ** MR_lc_join_and_terminate().  See MR_lc_join_and_terminate().     \
         */                                                                  \
         MR_CPU_MFENCE;                                                      \
+        MR_US_SPIN_LOCK(&((lc)->MR_lc_master_context_lock));                \
         if ((lc)->MR_lc_outstanding_workers > 0) {                          \
             /*                                                              \
             ** This context must wait until the workers are finished.       \
             ** This must be implemented as a macro, since we cannot move    \
             ** the C stack pointer without extra work.                      \
             */                                                              \
-            MR_US_SPIN_LOCK(&((lc)->MR_lc_master_context_lock));            \
             if ((lc)->MR_lc_outstanding_workers != 0) {                     \
                 MR_save_context(MR_ENGINE(MR_eng_this_context));            \
                 MR_ENGINE(MR_eng_this_context)->MR_ctxt_resume_owner_engine = \
@@ -400,14 +401,14 @@ extern MR_LoopControl   *MR_lc_create(un
                 MR_ENGINE(MR_eng_this_context) = NULL;                      \
                 MR_idle(); /* Release the engine to the idle loop */        \
             }                                                               \
+        }                                                                   \
             MR_US_UNLOCK(&((lc)->MR_lc_master_context_lock));               \
             /* Fall through to part2 */                                     \
-        }                                                                   \
     } while (0);
 
 #define MR_lc_finish_part2(lc)                                              \
     MR_IF_DEBUG_LOOP_CONTORL(                                               \
-        fprintf(stderr, "lc_finish_part2(%p)\n", (lc)));                    \
+        fprintf(stderr, "lc_finish_part2(%p), sp: %p\n", (lc), MR_sp));     \
                                                                             \
     do {                                                                    \
         unsigned i;                                                         \
@@ -417,6 +418,12 @@ extern MR_LoopControl   *MR_lc_create(un
         */                                                                  \
         for (i = 0; i < (lc)->MR_lc_num_slots; i++) {                       \
             if ((lc)->MR_lc_slots[i].MR_lcs_context != NULL) {              \
+                /*                                                          \
+                ** Note that we don't need to save the context here, it is  \
+                ** saved in MR_join_and_terminate, which is important,      \
+                ** because if we did save it here we would write invalid    \
+                ** data into it.                                            \
+                */                                                          \
                 MR_destroy_context((lc)->MR_lc_slots[i].MR_lcs_context);    \
             }                                                               \
         }                                                                   \
@@ -435,14 +442,15 @@ extern MR_Bool MR_lc_try_get_free_slot(M
 */
 #define MR_lc_wait_free_slot(lc, lcs_idx, retry_label)                      \
     MR_IF_DEBUG_LOOP_CONTORL(                                               \
-        fprintf(stderr, "lc_wait_free_slot(%p, _, %p)\n", (lc),             \
-            retry_label));                                                  \
+        fprintf(stderr, "lc_wait_free_slot(%p, _, %p), sp: %p\n",           \
+            (lc), (retry_label), MR_sp));                                   \
                                                                             \
     do {                                                                    \
         unsigned    hint, offset, i;                                        \
                                                                             \
         if ((lc)->MR_lc_outstanding_workers == (lc)->MR_lc_num_slots) {     \
             MR_US_SPIN_LOCK(&((lc)->MR_lc_master_context_lock));            \
+            MR_CPU_MFENCE;                                                  \
             /*                                                              \
             ** Re-check outstanding workers while holding the lock.         \
             ** This ensures that we only commit to sleeping while holding   \
@@ -457,10 +465,11 @@ extern MR_Bool MR_lc_try_get_free_slot(M
                 ** unblocked.                                               \
                 */                                                          \
                 ctxt = MR_ENGINE(MR_eng_this_context);                      \
-                (lc)->MR_lc_master_context = ctxt;                          \
                 MR_save_context(ctxt);                                      \
                 ctxt->MR_ctxt_resume = retry_label;                         \
                 ctxt->MR_ctxt_resume_owner_engine = MR_ENGINE(MR_eng_id);   \
+                (lc)->MR_lc_master_context = ctxt;                          \
+                MR_CPU_SFENCE;                                              \
                 MR_US_UNLOCK(&((lc)->MR_lc_master_context_lock));           \
                 MR_ENGINE(MR_eng_this_context) = NULL;                      \
                 MR_idle();                                                  \
@@ -483,7 +492,8 @@ extern MR_Bool MR_lc_try_get_free_slot(M
         }                                                                   \
                                                                             \
         MR_IF_DEBUG_LOOP_CONTORL(                                           \
-            fprintf(stderr, "lc_wait_free_slot returning %d\n", (lcs_idx)));\
+            fprintf(stderr, "lc_wait_free_slot returning %d, sp: %p\n",     \
+                (lcs_idx), MR_sp));                                         \
                                                                             \
     } while (0);
 
@@ -501,6 +511,9 @@ extern void MR_lc_spawn_off_func(MR_Loop
 */
 #define MR_lc_join_and_terminate(lc, lcs_idx)                               \
     do {                                                                    \
+        MR_IF_DEBUG_LOOP_CONTORL(                                           \
+            fprintf(stderr, "lc_join_and_terminate(%p, %d) parent_sp: %p\n",\
+                (lc), (lcs_idx), MR_parent_sp));                             \
         /*                                                                  \
         ** Termination of this context must be handled in a macro so that   \
         ** C's stack pointer is set correctly. It might appear that we      \
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20111016/9972c6fe/attachment.sig>


More information about the reviews mailing list