[m-rev.] for post-commit review: Fix compatibility issues for the low-level C parallel grades.

Paul Bone pbone at csse.unimelb.edu.au
Mon Mar 22 17:47:20 AEDT 2010


On Sun, Mar 21, 2010 at 12:50:50AM +1100, Julien Fischer wrote:
> 
> Hi Paul,
> 
> The diff below does not correspond to the log message -- it appears to be
> the diff for the change you made that documents the .par grades.
> 
> Julien.
> 

Whoops, here's the same log message with the correct diff.

For post-commit review by anyone.  I'm committing this to the main branch now.
I'll let our release manager (Julien) review it, or wait for it to be reviewed,
before pushing it onto the 10.04 branch.

Thanks.


Branches: main, 10.04

Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
parallel and threadscope grades.

We had been using types with the wrong signedness when calling atomic
operations.  GCC 4.x also picked up an error where #elif was used instead of
#else.

While testing these changes on a 32bit system, more bugs were found on the
i386 architecture and on AMD processors.

runtime/mercury_atomic_ops.h:
runtime/mercury_atomic_ops.c:
    Add unsigned variants of the following atomic operations:
        increment,
        add,
        add_and_fetch,
        dec_and_is_zero.

    Add an unsigned variant of compare and swap, and rename the existing
    signed version from MR_compare_and_swap_word to MR_compare_and_swap_int.

    Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
    the end of the name.
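    A minimal sketch of the idea behind the paired _int/_uint operations,
    assuming GCC 4.1 or later for the __sync builtins.  The ex_* names and
    the ex_int/ex_uint typedefs are hypothetical stand-ins for the MR_
    functions and for MR_Integer/MR_Unsigned.  Because each entry point
    takes a pointer of exactly the type it updates, a volatile unsigned
    counter can be passed without a cast or a pointer-signedness warning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-sized types standing in for MR_Integer/MR_Unsigned. */
typedef intptr_t  ex_int;
typedef uintptr_t ex_uint;

/* Both variants use the same builtin but have separately typed pointers. */
static inline ex_int
ex_atomic_add_and_fetch_int(volatile ex_int *addr, ex_int addend)
{
    return __sync_add_and_fetch(addr, addend);
}

static inline ex_uint
ex_atomic_add_and_fetch_uint(volatile ex_uint *addr, ex_uint addend)
{
    return __sync_add_and_fetch(addr, addend);
}

/*
** Unsigned compare and swap: if *addr equals old, store new_val and
** return nonzero; otherwise leave *addr alone and return zero.
*/
static inline int
ex_compare_and_swap_uint(volatile ex_uint *addr, ex_uint old, ex_uint new_val)
{
    return __sync_bool_compare_and_swap(addr, old, new_val);
}

/* Decrement, and report whether the new value is zero (type suffix last). */
static inline int
ex_atomic_dec_and_is_zero_uint(volatile ex_uint *addr)
{
    return __sync_sub_and_fetch(addr, 1) == 0;
}
```

    With this split, both variants can share one implementation body (as
    the *_WORD_BODY macros in the diff do) while keeping distinct
    prototypes.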

    Use volatile storage in the MR_Stats structure.

    A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
    must use 64bit values.  Therefore 64bit values in the MR_Stats structure
    are now protected by a lock on 32bit machines.
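    A sketch of the 32bit fallback for MR_Stats, again assuming GCC __sync
    builtins.  The ex_* names, the EX_SPIN_LOCK/EX_UNLOCK macros and the use
    of UINTPTR_MAX to detect a 64bit word are all hypothetical (the runtime
    itself tests MR_LOW_TAG_BITS).  With a 64bit word each sum gets its own
    atomic add; otherwise both 64bit sums are updated together while holding
    a single spinlock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical spinlock in the style of MR_US_SPIN_LOCK/MR_US_UNLOCK. */
typedef volatile uintptr_t ex_lock;
#define EX_SPIN_LOCK(l)                                                     \
    do { while (!__sync_bool_compare_and_swap((l), 0, 1)) { } } while (0)
#define EX_UNLOCK(l)                                                        \
    do { __sync_lock_release(l); } while (0)   /* store 0, release */

/* Hypothetical word-size test; the runtime uses MR_LOW_TAG_BITS >= 3. */
#if UINTPTR_MAX >= UINT64_MAX
  #define EX_ATOMIC_64BIT_SUMS 1   /* 64bit word: atomic adds suffice */
#else
  #define EX_ATOMIC_64BIT_SUMS 0   /* 32bit word: protect sums with a lock */
#endif

typedef struct {
    volatile uintptr_t  count_recorded;
#if !EX_ATOMIC_64BIT_SUMS
    ex_lock             sums_lock;
#endif
    volatile int64_t    sum;
    volatile uint64_t   sum_squares;
} ex_stats;

static void
ex_stats_record(ex_stats *stats, int64_t duration)
{
    uint64_t duration_squared = (uint64_t) duration * (uint64_t) duration;

    /* The counter is word-sized, so it is always updated atomically. */
    __sync_add_and_fetch(&stats->count_recorded, 1);
#if EX_ATOMIC_64BIT_SUMS
    __sync_add_and_fetch(&stats->sum, duration);
    __sync_add_and_fetch(&stats->sum_squares, duration_squared);
#else
    EX_SPIN_LOCK(&stats->sums_lock);
    stats->sum += duration;
    stats->sum_squares += duration_squared;
    EX_UNLOCK(&stats->sums_lock);
#endif
}
```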

runtime/mercury_atomic_ops.h:
    Fix a typo in the i386 version of MR_atomic_dec_and_is_zero_uint(): the
    lock prefix was followed by ':' instead of ';'.

runtime/mercury_atomic_ops.c:
    AMD CPUs do not include the clock speed in the brand string, so Intel's
    documented method of extracting the CPU clock speed from it does not
    work on them.  When we cannot determine the CPU's clock speed, we write
    out threadscope timestamps in raw clock cycles rather than nanoseconds.
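    A sketch of extracting the clock speed from a brand string such as
    "Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz", assuming the string ends in a
    number followed by MHz, GHz or THz.  ex_parse_freq_from_brand_string is
    hypothetical and is not the runtime's parse_freq_from_x86_brand_string;
    like that function, it returns 0 when no frequency is present (as with
    AMD brand strings), and a caller would then fall back to raw cycles.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
** Hypothetical helper: return the clock speed in cycles per second, or 0
** if the brand string does not end in "<number><M|G|T>Hz".
*/
static uint64_t
ex_parse_freq_from_brand_string(const char *s)
{
    size_t      len = strlen(s);
    size_t      start;
    size_t      i;
    uint64_t    multiplier;
    uint64_t    whole = 0;
    uint64_t    frac = 0;
    uint64_t    frac_scale = 1;

    while (len > 0 && isspace((unsigned char) s[len - 1])) {
        len--;                              /* ignore trailing padding */
    }
    /* We need at least one digit, the unit letter and "Hz". */
    if (len < 4 || strncmp(s + len - 2, "Hz", 2) != 0) {
        return 0;
    }
    switch (s[len - 3]) {
        case 'M': multiplier = 1000000ULL; break;
        case 'G': multiplier = 1000000000ULL; break;
        case 'T': multiplier = 1000000000000ULL; break;
        default:  return 0;
    }
    /* Walk back over the digits and decimal point before the unit. */
    start = len - 3;
    while (start > 0 &&
        (isdigit((unsigned char) s[start - 1]) || s[start - 1] == '.'))
    {
        start--;
    }
    if (start == len - 3) {
        return 0;                           /* no number before the unit */
    }
    for (i = start; i < len - 3 && s[i] != '.'; i++) {
        whole = whole * 10 + (uint64_t) (s[i] - '0');
    }
    if (i < len - 3) {                      /* skip '.', read the fraction */
        for (i++; i < len - 3; i++) {
            if (s[i] == '.') {
                return 0;                   /* malformed: second '.' */
            }
            frac = frac * 10 + (uint64_t) (s[i] - '0');
            frac_scale *= 10;
        }
    }
    return whole * multiplier + frac * multiplier / frac_scale;
}
```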

    On i386, the ebx register is used to implement position-independent
    code (PIC), but the CPUID instruction writes part of its output to ebx.
    Save ebx on the C stack while we issue CPUID, copy the result out of
    ebx, and then restore it.

    We now pass native machine-sized values to the inline assembler code
    that implements RDTSC and RDTSCP, and combine the two 32bit halves of
    the result in C.
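    A sketch of the final step of the new MR_rdtsc()/MR_rdtscp() code,
    assuming the two 32bit halves have already been read from edx and eax.
    ex_combine_tsc is hypothetical.  The high half is widened to 64 bits
    before shifting, which is what assigning tsc_high to *tsc before the
    shift achieves in the diff; shifting a 32bit value left by 32 would be
    undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: join edx (high) and eax (low) into one counter. */
static uint64_t
ex_combine_tsc(uint32_t tsc_low, uint32_t tsc_high)
{
    uint64_t tsc = tsc_high;    /* widen first, so the shift is 64bit */

    tsc <<= 32;
    tsc |= tsc_low;
    return tsc;
}
```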

    Fix commenting style in some places.

runtime/mercury_atomic_ops.c:
    Fix some incorrect C preprocessor code for conditional compilation
    (#elif used where #else was intended).

runtime/mercury_grade.h:
    Increment the binary compatibility number.  This should have been done
    in an earlier change, when a change to the MR_runnext macro broke binary
    compatibility in the parallel low-level C grades.

runtime/mercury_context.h:
    In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
    remaining before the conjunction is complete.
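    A sketch of how an unsigned conjunct counter is used, assuming GCC
    __sync builtins; ex_sync_term, ex_init_sync_term and ex_join are
    hypothetical simplifications of MR_SyncTerm, MR_init_sync_term and the
    decrement in MR_do_join_and_continue().  Each finishing conjunct
    decrements the count, and only the one that takes it to zero sees a
    true result.  The number of remaining conjuncts is never negative, so
    an unsigned type matches its range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified sync term: only the remaining-conjunct count. */
typedef struct {
    volatile uintptr_t  count;
} ex_sync_term;

static void
ex_init_sync_term(ex_sync_term *st, uintptr_t nbranches)
{
    st->count = nbranches;
}

/* Returns nonzero only for the conjunct that completes the conjunction. */
static int
ex_join(ex_sync_term *st)
{
    return __sync_sub_and_fetch(&st->count, 1) == 0;
}
```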

runtime/mercury_threadscope.c:
    Record raw CPU clock ticks rather than nanoseconds when we don't know
    the processor's clock speed.
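    A sketch of the timestamp choice, assuming the detected clock speed is
    stored as 0 when it is unknown (as MR_cpu_cycles_per_sec is when
    brand-string parsing fails).  ex_tsc_to_timestamp is hypothetical: with
    a known clock speed it converts ticks to nanoseconds, otherwise it
    returns the raw tick count unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion from TSC ticks to a threadscope timestamp. */
static uint64_t
ex_tsc_to_timestamp(uint64_t tsc, uint64_t cycles_per_sec)
{
    if (cycles_per_sec == 0) {
        return tsc;                         /* unknown speed: raw ticks */
    }
    /* Split the division so that tsc * 1e9 cannot overflow 64 bits. */
    return (tsc / cycles_per_sec) * 1000000000ULL
        + (tsc % cycles_per_sec) * 1000000000ULL / cycles_per_sec;
}
```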

runtime/mercury_context.c:
runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
    Conform to changes in mercury_atomic_ops.h.

Index: runtime/mercury_atomic_ops.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_atomic_ops.c,v
retrieving revision 1.10
diff -u -p -b -r1.10 mercury_atomic_ops.c
--- runtime/mercury_atomic_ops.c	17 Feb 2010 02:37:44 -0000	1.10
+++ runtime/mercury_atomic_ops.c	20 Mar 2010 10:02:07 -0000
@@ -24,7 +24,7 @@
 
 MR_OUTLINE_DEFN(
     MR_bool
-    MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
+    MR_compare_and_swap_int(volatile MR_Integer *addr, MR_Integer old,
         MR_Integer new_val)
 ,
     {
@@ -33,6 +33,16 @@ MR_OUTLINE_DEFN(
 )
 
 MR_OUTLINE_DEFN(
+    MR_bool
+    MR_compare_and_swap_uint(volatile MR_Unsigned *addr, MR_Unsigned old,
+        MR_Unsigned new_val)
+,
+    {
+        MR_COMPARE_AND_SWAP_WORD_BODY;
+    }
+)
+
+MR_OUTLINE_DEFN(
     MR_Integer 
     MR_atomic_add_and_fetch_int(volatile MR_Integer *addr, MR_Integer addend)
 ,
@@ -42,6 +52,15 @@ MR_OUTLINE_DEFN(
 )
 
 MR_OUTLINE_DEFN(
+    MR_Unsigned 
+    MR_atomic_add_and_fetch_uint(volatile MR_Unsigned *addr, MR_Unsigned addend)
+,
+    {
+        MR_ATOMIC_ADD_AND_FETCH_UINT_BODY;
+    }
+)
+
+MR_OUTLINE_DEFN(
     void
     MR_atomic_add_int(volatile MR_Integer *addr, MR_Integer addend)
 ,
@@ -52,6 +71,15 @@ MR_OUTLINE_DEFN(
 
 MR_OUTLINE_DEFN(
     void
+    MR_atomic_add_uint(volatile MR_Unsigned *addr, MR_Unsigned addend)
+,
+    {
+        MR_ATOMIC_ADD_UINT_BODY;
+    }
+)
+
+MR_OUTLINE_DEFN(
+    void
     MR_atomic_sub_int(volatile MR_Integer *addr, MR_Integer x)
 ,
     {
@@ -70,6 +98,15 @@ MR_OUTLINE_DEFN(
 
 MR_OUTLINE_DEFN(
     void 
+    MR_atomic_inc_uint(volatile MR_Unsigned *addr)
+,
+    {
+        MR_ATOMIC_INC_UINT_BODY;
+    }
+)
+
+MR_OUTLINE_DEFN(
+    void 
     MR_atomic_dec_int(volatile MR_Integer *addr)
 ,
     {
@@ -79,10 +116,19 @@ MR_OUTLINE_DEFN(
 
 MR_OUTLINE_DEFN(
     MR_bool
-    MR_atomic_dec_int_and_is_zero(volatile MR_Integer *addr)
+    MR_atomic_dec_and_is_zero_int(volatile MR_Integer *addr)
+,
+    {
+        MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY;
+    }
+)
+
+MR_OUTLINE_DEFN(
+    MR_bool
+    MR_atomic_dec_and_is_zero_uint(volatile MR_Unsigned *addr)
 ,
     {
-        MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY;
+        MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY;
     }
 )
 
@@ -184,29 +230,29 @@ MR_do_cpu_feature_detection(void) {
         MR_rdtsc_is_available = MR_TRUE;
 
     /*
-     * BTW: Intel can't count:
-     *
-     * http://www.pagetable.com/?p=18
-     * http://www.codinghorror.com/blog/archives/000364.html
-     *
-     * 486 (1989): family 4
-     * Pentium (1993): family 5
-     * Pentium Pro (1995): family 6, models 0 and 1
-     * Pentium 2 (1997): family 6, models 3, 5 and 6
-     * Pentium 3 (2000): family 6, models 7, 8, 10, 11
-     * Itanium (2001): family 7
-     * Pentium 4 (2000): family 15/0
-     * Itanium 2 (2002): family 15/1 and 15/2
-     * Pentium D: family 15/4
-     * Pentium M (2003): family 6, models 9 and 13
-     * Core (2006): family 6, model 14
-     * Core 2 (2006): family 6, model 15
-     * i7: family 6, model 26
-     * Atom: family 6, model 28
-     *
-     * This list is incomplete, it doesn't cover AMD or any other brand of x86
-     * processor, and it probably doesn't cover all post-pentium Intel
-     * processors.
+    ** BTW: Intel can't count:
+    **
+    ** http://www.pagetable.com/?p=18
+    ** http://www.codinghorror.com/blog/archives/000364.html
+    **
+    ** 486 (1989): family 4
+    ** Pentium (1993): family 5
+    ** Pentium Pro (1995): family 6, models 0 and 1
+    ** Pentium 2 (1997): family 6, models 3, 5 and 6
+    ** Pentium 3 (2000): family 6, models 7, 8, 10, 11
+    ** Itanium (2001): family 7
+    ** Pentium 4 (2000): family 15/0
+    ** Itanium 2 (2002): family 15/1 and 15/2
+    ** Pentium D: family 15/4
+    ** Pentium M (2003): family 6, models 9 and 13
+    ** Core (2006): family 6, model 14
+    ** Core 2 (2006): family 6, model 15
+    ** i7: family 6, model 26
+    ** Atom: family 6, model 28
+    **
+    ** This list is incomplete, it doesn't cover AMD or any other brand of x86
+    ** processor, and it probably doesn't cover all post-pentium Intel
+    ** processors.
      */
 
     /* bits 8-11 (first bit (LSB) is bit 0) */
@@ -302,10 +348,12 @@ MR_do_cpu_feature_detection(void) {
         unsigned int shift;
 
         /*
-         * This processor supports the brand string from which we can extract
-         * the clock speed.  This algorithm is described in the Intel
-         * Instruction Set Reference, Volume 2B, Chapter 3, Pages 207-208, In
-         * particular the flow chart in figure 3-10.
+        ** This processor supports the brand string from which we can
+        ** try to extract the clock speed.  This algorithm is described
+        ** in the Intel Instruction Set Reference, Volume 2B, Chapter 3,
+        ** Pages 207-208, In particular the flow chart in figure 3-10.
+        ** This does not work on AMD processors since they don't include
+        ** the clock speed in the brand string.
          */
         for (page = 0; page < 3; page++) {
             MR_cpuid(page + 0x80000002, 0, &a, &b, &c, &d);
@@ -329,10 +377,14 @@ MR_do_cpu_feature_detection(void) {
 
         MR_cpu_cycles_per_sec = parse_freq_from_x86_brand_string(buff);
 #if MR_DEBUG_CPU_FEATURE_DETECTION
+        if (MR_cpu_cycles_per_sec == 0) {
+            fprintf(stderr, "Failed to detect cycles per second "
+                "you can probably blame AMD for this.\n");
+        } else {
         fprintf(stderr, "Cycles per second: %ld\n", MR_cpu_cycles_per_sec);
+        }
 #endif
     }
-
 #endif /* __GNUC__ && (__i386__ || __x86_64__) */
 }
 
@@ -347,9 +399,9 @@ parse_freq_from_x86_brand_string(char *s
     brand_string_len = strlen(string);
     
     /*
-     * There will be at least five characters if we can parse this, three
-     * for the '?Hz' suffix, at least one for the units, plus a space at
-     * the beginning o
+    ** There will be at least five characters if we can parse this, three
+    ** for the '?Hz' suffix, at least one for the units, plus a space at
+    ** the beginning of the number.
      */
     if (!(brand_string_len > 5))
         return 0;
@@ -430,11 +482,18 @@ MR_profiling_stop_timer(MR_Timer *timer,
         {
             duration = now.MR_timer_time - timer->MR_timer_time;
             duration_squared = duration * duration;
-            MR_atomic_inc_int(&(stats->MR_stat_count_recorded));
+            MR_atomic_inc_uint(&(stats->MR_stat_count_recorded));
+  #if MR_LOW_TAG_BITS >= 3
             MR_atomic_add_int(&(stats->MR_stat_sum), duration);
-            MR_atomic_add_int(&(stats->MR_stat_sum_squares), duration_squared);
+            MR_atomic_add_uint(&(stats->MR_stat_sum_squares), duration_squared);
+  #else
+            MR_US_SPIN_LOCK(&(stats->MR_stat_sums_lock));
+            stats->MR_stat_sum += duration;
+            stats->MR_stat_sum_squares += duration_squared;
+            MR_US_UNLOCK(&(stats->MR_stat_sums_lock));
+  #endif
         } else {
-            MR_atomic_inc_int(&(stats->MR_stat_count_not_recorded));
+            MR_atomic_inc_uint(&(stats->MR_stat_count_not_recorded));
         }
     }
     else if (MR_rdtsc_is_available == MR_TRUE)
@@ -442,11 +501,18 @@ MR_profiling_stop_timer(MR_Timer *timer,
         MR_rdtsc(&(now.MR_timer_time));
         duration = now.MR_timer_time - timer->MR_timer_time;
         duration_squared = duration * duration;
-        MR_atomic_inc_int(&(stats->MR_stat_count_recorded));
+        MR_atomic_inc_uint(&(stats->MR_stat_count_recorded));
+  #if MR_LOW_TAG_BITS >= 3
         MR_atomic_add_int(&(stats->MR_stat_sum), duration);
-        MR_atomic_add_int(&(stats->MR_stat_sum_squares), duration_squared);
+        MR_atomic_add_uint(&(stats->MR_stat_sum_squares), duration_squared);
+  #else
+        MR_US_SPIN_LOCK(&(stats->MR_stat_sums_lock));
+        stats->MR_stat_sum += duration;
+        stats->MR_stat_sum_squares += duration_squared;
+        MR_US_UNLOCK(&(stats->MR_stat_sums_lock));
+  #endif
     }
-#elif /* not __GNUC__ && (__i386__ || __x86_64__) */
+#else /* not __GNUC__ && (__i386__ || __x86_64__) */
     /* No TSC support on this architecture or with this C compiler */
     MR_atomic_inc_int(&(stats->MR_stat_count_recorded));
 #endif /* not __GNUC__ && (__i386__ || __x86_64__) */
@@ -464,7 +530,7 @@ MR_read_cpu_tsc(void)
         tsc = 0;
     }
     return tsc;
-#elif /* not __GNUC__ && (__i386__ || __x86_64__) */
+#else /* not __GNUC__ && (__i386__ || __x86_64__) */
     return 0;
 #endif /* not __GNUC__ && (__i386__ || __x86_64__) */
 }
@@ -477,35 +543,56 @@ MR_read_cpu_tsc(void)
 static __inline__ void 
 MR_cpuid(MR_Unsigned code, MR_Unsigned sub_code,
         MR_Unsigned *a, MR_Unsigned *b, MR_Unsigned *c, MR_Unsigned *d) {
+#ifdef __x86_64__
     __asm__("cpuid"
         : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
         : "0"(code), "2"(sub_code));
+#elif defined(__i386__)
+    /*
+    ** i386 is more register starved; in particular we can't use ebx in
+    ** position independent code.  And we can't move ebx into another
+    ** general purpose register: between register pinning, PIC, the
+    ** stack and frame pointers, and the other registers used by CPUID,
+    ** there are literally no general purpose registers left on i386.
+    */
+    __asm__("pushl %%ebx; \
+             cpuid; \
+             movl %%ebx, %1; \
+             popl %%ebx;"
+        : "=a"(*a), "=m"(*b), "=c"(*c), "=d"(*d)
+        : "0"(code), "2"(sub_code)
+        : "memory");
+#endif
 }
 
 static __inline__ void
 MR_rdtscp(MR_uint_least64_t *tsc, MR_Unsigned *processor_id) {
-    MR_uint_least64_t tsc_high;
+    MR_Unsigned tsc_low;
+    MR_Unsigned tsc_high;
 
     /*
     ** On 64bit systems the high 32 bits of RAX and RDX are 0 filled by
     ** rdtsc{p}
     */
     __asm__("rdtscp"
-           : "=a"(*tsc), "=d"(tsc_high), "=c"(*processor_id));
+           : "=a"(tsc_low), "=d"(tsc_high), "=c"(*processor_id));
 
-    tsc_high = tsc_high << 32;
-    *tsc |= tsc_high; 
+    *tsc = tsc_high;
+    *tsc = *tsc << 32;
+    *tsc |= tsc_low;
 }
 
 static __inline__ void
 MR_rdtsc(MR_uint_least64_t *tsc) {
-    MR_uint_least64_t tsc_high;
+    MR_Unsigned tsc_low;
+    MR_Unsigned tsc_high;
 
     __asm__("rdtsc"
-           : "=a"(*tsc), "=d"(tsc_high));
+           : "=a"(tsc_low), "=d"(tsc_high));
 
-    tsc_high = tsc_high << 32;
-    *tsc |= tsc_high; 
+    *tsc = tsc_high;
+    *tsc = *tsc << 32;
+    *tsc |= tsc_low;
 }
 
 #endif /* __GNUC__ && (__i386__ || __x86_64__) */
Index: runtime/mercury_atomic_ops.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_atomic_ops.h,v
retrieving revision 1.14
diff -u -p -b -r1.14 mercury_atomic_ops.h
--- runtime/mercury_atomic_ops.h	17 Feb 2010 02:46:17 -0000	1.14
+++ runtime/mercury_atomic_ops.h	20 Mar 2010 10:02:07 -0000
@@ -72,8 +72,11 @@
 ** Otherwise return false.
 */
 MR_EXTERN_INLINE MR_bool
-MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
+MR_compare_and_swap_int(volatile MR_Integer *addr, MR_Integer old,
         MR_Integer new_val);
+MR_EXTERN_INLINE MR_bool
+MR_compare_and_swap_uint(volatile MR_Unsigned *addr, MR_Unsigned old,
+        MR_Unsigned new_val);
 
 /*
 ** Atomically add to an integer in memory and retrieve the result.  In other
@@ -81,6 +84,8 @@ MR_compare_and_swap_word(volatile MR_Int
 */
 MR_EXTERN_INLINE MR_Integer 
 MR_atomic_add_and_fetch_int(volatile MR_Integer *addr, MR_Integer addend);
+MR_EXTERN_INLINE MR_Unsigned
+MR_atomic_add_and_fetch_uint(volatile MR_Unsigned *addr, MR_Unsigned addend);
 
 /*
 ** Atomically add the second argument to the memory pointed to by the first
@@ -88,6 +93,8 @@ MR_atomic_add_and_fetch_int(volatile MR_
 */
 MR_EXTERN_INLINE void
 MR_atomic_add_int(volatile MR_Integer *addr, MR_Integer addend);
+MR_EXTERN_INLINE void
+MR_atomic_add_uint(volatile MR_Unsigned *addr, MR_Unsigned addend);
 
 /*
 ** Atomically subtract the second argument from the memory pointed to by the
@@ -101,6 +108,8 @@ MR_atomic_sub_int(volatile MR_Integer *a
 */
 MR_EXTERN_INLINE void
 MR_atomic_inc_int(volatile MR_Integer *addr);
+MR_EXTERN_INLINE void
+MR_atomic_inc_uint(volatile MR_Unsigned *addr);
 
 /*
 ** Decrement the word pointed at by the address.
@@ -113,7 +122,9 @@ MR_atomic_dec_int(volatile MR_Integer *a
 ** zero after the decrement.
 */
 MR_EXTERN_INLINE MR_bool 
-MR_atomic_dec_int_and_is_zero(volatile MR_Integer *addr);
+MR_atomic_dec_and_is_zero_int(volatile MR_Integer *addr);
+MR_EXTERN_INLINE MR_bool 
+MR_atomic_dec_and_is_zero_uint(volatile MR_Unsigned *addr);
 
 /*
 ** For information about GCC's builtins for atomic operations see:
@@ -145,7 +156,7 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 : "=m"(*addr), "=q"(result), "=a"(old)                      \
                 : "m"(*addr), "r" (new_val), "a"(old)                       \
             );                                                              \
-            return (int) result;                                            \
+            return (MR_bool) result;                                        \
         } while (0)
 
 #elif defined(__GNUC__) && defined(__i386__)
@@ -160,18 +171,25 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 : "=m"(*addr), "=q"(result), "=a"(old)                      \
                 : "m"(*addr), "r" (new_val), "a"(old)                       \
                 );                                                          \
-            return (int) result;                                            \
+            return (MR_bool) result;                                        \
         } while (0)
 
 #endif
 
 #ifdef MR_COMPARE_AND_SWAP_WORD_BODY
     MR_EXTERN_INLINE MR_bool
-    MR_compare_and_swap_word(volatile MR_Integer *addr, MR_Integer old,
+    MR_compare_and_swap_int(volatile MR_Integer *addr, MR_Integer old,
             MR_Integer new_val) 
     {
         MR_COMPARE_AND_SWAP_WORD_BODY;
     }
+
+    MR_EXTERN_INLINE MR_bool
+    MR_compare_and_swap_uint(volatile MR_Unsigned *addr, MR_Unsigned old,
+            MR_Unsigned new_val)
+    {
+        MR_COMPARE_AND_SWAP_WORD_BODY;
+    }
 #endif
 
 /*---------------------------------------------------------------------------*/
@@ -179,22 +197,39 @@ MR_atomic_dec_int_and_is_zero(volatile M
 #if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)) && \
     !defined(MR_AVOID_COMPILER_INTRINSICS)
 
-    #define MR_ATOMIC_ADD_AND_FETCH_INT_BODY                                \
+    #define MR_ATOMIC_ADD_AND_FETCH_WORD_BODY                               \
         do {                                                                \
             return __sync_add_and_fetch(addr, addend);                      \
         } while (0)
 
+    #define MR_ATOMIC_ADD_AND_FETCH_INT_BODY MR_ATOMIC_ADD_AND_FETCH_WORD_BODY
+    #define MR_ATOMIC_ADD_AND_FETCH_UINT_BODY MR_ATOMIC_ADD_AND_FETCH_WORD_BODY
+
 #elif defined(MR_COMPARE_AND_SWAP_WORD_BODY)
     /*
     ** If there is no GCC builtin for this then it can be implemented in terms
     ** of compare and swap, assuming that that has been implemented in
     ** assembler for this architecture.
+    **
+    ** XXX: There is an exchange-and-add (xadd) instruction on x86; it would
+    ** be better than the CAS loop below.
     */
     #define MR_ATOMIC_ADD_AND_FETCH_INT_BODY                                \
         do {                                                                \
             MR_Integer temp;                                                \
             temp = *addr;                                                   \
-            while (!MR_compare_and_swap_word(addr, temp, temp+addend)) {    \
+            while (!MR_compare_and_swap_int(addr, temp, temp+addend)) {     \
+                MR_ATOMIC_PAUSE;                                            \
+                temp = *addr;                                               \
+            }                                                               \
+            return temp+addend;                                             \
+        } while (0)
+
+    #define MR_ATOMIC_ADD_AND_FETCH_UINT_BODY                               \
+        do {                                                                \
+            MR_Unsigned temp;                                               \
+            temp = *addr;                                                   \
+            while (!MR_compare_and_swap_uint(addr, temp, temp+addend)) {    \
                 MR_ATOMIC_PAUSE;                                            \
                 temp = *addr;                                               \
             }                                                               \
@@ -211,12 +246,20 @@ MR_atomic_dec_int_and_is_zero(volatile M
     }
 #endif
 
+#ifdef MR_ATOMIC_ADD_AND_FETCH_UINT_BODY
+    MR_EXTERN_INLINE MR_Unsigned
+    MR_atomic_add_and_fetch_uint(volatile MR_Unsigned *addr, MR_Unsigned addend)
+    {
+        MR_ATOMIC_ADD_AND_FETCH_UINT_BODY;
+    }
+#endif
+
 /*---------------------------------------------------------------------------*/
 
 #if defined(__GNUC__) && defined(__x86_64__) && \
     !defined(MR_AVOID_HANDWRITTEN_ASSEMBLER)
 
-    #define MR_ATOMIC_ADD_INT_BODY                                          \
+    #define MR_ATOMIC_ADD_WORD_BODY                                         \
         do {                                                                \
             __asm__ __volatile__(                                           \
                 "lock; addq %2, %0"                                         \
@@ -225,9 +268,12 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 );                                                          \
         } while (0)
     
+    #define MR_ATOMIC_ADD_INT_BODY MR_ATOMIC_ADD_WORD_BODY
+    #define MR_ATOMIC_ADD_UINT_BODY MR_ATOMIC_ADD_WORD_BODY
+
 #elif defined(__GNUC__) && defined(__i386__)
     
-    #define MR_ATOMIC_ADD_INT_BODY                                          \
+    #define MR_ATOMIC_ADD_WORD_BODY                                         \
         do {                                                                \
             __asm__ __volatile__(                                           \
                 "lock; addl %2, %0;"                                        \
@@ -236,6 +282,9 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 );                                                          \
         } while (0)
 
+    #define MR_ATOMIC_ADD_INT_BODY MR_ATOMIC_ADD_WORD_BODY
+    #define MR_ATOMIC_ADD_UINT_BODY MR_ATOMIC_ADD_WORD_BODY
+
 #elif defined(MR_ATOMIC_ADD_AND_FETCH_INT_BODY)
 
     #define MR_ATOMIC_ADD_INT_BODY                                          \
@@ -243,6 +292,11 @@ MR_atomic_dec_int_and_is_zero(volatile M
             MR_atomic_add_and_fetch_int(addr, addend);                      \
         } while (0)
 
+    #define MR_ATOMIC_ADD_UINT_BODY                                         \
+        do {                                                                \
+            MR_atomic_add_and_fetch_uint(addr, addend);                     \
+        } while (0)
+
 #endif
 
 #ifdef MR_ATOMIC_ADD_INT_BODY
@@ -253,6 +307,14 @@ MR_atomic_dec_int_and_is_zero(volatile M
     }
 #endif
 
+#ifdef MR_ATOMIC_ADD_UINT_BODY
+    MR_EXTERN_INLINE void 
+    MR_atomic_add_uint(volatile MR_Unsigned *addr, MR_Unsigned addend)
+    {
+        MR_ATOMIC_ADD_UINT_BODY;
+    }
+#endif
+
 /*---------------------------------------------------------------------------*/
 
 #if defined(__GNUC__) && defined(__x86_64__) && \
@@ -300,7 +362,7 @@ MR_atomic_dec_int_and_is_zero(volatile M
 #if defined(__GNUC__) && defined(__x86_64__) && \
     !defined(MR_AVOID_HANDWRITTEN_ASSEMBLER)
 
-    #define MR_ATOMIC_INC_INT_BODY                                          \
+    #define MR_ATOMIC_INC_WORD_BODY                                         \
         do {                                                                \
             __asm__ __volatile__(                                           \
                 "lock; incq %0;"                                            \
@@ -309,11 +371,14 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 );                                                          \
         } while (0)
 
+    #define MR_ATOMIC_INC_INT_BODY MR_ATOMIC_INC_WORD_BODY
+    #define MR_ATOMIC_INC_UINT_BODY MR_ATOMIC_INC_WORD_BODY
+
 #elif defined(__GNUC__) && defined(__i386__) && \
     !defined(MR_AVOID_HANDWRITTEN_ASSEMBLER)
 
     /* Really 486 or better. */
-    #define MR_ATOMIC_INC_INT_BODY                                          \
+    #define MR_ATOMIC_INC_WORD_BODY                                         \
         do {                                                                \
             __asm__ __volatile__(                                           \
                 "lock; incl %0;"                                            \
@@ -322,6 +387,9 @@ MR_atomic_dec_int_and_is_zero(volatile M
                 );                                                          \
         } while (0)
 
+    #define MR_ATOMIC_INC_INT_BODY MR_ATOMIC_INC_WORD_BODY
+    #define MR_ATOMIC_INC_UINT_BODY MR_ATOMIC_INC_WORD_BODY
+
 #else
 
     /*
@@ -332,7 +400,9 @@ MR_atomic_dec_int_and_is_zero(volatile M
     **  - pbone
     */
     #define MR_ATOMIC_INC_INT_BODY                                          \
-        MR_atomic_add_int(addr, 1)                                          \
+        MR_atomic_add_int(addr, 1)
+    #define MR_ATOMIC_INC_UINT_BODY                                          \
+        MR_atomic_add_uint(addr, 1)
 
 #endif
 
@@ -344,6 +414,14 @@ MR_atomic_dec_int_and_is_zero(volatile M
     }
 #endif
 
+#ifdef MR_ATOMIC_INC_UINT_BODY
+    MR_EXTERN_INLINE void
+    MR_atomic_inc_uint(volatile MR_Unsigned *addr)
+    {
+        MR_ATOMIC_INC_UINT_BODY;
+    }
+#endif
+
 /*---------------------------------------------------------------------------*/
 
 #if defined(__GNUC__) && defined(__x86_64__) && \
@@ -399,11 +477,11 @@ MR_atomic_dec_int_and_is_zero(volatile M
 
 /*
 ** This could be trivially implemented using the __sync_sub_and_fetch compiler
-** intrinsic.  However on X86(_64) this will use a compare and exchange loop.
-** We can avoid this because we don't need to retrieve the result of the
+** intrinsic.  However on some platforms this could use a compare and exchange
+** loop. We can avoid this because we don't need to retrieve the result of the
 ** subtraction.
 */
-    #define MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY                              \
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY                             \
         do {                                                                \
             char is_zero;                                                   \
             __asm__(                                                        \
@@ -414,33 +492,56 @@ MR_atomic_dec_int_and_is_zero(volatile M
             return (MR_bool)is_zero;                                        \
         } while (0)
 
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+
 #elif defined(__GNUC__) && defined(__i386__)
     
-    #define MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY                              \
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY                              \
         do {                                                                \
             char is_zero;                                                   \
             __asm__(                                                        \
-                "lock: subl $1, %0; setz %1"                                \
+                "lock; subl $1, %0; setz %1"                                \
                 : "=m"(*addr), "=q"(is_zero)                                \
                 : "m"(*addr)                                                \
                 );                                                          \
             return (MR_bool)is_zero;                                        \
         } while (0)
 
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+
 #elif __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)
 
-    #define MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY                              \
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY                             \
         do {                                                                \
             return (__sync_sub_and_fetch(addr, 1) == 0);                    \
         } while (0)
 
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+    #define MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY \
+        MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
+
+#endif
+
+#ifdef MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY
+    MR_EXTERN_INLINE MR_bool 
+    MR_atomic_dec_and_is_zero_int(volatile MR_Integer *addr)
+    {
+        MR_ATOMIC_DEC_AND_IS_ZERO_INT_BODY;
+    }
 #endif
 
-#ifdef MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY
+#ifdef MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY
     MR_EXTERN_INLINE MR_bool 
-    MR_atomic_dec_int_and_is_zero(volatile MR_Integer *addr)
+    MR_atomic_dec_and_is_zero_uint(volatile MR_Unsigned *addr)
     {
-        MR_ATOMIC_DEC_INT_AND_IS_ZERO_BODY;
+        MR_ATOMIC_DEC_AND_IS_ZERO_UINT_BODY;
     }
 #endif
 
@@ -513,11 +614,11 @@ typedef MR_Unsigned MR_Us_Lock;
 #define MR_US_LOCK_INITIAL_VALUE (0)
 
 #define MR_US_TRY_LOCK(x)                                                   \
-    MR_compare_and_swap_word(x, 0, 1)
+    MR_compare_and_swap_uint(x, 0, 1)
 
 #define MR_US_SPIN_LOCK(x)                                                  \
     do {                                                                    \
-        while (!MR_compare_and_swap_word(x, 0, 1)) {                        \
+        while (!MR_compare_and_swap_uint(x, 0, 1)) {                        \
             MR_ATOMIC_PAUSE;                                                \
         }                                                                   \
     } while (0)
@@ -577,18 +678,29 @@ typedef MR_Unsigned MR_Us_Cond;
 */
 
 typedef struct {
-    MR_Unsigned         MR_stat_count_recorded;
-    MR_Unsigned         MR_stat_count_not_recorded;
         /*
-        ** The total number of times this event occurred is implicitly the
-        ** sum of the recorded and not_recorded counts.
+    ** The total number of times this event occurred is implicitly the sum of
+    ** the recorded and not_recorded counts.
         */
-    MR_int_least64_t    MR_stat_sum;
-    MR_uint_least64_t   MR_stat_sum_squares;
+    volatile MR_Unsigned    MR_stat_count_recorded;
+    volatile MR_Unsigned    MR_stat_count_not_recorded;
+
         /*
-        ** The sum of squares is used to calculate variance and standard
-        ** deviation.
+    ** Atomic instructions are used to update these fields, and these fields
+    ** must be 64 bit to contain the valid ranges of values.  However a 32 bit
+    ** machine cannot (usually) do atomic operations on 64 bit data.  Therefore
+    ** if we have fewer than 64 bits we protect these two fields with a lock.
+    **
+    ** The sum of squares is used to calculate variance and standard deviation.
         */
+  #if MR_LOW_TAG_BITS >= 3
+    volatile MR_Integer     MR_stat_sum;
+    volatile MR_Unsigned    MR_stat_sum_squares;
+  #else
+    MR_Us_Lock              MR_stat_sums_lock;
+    MR_int_least64_t        MR_stat_sum;
+    MR_uint_least64_t       MR_stat_sum_squares;
+  #endif
 } MR_Stats;
 
 typedef struct {
Index: runtime/mercury_context.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.c,v
retrieving revision 1.78
diff -u -p -b -r1.78 mercury_context.c
--- runtime/mercury_context.c	17 Feb 2010 02:37:45 -0000	1.78
+++ runtime/mercury_context.c	20 Mar 2010 10:02:07 -0000
@@ -1311,7 +1311,7 @@ MR_do_join_and_continue(MR_SyncTerm *jnc
      * accurate. 
      */
 
-    jnc_last = MR_atomic_dec_int_and_is_zero(&(jnc_st->MR_st_count));
+    jnc_last = MR_atomic_dec_and_is_zero_uint(&(jnc_st->MR_st_count));
 
     if (jnc_last) {
         if (this_context == jnc_st->MR_st_orig_context) {
Index: runtime/mercury_context.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_context.h,v
retrieving revision 1.59
diff -u -p -b -r1.59 mercury_context.h
--- runtime/mercury_context.h	10 Jan 2010 04:53:39 -0000	1.59
+++ runtime/mercury_context.h	20 Mar 2010 10:02:07 -0000
@@ -735,13 +735,13 @@ extern  void        MR_schedule_context(
   ** If you change MR_SyncTerm_Struct you need to update configure.in.
   **
   ** MR_st_count is manipulated via atomic operations, therefore it is declared
-  ** as volatile and an MR_Integer.
+  ** as volatile.
   */
 
   struct MR_SyncTerm_Struct {
     MR_Context      *MR_st_orig_context;
     MR_Word             *MR_st_parent_sp;
-    volatile MR_Integer MR_st_count;
+    volatile MR_Unsigned    MR_st_count;
   };
 
   #define MR_init_sync_term(sync_term, nbranches)                             \
Index: runtime/mercury_grade.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_grade.h,v
retrieving revision 1.81
diff -u -p -b -r1.81 mercury_grade.h
--- runtime/mercury_grade.h	11 Feb 2010 04:36:10 -0000	1.81
+++ runtime/mercury_grade.h	20 Mar 2010 10:02:07 -0000
@@ -63,7 +63,7 @@
 ** compatibility only in debugging and deep profiling grades respectively.
 */
 
-#define MR_GRADE_PART_0 v15_
+#define MR_GRADE_PART_0 v16_
 #define MR_GRADE_EXEC_TRACE_VERSION_NO  9
 #define MR_GRADE_DEEP_PROF_VERSION_NO   3
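Bumping MR_GRADE_PART_0 marks code compiled against the old runtime headers as incompatible, presumably because structures such as MR_Stats changed layout. A generic sketch of how a version token can turn such a mismatch into a link error; the macro and symbol names are hypothetical and this is not a copy of mercury_grade.h. The token is pasted into a symbol name, so code built with v15_ headers would reference runtime_grade_v15_, which a v16_ runtime does not define:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical version token, mirroring MR_GRADE_PART_0 in the patch. */
#define GRADE_PART_0        v16_

/* Two-level macros so that arguments are expanded before pasting/quoting. */
#define PASTE2(a, b)        PASTE2_(a, b)
#define PASTE2_(a, b)       a##b
#define STRINGIFY(x)        STRINGIFY_(x)
#define STRINGIFY_(x)       #x

/* Expands to the identifier runtime_grade_v16_. */
#define GRADE_SYMBOL        PASTE2(runtime_grade_, GRADE_PART_0)

/* Client code would declare and reference the symbol ... */
extern const char GRADE_SYMBOL[];

/* ... and only a runtime built with the same token defines it. */
const char GRADE_SYMBOL[] = STRINGIFY(GRADE_SYMBOL);

static int
grade_matches(const char *expected)
{
    return strcmp(GRADE_SYMBOL, expected) == 0;
}
```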
 
Index: runtime/mercury_threadscope.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_threadscope.c,v
retrieving revision 1.5
diff -u -p -b -r1.5 mercury_threadscope.c
--- runtime/mercury_threadscope.c	17 Feb 2010 03:44:13 -0000	1.5
+++ runtime/mercury_threadscope.c	20 Mar 2010 10:02:07 -0000
@@ -1291,8 +1291,17 @@ get_current_time_nanosecs(void)
     MercuryEngine       *eng = MR_thread_engine_base;
 
     current_tsc = MR_read_cpu_tsc();
+
+    if (MR_cpu_cycles_per_sec == 0) {
+        return current_tsc + eng->MR_eng_cpu_clock_ticks_offset;
+    } else {
+        /*
+        ** Dividing by 10^9 turns cycles per second into cycles per
+        ** nanosecond.
+        */
     return (current_tsc + eng->MR_eng_cpu_clock_ticks_offset) / 
         (MR_cpu_cycles_per_sec / 1000000000);
+    }
 }
 
 /***************************************************************************/
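If MR_cpu_cycles_per_sec is an integer type (as the == 0 test suggests), then MR_cpu_cycles_per_sec / 1000000000 truncates: a 2.4 GHz clock becomes 2 cycles per nanosecond, a 20% error, and a clock slower than 1 GHz gives 0, so the division would be by zero. A minimal sketch of a conversion that avoids both, assuming 64-bit unsigned tick counts; ticks_to_nanosecs is a hypothetical helper that keeps the patch's fallback of returning raw ticks when the frequency is unknown:

```c
#include <assert.h>
#include <stdint.h>

/*
** Convert a tick count into nanoseconds.  If the clock frequency is unknown
** (zero), return the raw ticks, as the patch does.
*/
static uint64_t
ticks_to_nanosecs(uint64_t ticks, uint64_t ticks_per_sec)
{
    const uint64_t  nanosecs_per_sec = 1000000000u;
    uint64_t        whole_secs;
    uint64_t        rem_ticks;

    if (ticks_per_sec == 0) {
        return ticks;
    }

    /* Split off whole seconds first so the multiplication stays small. */
    whole_secs = ticks / ticks_per_sec;
    rem_ticks = ticks % ticks_per_sec;

    /*
    ** rem_ticks < ticks_per_sec, so rem_ticks * 10^9 fits in 64 bits for
    ** any clock below about 18 GHz.
    */
    return whole_secs * nanosecs_per_sec
        + rem_ticks * nanosecs_per_sec / ticks_per_sec;
}
```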
Index: runtime/mercury_wsdeque.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wsdeque.c,v
retrieving revision 1.1
diff -u -p -b -r1.1 mercury_wsdeque.c
--- runtime/mercury_wsdeque.c	11 Oct 2007 11:45:22 -0000	1.1
+++ runtime/mercury_wsdeque.c	20 Mar 2010 10:02:07 -0000
@@ -85,7 +85,7 @@ MR_wsdeque_steal_top(MR_SparkDeque *dq, 
     }
 
     *ret_spark = MR_sa_element(arr, top);
-    if (!MR_compare_and_swap_word(&dq->MR_sd_top, top, top + 1)) {
+    if (!MR_compare_and_swap_int(&dq->MR_sd_top, top, top + 1)) {
         return -1;  /* abort */
     }
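MR_wsdeque_steal_top() commits a steal only if MR_sd_top still holds the value it read, so a thief that loses a race aborts. A simplified sketch of that pattern, assuming GCC's __sync_bool_compare_and_swap and __sync_synchronize builtins; Integer, Deque, DEQUE_SIZE, compare_and_swap_int and deque_steal_top are hypothetical, the array is fixed-size, and the sketch does not reproduce every memory-ordering detail of the runtime's deque. A signed index allows bottom - top to go transiently negative while the owner pops, which is presumably why a signed compare-and-swap is wanted here:

```c
#include <assert.h>
#include <stdbool.h>

typedef long Integer;       /* hypothetical stand-in for MR_Integer */

#define DEQUE_SIZE 16       /* hypothetical fixed capacity */

typedef struct {
    volatile Integer    top;
    volatile Integer    bottom;
    int                 items[DEQUE_SIZE];
} Deque;

/* Atomically: if *addr == old_val, store new_val and return true. */
static inline bool
compare_and_swap_int(volatile Integer *addr, Integer old_val, Integer new_val)
{
    return __sync_bool_compare_and_swap(addr, old_val, new_val);
}

/*
** Returns 1 and stores the stolen item on success, 0 if the deque looked
** empty, and -1 if another thread moved top first (abort).
*/
static int
deque_steal_top(Deque *dq, int *ret_item)
{
    Integer top = dq->top;
    Integer bottom;

    __sync_synchronize();   /* read top before bottom */
    bottom = dq->bottom;
    if (bottom - top <= 0) {
        return 0;
    }

    *ret_item = dq->items[top % DEQUE_SIZE];
    if (!compare_and_swap_int(&dq->top, top, top + 1)) {
        return -1;
    }
    return 1;
}
```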
 
Index: runtime/mercury_wsdeque.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wsdeque.h,v
retrieving revision 1.2
diff -u -p -b -r1.2 mercury_wsdeque.h
--- runtime/mercury_wsdeque.h	15 Dec 2009 02:29:07 -0000	1.2
+++ runtime/mercury_wsdeque.h	20 Mar 2010 10:02:07 -0000
@@ -149,7 +149,7 @@ MR_wsdeque_pop_bottom(MR_SparkDeque *dq,
     }
 
     /* size = 0 */
-    success = MR_compare_and_swap_word(&dq->MR_sd_top, top, top + 1);
+    success = MR_compare_and_swap_int(&dq->MR_sd_top, top, top + 1);
     dq->MR_sd_bottom = top + 1;
     return success;
 }