[m-dev.] GCC 4.6, 4.7 and parallel grades

Julien Fischer jfischer at opturion.com
Wed Jul 3 01:21:13 AEST 2013



On Wed, 3 Jul 2013, Paul Bone wrote:

> On Tue, Jul 02, 2013 at 10:58:45PM +1000, Julien Fischer wrote:
>> On Tue, Jul 2, 2013 at 9:09 PM, Paul Bone <paul at bone.id.au> wrote:
>>
>>>
>>> A user reported to me a bug whereby a program (not even a parallel one) can
>>> crash (segfault) in a parallel low-level C grade.
>>>
>>> I've narrowed this down to GCC 4.6 and 4.7 (and maybe later versions too) and
>>> the -freorder-functions optimisation, which is enabled at -O2.  I think that
>>> calling a closure is somehow involved; perhaps it's what is required to
>>> make the bug come to the surface.
>>>
>>> Does anyone know how these three things (-freorder-functions,
>>> parallelism/thread safety and closures) could cause such a crash?  In
>>> particular, do we at any point rely on the order in which functions appear in
>>> an executable's .text section?  This may just be an edge case that we haven't
>>> hit before with Mercury's use of non-local gotos.
>>>
>>> I've disassembled both object files (working and broken) for the same
>>> program and diffed them, there's no difference.  There is a difference in
>>> the disassembly of the _init.o file, the main function is placed in a new
>>> section named .text.startup.  I can't imagine how this could contribute to
>>> the problem.
>>
>>
>> What about in the object files for the runtime?  (Particularly any that
>> contain code that deals with closures.)
>>
>
> I could fix the problem by changing the C options for the application alone.
> In both cases the runtime and standard library were identical.  This was the
> mandelbrot application, which is quite small and, if executed with the right
> (default?) options, doesn't even call list.map.  The closure that triggered the
> bug was called from mandelbrot.my_map, which is the same as list.map but is
> included in the application for auto-parallelism tests.

As a starting point, I suggest looking at the following:

Does the problem occur if the program is built with -freorder-functions,
but the _init.o file is not?

Does the problem occur at lower optimisation levels if
-freorder-functions is enabled?

Have you tried reproducing the bug in the reg.gc.par or none.gc.par
grades?  (That should at least let you determine whether or not it's an
issue with non-local gotos.)

Does the problem occur in a nogc grade, e.g. asm_fast.par?
(Is the GC doing something odd?)

Cheers,
Julien.
