[m-dev.] GCC 4.6, 4.7 and parallel grades

Paul Bone paul at bone.id.au
Wed Jul 3 08:52:47 AEST 2013


On Wed, Jul 03, 2013 at 01:21:13AM +1000, Julien Fischer wrote:
>
>
> On Wed, 3 Jul 2013, Paul Bone wrote:
>
>> On Tue, Jul 02, 2013 at 10:58:45PM +1000, Julien Fischer wrote:
>>> On Tue, Jul 2, 2013 at 9:09 PM, Paul Bone <paul at bone.id.au> wrote:
>>>
>>>>
>>>> A user reported to me a bug whereby a program (not even a parallel one) can
>>>> crash (segfault) in a parallel low-level C grade.
>>>>
>>>> I've narrowed this down to GCC 4.6 and 4.7 (and maybe later versions too),
>>>> and the -freorder-functions optimisation that is enabled at -O2.  I think
>>>> that calling a closure is somehow involved - perhaps it's what is required
>>>> to make the bug come to the surface.
>>>>
>>>> Does anyone know how these three things (-freorder-functions,
>>>> parallelism/thread safety and closures) might combine to cause such a
>>>> problem?  In particular, do we at any point rely on the order that functions
>>>> appear in an executable's .text area?  This may just be an edge case that we
>>>> haven't hit before with Mercury's use of non-local gotos.
>>>>
>>>> I've disassembled both object files (working and broken) for the same
>>>> program and diffed them; there's no difference.  There is a difference in
>>>> the disassembly of the _init.o file: the main function is placed in a new
>>>> section named .text.startup.  I can't imagine how this could contribute to
>>>> the problem.
>>>
>>>
>>> What about the object files for the runtime?  (Particularly any that
>>> contain code that deals with closures.)
>>>
>>
>> I could fix the problem by changing the C options for the application alone.
>> In both cases the runtime and standard library were identical.  This was the
>> mandelbrot application, which is quite small and, if executed with the right
>> (default?) options, doesn't even call list.map.  The closure that triggered
>> the bug was called from mandelbrot.my_map, which is the same as list.map but
>> is included in the application for auto-parallelism tests.
>
> As a starting point, I suggest looking at the following:
>
> Does the problem occur if the program is built with -freorder-functions,
> but the _init.o file is not?

I had considered this, but it's only a curiosity at this point, as
-fno-reorder-functions fixes the problem.
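
For anyone who wants to see what -freorder-functions acts on, here is a
minimal sketch.  It is not taken from the Mercury sources: the function names
are hypothetical, and the hot/cold attributes are just one way of giving GCC
the frequency information that the option uses when it sorts functions into
separate text subsections.

```c
#include <assert.h>

/* Hypothetical error path, marked as rarely executed. */
__attribute__((cold, noinline))
int report_error(int code)
{
    return -code;
}

/* Hypothetical inner-loop function, marked as frequently executed. */
__attribute__((hot, noinline))
int step(int x)
{
    return x * 2 + 1;
}

/* Applies step() n times, starting from 0. */
int iterate(int n)
{
    int x = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        x = step(x);
    }
    return x;
}
```

Compiling this with "gcc -O2 -c" and listing the section headers with
"objdump -h", once with and once without -fno-reorder-functions, is a quick
way to compare how the functions are laid out on a given GCC version.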

> Does the problem occur at lower optimisation levels if
> -freorder-functions is enabled?
>
> Have you tried reproducing the bug in the reg.gc.par or none.gc.par
> grades?  (That should at least enable you to rule out whether it's an
> issue with non-local gotos or not.)

That is my next intended step.
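
As background on the gotos in question, the sketch below shows GCC's "labels
as values" extension, which is the building block for this style of control
flow.  It is only an illustration: the opcodes and the run function are
hypothetical, and every jump here stays inside a single function, whereas the
non-local gotos discussed above jump between functions.

```c
#include <assert.h>
#include <stddef.h>

/* Opcodes for a hypothetical three-instruction machine. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Runs a program (which must end in OP_HALT) and returns the accumulator. */
int run(const int *program)
{
    /* Table of label addresses, taken with GCC's unary && operator. */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;

    assert(program != NULL);

    /* Jump directly to the code for the first instruction. */
    goto *dispatch[*program++];

do_inc:
    acc += 1;
    goto *dispatch[*program++];

do_double:
    acc *= 2;
    goto *dispatch[*program++];

do_halt:
    return acc;
}
```

Grades that avoid this kind of goto take it out of the picture, which is why
reproducing the crash there would help narrow things down.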

> Does the problem occur in a non-GC grade, e.g. asm_fast.par?
> (Is the GC doing something odd?)
>

I'll check these out because it'd be good to know why this is happening.


-- 
Paul Bone
http://www.bone.id.au


