[m-dev.] GCC 4.6, 4.7 and parallel grades
Paul Bone
paul at bone.id.au
Wed Jul 3 08:52:47 AEST 2013
On Wed, Jul 03, 2013 at 01:21:13AM +1000, Julien Fischer wrote:
>
>
> On Wed, 3 Jul 2013, Paul Bone wrote:
>
>> On Tue, Jul 02, 2013 at 10:58:45PM +1000, Julien Fischer wrote:
>>> On Tue, Jul 2, 2013 at 9:09 PM, Paul Bone <paul at bone.id.au> wrote:
>>>
>>>>
>>>> A user reported to me a bug whereby a program (not even a parallel one) can
>>>> crash (segfault) in a parallel low-level C grade.
>>>>
>>>> I've narrowed this down to GCC 4.6 and 4.7 (maybe later versions too), and
>>>> the -freorder-functions optimisation that is enabled at -O2. I think that
>>>> calling a closure is somehow involved - perhaps it's what is required to
>>>> make the bug come to the surface.
>>>>
>>>> Does anyone know how these three things (-freorder-functions,
>>>> parallelism/thread safety and closures) may cause such a problem? In
>>>> particular, do we at any point rely on the order that functions appear in an
>>>> executable's .text area? This may just be an edge case that we haven't hit
>>>> before with Mercury's use of non-local gotos.
>>>>
>>>> I've disassembled both object files (working and broken) for the same
>>>> program and diffed them; there's no difference. There is a difference in
>>>> the disassembly of the _init.o file: the main function is placed in a new
>>>> section named .text.startup. I can't imagine how this could contribute to
>>>> the problem.
>>>
>>>
>>> What about in the object files for the runtime? (Particularly any that
>>> contain code that deals with closures.)
>>>
>>
>> I could fix the problem by changing the C options for the application alone.
>> In both cases the runtime and standard library were identical. This was the
>> mandelbrot application, which is quite small and if executed with the right
>> (default?) options doesn't even call list.map. The closure that triggered the
>> bug was called from mandelbrot.my_map, which is the same as list.map but
>> included in the application for auto-parallelism tests.
>
> As a starting point, I suggest looking at the following:
>
> Does the problem occur if the program is built with -freorder-functions,
> but the _init.o file is not?
I had considered this, but it's only a curiosity at this point as
-fno-reorder-functions fixes the problem.
> Does the problem occur at lower optimisation levels if
> -freorder-functions is enabled?
>
> Have you tried reproducing the bug in the reg.gc.par or none.gc.par
> grades? (That should at least enable you to rule out whether it's an
> issue with non-local gotos or not.)
That is my next intended step.
> Does the problem occur in a nogc grade, e.g. asm_fast.par?
> (Is the GC doing something odd?)
>
I'll check these out because it'd be good to know why this is happening.
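
For anyone following along who hasn't met the GCC extension behind the
gotos discussed in this thread, here is a minimal sketch of labels-as-values
(computed goto). This is not Mercury runtime code: the opcodes and run_ops
are hypothetical names invented for illustration, and it assumes GCC or a
compatible compiler such as Clang. It only shows jumping through a stored
label address within one function; it does not reproduce the bug.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy interpreter; not taken from Mercury. */
enum { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

/*
 * Runs the opcode sequence `ops` (which must end with OP_HALT) starting
 * from `acc`, dispatching with GCC's labels-as-values extension.
 */
long run_ops(const unsigned char *ops, long acc)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    size_t pc = 0;

    /* Jump to the handler for the first opcode. */
    goto *dispatch[ops[pc]];

do_inc:
    acc += 1;
    pc++;
    goto *dispatch[ops[pc]];

do_double:
    acc *= 2;
    pc++;
    goto *dispatch[ops[pc]];

do_halt:
    return acc;
}
```

Calling run_ops with the sequence OP_INC, OP_DOUBLE, OP_INC, OP_HALT and a
starting value of 1 walks through each handler via the table rather than
through a switch.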
--
Paul Bone
http://www.bone.id.au