[m-rev.] For review: Workaround Linux NPTL TSX bug (Mercury Bug: 334)

Paul Bone paul at bone.id.au
Fri Jun 27 11:27:40 AEST 2014


On Fri, Jun 27, 2014 at 10:27:02AM +1000, Peter Wang wrote:
> On Thu, 26 Jun 2014 18:25:28 +1000, Paul Bone <paul at bone.id.au> wrote:
> > Branches: master, version-14_01-branch
> > 
> > Workaround Linux NPTL TSX bug (Mercury Bug: 334)
> > 
> > New versions of glibc on x86_64 attempt to use the TSX extension of newer
> > Intel processors.  This converts mutex-protected critical sections into
> > transactional memory critical sections.  However the implementation appears
> > to be buggy and the marker lock in Boehm GC causes an assertion to be
> > triggered.
> 
> Hi Paul,
> 
> Wouldn't this problem potentially affect every other mutex as well?
> If so, continuing to work on top of this workaround seems risky.

Theoretically yes; however, it depends on the cause of the problem, which I'm
not certain about.  The workaround above makes things reliable enough that I
can bootcheck in asm_fast.gc.par.stseg, which is pretty good.

I'm guessing, but I think that what's happening is that:

    1) This mutex is held for a long time, meaning that there's a high
       chance that the transaction is aborted due to a buffer overflow (in the
       processor's TSX implementation) or system call or context switch.
    2) The mutex is used with a condition variable, which is perhaps related
       to the bug in NPTL.

If I'm right about this then it should be possible to create a small test
program that triggers the bug.
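
As a starting point, a minimal sketch of such a test program might look like
the following.  It is only an illustration of the two guesses above and is not
known to reproduce the bug.  The names (run_handoffs, worker, scratch) and the
sizes are hypothetical, and the assumption that touching 1 MiB of memory plus
a sched_yield() call is enough to abort a TSX transaction is mine.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* One mutex paired with one condition variable, as in guess 2. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int turn = 0;                   /* 0: worker's turn, 1: caller's turn */

/* Assumption: 1 MiB is far more than a transaction can buffer. */
static volatile char scratch[1 << 20];

static void *worker(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&lock);
        while (turn != 0)
            pthread_cond_wait(&cond, &lock);

        /* Guess 1: hold the mutex for a long time.  Touch many cache
         * lines and make a system call inside the critical section. */
        for (size_t j = 0; j < sizeof scratch; j += 64)
            scratch[j]++;
        sched_yield();

        turn = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs `rounds` worker/caller handoffs.  Returns the number of handoffs
 * completed by the caller, or -1 if a thread call fails. */
int run_handoffs(int rounds)
{
    pthread_t thread;
    int done = 0;

    turn = 0;
    if (pthread_create(&thread, NULL, worker, &rounds) != 0)
        return -1;

    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&lock);
        while (turn != 1)
            pthread_cond_wait(&cond, &lock);
        turn = 0;
        done++;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    if (pthread_join(thread, NULL) != 0)
        return -1;
    assert(turn == 0);                 /* the last handoff went to the caller */
    return done;
}
```

Calling run_handoffs() with a large round count from main(), building with
-pthread, and running on a TSX-capable machine with an affected glibc would be
the experiment.  If this never fails then the guesses above are probably
incomplete.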

Of course my change here is only a workaround.  The real fix is patching
glibc.


-- 
Paul Bone


