[m-dev.] Lock Elision in eglibc 2.19

Paul Bone paul at bone.id.au
Wed Jun 25 21:07:01 AEST 2014


On Sat, Jun 21, 2014 at 08:20:16PM +0200, Andi Kleen wrote:
> On Sat, Jun 21, 2014 at 11:14:13AM +1000, Paul Bone wrote:
> > I noticed the following error:
> > mercury_compile: ../nptl/pthread_mutex_lock.c:80: __pthread_mutex_cond_lock: Assertion `mutex->__data.__owner == 0' failed.
> > 
> > This is thrown (indirectly) from a call to pthread_cond_wait in
> > pthread_support.c line 2036 in Boehm GC 7.4.2.  I have the same problem with
> > Boehm GC 7.2.  There doesn't appear to be anything suspicious about the use
> > of the mutex or condition variable involved here.
> 
> This could be a variant of 
> https://sourceware.org/bugzilla/show_bug.cgi?id=16657
> 
> Do the patches there help?

I'll test these if I get time.  It'll have to wait for a good weekend. ;-)
I'll report back with information after I've tried so that we know if this
or a different patch is required.

> > A different bug affecting libtirpc and mount_nfs also started occurring when
> > I upgraded to eglibc 2.19.  When investigating this I found that eglibc 2.19
> 
> Likely related to locking?

Yes.  If I recall correctly, they unlocked a lock from a thread that didn't
own it.  For some reason this used to work, but since elision was introduced
it now segfaults on an xend (I think; I don't remember the name exactly)
instruction.

> > As a work-around I'd like to explicitly disable elision for this mutex.
> > I've searched the glibc/eglibc sources and documentation and haven't found a
> > way to disable elision.  But some things I read (mailing list messages etc)
> > say that it should be possible either per mutex or completely (with an
> > environment variable).  Could you tell me how?  Thanks.
> 
> My patches to do this were unfortunately not accepted. glibc
> supports it internally but there is no way to request it 
> for user programs. I hope this can be revisited in the future.
> 
> The old tuning patches are in my github tree in the rtm-devel9 branch.
> http://github.com/andikleen/glibc

I've found that creating the mutex with the error checking attribute, which
is already supported and portable, avoids the crash.  So at this point the
issue isn't critical anymore, although it's probably still important to fix.
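
For reference, a minimal sketch of that workaround using plain POSIX threads.
The helper name init_errorcheck_mutex is made up for illustration; it is an
assumption and is not taken from the Boehm GC patch:

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper (not from Boehm GC): initialise *m as an
 * error-checking mutex.  Returns 0 on success, or the error number
 * reported by the failing pthread call.
 */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the error-checking type instead of the default one. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With a mutex set up this way, pthread_mutex_unlock returns EPERM when the
calling thread doesn't hold the lock, and relocking from the owning thread
returns EDEADLK, so misuse is reported rather than left undefined.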

I've submitted a patch to work around this to the Boehm GC project:
https://lists.opendylan.org/pipermail/bdwgc/2014-June/005962.html

> > I have a second question that is less important, but I'd like to understand
> > nevertheless.  Your LWN article suggests that the entire critical section
> > (from pthread_mutex_lock to pthread_mutex_unlock) is a transactional memory
> > transaction.  Have I understood correctly?
> 
> Yes.
> 
> > If so, why not just start and
> > finish the transactional memory transaction within the pthread_mutex_lock
> > code?  That is, after acquiring the lock, finish the TM transaction so that
> > the processor doesn't need to handle all the memory use until the
> > pthread_mutex_unlock call specially.
> 
> The point of lock elision is to allow full parallelism of the critical
> section including all memory accesses in it. So the transaction 
> has to span the whole critical section, otherwise atomicity couldn't
> be guaranteed.

Okay, that makes sense.  I did enough reading to learn that if elision fails
(say because of a buffer overflow or a system call) then NPTL can recover.
And that then it's less likely that NPTL will try to use elision in the
future.  So I'm less concerned about using this with large transactions.

> Here's a newer article that has some more details:
> 
> http://queue.acm.org/detail.cfm?id=2579227

Thanks, and thanks for all the information.


-- 
Paul Bone


