[m-dev.] Article about memory fragmentation

Paul Bone paul at bone.id.au
Mon Oct 10 11:35:31 AEDT 2016


On Mon, Oct 10, 2016 at 11:06:53AM +1100, Zoltan Somogyi wrote:
> 
> 
> On Mon, 10 Oct 2016 10:40:13 +1100, Paul Bone <paul at bone.id.au> wrote:
> > I received some feedback via twitter regarding the Memory Pool System (MPS),
> > another GC for uncooperative environments.
> > 
> > https://twitter.com/glaebhoerl/status/784737682706538496
> > 
> > IIRC we've tried this in the past and found that it didn't perform as well
> > as BDWGC.  But I wasn't involved with Mercury at the time and things may
> > have changed.  @glaebhoerl provided this link
> > https://gitlab.com/embeddable-common-lisp/ecl/issues/126 suggesting that it
> might be worth a look as it may perform better.  I know very little about
> > MPS so there's not much I can say about this.  But I said I'd pass the
> > information on to the other Mercury devs.
> 
> When we last tried MPS, MPS wasn't just slower than BDW. Reasonably often,
> in maybe 10-20% of cases, it was *much* slower, as in go-get-a-cup-of-coffee
> slower. This was despite the fact that our reason for looking into it then was that
> it was said to be faster than BDW at the time.
> 
> Even if it has improved in the meantime, I am pretty sure there is no point
> in looking into it any further, for two reasons. One, I am skeptical that they
> could improve things dramatically enough to catch up to BDW, at least for
> usage scenarios like ours. Two, with today's memory sizes, gc is, in most
> cases, a relatively small fraction of execution time. Therefore even improvements
> in the 10-20% range in gc time would translate to only very minor improvements
> in overall execution time. The investment required just isn't worth it.

When I last benchmarked the impact of GC (in 2011) I found that it
accounted for a large fraction of the program's runtime.  These figures
are from section 3.1 of my thesis, for the icfp2000 benchmark, which
probably has a slightly higher than average allocation rate.

    1 Mutator thread and 1 GC thread:
        Mutator 20.9s   55.2%
        GC      16.9s   44.8%
        Total   37.8s
        There were 384 full heap collections.

    4 Mutator threads and 4 GC threads:
        Mutator  6.4s   47.6%
        GC       7.1s   52.4%
        Total   13.5s
        There were 288 full heap collections.

When only one of the mutator or the collector is parallelised, the ratio is
closer to 7:3.  You're right that Amdahl's law applies here, but GC is quite
significant for us.
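
As a rough sanity check of how much a GC-only speedup could buy us, here is
a small sketch in Python.  The 44.8% GC share is the single-threaded
icfp2000 measurement above; it is one data point, so treat the numbers as
illustrative only:

    # Amdahl-style arithmetic: if GC is gc_frac of the total run time,
    # then speeding up GC alone by gc_improvement shrinks the total run
    # time by gc_frac * gc_improvement.
    def overall_improvement(gc_frac, gc_improvement):
        return gc_frac * gc_improvement

    for gc_improvement in (0.10, 0.20):
        print("%.0f%% faster GC -> %.1f%% faster overall"
              % (gc_improvement * 100,
                 overall_improvement(0.448, gc_improvement) * 100))

With GC at roughly 45% of the run time, a 10-20% improvement in GC time
works out to about 4.5-9% overall, so it is not negligible for a program
with this allocation profile.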

To collect data like this, set GC_PRINT_STATS=1 in your environment, add
together the per-collection times (or multiply the average by the number of
collections), and compare the result with the total run time.
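
Something along these lines can do the adding up.  The exact messages that
GC_PRINT_STATS produces vary between Boehm GC versions, so the pattern below
is only a guess; check a few lines of the real output and adjust it before
trusting the totals:

    # Sketch: sum the per-collection times printed to stderr by a program
    # run with GC_PRINT_STATS=1.  The regular expression assumes lines of
    # the form "... took N ms(ecs) ..."; adjust it to match the output
    # your GC version actually produces.
    import re
    import sys

    pause_re = re.compile(r'took\s+(\d+(?:\.\d+)?)\s*ms')

    total_ms = 0.0
    collections = 0
    for line in sys.stdin:
        m = pause_re.search(line)
        if m:
            total_ms += float(m.group(1))
            collections += 1

    print("collections: %d, total GC time: %.1f s"
          % (collections, total_ms / 1000.0))

For example: GC_PRINT_STATS=1 ./prog 2>&1 | python sum_gc.py, then compare
the reported GC time against the wall-clock time of the run.  (sum_gc.py is
just a name I've made up for the script above.)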

Nevertheless I agree that trying MPS is probably not going to be a great use
of our time.  I don't know of any specific reason why it would now be faster
than BDW GC, or why the results would have changed since we last measured it.

Cheers.

-- 
Paul Bone
http://paul.bone.id.au

