[m-dev.] Mercury and GCC split stacks

Paul Bone paul at bone.id.au
Mon Sep 8 12:12:05 AEST 2014

On Mon, Sep 08, 2014 at 02:13:25AM +0200, Zoltan Somogyi wrote:
> On Mon, 8 Sep 2014 00:08:54 +1000, Paul Bone <paul at bone.id.au> wrote:
> > If it's only in address space then I don't think it would be excessive on a
> > 64-bit system.
> > 
> > If we assume 10,000 threads with an average of 10 stack segments each, that's
> > 100,000 stack segments.  With one protected page each, that is 400,000KB of
> > address space (assuming 4KB pages), or 409,600,000 bytes.  It's true that
> > this would be excessive if these pages were mapped to physical memory.
> Unfortunately, it is not only address space. Address space must be mapped
> by page table entries, and these are cached in TLBs. TLBs are typically the
> smallest caches in a CPU (since each entry corresponds to a page's worth
> of data, not e.g. a 64 byte cache block worth of data). There are some CPUs
> today whose TLBs are too small to hold the PTEs required to address just
> the data items that fit into their last level cache. These can suffer
> more from TLB misses than from cache misses.

If you don't hit the protected pages often, then they shouldn't be present
in these caches most of the time.
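
As a quick check of the arithmetic in my earlier estimate, here is a small
Python sketch.  The thread and segment counts are the ones quoted above; the
function name is made up for illustration.

```python
PAGE_SIZE = 4 * 1024  # 4KB pages, as assumed in the estimate


def guard_page_bytes(threads, segments_per_thread, guard_pages=1,
                     page_size=PAGE_SIZE):
    """Address space taken up by the protected pages alone."""
    return threads * segments_per_thread * guard_pages * page_size


total = guard_page_bytes(10_000, 10)
print(total)           # 409600000 bytes
print(total // 1024)   # 400000 KB
```

This counts address space only; it says nothing about how many of those pages
ever end up with a TLB entry.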

> The best way to avoid such problems is to use whatever support for large pages
> (sometimes called superpages) the TLB supports, since in that case a TLB entry
> can map e.g. 4MB, not just e.g. 4KB.

Ah, so you can trade off TLB utilisation against address space
utilisation.  Which is good because the latter is under much less pressure.
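
To put rough numbers on that trade-off, the sketch below compares how much
memory a TLB can cover ("TLB reach") with 4KB and with 4MB pages.  The entry
count and the last level cache size are hypothetical values picked for
illustration, not figures for any particular CPU, and real CPUs often have a
separate, smaller set of entries for large pages.

```python
KB = 1024
MB = 1024 * KB

TLB_ENTRIES = 64      # hypothetical TLB size
LLC_SIZE = 8 * MB     # hypothetical last level cache size


def tlb_reach(entries, page_size):
    """Bytes of memory addressable without a TLB miss."""
    return entries * page_size


small = tlb_reach(TLB_ENTRIES, 4 * KB)   # 256KB with 4KB pages
large = tlb_reach(TLB_ENTRIES, 4 * MB)   # 256MB with 4MB pages

print(small < LLC_SIZE)    # True: the TLB cannot cover the whole cache
print(large >= LLC_SIZE)   # True: with superpages it easily can
```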

> However, the OS can replace an aligned sequence of regular-sized virtual
> pages with one such superpage only if (a) those regular pages are loaded into
> contiguous locations in main memory, and (b) they all have the exact
> same permissions. If those regular sized pages hold stack segments
> with redzones, the second condition will virtually never be true.
> I haven't yet heard of any version of Linux that employs superpages
> like this automatically, but all recent versions will do that for you
> if you explicitly ask them to. This problem with redzones prevents us
> from asking, and prevents us from benefiting from any future automatic
> employment of superpages.

What if your stack segments were 4MB in size (or a multiple of 4MB), plus 4KB
for the protected page, and were 4MB aligned?  Then the protected page at the
top (assuming stacks grow up) falls into a new 4MB area, and I hope that the
OS can use 4MB pages for the main part of the stack and 4KB pages for the
ends.  Or if we allocated 4MB protected areas at the ends of stacks, then the
OS can probably still make this optimisation.  Again, even 4MB protected
areas are "probably okay", using roughly 0.000002% of a 64-bit address space in my
example.  Yes, I know, I sound a lot like "640KB will be enough for almost

I understand that we probably can't answer these questions without testing,
and on different OSs.
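
Short of real testing, the layouts I described can at least be modelled.  The
sketch below walks over each 4MB-aligned region a stack segment touches and
reports whether it is completely covered by a single permission, which is
Zoltan's condition (b).  It does not model condition (a), physical
contiguity, and the function name and addresses are made up for illustration.

```python
KB = 1024
MB = 1024 * KB
SUPERPAGE = 4 * MB


def superpage_candidates(base, usable, guard):
    """Return (region_start, uniform) for each 4MB-aligned region touched.

    The segment is [base, base + usable) read/write, followed by a
    protected area of `guard` bytes (stacks assumed to grow up).
    `uniform` is True only if the whole region has one permission.
    """
    end_rw = base + usable
    end = end_rw + guard
    start = base - base % SUPERPAGE
    result = []
    while start < end:
        stop = start + SUPERPAGE
        fully_rw = base <= start and stop <= end_rw
        fully_guard = end_rw <= start and stop <= end
        result.append((start, fully_rw or fully_guard))
        start = stop
    return result


base = 16 * MB  # hypothetical 4MB-aligned address

# Redzone inside the same 4MB region: no superpage possible.
print(superpage_candidates(base, 4 * MB - 4 * KB, 4 * KB))
# 4MB stack plus a 4KB protected page: the stack part qualifies.
print(superpage_candidates(base, 4 * MB, 4 * KB))
# 4MB stack plus a 4MB protected area: both regions qualify.
print(superpage_candidates(base, 4 * MB, 4 * MB))
```

Whether a given OS actually promotes the qualifying regions is exactly the
part that still needs testing.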

Paul Bone
