[m-dev.] 4 tag bits

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Jul 6 19:38:57 AEST 2016



On Wed, 6 Jul 2016 14:32:12 +1000, Michael Day <mikeday at yeslogic.com> wrote:
> > :- type foo
> >      --->    foo1(thing)
> >              /* 8 things */
> >      ;       foo2(thing, thing, thing, thing, thing, thing, thing, thing).
> >
> > Then instances of foo2 are 8 words (4 granules) big, and may go on a page
> > for 4 granule allocations and have more tag bits available.  However foo1
> > only has the usual 4 tag bits, because it will not be aligned.  To make use
> > of this optimisation we can only use the minimum number of tag bits of a
> > type's constructors.
> 
> Ah yes that kills the idea.
> 
> Michael

Actually, I don't think it does.

Just to be clear about assumptions: we are talking about the case
where (on a 64 bit system) Boehm allocates a block that is bigger than
2^N words on a 2^(N+1) boundary, and allocates every block on
at least a 2^4 boundary. (This is the assumption; I don't know whether
it is true or not.) For the sake of simplicity of discussion, I am also
considering only primary tag bits, though secondary tag bits will
still be required in general.

Consider a type that has 15 functors foo1a(thing), ... foo1o(thing),
and four functors foo2a(thing1, ... thing8) ... foo2d(thing1, ... thing8).
The one-word blocks of the foo1? functors can then count on only 4-bit
alignment, while the eight-word blocks of the foo2? functors can count
on 6-bit alignment.

We could then allocate the four-bit primary tag values 0 to 14
to foo1a through foo1o, and assign the four-bit primary tag value
15 to be shared by foo2a/b/c/d, distinguished by the two bits
just before the bottom-most four bits.

Alternatively, this arrangement could be thought of as using six primary tag
bits for this type, assigning four tag values to each of the foo1? functors,
so that any word that has any of the tag values (e.g.) 001000, 011000, 101000,
or 111000 means foo1i.

Basically, all we need is a decision tree for all the functors in which the
decisions that arrive at a given functor examine a bit only if that bit
is within the "alignment zone" of the allocation size that this specific
functor will use. So if a functor's size guarantees a 6 bit aligment, then
the tree could use any of the bottom 6 bits, but if it guarantees only
4-bit aligment, then it can use only the bottom four bits.

This would work, though it would require a more drastic change than just
a change to the number of primary tag bits we use. We would need
to change the tag allocation algorithm, which currently assumes a fixed
number of tag bits, to break that assumption and to take block size
into account when deciding how many primary tag bits are available
for a functor. It would definitely require changing the RTTI structures
(which currently also assume a fixed number of tag bits), and therefore
all the code that generates or uses RTTI structures. It would also require
rewriting the code that generates tag switches. In cases such as the one
above, in which different functors use different numbers of primary tag bits,
the generated code would get either slower (doing two or more switches
on some of the ptag bits, not just one) or bigger (having more than one
alternative in a single switch on the biggest number of ptag bits (6 in the
case above corresponding to a functor, such as foo1a, that uses a
non-maximal number of ptag bits).

The real question is: would this buy us anything worthwhile? How frequent
are types that have more than 16 functors in which two or more functors
would get 5+ bit alignment? In my experience, almost all types with
that many functors are enums.

Zoltan.



More information about the developers mailing list