[m-dev.] 4 tag bits

Zoltan Somogyi zoltan.somogyi at runbox.com
Thu Jul 7 18:11:52 AEST 2016



On Thu, 7 Jul 2016 11:26:36 +1000, Paul Bone <paul at bone.id.au> wrote:
> Also on this topic is the possibility of using high tag bits.  I raised this
> as a possibility several years ago but there were doubts about the
> guarentees that the OS could provide.
> 
> If I remember correctly most x86_64 platforms, those based on the SysV ABI,
> use something like a 48bit address space for user programs, leaving the top
> 16 bits clear and therefore available for tag bits.

Actually, it depends on the hardware designers. All the current 64-bit CPUs
I know of use the scheme where the top N bits are guaranteed to be a copy
of the most significant bit*. Since they also all use the scheme where the MSB
demarcates between user space and kernel space (0 = user, 1 = kernel),
and since Mercury code won't be found in the kernel, this effectively guarantees
for us that the top N bits in addresses will be zero. The value of N of course
depends on the specific chip design.

*CPU designers imposed this scheme after the experience of the transition
from the IBM System 360 to the System 370. The System 360 used 32 bit words
but 24 bit addresses. Since each word containing an address had eight bits free,
people used to put all kinds of flags in there. When the System 370 came out
with 31 bit addresses, all these programs had to be rewritten. Imposing this
scheme (by adding circuits that check whether the top N bits of the addresses
used in load/store instructions are all the same, and abort the program
if they aren't) avoids the temptation to write such guaranteed-to-become-
obsolete-soon code.

> The drawback here is that this would make it _much_ more difficult for the
> collector to decern a pointer from other data.  This is probably easy enough
> to add to the collector but may result in the retention of more memory since
> many more values could be mistaken for pointers.

That is one drawback.

> I don't propose to add this (yet) but wanted to add it to this thread of
> discussion.

At the very beginning of the Mercury project, before Mercury programs started
running long enough to *need* a garbage collector, we tried using high tag bits;
that is why there is some long-obsolete support for it in the runtime.
NU-Prolog used six high tag bits: four as a kind of tag, and two for its
own gc, so we looked at using six high tag bits as well, to provide
a basis for a slightly more apples-to-apples comparison.

We found that even using just the two bottom tag bits available then
was faster than using six high tag bits. This is because (a) adding a tag
to the address of a newly allocated block, (b) checking the tag of an
existing block, and (c) stripping that tag off an address for dereferencing
all require more instructions (extra shifts), and these shifts both depend on
the previous instructions (usually) and have the later instructions depend
on them (almost always), so they cannot be executed in parallel with their
neighbouring instructions on superscalar machines.

Those benchmarks were done on MIPS and SPARC machines, not x86 CPUs,
and on machine microarchitectures that are now almost a quarter of a century old.
Nevertheless, since the above is still true, and since the gc over-retention
of memory is an additional source of probably-significant slowdowns,
I am pretty sure that using high tag bits is still slower than using low tag bits.
I am also pretty sure that this won't change until something in the above situation
changes, and I don't think that will happen anytime soon.

Zoltan.




More information about the developers mailing list