[m-dev.] deleting tags_high

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Jul 27 11:50:04 AEST 2018



On Fri, 27 Jul 2018 10:54:47 +1000, Paul Bone <paul at bone.id.au> wrote:
> It surprises me that tags_low is faster. 

It shouldn't. With low tags, given a tagged pointer, subtracting the known
primary tag (e.g. -1) and adding the offset of the desired field (e.g. +8)
to the value of the base pointer can be done together (by adding -1+8 = +7
to the value of the tagged pointer). Load and store instructions in most
instruction sets include the ability to specify such small offsets, so the arithmetic
is not incurring any cost over the cost of the load or store itself. With high tags,
the combined offsets cannot possibly fit into the load or store instruction
(they would need 64 bits, whereas load/store instructions typically have room
for 8 or 16 bits). Masking off the high tag will typically take two or three
instructions (load the tag value constant in the low order bits of a register,
shift it up to the required bit positions, then mask), and only *then* can the
load or store be done. Since the address to access depends on the result of the
mask op, there is no room for instruction level parallelism either. This is why
tags high is not just slower, but significantly slower than tags low. It also
needs more instructions, which makes the I-cache less effective.
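
To make the difference concrete, here is a minimal C sketch of the two field
accesses. Everything in it is made up for illustration (3 tag bits, the names
mk_low, field_low, mk_high and field_high); it is not the Mercury runtime's
code, and the high tag half assumes a 64 bit machine whose user space addresses
have their top 3 bits clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: 3 primary tag bits. */
#define TAG_BITS   3
#define LOW_MASK   ((uintptr_t)((1u << TAG_BITS) - 1))
#define HIGH_SHIFT (sizeof(uintptr_t) * 8 - TAG_BITS)
#define HIGH_MASK  (LOW_MASK << HIGH_SHIFT)

/* Low tags: the tag lives in the alignment bits of the pointer. */
static uintptr_t mk_low(const uintptr_t *cell, uintptr_t tag)
{
    assert(((uintptr_t)cell & LOW_MASK) == 0 && tag <= LOW_MASK);
    return (uintptr_t)cell + tag;
}

/*
 * When `tag` and `i` are compile-time constants, `-tag + i * wordsize`
 * is a single small displacement (e.g. -1 + 8 = +7) that a compiler can
 * place in the load instruction itself.
 */
static uintptr_t field_low(uintptr_t tagged, uintptr_t tag, size_t i)
{
    return *(const uintptr_t *)(tagged - tag + i * sizeof(uintptr_t));
}

/* High tags: the tag lives in the top bits of the word. */
static uintptr_t mk_high(const uintptr_t *cell, uintptr_t tag)
{
    /* Assumption: the address does not already use the top TAG_BITS bits. */
    assert(((uintptr_t)cell & HIGH_MASK) == 0 && tag <= LOW_MASK);
    return (uintptr_t)cell | (tag << HIGH_SHIFT);
}

/*
 * The mask is a full-width constant, so it has to be materialized and
 * applied first; the load depends on the result of the mask.
 */
static uintptr_t field_high(uintptr_t tagged, size_t i)
{
    uintptr_t base = tagged & ~HIGH_MASK;
    return *(const uintptr_t *)(base + i * sizeof(uintptr_t));
}
```

Comparing the compiler's assembly output for field_low (with constant tag and
index) against field_high is an easy way to see the extra instructions on a
given target.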

> Hrm, although the GC would need to
> handle this the way it handles low bits, which I don't know but it might
> treat them as interior pointers. 

Yes, we register 0-3 (on 32 bit machines) and 0-7 (on 64 bit machines)
as possible offsets for Boehm GC, so any value at any such offset from
the start of an allocated block counts as a pointer to the block.
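
As a rough model of what registering those offsets means, the check below
treats a word as a reference to a block when it lies within the registered
displacement range from the block's start. The struct and function names are
hypothetical; this is a sketch of the idea, not Boehm GC's data structures or
API, and it ignores any other interior pointer recognition the collector may
be configured with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one allocated block; not a Boehm GC structure. */
struct block {
    uintptr_t start;  /* address of the first byte of the block */
    size_t    size;   /* size of the block in bytes */
};

/*
 * Return 1 if `word` should keep `blk` alive, assuming displacements
 * 0..max_disp have been registered (max_disp = 7 on a 64 bit machine,
 * 3 on a 32 bit machine, per the text above).
 */
static int counts_as_pointer(uintptr_t word, const struct block *blk,
                             size_t max_disp)
{
    assert(max_disp < blk->size);
    return word >= blk->start && word - blk->start <= max_disp;
}
```

With max_disp = 7, a word equal to start+1 (a pointer with primary tag 1)
keeps the block alive, while start+8 does not.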

> One benefit of tags_high is that on x86_64 (at least at the moment) there
> are 16 bits available at the high end of a pointer.
> 
> https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details

Look up reports of the transition from the IBM S/360 to the S/370 in the 1970s.
Both used 32 bit words, but addresses were only 24 bits on the S/360.
People who stashed other info in the top 8 bits of words containing pointers
regretted it when the S/370 came along, which extended pointer size to 32 bits
(though IBM literature calls them 31 bit, since the top bit had another use).

The rule "In addition, the AMD specification requires that the most significant
16 bits of any virtual address, bits 48 through 63, must be copies of bit 47"
on the page you link to was specified by AMD specifically to avoid any similar
problems. I know because one of the architects involved said so on comp.arch
at the time.

However, just because there are 16 bits available NOW does not mean that
those same 16 bits will ALWAYS be available. The page you link to even has
diagrams illustrating this fact.
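
The quoted rule can be written down as a small C check. This is a sketch under
the assumption of today's 48 bit virtual addresses; the function names are
made up, and the constant 47 is exactly the part that stops being right once
more address bits come into use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical (for 48 bit virtual addresses) when bits 48..63
 * are copies of bit 47, i.e. the top 17 bits are all zeros or all ones.
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top17 = addr >> 47;
    return top17 == 0 || top17 == 0x1FFFF;
}

/*
 * Strip a hypothetical 16 bit high tag. Assumption: the original address
 * was in the lower half of the address space (bit 47 clear), so zeroing
 * the top 16 bits restores it; otherwise sign extension would be needed.
 */
static uint64_t strip_tag16(uint64_t tagged)
{
    uint64_t addr = tagged & 0x0000FFFFFFFFFFFFULL;
    assert(is_canonical48(addr));
    return addr;
}
```

A word carrying a nonzero tag in its top 16 bits fails is_canonical48, so it
cannot be dereferenced until the tag has been stripped.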

> So if using 16 or maybe 17 bits as high tag bits sounds interesting, this is
> an option.  I don't know if high tag bits will be better or worse, but I do
> think it is interesting.

For the reason I gave above, it will be worse.

> Also remember that BoehmGC's minimum allocation and alignment is two machine
> words.  So you could use 4 or 3 low tag bits.

Not all memory cells containing Mercury terms are in memory allocated by Boehm GC;
some are in statically generated data structures. We would need to ensure that
these are aligned on 16 byte boundaries as well. Does anyone know how portable
gcc's __attribute__((aligned(16))) is?
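
For reference, a minimal sketch of the two spellings for a statically
allocated cell: the GNU attribute asked about above (which clang also accepts)
and the standard alignas from C11's <stdalign.h>. The cell contents and the
helper names are made up for illustration; how either form behaves on the
other compilers Mercury supports would still need checking.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* GNU extension spelling. */
static const uintptr_t cell_gnu[2] __attribute__((aligned(16))) = {1, 2};

/* C11 standard spelling. */
static alignas(16) const uintptr_t cell_c11[2] = {3, 4};

/* Return 1 if the low `bits` bits of the address are zero. */
static int low_bits_free(const void *p, unsigned bits)
{
    return ((uintptr_t)p & (((uintptr_t)1 << bits) - 1)) == 0;
}

/* Attach a 4 bit low tag to a statically allocated, 16 byte aligned cell. */
static uintptr_t tag_static(const void *p, uintptr_t tag)
{
    assert(low_bits_free(p, 4) && tag < 16);
    return (uintptr_t)p | tag;
}
```

If both declarations compile and low_bits_free(cell, 4) holds for each, 4 low
tag bits are usable for that cell on that toolchain.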

(It would be nice to have someone on the C standardization committee again :-)

Zoltan.

