[m-dev.] deleting tags_high

Paul Bone paul at bone.id.au
Fri Jul 27 10:54:47 AEST 2018


On Thu, Jun 14, 2018 at 03:35:31PM +0200, Zoltan Somogyi wrote:
> Currently we support three tag_methods:
> 
> - storing primary tags in the least significant 2 or 3 bits of a word,
>   (tags_low)
> 
> - storing primary tags in the most significant N bits of a word, where
>    N may be more than 3 (tags_high)
> 
> - not storing primary tags bits at all, distinguishing functors using
>    tags stored in memory (tags_none).
> 
> I propose that we delete support for tags_high. We started with that
> because we knew it worked, since it is the method that Prolog systems
> have traditionally used. However, once we got tags_low working,
> I don't think we ever used tags_high in anger more than once,
> and that one occasion was for benchmarks to show that tags_low
> was faster :-(

It surprises me that tags_low is faster.  Hrm, although the GC would need to
handle high tag bits the way it handles low ones; I don't know how it does
that, but it might treat tagged words as interior pointers.  Oh, one problem
could be that more bit patterns would be considered pointers: without high
tags, any bit pattern with one of these bits set is definitely not a
pointer.  So using high tag bits could retain more memory.

One benefit of tags_high is that on x86_64 (at least at the moment) there
are 16 bits available at the high end of a pointer.

https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details
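
To make that concrete, here is a rough sketch in C of packing a 16 bit tag
into the top of a 64 bit word.  The names (mk_high_tagged, high_tag,
high_body) are made up for illustration and are not Mercury runtime macros,
and it assumes a target where pointers only use the low 48 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not Mercury runtime macros.  Assumes a 64 bit
   target whose pointers fit in the low 48 bits of a word. */
#define HIGH_TAG_BITS  16
#define HIGH_TAG_SHIFT (64 - HIGH_TAG_BITS)
#define HIGH_PTR_MASK  ((UINT64_C(1) << HIGH_TAG_SHIFT) - 1)

/* Combine a pointer and a tag; the tag goes in the top 16 bits. */
static uint64_t mk_high_tagged(const void *p, uint64_t tag)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;

    assert((bits & ~HIGH_PTR_MASK) == 0);  /* pointer must leave tag bits clear */
    assert(tag < (UINT64_C(1) << HIGH_TAG_BITS));
    return (tag << HIGH_TAG_SHIFT) | bits;
}

/* Extract the tag from the top 16 bits. */
static uint64_t high_tag(uint64_t word)
{
    return word >> HIGH_TAG_SHIFT;
}

/* Mask the tag off to recover the original pointer. */
static void *high_body(uint64_t word)
{
    return (void *)(uintptr_t)(word & HIGH_PTR_MASK);
}
```

Note that recovering the pointer needs a mask in either scheme, but with
high tags the tagged word looks nothing like the address it holds, which is
where the GC question above comes from.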

When I've mentioned this in the past the concern was that it might change in
the future, or we may have to support something different on other
architectures.

But I learnt something new recently.  SpiderMonkey (the JavaScript
engine in Firefox) uses NaN boxing to represent values.  That is, numbers in
JS are doubles, so objects are encoded using the lower 48 bits of some
encodings of NaN, and other NaNs generated by arithmetic expressions are
normalised.  Therefore all pointers to objects need to have their high 17
bits clear.  But we basically get that for free because of the address space
behaviour on x86_64.  It can be a problem on other architectures though, and
when it is, mmap can be given a hint about where in the virtual address
space to allocate pages:

https://searchfox.org/mozilla-central/source/js/src/gc/Memory.cpp#554
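
For anyone who hasn't seen NaN boxing, a minimal sketch in C follows.  This
is not SpiderMonkey's actual encoding: the prefix constant, the single
"object" tag and all of the names are assumptions picked for illustration.
It takes "high 17 bits clear" literally, so the payload is 47 bits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative NaN boxing; NOT SpiderMonkey's real layout. */
typedef uint64_t value_t;

#define PAYLOAD_BITS  47
#define PAYLOAD_MASK  ((UINT64_C(1) << PAYLOAD_BITS) - 1)
/* The one NaN that arithmetic results are normalised to. */
#define CANONICAL_NAN UINT64_C(0x7FF8000000000000)
/* A NaN bit pattern (assumed here) reserved to mean "object pointer". */
#define OBJECT_PREFIX UINT64_C(0xFFFE000000000000)

static value_t box_double(double d)
{
    value_t bits;

    if (d != d) {
        return CANONICAL_NAN;   /* normalise every NaN to one encoding */
    }
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static value_t box_object(const void *p)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;

    assert((bits & ~PAYLOAD_MASK) == 0);  /* high 17 bits must be clear */
    return OBJECT_PREFIX | bits;
}

/* Anything whose high 17 bits are not the object prefix is a double. */
static int is_double(value_t v)
{
    return (v & ~PAYLOAD_MASK) != OBJECT_PREFIX;
}

static double unbox_double(value_t v)
{
    double d;

    memcpy(&d, &v, sizeof d);
    return d;
}

static void *unbox_object(value_t v)
{
    return (void *)(uintptr_t)(v & PAYLOAD_MASK);
}
```

The normalisation in box_double is what makes this safe: a NaN produced by
arithmetic can never be mistaken for a boxed pointer.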

So if using 16 or maybe 17 bits as high tag bits sounds interesting, this is
an option.  I don't know if high tag bits will be better or worse, but I do
think it is interesting.

Also remember that BoehmGC's minimum allocation and alignment is two machine
words.  So you could use 4 low tag bits on 64 bit machines, or 3 on 32 bit
machines.
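
The arithmetic behind that: a pointer aligned to N bytes (N a power of two)
always has log2(N) zero bits at the bottom.  A small sketch in C, again with
made-up names rather than anything from the Mercury runtime or BoehmGC:

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits that are always zero in a pointer aligned to
   `alignment` bytes.  `alignment` must be a power of two. */
static unsigned free_low_bits(uintptr_t alignment)
{
    unsigned bits = 0;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    while ((alignment & 1) == 0) {
        alignment >>= 1;
        bits++;
    }
    return bits;
}

/* Store `tag` in the low `bits` bits of a suitably aligned pointer. */
static uintptr_t mk_low_tagged(const void *p, uintptr_t tag, unsigned bits)
{
    uintptr_t mask = ((uintptr_t)1 << bits) - 1;

    assert(((uintptr_t)p & mask) == 0);   /* pointer must be aligned */
    assert(tag <= mask);                  /* tag must fit */
    return (uintptr_t)p | tag;
}

static uintptr_t low_tag(uintptr_t word, unsigned bits)
{
    return word & (((uintptr_t)1 << bits) - 1);
}

static void *low_body(uintptr_t word, unsigned bits)
{
    return (void *)(word & ~(((uintptr_t)1 << bits) - 1));
}
```

With two-word alignment, free_low_bits(2 * sizeof(void *)) is 4 where
pointers are 8 bytes and 3 where they are 4 bytes.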


-- 
Paul Bone
http://paul.bone.id.au

