[m-dev.] Fwd: Re: arg packing of structured values

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Jul 27 12:49:42 AEST 2018


I accidentally sent this only to Julien, instead of the whole list.

----- Start Forwarded Message -----
Sent: Fri, 27 Jul 2018 04:35:26 +0200 (CEST)
From: "Zoltan Somogyi" <zoltan.somogyi at runbox.com>
To: "Julien Fischer" <jfischer at opturion.com>
Subject: Re: [m-dev.] arg packing of structured values



On Fri, 27 Jul 2018 10:27:11 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
> On Thu, 26 Jul 2018, Zoltan Somogyi wrote:
> 
> > Given the types
> >
> > :- type t1
> >    --->    t1f1(bool, bool).
> >
> > :- type t2
> >    --->    t2f1(bool, bool, bool).
> >
> > that the compiler will soon be able to represent in less than
> > one word each (I am working on that now), consider this type:
> >
> > :- type t
> >    --->    tf(t1, int8, t2).
> >
> > With our current type representation, this will occupy three words
> > on the heap even if argument packing is turned on, because we can
> > now pack two or more arguments into a word only if the arguments
> > are either enums or sub-word-sized integers. While types t1 and t2
> > fall into neither category, their representations *do* occupy a known,
> > small number of bits. It should therefore be possible to represent
> > values of type t in less than one word, *without* any heap allocation,
> > by giving them the same representation as we would give to
> >
> > :- type t'
> >    --->    tf'(bool, bool, int8, bool, bool, bool).
> >
> > Something similar should also be possible if t1 and t2 have
> > more than one function symbol, as long as *all* of them
> > are representable in less than a word.
> >
> > My question for you guys is: would this be a useful thing to implement?
> 
> Another question: as a programmer how much, if any, control would I (or
> indeed the compiler) have over it occurring?  If my program spends a lot
> of its time manipulating t1 and t2 values as independent entities then
> it may be better for t values to just contain pointers to those
> arguments.  (I'm thinkg of the case where the first and third arguments
> of the t value above are extracted many times.) On the other hand, if
> the program really uses t as a record then the flattening proposed here
> would be worthwhile.

My proposal leaves the representations of t1 and t2 completely unaffected,
so code that manipulates values of those types would be unaffected as well.
The only change is that we would now consider types such as t1 and t2 to be
packable in almost the same way enums and {int,uint}{8,16,32}s are packable.

What I am proposing is that the representation of t be changed so that
instead of a pointer to a three-word memory cell containing (pointer to t1,
an int8 with padding, and a pointer to t2), it should be a word containing
(ptag value 0, two bits for the bools in t1, 8 bits for the int8, three bits
for the bools in t2). Constructing a value of type t would extract the two
and three bools from the t1 and t2 args, and put them into the word
with the ptag and the int8, while deconstructing that value would shift and
mask e.g. the selected two bits out of this word, put a zero ptag in front of them,
and assign the result to the variable representing the first arg (of type t1),
and likewise for the third arg (of type t2). These masks and shifts
should be *faster* than the loads or stores for any representation
that uses pointers.

It t1 or t2 had more than one function symbol which their representations
distinguished by the ptag. we would need to store the ptag as well inside t
as well. We can omit storing the ptag of e.g. t1 inside t only as long as this ptag
was known to be always zero.

I therefore don't see exactly what your concern is. Since I see no situation
in which the proposed representation isn't superior, I also see no need
for any kind of programmer control over the process.

What do you see that I don't?

By the way, the reason why I propose that we keep a primary tag
on all of t1, t2 and t, even though they all have just one function symbol,
is that it preserves the possibility of applying the direct arg optimization to them
(since it requires types whose ptag bits are all zero).

> > Do you know of any set of types in programs you know of to which
> > this optimization is applicable?
> 
> I suspect not, firstly because sub-word sized integers at not yet very
> common in most of the code I work with and secondly, where such nesting
> has occurred in the past (older versions of the Zinc compiler spring to
> mind), it has already long since been flattened by hand.

I was expecting structures containing many enums, such as flags (e.g. t1),
being contained inside other similar structures (e.g. t).

> > Or should I try to find such type sets by implementing the
> > applicability test for this optimization, and getting the compiler to
> > spit out informational messages about any types for which the test
> > succeeds?
> 
> I think that's worth trying.

I will look into it then when I have finished with the other arg packing tasks.

It will require changes to the overall design of du_type_layout.m. Previously,
we didn't care in what order we decided the representations of t, t1 and t2,
but with this proposal, we *have to care*.

> Unrelated question: does argument packing also apply to the character
> type (on 64-bit systems)?

At the moment, we don't pack chars.

Since a UTF-8 char can be up to 32 bits according to the current standard,
we can pack only two chars into a 64 bit word, just like int32s. However,
unlike int32s, UTF-8 has several variants, including the original design
which had allowances for 48 bit characters as well. Packing chars
is fragile in the possible presence of such extensions. I have also seen
no need; in my experience, bare chars (on their own, NOT inside strings)
inside structures are pretty rare.

However, in the presence of a need, implementing the packing of chars
should not be a problem. Does anyone have such a need?

Zoltan.


----- End Forwarded Message -----

Zoltan.


More information about the developers mailing list