[m-dev.] Fwd: Re: arg packing of structured values

Julien Fischer jfischer at opturion.com
Fri Jul 27 13:23:26 AEST 2018


Hi Zoltan,

On Fri, 27 Jul 2018, Zoltan Somogyi wrote:

> I accidentally sent this only to Julien, instead of the whole list.

I'd better respond to this one then ...

> ----- Start Forwarded Message -----
> Sent: Fri, 27 Jul 2018 04:35:26 +0200 (CEST)
> From: "Zoltan Somogyi" <zoltan.somogyi at runbox.com>
> To: "Julien Fischer" <jfischer at opturion.com>
> Subject: Re: [m-dev.] arg packing of structured values
>
>
>
> On Fri, 27 Jul 2018 10:27:11 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
>> On Thu, 26 Jul 2018, Zoltan Somogyi wrote:

>> > My question for you guys is: would this be a useful thing to implement?
>> 
>> Another question: as a programmer how much, if any, control would I (or
>> indeed the compiler) have over it occurring?  If my program spends a lot
>> of its time manipulating t1 and t2 values as independent entities then
>> it may be better for t values to just contain pointers to those
>> arguments.  (I'm thinking of the case where the first and third arguments
>> of the t value above are extracted many times.) On the other hand, if
>> the program really uses t as a record then the flattening proposed here
>> would be worthwhile.
>
> My proposal leaves the representations of t1 and t2 completely unaffected,
> so code that manipulates values of those types would be unaffected as well.
> The only change is that we would now consider types such as t1 and t2 to be
> packable in almost the same way enums and {int,uint}{8,16,32}s are packable.
>
> What I am proposing is that the representation of t be changed so that
> instead of a pointer to a three-word memory cell containing (pointer to t1,
> an int8 with padding, and a pointer to t2), it should be a word containing
> (ptag value 0, two bits for the bools in t1, 8 bits for the int8, three bits
> for the bools in t2). Constructing a value of type t would extract the two
> and three bools from the t1 and t2 args, and put them into the word
> with the ptag and the int8, while deconstructing that value would shift and
> mask e.g. the selected two bits out of this word, put a zero ptag in front of them,
> and assign the result to the variable representing the first arg (of type t1),
> and likewise for the third arg (of type t2). These masks and shifts
> should be *faster* than the loads or stores for any representation
> that uses pointers.
>
> If t1 or t2 had more than one function symbol, which their representations
> distinguished by the ptag, we would need to store the ptag inside t
> as well. We can omit storing the ptag of e.g. t1 inside t only as long as this ptag
> is known to be always zero.
>
> I therefore don't see exactly what your concern is. Since I see no situation
> in which the proposed representation isn't superior, I also see no need
> for any kind of programmer control over the process.
>
> What do you see that I don't?

Nothing in light of the above explanation ;-)
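
The packing Zoltan describes above can be sketched in C roughly as follows.
This is only an illustration under stated assumptions, not the compiler's
actual code: the bit positions, the three-bit ptag width, and every function
name (make_t1, make_t2, construct_t, t_arg1, t_arg2, t_arg3) are hypothetical.
It also assumes that a t1 or t2 value is itself a word holding a zero ptag in
its low bits with the bools directly above it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of a packed t word, low bits first (hypothetical):
 *   bits 0-2    ptag, always 0
 *   bits 3-4    the two bools of t1
 *   bits 5-12   the int8
 *   bits 13-15  the three bools of t2
 */
enum { PTAG_BITS = 3, T1_BITS = 2, I8_BITS = 8, T2_BITS = 3 };
enum {
    T1_SHIFT = PTAG_BITS,
    I8_SHIFT = T1_SHIFT + T1_BITS,
    T2_SHIFT = I8_SHIFT + I8_BITS
};

#define MASK(n) ((UINT64_C(1) << (n)) - 1)

/* Build a t1 / t2 word: zero ptag in the low bits, bools just above it. */
static uint64_t make_t1(unsigned bools)
{
    return ((uint64_t)bools & MASK(T1_BITS)) << PTAG_BITS;
}

static uint64_t make_t2(unsigned bools)
{
    return ((uint64_t)bools & MASK(T2_BITS)) << PTAG_BITS;
}

/* Construct t: pull the bools out of t1 and t2, pack them around the int8. */
static uint64_t construct_t(uint64_t t1, int8_t i, uint64_t t2)
{
    /* The scheme only works while both ptags are known to be zero. */
    assert((t1 & MASK(PTAG_BITS)) == 0);
    assert((t2 & MASK(PTAG_BITS)) == 0);

    uint64_t b1 = (t1 >> PTAG_BITS) & MASK(T1_BITS);
    uint64_t b2 = (t2 >> PTAG_BITS) & MASK(T2_BITS);
    return (b1 << T1_SHIFT)
         | ((uint64_t)(uint8_t)i << I8_SHIFT)
         | (b2 << T2_SHIFT);
}

/* Deconstruct t: shift and mask a field out, then restore the zero ptag. */
static uint64_t t_arg1(uint64_t t)
{
    return ((t >> T1_SHIFT) & MASK(T1_BITS)) << PTAG_BITS;
}

static int8_t t_arg2(uint64_t t)
{
    return (int8_t)(uint8_t)((t >> I8_SHIFT) & MASK(I8_BITS));
}

static uint64_t t_arg3(uint64_t t)
{
    return ((t >> T2_SHIFT) & MASK(T2_BITS)) << PTAG_BITS;
}
```

Under these assumptions, construction and deconstruction touch only
registers: no memory cell is allocated for t and no load is needed to reach
its arguments, which is the basis for the claim that the shifts and masks
should be faster than a pointer-based representation.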

>> Unrelated question: does argument packing also apply to the character
>> type (on 64-bit systems)?
>
> At the moment, we don't pack chars.
>
> Since a UTF-8 char can be up to 32 bits according to the current standard,
> we can pack only two chars into a 64 bit word, just like int32s.

The Unicode standard says that code points will fit in 21 bits. The
entire code space is defined to be the integers from 0x0 to 0x10ffff.
Mercury characters are just integers in this range (which we represent
using 32-bit integers).
(UTF-8 is only concerned with how code points are encoded *within*
strings.)
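
To make the arithmetic concrete: 0x10ffff is larger than 2^20 but smaller
than 2^21, so 21 bits per code point suffice. The C sketch below is purely
illustrative and does not describe what the Mercury compiler does (as noted
above, it does not pack chars at the moment). The names pack3 and unpack are
hypothetical, and a real packed cell would presumably share the word with
other fields rather than hold three characters.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CODE_POINT UINT32_C(0x10FFFF)  /* top of the Unicode code space */
#define CP_BITS 21                         /* 2^21 - 1 = 0x1FFFFF >= 0x10FFFF */
#define CP_MASK ((UINT64_C(1) << CP_BITS) - 1)

/* Pack three code points into one 64-bit word, using 3 * 21 = 63 bits. */
static uint64_t pack3(uint32_t a, uint32_t b, uint32_t c)
{
    assert(a <= MAX_CODE_POINT);
    assert(b <= MAX_CODE_POINT);
    assert(c <= MAX_CODE_POINT);
    return (uint64_t)a
         | ((uint64_t)b << CP_BITS)
         | ((uint64_t)c << (2 * CP_BITS));
}

/* Extract the code point at position idx (0, 1 or 2). */
static uint32_t unpack(uint64_t word, unsigned idx)
{
    assert(idx < 3);
    return (uint32_t)((word >> (idx * CP_BITS)) & CP_MASK);
}
```

With a 32-bit slot per character only two fit in a 64-bit word; with a
21-bit slot there is room for a third, or for other small fields alongside
one or two characters.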

> However, unlike int32s, UTF-8 has several variants, including the
> original design which had allowances for 48 bit characters as well.

That's true; however, UTF-8 has been defined as being between 1 and 4
bytes long for many years now.

> Packing chars is fragile in the possible presence of such extensions.
> I have also seen no need; in my experience, bare chars (on their own,
> NOT inside strings) inside structures are pretty rare.

FWIW, here are some examples from my CSV library:

     :- type comments
        --->    no_comments
        ;       allow_comments(char).

     :- type reader_params
        --->    reader_params(
                    blank_lines     :: blank_lines,
                    trailing_fields :: trailing_fields,
                    comments        :: comments,
                    field_delimiter :: char
                ).

Julien.

