[m-users.] Storage format for data in low-level C grades

Julien Fischer jfischer at opturion.com
Wed Apr 16 20:02:47 AEST 2025


Hi,

On Wed, 16 Apr 2025 at 12:38, M McDonough <foolkingcrown at gmail.com> wrote:
>
> Is there any documentation on how Mercury stores types in the
> low-level C grades?

Most of the Mercury papers (particularly the early ones) have some discussion
of how terms are represented.

The slides for Zoltan's talk, [Packing sub-word-size
arguments](https://mercurylang.org/documentation/papers.html#subword_talk),
provide a good overview (including some extensions that are not yet
enabled by default).

> I'm mostly considering the situation where I need to store a string
> and a small enum type (three to seven variants). I know that using
> tuples would essentially use a dynamically allocated array with two
> MR_Word elements:
>
> :- type enum ---> a ; b ; c.
> :- type key == {enum, string}. % This will be a pointer to `MR_Word[2]`, essentially.
>
> I'm wondering if the compiler will choose to stash the enum in the tag
> bits in the following case:
>
> :- type enum ---> a ; b ; c.
> :- type key ---> key(enum, string). % Is this equivalent to {enum, string}?

The two will effectively have the same representation.
(You can ask the compiler to tell you how it is going to represent types using
the --show-local-type-representation option.)
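
As a rough illustration of the "pointer to a two-word cell" layout discussed
above, here is a minimal C sketch. It is not taken from the Mercury runtime:
`word_t` stands in for `MR_Word`, `make_key` and the accessors are hypothetical
names, `malloc` stands in for the real allocator, and it assumes the enum
constants are numbered from 0 in declaration order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for MR_Word: one machine word (an assumption of this sketch). */
typedef uintptr_t word_t;

/* :- type enum ---> a ; b ; c.   Assumed to be numbered 0, 1, 2. */
enum key_enum { ENUM_A = 0, ENUM_B = 1, ENUM_C = 2 };

/*
 * Build a key as a pointer to a two-word heap cell:
 * word 0 holds the enum value, word 1 holds the string pointer.
 * The caller owns the cell and releases it with free().
 */
static word_t *make_key(enum key_enum e, const char *s)
{
    word_t *cell = malloc(2 * sizeof(word_t));
    assert(cell != NULL);
    cell[0] = (word_t) e;
    cell[1] = (word_t) s;
    return cell;
}

/* Read the enum argument back out of word 0. */
static enum key_enum key_enum_of(const word_t *key)
{
    return (enum key_enum) key[0];
}

/* Read the string argument back out of word 1. */
static const char *key_string_of(const word_t *key)
{
    return (const char *) key[1];
}
```

In this model the enum occupies a full word of its own, whether the key is
written as a tuple or as a single-functor type.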

> Since I believe the compiler already will do things like stashing
> multiple enum members into a single MR_Word using what amounts to
> bit-fields to save space, it wouldn't be a huge jump to also use the
> tag bits here.

The compiler can do those sorts of optimizations; IIRC, they are not
enabled by default.
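
To give a feel for what "what amounts to bit-fields" means, here is a small C
sketch that packs two small enum values into one word with shifts and masks.
It only illustrates the general idea; the names (`pack2`, `FIELD_BITS`, etc.)
and the choice of two bits per field are assumptions of the sketch, not what
the compiler actually generates.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for MR_Word: one machine word (an assumption of this sketch). */
typedef uintptr_t word_t;

/* Two bits are enough for an enum with up to four constants. */
#define FIELD_BITS 2u
#define FIELD_MASK ((1u << FIELD_BITS) - 1u)

/* Pack two small enum values side by side into a single word. */
static word_t pack2(unsigned first, unsigned second)
{
    assert(first <= FIELD_MASK && second <= FIELD_MASK);
    return ((word_t) first << FIELD_BITS) | (word_t) second;
}

/* Extract the field stored in the higher bits. */
static unsigned unpack_first(word_t w)
{
    return (unsigned) ((w >> FIELD_BITS) & FIELD_MASK);
}

/* Extract the field stored in the lowest bits. */
static unsigned unpack_second(word_t w)
{
    return (unsigned) (w & FIELD_MASK);
}
```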

> But I'm unsure if the compiler will actually take this
> step automatically. Otherwise, I would want to just define it as so to
> manually force that:
>
> :- type key ---> a(string) ; b(string) ; c(string).

I'm fairly sure we don't pack the tag into the string, so each of those
will still be two words (+ whatever the string takes).
For me, the choice between the various definitions of key would depend on what
else the program is going to do with the key type.
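
For comparison, a C sketch of the `a(string) ; b(string) ; c(string)` variant
might look like the following: the functor is identified by a tag in the low
bits of a word-aligned pointer, and that pointer refers to a one-word cell
holding the string. Again, this is a model under stated assumptions rather
than the runtime's actual code: the names are hypothetical, two tag bits are
assumed, and `malloc` stands in for the real allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for MR_Word: one machine word (an assumption of this sketch). */
typedef uintptr_t word_t;

/* Assumed number of low pointer bits available for the tag. */
#define TAG_BITS 2u
#define TAG_MASK (((word_t) 1 << TAG_BITS) - 1)

/* One tag value per functor: a/1, b/1, c/1. */
enum key_tag { TAG_A = 0, TAG_B = 1, TAG_C = 2 };

/*
 * Allocate a one-word cell for the string argument and return the
 * cell's address with the functor's tag stored in the low bits.
 */
static word_t make_tagged_key(enum key_tag tag, const char *s)
{
    word_t *cell = malloc(sizeof(word_t));
    assert(cell != NULL);
    /* The low bits must be free, i.e. the cell must be word-aligned. */
    assert(((word_t) cell & TAG_MASK) == 0);
    cell[0] = (word_t) s;
    return (word_t) cell | (word_t) tag;
}

/* Which functor is this?  Look only at the tag bits. */
static enum key_tag tagged_key_tag(word_t key)
{
    return (enum key_tag) (key & TAG_MASK);
}

/* Strip the tag to recover the cell address, then read the string. */
static const char *tagged_key_string(word_t key)
{
    const word_t *cell = (const word_t *) (key & ~TAG_MASK);
    return (const char *) cell[0];
}

/* Release the cell; the tag must be stripped before calling free(). */
static void free_tagged_key(word_t key)
{
    free((void *) (key & ~TAG_MASK));
}
```

Counting the tagged word itself plus the one-word cell gives the two words
mentioned above; the string data is separate.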

> It would be nice if there was some documentation on this for the
> low-level data grades in particular (I doubt the HL grades would ever
> do this?)

Actually, the high-level C grades (at least the publicly documented ones)
share the same data representation as the low-level C grades, so, yes,
they "do this".

(The high-level grades have two aspects: high-level code and (optionally)
high-level data.  All of the high-level grades necessarily use high-level code.
For C, we can choose whether to use the low- or high-level data representation.
Since the low-level representation is a bit faster, that's what is used.  The C#
and Java backends use the high-level data representation.)

Julien.

