[m-dev.] bits_per_int, bits_per_uint and ubits_per_uint

Sat Dec 3 16:17:52 AEDT 2022

I have been looking into adding the uenum typeclass
and making sparse_bitset use it. The fast way to
compute the offset field of each bitset_elem for a given index
is to mask off the bottom 5 or 6 bits on 32 and 64 bit machines
respectively. This can now be done via Index /\ (bits_per_int - 1),
but since in my new version, indexes and offsets are both uints,
you need Index /\ cast_to_uint(bits_per_uint - 1).
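
For readers who think in C, here is a minimal sketch of the masks involved.
It is not taken from sparse_bitset.m: word_t is an assumed stand-in for
Mercury's uint, and low_bits and clear_low_bits are hypothetical names.
It shows both what ANDing with (bits - 1) keeps and what ANDing with the
complement of that mask clears.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed stand-in for Mercury's uint; not the runtime's real typedef. */
typedef uintptr_t word_t;

/* C analogue of the proposed ubits_per_uint: the word size in bits,
   returned as an unsigned value so that no cast is needed below. */
static inline word_t ubits_per_uint(void)
{
    return (word_t) (sizeof(word_t) * CHAR_BIT);
}

/* Keep only the bottom 5 (32-bit) or 6 (64-bit) bits of the index. */
static inline word_t low_bits(word_t index)
{
    return index & (ubits_per_uint() - 1);
}

/* Clear those bottom bits, leaving a multiple of the word size. */
static inline word_t clear_low_bits(word_t index)
{
    return index & ~(ubits_per_uint() - 1);
}
```

Because every value in the sketch is already unsigned, nothing like
cast_to_uint appears in it, which is the point of having ubits_per_uint.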

bits_per_int and bits_per_uint, defined in int.m and uint.m respectively,
both return an int. I propose adding ubits_per_uint, which returns
the same as bits_per_uint, but as a uint. Does anyone have a better name?

At the moment, both bits_per_int and bits_per_uint are defined
by foreign_procs in C grades, which return sizeof(MR_Unsigned) * CHAR_BIT.
(In Java and C# grades, they return 32 regardless of the machine's
actual word size.) If cross-compilation were not a concern, we could
make both these operations, as well as ubits_per_uint, builtin operations,
that, when targeting C, tell the code generators the value is the string
"MR_BITS_PER_WORD", which autoconfigure puts into runtime/mercury_conf.h.
(When targeting Java and C#, "32" will work.)
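
To illustrate the difference between the two approaches in C terms, here is
a rough sketch. my_unsigned and MY_BITS_PER_WORD are hypothetical stand-ins
for MR_Unsigned and MR_BITS_PER_WORD, since the real definitions live in
the Mercury runtime headers, and the sketch assumes the word size is either
32 or 64 bits.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for MR_Unsigned. */
typedef uintptr_t my_unsigned;

/* Hypothetical stand-in for a configure-generated constant such as
   MR_BITS_PER_WORD. Assumes only 32 and 64 bit words exist. */
#if UINTPTR_MAX == 0xFFFFFFFFu
  #define MY_BITS_PER_WORD 32
#else
  #define MY_BITS_PER_WORD 64
#endif

/* What the current foreign_proc bodies compute. */
static inline int bits_per_word_computed(void)
{
    return (int) (sizeof(my_unsigned) * CHAR_BIT);
}

/* What a builtin emitting the constant's name would expand to. */
static inline int bits_per_word_constant(void)
{
    return MY_BITS_PER_WORD;
}
```

Both functions yield the same number on a given machine; the second one
simply lets the C compiler see a literal constant.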

Of course, this won't work when cross-compiling to a machine with a
different word size. I do know that autoconfigure also puts the word size
into COMP_FLAGS as --bits-per-word={32,64}. Do I remember correctly
that cross-compilations will set the value of this option to the value
appropriate to the target machine? If so, this would allow builtin_ops.m
to implement these as builtins by returning the value of this option.

I also intend to declare unchecked_{left,right}_ushift in uint.m.
These would be the same as the existing versions which end in "_shift"
without the u, except they would take the shift amount, as well as the
value to be shifted, as uints, not ints. These not-yet-declared functions
have been recognized as builtins for two years now. The existence
of these functions would allow the new code of sparse_bitset.m to avoid
several casts.

Question: should we also define unchecked_{left,right}_ushift in int.m?
And should we declare and define <<u and >>u in int.m/uint.m,
which would be the checked versions of the above unchecked functions?
(These are the names we settled on the last time this topic came up.)
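
As a rough C sketch of the checked/unchecked distinction with unsigned
shift amounts: word_t is an assumed stand-in for uint, and
checked_left_ushift is a hypothetical name for what <<u would do. How
the checked version reports a bad shift amount is also an assumption
of the sketch, which just returns false.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for Mercury's uint. */
typedef uintptr_t word_t;

#define WORD_BITS ((word_t) (sizeof(word_t) * CHAR_BIT))

/* Unchecked: the caller must guarantee shift < WORD_BITS.
   In C, a larger shift amount is undefined behaviour. */
static inline word_t unchecked_left_ushift(word_t x, word_t shift)
{
    return x << shift;
}

static inline word_t unchecked_right_ushift(word_t x, word_t shift)
{
    return x >> shift;
}

/* Checked: refuse out-of-range shift amounts instead of shifting.
   Since shift is unsigned, only the upper bound needs testing. */
static inline bool checked_left_ushift(word_t x, word_t shift, word_t *result)
{
    if (shift >= WORD_BITS) {
        return false;
    }
    *result = unchecked_left_ushift(x, shift);
    return true;
}
```

With an unsigned shift amount the "negative shift" case disappears, so
the range check is a single comparison.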

By the way, changing sparse_bitset.m to use uenum would also require
changing the interface of digraph.m, by requiring digraph_keys
to be instances of uenum, not enum. Does anyone object?

Zoltan.