[m-dev.] bits_per_int, bits_per_uint and ubits_per_uint

Sat Dec 3 23:27:38 AEDT 2022

Hi Zoltan,

On Sat, 3 Dec 2022, Zoltan Somogyi wrote:

> I have been looking into adding the uenum typeclass
> and making sparse_bitset use it. The fast way to
> compute the offset field of each bitset_elem for a given index
> is to mask off the bottom 5 or 6 bits on 32- and 64-bit machines
> respectively. This can now be done via Index /\ (bits_per_int - 1),
> but since in my new version, indexes and offsets are both uints,
> you need Index /\ cast_to_uint(bits_per_uint - 1).
>
> bits_per_int and bits_per_uint, defined in int.m and uint.m respectively,
> both return an int. I propose adding ubits_per_uint, which returns
> the same as bits_per_uint, but as a uint. Does anyone have a better name?

That name seems fine.

> At the moment, both bits_per_int and bits_per_uint are defined
> by foreign_procs in C grades, which return sizeof(MR_Unsigned) * CHAR_BIT.

I would note that compiler/const_prop.m does substitute in the actual
values (at whatever optimisation level it is invoked) except in .pregen
grades.

> (In Java and C# grades, they return 32 regardless of the machine's
> actual word size.) If cross-compilation were not a concern, we could
> make both these operations, as well as ubits_per_uint, builtin operations
> that, when targeting C, tell the code generators the value is the string
> "MR_BITS_PER_WORD", which autoconfigure puts into runtime/mercury_conf.h.
> (When targeting Java and C#, "32" will work.)
>
> Of course, this won't work when cross-compiling to a machine with a
> different word size. I do know that autoconfigure also puts the word size
> into COMP_FLAGS as --bits-per-word={32,64}. Do I remember correctly
> that cross-compilations will set the value of this option to the value
> appropriate to the target machine?

Yes. The configure script uses a compile-time check with the target C
compiler to determine the number of bits-per-word on the target system.

> If so, this would allow builtin_ops.m to implement these as builtins
> by returning the value of this option.

I don't think returning the value of the option would work in .pregen
grades (i.e. if you build the .c files on a 64-bit machine and then
compile them on a 32-bit machine).

(If bits_per_int et al become builtins, we could just make the code
generators emit

       sizeof(MR_Unsigned) * CHAR_BIT

at the appropriate spot in C grades.)

> I also intend to declare unchecked_{left,right}_ushift in uint.m.
> These would be the same as the existing versions which end in "_shift"
> without the u, except they would take the shift amount, as well as the
> value to be shifted, as uints, not ints. These not-yet-declared functions
> have been recognized as builtins for two years now. The existence
> of these functions would allow the new code of sparse_bitset.m to avoid
> several casts.

No objection, although I seem to recall the idea of making the casts 
builtins was proposed at one point. (Which would address the issue
from a different angle.)

> Question: should we also define unchecked_{left,right}_ushift in int.m?

May as well; if the past is any guide someone will complain about its
absence sooner or later ;-)

> And should we declare and define <<u and >>u in int.m/uint.m,

I thought that having the safe versions was always the intention.

> which would be the checked versions of the above unchecked predicates?
> (These are the names we settled on the last time this topic came up.)
>
> By the way, changing sparse_bitset.m to use uenum would also require
> changing the interface of digraph.m, by requiring digraph_keys
> to be instances of uenum, not enum. Does anyone object?

No objection to that one from me.

Julien.