[m-dev.] for discussion: design issue for new integer types

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Oct 28 14:06:00 AEDT 2016



On Fri, 28 Oct 2016 11:51:51 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
> Just a reminder: please ensure you have updated to rotd-2016-10-26
> by Monday; I will be committing part 2 of the uint change then.

I have updated.
 
> In order to not overload the current type checker, literals for each integer
> type will need to be lexically distinct.  My suggestion is that each integer
> type have a distinguishing suffix.  For the fixed size integer types these
> would be:
> 
>        i8, i16, i32, i64
>        u8, u16, u32, u64

I think those are fine.
 
> For int and uint, there are a couple of choices:
> 
>        i, u
>        iw, uw     (where w == "word sized")

I prefer i and u.
 
> The suffix would not be required for literals of type 'int'.

I don't think anyone would argue against that.
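
To make the suffix scheme concrete, here is a minimal sketch of
splitting a literal into its digits and its suffix. It is in Python
rather than Mercury purely for illustration. The name split_literal,
the restriction to decimal literals, and the choice of 'i' and 'u' for
the word-sized types are assumptions, not settled decisions.

```python
import re

# Assumptions: decimal literals only, lowercase suffixes only, and no
# suffix means plain 'int'. The fixed-size suffixes are the proposed ones.
_LITERAL = re.compile(r"([0-9][0-9_]*)([iu](?:8|16|32|64)?)?")


def split_literal(text):
    """Split a literal into (digits, signed, size).

    size is 8, 16, 32 or 64, or None for the word-sized types.
    Nothing is range checked here; that is left to a later pass.
    """
    match = _LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised integer literal: {text!r}")
    digits = match.group(1)
    suffix = match.group(2) or ""
    signed = not suffix.startswith("u")
    size = int(suffix[1:]) if len(suffix) > 1 else None
    return digits, signed, size
```

With this sketch, "42i8" splits into the digits "42" plus a signed
8 bit marker, while "421" has no suffix and so is a plain int.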
 
> Should the suffixes use only lowercase 'u' and 'i' or would uppercase
> also be acceptable?  Or uppercase only?

I would strongly argue against allowing I as a suffix. I am probably
the only one on this list who is old enough to have used a typewriter
whose numbers did not include 0 and 1: you were supposed to type
'o' instead of '0', and 'l' instead of '1'. This works for human readers
precisely because an 'l' is easily visually confusable with '1'. For
typewriter makers, this saved the cost of a key. For us, the fact that
an uppercase i is also visually confusable with 1 would be
a fruitful source of confusing bugs, where people *think* they are
looking at 421, but are actually looking at 42 with an I suffix.

On the basis of symmetry, this would argue against U as well.
 
> As an aside: it's long since time we allowed some form of separator
> between groups of digits in integer (and float) literals.  I propose
> that we allow '_' between digits as in Java and C#.

I agree that is a good idea.

A followup question: should we require that the _s be where Western
convention dictates the decimal commas should go, i.e. between
every third digit? I for one would prefer that, but people using the
Indian number system, which puts commas around groups of *two*
digits above the thousands, would probably prefer that there
not be such a rule (look up "lakh" or "crore" on Wikipedia).

We would need to delete the _s at some point anyway. If we do it
in the compiler, we can make the code doing the deletion
generate a warning if the _s are in the "wrong" place, with the
notion of "wrong" being selected by compiler options such as
--warn-misplaced-integer-underscores-{western,indian}.
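
A minimal sketch of what such a check could look like, in Python
purely for illustration. The function name and the two convention
strings are hypothetical. I am assuming that a literal with no
underscores at all never triggers the warning, and that under the
Indian convention the last group has three digits and every earlier
group has at most two.

```python
def underscores_misplaced(digits, convention):
    """Return True if the underscores in digits break the convention.

    digits is the literal's digit string without any type suffix.
    convention is "western" (groups of three) or "indian" (a final
    group of three, preceded by groups of two).
    """
    if convention == "western":
        inner = 3
    elif convention == "indian":
        inner = 2
    else:
        raise ValueError(f"unknown convention: {convention!r}")

    groups = digits.split("_")
    if len(groups) == 1:
        # No underscores, so nothing can be misplaced.
        return False
    if any(group == "" for group in groups):
        # Leading, trailing or doubled underscore.
        return True

    first, middle, last = groups[0], groups[1:-1], groups[-1]
    well_placed = (
        len(last) == 3
        and all(len(group) == inner for group in middle)
        and 1 <= len(first) <= inner
    )
    return not well_placed
```

For example, 10_00_000 (ten lakh) is fine under "indian" but would
draw the warning under "western", and 1_000_000 is the other way
around.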

When the compiler finds e.g. an i8 suffix on any integer outside
the -128 to 127 range, we want to generate an error message anyway.
The _ check could be done at the same time.
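
The range check itself is simple. Here is a sketch in Python, for
illustration only; range_for and check_range are hypothetical names,
and I am assuming two's complement ranges for the signed types. The
word-sized types are left out because their range depends on the
target.

```python
def range_for(signed, bits):
    """Return the inclusive (lowest, highest) values for a fixed size type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def check_range(value, signed, bits):
    """Return None if value fits, otherwise the text of an error message."""
    lowest, highest = range_for(signed, bits)
    if lowest <= value <= highest:
        return None
    type_name = ("int" if signed else "uint") + str(bits)
    return (
        f"integer literal {value} is outside the range of {type_name} "
        f"({lowest} to {highest})"
    )
```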

I don't think the *scanner* should do such checks, because
it would result in suboptimal error messages. The compiler
has access to error_util.m; the library does not.

> 2. Automatic coercion and promotion.
> 
> There won't be any in Mercury.  If you are converting between integer
> types then you will be required to say so.

Agreed.

What form would those explicit coercions take? Would we have
a specific function for each pair of integer types? How about
e.g. i16 to float: would you have to convert the i16 to int first?

> 3. Representation of new integer types in the term type.
> 
> How should the new integer types be represented in the term.const/0
> type?
> 
> The obvious way would be:
> 
>      :- type const
>          --->    atom(string)
>          ;       integer(int)
>          ;       big_integer(integer_base, integer)
>                  % An integer that is too big for `int'.
> 
>          ;       unsigned_integer(uint)
>          ;       big_unsigned_integer(integer_base, integer)
>                  % An unsigned integer that is too big for `uint'.
> 
>          ;       string(string)
>          ;       float(float)
>          ;       implementation_defined(string)
> 
>          ;       uint8(uint8)
>          ;       uint16(uint16)
>          ;       uint32(uint32)
>          ;       uint64(uint64)
>          ;       int8(int8)
>          ;       int16(int16)
>          ;       int32(int32)
>          ;       int64(int64).

I would instead suggest that we keep just the existing
integer and big_integer functors, and add a new argument to both.
This argument would say int vs uint, and 8 vs 16 vs 32 vs 64 vs
default size, *purely on the basis of the suffix, without any check
in the scanner*, for the reasons given above.

To allow the underscore check mentioned above, the existing argument
of the integer and big_integer functors would need to be a string,
with the conversion done in the compiler. However, doing that
would erase the need for the big_integer functor, since the integer
functor would then be able to represent everything it can.
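
To show the shape I have in mind, here is a sketch of such a single
functor, modelled as a Python record purely for illustration. The
names IntegerTerm and Signedness, and the decision to keep the base
as a separate field, are my assumptions. The point is that the digits
stay in string form, so no literal is ever too big to represent, and
the conversion to a number happens only when the compiler asks for it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Signedness(Enum):
    SIGNED = "signed"
    UNSIGNED = "unsigned"


@dataclass(frozen=True)
class IntegerTerm:
    # The digits exactly as written, underscores included, without the
    # base prefix or the type suffix.
    text: str
    # 2, 8, 10 or 16 (assumed to mirror the existing integer_base).
    base: int
    # Taken purely from the suffix; not checked against the digits.
    signedness: Signedness
    # 8, 16, 32 or 64, or None for the word-sized types.
    size: Optional[int]

    def value(self):
        """Convert to a number; done by the compiler, not the scanner."""
        return int(self.text.replace("_", ""), self.base)
```

A literal one larger than the biggest uint64 needs no separate
functor here: it is just an IntegerTerm whose text happens to be long.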

Two other things. First, some people may be using the library's
lexer and parser modules for their own purposes (e.g. Prolog interpreters),
so if we change their basic representation, we should add their old
versions to e.g. extras under names such as old_{lexer,parser}.m.
Second, I have a big outstanding change to fact_table.m that would
be affected by a change to the term type, so please warn me before
committing such a change.

A question you did not ask was how the representation of integers
should change in the HLDS, i.e. in the cons_id type. I think I would
prefer adding a size argument to the int_const and uint_const
functors to adding new int8_const, int16_const etc. functors
to the type, because most code would want to treat all integers
the same regardless of size. I would even prefer to erase the
distinction between int_const and uint_const, but realize that
this cannot be done, because in the HLDS, we definitely want
the constant in integer, not string, form, and there is no word
sized type that can hold both all ints and all uints. However,
we could switch to int_const(integer, signedness, maybe(size)).
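
A sketch of why the single functor is convenient, again in Python
purely for illustration. IntConst, is_zero and type_name are
hypothetical names, and Python's unbounded int stands in for the
integer type. Code that does not care about the size, such as
is_zero, never has to look at it, while code that does care can still
get at it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Signedness(Enum):
    SIGNED = "signed"
    UNSIGNED = "unsigned"


@dataclass(frozen=True)
class IntConst:
    value: int                   # arbitrary precision, like integer
    signedness: Signedness
    size: Optional[int] = None   # None means word sized


def is_zero(const):
    """Size and signedness are irrelevant here, so one case suffices."""
    return const.value == 0


def type_name(const):
    """Code that needs the exact type can still reconstruct it."""
    prefix = "int" if const.signedness is Signedness.SIGNED else "uint"
    return prefix if const.size is None else prefix + str(const.size)
```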

The checks I mentioned above (does e.g. an i8 fit in -128 to 127,
are the _s in the right place) would naturally fit in the code
(in superhomogeneous.m, I think) that converts from term consts
to cons_ids.

> 4. poly_type and format.
> 5. Reverse modes of arithmetic operations.

I will comment on these later.

Zoltan.



