[m-dev.] remaining design issues for new integer types

Zoltan Somogyi zoltan.somogyi at runbox.com
Tue Apr 11 02:02:53 AEST 2017



On Mon, 10 Apr 2017 13:46:10 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
>      4. poly_type and format.
> 
>      * Should string.poly_type include an alternative for uints and
>      be supported by io.format etc?
>      * What about the fixed size integers?

A format specifier for C's printf has the general form

    %[parameter][flags][width][.precision][length]type

Mercury's format functions at the moment provide a subset of that
functionality, implementing only

    %[flags][width][.precision]type

omitting the parameter and length parts. The issue we are talking about
is whether we should add the length part, and if yes, how.

Since calling conversions explicitly is a pain, my answer is yes,
we should add a length part.

The syntax for length has to be unambiguous with respect to both what may
precede it and what may follow it.

What may precede it:

- A flag consists of one of these characters: # 0-+
- The optional width is either
  - a positive integer, or
  - a *, specifying that the width comes from a poly_type.
- The optional precision is a . followed by either
  - a nonnegative integer (with the empty string meaning "zero"), or
  - a *, specifying that the precision comes from a poly_type.

What may follow it is a conversion specifier, which may be any of the
characters diouxXp (for integers), eEfFgG (for floats), c (for chars)
and s (for strings).

The standard C syntax for length requires putting the length specifier
*outside* the format string, as in e.g. "printf("%" PRId64 "\n", i);",
which I think looks bad even in C, and would be horrible in Mercury.

Microsoft C has length specifiers such as I32 and I64 *inside* the format
string. I think this is a lot better, but still not good enough. The reason
is that some of the conversion specifiers (EFGX) are upper case characters as well,
like I. This means that programmers have to remember which upper case letters
are conversion specifiers and which are parts of length specifiers.

I think some syntax such as the following would be much better for Mercury.

    - A "length start" character.

    - The short name of a fixed width integer type, which may be one of
      "i8", "u8", "i16", "u16", "i32", "u32", "i64", "u64".

      We could add "iw" and "uw" if we want to have a non-default way
      to specify the usual word-sized signed and unsigned integer types.
      
      We could also add "imax" if we want to allow the printing of integers
      from library/integer.m via string.format.
      
      We could also add "umax" if we ever extend library/integer.m
      to have an unsigned type.

      C uses "j" instead of "imax". That seems a bit strange, and I can't
      think of an obvious unsigned analogue.

      Alternatively, we could make the length field just give the length,
      as in 8/16/32/64/J, and leave the signed/unsigned distinction for the
      conversion specifier.

    - A "length end" character.

The length start character should be some graphic character that is

    -  not currently used by string.format, and 
    -  suggests the idea of type or size (as far as a character can do so).

Two possibilities are "@" and "#". @ evokes Mercury's with_type annotations,
while # suggests that what follows is a number.

If we allow iw, uw, imax or umax, then I think we need a length end character
to visually separate the final letter of the length field (e.g. the x of imax)
from the conversion specifier that follows (which may also be x). The simplest
thing is to make the length end char the same as
the length start char. If all legal length fields, both current and
*foreseeable future*, end in a digit, then we don't need a length end char.

I think any of the above alternatives would be acceptable. I have a preference
for the #i64# variant, but it is slight.
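
To make the grammar above concrete, here is a rough sketch in C of a parser
for one conversion specification (the text after the %). It is only an
illustration, not code from the Mercury library: the names fmt_spec,
parse_spec, parse_count and is_length_name are made up, and it accepts only
the eight fixed width names. It uses "@" as both the length start and the
length end character, because "#" also appears in the list of flag characters
above, so a leading "#" would be consumed as a flag by a parser of this shape.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Markers for a width or precision that is absent or given as '*'. */
enum { COUNT_ABSENT = -1, COUNT_FROM_ARG = -2 };

/* Hypothetical parsed form of %[flags][width][.precision][@length@]type. */
typedef struct {
    char flags[6];   /* distinct flag characters seen, NUL-terminated */
    int  width;      /* COUNT_ABSENT, COUNT_FROM_ARG, or a number */
    int  precision;  /* COUNT_ABSENT, COUNT_FROM_ARG, or a number */
    char length[4];  /* e.g. "i64"; empty string if absent */
    char conv;       /* conversion character */
} fmt_spec;

/* Is the n-character string at s one of the fixed width type names? */
static bool is_length_name(const char *s, size_t n)
{
    static const char *const names[] = {
        "i8", "u8", "i16", "u16", "i32", "u32", "i64", "u64"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strlen(names[i]) == n && memcmp(names[i], s, n) == 0) {
            return true;
        }
    }
    return false;
}

/* Parse '*' or a run of digits; leave *out unchanged if neither is there. */
static const char *parse_count(const char *p, int *out)
{
    if (*p == '*') {
        *out = COUNT_FROM_ARG;
        return p + 1;
    }
    if (*p >= '0' && *p <= '9') {
        int n = 0;
        while (*p >= '0' && *p <= '9') {
            if (n < 100000) {          /* saturate instead of overflowing */
                n = n * 10 + (*p - '0');
            }
            p++;
        }
        *out = n;
    }
    return p;
}

/* Parse the text after a '%'. Returns a pointer just past the conversion
 * character, or NULL if the specification is malformed. */
const char *parse_spec(const char *p, fmt_spec *out)
{
    size_t nflags = 0;

    memset(out, 0, sizeof *out);
    out->width = COUNT_ABSENT;
    out->precision = COUNT_ABSENT;

    while (*p != '\0' && strchr("# 0-+", *p) != NULL) {
        if (strchr(out->flags, *p) == NULL) {   /* record each flag once */
            out->flags[nflags++] = *p;
        }
        p++;
    }
    p = parse_count(p, &out->width);
    if (*p == '.') {
        out->precision = 0;                     /* "." alone means zero */
        p = parse_count(p + 1, &out->precision);
    }
    if (*p == '@') {                            /* length start */
        const char *end = strchr(p + 1, '@');   /* length end */
        if (end == NULL) {
            return NULL;
        }
        size_t n = (size_t)(end - (p + 1));
        if (!is_length_name(p + 1, n)) {
            return NULL;
        }
        memcpy(out->length, p + 1, n);          /* n is 2 or 3 here */
        p = end + 1;
    }
    if (*p == '\0' || strchr("diouxXpeEfFgGcs", *p) == NULL) {
        return NULL;
    }
    out->conv = *p;
    return p + 1;
}
```

With this sketch, parsing "-08.3@i64@d" gives flags "-0", width 8,
precision 3, length "i64" and conversion d, while "@i7@d" is rejected
because i7 is not one of the listed type names.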

>      5. Reverse modes of arithmetic operations.
> 
>      The int module currently provides reverse modes for operations like (+).
>      uint currently doesn't, should it?  (We don't currently provide them for
>      arbitrary precision integers either.)

At the moment, int.m provides reverse modes for only two operations, + and -.
(The reverse modes of the multiplicative operators are complicated by the
potential non-zero values of remainders.)

I think providing reverse modes of these predicates should be trivial
(they can just call the forward mode of the appropriate operations),
and that providing them can avoid hassles when e.g. changing a type
from int to int16. I therefore think we should provide them.
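
As a rough illustration of "they can just call the forward mode", here is a
sketch in C for a 16 bit unsigned type. C has no modes, so each reverse mode
is written as a separate function; all the function names are made up for
this example. It also assumes wrapping (modulo 2^16) arithmetic, which is an
assumption of the sketch and not a statement about what the new Mercury
types do on overflow.

```c
#include <stdint.h>

/* Forward mode of Z = X + Y, wrapping modulo 2^16. */
uint16_t add_u16(uint16_t x, uint16_t y)
{
    return (uint16_t)(x + y);
}

/* Forward mode of Z = X - Y, wrapping modulo 2^16. */
uint16_t sub_u16(uint16_t x, uint16_t y)
{
    return (uint16_t)(x - y);
}

/* Reverse mode of X + Y = Z with Y and Z known: X = Z - Y. */
uint16_t add_solve_x(uint16_t y, uint16_t z)
{
    return sub_u16(z, y);
}

/* Reverse mode of X - Y = Z with Y and Z known: X = Z + Y. */
uint16_t sub_solve_x(uint16_t y, uint16_t z)
{
    return add_u16(z, y);
}

/* Reverse mode of X - Y = Z with X and Z known: Y = X - Z. */
uint16_t sub_solve_y(uint16_t x, uint16_t z)
{
    return sub_u16(x, z);
}
```

Each reverse mode is a single call to the forward mode of the opposite
operation, and under wrapping arithmetic it recovers the missing operand
even when the forward operation wrapped around.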
 
>      6. What type should the second operand of the left and right shift operations
>      be?
> 
>      Should it be:
>
>            uint >> uint = uint    (as in Peter's version of the uint module)
>            uint >> int = uint     (as in my inttypes library)
> 
>      (The justification for the latter in the inttypes library was that we didn't have
>      literals for the various types.)

The problem is that the right domain for the shift amount is [0 .. NB),
where NB is the number of bits in the type.

The question we need to answer first, before answering your question, is
"is the current behavior of int.<< and int.>> what we want?". My answer is
that I don't think so. I think both should throw an exception if the shift
amount sa satisfies either sa < 0, or NB =< sa.

At the moment, we don't throw on either condition. For sa < 0, we switch
the shift direction, and for NB =< sa, we just return a constant result
(either 0 or -1). Testing whether we want to throw an exception could be done
in C with one test: NB <= (unsigned) sa.

Making the shift amount unsigned removes all values less than zero, but it
still leaves all the values at or above NB. The above test would work for both;
it's just that the cast is redundant for one.
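
A minimal sketch of that single test in C, assuming a 32 bit operand (so
NB is 32). The function names are invented for the example, and returning
false stands in for throwing an exception; this is not the code in int.m.

```c
#include <stdbool.h>
#include <stdint.h>

#define NB 32u   /* number of bits in the operand type (assumed) */

/* True iff sa lies outside [0, NB). A negative sa converts to a very large
 * unsigned value, so one comparison catches both sa < 0 and NB <= sa. */
bool shift_out_of_range(int sa)
{
    return NB <= (unsigned) sa;
}

/* Left shift with a signed shift amount; false means "would throw". */
bool checked_shl(uint32_t x, int sa, uint32_t *result)
{
    if (shift_out_of_range(sa)) {
        return false;
    }
    *result = (uint32_t)(x << sa);
    return true;
}

/* The same operation with an unsigned shift amount: the test is unchanged,
 * only the cast is no longer needed. */
bool checked_shl_unsigned_amount(uint32_t x, unsigned sa, uint32_t *result)
{
    if (NB <= sa) {
        return false;
    }
    *result = (uint32_t)(x << sa);
    return true;
}
```

For example, shifting by -1 or by 32 is rejected by both variants (the
unsigned variant sees -1 as a huge positive amount), while shifting 1 by 31
succeeds and yields 0x80000000.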

(If we made the operation of int.<< and int.>> stricter than it is now, we would
have to announce it, and provide the old behavior in predicates with other names.)

Zoltan.


