[m-rev.] for review: add builtin 8, 16 and 32 bit integer types -- Part 2

Julien Fischer jfischer at opturion.com
Mon Aug 21 01:03:35 AEST 2017


Hi Zoltan,

On Sun, 20 Aug 2017, Zoltan Somogyi wrote:

> > Add builtin 8, 16 and 32 bit integer types -- Part 2.
> > 
> > Enable support for literals of the new types.
> > 
> > Begin implementing library support for 8, 16, and 32 bit types.
> 
> The summary should say *what support* the diff adds.
> 
> > Update the compiler to use the new types.
> 
> ... to represent values of their own constants.

Done.

> in c_util.m:
> > +output_int8_expr(Stream, N, !IO) :-
> > +    io.write_string(Stream, "INT8_C(", !IO),
> > +    io.write_int8(Stream, N, !IO),
> > +    io.write_string(Stream, ")", !IO).
> 
> You should add a comment before these predicates saying that INT8_C,
> and the similar macros used in this code, are supposed to be defined in stdint.h.

Done.

> > diff --git a/compiler/lookup_switch.m b/compiler/lookup_switch.m
> > index 0aba8ef26..62e60c9ed 100644
> > --- a/compiler/lookup_switch.m
> > +++ b/compiler/lookup_switch.m
> > @@ -2,6 +2,7 @@
> > % vim: ft=mercury ts=4 sw=4 et
> > %-----------------------------------------------------------------------------%
> > % Copyright (C) 1996-2012 The University of Melbourne.
> > +% Copyright (C) 2015, 2017 The Mercury team.
> > % This file may only be copied under the terms of the GNU General
> > % Public License - see the file COPYING in the Mercury distribution.
> > %-----------------------------------------------------------------------------%
> > @@ -952,13 +963,12 @@ default_value_for_type(lt_int_least32) = const(llconst_int(0)).
> > default_value_for_type(lt_uint_least32) = const(llconst_int(0)).
> > default_value_for_type(lt_int(int_type_int)) = const(llconst_int(0)).
> > default_value_for_type(lt_int(int_type_uint)) = const(llconst_uint(0u)).
> > -% XXX FIXED SIZE INT.
> > -default_value_for_type(lt_int(int_type_int8)) = const(llconst_int8(0)).
> > -default_value_for_type(lt_int(int_type_uint8)) = const(llconst_uint8(0)).
> > -default_value_for_type(lt_int(int_type_int16)) = const(llconst_int16(0)).
> > -default_value_for_type(lt_int(int_type_uint16)) = const(llconst_uint16(0)).
> > -default_value_for_type(lt_int(int_type_int32)) = const(llconst_int32(0)).
> > -default_value_for_type(lt_int(int_type_uint32)) = const(llconst_uint32(0)).
> > +default_value_for_type(lt_int(int_type_int8)) = const(llconst_int8(cast_from_int(0))).
> > +default_value_for_type(lt_int(int_type_uint8)) = const(llconst_uint8(cast_from_int(0))).
> > +default_value_for_type(lt_int(int_type_int16)) = const(llconst_int16(cast_from_int(0))).
> > +default_value_for_type(lt_int(int_type_uint16)) = const(llconst_uint16(cast_from_int(0))).
> > +default_value_for_type(lt_int(int_type_int32)) = const(llconst_int32(cast_from_int(0))).
> > +default_value_for_type(lt_int(int_type_uint32)) = const(llconst_uint32(cast_from_int(0))).
> > default_value_for_type(lt_float) = const(llconst_float(0.0)).
> > default_value_for_type(lt_string) = const(llconst_string("")).
> > default_value_for_type(lt_data_ptr) = const(llconst_int(0)).
> 
> If the current state of bootstrapping allows you to replace each cast_from_int(0)
> with 0i8, 0u8 etc, then please do so, since that would make all the returned values
> compile time constants.

This is the change that allows the compile to recognise 0i8 etc. so that's
not possible.  I will replace the calls to cast_from_int above and elsewhere
once this has bootstrapped.

> > diff --git a/compiler/mlds_to_cs.m b/compiler/mlds_to_cs.m
> > index daea42e7b..f6f60abb8 100644
> > --- a/compiler/mlds_to_cs.m
> > +++ b/compiler/mlds_to_cs.m
> > @@ -3472,6 +3472,17 @@ output_binop_for_csharp(Info, Op, X, Y, !IO) :-
> >         io.write_string(" ", !IO),
> >         output_rval_for_csharp(Info, Y, !IO),
> >         io.write_string(")", !IO)
> > +    ;
> > +        % The special treatment of bitwise-or here is necessary to avoid
> > +        % warning CS0675 from the C# compiler.
> > +        Op = bitwise_or(int_type_int8),
> > +        io.write_string("(sbyte)((byte)", !IO),
> > +        output_rval_for_csharp(Info, X, !IO),
> > +        io.write_string(" ", !IO),
> > +        output_binary_op_for_csharp(Op, !IO),
> > +        io.write_string(" (byte)", !IO),
> > +        output_rval_for_csharp(Info, Y, !IO),
> > +        io.write_string(")", !IO)
> 
> Out of curiosity, what does warning CS0675 say without this workaround?

Mercury/css/int8.cs(504,36): warning CS0675: The operator `|' used on the sign-extended type `sbyte'.
Consider casting to a smaller unsigned type first

> > +:- pred output_int8_const_for_csharp(int8::in, io::di, io::uo) is det.
> > +
> > +output_int8_const_for_csharp(I8, !IO) :-
> > +    io.write_int8(I8, !IO).
> > +
> > +:- pred output_uint8_const_for_csharp(uint8::in, io::di, io::uo) is det.
> > +
> > +output_uint8_const_for_csharp(U8, !IO) :-
> > +    io.write_uint8(U8, !IO).
> > +
> > +:- pred output_int16_const_for_csharp(int16::in, io::di, io::uo) is det.
> > +
> > +output_int16_const_for_csharp(I16, !IO) :-
> > +    io.write_int16(I16, !IO).
> > +
> > +:- pred output_uint16_const_for_csharp(uint16::in, io::di, io::uo) is det.
> > +
> > +output_uint16_const_for_csharp(U16, !IO) :-
> > +    io.write_uint16(U16, !IO).
> > +
> > +:- pred output_int32_const_for_csharp(int32::in, io::di, io::uo) is det.
> > +
> > +output_int32_const_for_csharp(I32, !IO) :-
> > +    io.write_int32(I32, !IO).
> > +
> > +:- pred output_uint32_const_for_csharp(uint32::in, io::di, io::uo) is det.
> > +
> > +output_uint32_const_for_csharp(U32, !IO) :-
> > +    io.write_uint32(U32, !IO),
> > +    io.write_string("U", !IO).
> 
> Why does only the last add a "U" suffix?

C# does not have a specific syntax for 8- and 16-bit literals of either
signedeness.

> > diff --git a/library/int16.m b/library/int16.m
> > index a0e1499f8..4325c9c02 100644
> > --- a/library/int16.m
> > +++ b/library/int16.m
> > @@ -5,16 +5,391 @@
> > % This file may only be copied under the terms of the GNU Library General
> > % Public License - see the file COPYING.LIB in the Mercury distribution.
> > %---------------------------------------------------------------------------%
> > +%
> > +% File: int16.m
> > +% Main author: juliensf
> > +% Stability: low.
> > +%
> > +% Predicates and functions for dealing with signed 16-bit integer numbers.
> > +%
> > +%---------------------------------------------------------------------------%
> 
> I will check {int,uint}{8,16,32}.m post-commit, when I can do diffs on them.
> 
> > +to_int8(Integer, Int8) :-
> > +    Integer = i(_Sign, [Digit]),
> > +    int8.from_int(Digit, Int8).
> 
> The extremely scant documentation in integer.m on the meaning of the representation
> it uses makes me unable to determine whether ignoring _Sign here is a bug or not.
> Someone else who does know this representation should review the changes to integer.m.

It's not a bug; the the digits are also signed.

> > diff --git a/library/io.m b/library/io.m
> > index 243b611ac..51260413c 100644
> > --- a/library/io.m
> > +++ b/library/io.m
> > +:- pred write_int8_2(stream::in, int8::in, system_error::out, io::di, io::uo)
> > +    is det.
> > +:- pragma foreign_proc("C",
> > +    write_int8_2(Stream::in, Val::in, Error::out, _IO0::di, _IO::uo),
> > +    [will_not_call_mercury, promise_pure, tabled_for_io, thread_safe,
> > +        does_not_affect_liveness, no_sharing],
> > +"
> > +    if (ML_fprintf(Stream, ""%d"", Val) < 0) {
> > +        Error = errno;
> > +    } else {
> > +        Error = 0;
> > +    }
> > +").
> 
> Two stylistic concerns here. First, while the C compiler will cast Val from
> int8 to int, it might be more readable to make the cast explicit.

I've replaced the conversion specifiers with the C99 format string macros
e.g. PRId8 in this case.

> Second, I am not a fan of the predicate name scheme here; one number (8, 16
> or 32) denotes size, while the _2 suffix does not.

I have renamed them to do_write_int8, do_write_int16 etc.

> > diff --git a/library/pprint.m b/library/pprint.m
> > index 8424353d3..b2897d3f9 100644
> > --- a/library/pprint.m
> > +++ b/library/pprint.m
> > @@ -2,6 +2,7 @@
> > % vim:ts=4 sw=4 expandtab tw=0 wm=0 ft=mercury
> > %---------------------------------------------------------------------------%
> > % Copyright (C) 2000-2007, 2010-2011 The University of Melbourne
> > +% Copyright (C) 2014-2017 The Mercury team.
> > % This file may only be copied under the terms of the GNU Library General
> > % Public License - see the file COPYING.LIB in the Mercury distribution.
> > %---------------------------------------------------------------------------%
> > @@ -187,7 +188,12 @@
> > :- instance doc(doc).
> > :- instance doc(string).
> > :- instance doc(int).
> > +:- instance doc(int8).
> > +:- instance doc(int16).
> > +:- instance doc(int32).
> > :- instance doc(uint).
> > +:- instance doc(uint8).
> > +:- instance doc(uint16).
> > :- instance doc(float).
> > :- instance doc(char).
> 
> uint32 is missing.

Fixed.

> > @@ -234,6 +253,105 @@ put_uint(Stream, UInt, !State) :-
> >         put(Stream, string.uint_to_string(UInt), !State)
> >     ).
> > 
> > +put_int8(Stream, Int8, !State) :-
> > +    ( if
> > +        % Handle the common I/O case more efficiently.
> > +        dynamic_cast(!.State, IOState0),
> > +        dynamic_cast(Stream, IOStream)
> > +    then
> > +        io.write_int8(IOStream, Int8, unsafe_promise_unique(IOState0), IOState),
> > +        ( if dynamic_cast(IOState, !:State) then
> > +            !:State = unsafe_promise_unique(!.State)
> > +        else
> > +            error("stream.string_writer.put_int8: unexpected type error")
> > +        )
> > +    else
> > +        put(Stream, string.int8_to_string(Int8), !State)
> > +    ).
> 
> Does this code really handle the I/O case more efficiently than just using the else case
> for everything? It seems to have a significant amount of overhead, though I suppose it is
> a constant amount of overhead, while the latter has overhead proportional to the length
> of the string printed.

Simon added the above pattern for the builtin types many years ago.
The justification was that in the presence of type specialization it
makes io.write etc "almost" as fast as if it were written directly (i.e.
if it were not implemented in terms of stream.string_writer).

> > @@ -294,6 +412,18 @@ print(Stream, NonCanon, Term, !State) :-
> >         put(Stream, Char, !State)
> >     else if dynamic_cast(Term, UInt : uint) then
> >         put(Stream, uint_to_string(UInt), !State)
> > +    else if dynamic_cast(Term, Int8 : int8) then
> > +        put(Stream, int8_to_string(Int8), !State)
> > +    else if dynamic_cast(Term, UInt8 : uint8) then
> > +        put(Stream, uint8_to_string(UInt8), !State)
> > +    else if dynamic_cast(Term, Int16 : int16) then
> > +        put(Stream, int16_to_string(Int16), !State)
> > +    else if dynamic_cast(Term, UInt16 : uint16) then
> > +        put(Stream, uint16_to_string(UInt16), !State)
> > +    else if dynamic_cast(Term, Int32 : int32) then
> > +        put(Stream, int32_to_string(Int32), !State)
> > +    else if dynamic_cast(Term, UInt32 : uint32) then
> > +        put(Stream, uint32_to_string(UInt32), !State)
> >     else if dynamic_cast(Term, OrigUniv) then
> >         write_univ(Stream, OrigUniv, !State)
> >     else if dynamic_cast(Term, BigInt) then
> > @@ -397,6 +527,24 @@ do_write_univ_prio(Stream, NonCanon, Univ, Priority, !State) :-
> >     else if univ_to_type(Univ, UInt) then
> >         put_uint(Stream, UInt, !State),
> >         put_char(Stream, 'u', !State)
> > +    else if univ_to_type(Univ, Int8) then
> > +        put_int8(Stream, Int8, !State),
> > +        put(Stream, "i8", !State)
> > +    else if univ_to_type(Univ, UInt8) then
> > +        put_uint8(Stream, UInt8, !State),
> > +        put(Stream, "u8", !State)
> > +    else if univ_to_type(Univ, Int16) then
> > +        put_int16(Stream, Int16, !State),
> > +        put(Stream, "i16", !State)
> > +    else if univ_to_type(Univ, UInt16) then
> > +        put_uint16(Stream, UInt16, !State),
> > +        put(Stream, "u16", !State)
> > +    else if univ_to_type(Univ, Int32)then
> > +        put_int32(Stream, Int32, !State),
> > +        put(Stream, "i32", !State)
> > +    else if univ_to_type(Univ, UInt32) then
> > +        put_uint32(Stream, UInt32, !State),
> > +        put(Stream, "u32", !State)
> >     else if univ_to_type(Univ, Float) then
> >         put_float(Stream, Float, !State)
> >     else if univ_to_type(Univ, Bitmap) then
> 
> These long if-then-else chains are slow.
> 
> We ought to have a system where we peek inside Univ, get back e.g. a string
> identifying the type inside. We would then switch on that string, with
> each switch arm doing a dynamic cast on only *one* type, the one returned
> by the peek.

That's a good idea, although not one for this change.

> > diff --git a/library/string.m b/library/string.m
> > index 7f809d139..fae3c7af4 100644
> > --- a/library/string.m
> > +++ b/library/string.m
> > +    % Convert an 8-bit integer to a string.
> > +    %
> > +:- func int8_to_string(int8::in) = (string::uo) is det.
> > +
> > +    % Convert an unsigned 8-bit integer to a string.
> > +    %
> > +:- func uint8_to_string(uint8::in) = (string::uo) is det.
> > +
> > +    % Convert an 16-bit integer to a string.
> > +    %
> > +:- func int16_to_string(int16::in) = (string::uo) is det.
> > +
> > +    % Convert an unsigned 16-bit integer to a string.
> > +    %
> > +:- func uint16_to_string(uint16::in) = (string::uo) is det.
> > +
> > +    % Convert a 32-bit integer to a string.
> > +    %
> > +:- func int32_to_string(int32::in) = (string::uo) is det.
> > +
> > +    % Convert an unsigned 32-bit integer to a string.
> > +    %
> > +:- func uint32_to_string(uint32::in) = (string::uo) is det.
> 
> I would just have a single comment for all six functions, saying
> "Convert a signed/unsigned 8/16/32 bit integer to a string".
> That way it would be simpler to add versions of these functions
> that allow the specification of bases and/or grouping.

Done.

> > +:- pragma foreign_proc("C",
> > +    int8_to_string(I8::in) = (S::uo),
> > +    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
> > +"
> > +    char buffer[5];
> > +    sprintf(buffer, ""%"" PRId8, I8);
> > +    MR_allocate_aligned_string_msg(S, strlen(buffer), MR_ALLOC_ID);
> > +    strcpy(S, buffer);
> > +").
> 
> Next to each definition of the buffer, I would add the calculation that shows
> this size is big enough e.g. "1 for "-", 3 for digits (-128 to 127), 1 for \0".

Done.

> > +term_to_int8(Term, Int8) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, signed, size_8_bit),
> > +    integer.to_int8(Integer, Int8).
> > +
> > +term_to_int16(Term, Int16) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, signed, size_16_bit),
> > +    integer.to_int16(Integer, Int16).
> > +
> > +term_to_int32(Term, Int32) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, signed, size_32_bit),
> > +    integer.to_int32(Integer, Int32).
> > +
> > term_to_uint(Term, UInt) :-
> >     Term = functor(Const, [], _Context),
> >     Const = integer(_Base, Integer, unsigned, size_word),
> >     integer.to_uint(Integer, UInt).
> > 
> > +term_to_uint8(Term, UInt8) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, unsigned, size_8_bit),
> > +    integer.to_uint8(Integer, UInt8).
> > +
> > +term_to_uint16(Term, UInt16) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, unsigned, size_16_bit),
> > +    integer.to_uint16(Integer, UInt16).
> > +
> > +term_to_uint32(Term, UInt32) :-
> > +    Term = functor(Const, [], _Context),
> > +    Const = integer(_Base, Integer, unsigned, size_32_bit),
> > +    integer.to_uint32(Integer, UInt32).
> > +
> > decimal_term_to_int(Term, Int) :-
> >     Term = functor(Const, [], _Context),
> >     Const = integer(base_10, Integer, signed, size_word),
> > @@ -812,6 +866,36 @@ uint_to_decimal_term(UInt, Context) = Term :-
> >     Const = integer(base_10, integer.from_uint(UInt), unsigned, size_word),
> >     Term = functor(Const, [], Context).
> > 
> > +int8_to_decimal_term(Int8, Context) = Term :-
> > +    Const = integer(base_10, integer.from_int8(Int8), signed,
> > +        size_8_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > +uint8_to_decimal_term(UInt8, Context) = Term :-
> > +    Const = integer(base_10, integer.from_uint8(UInt8), unsigned,
> > +        size_8_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > +int16_to_decimal_term(Int16, Context) = Term :-
> > +    Const = integer(base_10, integer.from_int16(Int16), signed,
> > +        size_16_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > +uint16_to_decimal_term(UInt16, Context) = Term :-
> > +    Const = integer(base_10, integer.from_uint16(UInt16), unsigned,
> > +        size_16_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > +int32_to_decimal_term(Int32, Context) = Term :-
> > +    Const = integer(base_10, integer.from_int32(Int32), signed,
> > +        size_32_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > +uint32_to_decimal_term(UInt32, Context) = Term :-
> > +    Const = integer(base_10, integer.from_uint32(UInt32), unsigned,
> > +        size_32_bit),
> > +    Term = functor(Const, [], Context).
> > +
> > %---------------------------------------------------------------------------%
> 
> The first addition lists the new predicates by signedness first
> (order i8,i16,i32,u8,u16,u32), while the second lists them by size
> (order i8,u8,i16,u16,i32,u32). Why the inconsistency? Between modules,
> I would expect such inconsistencies; within a single module, I don't.

Fixed.

Thanks for the reivew.

Julien.


More information about the reviews mailing list