[m-rev.] for review: format uints directly

Julien Fischer jfischer at opturion.com
Thu Nov 19 22:17:29 AEDT 2020


For review by anyone.

I would also like feedback on:

1. Should we add X_to_hex_string/1 and X_to_octal_string/1 to the
string module for all integer types?

2. Should we add X_to_binary_string/1 to the string module for all
integer types?

3. If we do, should X_to_{hex,octal,binary}_string for signed integer
types return the string representation of their argument reinterpreted
as unsigned?
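
To make question 3 concrete, here is a rough C sketch of the two possible
behaviours for a signed 32-bit value. The function names are hypothetical
and exist only for illustration; nothing like them is in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Option A: reinterpret the bits as unsigned (two's complement view). */
void hex_as_unsigned(int32_t x, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%" PRIx32, (uint32_t)x);
    assert(n > 0 && (size_t)n < size);
}

/* Option B: a minus sign followed by the magnitude. */
void hex_sign_magnitude(int32_t x, char *buf, size_t size)
{
    /* Widen before negating so that INT32_MIN does not overflow. */
    int64_t wide = x;
    int n;

    if (wide < 0) {
        n = snprintf(buf, size, "-%" PRIx64, (uint64_t)(-wide));
    } else {
        n = snprintf(buf, size, "%" PRIx64, (uint64_t)wide);
    }
    assert(n > 0 && (size_t)n < size);
}
```

For -1 the first option gives "ffffffff" and the second gives "-1".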

----------------------

Format uints directly.

Currently, the Mercury implementation of string formatting handles uints by
casting them to ints and then using the code for formatting signed integers as
unsigned values.  Add an implementation that works directly on uints and make
the code that formats signed integers as unsigned integers use that instead.
The new implementation is simpler and avoids unnecessary conversions to
arbitrary precision integers.

Add new functions for converting uint values directly to octal and hexadecimal
strings that use functionality provided by the underlying platforms; replace
the Mercury code that previously did that with calls to these new functions.
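
As a standalone illustration of what "functionality provided by the
underlying platforms" amounts to in C, the sketch below converts an
unsigned long with snprintf and sizes its buffers from the word width.
It uses plain malloc and hypothetical function names in place of the
Mercury runtime macros, so treat it as a sketch and not as the code in
the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Worst-case digit counts for an unsigned long, plus one byte for NUL. */
#define ULONG_BITS      (sizeof(unsigned long) * CHAR_BIT)
#define HEX_BUF_SIZE    ((ULONG_BITS + 3) / 4 + 1)
#define OCTAL_BUF_SIZE  ((ULONG_BITS + 2) / 3 + 1)

/* Copy a NUL-terminated buffer of known length into fresh heap memory. */
static char *copy_digits(const char *buffer, int n)
{
    char *str = malloc((size_t)n + 1);
    if (str != NULL) {
        memcpy(str, buffer, (size_t)n + 1);
    }
    return str;
}

/* Returns a malloc'd lowercase hex string, or NULL on allocation failure. */
char *ulong_to_hex_string(unsigned long u)
{
    char buffer[HEX_BUF_SIZE];
    int n = snprintf(buffer, sizeof buffer, "%lx", u);
    assert(n > 0 && (size_t)n < sizeof buffer);
    return copy_digits(buffer, n);
}

/* Returns a malloc'd octal string, or NULL on allocation failure. */
char *ulong_to_octal_string(unsigned long u)
{
    char buffer[OCTAL_BUF_SIZE];
    int n = snprintf(buffer, sizeof buffer, "%lo", u);
    assert(n > 0 && (size_t)n < sizeof buffer);
    return copy_digits(buffer, n);
}
```

With 64-bit words the largest value needs 22 octal digits, so 23 bytes
including the terminator; any fixed-size buffer on the C side has to
allow for that.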

library/string.m:
     Add the functions uint_to_hex_string/1 and uint_to_octal_string/1.

library/string.format.m:
     Make format_uint/6 operate directly on uints instead of casting the value
     to a signed int and calling format_unsigned_int/6.

     Make format_unsigned_int/6 cast the int value to a uint and then call
     format_uint/6.

     Delete predicates and functions used to convert ints to octal and
     hexadecimal strings.  We now just use the functions exported by
     the string module.
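
     For reviewers who want the digit-string steps of format_uint/6 in one
     place, here is a rough C analogue of the hex case: the zero special
     case, the upper-case conversion, then the precision padding. The
     helper name and the convention that a negative prec means "no
     precision specified" are assumptions made for this sketch only.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the hex digits of u to out, padded with
 * leading zeros up to prec digits. prec < 0 means no precision given.
 */
void hex_digits_with_prec(unsigned long u, int prec, int upper,
    char *out, size_t size)
{
    char digits[(sizeof(unsigned long) * CHAR_BIT + 3) / 4 + 1];

    if (u == 0) {
        /* Zero yields no digits only when the precision is exactly 0. */
        strcpy(digits, prec == 0 ? "" : "0");
    } else {
        snprintf(digits, sizeof digits, "%lx", u);
        if (upper) {
            /* Derive the upper-case form from the lower-case digits. */
            for (char *p = digits; *p != '\0'; p++) {
                *p = (char)toupper((unsigned char)*p);
            }
        }
    }

    size_t len = strlen(digits);
    size_t pad = (prec > 0 && (size_t)prec > len) ? (size_t)prec - len : 0;
    assert(pad + len < size);
    memset(out, '0', pad);
    memcpy(out + pad, digits, len + 1);
}
```

     The zero rule matches standard C printf, where converting a zero
     value with a precision of zero produces no characters.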

NEWS:
     Announce the additions to the string module.

tests/hard_coded/Mmakefile:
tests/hard_coded/uint_string_conv.{m,exp*}:
      Add a test of uint string conversion.

Julien.

diff --git a/NEWS b/NEWS
index d92a012..1ee6cf1 100644
--- a/NEWS
+++ b/NEWS
@@ -146,9 +146,11 @@ Changes to the Mercury standard library

  ### Changes to the `string` module

-* The following function has been added:
+* The following functions have been added:

      - func `add_suffix/2`
+    - func `uint_to_hex_string/1`
+    - func `uint_to_octal_string/1`

  * The following function symbols have been added to the type `poly_type`:

diff --git a/library/string.format.m b/library/string.format.m
index b870418..a9587e7 100644
--- a/library/string.format.m
+++ b/library/string.format.m
@@ -466,6 +466,7 @@ format_float_component(Flags, MaybeWidth, MaybePrec, Kind, Float, String) :-
      %   native_format_int/2
      %   native_format_string/2
      %   native_format_char/2
+    %   native_format_uint/2
      %
  :- pred using_sprintf is semidet.

@@ -786,78 +787,54 @@ format_signed_int(Flags, MaybeWidth, MaybePrec, Int) = String :-
      % Format an unsigned int, unsigned octal, or unsigned hexadecimal
      % (u,o,x,X,p).
      %
-    % XXX we should replace most of this with a version that operates directly
-    % on uints.
-    %
  :- func format_unsigned_int(string_format_flags, string_format_maybe_width,
      string_format_maybe_prec, string_format_int_base, int) = string.

  format_unsigned_int(Flags, MaybeWidth, MaybePrec, Base, Int) = String :-
-    ( if Int = 0 then
-        % Zero is a special case. The abs_integer_to_decimal function
-        % returns "" for 0, but returning no digits at all is ok
-        % only if our caller explicitly allowed us to do so.
+    UInt = cast_from_int(Int),
+    String = format_uint(Flags, MaybeWidth, MaybePrec, Base, UInt).
+
+%---------------------------------------------------------------------------%
+
+:- func format_uint(string_format_flags, string_format_maybe_width,
+    string_format_maybe_prec, string_format_int_base, uint) = string.
+
+format_uint(Flags, MaybeWidth, MaybePrec, Base, UInt) = String :-
+    ( if UInt = 0u then
+        % Zero is a special case. uint_to_*string functions return "0" for 0,
+        % but we must return "" if our caller explicitly allowed us to do so.
          ( if MaybePrec = specified_prec(0) then
-            AbsIntStr = ""
+            UIntStr = ""
          else
-            AbsIntStr = "0"
+            UIntStr = "0"
          )
      else
-        % If the platform we are running on can't represent the absolute
-        % value of a 16 bit signed number natively, we are in big trouble.
-        %
-        % Our caller wants us to treat Int as unsigned, but Mercury treats it
-        % as signed. We use native arithmetic on ints (as opposed to arbitrary
-        % precision arithmetic on integers) on Int only in cases where
-        % the two notions coincide, i.e. if we know that Int is positive
-        % even when viewed as a signed number, and that is so even on
-        % 16 bit machines.
-        ( if 0 =< Int, Int =< 32767 then
-            (
-                Base = base_octal,
-                AbsIntStr = abs_int_to_octal(Int)
-            ;
-                Base = base_decimal,
-                AbsIntStr = abs_int_to_decimal(Int)
-            ;
-                ( Base = base_hex_lc
-                ; Base = base_hex_p
-                ),
-                AbsIntStr = abs_int_to_hex_lc(Int)
-            ;
-                Base = base_hex_uc,
-                AbsIntStr = abs_int_to_hex_uc(Int)
-            )
-        else
-            Div = integer.pow(integer.two, integer(int.bits_per_int)),
-            UnsignedInteger = integer(Int) mod Div,
-            (
-                Base = base_octal,
-                AbsIntStr = abs_integer_to_octal(UnsignedInteger)
-            ;
-                Base = base_decimal,
-                AbsIntStr = abs_integer_to_decimal(UnsignedInteger)
-            ;
-                ( Base = base_hex_lc
-                ; Base = base_hex_p
-                ),
-                AbsIntStr = abs_integer_to_hex_lc(UnsignedInteger)
-            ;
-                Base = base_hex_uc,
-                AbsIntStr = abs_integer_to_hex_uc(UnsignedInteger)
-            )
+        (
+            Base = base_octal,
+            UIntStr = uint_to_octal_string(UInt)
+        ;
+            Base = base_decimal,
+            UIntStr = uint_to_string(UInt)
+        ;
+            ( Base = base_hex_lc
+            ; Base = base_hex_p
+            ),
+            UIntStr = uint_to_hex_string(UInt)
+        ;
+            Base = base_hex_uc,
+            UIntStr = string.to_upper(uint_to_hex_string(UInt))
          )
      ),
-    AbsIntStrLength = string.count_codepoints(AbsIntStr),
+    UIntStrLength = string.count_codepoints(UIntStr),

      % Do we need to increase precision?
      ( if
          MaybePrec = specified_prec(Prec),
-        Prec > AbsIntStrLength
+        Prec > UIntStrLength
      then
-        PrecStr = string.pad_left(AbsIntStr, '0', Prec)
+        PrecStr = string.pad_left(UIntStr, '0', Prec)
      else
-        PrecStr = AbsIntStr
+        PrecStr = UIntStr
      ),

      % Do we need to increase the precision of an octal?
@@ -888,11 +865,11 @@ format_unsigned_int(Flags, MaybeWidth, MaybePrec, Base, Int) = String :-
                  Prefix = "0x"
              ;
                  Base = base_hex_lc,
-                Int \= 0,
+                UInt \= 0u,
                  Prefix = "0x"
              ;
                  Base = base_hex_uc,
-                Int \= 0,
+                UInt \= 0u,
                  Prefix = "0X"
              ;
                  ( Base = base_octal
@@ -919,11 +896,11 @@ format_unsigned_int(Flags, MaybeWidth, MaybePrec, Base, Int) = String :-
                  Prefix = "0x"
              ;
                  Base = base_hex_lc,
-                Int \= 0,
+                UInt \= 0u,
                  Prefix = "0x"
              ;
                  Base = base_hex_uc,
-                Int \= 0,
+                UInt \= 0u,
                  Prefix = "0X"
              ;
                  Base = base_octal,
@@ -944,15 +921,6 @@ format_unsigned_int(Flags, MaybeWidth, MaybePrec, Base, Int) = String :-

  %---------------------------------------------------------------------------%

-:- func format_uint(string_format_flags, string_format_maybe_width,
-    string_format_maybe_prec, string_format_int_base, uint) = string.
-
-format_uint(Flags, MaybeWidth, MaybePrec, Base, UInt) = String :-
-    Int = cast_to_int(UInt),
-    String = format_unsigned_int(Flags, MaybeWidth, MaybePrec, Base, Int).
-
-%---------------------------------------------------------------------------%
-
      % Format a float.
      %
  :- func format_float(string_format_flags, string_format_maybe_width,
@@ -1174,30 +1142,6 @@ justify_string(Flags, MaybeWidth, Str) = JustifiedStr :-
  % is guaranteed not to suffer from either of the problems above,
  % so we process it as an Mercury int, which is a lot faster.

-    % Convert a non-negative integer to an octal string.
-    %
-:- func abs_integer_to_octal(integer) = string.
-:- func abs_int_to_octal(int) = string.
-
-abs_integer_to_octal(Num) = NumStr :-
-    ( if Num > integer.zero then
-        Integer8 = integer.eight,
-        FrontDigitsStr = abs_int_to_octal(det_to_int(Num // Integer8)),
-        LastDigitStr = get_octal_digit(det_to_int(Num rem Integer8)),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
-abs_int_to_octal(Num) = NumStr :-
-    ( if Num > 0 then
-        FrontDigitsStr = abs_int_to_octal(Num // 8),
-        LastDigitStr = get_octal_digit(Num rem 8),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
      % Convert a non-negative integer to a decimal string.
      %
  :- func abs_integer_to_decimal(integer) = string.
@@ -1222,66 +1166,8 @@ abs_int_to_decimal(Num) = NumStr :-
          NumStr = ""
      ).

-    % Convert a non-negative integer to a hexadecimal string,
-    % using a-f for to_hex_lc and A-F for to_hex_uc.
-    %
-:- func abs_integer_to_hex_lc(integer) = string.
-:- func abs_integer_to_hex_uc(integer) = string.
-:- func abs_int_to_hex_lc(int) = string.
-:- func abs_int_to_hex_uc(int) = string.
-
-abs_integer_to_hex_lc(Num) = NumStr :-
-    ( if Num > integer.zero then
-        Integer16 = integer.sixteen,
-        FrontDigitsStr = abs_int_to_hex_lc(det_to_int(Num // Integer16)),
-        LastDigitStr = get_hex_digit_lc(det_to_int(Num rem Integer16)),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
-abs_integer_to_hex_uc(Num) = NumStr :-
-    ( if Num > integer.zero then
-        Integer16 = integer.sixteen,
-        FrontDigitsStr = abs_int_to_hex_uc(det_to_int(Num // Integer16)),
-        LastDigitStr = get_hex_digit_uc(det_to_int(Num rem Integer16)),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
-abs_int_to_hex_lc(Num) = NumStr :-
-    ( if Num > 0 then
-        FrontDigitsStr = abs_int_to_hex_lc(Num // 16),
-        LastDigitStr = get_hex_digit_lc(Num rem 16),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
-abs_int_to_hex_uc(Num) = NumStr :-
-    ( if Num > 0 then
-        FrontDigitsStr = abs_int_to_hex_uc(Num // 16),
-        LastDigitStr = get_hex_digit_uc(Num rem 16),
-        NumStr = append(FrontDigitsStr, LastDigitStr)
-    else
-        NumStr = ""
-    ).
-
  %---------------------------------------------------------------------------%

-    % Given an int between 0 and 7, return the octal digit representing it.
-    %
-:- func get_octal_digit(int) = string.
-:- pragma inline(get_octal_digit/1).
-
-get_octal_digit(Int) = Octal :-
-    ( if octal_digit(Int, OctalPrime) then
-        Octal = OctalPrime
-    else
-        unexpected($pred, "octal_digit failed")
-    ).
-
      % Given an int between 0 and 9, return the decimal digit representing it.
      %
  :- func get_decimal_digit(int) = string.
@@ -1294,40 +1180,6 @@ get_decimal_digit(Int) = Decimal :-
          unexpected($pred, "decimal_digit failed")
      ).

-    % Given an int between 0 and 15, return the hexadecimal digit
-    % representing it, using a-f for get_hex_digit_lc and
-    % A-F for get_hex_digit_uc.
-    %
-:- func get_hex_digit_lc(int) = string.
-:- func get_hex_digit_uc(int) = string.
-:- pragma inline(get_hex_digit_lc/1).
-:- pragma inline(get_hex_digit_uc/1).
-
-get_hex_digit_lc(Int) = HexLC :-
-    ( if hex_digit(Int, HexLCPrime, _HexUC) then
-        HexLC = HexLCPrime
-    else
-        unexpected($pred, "hex_digit failed")
-    ).
-
-get_hex_digit_uc(Int) = HexUC :-
-    ( if hex_digit(Int, _HexLC, HexUCPrime) then
-        HexUC = HexUCPrime
-    else
-        unexpected($pred, "hex_digit failed")
-    ).
-
-:- pred octal_digit(int::in, string::out) is semidet.
-
-octal_digit(0, "0").
-octal_digit(1, "1").
-octal_digit(2, "2").
-octal_digit(3, "3").
-octal_digit(4, "4").
-octal_digit(5, "5").
-octal_digit(6, "6").
-octal_digit(7, "7").
-
  :- pred decimal_digit(int::in, string::out) is semidet.

  decimal_digit(0, "0").
@@ -1341,25 +1193,6 @@ decimal_digit(7, "7").
  decimal_digit(8, "8").
  decimal_digit(9, "9").

-:- pred hex_digit(int::in, string::out, string::out) is semidet.
-
-hex_digit( 0, "0", "0").
-hex_digit( 1, "1", "1").
-hex_digit( 2, "2", "2").
-hex_digit( 3, "3", "3").
-hex_digit( 4, "4", "4").
-hex_digit( 5, "5", "5").
-hex_digit( 6, "6", "6").
-hex_digit( 7, "7", "7").
-hex_digit( 8, "8", "8").
-hex_digit( 9, "9", "9").
-hex_digit(10, "a", "A").
-hex_digit(11, "b", "B").
-hex_digit(12, "c", "C").
-hex_digit(13, "d", "D").
-hex_digit(14, "e", "E").
-hex_digit(15, "f", "F").
-
  %---------------------------------------------------------------------------%

      % Unlike the standard library function, this function converts a float
diff --git a/library/string.m b/library/string.m
index 0046a41..28228dd 100644
--- a/library/string.m
+++ b/library/string.m
@@ -1469,10 +1469,19 @@
  :- func int_to_base_string_group(int, int, int, string) = string.
  :- mode int_to_base_string_group(in, in, in, in) = uo is det.

-    % Convert an unsigned integer to a string.
+    % Convert an unsigned integer to a string in base 10.
      %
  :- func uint_to_string(uint::in) = (string::uo) is det.

+    % Convert an unsigned integer to a string in base 16.
+    % Alphabetic digits will be lowercase (e.g. a-f).
+    %
+:- func uint_to_hex_string(uint::in) = (string::uo) is det.
+
+    % Convert an unsigned integer to a string in base 8.
+    %
+:- func uint_to_octal_string(uint::in) = (string::uo) is det.
+
      % Convert a signed/unsigned 8/16/32/64 bit integer to a string.
      %
  :- func int8_to_string(int8::in) = (string::uo) is det.
@@ -5556,6 +5565,56 @@ int_to_base_string_group_2(NegN, Base, Curr, GroupLength, Sep, Str) :-
      Str = java.lang.Long.toString(U & 0xffffffffL);
  ").

+:- pragma foreign_proc("C",
+    uint_to_hex_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
+        does_not_affect_liveness, no_sharing],
+"
+    char buffer[21];
+    sprintf(buffer, ""%"" MR_INTEGER_LENGTH_MODIFIER ""x"", U);
+    MR_allocate_aligned_string_msg(Str, strlen(buffer), MR_ALLOC_ID);
+    strcpy(Str, buffer);
+").
+
+:- pragma foreign_proc("C#",
+    uint_to_hex_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    Str = U.ToString(""x"");
+").
+
+:- pragma foreign_proc("Java",
+    uint_to_hex_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    Str = java.lang.Integer.toHexString(U);
+").
+
+:- pragma foreign_proc("C",
+    uint_to_octal_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
+        does_not_affect_liveness, no_sharing],
+"
+    char buffer[23]; /* 22 octal digits for a 64-bit value, plus NUL */
+    sprintf(buffer, ""%"" MR_INTEGER_LENGTH_MODIFIER ""o"", U);
+    MR_allocate_aligned_string_msg(Str, strlen(buffer), MR_ALLOC_ID);
+    strcpy(Str, buffer);
+").
+
+:- pragma foreign_proc("C#",
+    uint_to_octal_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    Str = System.Convert.ToString(U, 8);
+").
+
+:- pragma foreign_proc("Java",
+    uint_to_octal_string(U::in) = (Str::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    Str = java.lang.Integer.toOctalString(U);
+").
+
  %---------------------%

  :- pragma foreign_proc("C",
diff --git a/tests/hard_coded/Mmakefile b/tests/hard_coded/Mmakefile
index abdf523..1ff7d49 100644
--- a/tests/hard_coded/Mmakefile
+++ b/tests/hard_coded/Mmakefile
@@ -436,6 +436,7 @@ ORDINARY_PROGS = \
  	type_to_term \
  	type_to_term_bug \
  	uc_export_enum \
+	uint_string_conv \
  	uint16_from_bytes \
  	uint16_switch_test \
  	uint32_from_bytes \
diff --git a/tests/hard_coded/uint_string_conv.exp b/tests/hard_coded/uint_string_conv.exp
index e69de29..d29865e 100644
--- a/tests/hard_coded/uint_string_conv.exp
+++ b/tests/hard_coded/uint_string_conv.exp
@@ -0,0 +1,28 @@
+Decimal                Octal                    Hex 
+0                      0                        0 
+1                      1                        1 
+2                      2                        2 
+3                      3                        3 
+4                      4                        4 
+7                      7                        7 
+8                      10                       8 
+9                      11                       9 
+10                     12                       a 
+11                     13                       b 
+12                     14                       c 
+13                     15                       d 
+14                     16                       e 
+15                     17                       f 
+16                     20                       10 
+32                     40                       20 
+64                     100                      40 
+127                    177                      7f 
+128                    200                      80 
+255                    377                      ff 
+256                    400                      100 
+32767                  77777                    7fff 
+65535                  177777                   ffff 
+2147483647             17777777777              7fffffff 
+4294967295             37777777777              ffffffff 
+4294967295             37777777777              ffffffff 
+
diff --git a/tests/hard_coded/uint_string_conv.exp2 b/tests/hard_coded/uint_string_conv.exp2
index e69de29..1f7e451 100644
--- a/tests/hard_coded/uint_string_conv.exp2
+++ b/tests/hard_coded/uint_string_conv.exp2
@@ -0,0 +1,28 @@
+Decimal                Octal                    Hex 
+0                      0                        0 
+1                      1                        1 
+2                      2                        2 
+3                      3                        3 
+4                      4                        4 
+7                      7                        7 
+8                      10                       8 
+9                      11                       9 
+10                     12                       a 
+11                     13                       b 
+12                     14                       c 
+13                     15                       d 
+14                     16                       e 
+15                     17                       f 
+16                     20                       10 
+32                     40                       20 
+64                     100                      40 
+127                    177                      7f 
+128                    200                      80 
+255                    377                      ff 
+256                    400                      100 
+32767                  77777                    7fff 
+65535                  177777                   ffff 
+2147483647             17777777777              7fffffff 
+4294967295             37777777777              ffffffff 
+18446744073709551615   1777777777777777777777   ffffffffffffffff 
+
diff --git a/tests/hard_coded/uint_string_conv.m b/tests/hard_coded/uint_string_conv.m
index e69de29..73d8f42 100644
--- a/tests/hard_coded/uint_string_conv.m
+++ b/tests/hard_coded/uint_string_conv.m
@@ -0,0 +1,69 @@
+%---------------------------------------------------------------------------%
+% vim: ft=mercury ts=4 sw=4 et
+%---------------------------------------------------------------------------%
+% Test conversion of uints to strings.
+% The .exp file is for systems where uint is 32 bit.
+% The .exp2 file is for systems where uint is 64 bit.
+%---------------------------------------------------------------------------%
+
+:- module uint_string_conv.
+:- interface.
+
+:- import_module io.
+
+:- pred main(io::di, io::uo) is det.
+
+%---------------------------------------------------------------------------%
+%---------------------------------------------------------------------------%
+
+:- implementation.
+
+:- import_module list.
+:- import_module string.
+:- import_module uint.
+
+main(!IO) :-
+    io.format("%-22s %-24s %-22s\n", [s("Decimal"), s("Octal"), s("Hex")],
+        !IO),
+    list.foldl(do_test, test_values, !IO),
+    io.nl(!IO).
+
+:- pred do_test(uint, io, io).
+:- mode do_test(in, di, uo) is det.
+
+do_test(U, !IO) :-
+   Decimal = uint_to_string(U),
+   Octal = uint_to_octal_string(U),
+   Hex = uint_to_hex_string(U),
+   io.format("%-22s %-24s %-22s\n", [s(Decimal), s(Octal), s(Hex)], !IO).
+
+:- func test_values = list(uint).
+
+test_values = [
+   0u,
+   1u,
+   2u,
+   3u,
+   4u,
+   7u,
+   8u,
+   9u,
+   10u,
+   11u,
+   12u,
+   13u,
+   14u,
+   15u,
+   16u,
+   32u,
+   64u,
+   127u,
+   128u,
+   255u,
+   256u,
+   32767u,
+   65535u,
+   2147483647u,
+   4294967295u,
+   uint.max_uint
+].

