[m-rev.] for review: Make base_string_to_int check overflow/underflow for all bases.

Peter Wang novalazy at gmail.com
Fri Feb 13 17:00:25 AEDT 2015


Make base_string_to_int check for overflow and underflow when converting
from strings in all bases, not only base 10.

Previously the documentation stated that "numbers not in base 10 are
assumed to denote bit patterns and are not checked for overflow."
Although that is not a safe assumption in general, in Mercury source
files it is useful to be able to write values with the high bit set,
e.g. 0x80000000 on 32-bit machines, which would be greater than max_int
if interpreted as a positive integer.

The changed behaviour of base_string_to_int would reject such literals
in Mercury sources, so additional changes are required to keep them
working.  However, unlike before, the compiler will now report an error
if any non-zero bits of a literal would be discarded.

library/string.m:
	Enable overflow/underflow checking for base_string_to_int for
	any base.

	Update documentation.

library/lexer.m:
	Allow `big_integer' token functor to represent non-base 10
	literals as well.

library/term.m:
	Add `big_integer' term functor.

library/term_io.m:
	Add private helper function `integer_literal_base_prefix'.

	Conform to changes.

library/parser.m:
	Pass through `big_integer' tokens as `big_integer' terms.

	Conform to changes.

compiler/prog_util.m:
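	Add `source_string_to_int' to convert integer literals in Mercury
	sources to ints, with the aforementioned concession for bit patterns.

	`make_functor_cons_id' is now a semidet predicate that fails for
	integer literals exceeding the range of `int'.  Add
	`det_make_functor_cons_id' for callers that do not expect failure.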

compiler/superhomogeneous.m:
	Make `unravel_var_functor_unification' convert `big_integer'
	functors on the RHS to a plain `int', with the aforementioned
	concession for bit patterns, or add an error message if any
	significant bits would be discarded.

compiler/fact_table.m:
compiler/mercury_to_mercury.m:
compiler/module_imports.m:
compiler/prog_io_util.m:
	Conform to changes.

compiler/make.util.m:
	Delete unused predicate.

tests/general/test_string_to_int_overflow.m:
tests/general/test_string_to_int_overflow.exp:
tests/general/test_string_to_int_overflow.exp2:
tests/general/test_string_to_int_overflow.exp3:
	Rewrite test case.

	Delete the .exp3 expected output.

tests/hard_coded/lexer_bigint.exp:
tests/hard_coded/lexer_bigint.exp2:
tests/hard_coded/read_min_int.exp:
tests/hard_coded/read_min_int.exp2:
	Update expected outputs due to the lexer and term module changes.

tests/invalid/Mmakefile:
tests/invalid/invalid_int.err_exp:
tests/invalid/invalid_int.err_exp2:
tests/invalid/invalid_int.m:
	Add new test case.

NEWS:
	Announce the changes.

diff --git a/NEWS b/NEWS
index bfa7323..9a8a016 100644
--- a/NEWS
+++ b/NEWS
@@ -89,6 +89,9 @@ Changes to the Mercury standard library:
 * Float special values, NaNs and Infinities, are now converted to strings in
   a way that is backend and grade-independent.  (Bug #348)
 
+* string.base_string_to_int/3 and string.det_base_string_to_int/2 now check
+  for overflow and underflow in all bases, not only base 10.
+
 * The following classification predicates have been added to the float module:
 
    - is_finite/1
@@ -177,6 +180,9 @@ Changes to the Mercury compiler:
   using GCC as the C compiler.
   See README.MacOS for further details.
 
+* The compiler now reports an error for binary/octal/hexadecimal integer
+  literals that cannot be represented in the compiler's native int type.
+
 Changes to the extras distribution:
 
 * We have added support for Unicode and other enhancements to the lex and
diff --git a/compiler/fact_table.m b/compiler/fact_table.m
index 3e7bec7..b3d2888 100644
--- a/compiler/fact_table.m
+++ b/compiler/fact_table.m
@@ -115,6 +115,7 @@
 :- import_module parse_tree.module_cmds.
 :- import_module parse_tree.prog_foreign.
 :- import_module parse_tree.prog_out.
+:- import_module parse_tree.prog_util.
 
 :- import_module assoc_list.
 :- import_module bool.
@@ -512,6 +513,9 @@ check_fact_type_and_mode(Types0, [Term | Terms], ArgNum0, PredOrFunc,
             Functor = term.integer(_),
             RequiredType = yes(builtin_type_int)
         ;
+            Functor = term.big_integer(_, _),
+            RequiredType = yes(builtin_type_int)
+        ;
             Functor = term.float(_),
             RequiredType = yes(builtin_type_float)
         ;
@@ -1073,6 +1077,13 @@ make_key_part(term.atom(_)) = _ :-
 make_key_part(term.integer(I)) =
     % convert int to base 36 to reduce the size of the I/O.
     string.int_to_base_string(I, 36).
+make_key_part(term.big_integer(Base, IntString)) = String :-
+    ( source_string_to_int(Base, IntString, I) ->
+        % convert int to base 36 to reduce the size of the I/O.
+        String = string.int_to_base_string(I, 36)
+    ;
+        unexpected($module, $pred, "integer too big")
+    ).
 make_key_part(term.float(F)) =
     string.float_to_string(F).
 make_key_part(term.string(S)) = K :-
@@ -1316,6 +1327,14 @@ write_fact_args([Arg | Args], OutputStream, !IO) :-
         io.write_int(OutputStream, Int, !IO),
         io.write_string(OutputStream, ", ", !IO)
     ;
+        Arg = term.big_integer(Base, IntString),
+        ( source_string_to_int(Base, IntString, Int) ->
+            io.write_int(OutputStream, Int, !IO),
+            io.write_string(OutputStream, ", ", !IO)
+        ;
+            unexpected($module, $pred, "integer too big")
+        )
+    ;
         Arg = term.float(Float),
         io.write_float(OutputStream, Float, !IO),
         io.write_string(OutputStream, ", ", !IO)
diff --git a/compiler/make.util.m b/compiler/make.util.m
index 470c9ae..96376c0 100644
--- a/compiler/make.util.m
+++ b/compiler/make.util.m
@@ -2017,10 +2017,6 @@ mix(H0, X) = H :-
     H1 = H0 `xor` (H0 `unchecked_left_shift` 5),
     H = H1 `xor` X.
 
-:- func concoct_second_hash(int) = int.
-
-concoct_second_hash(H) = mix(H, 0xfe3dbe7f).    % whatever
-
 %-----------------------------------------------------------------------------%
 :- end_module make.util.
 %-----------------------------------------------------------------------------%
diff --git a/compiler/mercury_to_mercury.m b/compiler/mercury_to_mercury.m
index e90ba74..07b5b59 100644
--- a/compiler/mercury_to_mercury.m
+++ b/compiler/mercury_to_mercury.m
@@ -4448,6 +4448,7 @@ mercury_limited_term_nq_to_string(VarSet, AppendVarnums, NextToGraphicToken,
                 String = FunctorString ++ "/" ++ ArityStr
             ;
                 ( Functor = term.integer(_)
+                ; Functor = term.big_integer(_, _)
                 ; Functor = term.float(_)
                 ; Functor = term.string(_)
                 ; Functor = term.implementation_defined(_)
diff --git a/compiler/module_imports.m b/compiler/module_imports.m
index fc9da08..891a10f 100644
--- a/compiler/module_imports.m
+++ b/compiler/module_imports.m
@@ -998,6 +998,7 @@ gather_implicit_import_needs_in_term(Term, !ImplicitImportNeeds) :-
             )
         ;
             ( Const = integer(_)
+            ; Const = big_integer(_, _)
             ; Const = string(_)
             ; Const = float(_)
             ; Const = implementation_defined(_)
diff --git a/compiler/prog_io_util.m b/compiler/prog_io_util.m
index c74dbb4..2b34527 100644
--- a/compiler/prog_io_util.m
+++ b/compiler/prog_io_util.m
@@ -830,6 +830,7 @@ convert_bound_inst_list(AllowConstrainedInstVar, [H0 | T0], [H | T]) :-
 
 convert_bound_inst(AllowConstrainedInstVar, InstTerm, BoundInst) :-
     InstTerm = term.functor(Functor, Args0, _),
+    require_complete_switch [Functor]
     (
         Functor = term.atom(_),
         try_parse_sym_name_and_args_from_f_args(Functor, Args0,
@@ -843,12 +844,13 @@ convert_bound_inst(AllowConstrainedInstVar, InstTerm, BoundInst) :-
         fail
     ;
         ( Functor = term.integer(_)
+        ; Functor = term.big_integer(_, _)
         ; Functor = term.float(_)
         ; Functor = term.string(_)
         ),
         Args1 = Args0,
         list.length(Args1, Arity),
-        ConsId = make_functor_cons_id(Functor, Arity)
+        make_functor_cons_id(Functor, Arity, ConsId)
     ),
     convert_inst_list(AllowConstrainedInstVar, Args1, Args),
     BoundInst = bound_functor(ConsId, Args).
diff --git a/compiler/prog_util.m b/compiler/prog_util.m
index 01348ea..bcf6c10 100644
--- a/compiler/prog_util.m
+++ b/compiler/prog_util.m
@@ -129,7 +129,18 @@
     % The reverse conversion - make a cons_id for a functor.
     % Given a const and an arity for the functor, create a cons_id.
     %
-:- func make_functor_cons_id(const, arity) = cons_id.
+:- pred make_functor_cons_id(const::in, arity::in, cons_id::out) is semidet.
+:- pred det_make_functor_cons_id(const::in, arity::in, cons_id::out) is det.
+
+    % source_string_to_int(Base, String, Int):
+    %
+    % Convert a non-negative integer literal to a native int. For base 10, this
+    % predicate succeeds iff the value of String does not exceed int.max_int.
+    % For other bases, this predicate succeeds iff the value of String can be
+    % represented by an unsigned integer of the same width as `int', and `Int'
+    % is the signed integer with the same bit pattern as that unsigned value.
+    %
+:- pred source_string_to_int(int::in, string::in, int::out) is semidet.
 
 %-----------------------------------------------------------------------------%
 
@@ -203,7 +214,9 @@
 :- import_module parse_tree.prog_out.
 
 :- import_module bool.
+:- import_module char.
 :- import_module int.
+:- import_module integer.
 :- import_module map.
 :- import_module pair.
 :- import_module require.
@@ -669,13 +682,49 @@ cons_id_maybe_arity(tabling_info_const(_)) = no.
 cons_id_maybe_arity(deep_profiling_proc_layout(_)) = no.
 cons_id_maybe_arity(table_io_entry_desc(_)) = no.
 
-make_functor_cons_id(term.atom(Name), Arity) =
-    cons(unqualified(Name), Arity, cons_id_dummy_type_ctor).
-make_functor_cons_id(term.integer(Int), _) = int_const(Int).
-make_functor_cons_id(term.string(String), _) = string_const(String).
-make_functor_cons_id(term.float(Float), _) = float_const(Float).
-make_functor_cons_id(term.implementation_defined(Name), _) =
-    impl_defined_const(Name).
+make_functor_cons_id(Functor, Arity, ConsId) :-
+    require_complete_switch [Functor]
+    (
+        Functor = term.atom(Name),
+        ConsId = cons(unqualified(Name), Arity, cons_id_dummy_type_ctor)
+    ;
+        Functor = term.integer(Int),
+        ConsId = int_const(Int)
+    ;
+        Functor = term.big_integer(Base, IntString),
+        source_string_to_int(Base, IntString, Int),
+        ConsId = int_const(Int)
+    ;
+        Functor = term.string(String),
+        ConsId = string_const(String)
+    ;
+        Functor = term.float(Float),
+        ConsId = float_const(Float)
+    ;
+        Functor = term.implementation_defined(Name),
+        ConsId = impl_defined_const(Name)
+    ).
+
+det_make_functor_cons_id(Functor, Arity, ConsId) :-
+    ( make_functor_cons_id(Functor, Arity, ConsIdPrime) ->
+        ConsId = ConsIdPrime
+    ;
+        unexpected($module, $pred, "make_functor_cons_id failed")
+    ).
+
+source_string_to_int(Base, String, Int) :-
+    ( Base = 10 ->
+        base_string_to_int(Base, String, Int)
+    ;
+        integer.from_base_string(Base, String, Integer),
+        ( Integer > integer(max_int) ->
+            NegInteger = Integer + integer(min_int) + integer(min_int),
+            integer.to_int(NegInteger, Int),
+            Int < 0
+        ;
+            integer.to_int(Integer, Int)
+        )
+    ).
 
 %-----------------------------------------------------------------------------%
 
diff --git a/compiler/superhomogeneous.m b/compiler/superhomogeneous.m
index 01a82d0..0027e09 100644
--- a/compiler/superhomogeneous.m
+++ b/compiler/superhomogeneous.m
@@ -126,7 +126,9 @@
 :- import_module pair.
 :- import_module require.
 :- import_module set.
+:- import_module string.
 :- import_module term.
+:- import_module term_io.
 :- import_module varset.
 
 %-----------------------------------------------------------------------------%
@@ -599,9 +601,10 @@ classify_unravel_var_unification(XVar, YTerm, Context, MainContext, SubContext,
     qual_info::in, qual_info::out,
     list(error_spec)::in, list(error_spec)::out) is det.
 
-unravel_var_functor_unification(XVar, YFunctor, YArgTerms0, YFunctorContext,
+unravel_var_functor_unification(XVar, YFunctor0, YArgTerms0, YFunctorContext,
         Context, MainContext, SubContext, Purity, Order, Expansion,
         !SVarState, !SVarStore, !VarSet, !ModuleInfo, !QualInfo, !Specs) :-
+    convert_big_integer_functor(YFunctor0, YFunctor, YFunctorContext, !Specs),
     substitute_state_var_mappings(YArgTerms0, YArgTerms, !VarSet,
         !SVarState, !Specs),
     (
@@ -684,7 +687,7 @@ unravel_var_functor_unification(XVar, YFunctor, YArgTerms0, YFunctorContext,
             % float, int or string constant
             %   - any errors will be caught by typechecking
             list.length(YArgTerms, Arity),
-            ConsId = make_functor_cons_id(YFunctor, Arity),
+            det_make_functor_cons_id(YFunctor, Arity, ConsId),
             MaybeQualifiedYArgTerms = YArgTerms
         ),
         % At this point, we have done state variable name expansion
@@ -742,6 +745,26 @@ unravel_var_functor_unification(XVar, YFunctor, YArgTerms0, YFunctorContext,
         )
     ).
 
+:- pred convert_big_integer_functor(term.const::in, term.const::out,
+    term.context::in, list(error_spec)::in, list(error_spec)::out) is det.
+
+convert_big_integer_functor(Functor0, Functor, Context, !Specs) :-
+    ( Functor0 = big_integer(Base, IntString) ->
+        ( source_string_to_int(Base, IntString, Int) ->
+            Functor = term.integer(Int)
+        ;
+            BasePrefix = integer_literal_base_prefix(Base),
+            Pieces = [words("Error: integer literal is too big"),
+                quote(BasePrefix ++ IntString), suffix(".")],
+            Msg = simple_msg(Context, [always(Pieces)]),
+            Spec = error_spec(severity_error, phase_parse_tree_to_hlds, [Msg]),
+            !:Specs = [Spec | !.Specs],
+            Functor = term.integer(0) % dummy
+        )
+    ;
+        Functor = Functor0
+    ).
+
     % See whether Atom indicates a term with special syntax.
     %
 :- pred maybe_unravel_special_var_functor_unification(prog_var::in,
diff --git a/library/lexer.m b/library/lexer.m
index b7c2a28..cbf755b 100644
--- a/library/lexer.m
+++ b/library/lexer.m
@@ -30,7 +30,10 @@
     --->    name(string)
     ;       variable(string)
     ;       integer(int)
-    ;       big_integer(string) % does not fit in int
+    ;       big_integer(int, string)
+                                % An integer that is too big for `int'. The
+                                % arguments are the base (2, 8, 10, 16) and
+                                % the digits of the literal.
     ;       float(float)
     ;       string(string)      % "...."
     ;       implementation_defined(string) % $name
@@ -154,45 +157,91 @@
 
 %---------------------------------------------------------------------------%
 
-token_to_string(name(Name), String) :-
-    string.append_list(["token '", Name, "'"], String).
-token_to_string(variable(Var), String) :-
-    string.append_list(["variable `", Var, "'"], String).
-token_to_string(integer(Int), String) :-
-    string.int_to_string(Int, IntString),
-    string.append_list(["integer `", IntString, "'"], String).
-token_to_string(big_integer(BigInt), String) :-
-    string.append_list(["big integer `", BigInt, "'"], String).
-token_to_string(float(Float), String) :-
-    string.float_to_string(Float, FloatString),
-    string.append_list(["float `", FloatString, "'"], String).
-token_to_string(string(Token), String) :-
-    string.append_list(["string """, Token, """"], String).
-token_to_string(implementation_defined(Name), String) :-
-    string.append_list(["implementation-defined `$", Name, "'"], String).
-token_to_string(open, "token ` ('").
-token_to_string(open_ct, "token `('").
-token_to_string(close, "token `)'").
-token_to_string(open_list, "token `['").
-token_to_string(close_list, "token `]'").
-token_to_string(open_curly, "token `{'").
-token_to_string(close_curly, "token `}'").
-token_to_string(ht_sep, "token `|'").
-token_to_string(comma, "token `,'").
-token_to_string(end, "token `. '").
-token_to_string(eof, "end-of-file").
-token_to_string(junk(JunkChar), String) :-
-    char.to_int(JunkChar, Code),
-    string.int_to_base_string(Code, 16, Hex),
-    string.append_list(["illegal character <<0x", Hex, ">>"], String).
-token_to_string(io_error(IO_Error), String) :-
-    io.error_message(IO_Error, IO_ErrorMessage),
-    string.append("I/O error: ", IO_ErrorMessage, String).
-token_to_string(error(Message), String) :-
-    string.append_list(["illegal token (", Message, ")"], String).
-token_to_string(integer_dot(Int), String) :-
-    string.int_to_string(Int, IntString),
-    string.append_list(["integer `", IntString, "'."], String).
+token_to_string(Token, String) :-
+    (
+        Token = name(Name),
+        string.append_list(["token '", Name, "'"], String)
+    ;
+        Token = variable(Var),
+        string.append_list(["variable `", Var, "'"], String)
+    ;
+        Token = integer(Int),
+        string.int_to_string(Int, IntString),
+        string.append_list(["integer `", IntString, "'"], String)
+    ;
+        Token = big_integer(Base, IntString),
+        ( Base = 10 ->
+            string.append_list(["integer `", IntString, "'"], String)
+        ; Base = 16 ->
+            string.append_list(["integer `0x", IntString, "'"], String)
+        ; Base = 8 ->
+            string.append_list(["integer `0o", IntString, "'"], String)
+        ; Base = 2 ->
+            string.append_list(["integer `0b", IntString, "'"], String)
+        ;
+            unexpected($module, $pred,
+                "big_integer with base " ++ from_int(Base))
+        )
+    ;
+        Token = float(Float),
+        string.float_to_string(Float, FloatString),
+        string.append_list(["float `", FloatString, "'"], String)
+    ;
+        Token = string(TokenString),
+        string.append_list(["string """, TokenString, """"], String)
+    ;
+        Token = implementation_defined(Name),
+        string.append_list(["implementation-defined `$", Name, "'"], String)
+    ;
+        Token = open,
+        String = "token ` ('"
+    ;
+        Token = open_ct,
+        String = "token `('"
+    ;
+        Token = close,
+        String = "token `)'"
+    ;
+        Token = open_list,
+        String = "token `['"
+    ;
+        Token = close_list,
+        String = "token `]'"
+    ;
+        Token = open_curly,
+        String = "token `{'"
+    ;
+        Token = close_curly,
+        String = "token `}'"
+    ;
+        Token = ht_sep,
+        String = "token `|'"
+    ;
+        Token = comma,
+        String = "token `,'"
+    ;
+        Token = end,
+        String = "token `. '"
+    ;
+        Token = eof,
+        String = "end-of-file"
+    ;
+        Token = junk(JunkChar),
+        char.to_int(JunkChar, Code),
+        string.int_to_base_string(Code, 16, Hex),
+        string.append_list(["illegal character <<0x", Hex, ">>"], String)
+    ;
+        Token = io_error(IO_Error),
+        io.error_message(IO_Error, IO_ErrorMessage),
+        string.append("I/O error: ", IO_ErrorMessage, String)
+    ;
+        Token = error(Message),
+        string.append_list(["illegal token (", Message, ")"], String)
+    ;
+        Token = integer_dot(Int),
+        string.int_to_string(Int, IntString),
+        string.append_list(["integer `", IntString, "'."], String)
+    ).
 
     % We build the tokens up as lists of characters in reverse order.
     % When we get to the end of each token, we call
@@ -233,7 +282,7 @@ get_token_list_2(Stream, Token0, Context0, Tokens, !IO) :-
         ; Token0 = string(_)
         ; Token0 = variable(_)
         ; Token0 = integer(_)
-        ; Token0 = big_integer(_)
+        ; Token0 = big_integer(_, _)
         ; Token0 = implementation_defined(_)
         ; Token0 = junk(_)
         ; Token0 = name(_)
@@ -272,7 +321,7 @@ string_get_token_list_max(String, Len, Tokens, !Posn) :-
         ; Token = string(_)
         ; Token = variable(_)
         ; Token = integer(_)
-        ; Token = big_integer(_)
+        ; Token = big_integer(_, _)
         ; Token = integer_dot(_)
         ; Token = implementation_defined(_)
         ; Token = junk(_)
@@ -2445,10 +2494,8 @@ rev_char_list_to_int(RevChars, Base, Token) :-
 conv_string_to_int(String, Base, Token) :-
     ( string.base_string_to_int(Base, String, Int) ->
         Token = integer(Int)
-    ; Base = 10 ->
-        Token = big_integer(String)
     ;
-        Token = error("invalid integer token")
+        Token = big_integer(Base, String)
     ).
 
 :- pred rev_char_list_to_float(list(char)::in, token::out) is det.
diff --git a/library/parser.m b/library/parser.m
index 1ded525..108b89d 100644
--- a/library/parser.m
+++ b/library/parser.m
@@ -274,7 +274,7 @@ check_for_bad_token(token_cons(Token, LineNum0, Tokens), Message, LineNum) :-
         ( Token = name(_)
         ; Token = variable(_)
         ; Token = integer(_)
-        ; Token = big_integer(_)
+        ; Token = big_integer(_, _)
         ; Token = float(_)
         ; Token = string(_)
         ; Token = implementation_defined(_)
@@ -373,8 +373,8 @@ parse_left_term(MaxPriority, TermKind, OpPriority, Term, !PS) :-
                 IntToken = integer(X),
                 NegX = 0 - X
             ;
-                IntToken = big_integer(BigString),
-                max_int_plus_1(int.bits_per_int, BigString),
+                IntToken = big_integer(10, BigString),
+                decimal_max_int_plus_1(int.bits_per_int, BigString),
                 NegX = int.min_int
             )
         ->
@@ -661,9 +661,9 @@ parse_simple_term_2(integer(Int), Context, _, Term, !PS) :-
     get_term_context(!.PS, Context, TermContext),
     Term = ok(term.functor(term.integer(Int), [], TermContext)).
 
-parse_simple_term_2(big_integer(_), _Context, _, _Term, !PS) :-
-    % The term type does not yet support big integers.
-    fail.
+parse_simple_term_2(big_integer(Base, String), Context, _, Term, !PS) :-
+    get_term_context(!.PS, Context, TermContext),
+    Term = ok(term.functor(term.big_integer(Base, String), [], TermContext)).
 
 parse_simple_term_2(float(Float), Context, _, Term, !PS) :-
     get_term_context(!.PS, Context, TermContext),
@@ -993,7 +993,7 @@ make_error(ParserState, Message) = error(Message, Tokens) :-
 could_start_term(name(_), yes).
 could_start_term(variable(_), yes).
 could_start_term(integer(_), yes).
-could_start_term(big_integer(_), yes).
+could_start_term(big_integer(_, _), yes).
 could_start_term(float(_), yes).
 could_start_term(string(_), yes).
 could_start_term(implementation_defined(_), yes).
@@ -1015,10 +1015,10 @@ could_start_term(integer_dot(_), no).
 
 %---------------------------------------------------------------------------%
 
-:- pred max_int_plus_1(int::in, string::in) is semidet.
+:- pred decimal_max_int_plus_1(int::in, string::in) is semidet.
 
-max_int_plus_1(32, "2147483648").
-max_int_plus_1(64, "9223372036854775808").
+decimal_max_int_plus_1(32, "2147483648").
+decimal_max_int_plus_1(64, "9223372036854775808").
 
 %---------------------------------------------------------------------------%
 
diff --git a/library/string.m b/library/string.m
index 9283b2b..62dcdd0 100644
--- a/library/string.m
+++ b/library/string.m
@@ -1095,15 +1095,15 @@
     % must contain one or more digits in the specified base, optionally
     % preceded by a plus or minus sign. For bases > 10, digits 10 to 35
     % are represented by the letters A-Z or a-z. If the string does not match
-    % this syntax or the base is 10 and the number is not in the range
-    % [min_int, max_int], the predicate fails.
+    % this syntax or the number is not in the range [min_int, max_int],
+    % the predicate fails.
     %
 :- pred base_string_to_int(int::in, string::in, int::out) is semidet.
 
     % Convert a signed base N string to an int. Throws an exception
     % if the string argument is not precisely an optional sign followed by
-    % a non-empty string of base N digits and, if the base is 10, the number
-    % is in the range [min_int, max_int].
+    % a non-empty string of base N digits, or if the number is not in the
+    % range [min_int, max_int].
     %
 :- func det_base_string_to_int(int, string) = int.
 
@@ -4981,7 +4981,9 @@ string.det_base_string_to_int(Base, S) = N :-
 accumulate_int(Base, Char, N0, N) :-
     char.base_digit_to_int(Base, Char, M),
     N = (Base * N0) + M,
-    ( N0 =< N ; Base \= 10 ).       % Fail on overflow for base 10 numbers.
+    % Fail on overflow. This test does not use N, whose value is
+    % meaningless if Base * N0 + M overflowed.
+    N0 =< (int.max_int - M) // Base.
 
 :- pred accumulate_negative_int(int::in, char::in,
     int::in, int::out) is semidet.
@@ -4989,7 +4991,9 @@ accumulate_int(Base, Char, N0, N) :-
 accumulate_negative_int(Base, Char, N0, N) :-
     char.base_digit_to_int(Base, Char, M),
     N = (Base * N0) - M,
-    ( N =< N0 ; Base \= 10 ).       % Fail on underflow for base 10 numbers.
+    % Fail on underflow. This test does not use N, whose value is
+    % meaningless if Base * N0 - M underflowed.
+    N0 >= (int.min_int + M) // Base.
 
 %---------------------%
 
diff --git a/library/term.m b/library/term.m
index 069b333..7269bd3 100644
--- a/library/term.m
+++ b/library/term.m
@@ -51,6 +51,9 @@
 :- type const
     --->    atom(string)
     ;       integer(int)
+    ;       big_integer(int, string)
+            % An integer that is too big for `int'. The arguments are the base
+            % (2, 8, 10, 16) and the digits of the literal.
     ;       string(string)
     ;       float(float)
     ;       implementation_defined(string).
diff --git a/library/term_io.m b/library/term_io.m
index 621343a..014282c 100644
--- a/library/term_io.m
+++ b/library/term_io.m
@@ -179,6 +179,10 @@
 
 :- interface.
 
+    % Return the prefix for integer literals of the given base.
+    %
+:- func integer_literal_base_prefix(int) = string.
+
     % Convert a character to the corresponding octal escape code.
     %
     % We use ISO-Prolog style octal escapes, which are of the form '\nnn\';
@@ -247,6 +251,7 @@
 :- import_module lexer.
 :- import_module list.
 :- import_module parser.
+:- import_module require.
 :- import_module string.
 :- import_module stream.string_writer.
 
@@ -543,6 +548,9 @@ term_io.write_constant(Const, !IO) :-
 
 term_io.write_constant(term.integer(I), _, !IO) :-
     io.write_int(I, !IO).
+term_io.write_constant(term.big_integer(Base, S), _, !IO) :-
+    io.write_string(integer_literal_base_prefix(Base), !IO),
+    io.write_string(S, !IO).
 term_io.write_constant(term.float(F), _, !IO) :-
     io.write_float(F, !IO).
 term_io.write_constant(term.atom(A), NextToGraphicToken, !IO) :-
@@ -560,6 +568,8 @@ term_io.format_constant(Const) =
 
 term_io.format_constant_agt(term.integer(I), _) =
     string.int_to_string(I).
+term_io.format_constant_agt(term.big_integer(Base, S), _) =
+    integer_literal_base_prefix(Base) ++ S.
 term_io.format_constant_agt(term.float(F), _) =
     string.float_to_string(F).
 term_io.format_constant_agt(term.atom(A), NextToGraphicToken) =
@@ -569,6 +579,19 @@ term_io.format_constant_agt(term.string(S), _) =
 term_io.format_constant_agt(term.implementation_defined(N), _) =
     "$" ++ N.
 
+integer_literal_base_prefix(Base) = Prefix :-
+    ( Base = 10 ->
+        Prefix = ""
+    ; Base = 16 ->
+        Prefix = "0x"
+    ; Base = 8 ->
+        Prefix = "0o"
+    ; Base = 2 ->
+        Prefix = "0b"
+    ;
+        unexpected($module, $pred, "unsupported base")
+    ).
+
 %---------------------------------------------------------------------------%
 
 term_io.quote_char(C, !IO) :-
diff --git a/tests/general/test_string_to_int_overflow.exp b/tests/general/test_string_to_int_overflow.exp
index 765d20a..e63fd63 100644
--- a/tests/general/test_string_to_int_overflow.exp
+++ b/tests/general/test_string_to_int_overflow.exp
@@ -1 +1,34 @@
-[yes(999), no, yes(-1), yes(999)]
+999
+999
+no
+no
+--------
+2147483647
+no
+-2147483648
+no
+--------
+no
+no
+no
+no
+--------
+2147483647
+no
+-2147483648
+no
+--------
+no
+no
+no
+no
+--------
+2147483647
+no
+-2147483648
+no
+--------
+no
+no
+no
+no
diff --git a/tests/general/test_string_to_int_overflow.exp2 b/tests/general/test_string_to_int_overflow.exp2
index d7616f0..0396465 100644
--- a/tests/general/test_string_to_int_overflow.exp2
+++ b/tests/general/test_string_to_int_overflow.exp2
@@ -1 +1,34 @@
-[yes(999), yes(99999999999999999999), yes(1099511627775), yes(999)]
+999
+999
+no
+no
+--------
+2147483647
+2147483648
+-2147483648
+-2147483649
+--------
+9223372036854775807
+no
+-9223372036854775808
+no
+--------
+2147483647
+2147483648
+-2147483648
+-2147483649
+--------
+9223372036854775807
+no
+-9223372036854775808
+no
+--------
+2147483647
+2147483648
+-2147483648
+-2147483649
+--------
+9223372036854775807
+no
+-9223372036854775808
+no
diff --git a/tests/general/test_string_to_int_overflow.exp3 b/tests/general/test_string_to_int_overflow.exp3
deleted file mode 100644
index c8e6e1d..0000000
--- a/tests/general/test_string_to_int_overflow.exp3
+++ /dev/null
@@ -1 +0,0 @@
-[yes(999), no, yes(1099511627775), yes(999)]
diff --git a/tests/general/test_string_to_int_overflow.m b/tests/general/test_string_to_int_overflow.m
index 120d56f..a3c33db 100644
--- a/tests/general/test_string_to_int_overflow.m
+++ b/tests/general/test_string_to_int_overflow.m
@@ -1,10 +1,4 @@
-%-----------------------------------------------------------------------------%
-% test_string_to_int_overflow.m
-% Ralph Becket <rafe at csse.unimelb.edu.au>
-% Mon Feb  2 13:29:05 EST 2009
 % vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
-%
-%-----------------------------------------------------------------------------%
 
 :- module test_string_to_int_overflow.
 
@@ -12,8 +6,6 @@
 
 :- import_module io.
 
-
-
 :- pred main(io::di, io::uo) is det.
 
 %-----------------------------------------------------------------------------%
@@ -28,14 +20,68 @@
 %-----------------------------------------------------------------------------%
 
 main(!IO) :-
-    Xs = [
-        ( if string.to_int("999", I0) then yes(I0) else no),
-        ( if string.to_int("99999999999999999999", I1) then yes(I1) else no ),
-        ( if base_string_to_int(16, "ffffffffff", I2) then yes(I2) else no ),
-        ( if base_string_to_int(10, "999", I3) then yes(I3) else no )
-    ],
-    io.print(Xs, !IO),
-    io.nl(!IO).
+    test(string.to_int("999"), !IO),
+    test(base_string_to_int(10, "999"), !IO),
+    test(string.to_int("99999999999999999999"), !IO),
+    test(string.to_int("-99999999999999999999"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(10, "2147483647"), !IO),
+    test(base_string_to_int(10, "2147483648"), !IO),
+    test(base_string_to_int(10, "-2147483648"), !IO),
+    test(base_string_to_int(10, "-2147483649"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(10, "9223372036854775807"), !IO),
+    test(base_string_to_int(10, "9223372036854775808"), !IO),
+    test(base_string_to_int(10, "-9223372036854775808"), !IO),
+    test(base_string_to_int(10, "-9223372036854775809"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(16, "7fffffff"), !IO),
+    test(base_string_to_int(16, "80000000"), !IO),
+    test(base_string_to_int(16, "-80000000"), !IO),
+    test(base_string_to_int(16, "-80000001"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(16, "7fffffffffffffff"), !IO),
+    test(base_string_to_int(16, "8000000000000000"), !IO),
+    test(base_string_to_int(16, "-8000000000000000"), !IO),
+    test(base_string_to_int(16, "-8000000000000001"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(36, "ZIK0ZJ"), !IO),
+    test(base_string_to_int(36, "ZIK0ZK"), !IO),
+    test(base_string_to_int(36, "-ZIK0ZK"), !IO),
+    test(base_string_to_int(36, "-ZIK0ZL"), !IO),
+
+    line(!IO),
+
+    test(base_string_to_int(36, "1Y2P0IJ32E8E7"), !IO),
+    test(base_string_to_int(36, "1Y2P0IJ32E8E8"), !IO),
+    test(base_string_to_int(36, "-1Y2P0IJ32E8E8"), !IO),
+    test(base_string_to_int(36, "-1Y2P0IJ32E8E9"), !IO).
+
+:- pred test(pred(T), io, io).
+:- mode test(pred(out) is semidet, di, uo) is det.
+
+test(P, !IO) :-
+    ( P(X) ->
+        io.write(X, !IO),
+        io.nl(!IO)
+    ;
+        io.write_string("no\n", !IO)
+    ).
+
+:- pred line(io::di, io::uo) is det.
+
+line(!IO) :-
+    io.write_string("--------\n", !IO).
 
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
diff --git a/tests/hard_coded/lexer_bigint.exp b/tests/hard_coded/lexer_bigint.exp
index 05c1843..69e4867 100644
--- a/tests/hard_coded/lexer_bigint.exp
+++ b/tests/hard_coded/lexer_bigint.exp
@@ -1,51 +1,51 @@
 integer(2147483646)
 integer(2147483647)
-big_integer("2147483648")
+big_integer(10, "2147483648")
 name("-")
 integer(2147483647)
 name("-")
-big_integer("2147483648")
+big_integer(10, "2147483648")
 name("-")
-big_integer("2147483649")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("9223372036854775807")
-big_integer("9223372036854775808")
-big_integer("9223372036854775809")
+big_integer(10, "2147483649")
+big_integer(2, "11111111111111111111111111111111")
+big_integer(8, "37777777777")
+big_integer(16, "ffffffff")
+big_integer(10, "9223372036854775807")
+big_integer(10, "9223372036854775808")
+big_integer(10, "9223372036854775809")
 name("-")
-big_integer("9223372036854775807")
+big_integer(10, "9223372036854775807")
 name("-")
-big_integer("9223372036854775808")
+big_integer(10, "9223372036854775808")
 name("-")
-big_integer("9223372036854775809")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("999999999999999999999999987654321")
+big_integer(10, "9223372036854775809")
+big_integer(2, "1111111111111111111111111111111111111111111111111111111111111111")
+big_integer(8, "1777777777777777777777")
+big_integer(16, "ffffffffffffffff")
+big_integer(10, "999999999999999999999999987654321")
 
 integer(2147483646)
 integer(2147483647)
-big_integer("2147483648")
+big_integer(10, "2147483648")
 name("-")
 integer(2147483647)
 name("-")
-big_integer("2147483648")
+big_integer(10, "2147483648")
 name("-")
-big_integer("2147483649")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("9223372036854775807")
-big_integer("9223372036854775808")
-big_integer("9223372036854775809")
+big_integer(10, "2147483649")
+big_integer(2, "11111111111111111111111111111111")
+big_integer(8, "37777777777")
+big_integer(16, "ffffffff")
+big_integer(10, "9223372036854775807")
+big_integer(10, "9223372036854775808")
+big_integer(10, "9223372036854775809")
 name("-")
-big_integer("9223372036854775807")
+big_integer(10, "9223372036854775807")
 name("-")
-big_integer("9223372036854775808")
+big_integer(10, "9223372036854775808")
 name("-")
-big_integer("9223372036854775809")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("999999999999999999999999987654321")
+big_integer(10, "9223372036854775809")
+big_integer(2, "1111111111111111111111111111111111111111111111111111111111111111")
+big_integer(8, "1777777777777777777777")
+big_integer(16, "ffffffffffffffff")
+big_integer(10, "999999999999999999999999987654321")
diff --git a/tests/hard_coded/lexer_bigint.exp2 b/tests/hard_coded/lexer_bigint.exp2
index fd58a93..fba7130 100644
--- a/tests/hard_coded/lexer_bigint.exp2
+++ b/tests/hard_coded/lexer_bigint.exp2
@@ -11,18 +11,18 @@ integer(4294967295)
 integer(4294967295)
 integer(4294967295)
 integer(9223372036854775807)
-big_integer("9223372036854775808")
-big_integer("9223372036854775809")
+big_integer(10, "9223372036854775808")
+big_integer(10, "9223372036854775809")
 name("-")
 integer(9223372036854775807)
 name("-")
-big_integer("9223372036854775808")
+big_integer(10, "9223372036854775808")
 name("-")
-big_integer("9223372036854775809")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("999999999999999999999999987654321")
+big_integer(10, "9223372036854775809")
+big_integer(2, "1111111111111111111111111111111111111111111111111111111111111111")
+big_integer(8, "1777777777777777777777")
+big_integer(16, "ffffffffffffffff")
+big_integer(10, "999999999999999999999999987654321")
 
 integer(2147483646)
 integer(2147483647)
@@ -37,15 +37,15 @@ integer(4294967295)
 integer(4294967295)
 integer(4294967295)
 integer(9223372036854775807)
-big_integer("9223372036854775808")
-big_integer("9223372036854775809")
+big_integer(10, "9223372036854775808")
+big_integer(10, "9223372036854775809")
 name("-")
 integer(9223372036854775807)
 name("-")
-big_integer("9223372036854775808")
+big_integer(10, "9223372036854775808")
 name("-")
-big_integer("9223372036854775809")
-integer(-1)
-integer(-1)
-integer(-1)
-big_integer("999999999999999999999999987654321")
+big_integer(10, "9223372036854775809")
+big_integer(2, "1111111111111111111111111111111111111111111111111111111111111111")
+big_integer(8, "1777777777777777777777")
+big_integer(16, "ffffffffffffffff")
+big_integer(10, "999999999999999999999999987654321")
diff --git a/tests/hard_coded/read_min_int.exp b/tests/hard_coded/read_min_int.exp
index 86226d3..a264b69 100644
--- a/tests/hard_coded/read_min_int.exp
+++ b/tests/hard_coded/read_min_int.exp
@@ -1,13 +1,13 @@
 foo(-2147483648)
 foo(2147483647)
-Syntax error at big integer `2147483648': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775807': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
 
 foo(-2147483648)
 foo(2147483647)
-Syntax error at big integer `2147483648': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775807': unexpected token at start of (sub)term
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
+io.read: the term read did not have the right type
diff --git a/tests/hard_coded/read_min_int.exp2 b/tests/hard_coded/read_min_int.exp2
index 0f63cc5..6b71ec4 100644
--- a/tests/hard_coded/read_min_int.exp2
+++ b/tests/hard_coded/read_min_int.exp2
@@ -3,11 +3,11 @@ foo(2147483647)
 foo(2147483648)
 foo(-9223372036854775808)
 foo(9223372036854775807)
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
+io.read: the term read did not have the right type
 
 foo(-2147483648)
 foo(2147483647)
 foo(2147483648)
 foo(-9223372036854775808)
 foo(9223372036854775807)
-Syntax error at big integer `9223372036854775808': unexpected token at start of (sub)term
+io.read: the term read did not have the right type
diff --git a/tests/invalid/Mmakefile b/tests/invalid/Mmakefile
index 2645d49..613474f 100644
--- a/tests/invalid/Mmakefile
+++ b/tests/invalid/Mmakefile
@@ -132,6 +132,7 @@ SINGLEMODULE= \
 	instance_bug \
 	instance_no_type \
 	instance_var_bug \
+	invalid_int \
 	invalid_event \
 	invalid_export_detism \
 	invalid_instance_declarations \
diff --git a/tests/invalid/invalid_int.err_exp b/tests/invalid/invalid_int.err_exp
new file mode 100644
index 0000000..5b9499d
--- /dev/null
+++ b/tests/invalid/invalid_int.err_exp
@@ -0,0 +1,18 @@
+invalid_int.m:019: Error: integer literal is too big
+invalid_int.m:019:   `0b100000000000000000000000000000000'.
+invalid_int.m:020: Error: integer literal is too big
+invalid_int.m:020:   `0b1111111111111111111111111111111111111111111111111111111111111111'.
+invalid_int.m:021: Error: integer literal is too big
+invalid_int.m:021:   `0b10000000000000000000000000000000000000000000000000000000000000000'.
+invalid_int.m:024: Error: integer literal is too big `0o40000000000'.
+invalid_int.m:025: Error: integer literal is too big
+invalid_int.m:025:   `0o1777777777777777777777'.
+invalid_int.m:026: Error: integer literal is too big
+invalid_int.m:026:   `0o2000000000000000000000'.
+invalid_int.m:029: Error: integer literal is too big `0x100000000'.
+invalid_int.m:030: Error: integer literal is too big `0x110000000'.
+invalid_int.m:031: Error: integer literal is too big `0xffffffffffffffff'.
+invalid_int.m:032: Error: integer literal is too big `0x10000000000000000'.
+invalid_int.m:035: Error: integer literal is too big `2147483648'.
+invalid_int.m:036: Error: integer literal is too big `9223372036854775807'.
+invalid_int.m:037: Error: integer literal is too big `9223372036854775808'.
diff --git a/tests/invalid/invalid_int.err_exp2 b/tests/invalid/invalid_int.err_exp2
new file mode 100644
index 0000000..4bf4dc0
--- /dev/null
+++ b/tests/invalid/invalid_int.err_exp2
@@ -0,0 +1,6 @@
+invalid_int.m:021: Error: integer literal is too big
+invalid_int.m:021:   `0b10000000000000000000000000000000000000000000000000000000000000000'.
+invalid_int.m:026: Error: integer literal is too big
+invalid_int.m:026:   `0o2000000000000000000000'.
+invalid_int.m:032: Error: integer literal is too big `0x10000000000000000'.
+invalid_int.m:037: Error: integer literal is too big `9223372036854775808'.
diff --git a/tests/invalid/invalid_int.m b/tests/invalid/invalid_int.m
new file mode 100644
index 0000000..daf772f
--- /dev/null
+++ b/tests/invalid/invalid_int.m
@@ -0,0 +1,42 @@
+%-----------------------------------------------------------------------------%
+
+:- module invalid_int.
+:- interface.
+
+:- import_module io.
+
+:- pred main(io::di, io::uo) is det.
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+
+:- implementation.
+:- import_module list.
+
+main(!IO) :-
+    X = {
+        0b11111111111111111111111111111111,
+        0b100000000000000000000000000000000,
+        0b1111111111111111111111111111111111111111111111111111111111111111,
+        0b10000000000000000000000000000000000000000000000000000000000000000,
+
+        0o37777777777,
+        0o40000000000,
+        0o1777777777777777777777,
+        0o2000000000000000000000,
+
+        0xffffffff,
+        0x100000000,
+        0x110000000,
+        0xffffffffffffffff,
+        0x10000000000000000,
+
+        2147483647,
+        2147483648,
+        9223372036854775807,
+        9223372036854775808
+    },
+    io.write(X, !IO).
+
+%-----------------------------------------------------------------------------%
+% vim: ft=mercury ts=4 sts=4 sw=4 et
-- 
2.1.2