[m-rev.] for review: allow optional underscores in numeric literals

Paul Bone paul at bone.id.au
Wed Jan 11 10:31:19 AEDT 2017


On Mon, Jan 09, 2017 at 04:38:24PM +1100, Julien Fischer wrote:
> ------------------------------
> 
> Allow optional underscores in numeric literals.

Thanks Julien!  This is a feature that is rarely implemented by most
languages but is very very useful.

> Allow the optional use of underscores in numeric literals for the purpose of
> improving their readability (e.g. by grouping digits etc).  We allow any number
> of underscores between digits and also between the radix prefix (if present) and
> the initial digit.  (When integer type suffixes are supported we will also
> allow them to be preceded by any number of underscores.)  The following are
> *not* allowed:
> 
>    1. Leading underscores.
>    2. Trailing underscores.
>    3. Underscores inside the components of a radix prefix (e.g.
>       0_xffff or 0__b101010.)
>    4. Underscores immediately adjacent to the decimal point in a float
>       literal (e.g. 123_._123.)
>    5. Underscores immediately adjacent to the exponent ('e' or 'E) in
>       a float literal (e.g. 123_e12 or 123E_12.)
>    6. Underscores immediately adjacent to the optional sign of an exponent
>       in a float literal (e.g. 123_+e12 or 123-_E12.)
>    7. Underscores between the optional sign of an exponent and the exponent
>       indicator (e.g. 123+_e12.)

These make sense.  There's no point in supporting them as they don't improve
readability or group digits.

Does this leave only underscores within sequences of digits or are there
other cases that we might also want to exclude?

> diff --git a/library/lexer.m b/library/lexer.m
> index c527ffd..fddd8aa 100644
> --- a/library/lexer.m
> +++ b/library/lexer.m
> @@ -1912,13 +1976,19 @@ get_zero(Stream, Token, !IO) :-
>          )
>      ).
>
> +:- type last_digit_is_underscore
> +    --->    last_digit_is_underscore
> +    ;       last_digit_is_not_underscore.
> +

Could you add a comment explaining the purpose of this type?  I
mistakeningly thought that the above code was saying what the last digit of
the entire token was, rather than saying what the last digit _that it had
seen_ was.  Thanks.

> diff --git a/library/string.m b/library/string.m
> index 1c4299a..cb70509 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -1377,6 +1377,16 @@
>  :- interface.
>  :- include_module format.
>  :- include_module parse_util.
> +
> +    % Exported for use by lexer.m (XXX perhaps it ought to be defined in
> +    % that module instead?)
> +    %
> +    % Like base_string_to_int/3, but allow for an arbitrary number of
> +    % underscores between the other characters.  Leading and trailing
> +    % underscores are allowed.
> +    %
> +:- pred base_string_to_int_underscore(int::in, string::in, int::out) is semidet.
> +
>  :- implementation.
> 
>  :- include_module parse_runtime.
> @@ -5476,6 +5486,97 @@ accumulate_negative_int(Base, Char, N0, N) :-
> 
>  %---------------------%
> 
> +base_string_to_int_underscore(Base, String, Int) :-
> +    index(String, 0, Char),
> +    End = count_code_units(String),
> +    ( if Char = ('-') then
> +        End > 1,
> +        foldl_between(base_negative_accumulator_underscore(Base), String, 1, End, 0, Int)

Why create extra code rather than doing as you did above and parse the
number as a positive number then multiply by -1?


> *.exp

I would prefer that we call these literals and update other docs or messages
either now or later.

Otherwise this change is good.  Thanks.


-- 
Paul Bone
http://paul.bone.id.au


More information about the reviews mailing list