[m-rev.] for review: allow optional underscores in numeric literals

Julien Fischer jfischer at opturion.com
Wed Jan 11 14:07:55 AEDT 2017


Hi Paul,

On Wed, 11 Jan 2017, Paul Bone wrote:

> On Mon, Jan 09, 2017 at 04:38:24PM +1100, Julien Fischer wrote:
>> ------------------------------
>>
>> Allow optional underscores in numeric literals.
>
> Thanks Julien!  This is a feature that is rarely implemented by most
> languages but is very very useful.

Quite a few languages provide something of the sort.  (The proposal
for adding this to Python give a summary of the various ways this
has been done for a number of languages
<https://www.python.org/dev/peps/pep-0515/>.)

>> Allow the optional use of underscores in numeric literals for the purpose of
>> improving their readability (e.g. by grouping digits etc).  We allow any number
>> of underscores between digits and also between the radix prefix (if present) and
>> the initial digit.  (When integer type suffixes are supported we will also
>> allow them to be preceded by any number of underscores.)  The following are
>> *not* allowed:
>>
>>    1. Leading underscores.
>>    2. Trailing underscores.
>>    3. Underscores inside the components of a radix prefix (e.g.
>>       0_xffff or 0__b101010.)
>>    4. Underscores immediately adjacent to the decimal point in a float
>>       literal (e.g. 123_._123.)
>>    5. Underscores immediately adjacent to the exponent ('e' or 'E) in
>>       a float literal (e.g. 123_e12 or 123E_12.)
>>    6. Underscores immediately adjacent to the optional sign of an exponent
>>       in a float literal (e.g. 123_+e12 or 123-_E12.)
>>    7. Underscores between the optional sign of an exponent and the exponent
>>       indicator (e.g. 123+_e12.)
>
> These make sense.  There's no point in supporting them as they don't improve
> readability or group digits.
>
> Does this leave only underscores within sequences of digits or are there
> other cases that we might also want to exclude?

It leaves (1) underscores within sequences of digits, (2) between a
radix prefix and sequence of digits and eventally (3) between the
last digit and a suffix.

I think (2) and (3) are worth having where, for example, you have
complicated groupings of bits within binary literals.

>> diff --git a/library/lexer.m b/library/lexer.m
>> index c527ffd..fddd8aa 100644
>> --- a/library/lexer.m
>> +++ b/library/lexer.m
>> @@ -1912,13 +1976,19 @@ get_zero(Stream, Token, !IO) :-
>>          )
>>      ).
>>
>> +:- type last_digit_is_underscore
>> +    --->    last_digit_is_underscore
>> +    ;       last_digit_is_not_underscore.
>> +
>
> Could you add a comment explaining the purpose of this type?  I
> mistakeningly thought that the above code was saying what the last digit of
> the entire token was, rather than saying what the last digit _that it had
> seen_ was.  Thanks.

Ok, how's this:

      This type records whether the last digit seen by the lexer as it
      processes a numeric token was an underscore or not.
      Note that there may be other intervening characters in the token
      between the last digit and the current one (e.g. the decimal point
      in a float literal.)

>> diff --git a/library/string.m b/library/string.m
>> index 1c4299a..cb70509 100644
>> --- a/library/string.m
>> +++ b/library/string.m
>> @@ -1377,6 +1377,16 @@
>>  :- interface.
>>  :- include_module format.
>>  :- include_module parse_util.
>> +
>> +    % Exported for use by lexer.m (XXX perhaps it ought to be defined in
>> +    % that module instead?)
>> +    %
>> +    % Like base_string_to_int/3, but allow for an arbitrary number of
>> +    % underscores between the other characters.  Leading and trailing
>> +    % underscores are allowed.
>> +    %
>> +:- pred base_string_to_int_underscore(int::in, string::in, int::out) is semidet.
>> +
>>  :- implementation.
>>
>>  :- include_module parse_runtime.
>> @@ -5476,6 +5486,97 @@ accumulate_negative_int(Base, Char, N0, N) :-
>>
>>  %---------------------%
>>
>> +base_string_to_int_underscore(Base, String, Int) :-
>> +    index(String, 0, Char),
>> +    End = count_code_units(String),
>> +    ( if Char = ('-') then
>> +        End > 1,
>> +        foldl_between(base_negative_accumulator_underscore(Base), String, 1, End, 0, Int)
>
> Why create extra code rather than doing as you did above and parse the
> number as a positive number then multiply by -1?

(1) I just cut-and-pasted-and-modfied the existing code for
base_string_to_int/3 and (2) you need to fail on overflow and the test
for that is different in the negative case.

>> *.exp
>
> I would prefer that we call these literals and update other docs or messages
> either now or later.

I'm fine with "literals" as well.  I will update all the error messages
as part of the change that provides the user facing documentation.

> Otherwise this change is good.  Thanks.

Thanks for the review.

Julien.


More information about the reviews mailing list