[m-rev.] for review: add more character classification predicates

Julien Fischer jfischer at opturion.com
Tue Jul 25 17:15:53 AEST 2017


Hi Peter,

On Tue, 25 Jul 2017, Peter Wang wrote:

> On Mon, 24 Jul 2017 21:51:20 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
>>
>> +    % True iff the character is a Unicode Control code point, that is a code
>> +    % point in General Category `Other,control' (`Cc').
>> +    %
>> +:- pred is_control(char::in) is semidet.
>> +
>> +    % True iff the character is a Unicode Space Separator code point, that is a
>> +    % code point in General Category `Separator,space' (`Zs').
>> +    %
>> +:- pred is_space_separator(char::in) is semidet.
>
> Is this category subject to extension? Maybe mention the Unicode
> version.

I think that's unlikely. However, I suggest we add a blanket comment
at the head of the module along the lines of:

      All predicates and functions exported by this module that deal with
      Unicode conform to version 10 of the Unicode standard.

(It may become more important if we suddenly add lots of support for
emoji to the standard library ;-) )
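
If we do make that claim, it is cheap to check the range tables against a
copy of the Unicode Character Database. Below is a rough sketch. It is in
Python rather than Mercury only because Python's unicodedata module exposes
General Category values directly. The function names are hypothetical
mirrors of the Mercury predicates. The ranges are my reading of the UCD for
Cc, Zl, Zp and Co. Note that unicodedata reflects whatever Unicode version
the Python build ships (unicodedata.unidata_version), which may not be
version 10. As far as I know these four categories have not changed in
recent versions, so the check should still be meaningful.

```python
import unicodedata

# Hypothetical Python mirrors of the Mercury predicates.
# The ranges are assumptions taken from the UCD General Category values.

def is_control(cp: int) -> bool:
    # Cc: C0 controls, DEL, and C1 controls.
    return 0x0000 <= cp <= 0x001F or 0x007F <= cp <= 0x009F

def is_line_separator(cp: int) -> bool:
    # Zl contains a single code point, U+2028 LINE SEPARATOR.
    return cp == 0x2028

def is_paragraph_separator(cp: int) -> bool:
    # Zp contains a single code point, U+2029 PARAGRAPH SEPARATOR.
    return cp == 0x2029

def is_private_use(cp: int) -> bool:
    # Co: the BMP Private Use Area plus planes 15 and 16, excluding
    # the noncharacters at the end of each of those planes.
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

CHECKS = {
    "Cc": is_control,
    "Zl": is_line_separator,
    "Zp": is_paragraph_separator,
    "Co": is_private_use,
}

def mismatches():
    """Return (code point, category) pairs where a predicate
    disagrees with unicodedata.category()."""
    bad = []
    for cp in range(0x110000):
        cat = unicodedata.category(chr(cp))
        for name, pred in CHECKS.items():
            if pred(cp) != (cat == name):
                bad.append((cp, name))
    return bad

if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    print("mismatches:", mismatches())
```

Run as a script, it prints the UCD version it checked against, followed by
the list of disagreements. The list should be empty if the ranges are right.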

>> +    % True iff the character  is a Unicode Line Separator code point, that is a
>> +    % code point in General Category `Separator,line' (`Zl').
>> +    %
>> +:- pred is_line_separator(char::in) is semidet.
>> +
>> +    % True iff the character is a Unicode Paragraph Separator code point, that
>> +    % is a code point in General Category `Separator,paragraph' (`Zp').
>> +    %
>> +:- pred is_paragraph_separator(char::in) is semidet.
>> +
>> +    % True iff the character is a Unicode Private-use code point, that is a
>> +    % code point in General Category `Other,private use' (`Co').
>> +    %
>> +:- pred is_private_use(char::in) is semidet.
>> +
>>   %---------------------------------------------------------------------------%
>>
>>       % Convert a char to a pretty_printer.doc for formatting.
>> @@ -324,7 +350,7 @@
>>   :- pred int_to_hex_char(int, char).
>>   :- mode int_to_hex_char(in, out) is semidet.
>>
>> -    % Succeeds if char is a decimal digit (0-9) or letter (a-z or A-Z).
>> +    % True iff the characters  is a decimal digit (0-9) or letter (a-z or A-Z).
>
> Double space.

Fixed.

>>       % Returns the character's value as a digit (0-9 or 10-35).
>>       %
>>   :- pragma obsolete(digit_to_int/2).
>> @@ -1019,6 +1045,36 @@ is_noncharacter(Char) :-
>>       ; Int /\ 0xfffe = 0xfffe
>>       ).
>>
>> +is_control(Char) :-
>> +    Int = char.to_int(Char),
>> +    ( 0x0000 =< Int, Int =< 0x001f
>> +    ; 0x007f =< Int, Int =< 0x009f
>> +    ).
>> +
>> +is_space_separator(Char) :-
>> +    Int = char.to_int(Char),
>> +    ( Int = 0x0020
>> +    ; Int = 0x00a0
>> +    ; Int = 0x1680
>> +    ; Int =< 0x2000, Int =< 0x200a
>
> That's not right.

Fixed.
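
For the archive, here is what was wrong with the quoted line. Both
comparisons are upper bounds, so `Int =< 0x2000, Int =< 0x200a` reduces to
`Int =< 0x2000`. It therefore accepts every code point from U+0000 to
U+2000, letters included, and it rejects U+2001..U+200A. The first
comparison should read `0x2000 =< Int`.

Below is another Python sketch, again used only for access to unicodedata.
It encodes the full Zs set as I understand it for Unicode 6.3 onwards:
U+0020, U+00A0, U+1680, U+2000..U+200A, U+202F, U+205F and U+3000. The
function names are hypothetical, and the sketch is not necessarily what the
committed Mercury fix looks like.

```python
import unicodedata

def is_space_separator(cp: int) -> bool:
    """Zs (Separator, space) as assumed for Unicode 6.3 and later;
    U+180E was moved out of Zs in 6.3."""
    return (cp in (0x0020, 0x00A0, 0x1680, 0x202F, 0x205F, 0x3000)
            or 0x2000 <= cp <= 0x200A)  # lower bound: 0x2000 =< Int

def quoted_range_disjunct(cp: int) -> bool:
    """Transcription of the quoted `Int =< 0x2000, Int =< 0x200a`
    disjunct only. Both tests are upper bounds, so this is cp <= 0x2000."""
    return cp <= 0x2000 and cp <= 0x200A

def zs_mismatches():
    """Code points where is_space_separator disagrees with the UCD."""
    return [cp for cp in range(0x110000)
            if is_space_separator(cp)
            != (unicodedata.category(chr(cp)) == "Zs")]

if __name__ == "__main__":
    print("quoted disjunct accepts 'A':", quoted_range_disjunct(ord("A")))
    print("quoted disjunct accepts U+2001:", quoted_range_disjunct(0x2001))
    print("mismatches:", zs_mismatches())
```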

Thanks for that.

Julien.

