[m-rev.] for review: make characters an instance of the uenum typeclass

Julien Fischer jfischer at opturion.com
Tue Dec 20 19:36:32 AEDT 2022


Hi Peter,

On Tue, 20 Dec 2022, Peter Wang wrote:

>> diff --git a/NEWS b/NEWS
>> index a5322b4..913abd3 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -102,6 +102,18 @@ Changes to the Mercury standard library
>>       - func `promise_only_solution/1`
>>       - pred `promise_only_solution_io/4`
>>
>> +### Changes to the `char` module
>> +
>> +* The following type has had its typeclass memberships changed:
>> +
>> +    - The type `character` is now an instance of the new `uenum` typeclass.
>> +
>
> Just state this without the leadup?

It is consistent with how it has been done elsewhere in the NEWS file.

>> diff --git a/extras/lex/lex.m b/extras/lex/lex.m

...

>> @@ -720,10 +720,10 @@ read_from_string(Offset, Result, String, unsafe_promise_unique(String)) :-
>>           )
>>   ].
>>
>> -:- instance regexp(sparse_bitset(T)) <= (regexp(T),enum(T)) where [
>> +:- instance regexp(sparse_bitset(T)) <= (regexp(T),uenum(T)) where [
>>       re(SparseBitset) = charset(Charset) :-
>>           Charset = sparse_bitset.foldl(
>> -            func(Enum, Set0) = insert(Set0, char.det_from_int(to_int(Enum))),
>> +            func(Enum, Set0) = insert(Set0, char.det_from_uint(to_uint(Enum))),
>>               SparseBitset,
>>               sparse_bitset.init)
>>   ].
>
> (BTW, sparse_bitset is an inefficient representation for large charsets,
> like valid_unicode_chars. diet should be better.)

That's a separate change.

>> +:- pragma foreign_proc("C",
>> +    to_uint(Character::in) = (UInt::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
>> +        does_not_affect_liveness],
>> +"
>> +    UInt = (MR_UnsignedChar) Character;
>> +").
>> +
>> +:- pragma foreign_proc("C#",
>> +    to_uint(Character::in) = (UInt::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe],
>> +"
>> +    UInt = (uint) Character;
>> +").
>> +
>> +:- pragma foreign_proc("Java",
>> +    to_uint(Character::in) = (UInt::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe],
>> +"
>> +    UInt = Character;
>> +").
>> +
>> +:- pragma foreign_proc("C",
>> +    from_uint(UInt::in, Character::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
>> +        does_not_affect_liveness],
>> +"
>> +    Character = (MR_UnsignedChar) UInt;
>> +    SUCCESS_INDICATOR = (UInt <= 0x10ffff);
>> +").
>> +
>> +:- pragma foreign_proc("C#",
>> +    from_uint(UInt::in, Character::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe],
>> +"
>> +    Character = (int) UInt;
>> +    SUCCESS_INDICATOR = (UInt <= 0x10ffff);
>> +").
>> +
>> +:- pragma foreign_proc("Java",
>> +    from_uint(UInt::in, Character::out),
>> +    [will_not_call_mercury, promise_pure, thread_safe],
>> +"
>> +    Character = UInt;
>> +    SUCCESS_INDICATOR = ((UInt & 0xffffffffL) <= (0x10ffff & 0xffffffffL));
>> +").
>
> Do we need the foreign procs, or can we cast to/from int?

We can avoid the foreign_procs in the to_uint direction; I have replaced
them. Using foreign_procs for the from_uint direction requires fewer
comparisons on those targets that support unsigned integers directly.
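
To illustrate the comparison count, here is a minimal C sketch (not the
library code; the names valid_from_uint and valid_from_int are hypothetical
and only mirror the range check in the quoted foreign_procs). With an
unsigned argument the lower bound is implicit, so one comparison is enough;
going via a signed int needs both bounds checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest Unicode code point, as used in the quoted from_uint check. */
#define MAX_CODE_POINT 0x10ffff

/*
 * Hypothetical helper: range check on an unsigned value.
 * An unsigned value can never be below zero, so a single
 * comparison against the upper bound is sufficient.
 */
bool valid_from_uint(uint32_t u)
{
    return u <= (uint32_t) MAX_CODE_POINT;
}

/*
 * Hypothetical helper: the same check on a signed value.
 * Negative inputs are possible, so both bounds must be tested.
 */
bool valid_from_int(int32_t i)
{
    return i >= 0 && i <= MAX_CODE_POINT;
}
```

Java has no unsigned int type, which is why the Java foreign_proc quoted
above masks with 0xffffffffL before comparing.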

>> diff --git a/tests/hard_coded/char_uint_conv.m b/tests/hard_coded/char_uint_conv.m
>> index e69de29..bc59026 100644
>> --- a/tests/hard_coded/char_uint_conv.m
>> +++ b/tests/hard_coded/char_uint_conv.m
>> +    char.det_from_int(0x1fb00), % BLOCK SEXTANT-1
>> +    char.det_from_int(0x1fbf9), % SEGMENTED DIGIT NINE
>> +
>> +    % CJK Unified Idenographs Extension B
>
> Ideographs

Fixed.

> That looks fine, otherwise.

Thanks.

Julien.

