[m-rev.] for post-commit review: improve string.m
Zoltan Somogyi
zoltan.somogyi at runbox.com
Mon Mar 25 17:56:19 AEDT 2024
On 2024-03-25 12:35 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
>> -from_utf8_code_unit_list(CodeList, String) :-
>> +from_utf8_code_unit_list(CodeUnits, String) :-
>> ( if internal_encoding_is_utf8 then
>> - from_code_unit_list(CodeList, String)
>> + from_code_unit_list(CodeUnits, String)
>> else
>> - decode_utf8(CodeList, [], RevChars),
>> + decode_utf8(CodeUnits, [], RevChars),
>> + % XXX This checks whether RevChars represents a well-formed string.
>> + % Why? The call to decode_utf8 should have ensured that already.
>> semidet_from_rev_char_list(RevChars, String)
>> ).
>
> decode_utf8 doesn't reject code points in the Unicode surrogate range
> (D800-DFFF, strictly speaking not valid to encode in UTF-8),
> which ARE rejected by semidet_from_rev_char_list.
> The change to semidet_from_rev_char_list was made later.
Ok, I have modified acc_rev_chars_from_utf8_code_units to reject surrogates.
The diff is attached for your review.
> If decode_utf8 (acc_rev_chars_from_utf8_code_units) detected surrogates,
> then there could be a variant of semidet_from_rev_char_list that assumes
> validity. I think it would not be worthwhile.
Agreed.
Thanks for both this review, and the module qual catch.
Zoltan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Log.sc
Type: application/octet-stream
Size: 533 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20240325/f1888805/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFF.sc
Type: application/octet-stream
Size: 2894 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20240325/f1888805/attachment-0001.obj>
More information about the reviews
mailing list