[m-dev.] a problem with trie string switches and string well-formedness
Zoltan Somogyi
zoltan.somogyi at runbox.com
Sat Mar 23 13:31:37 AEDT 2024
On 2024-03-23 13:21 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
> Yes, if the compiler uses Mercury strings to represent potentially
> non-well-formed sequences then we have to be a little bit careful
> (in case other parts of the compiler expect only well formed strings),
> or else use a different type, such as list(int), to represent such
> sequences.
The place where we want to put these not-necessarily-well-formed strings,
the arguments of the offset_str_eq operator, is processed by *very* few
parts of the compiler. I think documenting this property of the argument,
and looking over those compiler components, should be enough.
>> However, I don't know as much about i18n as you guys.
>>
>> My opinion is that the best way forward is this:
>>
>> - stop insisting that the operands we give to strcmp now, and strncmp soon,
>> are well formed. This requires a version of from_code_unit_list_in_encoding
>> that does not check well-formedness.
>
> Where do we insist that of strcmp/strncmp?
We don't insist on it in general. We insist on it in this specific
use case, with the code starting on line 503 in ml_string_switch.m.
I will add a version of from_code_unit_list_in_encoding that does not
insist on well-formedness, and switch to using that in that code.
>> - renaming the offset_str_eq operation to something that reflects the fact that
>> it operates on code unit sequences, i.e. on strings that may not be well formed.
>> offset_nwf_str_eq? offset_cus_eq? Maybe something else?
>
> The other _str operations compare arbitrary code unit sequences
> already, so it just needs to be documented as such. I don't see any need
> to rename the operations.
Ok, will do. Thanks for the info.
Zoltan.
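
To illustrate why comparing at an offset does not need well-formed operands,
here is a minimal C sketch. It assumes UTF-8 as the encoding, and it assumes
that the offset may fall in the middle of a multi-byte code point, so that the
remaining operand is a sequence of code units rather than a well-formed string.
The names offset_str_eq_sketch and offset_str_prefix_eq_sketch are hypothetical;
this is not the code the compiler generates, only an illustration of the
strcmp/strncmp behaviour being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the Mercury runtime's actual code.
 * Returns true iff the code units of `s` starting at `offset` are exactly
 * the code units of `rest`, up to the terminating NUL.
 * Neither `s + offset` nor `rest` has to be well-formed UTF-8:
 * strcmp only ever looks at individual bytes.
 */
static bool
offset_str_eq_sketch(const char *s, size_t offset, const char *rest)
{
    /* Precondition: the offset must not point past the terminating NUL. */
    assert(offset <= strlen(s));
    return strcmp(s + offset, rest) == 0;
}

/*
 * Hypothetical prefix variant using strncmp: compares at most the
 * first `n` code units of `s + offset` against `rest`.
 */
static bool
offset_str_prefix_eq_sketch(const char *s, size_t offset,
    const char *rest, size_t n)
{
    assert(offset <= strlen(s));
    return strncmp(s + offset, rest, n) == 0;
}
```

For example, with s = "caf\xc3\xa9" (the two bytes 0xC3 0xA9 encode one code
point), offset 4 leaves only the lone continuation byte 0xA9. That operand is
not well formed, yet offset_str_eq_sketch(s, 4, "\xa9") still succeeds, because
the comparison is on code units only.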