[m-rev.] for review: Fix some handling of ill-formed sequences in string module.
Zoltan Somogyi
zoltan.somogyi at runbox.com
Wed Jul 27 14:47:48 AEST 2022
2022-07-27 14:28 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
>> >> Do we even need the Mercury implementation since there are foreign_proc
>> >> implementations for all three target languages?
>> >>
>> >
>> > Not really, but we have them for other predicates as well.
>>
>> Were those left over from when they were the definition in Erlang grades?
>>
>
> I thought we had pure Mercury implementations as a fallback where
> possible, which was useful for bringing up backends or maybe as a
> reference.
I don't think we have such a policy. If we did, it would be limited anyway,
because
- in the presence of a foreign_proc for the current backend, we don't check
whether the Mercury code compiles, and it may not, due to changes in
the predicates it calls, and
- if the code does compile, there is nothing to prevent its semantics
suffering bitrot if the foreign_procs for the same predicate are updated
but the Mercury version misses out.
> We can delete them if they serve no purpose.
I wouldn't go that far. They can be useful, *provided* that the limitations
above are kept in mind.
>> The NEWS update mentions two other predicates: why no update
>> to them?
>
> Do you mean (unsafe_)index_next_repl and (unsafe_)prev_index_repl?
Yes.
> % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
> %
> - % Like index_next/4 but MaybeReplaced is `replaced_code_unit(CodeUnit)'
> - % iff Char is the Unicode REPLACEMENT CHARACTER (U+FFFD) but String
> - % does NOT contain an encoding of U+FFFD beginning at Index;
> - % CodeUnit is the code unit at Index.
> - % (MaybeReplaced is always `not_replaced' when strings are UTF-16
> - % encoded.)
> + % Like index_next/4 but also returns MaybeReplaced on success.
> + % When Char is not U+FFFD then MaybeReplaced is always `not_replaced'.
I would add a comma after U+FFFD.
> + % When Char is U+FFFD then there are two cases:
I would add " (the Unicode replacement character), " after U+FFFD.
And the same for prev_index_repl.
The rest of the diff is all fine.
Zoltan.
More information about the reviews
mailing list