[m-rev.] for review: Fix some handling of ill-formed sequences in string module.

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Jul 27 14:47:48 AEST 2022


2022-07-27 14:28 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
>> >> Do we even need the Mercury implementation since there are foreign_proc
>> >> implementations for all three target languages?
>> >> 
>> > 
>> > Not really, but we have them for other predicates as well.
>> 
>> Were those left over from when they were the definition in Erlang grades?
>> 
> 
> I thought we had pure Mercury implementations as a fallback where
> possible, which was useful for bringing up backends or maybe as a
> reference.

I don't think we have such a policy. If we did, it would be limited anyway,
because

- in the presence of a foreign_proc for the current backend, we don't check
  whether the Mercury code compiles, and it may not, due to changes in
  the predicates it calls, and

- if the code does compile, there is nothing to prevent its semantics
  suffering bitrot if the foreign_procs for the same predicate are updated
  but the Mercury version misses out.

> We can delete them if they serve no purpose.

I wouldn't go that far. They can be useful, *provided* that the limitations
above are kept in mind.
 
>> The NEWS update mentions two other predicates: why no update
>> to them?
> 
> Do you mean (unsafe_)index_next_repl and (unsafe_)prev_index_repl?

Yes.

>      % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
>      %
> -    % Like index_next/4 but MaybeReplaced is `replaced_code_unit(CodeUnit)'
> -    % iff Char is the Unicode REPLACEMENT CHARACTER (U+FFFD) but String
> -    % does NOT contain an encoding of U+FFFD beginning at Index;
> -    % CodeUnit is the code unit at Index.
> -    % (MaybeReplaced is always `not_replaced' when strings are UTF-16
> -    % encoded.)
> +    % Like index_next/4 but also returns MaybeReplaced on success.
> +    % When Char is not U+FFFD then MaybeReplaced is always `not_replaced'.

I would add a comma after U+FFFD.

> +    % When Char is U+FFFD then there are two cases:

I would add " (the Unicode replacement character), " after U+FFFD.

And the same for prev_index_repl.

The rest of the diff is all fine.

Zoltan.


More information about the reviews mailing list