[m-rev.] for review: Make string.all_match fail on UTF-8 string containing ill-formed sequence.

Mark Brown mark at mercurylang.org
Wed Oct 30 19:18:13 AEDT 2019


This looks fine.

On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Make all_match(Pred, String) always fail if the string contains an
>     ill-formed code unit sequence, and strings use UTF-8 encoding.
>     Such sequences do not contain any code points that could satisfy a
>     test on code points. Previously, all_match would call Pred(U+FFFD)
>     for every code unit in an ill-formed sequence.
>
>     Define all_match to rule out an interpretation that could ignore
>     ill-formed sequences.
> ---
>  library/string.m | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 3e9acf8c1..8d253e36b 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -527,8 +527,8 @@
>
>      % all_match(TestPred, String):
>      %
> -    % True if TestPred is true when applied to each character (code point) in
> -    % String or if String is the empty string.
> +    % True iff `String' is empty or contains only code points that satisfy
> +    % `TestPred'.
>      %
>  :- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.
>
> @@ -3201,10 +3201,6 @@ is_well_formed(_) :-
>  % For speed, most of these predicates have C versions as well as
>  % Mercury versions. XXX why not all?
>
> -% XXX ILSEQ Behaviour depends on target language.
> -% The generic versions use all_match which currently uses unsafe_index_next and
> -% ignores the first ill-formed sequence and everything thereafter.
> -
>  :- pragma foreign_proc("C",
>      is_all_alpha(S::in),
>      [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
> @@ -3347,9 +3343,6 @@ is_all_digits(S) :-
>
>  %---------------------%
>
> -% XXX ILSEQ all_match should fail if it encounters an ill-formed sequence;
> -% instead it acts as if the String ends there.
> -
>  all_match(P, String) :-
>      all_match_loop(P, String, 0).
>
> @@ -3357,7 +3350,8 @@ all_match(P, String) :-
>      int::in) is semidet.
>
>  all_match_loop(P, String, Cur) :-
> -    ( if unsafe_index_next(String, Cur, Next, Char) then
> +    ( if unsafe_index_next_repl(String, Cur, Next, Char, IsReplaced) then
> +        IsReplaced = no,
>          P(Char),
>          all_match_loop(P, String, Next)
>      else
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews


More information about the reviews mailing list