[m-rev.] for review: Fix two bugs in string.contains_char.

Mark Brown mark at mercurylang.org
Wed Oct 30 19:13:58 AEDT 2019


This looks fine.

On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Fix C implementation of contains_char to fail when asked to test for
>     a surrogate code point in a string. It previously would (always)
>     succeed, which is a bug.
>
>     Fix generic implementation so that contains_char(String, '\uFFFD')
>     will not succeed just because String contains an ill-formed sequence
>     (in UTF-8 grades).
>
>     Delete obsolete comment.
> ---
>  library/string.m | 34 ++++++++++++++--------------------
>  1 file changed, 14 insertions(+), 20 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 950146233..380a03f11 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -3366,13 +3366,6 @@ all_match_loop(P, String, Cur) :-
>
>  %---------------------%
>
> -% XXX ILSEQ Behaviour depends on target language.
> -%  - C/C#/Java: ill-formed sequences don't prevent matching of later chars
> -%  - generic: unsafe_index_next stops at first ill-formed sequence
> -
> -    % strchr always returns true when searching for '\0',
> -    % but the '\0' is an implementation detail which really
> -    % shouldn't be considered to be part of the string itself.
>  :- pragma foreign_proc("C",
>      contains_char(Str::in, Ch::in),
>      [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
> @@ -3382,12 +3375,13 @@ all_match_loop(P, String, Cur) :-
>      size_t  len;
>      if (MR_is_ascii(Ch)) {
>          // Fast path.
> -        SUCCESS_INDICATOR = (strchr(Str, Ch) != NULL) && Ch != '\\0';
> +        // strchr always returns true when searching for NUL,
> +        // but the NUL is not part of the string itself.
> +        SUCCESS_INDICATOR = (Ch != '\\0') && (strchr(Str, Ch) != NULL);
>      } else {
> -        // XXX ILSEQ Handle Ch being surrogate or invalid valid.
>          len = MR_utf8_encode(buf, Ch);
>          buf[len] = '\\0';
> -        SUCCESS_INDICATOR = (strstr(Str, buf) != NULL);
> +        SUCCESS_INDICATOR = (len > 0) && (strstr(Str, buf) != NULL);
>      }
>  ").
>  :- pragma foreign_proc("C#",
> @@ -3410,19 +3404,19 @@ all_match_loop(P, String, Cur) :-
>  ").
>
>  contains_char(String, Char) :-
> -    contains_char(String, Char, 0).
> +    contains_char_loop(String, Char, 0).
>
> -:- pred contains_char(string::in, char::in, int::in) is semidet.
> +:- pred contains_char_loop(string::in, char::in, int::in) is semidet.
>
> -contains_char(Str, Char, I) :-
> -    ( if unsafe_index_next(Str, I, J, IndexChar) then
> -        ( if IndexChar = Char then
> -            true
> -        else
> -            contains_char(Str, Char, J)
> -        )
> +contains_char_loop(Str, Char, I) :-
> +    unsafe_index_next_repl(Str, I, J, IndexChar, IsReplaced),
> +    ( if
> +        IndexChar = Char,
> +        IsReplaced = no
> +    then
> +        true
>      else
> -        fail
> +        contains_char_loop(Str, Char, J)
>      ).
>
>  %---------------------%
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews


More information about the reviews mailing list