[m-rev.] for review: Make string.words_separator skip ill-formed sequences in UTF-8.

Mark Brown mark at mercurylang.org
Wed Oct 30 19:22:32 AEDT 2019


This looks fine.

On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Make words_separator never consider ill-formed sequences in UTF-8
>     strings as potential separators, as they cannot contain any code
>     points that could satisfy any given SepP predicate on code points.
>     Previously, words_separator would call SepP(U+FFFD) for every code
>     unit in an ill-formed sequence.
> ---
>  library/string.m | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 8d253e36b..75dd54abd 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -4174,8 +4174,6 @@ unsafe_substring(Str, Start, Count, SubString) :-
>
>  %---------------------%
>
> -% XXX ILSEQ unsafe_index_next causes truncation at first ill-formed sequence.
> -
>  words_separator(SepP, String) = Words :-
>      skip_to_next_word_start(SepP, String, 0, WordStart),
>      words_loop(SepP, String, WordStart, Words).
> @@ -4209,7 +4207,8 @@ words_loop(SepP, String, WordStartPos, Words) :-
>
>  skip_to_next_word_start(SepP, String, CurPos, NextWordStartPos) :-
>      ( if
> -        unsafe_index_next(String, CurPos, NextPos, Char),
> +        unsafe_index_next_repl(String, CurPos, NextPos, Char, IsReplaced),
> +        IsReplaced = no,
>          SepP(Char)
>      then
>          skip_to_next_word_start(SepP, String, NextPos, NextWordStartPos)
> @@ -4224,10 +4223,8 @@ skip_to_next_word_start(SepP, String, CurPos, NextWordStartPos) :-
>      string::in, int::in, int::out) is det.
>
>  skip_to_word_end(SepP, String, CurPos, PastWordEndPos) :-
> -    ( if
> -        unsafe_index_next(String, CurPos, NextPos, Char)
> -    then
> -        ( if SepP(Char) then
> +    ( if unsafe_index_next_repl(String, CurPos, NextPos, Char, IsReplaced) then
> +        ( if IsReplaced = no, SepP(Char) then
>              PastWordEndPos = CurPos
>          else
>              skip_to_word_end(SepP, String, NextPos, PastWordEndPos)
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews


More information about the reviews mailing list