[m-rev.] for review: Make string.prefix_length/suffix_length stop at ill-formed sequence.
Mark Brown
mark at mercurylang.org
Wed Oct 30 19:16:23 AEDT 2019
This looks fine.
On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
> Make prefix_length and suffix_length stop at an ill-formed sequence
> in UTF-8 strings. Such a sequence does not contain any code point
> that could satisfy a test on code points. Previously, prefix_length
> and suffix_length would would call Pred(U+FFFD) for every code unit
> in an ill-formed sequence.
>
> Tweak documentation.
>
> Delete obsolete comments.
> ---
> library/string.m | 19 ++++++-------------
> 1 file changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 380a03f11..3e9acf8c1 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -575,7 +575,7 @@
> % prefix_length(Pred, String):
> %
> % The length (in code units) of the maximal prefix of `String' consisting
> - % entirely of characters (code points) satisfying Pred.
> + % entirely of code points satisfying `Pred'.
> %
> :- func prefix_length(pred(char)::in(pred(in) is semidet), string::in)
> = (int::out) is det.
> @@ -583,7 +583,7 @@
> % suffix_length(Pred, String):
> %
> % The length (in code units) of the maximal suffix of `String' consisting
> - % entirely of characters (code points) satisfying Pred.
> + % entirely of code points satisfying `Pred'.
> %
> :- func suffix_length(pred(char)::in(pred(in) is semidet), string::in)
> = (int::out) is det.
> @@ -3525,9 +3525,6 @@ compare_ignore_case_ascii_loop(X, Y, I, CommonLen, Res) :-
>
> %---------------------%
>
> - % XXX ILSEQ unsafe_index_next effectively truncates the string at the first
> - % ill-formed sequence.
> - %
> prefix_length(P, S) = Index :-
> prefix_length_loop(P, S, 0, Index).
>
> @@ -3536,7 +3533,8 @@ prefix_length(P, S) = Index :-
>
> prefix_length_loop(P, S, I, Index) :-
> ( if
> - unsafe_index_next(S, I, J, Char),
> + unsafe_index_next_repl(S, I, J, Char, IsReplaced),
> + IsReplaced = no,
> P(Char)
> then
> prefix_length_loop(P, S, J, Index)
> @@ -3544,9 +3542,6 @@ prefix_length_loop(P, S, I, Index) :-
> Index = I
> ).
>
> - % XXX ILSEQ unsafe_index_next effectively truncates the string at the first
> - % ill-formed sequence.
> - %
> suffix_length(P, S) = End - Index :-
> End = length(S),
> suffix_length_loop(P, S, End, Index).
> @@ -3556,7 +3551,8 @@ suffix_length(P, S) = End - Index :-
>
> suffix_length_loop(P, S, I, Index) :-
> ( if
> - unsafe_prev_index(S, I, J, Char),
> + unsafe_prev_index_repl(S, I, J, Char, IsReplaced),
> + IsReplaced = no,
> P(Char)
> then
> suffix_length_loop(P, S, J, Index)
> @@ -4640,9 +4636,6 @@ chomp(S) = Chomp :-
> Chomp = S
> ).
>
> -% XXX ILSEQ Once the problems with prefix_length, suffix_length are fixed,
> -% there should be no problem stripping strings containing ill-formed sequences.
> -
> strip(S0) = S :-
> L = prefix_length(char.is_whitespace, S0),
> R = suffix_length(char.is_whitespace, S0),
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews
More information about the reviews
mailing list