[m-rev.] for review: Make compare_ignore_case_ascii loop over code units.

Mark Brown mark at mercurylang.org
Tue Oct 29 00:53:23 AEDT 2019


On Mon, Oct 28, 2019 at 2:30 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Make compare_ignore_case_ascii loop over code units instead of code
>     points, allowing it to work on strings that contain contain

Remove a "contain".

>     ill-formed code unit sequences.
> ---
>  library/string.m | 63 ++++++++++++++++++++++++------------------------
>  1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index f3ddd97ab..5d7afb542 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -574,7 +574,7 @@
>
>      % compare_ignore_case_ascii(Res, X, Y):
>      %
> -    % Compare two strings by code point order, ignoring the case of letters
> +    % Compare two strings by code unit order, ignoring the case of letters
>      % (A-Z, a-z) in the ASCII range.
>      % Equivalent to `compare(Res, to_lower(X), to_lower(Y))'
>      % but more efficient.
> @@ -3413,41 +3413,42 @@ unsafe_compare_substrings_loop(X, Y, IX, IY, Rem, Res) :-
>
>  %---------------------%
>
> -% XXX ILSEQ unsafe_index_next effectively truncates either or both strings
> -% at the first ill-formed sequence.
> -
>  compare_ignore_case_ascii(Res, X, Y) :-
> -    compare_ignore_case_ascii_loop(X, Y, 0, Res).
> +    LenX = length(X),
> +    LenY = length(Y),
> +    Rem = min(LenX, LenY),
> +    compare_ignore_case_ascii_loop(X, Y, 0, Rem, Res0),
> +    (
> +        Res0 = (=),
> +        compare(Res, LenX, LenY)
> +    ;
> +        ( Res0 = (<)
> +        ; Res0 = (>)
> +        ),
> +        Res = Res0
> +    ).
>
> -:- pred compare_ignore_case_ascii_loop(string::in, string::in, int::in,
> +:- pred compare_ignore_case_ascii_loop(string::in, string::in, int::in, int::in,
>      comparison_result::uo) is det.
>
> -compare_ignore_case_ascii_loop(X, Y, I, Res) :-
> -    ( if unsafe_index_next(X, I, IX, CharX) then
> -        ( if unsafe_index_next(Y, I, _IY, CharY) then
> -            char.to_lower(CharX, LowerCharX),
> -            char.to_lower(CharY, LowerCharY),
> -            compare(CharRes, LowerCharX, LowerCharY),
> -            (
> -                CharRes = (=),
> -                % CharX = CharY, or both are in the ASCII range.
> -                % In either case, we must have IX = IY.
> -                compare_ignore_case_ascii_loop(X, Y, IX, Res)
> -            ;
> -                ( CharRes = (<)
> -                ; CharRes = (>)
> -                ),
> -                Res = CharRes
> -            )
> -        else
> -            % X longer than Y.
> -            Res = (>)
> -        )
> -    else if unsafe_index_next(Y, I, _IY, _CharY) then
> -        % X shorter than Y.
> -        Res = (<)
> -    else
> +compare_ignore_case_ascii_loop(X, Y, I, Rem, Res) :-
> +    ( if Rem = 0 then
>          Res = (=)
> +    else
> +        unsafe_index_code_unit(X, I, CodeX),
> +        unsafe_index_code_unit(Y, I, CodeY),
> +        to_lower_code_unit(CodeX, LowerCodeX),
> +        to_lower_code_unit(CodeY, LowerCodeY),
> +        compare(Res0, LowerCodeX, LowerCodeY),
> +        (
> +            Res0 = (=),
> +            compare_ignore_case_ascii_loop(X, Y, I + 1, Rem - 1, Res)

Why not compare I to the end index, and save decrementing Rem each
iteration? That would be clearer in my opinion.

Otherwise that's fine.

Mark


More information about the reviews mailing list