[m-rev.] for review: Define string.sub_string_search_start for out-of-range starting offset.

Mark Brown mark at mercurylang.org
Thu Oct 31 15:21:30 AEDT 2019


This looks fine.

On Thu, Oct 31, 2019 at 1:42 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Define sub_string_search_start to fail if the BeginAt parameter is
>     negative or past the end of the string to search. The original C
>     implementation did not check for an out-of-range starting offset,
>     and could crash the program. The C implementation was later amended
>     to fail instead, but not other implementations.
>
>     Check for negative starting offset in non-C implementations of
>     sub_string_search_start.
>
> tests/hard_coded/string_sub_string_search.m:
>     Extend test case.
> ---
>  library/string.m                            | 37 ++++++++++++++++-----
>  tests/hard_coded/string_sub_string_search.m | 10 ++++++
>  2 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 3a1646b81..2e319148b 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -601,6 +601,8 @@
>      % `Index' is the code unit position in `String' where the first
>      % occurrence of `SubString' occurs such that 'Index' is greater than or
>      % equal to `BeginAt'. Indices start at zero.
> +    % Fails if `BeginAt' is negative or greater than
> +    % length(String) - length(SubString).
>      %
>  :- pred sub_string_search_start(string::in, string::in, int::in, int::out)
>      is semidet.
> @@ -3603,8 +3605,12 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
>          Index::out),
>      [will_not_call_mercury, promise_pure, thread_safe],
>  "{
> -    Index = WholeString.IndexOf(Pattern, BeginAt,
> -        System.StringComparison.Ordinal);
> +    if (BeginAt < 0 || BeginAt > WholeString.Length) {
> +        Index = -1;
> +    } else {
> +        Index = WholeString.IndexOf(Pattern, BeginAt,
> +            System.StringComparison.Ordinal);
> +    }
>      SUCCESS_INDICATOR = (Index >= 0);
>  }").
>  :- pragma foreign_proc("Java",
> @@ -3612,7 +3618,13 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
>          Index::out),
>      [will_not_call_mercury, promise_pure, thread_safe],
>  "
> -    Index = WholeString.indexOf(Pattern, BeginAt);
> +    // String.indexOf will check BeginAt > WholeString.Length
> +    // so we don't need to do it first.
> +    if (BeginAt < 0) {
> +        Index = -1;
> +    } else {
> +        Index = WholeString.indexOf(Pattern, BeginAt);
> +    }
>      SUCCESS_INDICATOR = (Index >= 0);
>  ").
>  :- pragma foreign_proc("Erlang",
> @@ -3620,14 +3632,23 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
>          Index::out),
>      [will_not_call_mercury, promise_pure, thread_safe],
>  "
> -    Index = mercury__string:sub_string_search_start_2(String, SubString,
> -        BeginAt, size(String), size(SubString)),
> +    case BeginAt >= 0 of
> +        true ->
> +            Index = mercury__string:sub_string_search_start_2(String, SubString,
> +                BeginAt, size(String), size(SubString));
> +        false ->
> +            Index = -1
> +    end,
>      SUCCESS_INDICATOR = (Index =/= -1)
>  ").
>
>  sub_string_search_start(String, SubString, BeginAt, Index) :-
> -    sub_string_search_start_loop(String, SubString, BeginAt,
> -        length(String), length(SubString), Index).
> +    ( if BeginAt < 0 then
> +        fail
> +    else
> +        sub_string_search_start_loop(String, SubString, BeginAt,
> +            length(String), length(SubString), Index)
> +    ).
>
>      % Brute force string searching. For short Strings this is good;
>      % for longer strings Boyer-Moore is much better.
> @@ -3636,7 +3657,7 @@ sub_string_search_start(String, SubString, BeginAt, Index) :-
>      int::in, int::out) is semidet.
>
>  sub_string_search_start_loop(String, SubString, I, Len, SubLen, Index) :-
> -    I < Len,
> +    I =< Len - SubLen,
>      ( if
>          % XXX This is inefficient -- there is no (in, in, in) = in is semidet
>          % mode of between, so this ends up calling the (in, in, in) = out
> diff --git a/tests/hard_coded/string_sub_string_search.m b/tests/hard_coded/string_sub_string_search.m
> index bd308f3a2..7e3c4e32b 100644
> --- a/tests/hard_coded/string_sub_string_search.m
> +++ b/tests/hard_coded/string_sub_string_search.m
> @@ -26,8 +26,18 @@ main(!IO) :-
>          string.sub_string_search("cat", "at", 1),
>          string.sub_string_search("cat", "t", 2),
>
> +        not string.sub_string_search_start("catcatcat", "cat", -1, _),
> +        string.sub_string_search_start("catcatcat", "cat", 0, 0),
>          string.sub_string_search_start("catcatcat", "cat", 1, 3),
> +        string.sub_string_search_start("catcatcat", "cat", 2, 3),
> +        string.sub_string_search_start("catcatcat", "cat", 3, 3),
> +        string.sub_string_search_start("catcatcat", "cat", 4, 6),
> +        string.sub_string_search_start("catcatcat", "cat", 5, 6),
> +        string.sub_string_search_start("catcatcat", "cat", 6, 6),
> +        not string.sub_string_search_start("catcatcat", "cat", 7, _),
> +        not string.sub_string_search_start("catcatcat", "cat", 8, _),
>          not string.sub_string_search_start("catcatcat", "cat", 9, _),
> +        not string.sub_string_search_start("catcatcat", "cat", 10, _),
>
>          string.sub_string_search("cαtcατcat", "cατ", length("cαt"))
>      ->
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews


More information about the reviews mailing list