[m-rev.] for review: Define string.sub_string_search_start for out-of-range starting offset.
Mark Brown
mark at mercurylang.org
Thu Oct 31 15:21:30 AEDT 2019
This looks fine.
On Thu, Oct 31, 2019 at 1:42 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
> Define sub_string_search_start to fail if the BeginAt parameter is
> negative or past the end of the string to search. The original C
> implementation did not check for an out-of-range starting offset,
> and could crash the program. The C implementation was later amended
> to fail instead, but not other implementations.
>
> Check for negative starting offset in non-C implementations of
> sub_string_search_start.
>
> tests/hard_coded/string_sub_string_search.m:
> Extend test case.
> ---
> library/string.m | 37 ++++++++++++++++-----
> tests/hard_coded/string_sub_string_search.m | 10 ++++++
> 2 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 3a1646b81..2e319148b 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -601,6 +601,8 @@
> % `Index' is the code unit position in `String' where the first
> % occurrence of `SubString' occurs such that 'Index' is greater than or
> % equal to `BeginAt'. Indices start at zero.
> + % Fails if `BeginAt' is negative or greater than
> + % length(String) - length(SubString).
> %
> :- pred sub_string_search_start(string::in, string::in, int::in, int::out)
> is semidet.
> @@ -3603,8 +3605,12 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
> Index::out),
> [will_not_call_mercury, promise_pure, thread_safe],
> "{
> - Index = WholeString.IndexOf(Pattern, BeginAt,
> - System.StringComparison.Ordinal);
> + if (BeginAt < 0 || BeginAt > WholeString.Length) {
> + Index = -1;
> + } else {
> + Index = WholeString.IndexOf(Pattern, BeginAt,
> + System.StringComparison.Ordinal);
> + }
> SUCCESS_INDICATOR = (Index >= 0);
> }").
> :- pragma foreign_proc("Java",
> @@ -3612,7 +3618,13 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
> Index::out),
> [will_not_call_mercury, promise_pure, thread_safe],
> "
> - Index = WholeString.indexOf(Pattern, BeginAt);
> + // String.indexOf will check BeginAt > WholeString.Length
> + // so we don't need to do it first.
> + if (BeginAt < 0) {
> + Index = -1;
> + } else {
> + Index = WholeString.indexOf(Pattern, BeginAt);
> + }
> SUCCESS_INDICATOR = (Index >= 0);
> ").
> :- pragma foreign_proc("Erlang",
> @@ -3620,14 +3632,23 @@ sub_string_search_start_2(String, SubString, I, Length, SubLength) ->
> Index::out),
> [will_not_call_mercury, promise_pure, thread_safe],
> "
> - Index = mercury__string:sub_string_search_start_2(String, SubString,
> - BeginAt, size(String), size(SubString)),
> + case BeginAt >= 0 of
> + true ->
> + Index = mercury__string:sub_string_search_start_2(String, SubString,
> + BeginAt, size(String), size(SubString));
> + false ->
> + Index = -1
> + end,
> SUCCESS_INDICATOR = (Index =/= -1)
> ").
>
> sub_string_search_start(String, SubString, BeginAt, Index) :-
> - sub_string_search_start_loop(String, SubString, BeginAt,
> - length(String), length(SubString), Index).
> + ( if BeginAt < 0 then
> + fail
> + else
> + sub_string_search_start_loop(String, SubString, BeginAt,
> + length(String), length(SubString), Index)
> + ).
>
> % Brute force string searching. For short Strings this is good;
> % for longer strings Boyer-Moore is much better.
> @@ -3636,7 +3657,7 @@ sub_string_search_start(String, SubString, BeginAt, Index) :-
> int::in, int::out) is semidet.
>
> sub_string_search_start_loop(String, SubString, I, Len, SubLen, Index) :-
> - I < Len,
> + I =< Len - SubLen,
> ( if
> % XXX This is inefficient -- there is no (in, in, in) = in is semidet
> % mode of between, so this ends up calling the (in, in, in) = out
> diff --git a/tests/hard_coded/string_sub_string_search.m b/tests/hard_coded/string_sub_string_search.m
> index bd308f3a2..7e3c4e32b 100644
> --- a/tests/hard_coded/string_sub_string_search.m
> +++ b/tests/hard_coded/string_sub_string_search.m
> @@ -26,8 +26,18 @@ main(!IO) :-
> string.sub_string_search("cat", "at", 1),
> string.sub_string_search("cat", "t", 2),
>
> + not string.sub_string_search_start("catcatcat", "cat", -1, _),
> + string.sub_string_search_start("catcatcat", "cat", 0, 0),
> string.sub_string_search_start("catcatcat", "cat", 1, 3),
> + string.sub_string_search_start("catcatcat", "cat", 2, 3),
> + string.sub_string_search_start("catcatcat", "cat", 3, 3),
> + string.sub_string_search_start("catcatcat", "cat", 4, 6),
> + string.sub_string_search_start("catcatcat", "cat", 5, 6),
> + string.sub_string_search_start("catcatcat", "cat", 6, 6),
> + not string.sub_string_search_start("catcatcat", "cat", 7, _),
> + not string.sub_string_search_start("catcatcat", "cat", 8, _),
> not string.sub_string_search_start("catcatcat", "cat", 9, _),
> + not string.sub_string_search_start("catcatcat", "cat", 10, _),
>
> string.sub_string_search("cαtcατcat", "cατ", length("cαt"))
> ->
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews
More information about the reviews
mailing list