[m-rev.] for review: Fix some handling of ill-formed sequences in string module.
Peter Wang
novalazy at gmail.com
Wed Jul 27 11:34:21 AEST 2022
On Tue, 26 Jul 2022 18:30:54 +1000 Julien Fischer <jfischer at opturion.com> wrote:
>
> Hi Peter,
>
> On Tue, 26 Jul 2022, Peter Wang wrote:
>
> > I can add something to NEWS if required.
>
> Please do so.
>
> In light of the fixes to unsafe_index_{next,prev}_repl this should
> probably go on to the release branch as well -- I will sort that after
> it is committed.
Good idea.
>
> > ----
> >
> > library/string.m:
> > Document that string.duplicate_char may throw an exception.
> >
> > Fix string.all_match to fail if the string being tested contains
> > any ill-formed code unit sequences.
>
> I think the documentation for string.all_match should say as much.
>
Done.
> >
> > Fix the Mercury implementation of string.contains_char to continue
> > searching for the character past any ill-formed code unit sequences.
>
> Do we even need the Mercury implementation since there are foreign_proc
> implementations for all three target languages?
>
Not really, but we have them for other predicates as well.
> Again, I think the behaviour w.r.t. ill-formed sequences should be
> documented.
>
> That looks fine otherwise.
How about these changes?
Peter
diff --git a/NEWS b/NEWS
index 67148bf68..4aaa32fa3 100644
--- a/NEWS
+++ b/NEWS
@@ -355,6 +355,15 @@ Changes to the Mercury standard library
### Changes to the `string` module
+* We have fixed the behaviour of the following predicates when called on a
+ string containing ill-formed code unit sequences:
+
+ - pred `all_match/2`
+ - pred `index_next_repl/5`
+ - pred `unsafe_index_next_repl/5`
+ - pred `prev_index_repl/5`
+ - pred `unsafe_prev_index_repl/5`
+
* The following predicate has been added:
- pred `contains_match/2`
diff --git a/library/string.m b/library/string.m
index 573ea2909..811c474a6 100644
--- a/library/string.m
+++ b/library/string.m
@@ -576,14 +576,15 @@
% all_match(TestPred, String):
%
% True iff String is empty or contains only code points that satisfy
- % TestPred.
+ % TestPred. False if String contains an ill-formed code unit sequence.
%
:- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.
% contains_match(TestPred, String):
%
% True iff String contains at least one code point that satisfies
- % TestPred.
+ % TestPred. Any ill-formed code unit sequences in String are ignored
+ % as they do not encode code points.
%
:- pred contains_match(pred(char)::in(pred(in) is semidet), string::in)
is semidet.
@@ -591,6 +592,8 @@
% contains_char(String, Char):
%
% Succeed if the code point Char occurs in String.
+ % Any ill-formed code unit sequences within String are ignored
+ % as they will not contain Char.
%
:- pred contains_char(string::in, char::in) is semidet.
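
For anyone wanting to try the documented behaviour, here is a minimal sketch that is not part of the patch. It assumes a grade where strings are UTF-8 encoded (e.g. a C grade), so that the code unit 0xff makes the string ill-formed; in the Java and C# grades the code units are UTF-16 and this input would not be ill-formed. The module name ill_formed_demo is made up for illustration.

```mercury
:- module ill_formed_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module list.
:- import_module string.

main(!IO) :-
    % 0x61 = 'a', 0x62 = 'b'; 0xff never occurs in well-formed UTF-8.
    % Assumption: strings are UTF-8 encoded in this grade.
    ( if
        string.from_code_unit_list_allow_ill_formed([0x61, 0xff, 0x62], S)
    then
        % Per the new comment, all_match is false if String contains
        % an ill-formed code unit sequence.
        ( if string.all_match(char.is_alpha, S) then
            io.write_string("all_match(is_alpha): succeeded\n", !IO)
        else
            io.write_string("all_match(is_alpha): failed\n", !IO)
        ),
        % Per the new comment, contains_match ignores ill-formed
        % sequences and only tests the code points 'a' and 'b'.
        ( if string.contains_match(char.is_alpha, S) then
            io.write_string("contains_match(is_alpha): succeeded\n", !IO)
        else
            io.write_string("contains_match(is_alpha): failed\n", !IO)
        ),
        % Per the new comment, contains_char keeps searching past the
        % ill-formed sequence, so 'b' should still be found.
        ( if string.contains_char(S, 'b') then
            io.write_string("contains_char('b'): succeeded\n", !IO)
        else
            io.write_string("contains_char('b'): failed\n", !IO)
        )
    else
        io.write_string("could not construct the test string\n", !IO)
    ).
```

Going by the comments in the diff, all_match would be expected to fail here and the other two to succeed. The sketch has not been run against the committed change.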