[m-rev.] for review: Fix some handling of ill-formed sequences in string module.

Peter Wang novalazy at gmail.com
Wed Jul 27 11:34:21 AEST 2022


On Tue, 26 Jul 2022 18:30:54 +1000 Julien Fischer <jfischer at opturion.com> wrote:
> 
> Hi Peter,
> 
> On Tue, 26 Jul 2022, Peter Wang wrote:
> 
> > I can add something to NEWS if required.
> 
> Please do so.
> 
> In light of the fixes to unsafe_index_{next,prev}_repl this should
> probably go on to the release branch as well -- I will sort that after
> it is committed.

Good idea.

> 
> > ----
> >
> > library/string.m:
> >    Document that string.duplicate_char may throw an exception.
> >
> >    Fix string.all_match to fail if the string being tested contains
> >    any ill-formed code unit sequences.
> 
> I think the documentation for string.all_match should say as much.
> 

Done.

> >
> >    Fix the Mercury implementation of string.contains_char to continue
> >    searching for the character past any ill-formed code unit sequences.
> 
> Do we even need the Mercury implementation since there are foreign_proc
> implementations for all three target languages?
> 

Not really, but we have them for other predicates as well.

> Again, I think the behaviour w.r.t ill-formed sequences should be
> documented.
> 
> That looks fine otherwise.

How about these changes?

Peter

diff --git a/NEWS b/NEWS
index 67148bf68..4aaa32fa3 100644
--- a/NEWS
+++ b/NEWS
@@ -355,6 +355,15 @@ Changes to the Mercury standard library

 ### Changes to the `string` module

+* We have fixed the behaviour of the following predicates when called on a
+  string containing ill-formed code unit sequences:
+
+   - pred `all_match/2`
+   - pred `index_next_repl/5`
+   - pred `unsafe_index_next_repl/5`
+   - pred `prev_index_repl/5`
+   - pred `unsafe_prev_index_repl/5`
+
 * The following predicate has been added:

    - pred `contains_match/2`
diff --git a/library/string.m b/library/string.m
index 573ea2909..811c474a6 100644
--- a/library/string.m
+++ b/library/string.m
@@ -576,14 +576,15 @@
     % all_match(TestPred, String):
     %
     % True iff String is empty or contains only code points that satisfy
-    % TestPred.
+    % TestPred. False if String contains an ill-formed code unit sequence.
     %
 :- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.

     % contains_match(TestPred, String):
     %
     % True iff String contains at least one code point that satisfies
-    % TestPred.
+    % TestPred. Any ill-formed code unit sequences in String are ignored
+    % as they do not encode code points.
     %
 :- pred contains_match(pred(char)::in(pred(in) is semidet), string::in)
     is semidet.
@@ -591,6 +592,8 @@
     % contains_char(String, Char):
     %
     % Succeed if the code point Char occurs in String.
+    % Any ill-formed code unit sequences within String are ignored
+    % as they will not contain Char.
     %
 :- pred contains_char(string::in, char::in) is semidet.
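
The documented behaviour of the three predicates can be modelled outside
Mercury. The sketch below is not the library code. It assumes UTF-8 strings
held as Python bytes, and the names code_points, all_match, contains_match
and contains_char are illustrative helpers that only mirror the predicate
names in the patch. An ill-formed code unit is represented as None.

```python
def code_points(data: bytes):
    """Yield each decoded code point, or None for an ill-formed code unit."""
    i = 0
    while i < len(data):
        # A well-formed UTF-8 sequence is 1 to 4 code units long.
        for n in (1, 2, 3, 4):
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue
            yield ch
            i += n
            break
        else:
            # No prefix starting at i decodes: treat this unit as ill-formed
            # and keep scanning from the next code unit.
            yield None
            i += 1


def all_match(test, data: bytes) -> bool:
    # True for the empty string; False as soon as an ill-formed
    # sequence or a non-matching code point is seen.
    return all(cp is not None and test(cp) for cp in code_points(data))


def contains_match(test, data: bytes) -> bool:
    # Ill-formed sequences are skipped: they encode no code point.
    return any(cp is not None and test(cp) for cp in code_points(data))


def contains_char(data: bytes, char: str) -> bool:
    # The search continues past ill-formed sequences.
    return contains_match(lambda cp: cp == char, data)


sample = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8
print(all_match(str.isalpha, sample))
print(contains_match(str.isalpha, sample))
print(contains_char(sample, "b"))
```

With the sample above, all_match fails because of the 0xFF code unit, while
contains_match and contains_char both succeed because they skip it and go on
to find the following 'b'.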



