[m-rev.] for review: Make string.all_match fail on UTF-8 string containing ill-formed sequence.
Peter Wang
novalazy at gmail.com
Wed Oct 30 17:09:37 AEDT 2019
library/string.m:
Make all_match(Pred, String) always fail if strings use UTF-8
encoding and String contains an ill-formed code unit sequence.
Such a sequence does not contain any code points, so nothing in it
could satisfy a test on code points. Previously, all_match would call
Pred(U+FFFD) for every code unit in an ill-formed sequence.

Reword the documentation of all_match to rule out an interpretation
under which ill-formed sequences are ignored.
---
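Note for reviewers: the intended semantics can be modelled outside
Mercury. The sketch below is a hypothetical Python model, not the
library code. It assumes the string is given as raw UTF-8 bytes and
uses Python's strict decoder as a stand-in for the well-formedness
check that the patch gets from the IsReplaced flag.

```python
def all_match(pred, data: bytes) -> bool:
    """Model of the patched behaviour (an illustration only).

    Returns True iff `data` is empty, or is well-formed UTF-8 in which
    every code point satisfies `pred`. Any ill-formed sequence makes
    the result False, whatever `pred` accepts.
    """
    try:
        # Strict decoding raises on any ill-formed code unit sequence.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(pred(ch) for ch in text)


if __name__ == "__main__":
    print(all_match(str.isalpha, b""))               # empty string
    print(all_match(str.isalpha, b"abc"))            # well-formed, all match
    print(all_match(lambda ch: True, b"ab\xffcd"))   # ill-formed input
```

The last call is the case the patch is about: even a predicate that
accepts every code point cannot make all_match succeed on ill-formed
input.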
library/string.m | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/library/string.m b/library/string.m
index 3e9acf8c1..8d253e36b 100644
--- a/library/string.m
+++ b/library/string.m
@@ -527,8 +527,8 @@

     % all_match(TestPred, String):
     %
-    % True if TestPred is true when applied to each character (code point) in
-    % String or if String is the empty string.
+    % True iff `String' is empty or contains only code points that satisfy
+    % `TestPred'.
     %
 :- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.

@@ -3201,10 +3201,6 @@ is_well_formed(_) :-

 % For speed, most of these predicates have C versions as well as
 % Mercury versions. XXX why not all?
-% XXX ILSEQ Behaviour depends on target language.
-% The generic versions use all_match which currently uses unsafe_index_next and
-% ignores the first ill-formed sequence and everything thereafter.
-
 :- pragma foreign_proc("C",
     is_all_alpha(S::in),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
@@ -3347,9 +3343,6 @@ is_all_digits(S) :-

 %---------------------%

-% XXX ILSEQ all_match should fail if it encounters an ill-formed sequence;
-% instead it acts as if the String ends there.
-
 all_match(P, String) :-
     all_match_loop(P, String, 0).

@@ -3357,7 +3350,8 @@ all_match(P, String) :-
     int::in) is semidet.

 all_match_loop(P, String, Cur) :-
-    ( if unsafe_index_next(String, Cur, Next, Char) then
+    ( if unsafe_index_next_repl(String, Cur, Next, Char, IsReplaced) then
+        IsReplaced = no,
         P(Char),
         all_match_loop(P, String, Next)
     else
--
2.23.0