[m-rev.] for review: add is_(leading|trailing)_surrogate/1 to char module
Sebastian Godelet
sebastian.godelet+github at gmail.com
Sun Dec 21 13:24:37 AEDT 2014
For review by anyone.
Input validation and character encoding transformations need to detect
surrogate code points and to check that they occur in the correct order
(a leading surrogate immediately followed by a trailing one).
NEWS:
Announce the addition of is_(leading|trailing)_surrogate/1.
library/char.m:
Add is_leading_surrogate/1, which succeeds if a character is a
leading surrogate code point (0xd800 to 0xdbff inclusive), and
is_trailing_surrogate/1, which succeeds if a character is a trailing
surrogate code point (0xdc00 to 0xdfff inclusive).
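
As a rough illustration of the "correct sequence" check mentioned above, the
following sketch uses the two new predicates to test whether surrogates in a
list of code points are properly paired. The module name surrogate_check and
the predicate surrogates_well_paired/1 are hypothetical and are not part of
this patch; the sketch also assumes the input is available as a list(char)
that may contain surrogate code points.

```mercury
:- module surrogate_check.
:- interface.

:- import_module char.
:- import_module list.

    % surrogates_well_paired(Chars):
    %
    % Succeed if every leading surrogate in Chars is immediately followed
    % by a trailing surrogate and no trailing surrogate occurs on its own.
    % (Hypothetical helper for illustration only.)
    %
:- pred surrogates_well_paired(list(char)::in) is semidet.

:- implementation.

surrogates_well_paired([]).
surrogates_well_paired([Char | Chars]) :-
    ( char.is_leading_surrogate(Char) ->
        % A leading surrogate must be followed by a trailing surrogate.
        Chars = [Next | Rest],
        char.is_trailing_surrogate(Next),
        surrogates_well_paired(Rest)
    ;
        % Any other code point is acceptable unless it is a lone
        % trailing surrogate.
        not char.is_trailing_surrogate(Char),
        surrogates_well_paired(Chars)
    ).
```

With only is_surrogate/1 the same check would have to repeat the range
comparisons by hand, which is the duplication the new predicates avoid.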
Sebastian.
--
diff --git a/NEWS b/NEWS
index 5862a26..4bf175c 100644
--- a/NEWS
+++ b/NEWS
@@ -72,6 +72,7 @@ Changes to the Mercury standard library:
- decimal_digit_to_int/2, det_decimal_digit_to_int/1
- hex_digit_to_int/2, det_hex_digit_to_int/1
- base_digit_to_int/3, det_base_digit_to_int/2
+ - is_leading_surrogate/1, is_trailing_surrogate/1
The following predicates in the char module have been deprecated and
will either be removed or have their semantics changed in a future
release.
diff --git a/library/char.m b/library/char.m
index 007c330..5ef1885 100644
--- a/library/char.m
+++ b/library/char.m
@@ -258,6 +258,18 @@
%
:- pred is_surrogate(char::in) is semidet.
+ % Succeed if `Char' is a leading Unicode surrogate code point.
+ % A leading surrogate code point is in the inclusive range from
+ % 0xd800 to 0xdbff.
+ %
+:- pred is_leading_surrogate(char::in) is semidet.
+
+ % Succeed if `Char' is a trailing Unicode surrogate code point.
+ % A trailing surrogate code point is in the inclusive range from
+ % 0xdc00 to 0xdfff.
+ %
+:- pred is_trailing_surrogate(char::in) is semidet.
+
% Succeed if `Char' is a Noncharacter code point.
% Sixty-six code points are not used to encode characters.
% These code points should not be used for interchange, but may be used
@@ -991,6 +1003,16 @@ is_surrogate(Char) :-
Int >= 0xd800,
Int =< 0xdfff.
+is_leading_surrogate(Char) :-
+ Int = char.to_int(Char),
+ Int >= 0xd800,
+ Int =< 0xdbff.
+
+is_trailing_surrogate(Char) :-
+ Int = char.to_int(Char),
+ Int >= 0xdc00,
+ Int =< 0xdfff.
+
is_noncharacter(Char) :-
Int = char.to_int(Char),
( 0xfdd0 =< Int, Int =< 0xfdef