[m-rev.] for review: add is_(leading|trailing)_surrogate/1 to char module

Sebastian Godelet sebastian.godelet+github at gmail.com
Sun Dec 21 13:24:37 AEDT 2014


For review by anyone.

Input validation and character encoding transformations need to
detect surrogate code points and check that they occur in the correct
order, i.e. a leading surrogate immediately followed by a trailing one.

NEWS:
    Announce the addition of is_(leading|trailing)_surrogate/1.

library/char.m:
    Add is_leading_surrogate/1, which succeeds if a character is a
    leading surrogate code point, and is_trailing_surrogate/1, which
    succeeds if a character is a trailing surrogate code point.
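
As a rough illustration of the sequence check mentioned above, here is a
minimal sketch of how the two predicates might be combined once the patch
is applied. The module name surrogate_check and the predicate
well_formed_surrogates/1 are hypothetical and not part of this patch; the
sketch also assumes the input list can actually contain surrogate code
points.

```mercury
:- module surrogate_check.
:- interface.

:- import_module char.
:- import_module list.

    % Hypothetical helper, not part of the patch.
    % Succeeds if every leading surrogate in the list is immediately
    % followed by a trailing surrogate, and no trailing surrogate
    % appears without a preceding leading surrogate.
    %
:- pred well_formed_surrogates(list(char)::in) is semidet.

:- implementation.

well_formed_surrogates([]).
well_formed_surrogates([Char | Chars]) :-
    ( char.is_leading_surrogate(Char) ->
        % A leading surrogate must be followed by a trailing one.
        Chars = [Next | Rest],
        char.is_trailing_surrogate(Next),
        well_formed_surrogates(Rest)
    ;
        % Any other code point is acceptable unless it is a lone
        % trailing surrogate.
        not char.is_trailing_surrogate(Char),
        well_formed_surrogates(Chars)
    ).
```

With is_surrogate/1 alone a caller could only tell that a code point is
some surrogate, not which half of a pair it is, so a check like this
would have to repeat the range comparisons itself.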

Sebastian.

--

diff --git a/NEWS b/NEWS
index 5862a26..4bf175c 100644
--- a/NEWS
+++ b/NEWS
@@ -72,6 +72,7 @@ Changes to the Mercury standard library:
     - decimal_digit_to_int/2, det_decimal_digit_to_int/1
     - hex_digit_to_int/2, det_hex_digit_to_int/1
     - base_digit_to_int/3, det_base_digit_to_int/2
+    - is_leading_surrogate/1, is_trailing_surrogate/1
 
  The following predicates in the char module have been deprecated and will
  either be removed or have their semantics changed in a future release.
diff --git a/library/char.m b/library/char.m
index 007c330..5ef1885 100644
--- a/library/char.m
+++ b/library/char.m
@@ -258,6 +258,18 @@
     %
 :- pred is_surrogate(char::in) is semidet.
 
+    % Succeed if `Char' is a leading Unicode surrogate code point.
+    % A leading surrogate code point is in the inclusive range from
+    % 0xd800 to 0xdbff.
+    %
+:- pred is_leading_surrogate(char::in) is semidet.
+
+    % Succeed if `Char' is a trailing Unicode surrogate code point.
+    % A trailing surrogate code point is in the inclusive range from
+    % 0xdc00 to 0xdfff.
+    %
+:- pred is_trailing_surrogate(char::in) is semidet.
+
     % Succeed if `Char' is a Noncharacter code point.
     % Sixty-six code points are not used to encode characters.
     % These code points should not be used for interchange, but may be used
@@ -991,6 +1003,16 @@ is_surrogate(Char) :-
     Int >= 0xd800,
     Int =< 0xdfff.
 
+is_leading_surrogate(Char) :-
+    Int = char.to_int(Char),
+    Int >= 0xd800,
+    Int =< 0xdbff.
+
+is_trailing_surrogate(Char) :-
+    Int = char.to_int(Char),
+    Int >= 0xdc00,
+    Int =< 0xdfff.
+
 is_noncharacter(Char) :-
     Int = char.to_int(Char),
     ( 0xfdd0 =< Int, Int =< 0xfdef


