[m-rev.] for review: Document that string.to_utf8_code_unit_list throws exceptions.

Peter Wang novalazy at gmail.com
Mon Nov 4 16:51:40 AEDT 2019


library/string.m:
    Document that string.to_utf8_code_unit_list throws an exception
    if the string contains an unpaired surrogate code point.
---
 library/string.m | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
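
Note for reviewers: the rule being documented is that UTF-8 has no
encoding for code points in the range U+D800..U+DFFF. The sketch below
illustrates that rule in Python rather than Mercury, so it is an analogy
and not the library code. The function name is borrowed from the
predicate purely for readability, and raising ValueError stands in for
the Mercury exception.

```python
def to_utf8_code_unit_list(s):
    """Return the UTF-8 code units of s as a list of ints.

    Raises ValueError if s contains a surrogate code point, since
    surrogates (U+D800..U+DFFF) cannot be encoded in UTF-8.
    """
    units = []
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Analogous to the "surrogate code point" error in encode_utf8.
            raise ValueError("surrogate code point U+%04X" % cp)
        units.extend(ch.encode("utf-8"))
    return units


if __name__ == "__main__":
    print(to_utf8_code_unit_list("a\u00e9"))  # [97, 195, 169]
    try:
        to_utf8_code_unit_list("a\ud800b")    # lone leading surrogate
    except ValueError as err:
        print("error:", err)
```

A well-formed string converts normally. A string holding a lone
surrogate is rejected instead of producing an ill-formed byte sequence.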

diff --git a/library/string.m b/library/string.m
index 9adabf662..657494f5f 100644
--- a/library/string.m
+++ b/library/string.m
@@ -210,6 +210,8 @@
 :- pred to_code_unit_list(string::in, list(int)::out) is det.
 
     % Convert a string into a list of UTF-8 code units.
+    % Throws an exception if the string contains an unpaired surrogate code
+    % point, as the encoding of surrogate code points is prohibited in UTF-8.
     %
 :- pred to_utf8_code_unit_list(string::in, list(int)::out) is det.
 
@@ -1973,10 +1975,6 @@ to_code_unit_list_loop(String, Index, End, List) :-
 
 %---------------------%
 
-% XXX ILSEQ Behaviour differs according to target language.
-%   - java: throws exception on unpaired surrogate (correct as written)
-%   - csharp: infinite loop on string containing unpaired surrogate
-
 to_utf8_code_unit_list(String, CodeList) :-
     ( if internal_encoding_is_utf8 then
         to_code_unit_list(String, CodeList)
@@ -1990,7 +1988,7 @@ encode_utf8(Char, CodeList0, CodeList) :-
     ( if char.to_utf8(Char, CharCodes) then
         CodeList = CharCodes ++ CodeList0
     else
-        unexpected($pred, "char.to_utf8 failed")
+        unexpected($pred, "surrogate code point")
     ).
 
 %---------------------%
-- 
2.23.0


