[m-rev.] for review: Improve definition of string.index, index_next, prev_index.

Peter Wang novalazy at gmail.com
Wed Oct 30 17:09:33 AEDT 2019


library/string.m:
    Fix the documented definitions of index/3 and index_next/4 to account
    for an offset that points to a non-initial code unit of a well-formed
    code unit sequence.

    Make the same change for prev_index/4.
---
 library/string.m | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)
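
For review, here is a minimal sketch of the UTF-8 behaviour that the revised comments specify. It is written in Python rather than Mercury. The helpers `index_next_utf8`, `index_utf8` and `prev_index_utf8` are hypothetical stand-ins for index_next/4, index/3 and prev_index/4, not library code. Mercury failure is modelled as returning None, and the UTF-16 case is not covered.

The "Otherwise" branch is the one this patch broadens. An in-range offset that points at a non-initial code unit of a well-formed sequence now yields U+FFFD, exactly as an ill-formed sequence does.

```python
from typing import Optional, Tuple

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def index_next_utf8(data: bytes, index: int) -> Optional[Tuple[str, int]]:
    """Hypothetical model of index_next/4 over UTF-8 code units."""
    # Out of range (negative, or >= length): the Mercury predicate fails.
    if index < 0 or index >= len(data):
        return None
    # A well-formed UTF-8 sequence is 1 to 4 bytes long.  Python's strict
    # decoder rejects truncated, overlong, surrogate and out-of-range
    # sequences, so the shortest chunk that decodes to one code point is
    # the well-formed sequence starting at `index`, if there is one.
    for n in range(1, 5):
        chunk = data[index:index + n]
        if len(chunk) < n:
            break
        try:
            text = chunk.decode("utf-8")
        except UnicodeDecodeError:
            continue
        if len(text) == 1:
            return text, index + n
    # Otherwise: part of an ill-formed sequence, or a non-initial code unit
    # of a well-formed one.  Either way, U+FFFD and advance by one.
    return REPLACEMENT, index + 1


def index_utf8(data: bytes, index: int) -> Optional[str]:
    """Hypothetical model of index/3: the character part of index_next/4."""
    result = index_next_utf8(data, index)
    return None if result is None else result[0]


def prev_index_utf8(data: bytes, index: int) -> Optional[Tuple[str, int]]:
    """Hypothetical model of prev_index/4 over UTF-8 code units."""
    # Out of range (non-positive, or > length): the Mercury predicate fails.
    if index <= 0 or index > len(data):
        return None
    # Look for the shortest well-formed sequence ending at `index - 1`.
    for n in range(1, 5):
        start = index - n
        if start < 0:
            break
        try:
            text = data[start:index].decode("utf-8")
        except UnicodeDecodeError:
            continue
        if len(text) == 1:
            return text, start
    # Otherwise: U+FFFD, and the previous offset is `index - 1`.
    return REPLACEMENT, index - 1
```

For example, in "aé€" encoded as UTF-8, offset 2 is the second byte of "é". There, index_next_utf8 yields U+FFFD with next offset 3.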

diff --git a/library/string.m b/library/string.m
index 7f093807e..8c87b545c 100644
--- a/library/string.m
+++ b/library/string.m
@@ -252,10 +252,9 @@
     % sequence in `String' then `Char' is the code point encoded by that
     % sequence.
     %
-    % If the code unit in `String' at `Index' is part of an ill-formed sequence
-    % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
-    % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
-    % strings are UTF-16 encoded).
+    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
+    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
+    % code point at `Index' (when strings are UTF-16 encoded).
     %
     % Fails if `Index' is out of range (negative, or greater than or equal to
     % the length of `String').
@@ -299,10 +298,10 @@
     % sequence, and `NextIndex' is the offset immediately following that
     % sequence.
     %
-    % If the code unit in `String' at `Index' is part of an ill-formed sequence
-    % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
-    % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
-    % strings are UTF-16 encoded), and `NextIndex' is Index + 1.
+    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
+    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
+    % code point at `Index' (when strings are UTF-16 encoded), and `NextIndex'
+    % is Index + 1.
     %
     % Fails if `Index' is out of range (negative, or greater than or equal to
     % the length of `String').
@@ -325,11 +324,10 @@
     % `String' then `Char' is the code point encoded by that sequence, and
     % `PrevIndex' is the initial code unit offset of that sequence.
     %
-    % If the code unit in `String' at `Index - 1' is part of an ill-formed
-    % sequence then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when
-    % strings are UTF-8 encoded) or the unpaired surrogate code point at
-    % `Index - 1' (when strings are UTF-16 encoded), and `PrevIndex' is
-    % `Index - 1'.
+    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
+    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
+    % code point at `Index - 1' (when strings are UTF-16 encoded), and
+    % `PrevIndex' is `Index - 1'.
     %
     % Fails if `Index' is out of range (non-positive, or greater than the
     % length of `String').
-- 
2.23.0
