[m-rev.] for review: Improve definition of string.index, index_next, prev_index.
Mark Brown
mark at mercurylang.org
Wed Oct 30 18:35:59 AEDT 2019
This looks fine.
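
For anyone following the thread, here is a rough sketch of the behaviour the
revised comments describe for index_next/4 when strings are UTF-8 encoded.
It is a Python model, not the Mercury library code: the function name
`index_next`, the helper `_sequence_length`, and the tuple/None return
convention are assumptions made for illustration only.

```python
# Hypothetical model of the documented index_next/4 behaviour for UTF-8
# strings. This is NOT the Mercury implementation; names and the return
# convention are assumptions for illustration.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _sequence_length(lead):
    """Length implied by a UTF-8 lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


def index_next(data, index):
    """Return (next_index, char), or None where the Mercury predicate would fail."""
    # Fails if Index is out of range.
    if index < 0 or index >= len(data):
        return None
    n = _sequence_length(data[index])
    if n > 0:
        try:
            # Strict decoding rejects truncated, overlong and surrogate forms.
            char = data[index:index + n].decode("utf-8")
        except UnicodeDecodeError:
            char = None
        if char is not None and len(char) == 1:
            # Index is the initial code unit of a well-formed sequence.
            return index + n, char
    # Otherwise, Index is in range: this covers ill-formed sequences and
    # offsets into a non-initial code unit of a well-formed sequence.
    return index + 1, REPLACEMENT
```

With the three code units of "aé" (0x61 0xC3 0xA9), offset 2 points at the
second code unit of a well-formed sequence. That is the case the old wording
("part of an ill-formed sequence") did not cover; under the new wording the
model yields U+FFFD and Index + 1.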
On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
> Fix definition of index/3 and index_next/4 to account for an offset
> into a non-initial code unit in a well-formed code unit sequence.
>
> Similarly for prev_index/4.
> ---
> library/string.m | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 7f093807e..8c87b545c 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -252,10 +252,9 @@
> % sequence in `String' then `Char' is the code point encoded by that
> % sequence.
> %
> - % If the code unit in `String' at `Index' is part of an ill-formed sequence
> - % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
> - % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
> - % strings are UTF-16 encoded).
> + % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> + % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> + % code point at `Index' (when strings are UTF-16 encoded).
> %
> % Fails if `Index' is out of range (negative, or greater than or equal to
> % the length of `String').
> @@ -299,10 +298,10 @@
> % sequence, and `NextIndex' is the offset immediately following that
> % sequence.
> %
> - % If the code unit in `String' at `Index' is part of an ill-formed sequence
> - % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
> - % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
> - % strings are UTF-16 encoded), and `NextIndex' is Index + 1.
> + % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> + % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> + % code point at `Index' (when strings are UTF-16 encoded), and `NextIndex'
> + % is Index + 1.
> %
> % Fails if `Index' is out of range (negative, or greater than or equal to
> % the length of `String').
> @@ -325,11 +324,10 @@
> % `String' then `Char' is the code point encoded by that sequence, and
> % `PrevIndex' is the initial code unit offset of that sequence.
> %
> - % If the code unit in `String' at `Index - 1' is part of an ill-formed
> - % sequence then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when
> - % strings are UTF-8 encoded) or the unpaired surrogate code point at
> - % `Index - 1' (when strings are UTF-16 encoded), and `PrevIndex' is
> - % `Index - 1'.
> + % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> + % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> + % code point at `Index - 1' (when strings are UTF-16 encoded), and
> + % `PrevIndex' is `Index - 1'.
> %
> % Fails if `Index' is out of range (non-positive, or greater than the
> % length of `String').
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews