[m-rev.] for review: Improve definition of string.index, index_next, prev_index.

Mark Brown mark at mercurylang.org
Wed Oct 30 18:35:59 AEDT 2019


This looks fine.

On Wed, Oct 30, 2019 at 5:10 PM Peter Wang <novalazy at gmail.com> wrote:
>
> library/string.m:
>     Fix definition of index/3 and index_next/4 to account for an offset
>     into a non-initial code unit in a well-formed code unit sequence.
>
>     Similarly for prev_index/4.
> ---
>  library/string.m | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index 7f093807e..8c87b545c 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -252,10 +252,9 @@
>      % sequence in `String' then `Char' is the code point encoded by that
>      % sequence.
>      %
> -    % If the code unit in `String' at `Index' is part of an ill-formed sequence
> -    % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
> -    % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
> -    % strings are UTF-16 encoded).
> +    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> +    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> +    % code point at `Index' (when strings are UTF-16 encoded).
>      %
>      % Fails if `Index' is out of range (negative, or greater than or equal to
>      % the length of `String').
> @@ -299,10 +298,10 @@
>      % sequence, and `NextIndex' is the offset immediately following that
>      % sequence.
>      %
> -    % If the code unit in `String' at `Index' is part of an ill-formed sequence
> -    % then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when strings are
> -    % UTF-8 encoded) or the unpaired surrogate code point at `Index' (when
> -    % strings are UTF-16 encoded), and `NextIndex' is Index + 1.
> +    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> +    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> +    % code point at `Index' (when strings are UTF-16 encoded), and `NextIndex'
> +    % is Index + 1.
>      %
>      % Fails if `Index' is out of range (negative, or greater than or equal to
>      % the length of `String').
> @@ -325,11 +324,10 @@
>      % `String' then `Char' is the code point encoded by that sequence, and
>      % `PrevIndex' is the initial code unit offset of that sequence.
>      %
> -    % If the code unit in `String' at `Index - 1' is part of an ill-formed
> -    % sequence then `Char' is either a U+FFFD REPLACEMENT CHARACTER (when
> -    % strings are UTF-8 encoded) or the unpaired surrogate code point at
> -    % `Index - 1' (when strings are UTF-16 encoded), and `PrevIndex' is
> -    % `Index - 1'.
> +    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
> +    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
> +    % code point at `Index - 1' (when strings are UTF-16 encoded), and
> +    % `PrevIndex' is `Index - 1'.
>      %
>      % Fails if `Index' is out of range (non-positive, or greater than the
>      % length of `String').
> --
> 2.23.0
>
> _______________________________________________
> reviews mailing list
> reviews at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/reviews
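
The revised comments in the patch above can be modelled outside Mercury to check the "offset into a non-initial code unit" case. The following is only a rough sketch in Python, assuming UTF-8 encoded strings represented as `bytes`; the function names mirror the Mercury predicates, but the helper `_decode_at`, the `(index, char)` return tuples and the use of `None` for failure are assumptions made for illustration, not the library's implementation.

```python
from typing import Optional, Tuple

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _decode_at(data: bytes, index: int) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (char, length) if a well-formed UTF-8
    sequence starts at `index`, otherwise None."""
    first = data[index]
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        return None  # continuation byte or invalid lead byte
    chunk = data[index:index + length]
    try:
        # Strict decoding rejects truncated, overlong and surrogate sequences.
        return chunk.decode("utf-8"), length
    except UnicodeDecodeError:
        return None


def index_next(data: bytes, index: int) -> Optional[Tuple[int, str]]:
    """Model of index_next/4: returns (next_index, char), or None on failure."""
    if index < 0 or index >= len(data):
        return None  # out of range
    decoded = _decode_at(data, index)
    if decoded is None:
        # In range, but not the start of a well-formed sequence.
        return index + 1, REPLACEMENT
    char, length = decoded
    return index + length, char


def index(data: bytes, offset: int) -> Optional[str]:
    """Model of index/3: the character only, or None on failure."""
    result = index_next(data, offset)
    return None if result is None else result[1]


def prev_index(data: bytes, index: int) -> Optional[Tuple[int, str]]:
    """Model of prev_index/4: returns (prev_index, char), or None on failure."""
    if index <= 0 or index > len(data):
        return None  # out of range
    # Look for a well-formed sequence that ends exactly at `index`.
    for length in range(1, 5):
        start = index - length
        if start < 0:
            break
        decoded = _decode_at(data, start)
        if decoded is not None and decoded[1] == length:
            return start, decoded[0]
    return index - 1, REPLACEMENT


if __name__ == "__main__":
    s = "a\u00e9".encode("utf-8")  # b'a\xc3\xa9'
    for i in range(-1, len(s) + 1):
        print(i, index_next(s, i), prev_index(s, i))
```

In this model, offset 2 of `b'a\xc3\xa9'` points at the continuation byte of a well-formed two-byte sequence, so `index_next` yields U+FFFD and advances by one code unit, which is the case the reworded "Otherwise, if `Index' is in range" clause is meant to cover.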

