[m-rev.] for review: Add string indexing predicates that indicate a code unit was replaced.

Julien Fischer jfischer at opturion.com
Fri Nov 15 17:21:59 AEDT 2019


Hi Peter,

On Thu, 14 Nov 2019, Peter Wang wrote:

> library/string.m:
>    Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
>    unsafe_prev_index_repl predicates that return an indication if a
>    replacement character was returned because an ill-formed code unit
>    sequence was encountered.
>
>    Add more pragma inlines for indexing predicates.
>
>    Remove may_not_duplicate attribute on the Erlang version of
>    unsafe_prev_index_repl, which would conflict with the pragma inline
>    declaration.

Does the compiler not complain about that?

>    This requires the helper function do_unsafe_prev_index
>    to be exported.
>
> tests/hard_coded/string_append_ooi_ilseq.m:
> tests/hard_coded/string_set_char_ilseq.m:
>    Use index_next_repl in test cases.
>
> NEWS:
>    Announce additions.
> ---
> NEWS                                       |   4 +
> library/string.m                           | 236 ++++++++++++---------
> tests/hard_coded/string_append_ooi_ilseq.m |  20 +-
> tests/hard_coded/string_set_char_ilseq.m   |  18 +-
> 4 files changed, 169 insertions(+), 109 deletions(-)


>
> diff --git a/library/string.m b/library/string.m
> index a9fbfd693..a28c33792 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -262,6 +262,10 @@
> % Reading characters from strings.
> %
> 
> +:- type maybe_replaced
> +    --->    not_replaced
> +    ;       replaced_code_unit(uint8).
> +

Document that type.

Is there a reason you didn't have the predicates below return a single
value?  (i.e. a value that's either a character in the string of the
invalid code unit)

>     % index(String, Index, Char):
>     %
>     % If `Index' is the initial code unit offset of a well-formed code unit
> @@ -324,6 +328,17 @@
>     %
> :- pred index_next(string::in, int::in, int::out, char::uo) is semidet.
> 
> +    % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
> +    %
> +    % Like index_next/4 but `MaybeReplaced' is `replaced_code_unit(CodeUnit)'
> +    % iff `Char' is U+FFFD but `String' does NOT contain an encoding of U+FFFD

I suggest:

     iff `Char' s the Unicode REPLACEMENT CHARACTER (U+FFFD) but ...

(It's probably good to get the actual name of that code point in a least
one spot, so it's absolutely clear what is being referrred to.)

The diff looks fine otherwise.

Julien.


More information about the reviews mailing list