[m-rev.] for review: Add string indexing predicates that indicate a code unit was replaced.
Julien Fischer
jfischer at opturion.com
Fri Nov 15 17:21:59 AEDT 2019
Hi Peter,
On Thu, 14 Nov 2019, Peter Wang wrote:
> library/string.m:
> Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
> unsafe_prev_index_repl predicates that return an indication if a
> replacement character was returned because an ill-formed code unit
> sequence was encountered.
>
> Add more pragma inlines for indexing predicates.
>
> Remove may_not_duplicate attribute on the Erlang version of
> unsafe_prev_index_repl, which would conflict with the pragma inline
> declaration.
Does the compiler not complain about that?
> This requires the helper function do_unsafe_prev_index
> to be exported.
>
> tests/hard_coded/string_append_ooi_ilseq.m:
> tests/hard_coded/string_set_char_ilseq.m:
> Use index_next_repl in test cases.
>
> NEWS:
> Announce additions.
> ---
> NEWS | 4 +
> library/string.m | 236 ++++++++++++---------
> tests/hard_coded/string_append_ooi_ilseq.m | 20 +-
> tests/hard_coded/string_set_char_ilseq.m | 18 +-
> 4 files changed, 169 insertions(+), 109 deletions(-)
>
> diff --git a/library/string.m b/library/string.m
> index a9fbfd693..a28c33792 100644
> --- a/library/string.m
> +++ b/library/string.m
> @@ -262,6 +262,10 @@
> % Reading characters from strings.
> %
>
> +:- type maybe_replaced
> + ---> not_replaced
> + ; replaced_code_unit(uint8).
> +
Document that type.
Is there a reason you didn't have the predicates below return a single
value? (i.e. a value that's either a character in the string of the
invalid code unit)
> % index(String, Index, Char):
> %
> % If `Index' is the initial code unit offset of a well-formed code unit
> @@ -324,6 +328,17 @@
> %
> :- pred index_next(string::in, int::in, int::out, char::uo) is semidet.
>
> + % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
> + %
> + % Like index_next/4 but `MaybeReplaced' is `replaced_code_unit(CodeUnit)'
> + % iff `Char' is U+FFFD but `String' does NOT contain an encoding of U+FFFD
I suggest:
iff `Char' s the Unicode REPLACEMENT CHARACTER (U+FFFD) but ...
(It's probably good to get the actual name of that code point in a least
one spot, so it's absolutely clear what is being referrred to.)
The diff looks fine otherwise.
Julien.
More information about the reviews
mailing list