[m-rev.] for post-commit review: runtime/mercury_string.[ch]
Peter Wang
novalazy at gmail.com
Tue Apr 2 12:47:51 AEDT 2024
On Sat, 30 Mar 2024 10:00:23 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> @@ -295,6 +298,8 @@ MR_hash_string6(MR_ConstString s)
> MR_bool
> MR_utf8_next(const MR_String s_, MR_Integer *pos)
> {
> + // XXX Several functions have this cast from const MR_String. Why?
> + // With gcc 7.5.0 a least, things work fine without the cast.
> const unsigned char *s = (const unsigned char *) s_;
> int c;
>
MR_String is defined as pointer to possibly-signed char.
We don't want to think about signed chars in these functions
(or ever, really).
The other reason is that the UTF-8 code was adapted from code that I had
written for another project, and the casts were carried over.
> @@ -477,7 +481,21 @@ extern MR_int_least32_t MR_utf8_get(const MR_String s, MR_Integer pos);
> extern MR_int_least32_t MR_utf8_get_mb(const MR_String s, MR_Integer pos,
> int *width);
>
> -// Decode the code point beginning at `pos' in `s', and advance `*pos'.
> +// Decode the code point beginning at `pos' in `s', and set `*pos'
> +// to the position of the next code point, if
> +//
> +// - there *is* a next code point, i.e. if the code point is not NUL, and
> +// - if there is a well formed code point starting at pos.
> +//
> +// If there is a code point at pos and it is NUL, then return it, but
> +// do not update pos.
As it is, MR_utf8_get_next *will* advance pos past a NUL terminator.
I've updated the comment to reflect the current behaviour.
Peter
More information about the reviews
mailing list