[m-rev.] for post-commit review: runtime/mercury_string.[ch]

Peter Wang novalazy at gmail.com
Tue Apr 2 12:47:51 AEDT 2024


On Sat, 30 Mar 2024 10:00:23 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> @@ -295,6 +298,8 @@ MR_hash_string6(MR_ConstString s)
>  MR_bool
>  MR_utf8_next(const MR_String s_, MR_Integer *pos)
>  {
> +    // XXX Several functions have this cast from const MR_String. Why?
> +    // With gcc 7.5.0 a least, things work fine without the cast.
>      const unsigned char *s = (const unsigned char *) s_;
>      int c;
>  

MR_String is defined as pointer to possibly-signed char.
We don't want to think about signed chars in these functions
(or ever, really).

The other reason is that the UTF-8 code was adapted from code that I had
written for another project, and the casts were carried over.

> @@ -477,7 +481,21 @@ extern MR_int_least32_t MR_utf8_get(const MR_String s, MR_Integer pos);
>  extern MR_int_least32_t MR_utf8_get_mb(const MR_String s, MR_Integer pos,
>                              int *width);
>  
> -// Decode the code point beginning at `pos' in `s', and advance `*pos'.
> +// Decode the code point beginning at `pos' in `s', and set `*pos'
> +// to the position of the next code point, if
> +//
> +// - there *is* a next code point, i.e. if the code point is not NUL, and
> +// - if there is a well formed code point starting at pos.
> +//
> +// If there is a code point at pos and it is NUL, then return it, but
> +// do not update pos.

As it is, MR_utf8_get_next *will* advance pos past a NUL terminator.
I've updated the comment to reflect the current behaviour.

Peter


More information about the reviews mailing list