[m-users.] Confused by action of string.prefix_length

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Jun 8 18:11:42 AEST 2022


2022-06-08 18:02 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
> Code unit corresponds to "byte" in the UTF-8 encoding, but it is not a
> synonym. It is standard Unicode terminology. Please take some time to
> read about Unicode encodings. Every programmer must know the basics,
> it's part of the landscape just as ASCII and 8-bit code pages were.

And while Mercury uses UTF-8 to represent strings when targeting C,
it uses UTF-16 when targeting C# or Java, since that is what those languages
use. In UTF-16, a code unit is NOT one byte. The documentation of
string operations must work for all of Mercury's target languages.
Talking about bytes in that documentation would prevent that.
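
As a rough illustration of the difference, here is a minimal sketch in
Python rather than Mercury. The helper name code_units is made up for this
example and is not part of any Mercury library. It counts code points,
UTF-8 code units, and UTF-16 code units for the same string.

```python
def code_units(s: str) -> dict:
    """Count code points and code units of s in UTF-8 and UTF-16.

    Hypothetical helper for illustration only; not a Mercury API.
    """
    utf8 = s.encode("utf-8")
    # "utf-16-le" is used so that no byte order mark is prepended.
    utf16 = s.encode("utf-16-le")
    return {
        "code_points": len(s),
        # A UTF-8 code unit is 8 bits, so here units and bytes coincide.
        "utf8_code_units": len(utf8),
        # A UTF-16 code unit is 16 bits, i.e. two bytes.
        "utf16_code_units": len(utf16) // 2,
        "utf16_bytes": len(utf16),
    }
```

For "abc" all three counts are 3, but the UTF-16 form occupies 6 bytes.
For U+00E9 the counts are 1 code point, 2 UTF-8 code units, and 1 UTF-16
code unit. For U+1F600 they are 1 code point, 4 UTF-8 code units, and
2 UTF-16 code units (a surrogate pair). A length stated in code units
holds for either encoding, while a length stated in bytes does not.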

Zoltan.

