[m-rev.] for review: improve unicode support

Julien Fischer juliensf at csse.unimelb.edu.au
Wed Apr 6 02:55:03 AEST 2011


On Mon, 4 Apr 2011, Peter Wang wrote:

> On 2011-04-01, Julien Fischer <juliensf at csse.unimelb.edu.au> wrote:
>>
>>> Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
>>> internal string representation (depending on the backend).  User code may be
>>> written to those assumptions.  Other external encodings can be supported in
>>> the future by translating to/from Unicode internally.
>>>
>>> The `char' type now represents a Unicode code point.
>>>
>>> NOTE: questions about how to handle unpaired surrogate code points, etc. have
>>> been left for later.
>>
>> ...
>>> diff --git a/runtime/mercury_string.h b/runtime/mercury_string.h
>>> index b12afbe..0a99f2d 100644
>>> --- a/runtime/mercury_string.h
>>> +++ b/runtime/mercury_string.h
>>> @@ -15,24 +15,21 @@
>>> #include <stdarg.h>
>>>
>>> /*
>>> -** Mercury characters are given type `MR_Char', which is a typedef for `char'.
>>> -** But BEWARE: when stored in an MR_Integer, the value must be
>>> -** first cast to `MR_UnsignedChar'.
>>> -** Mercury strings are stored as pointers to '\0'-terminated arrays of MR_Char.
>>> +** Mercury characters (Unicode code points) are given type `MR_Char', which is
>>> +** a typedef for `int'.
>>
>> The assumption here being that int is a 32-bit quantity?  (That's almost
>> certainly true for anything Mercury currently runs on -- it's probably
>> worth documenting that assumption here though.)
>
> I changed the typedefs to MR_[u]int_least32_t.

For C99 compilers we could just define it as [u]int32_t.  (Apparently,
Microsoft have finally decided to provide stdint.h as of Visual Studio
2010.)  It's not so important for now though.
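
To illustrate the difference, here is a minimal sketch. It is not the actual
runtime code: the Example_ names and the EXAMPLE_USE_EXACT_WIDTH switch are
made up for this message. The least-width types are required by C99, whereas
int32_t is optional, so the sketch only uses the exact-width type when
INT32_MAX shows that it exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
** Hypothetical stand-ins for MR_Char and its unsigned counterpart.
** EXAMPLE_USE_EXACT_WIDTH is an assumed configuration macro, not a real one.
*/
#if defined(EXAMPLE_USE_EXACT_WIDTH) && defined(INT32_MAX)
  /* Exact-width types: optional in C99, present on most platforms. */
  typedef int32_t         Example_Char;
  typedef uint32_t        Example_UnsignedChar;
#else
  /* Least-width types: always available, at least 32 bits wide. */
  typedef int_least32_t   Example_Char;
  typedef uint_least32_t  Example_UnsignedChar;
#endif

/* The largest Unicode code point, U+10FFFF. */
#define EXAMPLE_MAX_CODE_POINT 0x10FFFF

/* Return non-zero if c lies in the Unicode code point range. */
static int
example_is_code_point(Example_Char c)
{
    return c >= 0 && c <= EXAMPLE_MAX_CODE_POINT;
}

/* Return the width of Example_Char in bits. */
static int
example_char_bits(void)
{
    return (int) (sizeof(Example_Char) * CHAR_BIT);
}
```

Either way the type is wide enough to hold any code point; the exact-width
variant would only add the guarantee that it is no wider than 32 bits.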

>> Otherwise, that looks fine -- please bootcheck it on a number of
>> platforms, in a number of grades before committing though.
>
> Committed.  Phew.

Actually one small further request: could you please add a list
of what still needs to be implemented and / or is on the wishlist
for Unicode support to the TODO file.

Julien.