[m-rev.] for review: improve unicode support
Peter Wang
novalazy at gmail.com
Mon Apr 4 17:16:03 AEST 2011
On 2011-04-01, Julien Fischer <juliensf at csse.unimelb.edu.au> wrote:
>
> Hi Peter,
>
> Thanks for all your work on this!
>
> On Mon, 28 Mar 2011, Peter Wang wrote:
>
> >Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
> >internal string representation (depending on the backend). User code may be
> >written to those assumptions. Other external encodings can be supported in
> >the future by translating to/from Unicode internally.
> >
> >The `char' type now represents a Unicode code point.
> >
> >NOTE: questions about how to handle unpaired surrogate code points, etc. have
> >been left for later.
>
> ...
>
> >diff --git a/runtime/mercury_string.c b/runtime/mercury_string.c
> >index 0f20219..e7ca64e 100644
> >--- a/runtime/mercury_string.c
> >+++ b/runtime/mercury_string.c
> >+
> >+size_t
> >+MR_utf8_encode(char s_[], int c)
> >+{
> >+ unsigned char *s = (unsigned char *)s_;
> >+ unsigned int uc = c;
>
> Shouldn't the second argument have type MR_Char?
> (And likewise in the function prototype for this.)
Fixed.
> >diff --git a/runtime/mercury_string.h b/runtime/mercury_string.h
> >index b12afbe..0a99f2d 100644
> >--- a/runtime/mercury_string.h
> >+++ b/runtime/mercury_string.h
> >@@ -15,24 +15,21 @@
> >#include <stdarg.h>
> >
> >/*
> >-** Mercury characters are given type `MR_Char', which is a typedef for `char'.
> >-** But BEWARE: when stored in an MR_Integer, the value must be
> >-** first cast to `MR_UnsignedChar'.
> >-** Mercury strings are stored as pointers to '\0'-terminated arrays of MR_Char.
> >+** Mercury characters (Unicode code points) are given type `MR_Char', which is
> >+** a typedef for `int'.
>
> The assumption here being that int is a 32-bit quantity? (That's almost
> certainly true for anything Mercury currently runs on -- it's probably
> worth documenting that assumption here though.)
I changed the typedefs to MR_[u]int_least32_t.
>
> In conclusion, a couple of things:
> - you should add an entry to the NEWS file concerning the addition of
> Unicode support
Done.
> - does tabling of characters work correctly?
Yes.
>
> Otherwise, that looks fine -- please bootcheck it on a number of
> platforms, in a number of grdes before committing though.
Committed. Phew.
Peter
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the reviews
mailing list