[m-dev.] Unicode support in Mercury

Peter Wang novalazy at gmail.com
Mon May 9 21:56:41 AEST 2011


On 2011-05-09, Julien Fischer <juliensf at csse.unimelb.edu.au> wrote:
> 
> On Mon, 9 May 2011, Ian MacLarty wrote:
> 
> >2011/5/9 Matt Giuca <matt.giuca at gmail.com>:
> >>I feel like languages have two choices: either provide an 8-bit clean
> >>string type (e.g., C, Lua, Go, PHP, Ruby), or provide an abstract
> >>Unicode string type where the user doesn't need to be aware of the
> >>representation (e.g., Java, Python).
> >
> >I don't think this is true for Java.  The Java length method returns
> >the number of code units in the string, not the number of code points
> >(for that there is codePointCount).  Mercury's approach seems to me to
> >be the same as Java's, except that Java uses UTF-16, making it less
> >likely for the length to return a different value from codePointCount.
> >See http://download.oracle.com/javase/6/docs/api/java/lang/String.html
> >and http://download.oracle.com/javase/6/docs/api/java/lang/Character.html#unicode.
> 
> Nor apparently in C#,
> <http://msdn.microsoft.com/en-us/library/system.string.length.aspx>

Yes, and Java and C# string index operators work with code units,
so you need to be aware of UTF-16 and surrogates.
The same goes for Python 3 in its default (narrow, UTF-16) build.
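
To make the code unit vs. code point distinction concrete, here is a
small Java sketch. The class name, helper names and sample string are
made up for illustration; only the java.lang.String and
java.lang.Character methods are real API.

```java
public class CodeUnitsDemo {
    // Hypothetical sample: 'a', U+1D54A (outside the BMP, so it is stored
    // as the surrogate pair D835 DD4A in UTF-16), then 'b'.
    public static final String SAMPLE = "a\uD835\uDD4Ab";

    // length() counts UTF-16 code units, not code points.
    public static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount() counts code points over the given code unit range.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(SAMPLE));    // prints 4
        System.out.println(codePoints(SAMPLE));   // prints 3

        // charAt() indexes by code unit, so index 1 is only half a character.
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1)));  // prints true

        // codePointAt() reassembles the pair that starts at that index.
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1)));   // prints 1d54a
    }
}
```

Any code that steps through such a string by index has to check for
surrogates itself, or advance by Character.charCount(codePoint) instead
of by one.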

Peter