[m-dev.] Unicode support in Mercury

Paul Bone pbone at csse.unimelb.edu.au
Tue May 10 12:02:22 AEST 2011


On Tue, May 10, 2011 at 10:16:28AM +1000, Peter Schachte wrote:
> On 09/05/11 22:11, Matt Giuca wrote:
> > Here he proposes three alternatives: #1 which is to leave the
> > interface as-is and let users deal with surrogate pairs. This is the
> > current position of Mercury (except in UTF-8 backends, Mercury lets
> > users deal with individual bytes rather than surrogate pairs). #2, as
> > I have been suggesting, which is to keep the UTF-16 backend but change
> > the interface to deal in characters/codepoints -- as Guido says this
> > is bad due to performance, and I'm backing down on this proposal now.
> > #3 which is to change the implementation to UTF-32 and have a
> > character/codepoint interface, which Guido calls "the ideal
> > situation".
> 
> What about a fourth option:  make the string index type an abstract type
> (stopping anyone from using +1 on it), and implement next_index to take a
> string and an index, and return a new index, etc.?  Keep the efficient
> implementation, but hide the index type.

I like this suggestion, both for the ease of programming with it and because
each step of an iteration over a string stays constant time.
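
To make the fourth option concrete, here is a rough sketch of the idea. It is
written in Python rather than Mercury, purely as an illustration, and assumes
a UTF-8 backend holding valid UTF-8. Every name in it (StringIndex,
first_index, next_index, chars) is hypothetical and not taken from Mercury's
string module.

```python
# Hypothetical sketch of an opaque string index over UTF-8 bytes.
# Assumption: the input is valid UTF-8; no name here is a real Mercury API.

class StringIndex:
    """Opaque position in a UTF-8 string; deliberately has no arithmetic."""
    __slots__ = ("_offset",)

    def __init__(self, offset):
        self._offset = offset

    def __eq__(self, other):
        return isinstance(other, StringIndex) and self._offset == other._offset

    def __hash__(self):
        return hash(self._offset)


def first_index(s):
    """Index of the first character of the byte string s."""
    return StringIndex(0)


def _width(lead):
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def next_index(s, idx):
    """Return (char, following index), or None at the end of the string.

    Only the lead byte is inspected, so each step is constant time.
    """
    off = idx._offset
    if off >= len(s):
        return None
    end = off + _width(s[off])
    return s[off:end].decode("utf-8"), StringIndex(end)


def chars(s):
    """Collect the characters of s by repeatedly calling next_index."""
    out = []
    step = next_index(s, first_index(s))
    while step is not None:
        ch, idx = step
        out.append(ch)
        step = next_index(s, idx)
    return out
```

Callers only ever obtain an index from first_index or next_index, so adding 1
to one fails with a TypeError instead of landing in the middle of a character.
Python hides the offset by convention only; a Mercury version would presumably
export the index as an abstract type to get the same effect at compile time.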

The tricky part is how to migrate users to this option. That's a hard problem,
and not the sort of problem that I'm much good at solving.
