[m-dev.] Unicode support in Mercury

Mark Brown mark at csse.unimelb.edu.au
Wed May 11 23:57:34 AEST 2011


On 11-May-2011, Ben Schmidt <schmidtb at student.unimelb.edu.au> wrote:
>> FWIW, I'm interpreting the topic as being a choice between two styles
>> of interface, array-like and stream-like.
>>      Array-like: you access any individual char by supplying an integer
>>      index from a contiguous range.
>>      Stream-like: you can access the first character or the next
>>      character, but no others.
> As a casual observer, it seems to me it's about consistency.
> At the moment you index by byte, but a character is returned (and
> multiple indices return the same character or some fail, because some
> bytes are shared by the multi-byte representation of a single
> character). That is a bit crazy.
> You should index by byte and get a byte, or index by character and get a
> character.

The index is a bona fide character index.  This needs to be supported
with an interface and documented better, as per Peter Wang's earlier

Users should ignore the fact that it is currently implemented as an
int.  That's just to minimise backwards incompatibility.

> But there shouldn't be a situation where the indexing
> and the return type are different; you should be able to always increase
> the index by 1 to get the next item until you run out. Nobody thinks of
> a string as a sparse array.

Why think of a string as an array at all, sparse or dense?  I think array
is the wrong concept, since it depends on fixed-size elements and in
general the encodings of code points are not fixed size.
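
To make the stream-like view concrete, here is a rough sketch in Python
rather than Mercury.  It is not the Mercury string module's interface;
the names decode_at and code_points are made up for illustration, and
the string is assumed to be stored as UTF-8 bytes.  The offset is
treated as an opaque position: the only valid ones are 0 and whatever
the previous step handed back.

```python
from typing import Iterator, Optional, Tuple


def decode_at(buf: bytes, offset: int) -> Optional[Tuple[int, int]]:
    """Decode the UTF-8 sequence starting at `offset` (hypothetical helper).

    Returns (code_point, next_offset), or None if `offset` is out of range
    or does not point at the start of a well-formed sequence.
    """
    if offset < 0 or offset >= len(buf):
        return None
    lead = buf[offset]
    # The lead byte determines how many bytes the sequence occupies.
    if lead < 0x80:
        length = 1
    elif 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        return None  # continuation byte or invalid lead byte
    try:
        text = buf[offset:offset + length].decode("utf-8")
    except UnicodeDecodeError:
        return None  # truncated or otherwise ill-formed sequence
    return ord(text), offset + length


def code_points(buf: bytes) -> Iterator[int]:
    """Stream-like traversal: start at the first code point, then keep
    asking for the next one until decoding stops."""
    offset = 0
    step = decode_at(buf, offset)
    while step is not None:
        code_point, offset = step
        yield code_point
        step = decode_at(buf, offset)


if __name__ == "__main__":
    s = "a\u00e9\u20ac".encode("utf-8")  # encoded as 1, 2 and 3 bytes
    print([hex(cp) for cp in code_points(s)])
    print(decode_at(s, 2))  # offset 2 is in the middle of a sequence
```

In this sketch an offset that lands in the middle of a multi-byte
sequence simply fails, and a caller never computes "index + 1" itself;
it only passes along the next_offset it was given.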


