[m-rev.] for review: add some unicode support to Mercury
Ian MacLarty
maclarty at cs.mu.OZ.AU
Thu Jul 6 10:00:21 AEST 2006
On Wed, Jul 05, 2006 at 08:53:14AM +1000, Michael Day wrote:
>
> > Add the function utf8_length to the string module to determine the number
> > of unicode characters in a string.
>
> Might it not be more useful to add this to a separate utf8 module, that
> can also include functionality such as utf8_foldl that iterates over
> UNICODE codepoints rather than bytes, without having to clutter up the
> string module?
>
Perhaps it should be a `unicode' module instead, so that it could later be
extended to support encodings other than UTF-8?
Maybe something like:
    :- module unicode.
    :- interface.

    :- type unicode_char.   % == int?

    :- typeclass unicode_string(String) where [
        func length(String) = int,
        func to_unicode_chars(String) = list(unicode_char),
        func from_unicode_chars(list(unicode_char)) = String,
        pred foldl(...
        func foldl(...
        pred foldr(...
        func foldr(...
        func map(...
        pred write(Stream::in, String::in, io::di, io::uo) is det
            <= output_stream(Stream),
        func encoding_name(String) = string   % for error messages maybe?
    ].

    :- instance unicode_string(string).   % Mercury strings are UTF-8 encoded.

    :- func unicode_char_from_code(int) = unicode_char.
    :- func unicode_char_to_code(unicode_char) = int.
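As a rough illustration of what `length` for the `string` instance (or the original utf8_length/1) would compute, here is a minimal sketch. It is not part of the proposal. To avoid depending on how the string module exposes individual bytes, it works on a list of byte values that are assumed to hold well-formed UTF-8. The helper `count_lead_byte/2` and the module name `utf8_sketch` are hypothetical. The idea is that every code point begins with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting those bytes counts code points.

```mercury
:- module utf8_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, list.

    % utf8_length(Bytes) = N:
    % N is the number of code points in Bytes, a list of byte values
    % (0..255) assumed to be well-formed UTF-8. Each code point has
    % exactly one non-continuation byte, so we count those bytes.
    %
:- func utf8_length(list(int)) = int.

utf8_length(Bytes) = list.foldl(count_lead_byte, Bytes, 0).

    % Add one to Count unless Byte is a continuation byte (10xxxxxx).
    %
:- func count_lead_byte(int, int) = int.

count_lead_byte(Byte, Count) =
    ( if (Byte /\ 0xC0) = 0x80 then Count else Count + 1 ).

main(!IO) :-
    % "héllo" in UTF-8: the 'é' (U+00E9) is the two bytes 0xC3 0xA9,
    % so this list has 6 bytes but only 5 code points.
    Bytes = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F],
    io.write_int(utf8_length(Bytes), !IO),
    io.nl(!IO).
```

Decoding, as `to_unicode_chars` or a code-point `foldl` would need, must also combine the payload bits of each lead byte with those of the continuation bytes that follow it. Counting needs only the lead bytes.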
Such a module is a bigger change, so for now I propose to commit (after
review) only my changes to the lexer, so that it accepts the new escape
sequences, and I'll remove utf8_length/1.
Ian.
--------------------------------------------------------------------------
mercury-reviews mailing list
post: mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------