[m-rev.] for review: add some unicode support to Mercury

Ian MacLarty maclarty at cs.mu.OZ.AU
Thu Jul 6 10:00:21 AEST 2006


On Wed, Jul 05, 2006 at 08:53:14AM +1000, Michael Day wrote:
> 
> > Add the function utf8_length to the string module to determine the number
> > of unicode characters in a string.
> 
> Might it not be more useful to add this to a separate utf8 module, that
> can also include functionality such as utf8_foldl that iterates over
> UNICODE codepoints rather than bytes, without having to clutter up the
> string module?
> 

Perhaps it should be a `unicode' module (so that it could be extended to
support encodings other than UTF-8)?

Maybe something like:

	:- module unicode.

	:- interface.

	:- type unicode_char. % == int?

	:- typeclass unicode_string(String) where [
		func length(String) = int,
		func to_unicode_chars(String) = list(unicode_char),
		func from_unicode_chars(list(unicode_char)) = String,
		pred foldl(...
		func foldl(...
		pred foldr(...
		func foldr(...
		func map(...
		pred write(Stream::in, String::in, io::di, io::uo) is det
			<= output_stream(Stream),
		func encoding_name(String) = string % for error messages maybe?
	].

	:- instance unicode_string(string).  	% Mercury strings are utf-8
						% encoded.

	:- func unicode_char_from_code(int) = unicode_char.

	:- func unicode_char_to_code(unicode_char) = int.

This is a bigger change, so for now I propose to commit only my changes
to the lexer (after review), so that we accept the new escape sequences,
and to remove utf8_length/1 from the diff.
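
As a rough sketch of what a function like utf8_length/1 has to compute
(this is not the code from the diff under review): in UTF-8 every
continuation byte has the bit pattern 10xxxxxx, so for well-formed input
the number of Unicode characters equals the number of bytes that are not
continuation bytes.  The module name `utf8_count', the function names and
the representation of the string as a list of byte values are all
assumptions made for illustration only:

	:- module utf8_count.

	:- interface.

	:- import_module list.

		% count_code_points(Bytes) = N:
		% N is the number of elements of Bytes that are not UTF-8
		% continuation bytes.  Bytes is assumed to hold byte values
		% (0..255) of well-formed UTF-8; no validation is done.
		%
	:- func count_code_points(list(int)) = int.

	:- implementation.

	:- import_module int.

	count_code_points(Bytes) = list.foldl(count_lead_byte, Bytes, 0).

		% Add one to the count unless Byte is a continuation byte,
		% i.e. unless its top two bits are 10.
		%
	:- func count_lead_byte(int, int) = int.

	count_lead_byte(Byte, Count0) =
		( if Byte /\ 0xC0 = 0x80 then
			Count0
		else
			Count0 + 1
		).

For example, the two bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9)
contain one lead byte and one continuation byte, so they count as a
single character, whereas a byte-based length would report two.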

Ian.