[m-rev.] for review: add some unicode support to Mercury

Peter Moulder Peter.Moulder at infotech.monash.edu.au
Wed Jul 19 16:01:57 AEST 2006


On Wed, Jul 05, 2006 at 12:24:25AM +1000, Ian MacLarty wrote:

> +The sequence @samp{\x} introduces
>  a hexadecimal escape; it must be followed by a sequence of hexadecimal
>  digits and then a closing backslash.  It is replaced
>  with the character whose character code is identified by the hexadecimal
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I suggest changing to `byte whose value', to clarify that e.g. \xa0\ is
replaced by just one byte rather than being equivalent to \u00a0.
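
To illustrate the distinction outside Mercury, here is a minimal Python
sketch. It assumes the byte-oriented reading suggested above (the patch
itself may behave differently); the variable names are mine.

```python
# Byte-oriented reading of \xa0\ : a single byte with value 0xA0.
hex_escape = bytes([0xA0])

# \u00a0 names the code point U+00A0, which UTF-8 encodes as two bytes.
unicode_escape = "\u00a0".encode("utf-8")

print(hex_escape.hex())      # a0
print(unicode_escape.hex())  # c2a0
```

Note that a lone 0xA0 byte is not valid UTF-8 on its own, which is one
reason the documentation should say which of the two is meant.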

> +    % Return the number of unicode characters in a UTF-8 encoded string.

I suggest explicitly stating that the result is undefined (or
unspecified or similar) if the given string isn't valid UTF-8.  (The
existing documentation gives the false impression that it counts only
valid, complete Unicode characters.)
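
As a sketch of why this matters: one common way to count characters in
UTF-8 is to count the bytes that are not continuation bytes.  I have not
checked that the patch does it this way, so treat the Python below (with
the hypothetical name count_code_points) as an assumption.  It shows
that such a counter still returns a number for malformed input.

```python
def count_code_points(data: bytes) -> int:
    """Count bytes that are not UTF-8 continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

valid = "r\u00e9sum\u00e9".encode("utf-8")  # 8 bytes, 6 code points
truncated = b"\xce"      # lead byte of a 2-byte sequence, continuation missing
stray = b"\x94\x94"      # continuation bytes with no lead byte

print(count_code_points(valid))      # 6
print(count_code_points(truncated))  # 1
print(count_code_points(stray))      # 0
```

Neither of the last two inputs contains a complete character, yet the
counts differ, so the result for invalid input is best documented as
unspecified.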

> +++ tests/hard_coded/unicode.m	4 Jul 2006 10:01:15 -0000
...
> +utf8_strings = [
> +	"\u0003",
> +	"\U00000003",
> +	"\u0394",  % delta
> +	"\u03A0",  % pi
> +	"\uFFFF",
> +	"\U0010FFFF",
> +	"\U000ABCDE",
> +	"r\u00E9sum\u00E9", % "resume" with accents
> +	"abc123"
> +].

It would be nice to add "\u005cu0041" as an example (0x5c = backslash),
and similarly "\x5c\u0041", "\x5c\\u0041", "\\u0041" and "u0041".
It would be good for some of these examples to use lowercase hex digits.
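
To make the intent of these cases concrete, here is a rough Python model
of the escape rules.  It is only a sketch under two assumptions that the
patch does not confirm: escapes are processed in a single left-to-right
pass (so the output of one escape is never rescanned), and \x yields a
single byte as suggested earlier in this review.  decode_escapes is a
hypothetical helper, not part of the Mercury lexer, and it handles only
the \\, \x, \u and \U escapes.

```python
HEX = "0123456789abcdefABCDEF"

def decode_escapes(src: str) -> bytes:
    """Decode the body of a string literal in one left-to-right pass."""
    out = bytearray()
    i = 0
    while i < len(src):
        ch = src[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        kind = src[i + 1]
        if kind == "\\":
            out.append(0x5C)
            i += 2
        elif kind == "x":
            # Hex digits, then a mandatory closing backslash.
            j = i + 2
            while src[j] in HEX:
                j += 1
            if j == i + 2 or src[j] != "\\":
                raise ValueError("malformed \\x escape")
            out.append(int(src[i + 2:j], 16))  # assumed: exactly one byte
            i = j + 1
        elif kind in "uU":
            width = 4 if kind == "u" else 8
            digits = src[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in HEX for d in digits):
                raise ValueError("malformed unicode escape")
            out += chr(int(digits, 16)).encode("utf-8")
            i += 2 + width
        else:
            raise ValueError("escape not handled by this sketch")
    return bytes(out)

# Raw strings stand in for the text between the quotes of a literal.
print(decode_escapes(r"\u005cu0041"))   # b'\\u0041'
print(decode_escapes(r"\x5c\u0041"))    # b'\\u0041'
print(decode_escapes(r"\x5c\\u0041"))   # b'\\A'
print(decode_escapes(r"\\u0041"))       # b'\\u0041'
print(decode_escapes(r"u0041"))         # b'u0041'
```

Under these assumptions only the third case produces an "A"; an
implementation that rescans its own output, or that forgets the closing
backslash of \x, would give different answers, which is what the extra
test strings are meant to catch.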

Otherwise looks fine to me.

pjrm.