[m-rev.] for review: add some unicode support to Mercury
Peter Moulder
Peter.Moulder at infotech.monash.edu.au
Wed Jul 19 16:01:57 AEST 2006
On Wed, Jul 05, 2006 at 12:24:25AM +1000, Ian MacLarty wrote:
> +The sequence @samp{\x} introduces
> a hexadecimal escape; it must be followed by a sequence of hexadecimal
> digits and then a closing backslash. It is replaced
> with the character whose character code is identified by the hexadecimal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I suggest changing this to `byte whose value', to clarify that e.g. \xa0\ is
replaced by just one byte rather than being equivalent to \u00a0.
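To make the distinction concrete, here is a small Python sketch. Python is used only for illustration: it is not Mercury, and its own \x escape syntax differs from the one in the patch. It compares the single byte 0xA0 with the UTF-8 encoding of the code point U+00A0.

```python
# Illustration only: Python bytes objects, not Mercury strings.
single_byte = bytes([0xA0])             # what a byte-valued \xa0\ would yield
code_point = "\u00a0".encode("utf-8")   # UTF-8 encoding of U+00A0

print(single_byte)                        # b'\xa0'
print(code_point)                         # b'\xc2\xa0'
print(len(single_byte), len(code_point))  # 1 2
```

Note that the lone byte 0xA0 is not a valid UTF-8 sequence on its own, which is why the wording matters.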
> + % Return the number of unicode characters in a UTF-8 encoded string.
I suggest explicitly stating that the result is undefined (or
unspecified or similar) if the given string isn't valid UTF-8. (The
existing documentation gives the false impression that it counts only
valid, complete Unicode characters.)
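One way to see why such a caveat is needed: a common way to count code points in UTF-8 is to count the bytes that are not continuation bytes (those not of the form 10xxxxxx). The Python sketch below assumes that technique. Whether the patch counts this way is not stated in this message, and count_code_points is a hypothetical name.

```python
def count_code_points(data: bytes) -> int:
    # Assumed technique, not taken from the Mercury patch:
    # count every byte that is not a UTF-8 continuation byte (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)

print(count_code_points("r\u00e9sum\u00e9".encode("utf-8")))  # 6
print(count_code_points(b"\xa0"))  # 0: lone continuation byte
print(count_code_points(b"\xc3"))  # 1: truncated two-byte sequence
```

With this approach an invalid string still produces a number, but that number does not correspond to a count of valid, complete characters.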
> +++ tests/hard_coded/unicode.m 4 Jul 2006 10:01:15 -0000
...
> +utf8_strings = [
> + "\u0003",
> + "\U00000003",
> + "\u0394", % delta
> + "\u03A0", % pi
> + "\uFFFF",
> + "\U0010FFFF",
> + "\U000ABCDE",
> + "r\u00E9sum\u00E9", % "resume" with accents
> + "abc123"
> +].
It would be nice to add "\u005cu0041" as an example (0x5c = backslash),
and similarly "\x5c\u0041", "\x5c\\u0041", "\\u0041" and "u0041".
It would be good for some of these examples to use lowercase hex digits.
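The point of these cases is that escapes should be processed in a single pass, so a backslash produced by one escape must not start another. The Python sketch below models only the rules quoted above (\\, \x followed by hex digits and a closing backslash, \uXXXX and \UXXXXXXXX). It is not the Mercury lexer, decode_escapes is a hypothetical helper, and it does minimal validation.

```python
def decode_escapes(src: str) -> str:
    # Hypothetical single-pass decoder for illustration only.
    out = []
    i = 0
    while i < len(src):
        if src[i] != "\\":
            out.append(src[i])
            i += 1
            continue
        if i + 1 >= len(src):
            raise ValueError("dangling backslash")
        kind = src[i + 1]
        if kind == "\\":
            out.append("\\")
            i += 2
        elif kind == "x":
            end = src.find("\\", i + 2)  # position of the closing backslash
            if end == -1:
                raise ValueError("unterminated hex escape")
            out.append(chr(int(src[i + 2:end], 16)))
            i = end + 1
        elif kind in ("u", "U"):
            width = 4 if kind == "u" else 8
            digits = src[i + 2:i + 2 + width]
            if len(digits) != width:
                raise ValueError("truncated unicode escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("unsupported escape: " + kind)
    return "".join(out)

# Raw strings keep Python from interpreting the escapes itself.
for literal in [r"\u005cu0041", r"\x5c\u0041", r"\x5c\\u0041",
                r"\\u0041", "u0041"]:
    print(literal, "->", decode_escapes(literal))
```

Under these assumed rules, the first, second and fourth literals decode to a backslash followed by u0041, the third decodes to a backslash followed by A, and the last is unchanged.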
Otherwise looks fine to me.
pjrm.