[m-rev.] for review: add some unicode support to Mercury

Ian MacLarty maclarty at csse.unimelb.edu.au
Fri Jul 21 15:32:21 AEST 2006


On Fri, Jul 21, 2006 at 02:29:51PM +1000, Julien Fischer wrote:
> 
> On Fri, 21 Jul 2006, Ian MacLarty wrote:
> 
> >On Wed, Jul 19, 2006 at 04:01:57PM +1000, Peter Moulder wrote:
> >>On Wed, Jul 05, 2006 at 12:24:25AM +1000, Ian MacLarty wrote:
> >>
> >>>+The sequence @samp{\x} introduces
> >>> a hexadecimal escape; it must be followed by a sequence of hexadecimal
> >>> digits and then a closing backslash.  It is replaced
> >>> with the character whose character code is identified by the hexadecimal
> >>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>I suggest changing to `byte whose value', to clarify that e.g. \xa0\ is
> >>replaced by just one byte rather than being equivalent to \u00a0.
> >>
> >>>+    % Return the number of unicode characters in a UTF-8 encoded string.
> >>
> >>I suggest explicitly stating that the result is undefined (or
> >>unspecified or similar) if the given string isn't valid utf-8.  (The
> >>existing documentation gives the false impression that it counts only
> >>valid, complete unicode characters.)
> >>
> >>>+++ tests/hard_coded/unicode.m	4 Jul 2006 10:01:15 -0000
> >>...
> >>>+utf8_strings = [
> >>>+	"\u0003",
> >>>+	"\U00000003",
> >>>+	"\u0394",  % delta
> >>>+	"\u03A0",  % pi
> >>>+	"\uFFFF",
> >>>+	"\U0010FFFF",
> >>>+	"\U000ABCDE",
> >>>+	"r\u00E9sum\u00E9", % "resume" with accents
> >>>+	"abc123"
> >>>+].
> >>
> >>It would be nice to add "\u005cu0041" as an example (0x5c = backslash),
> >>and similarly "\x5c\u0041", "\x5c\\u0041", "\\u0041" and "u0041".
> >>It would be good for some of these examples to use lowercase hex digits.
> >>
> >>Otherwise looks fine to me.
> >>
> >
> >Here's the new diff and CVS log (the interdiff is almost as big as the
> >diff, so I'm just posting the diff).
> >
> >I'll post the new unicode module as a separate change.
> >
> >Estimated hours taken: 6
> >Branches: main
> >
> >Add escape sequences for encoding unicode characters in Mercury string
> 
> A small point: shouldn't it be "Unicode" rather than "unicode" in the
> reference manual and other documentation?
> 

You're right.  I've made the change.

Ian.


