FW: [mercury-users] Records

Fergus Henderson fjh at cs.mu.OZ.AU
Wed Nov 10 10:45:41 AEDT 1999


On 10-Nov-1999, Richard A. O'Keefe <ok at hermes.otago.ac.nz> wrote:
>  - Java, ANSI C++, and C9x all allow Unicode characters in comments,
>    (String,wchar_t*,wchar_t*) strings, *and in identifiers*, using the
>    same syntax.

In ANSI/ISO C++ and C99 (note x=9 ;-), the `wchar_t' type
*may* be Unicode, but it is not required to be.

It would not be that hard to add `wchar' and `wstring' types
to the Mercury standard library, if there were a big demand for them.
But these types would then be dependent on the implementation
(i.e. typically on the underlying C implementation).

Alternatively it would be possible to change the representation of the
`string' and `char' types in the Mercury implementation so that they
matched C's `wchar_t' type.  However, that would be more work, and it
would also cause some backwards compatibility problems for programs
passing strings across the C interface.


It also would not be that hard to allow C++/C99-style UCN
(universal character name) `\u....' and `\U........' escape sequences
in Mercury identifiers, if there was a big demand for it.  We could
also build tools to convert programs in \u...\U........ syntax to/from
other representations (such as UTF-8).
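
To give a rough idea of what such a conversion tool might look like,
here is a minimal sketch in Python (not Mercury, and not an existing
Mercury tool).  The function names `ucn_to_text' and `text_to_ucn' are
made up for this illustration.  The sketch assumes that every `\u' or
`\U' followed by the right number of hex digits is a UCN; it does not
try to handle escaped backslashes, string-literal context, or the
restrictions that C99 places on which code points a UCN may name.

```python
import re

# Matches the two UCN forms: \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits).
# Assumption: any such sequence is a UCN, wherever it appears in the text.
_UCN = re.compile(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def ucn_to_text(source: str) -> str:
    """Replace every UCN in `source` with the character it names."""
    def decode(match):
        digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(digits, 16))
    return _UCN.sub(decode, source)


def text_to_ucn(source: str) -> str:
    """Replace every non-ASCII character in `source` with a UCN."""
    pieces = []
    for ch in source:
        code = ord(ch)
        if code < 0x80:
            pieces.append(ch)                  # plain ASCII is left alone
        elif code <= 0xFFFF:
            pieces.append('\\u%04X' % code)    # short form, 4 hex digits
        else:
            pieces.append('\\U%08X' % code)    # long form, 8 hex digits
    return ''.join(pieces)
```

Going to or from UTF-8 is then just a matter of encoding or decoding
the result, e.g. `ucn_to_text(r'caf\u00E9').encode('utf-8')' for one
direction and `text_to_ucn(data.decode('utf-8'))' for the other.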

However, I'm not sure that this would be worthwhile.  The introduction
of UCNs in C was controversial.  Paul Eggert, whose opinion I respect
highly, has described UCNs as being of interest to language
lawyers, and a source of make-work for implementors, but unlikely to be
used much in practice.  Here's a quote from one article that he recently
wrote in comp.std.c:

| The problem of writing internationalized code is a large one, and UCNs
| attack only a tiny part of it.  They don't solve the overall problem,
| nor do they pretend to.
| 
| Anybody who regularly writes and deploys applications that _do_ solve
| the overall problem necessarily uses technologies that are more
| powerful than UCNs.  UCNs will displace these technologies only if
| they offer compelling advantages.  But UCNs have no real advantages.
| 
| On the contrary.  UCNs don't work across a wide spectrum of programming
| languages.  They are unreadable without special tools that (by and
| large) don't exist.  And they require Unicode support, perhaps even at
| run-time for practical implementations.  All of these are serious
| technical drawbacks that should be immediately obvious to anyone with
| experience in developing internationalized code.  The UCN-related
| ambiguities and undefined behaviors in C99 are additional red flags.


Anyway, even if we were to add UCN support to Mercury, for the core
language and for the identifiers in the standard library I think we
should stick with just the printable ASCII characters that appear on
standard keyboards.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.


