[mercury-users] Re: [m-dev.] Character type class

Richard A. O'Keefe ok at hermes.otago.ac.nz
Tue Feb 1 10:05:15 AEDT 2000


	> In practical terms, Unicode supports all the major language groups.
	> FWIW, Unicode is *not* adequate for one of our clients - they want
	> to make sgml/xml databases of ancient documents including the ancient
	> Japanese script, Sanskrit, etc, etc, *none* of which are supported
	> by Unicode.
	
Unicode is not the international standard; ISO 10646 is.
Unicode is just the "Basic Multilingual Plane" (BMP) of ISO 10646.
[This isn't quite true; that's not what the Unicode Consortium
 means by "Unicode", but it's what most programmers mean by it.]

There's a reason why sizeof (wchar_t) is 4 under Solaris,
and that is that ISO 10646 is a 31-bit character set, not a 16-bit one.
XML and HTML 4 are defined in terms of ISO 10646, not just the BMP.
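
For anyone who wants to check what their own C compiler does, here is a
minimal sketch.  The helper names wchar_bits and fits_in_wchar are made up
for this illustration; only sizeof, CHAR_BIT and WCHAR_MAX come from
standard C.  The result is platform-dependent, so no particular output is
claimed.

```c
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Number of bits in this platform's wchar_t (32 where sizeof is 4). */
static int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: 1 if the code value can be stored in a wchar_t
   without loss on this platform, else 0. */
static int fits_in_wchar(uint32_t code)
{
    return (unsigned long)code <= (unsigned long)WCHAR_MAX;
}
```

On a system with a 4-byte wchar_t, fits_in_wchar(0x7FFFFFFF) should hold,
which is the whole 31-bit range; with a 16-bit wchar_t only the BMP fits.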

More precisely, Unicode contains 2048 so-called "surrogate characters",
pairs of which are used to represent an additional million ISO 10646
characters (technically, 1024 x 1024 = 1 048 576 code points, of which
917 504 lie outside the two planes set aside for private use).  Unicode
Version 3.0 (the current version; there've been about 10 revisions so
far) contains 57709 defined characters, so there is clearly a need for
some such escape mechanism.
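
The pairing arithmetic is small enough to sketch in C.  This assumes the
usual UTF-16 layout: 1024 high surrogates at 0xD800..0xDBFF, 1024 low
surrogates at 0xDC00..0xDFFF, and pairs mapping onto 0x10000..0x10FFFF.
The function names are invented for this example and are not taken from
any library.

```c
#include <stdint.h>

/* Combine a high and a low surrogate into one code point.
   Returns -1 if either half is outside its surrogate range. */
static int32_t decode_surrogate_pair(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return -1;
    return 0x10000 + ((int32_t)(hi - 0xD800) << 10) + (int32_t)(lo - 0xDC00);
}

/* Split a code point in 0x10000..0x10FFFF into a surrogate pair.
   Returns 0 on success, -1 if the code point needs no pair or is too big. */
static int encode_surrogate_pair(int32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                          /* now a 20-bit offset */
    *hi = (uint16_t)(0xD800 + (cp >> 10));  /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* bottom 10 bits */
    return 0;
}
```

Each half carries 10 bits, which is where the 1024 x 1024 combinations
come from.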

By the time you have full support for the surrogate characters, you
might as well support ISO 10646.

On the other hand, I note that there are 6400 "private use"
characters reserved in Unicode.  Sanskrit, like other Indic scripts,
doesn't contain an outrageous number of characters.  There'd be room for
Sanskrit, ancient Egyptian, and several other scripts in the private use
area, no?

If not, then the Unicode site (www.unicode.org) is chock full of proposals
for one script or another, so it might be advisable to use those proposals.

Unicode 2.0 had only about thirty-nine thousand characters, against the
57709 in Version 3.0, so it's clear that the 16-bit limit is close to
being breached *in practice*.  Mercury might as well imitate UNIX C and
go all the way to 31 bits.