[m-dev.] Character type class

Thomas Conway conway at cs.mu.OZ.AU
Thu Jan 27 09:14:27 AEDT 2000


On Wed, Jan 26, 2000 at 07:10:49PM EST, Michael Day wrote:
> 
> > I am sure that a certain New Zealander with initials "R. A. O'K." will
> > want Unicode handling, so I'll save time and say that now.
> 
> Would it make sense for all character sets to be translatable into
> Unicode? I understand that not everything maps to it yet. Would any
> Unicode experts like to chime in?

In my Real World (TM) job, I've been dealing with Unicode issues.
There are three relevant standards:
	- ISO <some number I can't remember> and Unicode, in which
	  each "code" is a 16-bit number. There are some escapes
	  to allow more than 65536 distinct codes. So far, Unicode
	  has assigned meanings to most of the codes, with some
	  reserved for application-specific use. Unicode and the ISO
	  standard both define 8-bit and 16-bit encodings, called
	  UTF-8 and UTF-16.
	- ISO <some other number I can't remember>, which represents
	  each code with a 32-bit number. It also has escapes to enable
	  the representation of more than 2^32 codes. I'm not sure
	  what it currently maps and what it doesn't.
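In UTF-16, the "escapes" mentioned in the first item are surrogate pairs: a code above 0xFFFF is split across two reserved 16-bit values (a high surrogate in 0xD800..0xDBFF and a low surrogate in 0xDC00..0xDFFF). The sketch below is an illustration added for clarity, not code from this thread or from Mercury; the function names are hypothetical, the code point used is just an arbitrary example above U+FFFF, and Python's built-in "utf-16-be" codec is used only as a cross-check.

```python
def utf16_encode(cp):
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    # Surrogate values themselves and anything past 0x10FFFF are not encodable.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x10000:
        # Fits in one 16-bit unit: no escape needed.
        return [cp]
    # Above 0xFFFF: subtract 0x10000 to get a 20-bit value, then carry its
    # top 10 bits in a high surrogate and its bottom 10 bits in a low one.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return [high, low]


def utf16_decode_pair(high, low):
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600  # arbitrary example code point above 0xFFFF
    units = utf16_encode(cp)
    print([hex(u) for u in units])
    # Cross-check against Python's own big-endian UTF-16 encoder.
    as_bytes = b"".join(u.to_bytes(2, "big") for u in units)
    print(as_bytes == chr(cp).encode("utf-16-be"))
    print(hex(utf16_decode_pair(*units)))
```

Because each surrogate pair carries 20 bits on top of the 0x10000 offset, this escape mechanism reaches at most 0x10FFFF, which is why UTF-16 covers 1,114,112 code positions rather than an unbounded range.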

In practical terms, Unicode supports all the major language groups.
FWIW, Unicode is *not* adequate for one of our clients: they want
to build SGML/XML databases of ancient documents, including ancient
Japanese script, Sanskrit, etc., *none* of which is supported
by Unicode.

-- 
 Thomas Conway )O+     Every sword has two edges.
     Mercurian            <conway at cs.mu.oz.au>
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------


