[m-dev.] Character type class
Thomas Conway
conway at cs.mu.OZ.AU
Thu Jan 27 09:14:27 AEDT 2000
On Wed, Jan 26, 2000 at 07:10:49PM EST, Michael Day wrote:
>
> > I am sure that a certain New Zealander with initials "R. A. O'K." will
> > want unicode handling, so I'll save time and say that now.
>
> Would it make sense for all character sets to be translatable into
> UNICODE? I understand that not everything maps to it yet? Any UNICODE
> experts like to chime in?
In my Real World (TM) job, I've been dealing with Unicode issues.
There are three relevant standards:
- ISO <some number I can't remember> and Unicode, in which
  each "code" is a 16-bit number. There are some escapes
  to allow more than 65536 distinct codes. So far, Unicode
  has meanings associated with most of the codes, with some
  reserved for application-specific use. Unicode and the ISO
  one both have 8- and 16-bit encodings called UTF-8 and UTF-16.
- ISO <some other number I can't remember>, which represents each
  code with a 32-bit number. It also has escapes to enable
  the representation of more than 2^32 codes. I'm not sure
  what it currently maps and what it doesn't.
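
To make the 8/16/32-bit distinction concrete, here is a rough sketch in
Python (not Mercury) that prints how many bytes a single code takes in each
encoding, and which 16-bit units UTF-16 uses for it. I'm assuming the
"escapes" above correspond to what UTF-16 calls surrogate pairs, and I'm
using Python's "utf-32-be" codec as a stand-in for the 32-bit form. The
helper names encoded_sizes and utf16_units are made up for illustration.

```python
# Illustrative sketch only: compare 8-, 16- and 32-bit encodings of one code.
# Assumption: the "escapes" for codes above 65535 are UTF-16 surrogate pairs.

def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes each encoding uses for a single character."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # big-endian, no byte-order mark
        "utf-32": len(ch.encode("utf-32-be")),  # big-endian, no byte-order mark
    }

def utf16_units(ch: str) -> list:
    """Split a character into its 16-bit UTF-16 code units."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

if __name__ == "__main__":
    # An ASCII letter, a Japanese hiragana letter, and the first code above 65535.
    for ch in ["A", "\u3042", "\U00010000"]:
        units = [hex(u) for u in utf16_units(ch)]
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), units)
```

The last character needs two 16-bit units in UTF-16, which is the escape
mechanism at work; the first two fit in a single unit.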
In practical terms, Unicode supports all the major language groups.
FWIW, Unicode is *not* adequate for one of our clients: they want
to make SGML/XML databases of ancient documents, including the ancient
Japanese script, Sanskrit, etc., *none* of which are supported
by Unicode.
--
Thomas Conway )O+ Every sword has two edges.
Mercurian <conway at cs.mu.oz.au>
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------