[m-rev.] [slightly OT] charset characteristics (EBCDIC etc) (was Re: string__format

Peter Moulder pmoulder at csse.monash.edu.au
Sat Nov 23 01:15:36 AEDT 2002


On Fri, Nov 22, 2002 at 10:58:48PM +1100, Fergus Henderson wrote:
> On 22-Nov-2002, Peter Moulder <pmoulder at csse.monash.edu.au> wrote:
> > If writing in C, the obvious code would be something like
> > 
> >   (val < 10
> >    ? '0' + val
> >    : 'a' - 10 + val)
> 
> This assumes that the numeric codes for 'a'-'f' are contiguous,
> which is not guaranteed by the C standard, I'm pretty sure.

(FWIW, I too am almost certain it isn't guaranteed by the C standard.)

> I think it might fail for EBCDIC systems?

It's true that a-z is non-contiguous in EBCDIC, though the letters are
band-contiguous: 0x81-89, 0x91-99, 0xa2-a9, and similarly 0xc1-c9 etc. for
A-Z.  (Does anyone know why?)  0-9 are 0xf0-f9.
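
To make the band layout concrete, here is a minimal sketch in C that
computes the EBCDIC code of the i-th lowercase letter from the three
ranges given above and checks whether a run of letters is contiguous.
The function names ebcdic_lower and ebcdic_run_is_contiguous are made up
for illustration, and the table covers only the lowercase codes quoted in
this message, not any particular EBCDIC variant.

```c
#include <stdbool.h>

/* EBCDIC code of the i-th lowercase letter (0 = 'a', 25 = 'z'),
 * using the bands 0x81-0x89, 0x91-0x99 and 0xa2-0xa9.
 * Returns -1 if i is out of range. */
int ebcdic_lower(int i)
{
    if (i < 0 || i > 25)
        return -1;
    if (i < 9)                      /* a-i */
        return 0x81 + i;
    if (i < 18)                     /* j-r */
        return 0x91 + (i - 9);
    return 0xa2 + (i - 18);         /* s-z */
}

/* True if the letters with indices first..last have consecutive codes. */
bool ebcdic_run_is_contiguous(int first, int last)
{
    for (int i = first; i < last; i++) {
        if (ebcdic_lower(i + 1) != ebcdic_lower(i) + 1)
            return false;
    }
    return true;
}
```

With this layout, ebcdic_run_is_contiguous(0, 5) (a-f) holds, while
ebcdic_run_is_contiguous(0, 25) (a-z) does not, because of the gaps after
i and r.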

Note that the sort order of alphanumerics is 0-9A-Za-z in ASCII, but
a-zA-Z0-9 in EBCDIC.  The space character (0x20 in ASCII, 0x40 in EBCDIC)
precedes the alphanumerics in each case.

There are a few variants of EBCDIC in use, just as some early micros
used variants of ASCII (and of course there are non-US ASCIIs).  Such
variants use the same codes for alphanumerics and other common
characters, but differ elsewhere.  I believe the above paragraphs hold
true for all EBCDIC variants, but I'm not sure about the following
paragraph.

EBCDIC has a cent character and distinct split and unsplit vbars (|),
but I think it lacks [ and ].  ([ and ], among others, are of course
missing from many non-US ASCIIs as well.)  I believe not all whitespace
characters have exact equivalents either (can anyone be more specific?).
The characters for which C provides trigraph sequences are
  [    ]    {    }    #    \    ^    |    ~
(from K&R second edition, section A12.1, which describes C89); presumably
these form the set of C characters that we cannot assume to exist in the
user's character set.
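
For reference, each trigraph is two question marks followed by a third
character that selects the replacement.  The sketch below records that
mapping as an ordinary lookup function; it does not use trigraphs itself
(they were removed in C23, and some compilers only honour them when asked
to).  The name trigraph_target is made up for this example.

```c
/* Given the third character of a trigraph (the one following the two
 * question marks), return the character the trigraph stands for,
 * or 0 if that character does not complete a trigraph. */
char trigraph_target(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}
```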


On the original question of hex conversion: the quoted algorithm works
for all charsets I know of (not counting a 5-bit charset I once heard
of, where lowercase letters aren't directly representable), so I think
in practice it would be acceptable to assume that a-f (and similarly A-F
and 0-9) are contiguous and in order.  Test cases can put the mind at
rest.
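
As a sketch of what such a test could compare against: indexing a string
literal avoids any assumption about the letter codes, so it can serve as
the reference for the arithmetic version quoted earlier.  (The digits 0-9
are, as far as I know, required by the C standard to be contiguous; only
the a-f part is an assumption.)  The names hex_digit and hex_digit_arith
are made up for this example.

```c
/* Lowercase hex digit for val in 0..15, via a lookup table.
 * Makes no assumption about character codes.
 * Returns '?' for out-of-range input. */
char hex_digit(unsigned val)
{
    static const char digits[] = "0123456789abcdef";
    return val < 16 ? digits[val] : '?';
}

/* The arithmetic version from the quoted code.  Assumes 'a'..'f' have
 * consecutive codes; only meaningful for val in 0..15. */
char hex_digit_arith(unsigned val)
{
    return (char)(val < 10 ? '0' + val : 'a' - 10 + val);
}
```

A test that loops over 0..15 and checks that the two functions agree
would catch any charset where the assumption fails.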

pjm.


