[m-rev.] [slightly OT] charset characteristics (EBCDIC etc) (was Re: string__format
Peter Moulder
pmoulder at csse.monash.edu.au
Sat Nov 23 01:15:36 AEDT 2002
On Fri, Nov 22, 2002 at 10:58:48PM +1100, Fergus Henderson wrote:
> On 22-Nov-2002, Peter Moulder <pmoulder at csse.monash.edu.au> wrote:
> > If writing in C, the obvious code would be something like
> >
> > (val < 10
> > ? '0' + val
> > : 'a' - 10 + val)
>
> This assumes that the numeric codes for 'a'-'f' are contiguous,
> which is not guaranteed by the C standard, I'm pretty sure.
(FWIW, I too am almost certain it isn't guaranteed by the C standard.)
> I think it might fail for EBCDIC systems?
It's true that a-z is non-contiguous in EBCDIC, though the letters are
band-contiguous: 0x81-89, 0x91-99, 0xa2-a9, and similarly 0xc1-c9 etc.
for A-Z. (Does anyone know why?) 0-9 are 0xf0-f9.
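To make the banding concrete, here is a minimal sketch in C that maps a
letter index to its lowercase EBCDIC code, using only the ranges given
above. The name ebcdic_lower is a hypothetical helper made up for
illustration, not something from any library.

```c
#include <assert.h>

/* Code of the index-th lowercase letter (0 = 'a', 25 = 'z') under the
 * EBCDIC layout described above: a-i at 0x81-0x89, j-r at 0x91-0x99,
 * s-z at 0xa2-0xa9.  ebcdic_lower is a hypothetical name. */
unsigned char ebcdic_lower(unsigned index)
{
    assert(index < 26);
    if (index < 9)
        return (unsigned char)(0x81 + index);         /* a-i */
    if (index < 18)
        return (unsigned char)(0x91 + (index - 9));   /* j-r */
    return (unsigned char)(0xa2 + (index - 18));      /* s-z */
}
```

Since a-f all fall inside the first band (0x81-0x86), the gaps only
matter for arithmetic that crosses from i to j or from r to s.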
Note that the sort order of alphanumerics is 0-9A-Za-z in ASCII, but
a-zA-Z0-9 in EBCDIC. The space character (0x20 in ASCII, 0x40 in
EBCDIC) precedes the alphanumerics in each case.
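The ordering difference can be checked from one representative code per
class. This is only a sketch: the tables hold the ASCII and EBCDIC codes
for space, '0', 'A' and 'a' mentioned in this message, and the names
(ascii_code, ebcdic_code, sorts_before) are invented for the example.

```c
#include <assert.h>

/* One representative character per class. */
enum { SPACE, DIGIT_0, UPPER_A, LOWER_A, N_SAMPLES };

/* Codes in the same order as the enum above. */
static const unsigned char ascii_code[N_SAMPLES]  = { 0x20, 0x30, 0x41, 0x61 };
static const unsigned char ebcdic_code[N_SAMPLES] = { 0x40, 0xf0, 0xc1, 0x81 };

/* Returns 1 if sample x has a lower code than sample y in the given table. */
int sorts_before(const unsigned char *codes, int x, int y)
{
    assert(x >= 0 && x < N_SAMPLES && y >= 0 && y < N_SAMPLES);
    return codes[x] < codes[y];
}
```

With these tables, sorts_before(ascii_code, DIGIT_0, UPPER_A) is true
while sorts_before(ebcdic_code, DIGIT_0, UPPER_A) is false, so a sort
that compares raw character codes gives different results on the two
charsets.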
There are a few variants of EBCDIC in use, just as some early micros
used variants of ASCII (and of course there are non-US ASCIIs): the
variants use the same codes for alphanumerics and other common
characters, but differ elsewhere. I believe the above paragraphs hold
true for all EBCDIC variants, but I'm not sure about the following
paragraph.
EBCDIC has a cent character and distinct split and unsplit vbars (|),
but I think it lacks [ and ]. ([ and ] among others are of course
missing from many non-US ASCIIs as well.) I believe not all whitespace
characters have exact equivalents either (can anyone be more specific?).
The characters for which C provides trigraph sequences are
[ ] { } # \ ^ | ~
(from c89 section A12.1); presumably these form the set of C
characters that we cannot assume to exist in the user's character set.
On the original question of hex conversion: the quoted algorithm works
for all charsets I know of (not counting a 5-bit charset I once heard
of, where lowercase letters aren't directly representable), so I think
in practice it would be acceptable to assume that a-f (and similarly
A-F and 0-9) are contiguous and in order. Test cases can put the mind
at rest.
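As a sketch of what such a test case might look like, the following C
compares the arithmetic version quoted at the top of this message with
a table lookup, which makes no assumption about how the letters are
encoded. All three function names are made up for this example.

```c
#include <assert.h>

/* Table lookup: independent of the numeric codes of the letters. */
char hex_digit_lookup(unsigned val)
{
    assert(val < 16);
    return "0123456789abcdef"[val];
}

/* The arithmetic version quoted earlier; it assumes that 'a'-'f'
 * (and '0'-'9') are contiguous and in order. */
char hex_digit_arith(unsigned val)
{
    assert(val < 16);
    return (char)(val < 10 ? '0' + val : 'a' - 10 + val);
}

/* Returns 1 if the two versions agree for every value 0..15 on the
 * charset this code is compiled for, 0 otherwise. */
int hex_digit_versions_agree(void)
{
    unsigned val;
    for (val = 0; val < 16; val++)
        if (hex_digit_lookup(val) != hex_digit_arith(val))
            return 0;
    return 1;
}
```

If hex_digit_versions_agree() ever returned 0 on some platform, the
lookup version could simply be used there instead.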
pjm.