[mercury-users] io__read_line_as_string predicate problem

Ralph Becket rafe at cs.mu.OZ.AU
Mon Mar 24 10:17:38 AEDT 2003

Petr Nemec, Sunday, 23 March 2003:
> Hello everybody,
>  I have a problem with the io__read_line_as_string predicate when
> reading a text file containing non-ASCII characters. I am unable to
> read such a line with this predicate, whereas io__read_line and
> subsequent conversion via string__from_char_list works fine. Is
> there a way to avoid "touching" each character twice? Time is my
> greatest enemy :).
> Thanks
>  Petr Nemec

By non-ASCII I presume you mean something like Unicode.

Under the C backends, strings are simply immutable arrays of char
(and a char is invariably 8 bits).  Moreover, Mercury strings are the
same as C strings, so an embedded NUL (char code 0) is assumed to mark
the end of a string.  (Whether or not this is a good thing has been
debated several times on the developers' list.)  The C-based
representation means that strings are not really vectors of (Mercury)
chars if you're using a multi-byte encoding such as UTF-8, where a
single character can occupy several chars.
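
To make the two issues concrete, here is a small C sketch (not Mercury
runtime code; the names c_string_length, line_with_nul and e_acute_utf8
are made up for illustration).  It shows that a NUL-terminated string
routine stops counting at an embedded NUL, and that a single accented
character takes two 8-bit chars when encoded as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length as seen by NUL-terminated string routines: counting stops at
 * the first byte with value 0, however large the buffer really is. */
size_t c_string_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Hypothetical data: 7 bytes as they might come from a file, with a
 * NUL in the middle.  Only "abc" is visible to C string functions. */
const char line_with_nul[7] = { 'a', 'b', 'c', '\0', 'd', 'e', 'f' };

/* The single character e-acute encoded as UTF-8 is two chars,
 * 0xC3 0xA9, followed by the terminating NUL. */
const char e_acute_utf8[] = "\xC3\xA9";
```

The buffer holds 7 bytes, but c_string_length(line_with_nul) reports 3;
c_string_length(e_acute_utf8) reports 2 for what a reader would call
one character.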

io__read_line_as_string simply treats the file as a sequence of bytes,
which it reads in and presents to the user as a string.  This runs into
both of the problems with strings described above: an embedded NUL in
the file will make the string appear truncated to Mercury and, besides,
you have to do your own conversion from bytes to UTF characters (or
whatever encoding you are using).

The solution is to (a) write a predicate that reads in UTF characters
from the input stream and (b) store the results in an array(char) rather
than a string.
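
As a rough illustration of (a) and (b), here is a sketch in C rather
than Mercury, assuming the input is UTF-8 and that the bytes of a line
have already been read into a buffer.  The function name utf8_decode
and its signature are made up for this example.  It stores one code
point per array slot, so an embedded NUL is just another element
instead of a terminator.  It is deliberately simplified: overlong forms
and surrogate code points are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes of UTF-8 into code points, one array slot each.
 * Returns the number of code points written, or -1 if the input is
 * malformed or out has fewer than the required number of slots. */
long utf8_decode(const unsigned char *bytes, size_t n,
                 uint32_t *out, size_t out_cap)
{
    size_t i = 0, count = 0;

    assert(bytes != NULL && out != NULL);

    while (i < n) {
        unsigned char b = bytes[i];
        uint32_t cp;
        size_t extra;   /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1; /* stray continuation or invalid lead byte */

        /* Sequence cut off by end of input, or no room in out. */
        if (extra > n - i - 1 || count == out_cap) return -1;

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (c & 0x3F);
        }
        out[count++] = cp;
        i += extra + 1;
    }
    return (long)count;
}
```

The same shape should carry over to a Mercury predicate that reads
bytes from the stream and accumulates chars into an array, though the
details of that version are left open here.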

That said, I don't really understand why using string__from_char_list
after io__read_line should work if io__read_line_as_string doesn't.
What exactly goes wrong?

- Ralph