[mercury-users] read_line and lex (was: Many data elements using Mercury on cygwin?)
Ralph Becket
rafe at cs.mu.OZ.AU
Fri Feb 7 09:51:08 AEDT 2003
Douglas Auclair, Friday, 7 February 2003:
> Dear Ralph,
>
> Thank you for your response.
>
> From: Ralph Becket (rafe at cs.mu.OZ.AU)
> >- you probably want to use read_line_as_string rather than read_line;
>
> I prefer receiving the result as a list(char), not string, that's why I
> used read_line/3. list(char) simplifies the scanning task for me.
Well... I'd be tempted to argue that you might be suffering from overly
list-centric thinking!
There is a memory issue here. On a 32-bit architecture, each char you
read in causes eight bytes of memory to be consumed (a list cons cell
contains four bytes of payload - of which only one is used for a char -
and four bytes for the tail pointer.) So list(char) is eight times more
costly than string in space.
> >- many files can be read in their entirety and then processed (you'd use
> >read_file_as_string);
>
> Okay, but these files are big: one is 12 meg, another 2 meg, and three
> others at 1 meg, 1/2 meg and 1/2 meg -- my point in using read_line/3 was
> to collect only the subset of information I need. Will the OS (cygwin) and
> the Mercury runtime be able to handle all that?
Few machines these days would have a problem allocating a 12 MByte
string.
In terms of list-based allocation, that would expand to 96 MBytes of
allocated data (this is just in terms of allocation; if you do not have
GC turned on for some reason then this becomes a serious issue.)
> >- you should seriously consider using lex in extras: it would make
> >reading and parsing your file a trivial matter.
>
> Thanks. I'll look into lex. As I write scanners and parsers quite often,
> this particular task was already trivial for me (each line follows the same
> format, and each line had at most 15 columns, only 3 of which I care
> about), but if lex is a time-saver I'll use that instead. What about the
> lexer module in the library, does it use lex, or are the two entirely
> disjoint? Is there some (additional) documentation for it? (I don't have
> extras in front of me, so I only see the lexer docs.)
Well, here's a bit of code you could use that should be fairly economical:
    io.read_line_as_string(Result, !IO),
    (
        Result = ok(String),
        ( if
            Words = string.words(String),
            string.to_int(list.index1(Words, 1), Int1),
            string.to_int(list.index1(Words, 2), Int2),
            String3 = list.index1(Words, 3)
        then
            ... do something with Int1, Int2, String3 ...
        else
            exception.throw("argh!")
        )
    ;
        Result = eof,
        ... finish up ...
    ;
        Result = error(_),
        exception.throw(Result)
    )
Lex might shorten this a bit and possibly be a little more economical,
but for such a simple problem the above should work fine.
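
For anyone who wants to try the fragment above as a whole program, here
is a minimal sketch of how it might sit inside a complete module. The
module name sum_cols, the helper process_lines and the running total are
all hypothetical, chosen only to give the loop something to do. It also
assumes each line starts with two integer columns followed by a string
column, and it picks the columns out by pattern matching on the result of
string.words rather than by indexing.

```mercury
:- module sum_cols.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module exception, int, list, string.

main(!IO) :-
    % Sum the first two columns of every line on standard input.
    process_lines(0, Total, !IO),
    io.write_int(Total, !IO),
    io.nl(!IO).

    % process_lines is a hypothetical helper: it reads one line at a
    % time, so only the current line is held as a string.
:- pred process_lines(int::in, int::out, io::di, io::uo) is det.

process_lines(!Total, !IO) :-
    io.read_line_as_string(Result, !IO),
    (
        Result = ok(Line),
        ( if
            % string.words splits on whitespace, which also discards
            % the trailing newline kept by read_line_as_string.
            string.words(Line) = [Word1, Word2, _String3 | _],
            string.to_int(Word1, Int1),
            string.to_int(Word2, Int2)
        then
            !:Total = !.Total + Int1 + Int2,
            process_lines(!Total, !IO)
        else
            throw("malformed line: " ++ Line)
        )
    ;
        Result = eof
    ;
        Result = error(Error),
        throw(io.error_message(Error))
    ).
```

Built with something like "mmc --make sum_cols" and fed the two lines
"1 2 foo" and "3 4 bar" on standard input, this should print 10. In a
real program the total would be replaced by whatever subset of the data
needs to be kept.
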
> Off topic: are you going to write a tutorial on existential quantification
> and typing like your excellent tutorial you have on Mercury in general? :-)
Better: I've nearly finished the first draft of the Mercury book. You
can download what's in the Mercury CVS repository under books/tutorial.
Cheers,
Ralph
--------------------------------------------------------------------------
mercury-users mailing list
post: mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe: Address: mercury-users-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------