[mercury-users] read_line and lex (was: Many data elements using Mercury on cygwin?)

Ralph Becket rafe at cs.mu.OZ.AU
Fri Feb 7 09:51:08 AEDT 2003


Douglas Auclair, Friday,  7 February 2003:
> Dear Ralph,
> 
> Thank you for your response.
> 
> From: Ralph Becket (rafe at cs.mu.OZ.AU)
> >- you probably want to use read_line_as_string rather than read_line;
> 
> I prefer receiving the result as a list(char), not string, that's why I 
> used read_line/3.  list(char) simplifies the scanning task for me.

Well... I'd be tempted to argue that you might be suffering from overly
list-centric thinking!

There is a memory issue here.  On a 32-bit architecture, each char you
read in consumes eight bytes of memory: a list cons cell holds four
bytes of payload (of which only one is used for a char) and four bytes
for the tail pointer.  So list(char) is eight times more costly than
string in space.

> >- many files can be read in their entirety and then processed (you'd  use 
> >read_file_as_string);
> 
> Okay, but these files are big:  one is 12 meg, another 2 meg, and three 
> others at 1 meg, 1/2 meg and 1/2 meg -- my point in using read_line/3 was 
> to collect only the subset of information I need.  Will the OS (cygwin) and 
> the Mercury runtime be able to handle all that?

Few machines these days would have a problem allocating a 12 MByte
string.

Read in as a list(char), the same file would expand to 96 MBytes
(12 MBytes x 8 bytes per char) of allocated data.  That figure is total
allocation, not live data, but if you do not have GC turned on for some
reason then it becomes a serious issue.

> >- you should seriously consider using lex in extras: it would make  
> >reading and parsing your file a trivial matter.
> 
> Thanks.  I'll look into lex.  As I write scanners and parsers quite often, 
> this particular task was already trivial for me (each line follows the same 
> format, and each line had at most 15 columns, only 3 of which I care 
> about), but if lex is a time-saver I'll use that instead.  What about the 
> lexer module in the library, does it use lex, or are the two entirely 
> disjoint?  Is there some (additional)documentation for it?  (I don't have 
> extras in front of me, so I only see the lexer docs.)

Well, here's a bit of code you could use that should be fairly economical:

	io.read_line_as_string(Result, !IO),
	(
		Result = ok(String),
		( if
			Words   = string.words(String),
			list.index1(Words, 1, Word1),
			list.index1(Words, 2, Word2),
			list.index1(Words, 3, String3),
			string.to_int(Word1, Int1),
			string.to_int(Word2, Int2)
		  then
			... do something with Int1, Int2, String3 ...
		  else
			exception.throw("argh!")
		)
	;
		Result = eof,
		... finish up ...
	;
		Result = error(_),
		exception.throw(Result)
	)

Lex might shorten this a bit and possibly be a little more economical,
but for such a simple problem the above should work fine.
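
For completeness, here is a rough sketch of how that fragment might sit
in a whole program.  Treat it as an illustration under assumptions
rather than a finished solution: the module name read_columns and the
predicate process_lines are made up for the example, the columns of
interest are assumed to be the first three whitespace-separated fields,
and "doing something" with them is just printing them back out.

	:- module read_columns.
	:- interface.
	:- import_module io.

	:- pred main(io::di, io::uo) is det.

	:- implementation.
	:- import_module exception, int, list, string.

	main(!IO) :-
		process_lines(0, NumLines, !IO),
		io.format("processed %d lines\n", [i(NumLines)], !IO).

		% Read standard input one line at a time, keeping only the
		% three fields of interest from each line (hypothetical
		% layout: two integers followed by a string).
	:- pred process_lines(int::in, int::out, io::di, io::uo) is det.

	process_lines(!NumLines, !IO) :-
		io.read_line_as_string(Result, !IO),
		(
			Result = ok(Line),
			( if
				Words = string.words(Line),
				list.index1(Words, 1, Word1),
				list.index1(Words, 2, Word2),
				list.index1(Words, 3, String3),
				string.to_int(Word1, Int1),
				string.to_int(Word2, Int2)
			  then
				% Placeholder for the real work on each line.
				io.format("%d %d %s\n",
					[i(Int1), i(Int2), s(String3)], !IO),
				!:NumLines = !.NumLines + 1,
				process_lines(!NumLines, !IO)
			  else
				exception.throw("malformed line: " ++ Line)
			)
		;
			Result = eof
		;
			Result = error(Error),
			exception.throw(io.error_message(Error))
		).

Only one line is held as a string at any time, so the size of the input
file should not matter much with this approach.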

> Off topic:  are you going to write a tutorial on existential quantification 
> and typing like your excellent tutorial you have on Mercury in general? :-)

Better: I've nearly finished the first draft of the Mercury book.  You
can download what's in the Mercury CVS repository under books/tutorial.

Cheers,
	Ralph