[mercury-users] a question on efficient parsing of a file
Ralph Becket
rafe at csse.unimelb.edu.au
Tue May 18 09:52:47 AEST 2010
Hi Vladimir,
I'd recommend reading your file one line at a time with
io.read_line_as_string, then using the parsing_utils library to
extract the floats from each line:
    io.read_line_as_string(Result, !IO),
    (
        Result = ok(String),
        some [!PS] (
            % !:PS binds the initial parser state for this line.
            parsing_utils.new_src_and_ps(String, Src, !:PS),
            ( if
                parsing_utils.zero_or_more(parsing_utils.float_literal,
                    Src, Xs, !PS),
                parsing_utils.eof(Src, _, !PS)
            then
                ... do something with Xs (a list of floats) ...
            else
                ... report a syntax error ...
            )
        )
    ;
        Result = eof
    ;
        Result = error(ErrorCode),
        ... report the IO error ...
    )
Hope this helps!
-- Ralph
Vladimir Gubarkov, Monday, 17 May 2010:
>
> Hi,
>
> Imagine I have a long enough (to fit in memory) text file with regular
> data, say, a lot of float numbers, divided by space.
>
> Now I want to parse those to find the sum of all numbers. In Prolog
> (namely, SWI) I used the 'phrase_from_file' predicate, which allows
> parsing a file with a DCG lazily (no need to read the whole file into
> memory). In case it's of interest, I used the following code:
>
> :- set_prolog_flag(float_format,'%.15g').
>
> integer(I) -->
>         digit(D0),
>         digits(D),
>         { number_chars(I, [D0|D])
>         }.
>
> digits([D|T]) -->
>         digit(D), !,
>         digits(T).
> digits([]) -->
>         [].
>
> digit(D) -->
>         [D],
>         { code_type(D, digit)
>         }.
>
> float(F) -->
>         ( "-", {Sign = -1}
>         ; "", {Sign = 1}
>         ), !,
>         integer(N),
>         ",",
>         integer(D),
>         {F is Sign * (N + D / 10^(ceiling(log10(D))))
>         }.
>
> sum(S, Total) -->
>         float(F1), !,
>         " ",
>         { S1 is S + F1},
>         sum(S1, Total).
> sum(Total, Total) -->
>         [].
>
> go1 :-
>         phrase_from_file(sum(0, S),'numbers_large.txt',
>                          [buffer_size(16384)]),
>         writeln(S).
>
> Now, for an exercise in mercury, I'm willing to write the mercury
> analog. If I understand correctly there is no direct analog to
> 'phrase_from_file' in mercury, am I right?
>
> So, I decided to fake this by constructing some type like:
>
> :- type parse_state ---> state(buffer_size, buffer, io.state).
> :- type buffer_size == int.
> :- type buffer == list(char).
>
> and pass this around to predicates like
>
> :- pred some_dcg_pred(some_term::out, parse_state::in,
> parse_state::out) is semidet.
>
> My thought was that I would take chars from 'buffer', and if it's
> empty -> read 'buffer_size' chars from io.state.
>
> But! It seems that the io library provides no support for buffered
> reading. Without that, I guess, it'll be rather slow (reading 1 char
> at a time). Interestingly, I've looked inside the source of the io
> module and it uses buffered reading internally, but those predicates
> are not exported in its interface.
>
> Dear sirs, what would you recommend for writing an efficient (as
> well as elegant) analog to the Prolog code?
>
> Sincerely yours,
>
> Vladimir.