[mercury-users] a question on efficient parsing of a file

Vladimir Gubarkov xonixx at gmail.com
Tue May 18 02:47:12 AEST 2010


Hi,

Imagine I have a fairly long text file (still small enough to fit in memory)
with regular data, say, a lot of float numbers (with a comma as the decimal
separator) separated by spaces.
Now I want to parse it to find the sum of all the numbers. In Prolog (namely,
SWI-Prolog) I used the 'phrase_from_file' predicate, which lets you parse a
file with a DCG lazily (there is no need to read the whole file into memory).
In case it's of interest, I used the following code:


:- use_module(library(pure_input)).   % provides phrase_from_file/3
:- set_prolog_flag(float_format,'%.15g').

integer(I) -->
        digit(D0),
        digits(D),
        { number_codes(I, [D0|D])   % phrase_from_file yields character codes
        }.

digits([D|T]) -->
        digit(D), !,
        digits(T).
digits([]) -->
        [].

digit(D) -->
        [D],
        { code_type(D, digit)
        }.

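% The fractional part is kept as a digit list so that its length gives the
% right power of ten: "1,05" -> 1 + 5/100 and "1,10" -> 1 + 10/100.
float(F) -->
    ( "-", {Sign = -1}
    ; "", {Sign = 1}
    ), !,
    integer(N),
    ",",
    digit(D0),
    digits(Ds),
    { number_codes(D, [D0|Ds]),
      length([D0|Ds], Len),
      F is Sign * (N + D / 10.0^Len)
    }.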

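sum(S, Total) -->
    float(F1), !,
    blanks,
    { S1 is S + F1 },
    sum(S1, Total).
sum(Total, Total) -->
    [].

% Skip spaces and newlines, so the last number need not be followed by " ".
blanks -->
    [C], { code_type(C, space) }, !,
    blanks.
blanks -->
    [].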

go1 :-
    phrase_from_file(sum(0, S),'numbers_large.txt', [buffer_size(16384)]),
    writeln(S).


Now, as an exercise in Mercury, I'd like to write the Mercury analog.
If I understand correctly, there is no direct analog to 'phrase_from_file' in
Mercury, am I right?

So I decided to fake it by constructing a type like:

:- type parse_state ---> state(buffer_size, buffer, io.state).
:- type buffer_size == int.
:- type buffer == list(char).

and pass it around the parsing predicates like

:- pred some_dcg_pred(some_term::out, parse_state::in, parse_state::out) is
semidet.
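For comparison, here is a minimal Mercury sketch of the same DCG for the case where the whole input is already in memory: the parse state is then just the remaining list(char), which is exactly what Mercury's DCG notation threads through the last two arguments of each predicate. It assumes comma decimal separators as in the Prolog code; the module name, the predicate names (comma_float, digits1, digits, blanks, sum_numbers) and the hard-coded input string are illustrative only, and the number is converted with string.to_float instead of the Prolog arithmetic.

```mercury
:- module sum_floats.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char, float, list, string.

main(!IO) :-
    % Illustrative input; a real program would get these chars from a file.
    Input = string.to_char_list("1,5 -2,25 3,0\n"),
    ( if
        sum_numbers(0.0, Total, Input, Rest),
        Rest = []                  % the whole input must be consumed
    then
        io.format("%.2f\n", [f(Total)], !IO)
    else
        io.write_string("parse error\n", !IO)
    ).

    % Sum numbers until no further number can be parsed.
:- pred sum_numbers(float::in, float::out, list(char)::in, list(char)::out)
    is det.

sum_numbers(Acc, Total) -->
    ( if comma_float(F) then
        blanks,
        sum_numbers(Acc + F, Total)
    else
        { Total = Acc }
    ).

    % An optionally negative number written as "Int,Frac", e.g. "-2,25".
:- pred comma_float(float::out, list(char)::in, list(char)::out) is semidet.

comma_float(F) -->
    ( if ['-'] then { Sign = -1.0 } else { Sign = 1.0 } ),
    digits1(IntPart),
    [','],
    digits1(FracPart),
    % Rebuild "Int.Frac" and let string.to_float do the conversion.
    { string.to_float(string.from_char_list(IntPart ++ ['.' | FracPart]), F0) },
    { F = Sign * F0 }.

    % One or more digits.
:- pred digits1(list(char)::out, list(char)::in, list(char)::out) is semidet.

digits1([D | Ds]) -->
    [D],
    { char.is_digit(D) },
    digits(Ds).

    % Zero or more digits.
:- pred digits(list(char)::out, list(char)::in, list(char)::out) is det.

digits(Ds) -->
    ( if [D], { char.is_digit(D) } then
        digits(Ds0),
        { Ds = [D | Ds0] }
    else
        { Ds = [] }
    ).

    % Skip whitespace between numbers (and at the end of the input).
:- pred blanks(list(char)::in, list(char)::out) is det.

blanks -->
    ( if [C], { char.is_whitespace(C) } then
        blanks
    else
        { true }
    ).
```

Using if-then-else instead of Prolog's cut keeps every predicate's determinism explicit (det or semidet), which the Mercury compiler then checks.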

My idea was to take chars from 'buffer' and, when it is empty, read
'buffer_size' more chars from io.state.
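The refill-on-empty scheme can be sketched roughly as below. This is a sketch under assumptions, not a recommended design: it keeps io.state out of the parse state and threads it separately with di/uo modes, because (as far as I know) a unique io.state cannot be stored inside a term that is passed with plain in/out modes. The names refill, next_char and count_chars are hypothetical, and refill still calls io.read_char once per character, relying on the stream's own internal buffering.

```mercury
:- module buffered_read.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, list, maybe, string.

main(!IO) :-
    % Hypothetical demo: count the chars on stdin via a 4096-char buffer.
    count_chars(4096, [], 0, N, !IO),
    io.format("%d\n", [i(N)], !IO).

    % Read up to N chars from the current input stream (fewer at eof).
:- pred refill(int::in, list(char)::out, io::di, io::uo) is det.

refill(N, Chars, !IO) :-
    ( if N =< 0 then
        Chars = []
    else
        io.read_char(Res, !IO),
        (
            Res = ok(C),
            refill(N - 1, Cs, !IO),
            Chars = [C | Cs]
        ;
            Res = eof,
            Chars = []
        ;
            Res = error(_),
            Chars = []        % assumption: treat a read error as end of input
        )
    ).

    % Take the next char from the buffer, refilling it when it is empty.
:- pred next_char(int::in, maybe(char)::out, list(char)::in, list(char)::out,
    io::di, io::uo) is det.

next_char(BufSize, MaybeC, Buf0, Buf, !IO) :-
    (
        Buf0 = [C | Buf],
        MaybeC = yes(C)
    ;
        Buf0 = [],
        refill(BufSize, Chunk, !IO),
        (
            Chunk = [C | Buf],
            MaybeC = yes(C)
        ;
            Chunk = [],
            Buf = [],
            MaybeC = no
        )
    ).

:- pred count_chars(int::in, list(char)::in, int::in, int::out,
    io::di, io::uo) is det.

count_chars(BufSize, Buf0, Count0, Count, !IO) :-
    next_char(BufSize, MaybeC, Buf0, Buf1, !IO),
    (
        MaybeC = yes(_),
        count_chars(BufSize, Buf1, Count0 + 1, Count, !IO)
    ;
        MaybeC = no,
        Count = Count0
    ).
```

A parser built on next_char has to pass both the buffer and !IO along, which is the main difference from the single parse_state argument proposed above.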

But it seems that the io library provides no support for buffered reading,
and without it I guess this would be rather slow (reading one char at a
time). Interestingly, I looked inside the source of the io module, and it
does use buffered reading internally, but the predicates that do it are not
exported in its interface.


Dear sirs, what would you recommend for writing an efficient (as well as
elegant) analog of the Prolog code?


Sincerely yours,
Vladimir.

