[mercury-users] a question on efficient parsing of a file

Ralph Becket rafe at csse.unimelb.edu.au
Tue May 18 09:52:47 AEST 2010


Hi Vladimir,

I'd recommend reading your file one line at a time with
io.read_line_as_string, then using the parsing_utils library to extract
the floats from each line:

    io.read_line_as_string(Result, !IO),
    (
        Result = ok(String),
        some [!PS] (
            parsing_utils.new_src_and_ps(String, Src, !:PS),
            ( if
                parsing_utils.zero_or_more(parsing_utils.float_literal,
                    Src, Xs, !PS),
                parsing_utils.eof(Src, _, !PS)
              then
                ... do something with Xs (a list of floats) ...
              else
                ... report a syntax error ...
            )
        )
    ;
        Result = eof
    ;
        Result = error(ErrorCode),
        ... report the IO error ...
    )
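
For reference, the fragment above could be fleshed out into a complete
program along the following lines. This is only a sketch under some
assumptions: the module name sum_floats and the helper predicates
sum_lines and parse_floats are made up for illustration, input is read
from standard input, and the numbers are assumed to use "." as the
decimal separator, which is the form parsing_utils.float_literal is
documented to accept (the quoted Prolog code uses ",", so such input
would need converting first). The extra call to parsing_utils.whitespace
is there so that leading blanks and empty lines are tolerated.

```mercury
:- module sum_floats.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module float.
:- import_module list.
:- import_module parsing_utils.
:- import_module string.

main(!IO) :-
    sum_lines(0.0, !IO).

    % Read stdin line by line, adding each line's floats to the
    % running total; print the total at end of file.
    % (sum_lines is a hypothetical helper, not a library predicate.)
    %
:- pred sum_lines(float::in, io::di, io::uo) is det.

sum_lines(Sum0, !IO) :-
    io.read_line_as_string(Result, !IO),
    (
        Result = ok(Line),
        ( if parse_floats(Line, Xs) then
            Sum = list.foldl((func(X, Acc) = Acc + X), Xs, Sum0),
            sum_lines(Sum, !IO)
        else
            io.write_string("syntax error in line: " ++ Line, !IO),
            io.set_exit_status(1, !IO)
        )
    ;
        Result = eof,
        io.write_float(Sum0, !IO),
        io.nl(!IO)
    ;
        Result = error(Error),
        io.write_string(io.error_message(Error), !IO),
        io.nl(!IO),
        io.set_exit_status(1, !IO)
    ).

    % Succeed iff Line consists only of whitespace-separated float
    % literals, returning them in Xs.
    % (parse_floats is a hypothetical helper, not a library predicate.)
    %
:- pred parse_floats(string::in, list(float)::out) is semidet.

parse_floats(Line, Xs) :-
    some [!PS] (
        parsing_utils.new_src_and_ps(Line, Src, !:PS),
        parsing_utils.whitespace(Src, _, !PS),
        parsing_utils.zero_or_more(parsing_utils.float_literal,
            Src, Xs, !PS),
        parsing_utils.eof(Src, _, !PS)
    ).
```

Only one line is held in memory at a time, and the recursive call to
sum_lines is the last goal in its branch, so the loop should not grow
the stack with the length of the file.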

Hope this helps!
-- Ralph

Vladimir Gubarkov, Monday, 17 May 2010:
> 
>    Hi,
> 
>    Imagine I have a long enough (to fit in memory) text file with regular
>    data, say, a lot of float numbers, separated by spaces.
> 
>    Now I want to parse those to find the sum of all numbers. In Prolog
>    (namely, SWI) I used the 'phrase_from_file' predicate, which allows
>    parsing a file with a DCG lazily (no need to read the whole file
>    into memory). In case it's of interest, I used the following code:
> 
>    :- set_prolog_flag(float_format,'%.15g').
>    integer(I) -->
>            digit(D0),
>            digits(D),
>            { number_chars(I, [D0|D])
>            }.
>    digits([D|T]) -->
>            digit(D), !,
>            digits(T).
>    digits([]) -->
>            [].
>    digit(D) -->
>            [D],
>            { code_type(D, digit)
>            }.
>    float(F) -->
>        ( "-", {Sign = -1}
>        ; "", {Sign = 1}
>        ), !,
>        integer(N),
>        ",",
>        integer(D),
>        {F is Sign * (N + D / 10^(ceiling(log10(D))))
>        }.
>    sum(S, Total) -->
>        float(F1), !,
>        " ",
>        { S1 is S + F1},
>        sum(S1, Total).
>    sum(Total, Total) -->
>        [].
>    go1 :-
>        phrase_from_file(sum(0, S), 'numbers_large.txt',
>                         [buffer_size(16384)]),
>        writeln(S).
> 
>    Now, as an exercise in Mercury, I'd like to write the Mercury
>    analog. If I understand correctly, there is no direct analog to
>    'phrase_from_file' in Mercury, am I right?
> 
>    So, I decided to fake this by constructing some type like:
> 
>    :- type parse_state ---> state(buffer_size, buffer, io.state).
>    :- type buffer_size == int.
>    :- type buffer == list(char).
> 
>    and pass this around to predicates like
> 
>    :- pred some_dcg_pred(some_term::out, parse_state::in,
>        parse_state::out) is semidet.
> 
>    My thought was that I would take chars from 'buffer', and if it's
>    empty -> read 'buffer_size' chars from io.state.
> 
>    But! It seems that the io library provides no support for buffered
>    reading. And without that, I guess, it'll be rather slow (reading
>    1 char at a time). Interestingly, I've looked inside the source of
>    the io module and it uses buffered reading internally, but those
>    predicates are not exported in its interface.
> 
>    Dear sirs, what would you recommend for writing an efficient (as
>    well as elegant) analog of the Prolog code?
> 
>    Sincerely yours,
> 
>    Vladimir.