[mercury-users] a question on efficient parsing of a file

Vladimir Gubarkov xonixx at gmail.com
Wed May 19 19:21:11 AEST 2010


On Wed, May 19, 2010 at 6:16 AM, Ralph Becket <rafe at csse.unimelb.edu.au>wrote:

> Vladimir Gubarkov, Tuesday, 18 May 2010:
>

>    And third - it operates over string and current position, and my
> >    experiments seem to show that this way is not as fast as operating on
> >    list of chars.
>
> I'm very surprised to hear that.  parsing_utils reads chars directly
> from the string source, which is just a bounds test and a memory
> reference (see string.index).  List creation and manipulation is
> typically much slower.
>
> Ok, maybe I'm wrong, I'll definitely look more deeply into parsing_utils
ones again. But what I've done was - writing next code to compare different
approaches:

:- module fast_string_test.

:- interface.

:- import_module io.

:- pred main(io, io).
:- mode main(di, uo) is det.

:- implementation.

:- import_module list, string, int, parsing_utils.

to_char_list_(Str) = ( first_char(Str, C, Rest) ->
              [C | to_char_list_(Rest)]
            ; []
            ).

make_list(Len, C) = (if Len = 0 then [] else [C | make_list(Len-1, C)]).

make_long_string(Len, Char) = S :-
    CharLst = make_list(Len, Char),
    string.to_char_list(S, CharLst).


:- type pos == int.
:- type fast_string --->
    fast_string(string, pos).

:- func to_fast_string(string) = fs.
:- func from_fast_string(fs) = string.
:- func fs_length(fs) = int.

:- pred front_char(string, fs, fs). % string of len 1
:- mode front_char(out, in, out) is semidet.

:- pred front(string, int, fs, fs).
:- mode front(out, in, in, out) is semidet.

:- type fs == fast_string.

to_fast_string(Str) = fast_string(Str, 0).
from_fast_string(FS @ fast_string(Str, Pos)) = string.unsafe_substring(Str,
Pos, fs_length(FS)).

fs_length(fast_string(Str,Pos)) = string.length(Str) - Pos.


front_char(C, !FS) :- front(C, 1, !FS).

front(FrontStr, Count, F0 @ fast_string(Str, Pos), fast_string(Str, NewPos))
:-
    fs_length(F0) >= Count,
    NewPos = Pos + Count,
    FrontStr = string.unsafe_substring(Str, Pos, Count).

%%% fast_string tst

to_char_list_fs(FS) = ( front_char(C, FS, FS1) ->
              [C | to_char_list_fs(FS1)]
            ; []
            ).

to_char_list_s_fs(Str) = Lst :-
    Fs = to_fast_string(Str),
    Lst = to_char_list_fs(Fs).

main(!IO) :-

    N = 50000000,
    S = make_long_string(N, 'A'),
    print("made\n", !IO),
    %CL = to_char_list_(S):list(character),
    %CL = to_char_list_s_fs(S):list(string),
    CL = string.to_char_list(S),
    print("to_c_l_\n", !IO),
    print(length(CL):int, !IO)

    %print(from_fast_string(to_fast_string("12345")), !IO)
    %print(to_char_list_s_fs("ABCDEFGHIJK"), !IO)
    .

(compile with *--optimize-constructor-last-call --infer-all*)

I believe that method used by 'CL = to_char_list_s_fs(S):list(string)' is
similar to used in parsing_utils (or, maybe I'm wrong?).

But it appears, that turning string to list of chars in this way is a way
slowly then by built-in string.to_char_list


> HTH,
> -- Ralph
> --------------------------------------------------------------------------
> mercury-users mailing list
> Post messages to:       mercury-users at csse.unimelb.edu.au
> Administrative Queries: owner-mercury-users at csse.unimelb.edu.au
> Subscriptions:          mercury-users-request at csse.unimelb.edu.au
> --------------------------------------------------------------------------
>

Sincerely yours,
Vladimir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/users/attachments/20100519/f2a06f50/attachment.html>


More information about the users mailing list