[mercury-users] a question on efficient parsing of a file
Vladimir Gubarkov
xonixx at gmail.com
Wed May 19 19:21:11 AEST 2010
On Wed, May 19, 2010 at 6:16 AM, Ralph Becket <rafe at csse.unimelb.edu.au>wrote:
> Vladimir Gubarkov, Tuesday, 18 May 2010:
>
> And third - it operates over string and current position, and my
> > experiments seem to show that this way is not as fast as operating on
> > list of chars.
>
> I'm very surprised to hear that. parsing_utils reads chars directly
> from the string source, which is just a bounds test and a memory
> reference (see string.index). List creation and manipulation is
> typically much slower.
>
> Ok, maybe I'm wrong, I'll definitely look more deeply into parsing_utils
ones again. But what I've done was - writing next code to compare different
approaches:
:- module fast_string_test.
:- interface.
:- import_module io.
:- pred main(io, io).
:- mode main(di, uo) is det.
:- implementation.
:- import_module list, string, int, parsing_utils.
to_char_list_(Str) = ( first_char(Str, C, Rest) ->
[C | to_char_list_(Rest)]
; []
).
make_list(Len, C) = (if Len = 0 then [] else [C | make_list(Len-1, C)]).
make_long_string(Len, Char) = S :-
CharLst = make_list(Len, Char),
string.to_char_list(S, CharLst).
:- type pos == int.
:- type fast_string --->
fast_string(string, pos).
:- func to_fast_string(string) = fs.
:- func from_fast_string(fs) = string.
:- func fs_length(fs) = int.
:- pred front_char(string, fs, fs). % string of len 1
:- mode front_char(out, in, out) is semidet.
:- pred front(string, int, fs, fs).
:- mode front(out, in, in, out) is semidet.
:- type fs == fast_string.
to_fast_string(Str) = fast_string(Str, 0).
from_fast_string(FS @ fast_string(Str, Pos)) = string.unsafe_substring(Str,
Pos, fs_length(FS)).
fs_length(fast_string(Str,Pos)) = string.length(Str) - Pos.
front_char(C, !FS) :- front(C, 1, !FS).
front(FrontStr, Count, F0 @ fast_string(Str, Pos), fast_string(Str, NewPos))
:-
fs_length(F0) >= Count,
NewPos = Pos + Count,
FrontStr = string.unsafe_substring(Str, Pos, Count).
%%% fast_string tst
to_char_list_fs(FS) = ( front_char(C, FS, FS1) ->
[C | to_char_list_fs(FS1)]
; []
).
to_char_list_s_fs(Str) = Lst :-
Fs = to_fast_string(Str),
Lst = to_char_list_fs(Fs).
main(!IO) :-
N = 50000000,
S = make_long_string(N, 'A'),
print("made\n", !IO),
%CL = to_char_list_(S):list(character),
%CL = to_char_list_s_fs(S):list(string),
CL = string.to_char_list(S),
print("to_c_l_\n", !IO),
print(length(CL):int, !IO)
%print(from_fast_string(to_fast_string("12345")), !IO)
%print(to_char_list_s_fs("ABCDEFGHIJK"), !IO)
.
(compile with *--optimize-constructor-last-call --infer-all*)
I believe that method used by 'CL = to_char_list_s_fs(S):list(string)' is
similar to used in parsing_utils (or, maybe I'm wrong?).
But it appears, that turning string to list of chars in this way is a way
slowly then by built-in string.to_char_list
> HTH,
> -- Ralph
> --------------------------------------------------------------------------
> mercury-users mailing list
> Post messages to: mercury-users at csse.unimelb.edu.au
> Administrative Queries: owner-mercury-users at csse.unimelb.edu.au
> Subscriptions: mercury-users-request at csse.unimelb.edu.au
> --------------------------------------------------------------------------
>
Sincerely yours,
Vladimir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/users/attachments/20100519/f2a06f50/attachment.html>
More information about the users
mailing list