[m-rev.] diff: (2nd try) string splitting routines

Ralph Becket rafe at csse.unimelb.edu.au
Fri Feb 2 15:10:32 AEDT 2007
Previous message: [m-rev.] diff: (2nd try) string splitting routines
Next message: [m-rev.] diff: (2nd try) string splitting routines
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Only a few comments below, otherwise that looks good, thanks.

Ondrej Bojar, Friday,  2 February 2007:
> (Extra comment for Ralph: split_at_string is useful for splitting at 
> e.g. "<tab>".)
> 
> Estimated hours taken: 3
> 
> A few handy functions for splitting a string added.
> 
> library/string.m:
>     Added remove_suffix_if_present, split_at_separator, split_at_char,
>     split_at_string
> 
> tests/hard_coded/string_split.m:
>     A simple test of split_at_* functions.
> 
> tests/hard_coded/string_split.exp:
>     Expected results of the tests of split_at_* functions.
> 
> tests/hard_coded/string_various.m:
>     Created. Added testcase for remove_suffix_if_present.
> 
> tests/hard_coded/string_various.exp:
>     Created. Added results for remove_suffix_if_present.
> 
> 
> Index: library/string.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/string.m,v
> retrieving revision 1.254
> diff -u -r1.254 string.m
> --- library/string.m	18 Jan 2007 07:33:03 -0000	1.254
> +++ library/string.m	2 Feb 2007 03:26:14 -0000
> @@ -67,6 +67,11 @@
>      %
>  :- pred string.remove_suffix(string::in, string::in, string::out) is 
> semidet.
> 
> +    % string.remove_suffix_if_present(Suffix, String) returns `String' 
> minus
> +    % `Suffix' if `String' ends with `Suffix', `String' otherwise
> +    %
> +:- func string.remove_suffix_if_present(string, string) = string.
> +
>      % string.prefix(String, Prefix) is true iff Prefix is a prefix of 
> String.
>      % Same as string.append(Prefix, _, String).
>      %
> @@ -555,6 +560,8 @@
>      % string.words_separator(char.is_whitespace, " the cat  sat on the 
>  mat") =
>      %   ["the", "cat", "sat", "on", "the", "mat"]
>      %
> +    % Note the difference to string.split_at_separator
> +    %
>  :- func string.words_separator(pred(char), string) = list(string).
>  :- mode string.words_separator(pred(in) is semidet, in) = out is det.
> 
> @@ -563,6 +570,34 @@
>      %
>  :- func string.words(string) = list(string).
> 
> +    % string.split_at_separator(SepP, String) returns the list of
> +    % substrings of String (in first to last order) that are delimited
> +    % by chars matched by SepP. For example,
> +    %
> +    % string.split_at_separator(char.is_whitespace, " a cat  sat on the 
>  mat")
> +    %   = ["", "a", "cat", "", "sat", "on", "the", "", "mat"]
> +    %
> +    % Note the difference to string.words_separator
> +    %
> +:- func string.split_at_separator(pred(char), string) = list(string).
> +:- mode string.split_at_separator(pred(in) is semidet, in) = out is det.
> +
> +    % string.split_at_char(Char, String) =
> +    %     string.split_at_separator(unify(Char), String)
> +    %
> +:- func string.split_at_char(char, string) = list(string).
> +
> +    % string.split_at_string(Separator, String) returns the list of 
> substrings
> +    % of String that are delimited by Separator. For example,
> +    %
> +    % string.split_at_string("|||", "|||fld2|||fld3")
> +    %  = ["", "fld2", [fld3"]

Put the result on the same line as the function call.

> +    %
> +    % Always the first match of Separator is used to break the String, for
> +    % example: string.split_at_string("aa", "xaaayaaaz") = ["x", "ay", 
> "az"]
> +    %
> +:- func string.split_at_string(string, string) = list(string).
> +
>      % string.split(String, Count, LeftSubstring, RightSubstring):
>      % `LeftSubstring' is the left-most `Count' characters of `String',
>      % and `RightSubstring' is the remainder of `String'.
> @@ -969,6 +1004,17 @@
>      string.to_char_list(C, LC),
>      char_list_remove_suffix(LA, LB, LC).
> 
> +string.remove_suffix_if_present(Suffix, String) = Out :-
> +    string.length(String, Length),
> +    LeftCount = Length - length(Suffix),

Why not just this?

    LeftCount = length(String) - length(Suffix)

> +    string.split(String, LeftCount, LeftString, RightString),
> +    ( RightString = Suffix ->
> +        Out = LeftString
> +    ;
> +        Out = String
> +    ).
> +
> +
>  :- pragma promise_equivalent_clauses(string.prefix/2).
> 
>  string.prefix(String::in, Prefix::in) :-
> @@ -4108,6 +4154,56 @@
> 
> 
> %------------------------------------------------------------------------------%
> 
> +string.split_at_separator(DelimPred, InStr) = OutStrs :-
> +    Count = string.length(InStr),
> +    split_at_separator2(DelimPred, InStr, Count, Count, [], OutStrs).
> +
> +:- pred split_at_separator2(pred(char)::in(pred(in) is semidet), 
> string::in,
> +    int::in, int::in, list(string)::in, list(string)::out) is det.
> +split_at_separator2(DelimPred, Str, I, ThisSegEnd, ITail, OTail) :-
> +    % walk Str backwards extending accumulated list of chunks as chars
> +    % matching DelimPred are found

Comments should start with a capital letter and finish with a full stop.

> +    ( I < 0 -> % we're at the beginning

And here.

> +        ( ThisSegEnd<0 ->
> +            OTail = ["" | ITail]
> +        ;
> +            ThisSeg = string.unsafe_substring(Str, 0, ThisSegEnd+1),
> +            OTail = [ThisSeg | ITail]
> +        )
> +    ;
> +        C = string.unsafe_index(Str, I),
> +        ( DelimPred(C) -> % chop here

And here.

> +            ThisSeg = string.unsafe_substring(Str, I+1, ThisSegEnd-I),
> +            TTail = [ ThisSeg | ITail ],
> +            split_at_separator2(DelimPred, Str, I-1, I-1, TTail, OTail)
> +        ; % extend current segment

And here.

> +            split_at_separator2(DelimPred, Str, I-1, ThisSegEnd, ITail, 
> OTail)
> +        )
> +    ).
> +
> +%------------------------------------------------------------------------------%
> +
> +string.split_at_char(C, String) =
> +    string.split_at_separator(unify(C), String).
> +
> +%------------------------------------------------------------------------------%
> +
> +split_at_string(Needle, Total) =
> +    split_at_string(0, length(Needle), Needle, Total).
> +
> +:- func split_at_string(int, int, string, string) = list(string).
> +split_at_string(StartAt, NeedleLen, Needle, Total) = Out :-
> +    ( sub_string_search_start(Total, Needle, StartAt, NeedlePos) ->
> +        BeforeNeedle = substring(Total, StartAt, NeedlePos-StartAt),
> +        Tail = split_at_string(NeedlePos+NeedleLen, NeedleLen, Needle, 
> Total),
> +        Out = [BeforeNeedle | Tail]
> +    ;
> +        string__split(Total, StartAt, _skip, Last),
> +        Out = [Last]
> +    ).
> +
> +%------------------------------------------------------------------------------%
> +
>      % preceding_boundary(SepP, String, I) returns the largest index J =< I
>      % in String of the char that is SepP and min(-1, I) if there is no 
> such J.
>      % preceding_boundary/3 is intended for finding (in reverse) 
> consecutive
> Index: tests/hard_coded/string_split.exp
> ===================================================================
> RCS file: tests/hard_coded/string_split.exp
> diff -N tests/hard_coded/string_split.exp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ tests/hard_coded/string_split.exp	2 Feb 2007 03:00:02 -0000
> @@ -0,0 +1,6 @@
> +hello:world:how:are:you!
> +hello<tab>world<tab>how<tab>are<tab><tab>you!
> +user<tab>group<tab>id1<tab>id2
> +x<tab>ay<tab>az
> +x<tab>a <tab>aax <tab> x
> +col1<tab>col2:val2<tab>col3<tab>
> Index: tests/hard_coded/string_split.m
> ===================================================================
> RCS file: tests/hard_coded/string_split.m
> diff -N tests/hard_coded/string_split.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ tests/hard_coded/string_split.m	2 Feb 2007 02:36:14 -0000
> @@ -0,0 +1,36 @@
> +:- module string_split.
> +:- interface.
> +:- import_module io.
> +
> +:- pred main(io::di, io::uo) is det.
> +
> +:- implementation.
> +
> +:- import_module string, char.
> +
> +main(!IO) :-
> +  io__write_list(
> +    split_at_separator(char__is_upper, "helloXworldXhowXareYyou!"),
> +    ":", io__write_string, !IO),
> +  io__nl(!IO),
> +  io__write_list(
> +    split_at_separator(char__is_whitespace, "hello world\thow 
> are\t\tyou!"),
> +    "<tab>", io__write_string, !IO),
> +  io__nl(!IO),
> +  io__write_list(
> +    split_at_char(':', "user:group:id1:id2"),
> +    "<tab>", io__write_string, !IO),
> +  io__nl(!IO),
> +  io__write_list(
> +    split_at_string("aa", "xaaayaaaz"),
> +    "<tab>", io__write_string, !IO),
> +  io__nl(!IO),
> +  io__write_list(
> +    split_at_string("aaa", "xaaaa aaaaax aaa x"),
> +    "<tab>", io__write_string, !IO),
> +  io__nl(!IO),
> +  io__write_list(
> +    split_at_string(":::", "col1:::col2:val2:::col3:::"),
> +    "<tab>", io__write_string, !IO),
> +  io__nl(!IO),
> +  true.
> Index: tests/hard_coded/string_various.exp
> ===================================================================
> RCS file: tests/hard_coded/string_various.exp
> diff -N tests/hard_coded/string_various.exp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ tests/hard_coded/string_various.exp	2 Feb 2007 02:59:24 -0000
> @@ -0,0 +1,3 @@
> +myfile
> +myfile
> +myfile.gz
> Index: tests/hard_coded/string_various.m
> ===================================================================
> RCS file: tests/hard_coded/string_various.m
> diff -N tests/hard_coded/string_various.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ tests/hard_coded/string_various.m	2 Feb 2007 02:58:29 -0000
> @@ -0,0 +1,18 @@
> +:- module string_various.
> +:- interface.
> +:- import_module io.
> +
> +:- pred main(io::di, io::uo) is det.
> +
> +:- implementation.
> +
> +:- import_module string, char.
> +
> +main(!IO) :-
> +  io__write_string(remove_suffix_if_present(".gz", "myfile"), !IO),
> +  io__nl(!IO),
> +  io__write_string(remove_suffix_if_present(".gz", "myfile.gz"), !IO),
> +  io__nl(!IO),
> +  io__write_string(remove_suffix_if_present(".gz", "myfile.gz.gz"), !IO),
> +  io__nl(!IO),
> +  true.
> 
> -- 
> Ondrej Bojar (mailto:obo at cuni.cz)
> http://www.cuni.cz/~obo
> --------------------------------------------------------------------------
> mercury-reviews mailing list
> Post messages to:       mercury-reviews at csse.unimelb.edu.au
> Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
> Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
> --------------------------------------------------------------------------
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
Previous message: [m-rev.] diff: (2nd try) string splitting routines
Next message: [m-rev.] diff: (2nd try) string splitting routines
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list