[m-rev.] diff: string splitting routines to string.m

Ralph Becket rafe at csse.unimelb.edu.au
Fri Feb 2 11:13:24 AEDT 2007


Ondrej Bojar, Friday,  2 February 2007:
> (May I commit this one?)
> 
> Estimated hours taken: 1.5
> 
> A few handy functions for splitting a string added.
> 
> library/string.m:
>     Added chomp/2, split_at_separator, split_at_char, split_at_string
> 
> tests/hard_coded/string_split.m:
>     A simple test of split_at_* functions.
> 
> tests/hard_coded/string_split.exp:
>     Expected results of the tests of split_at_* functions.
> 
> tests/hard_coded/string_strip.m:
>     Added testcase for chomp/2
> 
> tests/hard_coded/string_strip.exp:
>     Added results for chomp/2
> 
> tests/hard_coded/string_strip.exp2:
>     Removed the alternative expected result. (Don't know how to regenerate
>     this one.)
> 
> Index: library/string.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/string.m,v
> retrieving revision 1.254
> diff -u -r1.254 string.m
> --- library/string.m	18 Jan 2007 07:33:03 -0000	1.254
> +++ library/string.m	1 Feb 2007 07:32:18 -0000
> @@ -367,6 +367,11 @@
>      %
>  :- func string.chomp(string) = string.
> 
> +    % string.chomp(Tail, String):
> +    % `String' minus `Tail' if `String' ends with `Tail', `String' otherwise
> +    %
> +:- func string.chomp(string, string) = string.
> +

We already have the pred string.remove_suffix.  I would rather you added
a function version of string.remove_suffix (`chomp' isn't a good name,
even if it's used in millions of Perl scripts).
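
As a rough sketch of what I mean (the name without_suffix is only a
placeholder for illustration, not a proposal for the final name), a function
version can be a thin wrapper around the existing semidet predicate:

```mercury
:- module suffix_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

    % without_suffix(Suffix, String):
    % `String' minus `Suffix' if `String' ends with `Suffix', `String'
    % otherwise.  (Hypothetical name, used only for this sketch.)
    %
:- func without_suffix(string, string) = string.

without_suffix(Suffix, String) = Out :-
    ( if string.remove_suffix(String, Suffix, Prefix) then
        Out = Prefix
      else
        Out = String
    ).

main(!IO) :-
    % The suffix is present here, so it is removed.
    io.write_string(without_suffix(".m", "string.m"), !IO),
    io.nl(!IO),
    % The suffix is absent here, so the string comes back unchanged.
    io.write_string(without_suffix(".exp", "string.m"), !IO),
    io.nl(!IO).
```

The argument order follows the chomp/2 in the diff (suffix first), which is
the opposite of string.remove_suffix/3; whether that is the right order for
the library is a separate question.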

>      % string.lstrip(String):
>      % `String' minus any initial whitespace characters.
>      %
> @@ -555,6 +560,8 @@
>      % string.words_separator(char.is_whitespace, " the cat  sat on the  mat") =
>      %   ["the", "cat", "sat", "on", "the", "mat"]
>      %
> +    % Note the difference to string.split_at_separator
> +    %
>  :- func string.words_separator(pred(char), string) = list(string).
>  :- mode string.words_separator(pred(in) is semidet, in) = out is det.
> 
> @@ -563,6 +570,33 @@
>      %
>  :- func string.words(string) = list(string).
> 
> +    % string.split_at_separator(SepP, String) returns the list of
> +    % substrings of String (in first to last order) that are delimited
> +    % by chars matched by SepP. For example,
> +    %
> +    % string.split_at_separator(char.is_whitespace, " the cat  sat on the  mat")
> +    %   = ["", "the", "cat", "", "sat", "on", "the", "", "mat"]
> +    %
> +    % Note the difference to string.words_separator
> +    %
> +:- func string.split_at_separator(pred(char), string) = list(string).
> +:- mode string.split_at_separator(pred(in) is semidet, in) = out is det.

Is this generally useful enough to go in the string module?  (I have no
idea one way or the other.)

> +    % string.split_at_char(Char, String) returns the list of substrings
> +    % ("fields") of String as delimited by Char. For example,
> +    %
> +    % string.split_at_char('|', "|fld2|fld3") = ["", "fld2", "fld3"]
> +    %
> +:- func string.split_at_char(char, string) = list(string).

The documentation for this might be better written as

	% string.split_at_char(Char, String) =
	%       string.split_at_separator(unify(Char), String).

> +    % string.split_at_string(Separator, String) returns the list of substrings
> +    % of String that are delimited by Separator. For example,
> +    %
> +    % string.split_at_string("|||", "|||fld2|||fld3")
> +    %  = ["", "fld2", "fld3"]
> +    %
> +:- func string.split_at_string(string, string) = list(string).

What does string.split_at_string("aaa", "xaaaa aaaaax aaa x") return?
Is this useful enough to go in string.m?
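
To make the question concrete, here is a self-contained sketch that copies the
algorithm from the diff further down (leftmost match, then resume the search
just past that match) under the hypothetical name demo_split_at_string. It
uses only the library calls that already appear in the patch; treat it as an
illustration of the posted code's behaviour, not as a specification.

```mercury
:- module split_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.
:- import_module string.

    % Hypothetical stand-in for the proposed string.split_at_string/2.
    % Assumes a non-empty separator; the empty case is not handled here.
    %
:- func demo_split_at_string(string, string) = list(string).

demo_split_at_string(Needle, Total) =
    demo_split_from(0, string.length(Needle), Needle, Total).

:- func demo_split_from(int, int, string, string) = list(string).

demo_split_from(StartAt, NeedleLen, Needle, Total) = Out :-
    ( if
        string.sub_string_search_start(Total, Needle, StartAt, NeedlePos)
      then
        % Keep the text before the match, then continue after the match,
        % so overlapping occurrences are never reported.
        BeforeNeedle = string.substring(Total, StartAt, NeedlePos - StartAt),
        Tail = demo_split_from(NeedlePos + NeedleLen, NeedleLen, Needle,
            Total),
        Out = [BeforeNeedle | Tail]
      else
        % No further match: the rest of the string is the last field.
        string.split(Total, StartAt, _, Last),
        Out = [Last]
    ).

main(!IO) :-
    io.write(demo_split_at_string("aaa", "xaaaa aaaaax aaa x"), !IO),
    io.nl(!IO).
```

Tracing this by hand, the matches are taken at offsets 1, 6 and 13, giving the
fields "x", "a ", "aax " and " x". The leftover a's from the longer runs end
up inside the neighbouring fields, which is the behaviour the documentation
would need to spell out.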

> %------------------------------------------------------------------------------%
> 
> +string.split_at_separator(DelimPred, InStr) = OutStrs :-
> +    Count = string.length(InStr),
> +    split_at_separator2(DelimPred, InStr, Count, Count, [], OutStrs).
> +
> +:- pred split_at_separator2(pred(char), string, int, int,
> +    list(string), list(string)).
> +:- mode split_at_separator2(pred(in) is semidet, in, in, in, in, out) is det.

Single-mode predicates should use pred-mode syntax:

:- pred split_at_separator2(pred(char)::in(pred(in) is semidet), string::in,
	int::in, int::in, list(string)::in, list(string)::out) is det.


> +split_at_separator2(DelimPred, Str, I, ThisSegEnd, ITail, OTail) :-
> +    % walk Str backwards extending accumulated list of chunks as chars
> +    % matching DelimPred are found
> +    (
> +    if I < 0
> +    then % we're at the beginning

    if I < 0 then % we're at the beginning

> +        (
> +        if ThisSegEnd<0
> +        then OTail = ["" | ITail]
> +        else
> +            ThisSeg = string.unsafe_substring(Str, 0, ThisSegEnd+1),
> +            OTail = [ThisSeg | ITail]
> +        )

        ( if ThisSegEnd<0 then
            OTail = ["" | ITail]
          else
            ThisSeg = string.unsafe_substring(Str, 0, ThisSegEnd+1),
            OTail = [ThisSeg | ITail]
        )

> +    else
> +        C = string.unsafe_index(Str, I),
> +        (
> +        if DelimPred(C)
> +        then % chop here
> +            ThisSeg = string.unsafe_substring(Str, I+1, ThisSegEnd-I),
> +            TTail = [ ThisSeg | ITail ],
> +            split_at_separator2(DelimPred, Str, I-1, I-1, TTail, OTail)
> +        else % extend current segment
> +            split_at_separator2(DelimPred, Str, I-1, ThisSegEnd, ITail, OTail)
> +        )

Ditto with the formatting here.

In general you should adhere to the coding style used in the module you
are editing.

> +    ).
> +
> +%------------------------------------------------------------------------------%
> +
> +string.split_at_char(C, String)
> +    = string.split_at_separator((pred(X::in)is semidet:-X=C), String).

Put the `=' after the head, not before the result.

> +
> +%------------------------------------------------------------------------------%
> +
> +split_at_string(Needle, Total)
> +    = split_at_string(0, length(Needle), Needle, Total).

Same comment about the `='.

> +
> +:- func split_at_string(int, int, string, string) = list(string).
> +split_at_string(StartAt, NeedleLen, Needle, Total) = Out :-
> +    if sub_string_search_start(Total, Needle, StartAt, NeedlePos)
> +    then
> +        BeforeNeedle = substring(Total, StartAt, NeedlePos-StartAt),
> +        Tail = split_at_string(NeedlePos+NeedleLen, NeedleLen, Needle, Total),
> +        Out = [BeforeNeedle | Tail]
> +    else
> +        string__split(Total, StartAt, _skip, Last),
> +        Out = [Last].

if-then-else should be in parentheses and formatted in a standard way.

> +
> +%------------------------------------------------------------------------------%
> +
>      % preceding_boundary(SepP, String, I) returns the largest index J =< I
>      % in String of the char that is SepP and min(-1, I) if there is no such J.
>      % preceding_boundary/3 is intended for finding (in reverse) consecutive
> @@ -4154,6 +4244,13 @@
> 
> 
> %-----------------------------------------------------------------------------%
> 
> +chomp(Suffix, In) = Out :-
> +  if string__remove_suffix(In, Suffix, Prefix)
> +  then Out = Prefix
> +  else Out = In.

Ditto.

Cheers,
-- Ralph
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------


