[m-rev.] diff: string splitting routines to string.m
Ralph Becket
rafe at csse.unimelb.edu.au
Fri Feb 2 11:13:24 AEDT 2007
Ondrej Bojar, Friday, 2 February 2007:
> (May I commit this one?)
>
> Estimated hours taken: 1.5
>
> Added a few handy functions for splitting strings.
>
> library/string.m:
> Added chomp/2, split_at_separator, split_at_char, split_at_string
>
> tests/hard_coded/string_split.m:
> A simple test of split_at_* functions.
>
> tests/hard_coded/string_split.exp:
> Expected results of the tests of split_at_* functions.
>
> tests/hard_coded/string_strip.m:
> Added testcase for chomp/2
>
> tests/hard_coded/string_strip.exp:
> Added results for chomp/2
>
> tests/hard_coded/string_strip.exp2:
> Removed the alternative expected result. (I don't know how to regenerate
> this one.)
>
> Index: library/string.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/string.m,v
> retrieving revision 1.254
> diff -u -r1.254 string.m
> --- library/string.m 18 Jan 2007 07:33:03 -0000 1.254
> +++ library/string.m 1 Feb 2007 07:32:18 -0000
> @@ -367,6 +367,11 @@
> %
> :- func string.chomp(string) = string.
>
> + % string.chomp(Tail, String):
> + % `String' minus `Tail' if `String' ends with `Tail', `String' otherwise.
> + %
> +:- func string.chomp(string, string) = string.
> +
We already have the pred string.remove_suffix. I would rather you added
a function version of string.remove_suffix (`chomp' isn't a good name,
even if it's used in millions of Perl scripts).
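For illustration, a minimal sketch of such a function version, using the hypothetical name remove_suffix_or_self (any name other than `chomp' would do). It is only a thin wrapper around the existing semidet pred string.remove_suffix/3:

```mercury
:- module suffix_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

    % remove_suffix_or_self(Suffix, String) = Result:
    % Result is String minus Suffix if String ends with Suffix,
    % and String itself otherwise.
    % (Hypothetical name; a function wrapper around string.remove_suffix/3.)
    %
:- func remove_suffix_or_self(string, string) = string.

remove_suffix_or_self(Suffix, String) = Result :-
    ( if string.remove_suffix(String, Suffix, Prefix) then
        Result = Prefix
    else
        Result = String
    ).

main(!IO) :-
    % Suffix present: prints "string".
    io.write_string(remove_suffix_or_self(".m", "string.m"), !IO),
    io.nl(!IO),
    % Suffix absent: the input comes back unchanged, "string.c".
    io.write_string(remove_suffix_or_self(".m", "string.c"), !IO),
    io.nl(!IO).
```

Because the function never fails, it suits the common "strip it if it's there" use without callers writing their own if-then-else around the pred.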
> % string.lstrip(String):
> % `String' minus any initial whitespace characters.
> %
> @@ -555,6 +560,8 @@
> % string.words_separator(char.is_whitespace, " the cat  sat on the  mat") =
> % ["the", "cat", "sat", "on", "the", "mat"]
> %
> + % Note the difference to string.split_at_separator
> + %
> :- func string.words_separator(pred(char), string) = list(string).
> :- mode string.words_separator(pred(in) is semidet, in) = out is det.
>
> @@ -563,6 +570,33 @@
> %
> :- func string.words(string) = list(string).
>
> + % string.split_at_separator(SepP, String) returns the list of
> + % substrings of String (in first to last order) that are delimited
> + % by chars matched by SepP. For example,
> + %
> + % string.split_at_separator(char.is_whitespace, " the cat  sat on the  mat")
> + % = ["", "the", "cat", "", "sat", "on", "the", "", "mat"]
> + %
> + % Note the difference to string.words_separator
> + %
> +:- func string.split_at_separator(pred(char), string) = list(string).
> +:- mode string.split_at_separator(pred(in) is semidet, in) = out is det.
Is this generally useful enough to go in the string module? (I have no
idea one way or the other.)
> + % string.split_at_char(Char, String) returns the list of substrings
> + % ("fields") of String as delimited by Char. For example,
> + %
> + % string.split_at_char('|', "|fld2|fld3") = ["", "fld2", "fld3"]
> + %
> +:- func string.split_at_char(char, string) = list(string).
The documentation for this might be better written as
% string.split_at_char(Char, String) =
% string.split_at_separator(unify(Char), String).
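A minimal sketch of the semantics under discussion, using the hypothetical names split_fields and split_on_char. Rather than the index-based backward walk in the patch, it walks the chars last to first with list.foldr; the lambda passed by split_on_char is the same test as unify(Char):

```mercury
:- module split_sep_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module list.
:- import_module string.

    % split_fields(SepP, String) = Fields:
    % Every char satisfying SepP ends a field, so consecutive separators
    % produce empty fields (unlike string.words_separator).
    %
:- func split_fields(pred(char), string) = list(string).
:- mode split_fields(pred(in) is semidet, in) = out is det.

split_fields(SepP, String) = Fields :-
    % list.foldr visits the chars last to first, so prepending each
    % non-separator char builds the current field in the right order.
    list.foldr(step(SepP), string.to_char_list(String), {[], []},
        {Current, Done}),
    Fields = [string.from_char_list(Current) | Done].

:- pred step(pred(char)::in(pred(in) is semidet), char::in,
    {list(char), list(string)}::in, {list(char), list(string)}::out) is det.

step(SepP, C, {Current, Done}, Acc) :-
    ( if SepP(C) then
        % Close the current field and start a new, empty one.
        Acc = {[], [string.from_char_list(Current) | Done]}
    else
        Acc = {[C | Current], Done}
    ).

    % The lambda below is the same test as unify(Char).
    %
:- func split_on_char(char, string) = list(string).

split_on_char(Char, String) =
    split_fields((pred(C::in) is semidet :- C = Char), String).

main(!IO) :-
    io.write(split_fields(char.is_whitespace, " the cat  sat"), !IO),
    io.nl(!IO),
    io.write(string.words_separator(char.is_whitespace, " the cat  sat"),
        !IO),
    io.nl(!IO),
    io.write(split_on_char('|', "|fld2|fld3"), !IO),
    io.nl(!IO).
```

For " the cat  sat" split_fields yields ["", "the", "cat", "", "sat"], whereas string.words_separator yields ["the", "cat", "sat"]; split_on_char('|', "|fld2|fld3") yields ["", "fld2", "fld3"].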
> + % string.split_at_string(Separator, String) returns the list of substrings
> + % of String that are delimited by Separator. For example,
> + %
> + % string.split_at_string("|||", "|||fld2|||fld3")
> + % = ["", "fld2", "fld3"]
> + %
> +:- func string.split_at_string(string, string) = list(string).
What does string.split_at_string("aaa", "xaaaa aaaaax aaa x") return?
Is this useful enough to go in string.m?
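To make the question concrete, a minimal sketch under the assumption that split_at_string keeps the left-to-right, non-overlapping matching of the proposed code. split_on_string is a hypothetical name; it recurses on the remaining suffix rather than tracking an offset, and returning [String] for an empty separator (which would otherwise never advance the scan) is this sketch's own choice:

```mercury
:- module split_string_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

    % split_on_string(Sep, String) = Fields:
    % Cut String at each non-overlapping occurrence of Sep, scanning
    % left to right. An empty Sep returns [String] (an assumption of
    % this sketch; otherwise the scan would never advance).
    %
:- func split_on_string(string, string) = list(string).

split_on_string(Sep, String) = Fields :-
    ( if Sep = "" then
        Fields = [String]
    else if string.sub_string_search(String, Sep, Pos) then
        % Before is everything up to the match; After starts just past it.
        string.split(String, Pos, Before, SepAndAfter),
        string.split(SepAndAfter, string.length(Sep), _, After),
        Fields = [Before | split_on_string(Sep, After)]
    else
        Fields = [String]
    ).

main(!IO) :-
    % ["", "fld2", "fld3"], matching the documentation example.
    io.write(split_on_string("|||", "|||fld2|||fld3"), !IO),
    io.nl(!IO),
    % The case asked about: ["x", "a ", "aax ", " x"].
    io.write(split_on_string("aaa", "xaaaa aaaaax aaa x"), !IO),
    io.nl(!IO).
```

Traced by hand, each run of four or five a's yields one match plus leftover a's, and those leftovers end up at the start of the following field ("a " and "aax ").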
> %------------------------------------------------------------------------------%
>
> +string.split_at_separator(DelimPred, InStr) = OutStrs :-
> + Count = string.length(InStr),
> + split_at_separator2(DelimPred, InStr, Count - 1, Count - 1, [], OutStrs).
> +
> +:- pred split_at_separator2(pred(char), string, int, int,
> + list(string), list(string)).
> +:- mode split_at_separator2(pred(in) is semidet, in, in, in, in, out) is det.
Single-mode predicates should use pred-mode syntax:
:- pred split_at_separator2(pred(char)::in(pred(in) is semidet), string::in,
int::in, int::in, list(string)::in, list(string)::out) is det.
> +split_at_separator2(DelimPred, Str, I, ThisSegEnd, ITail, OTail) :-
> + % walk Str backwards extending accumulated list of chunks as chars
> + % matching DelimPred are found
> + (
> + if I < 0
> + then % we're at the beginning
if I < 0 then % we're at the beginning
> + (
> + if ThisSegEnd<0
> + then OTail = ["" | ITail]
> + else
> + ThisSeg = string.unsafe_substring(Str, 0, ThisSegEnd+1),
> + OTail = [ThisSeg | ITail]
> + )
( if ThisSegEnd<0 then
OTail = ["" | ITail]
else
ThisSeg = string.unsafe_substring(Str, 0, ThisSegEnd+1),
OTail = [ThisSeg | ITail]
)
> + else
> + C = string.unsafe_index(Str, I),
> + (
> + if DelimPred(C)
> + then % chop here
> + ThisSeg = string.unsafe_substring(Str, I+1, ThisSegEnd-I),
> + TTail = [ ThisSeg | ITail ],
> + split_at_separator2(DelimPred, Str, I-1, I-1, TTail, OTail)
> + else % extend current segment
> + split_at_separator2(DelimPred, Str, I-1, ThisSegEnd, ITail, OTail)
> + )
Ditto with the formatting here.
In general you should adhere to the coding style used in the module you
are editing.
> + ).
> +
> +%------------------------------------------------------------------------------%
> +
> +string.split_at_char(C, String)
> + = string.split_at_separator((pred(X::in)is semidet:-X=C), String).
Put the `=' after the head, not before the result.
> +
> +%------------------------------------------------------------------------------%
> +
> +split_at_string(Needle, Total)
> + = split_at_string(0, length(Needle), Needle, Total).
Ditto for the `='.
> +
> +:- func split_at_string(int, int, string, string) = list(string).
> +split_at_string(StartAt, NeedleLen, Needle, Total) = Out :-
> + if sub_string_search_start(Total, Needle, StartAt, NeedlePos)
> + then
> + BeforeNeedle = substring(Total, StartAt, NeedlePos-StartAt),
> + Tail = split_at_string(NeedlePos+NeedleLen, NeedleLen, Needle, Total),
> + Out = [BeforeNeedle | Tail]
> + else
> + string__split(Total, StartAt, _skip, Last),
> + Out = [Last].
if-then-else should be in parentheses and formatted in a standard way.
> +
> +%------------------------------------------------------------------------------%
> +
> % preceding_boundary(SepP, String, I) returns the largest index J =< I
> % in String of the char that is SepP and min(-1, I) if there is no such J.
> % preceding_boundary/3 is intended for finding (in reverse) consecutive
> @@ -4154,6 +4244,13 @@
>
>
> %-----------------------------------------------------------------------------%
>
> +chomp(Suffix, In) = Out :-
> + if string__remove_suffix(In, Suffix, Prefix)
> + then Out = Prefix
> + else Out = In.
Ditto.
Cheers,
-- Ralph
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------