[m-rev.] for review: add string.word_wrap/2

Ian MacLarty maclarty at cs.mu.OZ.AU
Mon Mar 28 02:10:52 AEST 2005


On Mon, Mar 14, 2005 at 09:48:21AM +1100, Julien Fischer wrote:
> 
> On Sun, 13 Mar 2005, Ian MacLarty wrote:
> 
> > On Sun, Mar 13, 2005 at 03:41:23PM +1100, Julien Fischer wrote:
> >
> > Okay:
> > Index: NEWS
> > ===================================================================
> > RCS file: /home/mercury1/repository/mercury/NEWS,v
> > retrieving revision 1.375
> > diff -u -r1.375 NEWS
> > --- NEWS	25 Feb 2005 08:02:09 -0000	1.375
> > +++ NEWS	13 Mar 2005 08:36:56 -0000
> > @@ -1,3 +1,8 @@
> > +NEWS since Mercury release 0.12.0:
> > +----------------------------------
> 
> There cannot yet be any NEWS since release 0.12.0 because that's still
> in the future.  I suggest saying since the 0.12 fork.
> 

Okay.

> > > > +	% Wrapped is Str with newlines inserted between words so that at most
> > > > +	% N characters appear on a line and each line contains as many
> > > > +	% whole words as possible.  If any one word exceeds N characters in
> > > > +	% length then it will be broken over two (or more) lines.
> > > I think that you should be able to insert a hyphen between the two parts
> > > of the word in this case.  (Perhaps, make this an optional argument).
> >
> > I don't think this'll be particularly useful since words that are longer than
> > a line typically aren't real words anyway.
> 
> That rather depends on how large the line width is.  What if the user
> sets it to, say, 16, which is sensible in some contexts (e.g. a small
> text window) but still smaller than some real words?
> 
> > > > +		string.length(Str) =< N
> > > It may be worth keeping track of the length of the strings as you go,
> > > rather than recomputing it all the time.
> > >
> >
> > I doubt it, since it won't be very often that a word needs to be broken up over
> > multiple lines (with English text and a sensible line width at least).
> 
> Again, the notion of sensible line width is application dependent.  Since
> you are putting this in the standard library, as opposed to the debugger,
> I think that it should be as general as possible.
> 

Alright then, here is the new version of word_wrap with an optional word
separator argument:

Index: library/string.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.m,v
retrieving revision 1.230
diff -u -r1.230 string.m
--- library/string.m	28 Feb 2005 03:39:38 -0000	1.230
+++ library/string.m	27 Mar 2005 16:02:19 -0000
+
+	% word_wrap(Str, N) = Wrapped.
+	% Wrapped is Str with newlines inserted between words so that at most 
+	% N characters appear on a line and each line contains as many
+	% whole words as possible.  If any one word exceeds N characters in 
+	% length then it will be broken over two (or more) lines. 
+ 	% Sequences of whitespace characters are replaced by a single space.
+	%
+:- func string__word_wrap(string, int) = string.
+	
+	% word_wrap(Str, N, WordSeparator) = Wrapped.
+	% word_wrap/3 is like word_wrap/2, except that words that need to be
+	% broken up over multiple lines have WordSeparator inserted between
+	% each piece.  If the length of WordSeparator is greater than or equal
+	% to N, then no separator is used.
+	%
+:- func string__word_wrap(string, int, string) = string.
+
 %-----------------------------------------------------------------------------%
 
 :- implementation.
@@ -4714,6 +4727,120 @@
 
 %-----------------------------------------------------------------------------%
 
+string__word_wrap(Str, N) = string__word_wrap(Str, N, "").
+
+string__word_wrap(Str, N, WordSep) = Wrapped :-
+	Words = string.words(char.is_whitespace, Str),
+	SepLen = string.length(WordSep),
+	( SepLen < N ->
+		string.word_wrap_2(Words, WordSep, SepLen, 1, N, [], Wrapped)
+	;
+		string.word_wrap_2(Words, "", 0, 1, N, [], Wrapped)
+	).
+
+:- pred word_wrap_2(list(string)::in, string::in, int::in, int::in, int::in, 
+	list(string)::in, string::out) is det.
+
+word_wrap_2([], _, _, _, _, RevStrs, 
+	string.join_list("", list.reverse(RevStrs))).
+
+	% Col is the column where the next character should be written if there
+	% is space for a whole word.
+word_wrap_2([Word | Words], WordSep, SepLen, Col, N, Prev, Wrapped) :-
+	WordLen = string.length(Word),
+	(
+		% We are on the first column and the length of the word
+		% is less than the line length.
+		Col = 1, WordLen < N
+	->
+		NewCol = Col + WordLen,
+		WrappedRev = [Word | Prev],
+		NewWords = Words
+	;
+		% The word takes up the whole line.
+		Col = 1, WordLen = N
+	->
+		%
+		% We only put a newline if there are more words to follow.
+		%
+		NewCol = 1,
+		( 
+			Words = [],
+			WrappedRev = [Word | Prev]
+		; 
+			Words = [_ | _],
+			WrappedRev = ["\n", Word | Prev]
+		),
+		NewWords = Words
+	;
+		% If we add a space and the current word to the line we'll
+		% still be within the line length limit.
+		Col + WordLen < N
+	->
+		NewCol = Col + WordLen + 1,
+		WrappedRev = [Word, " " | Prev],
+		NewWords = Words
+	;
+		% Adding the word and a space takes us to the end of the
+		% line exactly.
+		Col + WordLen = N
+	->
+		%
+		% We only put a newline if there are more words to follow.
+		%
+		NewCol = 1,
+		( 
+			Words = [],
+			WrappedRev = [Word, " " | Prev]
+		; 
+			Words = [_ | _],
+			WrappedRev = ["\n", Word, " " | Prev]
+		),
+		NewWords = Words
+	;
+		%
+		% Adding the word would take us over the line limit.
+		%
+		(
+			Col = 1
+		->
+			%
+			% Break up words that are too big to fit on a line.
+			%
+			RevPieces = break_up_string_reverse(Word, N - SepLen, 
+				[]),
+			(
+				RevPieces = [LastPiece | Rest]
+			;
+				RevPieces = [],
+				error("string__word_wrap_2: no pieces")
+			),
+			RestWithSep = list.map(func(S) = S ++ WordSep ++ "\n", 
+				Rest),
+			NewCol = 1,
+			WrappedRev = list.append(RestWithSep, Prev),
+			NewWords = [LastPiece | Words]
+		;
+			NewCol = 1,
+			WrappedRev = ["\n" | Prev],
+			NewWords = [Word | Words]
+		)
+	),
+	word_wrap_2(NewWords, WordSep, SepLen, NewCol, N, WrappedRev, Wrapped).
+
+:- func break_up_string_reverse(string, int, list(string)) = list(string).
+
+break_up_string_reverse(Str, N, Prev) = Strs :-
+	(
+		string.length(Str) =< N
+	->
+		Strs = [Str | Prev]
+	;
+		string.split(Str, N, Left, Right),
+		Strs = break_up_string_reverse(Right, N, [Left | Prev])
+	).
+
+%-----------------------------------------------------------------------------%
 :- end_module string.
 
 %------------------------------------------------------------------------------%
Index: tests/general/string_test.exp
===================================================================
RCS file: /home/mercury1/repository/tests/general/string_test.exp,v
retrieving revision 1.3
diff -u -r1.3 string_test.exp
--- tests/general/string_test.exp	4 Feb 2005 05:55:16 -0000	1.3
+++ tests/general/string_test.exp	27 Mar 2005 15:49:43 -0000
@@ -17,3 +17,71 @@
 aaa|1111111|  1,300,000.00
 b  |       |      9,999.00
 cc |    333|123,456,789.99
+
+Wrapped string:
+*aaaaaaaaa
+aaaaaaaaaa
+a* bbbbb
+bbb b
+ccccc c c
+c cccc c c
+c c ccccc
+ccc cccc c
+ccc ccc
+ccc
+*ddddddddd
+dddddddddd
+dddddddddd
+dddddddddd
+dddddddddd
+ddddd* eee
+Wrapped string with hyphens:
+*aaaaaaaa-
+aaaaaaaaa-
+aaa* bbbbb
+bbb b
+ccccc c c
+c cccc c c
+c c ccccc
+ccc cccc c
+ccc ccc
+ccc
+*dddddddd-
+ddddddddd-
+ddddddddd-
+ddddddddd-
+ddddddddd-
+ddddddddd-
+d* eee
+Wrapped string with dots:
+*a...
+aa...
+aa...
+a*
+bbbbb
+bbb b
+ccccc
+c c c
+cccc
+c c c
+c
+ccccc
+ccc
+cccc
+c ccc
+ccc
+ccc
+*d...
+dd...
+dd...
+dd...
+dd...
+dd...
+dd...
+d*
+eee
+Wrapped string where seperator is too long:
+wh
+at
+ev
+er
\ No newline at end of file
Index: tests/general/string_test.m
===================================================================
RCS file: /home/mercury1/repository/tests/general/string_test.m,v
retrieving revision 1.5
diff -u -r1.5 string_test.m
--- tests/general/string_test.m	4 Feb 2005 05:55:16 -0000	1.5
+++ tests/general/string_test.m	27 Mar 2005 15:48:09 -0000
@@ -62,6 +62,33 @@
 		right(["1111111", "", "333"]), right(["1,300,000.00", 
 		"9,999.00", "123,456,789.99"])], "|") ++ "\n" },
 	write_string(Table),
+	{ Wrapped = string.word_wrap("*aaaaaaaaaaaaaaaaaaaa*  bbbbb bbb  b\t"
+		++ " ccccc c c c   cccc c c c c ccccc ccc cccc c  ccc ccc ccc "
+		++ "*dddddddddddddddddddddddddddddddddddddddddddddddddddddd*"
+		++ "                                                    eee",
+		10) },
+	{ WrappedHyphen = 
+		string.word_wrap("*aaaaaaaaaaaaaaaaaaaa*  bbbbb bbb  b\t"
+		++ " ccccc c c c   cccc c c c c ccccc ccc cccc c  ccc ccc ccc "
+		++ "*dddddddddddddddddddddddddddddddddddddddddddddddddddddd*"
+		++ "                                                    eee",
+		10, "-") },
+	{ WrappedDots = 
+		string.word_wrap("*aaaaaa*  bbbbb bbb  b\t"
+		++ " ccccc c c c   cccc c c c c ccccc ccc cccc c  ccc ccc ccc "
+		++ "*dddddddddddddd*"
+		++ "                                                    eee",
+		5, "...") },
+	{ SepTooLong = 
+		string.word_wrap("whatever", 2, "...") },
+	write_string("\nWrapped string:\n"),
+	write_string(Wrapped),
+	write_string("\nWrapped string with hyphens:\n"),
+	write_string(WrappedHyphen),
+	write_string("\nWrapped string with dots:\n"),
+	write_string(WrappedDots),
+	write_string("\nWrapped string where seperator is too long:\n"),
+	write_string(SepTooLong),
 	[].
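
For reference, a minimal sketch of calling the new function from a client
module, assuming the patch above is applied to the library.  The module name
wrap_demo is made up for illustration.  The second call mirrors the
"seperator is too long" test case above; the first is an extra case traced
by hand through word_wrap_2.

```mercury
:- module wrap_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

main(!IO) :-
    % "-" is shorter than the line width 4, so an over-long word is
    % split into pieces of N - SepLen = 3 characters, each followed by "-".
    io.write_string(string.word_wrap("whatever", 4, "-"), !IO),
    io.nl(!IO),
    % "..." is not shorter than the line width 2, so no separator is
    % used and the word is split into pieces of 2 characters.
    io.write_string(string.word_wrap("whatever", 2, "..."), !IO),
    io.nl(!IO).
```

Going by the code in the patch, the first call should give the lines
"wha-", "tev-", "er" and the second "wh", "at", "ev", "er".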

Ian.