[m-rev.] Fixes to pprint and change for --dump-mlds

Mark Brown dougl at cs.mu.OZ.AU
Wed Oct 24 17:56:28 AEST 2001


On 24-Oct-2001, Ralph Becket <rafe at cs.mu.OZ.AU> wrote:
> Mark Brown, Wednesday, 24 October 2001:
> > This looks good.  I haven't finished the review yet, but here is a first
> > round of comments.
> > 
> > On 23-Oct-2001, Ralph Becket <rafe at cs.mu.OZ.AU> wrote:
> > > Estimated hours taken: 2.5
> > > Branches: main
> > > 
> > > Fixed a performance bug in pprint and a couple of formatting bugs.
> > 
> > There should be regression tests for the formatting bug fixes.  Does
> > this change affect any of the existing tests?  If so, the expected
> > output of the test may need to be updated.
> 
> There are currently no regression tests that involve pprint.  I'll
> sort some out (see comment below).
> 
> > >  best(W, K, X)           = Best
> > >  :-
> > > -    be(W, K, ["" - X], Best, []).
> > > +    be(W, K, ["" - X], [], RevBest),
> > > +    Best = list__reverse(RevBest).
> > 
> > This raises a (minor) issue though: is the reversed list still of type
> > simple_doc?  I don't think it should be, since it is intended to be
> > interpreted differently.  So you could define a rev_simple_doc type, or
> > just use list(string) in the type declarations of be/5 and push_string/3.
> > 
> > Also, the new list order should be documented, e.g. in the comment above
> > the code for be/5.
> 
> Done.
> 
> > This change is fine, provided you have tested that the performance is
> > acceptable.  Doing an MLDS dump of compiler/make_hlds.m might make a
> > good test case.
> 
> I tried that.  On ceres it took about 2-5mins per mlds_dump file (there
> are six), each of which contained more than 1.1million lines!  The
> compiler hit a peak memory usage of 450MBytes without thrashing.
> 
> I'd say that was acceptable.

Okay.

> 
> > Please address the above issues and post another diff, including test
> > cases for where formatting used to go wrong.  I'll review the remaining
> > changes after that (since I'll be able to review the formatting changes
> > more effectively if I review the test cases at the same time).
> 
> It would probably be more useful to put in current correct output tests.

I think we're talking about the same thing.  I want to see test cases
that would have failed before committing this change, but which now
produce correct output.

> I'd like to put that off until I've finished another change I'm working
> on to do with formatting infix operators.

Please save that for a separate change (which I'll also be happy to
review).  If a reviewer is reviewing a moving target, there is a
significant risk that a change will enter the diff without the reviewer
noticing, which means that that part of the change may go completely
unreviewed.  By all means start working on the next change while waiting
for review, if you like, but you should do so in a separate workspace
and merge the two together when the first change passes review and is
committed.  Although this will involve short term cost, there will be
long term benefits for reviewers and coders alike.

(If you have already started changing the current workspace, don't worry
too much about undoing those this time.  But please consider keeping
additional changes separate in future.)

> 
> - Ralph
> 
> 
> Index: pprint.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/pprint.m,v
> retrieving revision 1.5
> diff -u -r1.5 pprint.m
> --- pprint.m	26 Apr 2001 14:55:13 -0000	1.5
> +++ pprint.m	24 Oct 2001 04:31:03 -0000
> @@ -217,13 +217,13 @@
>  :- func braces(doc)                     = doc.
>  
>      % separated(PP, Sep, [X1,...,Xn]) =
> -    %   PP(X1) `<>` (Sep `<>` ... (Sep `<>` PP(Xn)) ... )
> +    %   PP(X1) `<>` group(Sep `<>` ... group(Sep `<>` PP(Xn)) ... )
>      %
>      % Note that if you want to pack as many things on one
>      % line as possible with some sort of separator, the
>      % following example illustrates a suitable idiom:
>      %
> -    %   separated(PP, group(comma_space_line), Xs)
> +    %   separated(PP, comma_space_line, Xs)
>      %
>  :- func separated(func(T) = doc, doc, list(T)) = doc.
>  

I didn't explicitly ask for it, but our convention is to post relative
diffs on the second and subsequent rounds of reviewing (you can produce
these from two existing diffs using 'interdiff').  Since this diff is
small, it's not a great problem, but in future it would be better to
post relative diffs so that the changes since the last round of reviewing
can be seen more easily.

> @@ -305,6 +305,9 @@
>  :- type simple_doc
>      ==      list(string).
>  
> +:- type rev_simple_doc
> +    ==      simple_doc.
> +
>  %------------------------------------------------------------------------------%
>  
>  nil                     = 'NIL'.
> @@ -352,7 +355,8 @@
>  
>  best(W, K, X)           = Best
>  :-
> -    be(W, K, ["" - X], Best, []).
> +    be(W, K, ["" - X], [], RevBest),
> +    Best = list__reverse(RevBest).
>  
>  %------------------------------------------------------------------------------%
>  
> @@ -362,18 +366,40 @@
>      % running times under a strict language.  The second important
>      % change is that be/5 is now a predicate that accumulates its output
>      % as a list of strings, doing away with the need for a more elaborate
> -    % simple_doc type.
> +    % simple_doc type.  The accumulated strings must be reversed to obtain
> +    % the printing order.
> +    %
> +    % W is the number of characters on a line.
> +    % K is the number of characters for output on the current line so far.
> +    % I is the current indentation string as affected by NEST and LABEL.
> +
> +:- pred be(int, int, list(pair(string, doc)), rev_simple_doc, rev_simple_doc).
> +:- mode be(in, in, in, in, out) is det.
> +
> +be(_, _, [])                      -->
> +    [].
> +
> +be(W, K, [_ - 'NIL'         | Z]) -->
> +    be(W, K, Z).
>  
> -:- pred be(int, int, list(pair(string, doc)), simple_doc, simple_doc).
> -:- mode be(in, in, in, out, in) is det.
> +be(W, K, [I - 'SEQ'(X, Y)   | Z]) -->
> +    be(W, K, [I - X, I - Y | Z]).
> +
> +be(W, K, [I - 'NEST'(J, X)  | Z]) -->
> +    be(W, K, [extend(I, J) - X | Z]).
> +
> +be(W, K, [I - 'LABEL'(L, X) | Z]) -->
> +    be(W, K, [(I ++ L) - X | Z]).
> +
> +be(W, K, [_ - 'TEXT'(S)     | Z]) -->
> +    push_string(S),
> +    be(W, (K + string__length(S)), Z).
> +
> +be(W, _, [I - 'LINE'        | Z]) -->
> +    push_string("\n"),
> +    push_string(I),
> +    be(W, string__length(I), Z).
>  
> -be(_, _, [], Out, In)             :-  Out = list__reverse(In).
> -be(W, K, [_ - 'NIL'         | Z]) --> be(W, K, Z).
> -be(W, K, [I - 'SEQ'(X, Y)   | Z]) --> be(W, K, [I - X, I - Y | Z]).
> -be(W, K, [I - 'NEST'(J, X)  | Z]) --> be(W, K, [extend(I, J) - X | Z]).
> -be(W, K, [I - 'LABEL'(L, X) | Z]) --> be(W, K, [string__append(I, L) - X | Z]).
> -be(W, K, [_ - 'TEXT'(S)     | Z]) --> [S], be(W, (K + string__length(S)), Z).
> -be(W, _, [I - 'LINE'        | Z]) --> ["\n", I], be(W, string__length(I), Z).
>  be(W, K, [I - 'GROUP'(X)    | Z]) -->
>      ( if { flattening_works(X, Z, W - K) } then
>          be(W, K, [I - flatten(X) | Z])
> @@ -381,13 +407,20 @@
>          be(W, K, [I - X | Z])
>      ).
>  
> +
> +
> +:- pred push_string(string, rev_simple_doc, rev_simple_doc).
> +:- mode push_string(in, in, out) is det.
> +
> +push_string(S, Ss, [S | Ss]).
> +
>  %------------------------------------------------------------------------------%
>  
>      % Decide whether flattening a given doc will allow it and
>      % up to the next possible 'LINE' in the following docs to
>      % fit on the remainder of the line.
>      %
> -    % XXX This solution is necessary to avoid crippling performance
> +    % This solution is necessary to avoid crippling performance
>      % problems on large terms.  A spot of laziness would do away
>      % with the need for the next three predicates.
>      %
> @@ -454,7 +487,7 @@
>  
>  :- func extend(string, int) = string.
>  
> -extend(I, J) = string__append(I, string__duplicate_char(' ', J)).
> +extend(I, J) = I ++ string__duplicate_char(' ', J).

Minor changes like this are good candidates to be committed separately
as a "trivial diff".  But we often bend the rules and piggy-back such
changes along with a larger review, so I'm happy for you to leave it in
here.

>  
>  %------------------------------------------------------------------------------%
>  
> @@ -473,7 +506,7 @@
>  
>  separated(PP, Sep, [X | Xs]) =
>      ( if Xs = [] then PP(X)
> -                 else PP(X) `<>` (Sep `<>` separated(PP, Sep, Xs))
> +                 else PP(X) `<>` group(Sep `<>` separated(PP, Sep, Xs))
>      ).
>  
>  %------------------------------------------------------------------------------%
> @@ -521,14 +554,15 @@
>              ( func(UnivArg) = to_doc(Depth - 1, univ_value(UnivArg)) ),
>              UnivArgs
>          ),
> -        Doc = text(Name) `<>`
> -            parentheses(
> +        Doc =
> +            text(Name) `<>` parentheses(
>                  group(
>                      nest(2,
> -                        line `<>` separated(id, comma_space_line, Args)
> +                        line `<>`
> +                        separated(id, comma_space_line, Args)
>                      )
>                  )
> -        )
> +            )
>      ).
>  
>  % ---------------------------------------------------------------------------- %
> @@ -618,7 +652,13 @@
>  :- func list_to_doc(int, list(T)) = doc.
>  
>  list_to_doc(Depth, Xs) =
> -    brackets(separated_to_depth(to_doc, group(comma_space_line), Depth, Xs)).
> +    brackets(
> +        nest(1,
> +            group(
> +                separated_to_depth(to_doc, comma_space_line, Depth, Xs)
> +            )
> +        )
> +    ).
>  
>  %------------------------------------------------------------------------------%
>  
> @@ -657,7 +697,7 @@
>          ( func(UnivArg) = to_doc(Depth - 1, univ_value(UnivArg)) ),
>          UnivArgs
>      ),
> -    Doc = braces(separated(id, group(comma_space_line), Args)).
> +    Doc = braces(separated(id, comma_space_line, Args)).
>  
>  %------------------------------------------------------------------------------%
>  
> @@ -676,16 +716,12 @@
>          Doc = PP(Depth, X)
>        else
>          Doc = PP(Depth, X) `<>`
> -              (Sep `<>` separated_to_depth(PP, Sep, Depth - 1, Xs))
> +              group(Sep `<>` separated_to_depth(PP, Sep, Depth - 1, Xs))
>      ).
>  
>  %------------------------------------------------------------------------------%
>  
>  word_wrapped(String) =
> -    separated(
> -        text,
> -        group(space_line),
> -        string__words(char__is_whitespace, String)
> -    ).
> +    separated(text, space_line, string__words(char__is_whitespace, String)).
>  
>  %------------------------------------------------------------------------------%
> 

I am happy with the part of the change which fixes the performance bug.
The rest of the change I'll review when test cases are available.

Cheers,
Mark.

--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list