[m-dev.] Contribution: Pretty Printing Library

Mark Anthony BROWN dougl at cs.mu.OZ.AU
Wed Apr 12 14:00:29 AEST 2000
Previous message: [m-dev.] Contribution: Pretty Printing Library
Next message: [m-dev.] Contribution: Pretty Printing Library
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

Here's my first round of review comments.  I still have a few
things to look into...There'll be more to come.

Cheers,
Mark.

Ralph Becket writes:
> 
> Okay, I've attached the latest version of pprint.m and a couple of
> example files for you to have a look at.  pprint hasn't changed
> much; the main thing is that I've added pprint__to_doc/[1,2]
> which will convert arbitrary objects into docs for pretty printing
> (the two argument version takes a `depth' parameter which specifies
> how far into a structure to go before just printing `...')  example.m
> shows off to_doc/[1,2] at work.  I've also added a few words of
> explanation at the head of the file, although I think it'd be better
> to expand the explanation of the basic doc constructors instead.

Either is fine by me.

> 
> Cheers,
> 
> Ralph
> 

> % ---------------------------------------------------------------------------- %
> % pprint.m
> % Ralph Becket <rbeck at microsoft.com>
> % Wed Mar 22 17:44:32  2000
> % vi: ts=4 sw=4 et tw=0 wm=0
> %
> % This file may only be copied under the terms of the GNU Library General
> % Public License - see the file COPYING.LIB in the Mercury distribution.
> %
> %
> % ABOUT
> %
> % This is pretty much a direct transliteration of Philip Wadler's
> % Haskell pretty printer described in "A Prettier Printer", available
> % at http://cm.bell-labs.com/cm/cs/who/wadler/topics/recent.html
> %
> % Wadler's approach has three main advantages:
> % 1. the layout algebra is small and quite intuitive (more so than Hughes');
> % 2. the pretty printer is optimal in the sense that it will never generate
> %    output that over-runs the specified width unless that is unavoidable; and
> % 3. the pretty printer is bounded in that it never needs to look more than
> %    k characters ahead to make a formatting decision (although see the XXX
> %    comment below).
> %
> % I have made two small changes: rather than having group/1 as a non-primitive
> % function (for allowing line-breaks to be converted into spaces at the pretty
> % printer's discretion) over docs, I have extended the doc type to include a
> % 'GROUP' constructor and altered flatten/1 and be/3 appropriately.  because

s/because/Because/

> % 'UNION' only arises as a consequence of processing a 'GROUP' it turns out to
> % be simpler to do away with 'UNION' altogether and convert clauses that
> % process 'UNION' terms to processing 'GROUP's.
> %

The change is OK so far as correctness is concerned, but
is there any chance that 'UNION's may be wanted for a non-'GROUP'
purpose in future?  If so, this decision may need to be reversed.
However, it's probably alright to leave it as it is for now, since
doc is an abstract type.

> % The second change is that flattened `line' breaks become empty strings
> % rather than spaces.
> %

This is a good decision, IMHO.  But it might be worth giving some
more detail here about the ramifications, eg, needing to put a
space at the end of a line even if it is not required by the unflattened
version.

> % I have also added several obvious general purpose formatting functions.
> %
> %
> % USAGE
> %
> % There are two stages in pretty printing an object of some type T:
> % 1. convert the object to a pprint__doc using some specially written
> %    function;

Or you could use one of the generic to_doc functions.  ;-)

> % 2. call pprint__write/[4,5] or pprint__to_string/2 passing the display
> %    width and the doc.
> %
> % pprint supplies a number of primitive doc constructor functions and a
> % small number of useful, common compound constructions.  Here I just
> % describe the primitives:
> %
> % nil
> %   the empty document, corresponding to the null string;
> %
> % text(string)
> %   the document consisting of a single string;
> %
> % poly(string__poly_type)
> %   the document consisting of a single string derived from a
> %   string__poly_type;
> %
> % doc `<>` doc
> %   the composition of two docs, the first followed by the second with no
> %   intervening space;
> %
> % line
> %   a new-line which may (in a group context, see below) be flattened out
> %   by the pretty printer into an empty string;
> %
> % nest(int, doc)
> %   any `line' docs in doc that are not flattened out by the pretty printer
> %   are followed by the given number of spaces (nested `nest's add up);
> %
> % group(doc)
> %   allow flattening of the doc at the pretty printer's discretion.  A group
> %   doc is a decision point for the pretty printer - if it can flatten the
> %   doc without overrunning it will do so, otherwise no flattening occurs at
> %   this level (note that the argument doc may then contain other group docs
> %   for the pretty printer to consider).
> %
> %
> % EXAMPLES
> %
> % Below are some docs followed by the ways they might be displayed by the
> % pretty printer.

I'd append ", with various line widths" to that sentence, or words
to that effect.

> %
> % 1. text("Hello ") `<>` line `<>` text("world")
> % 
> %   Hello
> %   world
> %
> % 2. group(text("Hello ") `<>` line `<>` text("world"))
> %
> %   Hello world
> %
> %   Hello
> %   world
> %
> % 3. group(text("Hello ") `<>` nest(2, line `<>` text("world")))
> %
> %   Hello world
> %
> %   Hello
> %     world
> %
> % 4. group(
> %   text("Goodbye ") `<>`
> %   nest(2, line `<>` text("cruel ") `<>` line `<>` text("world")
> % )
> %
> %   Goodbye cruel world
> %
> %   Goodbye
> %     cruel
> %     world
> %
> % 5. group(
> %   text("Goodbye ") `<>`
> %   nest(2, line `<>` group(text("cruel ") `<>` line `<>` text("world")))
> % )
> %
> %   Goodbye cruel world
> %
> %   Goodbye
> %     cruel world
> %
> %   Goodbye
> %     cruel
> %     world
> %
> % ---------------------------------------------------------------------------- %
> 
> :- module pprint.
> 
> :- interface.
> 
> :- import_module int, string, list, io.
> 
>     % Clients must translate data structures into docs for
>     % the pretty printer to display.
>     %
> :- type doc.
> 
> :- func nil                 = doc.  % Empty document.
> :- func doc `<>` doc        = doc.  % Composition of two docs.
> :- func nest(int, doc)      = doc.  % Follow newlines in doc with indentation.
> :- func text(string)        = doc.  % Convert a string into a doc.
> :- func line                = doc.  % New-line (unless flattened out).
> :- func poly(string__poly_type) = doc. % Convert an [sifc](_) into a string.
> :- func group(doc)          = doc.  % Allow a document to be flattened.
> 
>     % A word of explanation about group/1.  `group(Doc)' gives the pretty
>     % printer the option of fitting Doc onto one line if possible by removing
>     % all the new-lines and nest-spacing from it; if this doesn't work then
>     % the pretty printer will just interpret Doc literally (of course, Doc
>     % may contain other `group'ed items.)

This comment is probably redundant, now that the primitives are
described in the header.  Here you may wish to refer readers to that
documentation, though.

(Alternatively, you could move the header documentation to here.)

> 
> :- func doc `</>` doc       = doc.  % Shorthand for `doc `<>` line `<>` doc'.
> 
>     % Various bracketing functions.
>     %
> :- func bracketed(string, string, doc)  = doc.
> :- func parentheses(doc)                = doc.
> :- func brackets(doc)                   = doc.
> :- func braces(doc)                     = doc.
> 
>     % separated(PP, Sep, [X1,...,Xn]) = PP(X1) `<>` Sep `<>` ... Sep `<>` PP(Xn)
>     %
> :- func separated(func(T) = doc, doc, list(T)) = doc.
> 
>     % Handy punctuation docs and versions with following line breaks.
>     %
> :- func comma               = doc.
> :- func semic               = doc.
> :- func space               = doc.
> :- func comma_space         = doc.
> :- func semic_space         = doc.
> :- func comma_line          = doc.
> :- func semic_line          = doc.
> :- func space_line          = doc.
> :- func comma_space_line    = doc.
> :- func semic_space_line    = doc.
> 
>     % E.g. If one wanted to pretty print a Mercury list then one might write
>     %
>     %   brackets(nest(2, separated(MyPP, comma_space_line, MyList)))
>     %
>     % where `MyPP' is a function from MyList members to docs.
> 
>     % Convert arbitrary types to docs.  This requires std_util__functor/3
>     % to work on all components of the object being converted.  The second
>     % version places a maximum depth on terms which are otherwise truncated
>     % to `...'.
>     %
>     % XXX This throws an exception if the term in question has user-defined
>     % equality.

Actually, it causes a fatal error in the Mercury runtime system,
so there is not much you can do about it at this stage.  In any
case, I wouldn't worry about that too much, since that is also what
io__write, io__print, etc, do.

In general, if a term is to be printed out (using io__write,
pprint__write, etc) then it needs to have a canonical representation,
or else the procedure needs to be cc_multi.  In this context, the best
way for a user to provide a canonical representation for a type with
user-defined equality is to do this as part of a customised document
generator, IMHO.

So my suggestion for what to do here is to change the last sentence to:

     % This causes a runtime abort if the term in question has user-defined
     % equality.

I think this doesn't need an XXX, since it is reasonable to make this
actually part of the specification, not just an unimplemented feature.

>     %
> :- func to_doc(T)           = doc.
> :- func to_doc(int, T)      = doc.
> 
>     % Convert docs to pretty printed strings.
>     % The int argument specifies a page width.

s/page width/line width/

It may not be headed for the printer.  ;-)

>     %
> :- func to_string(int, doc) = string.
> 
>     % Write docs out in pretty printed format.
>     % The int argument specifies a page width.
>     %

Ditto.

> :- pred write(int, doc, io__state, io__state).
> :- mode write(in, in, di, uo) is det.
> 
> :- pred write(io__output_stream, int, doc, io__state, io__state).
> :- mode write(in, in, in, di, uo) is det.
> 
> % ---------------------------------------------------------------------------- %
> % ---------------------------------------------------------------------------- %
> 
> :- implementation.

The implementation looks fine.

> 
> :- import_module std_util, bool.
> 
> :- type doc
>     --->    'NIL'
>     ;       'SEQ'(doc, doc)
>     ;       'NEST'(int, doc)          
>     ;       'TEXT'(string)
>     ;       'LINE'
>     ;       'GROUP'(doc).
> 
> :- type simple_doc
>     --->    nil
>     ;       string `text` simple_doc
>     ;       int `line` simple_doc.
> 
> % ---------------------------------------------------------------------------- %
> 
> nil                     = 'NIL'.
> X `<>` Y                = 'SEQ'(X, Y).
> nest(I, X)              = 'NEST'(I, X).
> text(S)                 = 'TEXT'(S).
> line                    = 'LINE'.
> group(X)                = 'GROUP'(X).
> 
> poly(s(S))              = text(string__format("%s", [s(S)])).
> poly(c(C))              = text(string__format("%c", [c(C)])).
> poly(i(I))              = text(string__format("%d", [i(I)])).
> poly(f(F))              = text(string__format("%f", [f(F)])).
> 
> % ---------------------------------------------------------------------------- %
> 
> to_string(W, X) = S :-
>     pretty(pred(H::in, T::in, [H | T]::out) is det, W, X, [], Ss),
>     S = string__append_list(list__reverse(Ss)).
> 
> write(W, X)             --> pretty(io__write_string, W, X).
> 
> write(Stream, W, X)     --> pretty(io__write_string(Stream), W, X).
> 
> % ---------------------------------------------------------------------------- %
> 
> :- pred pretty(pred(string, T, T), int, doc, T, T).
> :- mode pretty(pred(in, in, out) is det, in, in, in, out) is det.
> :- mode pretty(pred(in, di, uo) is det, in, in, di, uo) is det.
> 
> pretty(P, W, X)         --> layout(P, best(W, 0, X)).
> 
> % ---------------------------------------------------------------------------- %
> 
> :- pred layout(pred(string, T, T), simple_doc, T, T).
> :- mode layout(pred(in, in, out) is det, in, in, out) is det.
> :- mode layout(pred(in, di, uo) is det, in, di, uo) is det.
> 
> layout(_, nil)          --> [].
> layout(P, S `text` X)   --> P(S), layout(P, X).
> layout(P, I `line` X)   --> P("\n"), P(string__duplicate_char(' ', I)),
>                             layout(P, X).
> 
> % ---------------------------------------------------------------------------- %
> 
> :- func best(int, int, doc) = simple_doc.
> 
> best(W, K, X)           = be(W, K, [0 - X]).
> 
> % ---------------------------------------------------------------------------- %
> 
>     % XXX We could do with a spot of laziness to avoid exponential
>     % run-times in the worst case here.  The problem is that flatten/1
>     % need only be evaluated to the point where it can be decided
>     % (by better/4 and fits/2) whether a structure is going to fit on
>     % the remainder of the line or not.  In practice, eagerness doesn't
>     % seem to be a problem.
> 
> :- func be(int, int, list(pair(int, doc))) = simple_doc.
> 
> be(_, _, [])                     = nil.
> be(W, K, [_ - 'NIL'        | Z]) = be(W, K, Z).
> be(W, K, [I - 'SEQ'(X, Y)  | Z]) = be(W, K, [I - X, I - Y | Z]).
> be(W, K, [I - 'NEST'(J, X) | Z]) = be(W, K, [(I + J) - X | Z]).
> be(W, K, [_ - 'TEXT'(S)    | Z]) = S `text` be(W, (K + pp_util__length(S)), Z).
> be(W, _, [I - 'LINE'       | Z]) = I `line` be(W, I, Z).
> be(W, K, [I - 'GROUP'(X)   | Z]) =
>     ( if fits(W - K, Flattened) then Flattened
>                                 else be(W, K, [I - X | Z]) )
> :-
>     Flattened = be(W, K, [I - flatten(X) | Z]).
> 
> % ---------------------------------------------------------------------------- %
> 
> :- func flatten(doc) = doc.
> 
> flatten('NIL')          = 'NIL'.
> flatten('SEQ'(X, Y))    = 'SEQ'(flatten(X), flatten(Y)).
> flatten('NEST'(I, X))   = 'NEST'(I, flatten(X)).
> flatten('TEXT'(S))      = 'TEXT'(S).
> flatten('LINE')         = 'NIL'.
> flatten('GROUP'(X))     = flatten(X).
> 
> % ---------------------------------------------------------------------------- %
> 
> :- pred fits(int, simple_doc).
> :- mode fits(in, in) is semidet.
> 
> fits(W, X) :-
>     W >= 0,
>     (
>         X = nil
>     ;
>         X = S `text` Y, fits(W - pp_util__length(S), Y)
>     ;
>         X = _ `line` _
>     ).
> 
> % ---------------------------------------------------------------------------- %
> 
> X `</>` Y           = X `<>` line `<>` Y.
> 
> % ---------------------------------------------------------------------------- %
> 
> bracketed(L, R, D)  = text(L) `<>` D `<>` text(R).
> parentheses(D)      = bracketed("(", ")", D).
> brackets(D)         = bracketed("[", "]", D).
> braces(D)           = bracketed("{", "}", D).
> 
> % ---------------------------------------------------------------------------- %
> 
> separated(_,  _,   []) = nil.
> 
> separated(PP, Sep, [X | Xs]) =
>     ( if Xs = [] then
>         PP(X)
>       else
>         PP(X) `<>` Sep `<>` separated(PP, Sep, Xs)
>     ).
> 
> % ---------------------------------------------------------------------------- %
> 
> comma               = text(",").
> semic               = text(";").
> space               = text(" ").
> comma_space         = text(", ").
> semic_space         = text("; ").
> comma_line          = comma `<>` line.
> semic_line          = semic `<>` line.
> space_line          = space `<>` line.
> comma_space_line    = text(", ") `<>` line.
> semic_space_line    = text("; ") `<>` line.
> 
> % ---------------------------------------------------------------------------- %
> 
> to_doc(X) = to_doc(int__max_int, X).
> 
> % ---------------------------------------------------------------------------- %
> 
>     % XXX Need to catch errors thrown by deconstruct/4 on types with
>     % user defined equality.

See my comment above.

> 
> to_doc(Depth, X) =
>     ( if Arity = 0 then
>         text(Name)
>       else if Depth =< 0 then
>         text(Name) `<>` text("(...)")
>       else
>         text(Name) `<>`
>         parentheses(
>             group(
>                 nest(2, line `<>` separated(id, comma_space_line, Args)) `<>`
>                 line
>             )
>         )
>     )
> :-
>     deconstruct(X, Name, Arity, UnivArgs),
>     Args = list__map(
>         ( func(UnivArg) = to_doc(Depth - 1, univ_value(UnivArg)) ),
>         UnivArgs
>     ).
> 
> % ---------------------------------------------------------------------------- %

-- 
Mark Brown, PhD student            )O+  |  "Another of Fortran's breakthroughs
(m.brown at cs.mu.oz.au)                   |  was the GOTO statement, which was...
Dept. of Computer Science and Software  |  uniquely simple and understandable"
Engineering, University of Melbourne    |              -- IEEE, 1994
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
Previous message: [m-dev.] Contribution: Pretty Printing Library
Next message: [m-dev.] Contribution: Pretty Printing Library
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the developers mailing list