[m-dev.] pprint performance

Ralph Becket rafe at cs.mu.OZ.AU
Mon Feb 25 12:31:46 AEDT 2002


Peter Schachte, Monday, 25 February 2002:
> On Sat, Feb 23, 2002 at 11:49:11AM +1100, Ralph Becket wrote:
> > The real problem now is that converting a term to a doc form suitable
> > for pprint__write expands the size of the term significantly (every pair
> > of brackets costs about eight words of storage and every comma six or
> > seven words of storage.)  One would expect that in most very large terms
> > there will be a significant amount of sharing; unfortunately all this
> > sharing is lost in the conversion process.  This can lead to an
> > exponential blow-up in the memory footprint of formatted terms.
> > 
> > I've tried some experiments making judicious use of pragma memo, but
> > this is not a solution (it actually makes matters much worse.)
> 
> Do you have any insight into why memoing doesn't work?  Do your test
> cases have high degrees of fan-in that would benefit from memoing?

The test structure is an n-ary tree where the n branches at ply k
all point to the same n-ary tree at ply k + 1.
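
Roughly this sort of thing (an illustrative sketch, not the actual test
code; the module and names here are made up):

:- module shared_tree.
:- interface.

:- import_module list.

:- type tree
    --->    leaf
    ;       node(list(tree)).

    % shared(N, Depth) = Tree: an N-ary tree of the given depth in which
    % all N children of a node are the *same* subtree.  The term occupies
    % O(N * Depth) cells on the heap, but once the sharing is lost in the
    % conversion to a doc it costs O(N ^ Depth).
    %
:- func shared(int, int) = tree.

:- implementation.

:- import_module int.

shared(N, Depth) = Tree :-
    ( Depth =< 0 ->
        Tree = leaf
    ;
        SubTree = shared(N, Depth - 1),
        list__duplicate(N, SubTree, Children),
        Tree = node(Children)
    ).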

Part of the problem is that the to_doc function keeps track of "how far"
down a tree formatting has gone; if a given limit is reached then the
subtree is simply formatted as "...".  "How far" is defined such that a
right sibling is considered "further down" than a left sibling.  Memoing
therefore keys on <Depth, Term> rather than just <Term>, hence any
possible sharing based on <Term> alone is not going to be visible in the
memo table.

[to_doc/1 simply calls to_doc/2 (the version with the depth limit) with
a depth limit parameter of int__max_int.]
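
To see why this depth threading defeats the memoing, here is a toy
model over the tree type above (not the library code, just the shape of
the problem; it assumes int, list and string are imported):

    % fmt(Depth0, Tree, Str, Depth): a toy model of the depth threading.
    % The remaining depth is threaded left-to-right through the children,
    % so a right sibling starts with whatever depth its left sibling left
    % over.  Two occurrences of an identical subtree therefore present
    % different <Depth, Term> keys to the memo table, and the sharing is
    % invisible to it.
    %
:- pred fmt(int::in, tree::in, string::out, int::out) is det.

fmt(Depth0, Tree, Str, Depth) :-
    ( Depth0 =< 0 ->
        Str = "...",
        Depth = Depth0
    ;
        (
            Tree = leaf,
            Str = "leaf",
            Depth = Depth0 - 1
        ;
            Tree = node(Children),
            fmt_list(Depth0 - 1, Children, Strs, Depth),
            string__append_list(["node(" | Strs], Left),
            string__append(Left, ")", Str)
        )
    ).

:- pred fmt_list(int::in, list(tree)::in, list(string)::out, int::out)
    is det.

fmt_list(Depth, [], [], Depth).
fmt_list(Depth0, [T | Ts], [Str | Strs], Depth) :-
    fmt(Depth0, T, Str, Depth1),
    fmt_list(Depth1, Ts, Strs, Depth).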

The other problem is that (I suspect) memoing is currently hash table
based rather than trie based, so every subterm of the term being marked
up will be recursively hashed...

> Maybe what is needed in this case is a kind of partial memoing --
> caching, really.  Dedicate a fixed amount of space to caching
> previously computed values, and when a new cache entry is to be added,
> just overwrite the old one with the same hash value.  When a large
> subterm is revisited, all of its subterms will be, too, so even if the
> root of the revisited subterm has been flushed from the cache, chances
> are some of its subterms or subsubterms won't be, so you'll still get
> much of the benefit with a fixed space overhead and a time overhead
> linear in the size of the term.  (Though for it to be a linear time
> overhead, you have to use pointer hashing and equivalence checking on
> subterms.)

Yes, I thought of this one.  To make it work properly, I'd have to have
separate implementations of the depth-limited and non-depth-limited
versions of the formatting code, which would be unfortunate.  Some
higher order stuff could probably make it all work without having to
duplicate anything, but...
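
For concreteness, the sort of thing Peter describes might look like
this (a sketch only, nothing I have implemented; it reuses the tree
type above and assumes int, map and pprint are imported; the Hash
closure stands in for the pointer hashing, which in Mercury would need
a little impure foreign code, and the equality test here is structural
rather than pointer equivalence):

:- type cache_entry
    --->    entry(tree, doc).

    % A fixed number of slots: the slot for T is Hash(T) mod Size, and a
    % new entry simply overwrites whatever last landed in that slot.
    %
:- type cache == map(int, cache_entry).

:- pred cache_lookup((func(tree) = int)::in, int::in, cache::in,
    tree::in, doc::out) is semidet.

cache_lookup(Hash, Size, Cache, T, Doc) :-
    Slot = Hash(T) mod Size,
    map__search(Cache, Slot, entry(T0, Doc)),
    T0 = T.     % structural test; pointer equivalence in the real thing

:- pred cache_insert((func(tree) = int)::in, int::in, tree::in, doc::in,
    cache::in, cache::out) is det.

cache_insert(Hash, Size, T, Doc, Cache0, Cache) :-
    Slot = Hash(T) mod Size,
    map__set(Cache0, Slot, entry(T, Doc), Cache).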

> > A reasonable solution is probably to use some form of lazy evaluation,
> > whereby the term conversion is done on-demand by the write procedure.
> 
> I don't see how this would help if the whole term is to be printed; I
> would think it would just add an extra data structure to be built for
> each subterm.  The only benefit I would expect would be the automatic
> caching of already computed values, which is why I would expect
> caching to be better.

For very large terms, the resulting doc will exceed the size of main
memory.  The GC will therefore be invoked many times during the write
operation, which will mean traversing the entire structure created so
far (since the stack will contain a pointer to the root of the doc),
which will cause the machine to thrash.  This problem will also arise
with lazy approaches using "thunking", where the result of the
suspension overwrites the thunk the first time it is evaluated.

A better solution is to recalculate each level of mark-up as it is
required.  This way the GC can collect each sub-doc as soon as it has
been output.  We trade computation time for space (which pays off big
time if your machine would otherwise be thrashing).
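
The shape of what I mean, specialised to the tree type above to keep
the sketch short (illustrative only, not the code I actually ran; it
assumes io, list and pprint are imported and a pprint__write(Width,
Doc) interface; the real version works on arbitrary terms via
deconstruct and, as described next, converts a few plies at a time
rather than one):

:- type suspension == ((func) = lazy_doc).

:- type lazy_doc
    --->    atom(doc)
                % mark-up built eagerly
    ;       branch(doc, list(suspension), doc).
                % opening mark-up, suspended children, closing mark-up

    % Build the mark-up for this node eagerly, but suspend the
    % conversion of each child.
    %
:- func tree_to_lazy(tree) = lazy_doc.

tree_to_lazy(leaf) = atom(text("leaf")).
tree_to_lazy(node(Kids)) =
    branch(text("node(["), suspend_kids(Kids), text("])")).

:- func suspend_kids(list(tree)) = list(suspension).

suspend_kids([]) = [].
suspend_kids([Kid | Kids]) =
    [((func) = tree_to_lazy(Kid)) | suspend_kids(Kids)].

    % The writer forces each suspension just before its output is
    % needed; the sub-doc it yields becomes garbage as soon as it has
    % been written, so the live heap only ever holds about a path's
    % worth of docs.  (Writing the mark-up piecemeal like this does mean
    % that layout decisions cannot span the pieces.)
    %
:- pred write_lazy(int::in, lazy_doc::in, io__state::di, io__state::uo)
    is det.

write_lazy(Width, atom(Doc)) -->
    pprint__write(Width, Doc).
write_lazy(Width, branch(Open, Suspended, Close)) -->
    pprint__write(Width, Open),
    write_suspended(Width, Suspended),
    pprint__write(Width, Close).

:- pred write_suspended(int::in, list(suspension)::in,
    io__state::di, io__state::uo) is det.

write_suspended(_, []) --> [].
write_suspended(Width, [S | Ss]) -->
    { Sub = apply(S) },
    write_lazy(Width, Sub),
    write_suspended(Width, Ss).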

I tried this experiment using a mechanism whereby the mark-up procedure
would process n plies at a time before "suspending".  This did lead to
significantly reduced memory usage, with the expected factor of 3 or 4
slowdown.  What is confusing me now is that doubling the size of the
largest term that could previously be printed without thrashing does
work under the new scheme (whereas before it didn't), and in reasonable
time, but it still leads to paging.  Odd.

> Maybe the biggest improvement would be had by deforesting.  How far is
> the Mercury deforestation facility from being able to handle this?

Not a clue.  I think Simon might be able to answer this one.

- Ralph


