[m-rev.] for post-commit review: Match output arguments correctly?for mutually recursive code

Paul Bone paul at bone.id.au
Wed Apr 12 12:58:15 AEST 2017

On Wed, Apr 12, 2017 at 10:52:56AM +1000, Zoltan Somogyi wrote:
> On Wed, 12 Apr 2017 10:26:01 +1000, Paul Bone <paul at bone.id.au> wrote:
> > > The code we would generate for a tail call for e.g. the LLDS backend may do
> > > the right thing, but it would do so only by accident, and I don't think we
> > > want to rely on accidents.
> > >
> > > I think requiring the caller and callee to have the *same* sequence of
> > > output arguments is safer, and it can be done with simpler code.
> > > That is why I am committing the attached diff.
> > > 
> > 
> > Okay, I didn't know how much this could be relied on, we havn't changed the
> > calling conventions before (to my knowledge) so I assumed they were pretty
> > stable.  I agree that it is possible they could change or new optimisations
> > could be added and that would cause problems.
> Yes, we have changed the calling convention, several times. For the LLDS backend,
> the two main changes I remember were
> - switching from passing argument K, whether input or output, in rK,
>   to passing the Mth input argument in rM and returning the Nth output
>   arg in rN; and
> - Peter's change to pass floats in a separate set of registers.
> The first was before your time, but you *were* around for the second.
> I agree that the probability of future changes to the LLDS argument passing
> convention that would invalidate tail call optimizations in the presence of the
> ignore output arg W in my previous example is small. However,
> - I don't think I can make that assertion about the other backends, and
> - I don't think there is anything significant to gain by the looser definition
>   of tail call, because ignored outputs are reasonably rare, and ignored outputs
>   in what would otherwise be tail recursive calls are, I would guess, very rare.

Yes, and for the other backends things might change between the various
languages.  Its easy to imagine adding a feature that effects say C but not

> Basically, I don't want to chance even a small risk of incorrectness for a
> very-small-to-nonexistent gain in performance.
> BTW, there *is* a change in argument passing convention that we have
> considered for which the looser definition of tail call may not work.
> The current LLDS argument passing convention uses general purpose
> registers, because when I designed the Mercury abstract machine,
> the target machines were RISC machines with lots of registers.
> Now, the only target worth discussing is the x86, which has a
> grand total of 2 registers we can use. So I have thought several times
> creating a separate version of the LLDS backend that would pass
> arguments on the stack, not in registers. Creating that backend
> would be a lot of work for probably only a few percent improvement
> in performance, which is why I never started working on it, but
> I did consider offering it as an honours or masters project.

What's the benefit?  Less shuffling of data?

Also I gave the general concern about output arguments for inlining linear
SCCs some more thought.  To a large degree we can have a much looser
definition of tail position _for this specific optimisation_.  Because once
a call is inlined, something like an unused output is then handled by later
compilation stages and possibly optimised away.  For example.

    p(a, 3). % base case.
    p(b, X::out) :-
        q(A, X, Y).

    q(a::in, 2::out, 4::out).

Will result in something with Y = 4, then after that Y would be unused.
Which will be picked up by a later compilation stage and optimized away.

There are other cases that are also safe, such as swapping.  Once the code
is inlined the swapping is no-longer relevant.

Generally speaking, please let me know if I'm mistaken, the inline linear
tail SCCs optimisation mostly doesn't care about tail calls.  It only
handles tail calls specifically to work-around (current) limitations with
the MLDS.

There's two types of situations where loosening this may help:

 * A situation where most of the linear calls are tail calls, but maybe one
   or two isn't.  This would normally prevent this optimisation from being
   applied.  And if it were applied it would still provide constant stack
   usage (provided that the whole SCC (not just the TSCC) is linear).

 * Non-tail linear SCCs generally.  These can benefit from having their
   calls inlined to make sure they use constant stack space.  It may be
   harder to find linear non-tail SCCs.


Paul Bone

More information about the reviews mailing list