[m-dev.] Mutually recursive tailcalls in hlc.

Paul Bone paul at bone.id.au
Tue Nov 10 11:07:56 AEDT 2015

On Tue, Nov 10, 2015 at 12:39:30AM +1100, Mark Brown wrote:
> On Mon, Nov 9, 2015 at 12:27 PM, Paul Bone <paul at bone.id.au> wrote:
> > On Sat, Nov 07, 2015 at 06:06:21AM +1100, Mark Brown wrote:
> >> On Fri, Nov 6, 2015 at 11:37 AM, Paul Bone <paul at bone.id.au> wrote:
> >> > I've transformed it to this:
> >> >
> >> > /* Struct is used to represent arguments for recursive calls */
> >> > typedef struct {
> >> >   MR_Float Nr;
> >> >   MR_Float Ni;
> >> >   MR_Float Cr;
> >> >   MR_Float Ci;
> >> >   MR_Integer MaxIters;
> >> >   MR_Integer Iters_0;
> >> >   MR_Integer * Iters;
> >> > } escape_data;
> >>
> >> If all the functions in the SCC take something like this as an
> >> argument, and return void, would gcc already make the calls tail
> >> recursive? Come to think of it, could you meet gcc's criteria by just
> >> padding out the functions in the SCC so they all have the same number
> >> of arguments?
> >>
> >
> > Zoltan has already said that this is only one factor in the stack frame
> > sizes.  In addition to this I want to avoid depending on depending on any
> > behaviour of a C compiler in the high level C backend.  That's what the
> > low-level C backend is for. :-P
> Zoltan is correct. And you have not answered my questions!

Yes, GCC may optimize sibling calls if we pack arguments into structs or pad
argument lists.  This will probably be faster than using a trampoline, but
it still depends on the behaviour of a specific C compiler.  Which I think is
something we try to avoid with the high-level C backend.

I say "probably be faster" because the article you linked about sibling
calls mentioned modifying the return address in the stack frame, this can
slow down the processor's pipelining because the processor keeps a shadow
copy of the call stack so that it knows what will be executed after a "ret"
instruction.  How much it slows it down and whether the jump, test,
conditional branch and indirect jump of the trampoline are slower I'm not

> > That said if we thought that a majority of popular C compilers could
> > optimise mutual recursion in the general case, then I wouldn't bother making
> > this change to Mercury.  I'm pretty sure that that's not the case.
> I was asking about a conditional case, not the general case. You're
> right that I should ask about more than just gcc, though.
> Let me try again with a specific condition. Can the compilers we want
> perform tail call elimination if the signatures of the caller and
> callee are the same? Your argument transformation satisfies this
> condition, and the optimizations you mentioned are also relevant, but
> the actual trampoline would be redundant and may impede other C
> optimizations (such as inlining).
> Like I said earlier, it's not that I disagree with your proposal. But
> if someone asks, I'd like to be able to name a compiler we rely on
> that doesn't do tail call elimination under this condition.

You're right, it may not be worth my/YesLogic's time to implement the
general solution.

I'm not sure.  AFAIK the popular C compilers for use with Mercury are GCC,
Clang and VC++.  We could check Clang, it'll be documented somewhere.
I'm not sure how we'd find out something like this for VC++.  Which ones do
we (by which I mean Mercury users and specifically YesLogic) care about?
And which ones do we care about when we also care about mutual recursion?

Hypothetically, if GCC and Clang both optimized this, and we took advantage
of that, then would it be bad if we didn't also implement a more general
optimisation for use with VC++?  I personally wouldn't care, but someone
else might.

Paul Bone

More information about the developers mailing list