[m-dev.] I want to move and rename the dependency_graph module

Paul Bone paul at bone.id.au
Mon Feb 13 17:10:20 AEDT 2017

On Fri, Feb 10, 2017 at 01:05:59PM +1100, Zoltan Somogyi wrote:
> Paul, I assume that you want dependency graphs for the MLDS
> so that you can implement tail recursion for mutually recursive tail calls.
> Is this correct?

Yes.  First I'm improving the warnings for non tail-recursive code.  Then
I'll work implementing tail recursion for mutually recursive calls.

> When you first proposed that a bit more than a year ago, you didn't give us
> much detail about how you planned to do it. Perhaps you could do so now.
> It is always better to agree on the design approach before coding.

I hadn't yet planned how to do it.  It's fairer to say that I was planning
how to plan to do it ;-)  There are a number of things that we (YesLogic has
had some discussions) think are worth trying.

> Are you targeting all MLDS backends, or just C? And what mechanism
> do you intend to use for parameter passing, and for telling the trampoline
> which member of the clique, if any, to call next? There is more than one
> possible choice for both those questions; which combinations have you
> tested for performance? You said then you had a script to help you explore
> the issue; could you send it to us?

I'm aiming at high-level C in particular, but some things will work on most
MLDS backends.

My scripts were nothing fancy, It just ran and timed the programs that I had
prepared.  I modified the generated code by hand to make the case I needed.

There are three approaches worth trying and their combinations.

 + Inlining
 + We'll do it
 + Let the C compiler do it.


Inlining can be used to remove mutual calls or reduce SCCs.

    A -> B <-> C
         |     |
         V     V
         D     E

C can probably be inlined into B to remove the mutual recursion, The call to
B within C becomes a self-recursive call.  I haven't yet thought through all
the cases where this will or won't work, or is mediocre.  I also want to
read the HLDS inclining code to get a sense for what can be done easily.

We'll do it

This is the optimisation that we discussed last year.  So far I had
imagined creating a struct, probably on the stack of the call that enters
the SCC.  The struct is passed by reference between all the members of the
SCC.  The struct's fields are the arguments passed between the clique's
members (those in tail position) and their uses may overlap.  The trampoline
would be the simple loop as shown in the Dr Dobbs article that cites Fergus
and yourself.

    Func* fp = entry;
    while (fp != NULL) {
        fp = (Func*) (*fp)();

However if it is to pass the struct of parameters it would look a little

    Func* fp = entry;
    ParamStruct s;

    s.param1 = param1;
    s.param2 = param2;

    while (fp != NULL) {
        fp = (Func*) (*fp)(&s);

Outputs can also be retrieved from this struct.

I imagined using this type of trampoline and struct because it seems to be
the most straight forward, therefore the most likely to succeed.  So far my
testing was to establish that handling mutual recursion would provide some
benefit, and not how to handle it best, so I didn't test any other methods. 

One thing that occurs to me now is to pass the arguments normally.  We would
have to pass _all_ the arguments of the members to _each_ of the members.
This would allow the C compiler to decide where to place them (such as in
registers).  Return parameters would also have to be handled.  (Hrm, even in
normal code we could optimize in/out pairs by letting them share a single
C parameter.)

Another possibility for the trampoline is to return a token (a member of an
enum) and switch on it to decide which function to execute next.  We can
pass a more precise parameter list in this case.  This gives the C compiler
more control of parameter passing, at the cost of a little more indirection
in the trampoline.

Finally all the members of the SCC could be placed in the same function
body, and use goto statements to handle tail-calls.  This is likely to be
very fast, but it only works for SCCs with a single entry point and no
recursive non-tail calls that arn't also that entry point, at least without
code duplication.

I don't know of any other solutions that are also compatible with whichever
revision of the C standard we use (I forget and can't easily find this
information).  Let me know if there's an option that I've missed.

Let the C compiler do it

Third, it seems that if things are "just right" compilers like GCC / Clang
can do the tail recursion themselves.  In practice "just right" means that
the parameter lists and return values are the same.  We could either use the
struct from above, or simply make the parameter lists match by adding unused
arguments as necessary.  This may share code with the above idea, simply
disabling the use of a trampoline when we know the C compiler matches one
that we know will do the mutual recursion.

It's very likely that we'll be using a C compiler that supports this, so
it's a good option even if it's not portable.

If we're able to place all the procedures in a single C function with gotos
for tailcalls then that may be better than letting the C compiler implement
the tailcalls, we'd need to test.


Paul Bone

More information about the developers mailing list