[m-dev.] Mutually recursive tailcalls in hlc.

Paul Bone paul at bone.id.au
Fri Nov 6 11:37:19 AEDT 2015


On Thu, Nov 05, 2015 at 05:02:15PM +1100, Zoltan Somogyi wrote:
> 
> 
> On Thu, 5 Nov 2015 16:17:33 +1100, Paul Bone <paul at bone.id.au> wrote:
> > I setup a small test to see if it was worthwhile investigating further (and
> > it is!).  I've compiled mandelbrot in hlc.gc with --no-optimise-tailcalls
> > --no-c-optimize.  The C optimisations are disabled because C may be
> > optimising tailcalls for me.  Then edited the generated C code to create my
> > test.  This example uses direct recursion, that's okay, I'm using it as a
> > proxy for mutual recursion because with this small benchmark I can
> > confidently modify the hand generated C files to setup my test.
> 
> I don't know about others, but I don't understand exactly how
> your three versions differ, and exactly how you extrapolate from
> direct recursion to mutual recursion. I would need access to the
> source code (both .m and .c, if the compiler generated .c is hand
> modified), and I would strongly prefer that the exemplar program
> itself be mutually recursive.
> 

I can bundle up my tests with some scripts later this afternoon or next
week.  For now I'll explain briefly what's going on.

YesLogic is interested in, C backends, lately we've been using the
high-level C backend, so our scope is limited to that backend.  We've found
that processing large documents, or documents with very large elements, can
cause crashes due stack overflows.  Some of this code is mutually recursive
which Mercury's high level C backend cannot currently optimize mutually
recursive calls tail calls.

Direct tailcalls are optimised into loops.  This is done by wrapping the
procedure body in a while(true) loop with a break statement at the end.
Each directly recursive tail call is replaced with a series of assignments,
to setup the arguments, and a continue statement.  Mutual recursion is not
supported implemented.

This gives us our first two test cases, one with and one without this
transformation.  All test cases are compiled with --no-c-optimize to prevent
the C compiler from making its own tailcall optimisation.

I propose to handle mutually recursive tailcalls using a trampoline.
Without --no-optimize-tailcalls the generated C code for the inner loop of
mandelbrot looks like this:

static void MR_CALL 
mandelbrot__escape_7_p_0(
  MR_Float mandelbrot__Nr_8,
  MR_Float mandelbrot__Ni_9,
  MR_Float mandelbrot__Cr_10,
  MR_Float mandelbrot__Ci_11,
  MR_Integer mandelbrot__MaxIters_12,
  MR_Integer mandelbrot__STATE_VARIABLE_Iters_0_18,
  MR_Integer * mandelbrot__STATE_VARIABLE_Iters_19)
{
  {
    MR_bool mandelbrot__succeeded;
    MR_Integer mandelbrot__V_20_20 = (MR_Integer) 0;

    ...;
    
    if (...)
      ...;

      mandelbrot__escape_7_p_0(mandelbrot__Rr_16, mandelbrot__Ri_17, mandelbrot__Cr_10, mandelbrot__Ci_11, mandelbrot__V_36_36, mandelbrot__STATE_VARIABLE_Iters_34_34, mandelbrot__STATE_VARIABLE_Iters_19);
    else {
      ...;
      *mandelbrot__STATE_VARIABLE_Iters_19 = (MR_Integer) -1;
    }
  }
}

I've transformed it to this:

/* Struct is used to represent arguments for recursive calls */
typedef struct {
  MR_Float Nr;
  MR_Float Ni;
  MR_Float Cr;
  MR_Float Ci;
  MR_Integer MaxIters;
  MR_Integer Iters_0;
  MR_Integer * Iters;
} escape_data;

typedef void*(*escape_fp)(escape_data*);

/*
** I renamed escape_7_p_0 to escape_real_7_p_0 and changed it's signature,
** here's the forward declaration.
 */
static escape_fp MR_CALL
mandelbrot__escape_real_7_p_0(escape_data*);

/*
** A new function with the old name and signature wraps the renamed function.
*/
static void MR_CALL 
mandelbrot__escape_7_p_0(
  MR_Float mandelbrot__Nr_8,
  MR_Float mandelbrot__Ni_9,
  MR_Float mandelbrot__Cr_10,
  MR_Float mandelbrot__Ci_11,
  MR_Integer mandelbrot__MaxIters_12,
  MR_Integer mandelbrot__STATE_VARIABLE_Iters_0_18,
  MR_Integer * mandelbrot__STATE_VARIABLE_Iters_19)
{
    escape_fp   fp;
    escape_data data;

    /*
    ** Setup the parameters into a structure on the stack.
    */
    data.Nr = mandelbrot__Nr_8;
    data.Ni = mandelbrot__Ni_9;
    data.Cr = mandelbrot__Cr_10;
    data.Ci = mandelbrot__Ci_11;
    data.MaxIters = mandelbrot__MaxIters_12;
    data.Iters_0 = mandelbrot__STATE_VARIABLE_Iters_0_18;
    data.Iters = mandelbrot__STATE_VARIABLE_Iters_19;

    /* The trampoline */
    fp = mandelbrot__escape_real_7_p_0(&data);
    while (fp != NULL) {
        fp = (escape_fp)fp(&data);
    }
}

static escape_fp MR_CALL
mandelbrot__escape_real_7_p_0(escape_data *data) {

    /*
    ** Copy out the parameters into automatic variables.  This made the
    ** transformation easier to do by hand.  But by making these automatic
    ** variables we may allow the C compiler to optimize some of them to
    ** registers, but optimizations are disabled for my test.
    */
    MR_Float mandelbrot__Nr_8 = data->Nr;
    MR_Float mandelbrot__Ni_9 = data->Ni;
    MR_Float mandelbrot__Cr_10 = data->Cr;
    MR_Float mandelbrot__Ci_11 = data->Ci;
    MR_Integer mandelbrot__MaxIters_12 = data->MaxIters;
    MR_Integer mandelbrot__STATE_VARIABLE_Iters_0_18 = data->Iters_0;
    MR_Integer * mandelbrot__STATE_VARIABLE_Iters_19 = data->Iters;

    ...;

    if (...) {
        ...;

        /*
        ** Copy the parameters back into the _same_ structure.  This is
        ** important otherwise we would simply be trading stack usage for heap
        ** usage, which may be okay but can also be avoided.  Allocating a heap
        ** cell for each recursive call would also be relatively expensive.
        */
        data->Nr = mandelbrot__Rr_16;
        data->Ni = mandelbrot__Ri_17;
        data->Cr = mandelbrot__Cr_10;
        data->Ci = mandelbrot__Ci_11;
        data->MaxIters = mandelbrot__V_36_36;
        data->Iters_0 = mandelbrot__STATE_VARIABLE_Iters_34_34;
        data->Iters = mandelbrot__STATE_VARIABLE_Iters_19;

        /*
        ** Return the FP to the trampoline
        */
        return (escape_fp)mandelbrot__escape_real_7_p_0;
    } else {
        ...;

        *mandelbrot__STATE_VARIABLE_Iters_19 = (MR_Integer) -1;
    }

    /*
    ** Exit the loop.
    */
    return NULL;
}

The reason I think this is a suitable proxy for mutual recursion is that it
uses almost the same transformation.  The only difference is that a union of
structs would be used for parameter passing.  I also found it difficult to
think of a mutual recursive example that would clearly show the impact of
such a transformation that wasn't overly-contrived.  As it is this is still
contrived, why would you use a trampoline to optimize direct recursion?
However I think it's less contrived than any mutually recursive example I
could think of.

There are a number of potential optimisations that could be made.  For
example loop invariants do not need to be re-saved into the structure, they
would need to form a structure such as.

struct {
    invariants...;

    union {
        strict {
            arg1
            ...;
        } args1;
        struct {
            ...;
        } args2;
    }
}

Thanks.


-- 
Paul Bone



More information about the developers mailing list