[m-dev.] I want to move and rename the dependency_graph module

Paul Bone paul at bone.id.au
Wed Feb 15 14:58:01 AEDT 2017


On Wed, Feb 15, 2017 at 02:15:31PM +1100, Zoltan Somogyi wrote:
> 
> On Wed, 15 Feb 2017 13:08:47 +1100, Paul Bone <paul at bone.id.au> wrote:
> > I would propose a cost model, except that we cannot hope to (analytically)
> > know the cost of code duplication vs the cost of non-tail calls.  However I
> > think it would be useful to create and agree upon some kind of model.  Even
> > if we can't compare code duplication and non-tail calls, we can calculate
> > code duplication.
> 
> Actually, that is effectively impossible to do accurately in inlining.m by itself.
> The reason is what you *want* to measure is NOT the immediate increase
> in code size generated by inlining, but the *final*, overall increase in code size,
> and *that* is affected by every optimization that runs after inlining.
> That measure is affected by how the particular details of a given call site
> affect the optimizations that can be applied to inlined copy of the callee.
> For example, if the callee has a switch on X, and a call site sets the variable
> that becomes the input arg X to a given function symbol, later optimizations
> can replace the whole switch with just the selected arm. This means that
> in some cases, even inlining a procedure with a huge body can lead to an
> overall *reduction* in code size, if later optimizations can delete everything
> in the callee's body except e.g. a switch arm containing a single unification,
> whose code is smaller than the code required to make the call.
> 
> Therefore *accurate* computation of the code duplication you end up with
> (as opposed to what you *start* with) is possible only if inlining runs
> all the later passes of the compiler on the output of each inlining trial.

Right, that sounds like an interesting problem.  But one I'm not going to
solve any time soon.

> > I was thinking of something that is the super-set each PiI1..In set. 
> 
> The names we generate for MLDS variables include the variable number
> of the HLDS variable they represent. Since these will coincide between
> procedures only by accident, this is essentially equivalent. However,
> just because there is an MLDS variable named ABC_42 in both p and q
> does NOT mean that they are bound to even have the same type,
> much less any kind of data flow relationship.

Two variables should decide if they can share a field by their type, not by
their name.

So if I have:

void a(MR_Word a, MR_Word b, MyType *c);

void b(YourType *d, MR_Word e);

Then I can have:

struct ParamStruct1 (
    MR_Word a_and_e;
    MR_Word b;
    MyType *c;
    YourType *d;
};

It's only useful if the foreign language doesn't have unions.

> > One thought about inlining the TSCC members, rather than calling them in
> > each switch arm.  Is that if they are outlined we can let the C compiler
> > decide if inlining is worth while.  It may have more information than we do
> > to make such decisions.
> 
> It *may* have more information than we do about whether inlining is
> worthwhile, though I doubt it. I am certain, though, that *we* have
> more information about whether inlining is *safe*. Since the target language
> compilers will always need to be conservative about safety, I don't think
> leaving the inlining up to them is a good idea.

I was thinking that it would have information about the relative cost of a
C call versus placing code inline.

Except for static variables in C, when is inlining unsafe?  I'm now asking
for my curiosity, I think both approaches would work almost as well as
each-other.

> > > To achieve this, the MLDS backend would have to do code generation SCC by SCC.
> > > For each SCC, it would need to know what TSCCs, if any, exist inside it.
> > > It would do this by using the same algorithm to find SCCs as we already use,
> > > but this time, using as edges only the calls in the SCC that are both
> > > RECURSIVE calls and TAIL calls. This will partition the procedures in the SCC
> > > into one or more TSCCs. (While all the procedures in the SCC are by definition
> > > reachable from all other procedures in the SCC, they need not be reachable
> > > via TAIL calls from all other procedures in the SCC.)
> > > 
> > > For TSCCs that contain only one procedure, we would translate that procedure
> > > the usual way. For TSCCs that contain two or more procedures, we would want
> > > to generate code that follows the scheme above.
> > 
> > Would an MLDS->MLDS transformation be easier?
> 
> No.
> 
> If you have a choice between generating the wrong thing and then fixing it up
> or generating the right thing directly, always choose the latter. The code that
> does the fixups in the first approach would always be vulnerable to changes
> in the code generation itself screwing up the patterns that it looks for;
> this could happen even if those changes were optimizations for unrelated
> purposes.
> 
> I learned this lesson the hard way.

The case of detecting mutual recursion seems simple enough.  However I don't
have a good reason to choose one over the other, so I'm happy to agree with
you.  It's also a good principal.

> > > > This may share code with the above idea, simply
> > > > disabling the use of a trampoline when we know the C compiler matches one
> > > > that we know will do the mutual recursion.
> > > 
> > > That test would be a bitch to keep up to date.
> > 
> > I don't think it would be that hard.  However I prefer the above solution
> > anyway, I think it will (usually) lead to better code/performance.
> 
> I wasn't thinking about it being hard; I was thinking that it would be *annoying*,
> given that it would have to be repeated after every new version of gcc/clang/etc.

Yep.  I'd like to keep it in mind anyway, it may still be useful as a
fallback option.

> > > Another issue is that mutually recursive procedures often have one or more
> > > arguments that are passed to them from above the SCC and which are passed down
> > > the chain of recursive calls. In some cases, the argument is always passed
> > > unchanged; in some other cases, it is sometimes passed unchanged, and
> > > sometimes passed updated. In both cases, we should avoid the need for
> > > every tail recursive call to set the value of these parameters, because
> > > that is wasted work in the common case that the value is passed along
> > > unchanged.
> > > 
> > > With the design above, such parameters could be stored in e.g. P1I1, P2I1, and
> > > P3I2. The target language compiler may find out that it can store all these
> > > variables in the same stack slot, making assignments such as P3I2 = P2I1
> > > just before a tail call from P2 to P3 a no-operation. If some target language
> > > compilers don't, then the Mercury compiler could itself compute which
> > > the sets of input arguments, one argument from each TSCC, form a single
> > > parameters in this sense. We could then get ml_gen_proc to refer to that
> > > input argument by the shared name, and ml_call_gen to optimize its passing.
> > 
> > Implementing this in the MLDS as a seperate excess assignment elimination
> > wouldn't be too difficult.
> 
> The existing excess assignment elimination pass works only on conjunctions;
> if it sees that one conjunct is X1 := X0, then it can replace X0 with X1,
> or vice versa, in all the later conjuncts. This is relatively simple to do.
> In the case we are talking about, the assignment Xq := Xp will appear just before
> the tailcall from p to q, implemented as a branch back to the top of the driver
> loop, so this won't work. You will have to extend the analysis to q's branch in
> the switch, at which point the analysis needed is so different that you may as well
> call it by some other name.
> 
> When a procedure body updates STATE_VARIABLE_X_0 to become
> STATE_VARIABLE_X_1 and then never refers to STATE_VARIABLE_X_0 again,
> I trust the target language compilers to reuse the same stack slot for both,
> because keeping track of such things is easy for them in code without backward
> branches. Tail calls introduce backward branches, making the job much harder,
> not just for target language compilers, but for any MLDS analyses as well.
> 
> By contrast, things are much simpler at the HLDS level: you just need to match
> the input args in the procedure head with those in the input arg list of the
> recursive call. If e.g. input arg 2 in the head and input arg 3 in the tail call
> have the same type and the same name modulo a numeric suffix, then we
> can use the same MLDS var for input arg 2 of this proc and input arg 3 of the
> tail call's callee proc.
> 
> Obviously, this test can be made more sophisticated, and independent of
> variable names, but this simple test will get up a large fraction, probably
> well over half, of the improvement that is potentially available from
> this kind of optimization.

Okay.

I'd like to handle loop invariants explicitly (re the loop that drives the
recursive calls) but starting here makes sense.

> > I have a version of ml_dependency_graph.m in my workspace.  It can be
> > modified to add support for TSCCs.  My big question at this point is whether
> > we'd prefer to do this as an MLDS->MLDS transformation.
> 
> I don't think so. As far as I can see, implementing tail calls in the MLDS
> is best done at the HLDS to MLDS stage, not as an MLDS to MLDS transformation.

I agree, in the long term this and most of the other work I've done in my
branch is not useful :-(.  However it is useful in the short term.  I have
warnings for mutual recursion now 95% working (the UI needs fixing) and I
can use this to survey the Prince and other codebases to find out where the
mutual recursions are and what type they are.  We can use this to determine
what part of this to start working on first, eg HLDS inlining or HLDS->MLDS
inlining into a loop.

> > They have to be linear, there can only be one tail call on any branch.
> > Other SCC calls make it non-linear, but they're not TSCC calls.  Maybe I
> > misunderstood what you mean by linear.
> 
> I meant that each procedure in the SCC contains just one tail call,
> to some member of the SCC other than itself. For example,
> if an SCC contains p, q, r and s, then the only tail calls are p->q,
> q->r, r->s and s->p. If you add a tail call p->r, then it is not linear.
> This kind of linearity test is for the inlining we discussed at the top.

We're still talking about the proposal to use a loop and switch or gotos?

    p_label:
        // p contains two tail calls, the only way this can happen si if
        // they're on different branches.  I suppose it would matter if we
        // had to place to goto after the body of the inlined procedure.
        if (...) {
            ...
            goto q_label;
        } else {
            ...
            goto r_label;
        }
    q_label:
        ...
        goto r_label;
    r_label:
        ...
        goto s_label;
    s_label:
        ...
        goto p_label;

    end_label:
        ...

Thanks.


-- 
Paul Bone
http://paul.bone.id.au


More information about the developers mailing list