[m-dev.] I want to move and rename the dependency_graph module

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Feb 15 16:46:35 AEDT 2017



On Wed, 15 Feb 2017 14:58:01 +1100, Paul Bone <paul at bone.id.au> wrote:
> > Therefore *accurate* computation of the code duplication you end up with
> > (as opposed to what you *start* with) is possible only if inlining runs
> > all the later passes of the compiler on the output of each inlining trial.
> 
> Right, that sounds like an interesting problem.  But one I'm not going to
> solve any time soon.

I don't think anyone will solve it anytime soon. The space of possible algorithms
is very large, and the possible payoff is quite small relative to the amount
of work needed to explore that space.

> > > I was thinking of something that is the super-set each PiI1..In set. 
> > 
> > The names we generate for MLDS variables include the variable number
> > of the HLDS variable they represent. Since these will coincide between
> > procedures only by accident, this is essentially equivalent. However,
> > just because there is an MLDS variable named ABC_42 in both p and q
> > does NOT mean that they are bound to even have the same type,
> > much less any kind of data flow relationship.
> 
> Two variables should decide if they can share a field by their type, not by
> their name.

Sorry, I misunderstood you. I thought by "superset" you meant
using the same MLDS variable for two or more input args in different
procedures if their names and types matched.

> So if I have:
> 
> void a(MR_Word a, MR_Word b, MyType *c);
> 
> void b(YourType *d, MR_Word e);
> 
> Then I can have:
> 
> struct ParamStruct1 (
>     MR_Word a_and_e;
>     MR_Word b;
>     MyType *c;
>     YourType *d;
> };
> 
> It's only useful if the foreign language doesn't have unions.

I will pretend that your function "a" is named p, and your function "b" is named q,
to make the following understandable. (Using overlapping name spaces for different
kinds of things in the same example is not a good idea.)

Using the same field for arg a in proc p and arg e in proc q may save one word
in ParamStruct, but what if the tail call(s) from p to q pass the value of not a, but b
as the second arg of q?

You see, what I am *most* trying to minimize is not the size of ParamStruct,
but the number of assignments to its fields that a tail recursive call needs to make.
In the example above, if p passes b as the second arg of q, you want to store *b*
and e in the same slot, not *a* and e, since this makes the assignment a noop.

Now, if none of p's input args, or their updated versions, are passed as the second
arg of q, then making e share a slot with any input arg of p that is of the correct type
is good idea. But if there are several input args of p with the correct type, you
do *not* want to choose among them randomly.
  
> > > One thought about inlining the TSCC members, rather than calling them in
> > > each switch arm.  Is that if they are outlined we can let the C compiler
> > > decide if inlining is worth while.  It may have more information than we do
> > > to make such decisions.
> > 
> > It *may* have more information than we do about whether inlining is
> > worthwhile, though I doubt it. I am certain, though, that *we* have
> > more information about whether inlining is *safe*. Since the target language
> > compilers will always need to be conservative about safety, I don't think
> > leaving the inlining up to them is a good idea.
> 
> I was thinking that it would have information about the relative cost of a
> C call versus placing code inline.
> 
> Except for static variables in C, when is inlining unsafe?  I'm now asking
> for my curiosity, I think both approaches would work almost as well as
> each-other.

I was thinking about the safety of deleting the code of a C function
if all calls to that functions have been inlined. To do this, the C compiler
must be sure that noone has taken the address of the function anywhere,
which would require an analysis of the whole namespace in which the
function name is visible.

> > Obviously, this test can be made more sophisticated, and independent of
> > variable names, but this simple test will get up a large fraction, probably
> > well over half, of the improvement that is potentially available from
> > this kind of optimization.
> 
> Okay.
> 
> I'd like to handle loop invariants explicitly (re the loop that drives the
> recursive calls) but starting here makes sense.

Then we agree; looking for other loop invariants was what I meant
by "more sophisticated tests".

> I agree, in the long term this and most of the other work I've done in my
> branch is not useful :-(. 

That is why I brought it up in this thread :-(

> However it is useful in the short term.  I have
> warnings for mutual recursion now 95% working (the UI needs fixing) and I
> can use this to survey the Prince and other codebases to find out where the
> mutual recursions are and what type they are.  We can use this to determine
> what part of this to start working on first, eg HLDS inlining or HLDS->MLDS
> inlining into a loop.

Good.

> > > They have to be linear, there can only be one tail call on any branch.
> > > Other SCC calls make it non-linear, but they're not TSCC calls.  Maybe I
> > > misunderstood what you mean by linear.
> > 
> > I meant that each procedure in the SCC contains just one tail call,
> > to some member of the SCC other than itself. For example,
> > if an SCC contains p, q, r and s, then the only tail calls are p->q,
> > q->r, r->s and s->p. If you add a tail call p->r, then it is not linear.
> > This kind of linearity test is for the inlining we discussed at the top.
> 
> We're still talking about the proposal to use a loop and switch or gotos?

Neither; as I mention, it is for inlining. If e.g. q is an entry point,
either from higher SCCs or from non-tail recursive calls in this SCC,
then you would want to inline r at the single q->r tail call site,
inline s in the single inlined_r->s tail call site, and inline p at the
single inlined_s_in_inlined_r->p tail call site. This gets all the speed
benefit of the driver loop approach without needing ANY changes
to the code generator, at the cost of some code duplication.
This is therefore the lowest hanging fruit for making mutually recursive
SCCs that fit this pattern use constant stack space. What fraction
of mutually recursive SCCs fit this pattern is of course as yet unknown,
but maybe you can extend the tests you mentioned above to tell us.

Zoltan.


More information about the developers mailing list