[m-dev.] I want to move and rename the dependency_graph module
paul at bone.id.au
Wed Feb 15 17:13:24 AEDT 2017
On Wed, Feb 15, 2017 at 04:46:35PM +1100, Zoltan Somogyi wrote:
> On Wed, 15 Feb 2017 14:58:01 +1100, Paul Bone <paul at bone.id.au> wrote:
> > > Therefore *accurate* computation of the code duplication you end up with
> > > (as opposed to what you *start* with) is possible only if inlining runs
> > > all the later passes of the compiler on the output of each inlining trial.
> > Right, that sounds like an interesting problem. But one I'm not going to
> > solve any time soon.
> I don't think anyone will solve it anytime soon. The space of possible algorithms
> is very large, and the possible payoff is quite small relative to the amount
> of work needed to explore that space.
Yeah, I thought so too. Didn't we talk about this at lunch with Andy King
once? He mentioned supercompilation in passing and I hadn't heard of it.
Basically you both said that it's something that searches the space of
possible optimisations (or is it something else) and then picks the best?
You both said that it only works for very small programs. That made sense
> > > > I was thinking of something that is the super-set each PiI1..In set.
> > >
> > > The names we generate for MLDS variables include the variable number
> > > of the HLDS variable they represent. Since these will coincide between
> > > procedures only by accident, this is essentially equivalent. However,
> > > just because there is an MLDS variable named ABC_42 in both p and q
> > > does NOT mean that they are bound to even have the same type,
> > > much less any kind of data flow relationship.
> > Two variables should decide if they can share a field by their type, not by
> > their name.
> Sorry, I misunderstood you. I thought by "superset" you meant
> using the same MLDS variable for two or more input args in different
> procedures if their names and types matched.
> > So if I have:
> > void a(MR_Word a, MR_Word b, MyType *c);
> > void b(YourType *d, MR_Word e);
> > Then I can have:
> > struct ParamStruct1 (
> > MR_Word a_and_e;
> > MR_Word b;
> > MyType *c;
> > YourType *d;
> > };
> > It's only useful if the foreign language doesn't have unions.
> I will pretend that your function "a" is named p, and your function "b" is named q,
> to make the following understandable. (Using overlapping name spaces for different
> kinds of things in the same example is not a good idea.)
> Using the same field for arg a in proc p and arg e in proc q may save one word
> in ParamStruct, but what if the tail call(s) from p to q pass the value of not a, but b
> as the second arg of q?
> You see, what I am *most* trying to minimize is not the size of ParamStruct,
> but the number of assignments to its fields that a tail recursive call needs to make.
> In the example above, if p passes b as the second arg of q, you want to store *b*
> and e in the same slot, not *a* and e, since this makes the assignment a noop.
> Now, if none of p's input args, or their updated versions, are passed as the second
> arg of q, then making e share a slot with any input arg of p that is of the correct type
> is good idea. But if there are several input args of p with the correct type, you
> do *not* want to choose among them randomly.
I didn't consider that, I agree.
> > > > One thought about inlining the TSCC members, rather than calling them in
> > > > each switch arm. Is that if they are outlined we can let the C compiler
> > > > decide if inlining is worth while. It may have more information than we do
> > > > to make such decisions.
> > >
> > > It *may* have more information than we do about whether inlining is
> > > worthwhile, though I doubt it. I am certain, though, that *we* have
> > > more information about whether inlining is *safe*. Since the target language
> > > compilers will always need to be conservative about safety, I don't think
> > > leaving the inlining up to them is a good idea.
> > I was thinking that it would have information about the relative cost of a
> > C call versus placing code inline.
> > Except for static variables in C, when is inlining unsafe? I'm now asking
> > for my curiosity, I think both approaches would work almost as well as
> > each-other.
> I was thinking about the safety of deleting the code of a C function
> if all calls to that functions have been inlined. To do this, the C compiler
> must be sure that noone has taken the address of the function anywhere,
> which would require an analysis of the whole namespace in which the
> function name is visible.
It should know this if the function is static.
> > > Obviously, this test can be made more sophisticated, and independent of
> > > variable names, but this simple test will get up a large fraction, probably
> > > well over half, of the improvement that is potentially available from
> > > this kind of optimization.
> > Okay.
> > I'd like to handle loop invariants explicitly (re the loop that drives the
> > recursive calls) but starting here makes sense.
> Then we agree; looking for other loop invariants was what I meant
> by "more sophisticated tests".
I thought so.
> > I agree, in the long term this and most of the other work I've done in my
> > branch is not useful :-(.
> That is why I brought it up in this thread :-(
> > > > They have to be linear, there can only be one tail call on any branch.
> > > > Other SCC calls make it non-linear, but they're not TSCC calls. Maybe I
> > > > misunderstood what you mean by linear.
> > >
> > > I meant that each procedure in the SCC contains just one tail call,
> > > to some member of the SCC other than itself. For example,
> > > if an SCC contains p, q, r and s, then the only tail calls are p->q,
> > > q->r, r->s and s->p. If you add a tail call p->r, then it is not linear.
> > > This kind of linearity test is for the inlining we discussed at the top.
> > We're still talking about the proposal to use a loop and switch or gotos?
> Neither; as I mention, it is for inlining. If e.g. q is an entry point,
> either from higher SCCs or from non-tail recursive calls in this SCC,
> then you would want to inline r at the single q->r tail call site,
> inline s in the single inlined_r->s tail call site, and inline p at the
> single inlined_s_in_inlined_r->p tail call site. This gets all the speed
> benefit of the driver loop approach without needing ANY changes
> to the code generator, at the cost of some code duplication.
> This is therefore the lowest hanging fruit for making mutually recursive
> SCCs that fit this pattern use constant stack space. What fraction
> of mutually recursive SCCs fit this pattern is of course as yet unknown,
> but maybe you can extend the tests you mentioned above to tell us.
I might be able to automate it, but it's probably equally easy to check each
SCC by hand and make a tally.
I have some similar code in the deep profiler that might be able to help.
The benifit of that is that it can weight each by the number of calls made
into and within the SCC.
More information about the developers