[m-users.] Enable LCMC tail recursion optimization by default?

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Apr 20 00:27:16 AEST 2016



On Tue, 19 Apr 2016 23:06:27 +1000 (AEST), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> I am now doing a more extensive test, to see what the performance effect is
> of compiling the compiler itself with --optimize-constructor-last-call.

Those tests are now done. The results:

compiler in hlc.gc,      with no lcmc   15.29
compiler in hlc.gc,      with lcmc      15.34
compiler in asm_fast.gc, with no lcmc   15.85
compiler in asm_fast.gc, with lcmc      16.18

So when the compiler itself is compiled in grade hlc.gc, enabling
last call modulo constructor (lcmc) leads to a 0.3% slowdown.
That is barely measurable, and would be, I think, a reasonable price
to pay to eliminate some (though not all) causes of stack exhaustion.

When the compiler is compiled in grade asm_fast.gc, the cost
is higher: about a 2.1% slowdown. That is harder to justify
imposing on everyone.
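
The two percentages follow directly from the four timings in the
table. As a quick check, here is a small Python sketch; the
dictionary layout and the helper name slowdown_percent are made up
for illustration, and the timings are copied as given (the post
does not state their units).

```python
# Timings copied from the results table in this post.
timings = {
    "hlc.gc": {"no_lcmc": 15.29, "lcmc": 15.34},
    "asm_fast.gc": {"no_lcmc": 15.85, "lcmc": 16.18},
}


def slowdown_percent(base, new):
    """Relative increase of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


for grade, t in timings.items():
    pct = slowdown_percent(t["no_lcmc"], t["lcmc"])
    print(f"{grade}: {pct:.1f}% slowdown")
```

This prints 0.3% for hlc.gc and 2.1% for asm_fast.gc.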

In both grades, lcmc causes an increase in the size of the
executable of between 2% and 3%: closer to 2% for hlc.gc, and
closer to 3% for asm_fast.gc. This is because this option
duplicates the bodies of the affected procedures: the new copy
contains a tail recursive version of the original code (this is the one
that does almost all the work), while the original still exists,
because it is needed to present the original calling interface
to the rest of the program. (The original does not call itself;
it calls the new version, so it does only one iteration on each call.)
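
To make the shape of the duplication concrete, here is a rough
Python analogy, not what the Mercury compiler actually generates.
The names Cons, append_original, append_fill and append_lcmc are
all hypothetical, and passing a "hole" cell to be filled in later
is my assumption about how such a transformation can be expressed.
Python does not eliminate tail calls, so the new copy is written as
the loop that a tail recursive procedure would turn into.

```python
class Cons:
    """A mutable cons cell; None stands for the empty list."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def append_original(xs, ys):
    # Not tail recursive: the Cons is built only after the recursive
    # call returns, so every element of xs costs one stack frame.
    if xs is None:
        return ys
    return Cons(xs.head, append_original(xs.tail, ys))


def append_fill(xs, ys, hole):
    # The "new copy": build each cell first, store it into the hole
    # left by the previous step, then continue with the rest of xs.
    while xs is not None:
        cell = Cons(xs.head)
        hole.tail = cell
        hole = cell
        xs = xs.tail
    hole.tail = ys


def append_lcmc(xs, ys):
    # The "original" after the transformation: same interface as
    # before, one iteration of work, then hand over to the new copy.
    if xs is None:
        return ys
    first = Cons(xs.head)
    append_fill(xs.tail, ys, first)
    return first


def from_list(items):
    """Build a cons list from a Python iterable (iteratively)."""
    result = None
    for item in reversed(list(items)):
        result = Cons(item, result)
    return result


def to_list(cell):
    """Flatten a cons list back into a Python list (iteratively)."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out
```

For example, to_list(append_lcmc(from_list([1, 2, 3]), from_list([4, 5])))
gives [1, 2, 3, 4, 5], the same as append_original. The difference
is that append_lcmc uses constant stack depth however long xs is,
at the price of having two bodies where there used to be one.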

This increase in code size, and the increased pressure it brings
on I-caches, is one possible reason for the slowdown.
I don't think we ever did a full investigation of the slowdowns,
and I think that gap is one reason why this option is not
turned on by default. (Of course, microbenchmarks that
spend all their time in predicates that are made tail recursive
by this option get speedups, but we are interested in the
performance of real programs, whose performance characteristics
are usually much more complex.)

Zoltan.

