[m-users.] Enable LCMC tail recursion optimization by default?

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat Apr 23 19:34:40 AEST 2016

On Wed, 20 Apr 2016 13:51:36 +1000, Peter Wang <novalazy at gmail.com> wrote:
> Assuming you still have the object files, do you think you link the
> compiler using a combination of LCMC-enabled and non-LCMC-enabled .o
> files, and thereby narrow down which modules are negatively impacted?

Only crudely. There is enough noise in timing measurements that answering
the question "does this change lead to a slowdown?" is very hard if the
slowdown is small, and in this case, all the speed changes will be small.

> I'm just guessing that most modules are not transformed by LCMC,
> or are not executed enough to affect the timings.

I found that enabling lcmc in Mmake.params leads to changes in 233 C files,
while 657 C files stay the same. (All the library's C files stay the same, since
they don't need Mmake.params to enable lcmc.) Overall, and including the
library, 971 strongly connected components of procedures are transformed
by lcmc. The changed files and procedures are listed by the attached files.

On the list of changed procedures, I can see a significant number for which
lcmc is a pessimization. They are procedures that usually don't get executed,
because they implement rarely-used or not-yet-ready features, or are
needed only for handling errors, which should (hopefully) be rare. For these,
the negative of the extra code space occupied by lcmc-transformed code
is not opposed by the positive of any meaningful speedup. And then there
are the predicates for which lcmc *may* yield a speedup, but it is likely
to be very, very small, such as the predicates that iterate over argument
lists. Since very few predicates have more than ten arguments, the impact
of the extra icache misses from the bigger code generated by lcmc will
probably outweigh the larger stack usage of the non-lcmc code, since the
memory references to the stack near the stack pointer are *very* likely
to hit in the dcache.

I have long thought that the ideal way to handle lcmc and similar
optimizations is for a profiler to generate a list of procedures whose runtime
is nontrivial, and to give one or more such lists to the compiler.
The compiler should then optimize the procedures that appear on at least
one such list for speed, and every other procedure for size. That is from
the point-of-view of speed. From the point-of-view of robustness, we
would also need a list of procedures whose inputs can cause them to
exhaust the stack. Unfortunately, I don't see any easy way to generate
that list automatically. The converse, a list of procedures whose inputs
are *bounded* somehow to a level that is guaranteed *not* to stress
the stack may be easi*er* to obtain, but that task is still substantial.
And any case, I expect *most* predicates would then fall into the category
of "NOT known to NOT to be able to exhaust the stack". Compiling them
with lcmc, just in case, would yield a result that would be scarcely
distinguishable from compiling *everything* with lcmc.

Since I see no better solutions in the near term, I am fine with
making lcmc the default, with the understanding that for the modules
in the compiler in which we believe its use to be unneeded and counter-
productive, we turn it off in Mercury.options.

-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 8428 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/users/attachments/20160423/e7b3076e/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 70002 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/users/attachments/20160423/e7b3076e/attachment-0003.obj>

More information about the users mailing list