[m-dev.] why append slows down using lco
Peter Ross
petdr at cs.mu.OZ.AU
Wed May 26 14:57:17 AEST 1999
run.lco.novn: 286434 ms
run.lco.vn: 258400 ms
run.nolco.vn: 256117 ms
As the figures above show, manually adding value numbering brings the
lco code back to approximately the same performance as the original
code. (Turning off value numbering for nolco doesn't help, as the code
generated is already very good: there is only one excess assignment.)
The lines of assembly that slow down run.lco.novn are:
cycles  dmiss  imiss  cpi  address      instruction
  2992     26      7    ?  0x12000f5e4  ldt $f10, -1(s2)
 27541    932     50    ?  0x12000f5e8  stt $f10, -1(v0)
  2827     10      8    ?  0x12000f5ec  ldt $f11, 56(t2)
 24719   1084     53    ?  0x12000f5f0  stt $f11, 0(s4)
which, from what I can work out, are simply some of the
register-to-register copying (except that the registers are really
memory locations), hence the need for value numbering.
Here are the performance figures for append with vn turned on.
run.nolco.vn:
cycles      %  dmiss      %  imiss      %
 74691  8.37%   3975  4.58%    229  2.23%  append_impl_module
run.lco.vn:
cycles      %  dmiss      %  imiss      %
 62910  7.00%   1418  1.73%    255  2.41%  append_impl_module
So the addition of lco actually saves about 11800 cycles in append
itself. Where we get bitten, though, is in garbage collection, in
particular the call to GC_malloc.
run.nolco.vn:
cycles       %  dmiss       %  imiss       %
152170  17.04%  15997  18.42%   2826  27.50%  GC_malloc
run.lco.vn:
cycles       %  dmiss       %  imiss       %
172076  19.13%  13742  16.76%   2870  27.09%  GC_malloc
The other main cost centers are GC_mark_from_mark_stack (417755 cycles
for lco, 421265 for nolco) and GC_build_fl_clear2 (137467 for lco,
137198 for nolco), but both cost approximately the same in the two
versions, so it is the call to GC_malloc that is slowing the code down.
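The per-function differences are easier to compare side by side. The following is only a small sketch that redoes the arithmetic on the cycle counts quoted above; the struct and function names are made up for illustration and are not part of the profiler or of Mercury.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Cycle counts copied from the profiles above (value numbering on). */
struct sample {
    const char *name;
    long nolco;
    long lco;
};

static const struct sample samples[] = {
    { "append_impl_module",       74691,  62910 },
    { "GC_malloc",               152170, 172076 },
    { "GC_mark_from_mark_stack", 421265, 417755 },
    { "GC_build_fl_clear2",      137198, 137467 },
};

/* Positive result: the lco build spends more cycles here than nolco. */
long lco_extra(const struct sample *s)
{
    assert(s != NULL);
    return s->lco - s->nolco;
}

/* Sum of the differences over the four functions listed above. */
long net_lco_extra(void)
{
    long total = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        total += lco_extra(&samples[i]);
    return total;
}

/* Print one line per function, then the net difference. */
void print_deltas(void)
{
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("%-24s %+ld\n", samples[i].name, lco_extra(&samples[i]));
    printf("%-24s %+ld\n", "net", net_lco_extra());
}
```

Calling print_deltas() from a main function shows that the extra cycles in GC_malloc are larger than the cycles saved in append, which is the point made in the text.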
A call to io__report_stats shows that both versions of the code do the
same number of garbage collections, so it is actually the allocations
that are taking longer.
If you turn off garbage collection, you get the following numbers:
run.lco.vn.nogc: 10583 ms
run.nolco.vn.nogc: 13400 ms
Tyson suggested that the problem is that when we create the cell on the
heap, the tail cell contains garbage, which the Boehm garbage collector
may interpret as a pointer.
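To make the suggestion concrete, here is a toy model of a conservative scan. It is only a sketch under loud assumptions: the eight-cell heap, the toy_cell type and all function names are invented for illustration, only the tail word is followed, and nothing here reflects how the Boehm collector is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEAP_CELLS = 8 };

/* Hypothetical two-word cell; both words are just raw machine words. */
typedef struct {
    uintptr_t head;
    uintptr_t tail;
} toy_cell;

static toy_cell heap[HEAP_CELLS];
static unsigned char marked[HEAP_CELLS];

/* A word "looks like a pointer" if it is the address of a toy heap cell. */
int looks_like_cell(uintptr_t word, size_t *index)
{
    uintptr_t base = (uintptr_t)&heap[0];
    uintptr_t end = base + sizeof heap;
    if (word < base || word >= end)
        return 0;
    if ((word - base) % sizeof(toy_cell) != 0)
        return 0;
    *index = (size_t)((word - base) / sizeof(toy_cell));
    return 1;
}

/* Follow tail words conservatively from root; return how many cells get marked. */
size_t mark_from(uintptr_t root)
{
    size_t count = 0, i = 0;
    memset(marked, 0, sizeof marked);
    while (looks_like_cell(root, &i) && !marked[i]) {
        assert(i < HEAP_CELLS);
        marked[i] = 1;
        count++;
        root = heap[i].tail;
    }
    return count;
}

/* Cell 0 is freshly created; cells 1..3 are a dead chain from earlier work. */
size_t cells_retained(int zero_tail)
{
    heap[1].tail = (uintptr_t)&heap[2];
    heap[2].tail = (uintptr_t)&heap[3];
    heap[3].tail = 0;
    heap[0].head = 42;
    /* Without zeroing, the tail slot still holds a stale address (assumed). */
    heap[0].tail = zero_tail ? 0 : (uintptr_t)&heap[1];
    return mark_from((uintptr_t)&heap[0]);
}
```

In this model the stale tail word makes the scan treat the dead chain as reachable, while a zeroed tail stops the scan at the new cell. Whether that is what costs the time in GC_malloc is Tyson's hypothesis, not something the toy demonstrates.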
Setting the tail cell to zero gives us the required speedup,
if only just!
run.nolco.vn: 251950 ms
run.lco.vn.zero: 249833 ms
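As a rough picture of what "setting the tail cell to zero" means, the following C sketch builds an appended list in the lco style, where each cell is allocated before its tail is known. It is an illustration only and not the code the Mercury compiler generates: the cell type and function names are hypothetical, and plain malloc stands in for the collector's allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cons cell; not the Mercury runtime's representation. */
typedef struct cell {
    int head;
    struct cell *tail;
} cell;

/* Allocate a cell whose tail slot is cleared straight away. */
cell *new_cell(int head)
{
    cell *c = malloc(sizeof *c);   /* stand-in for the real allocator */
    if (c == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    c->head = head;
    c->tail = NULL;   /* the fix: no stale bits are left in the tail slot */
    return c;
}

/*
 * Append in the lco style: the cell is created first and the address of
 * its tail slot is carried forward, to be filled in on the next step.
 * The next allocation therefore happens while the previous cell's tail
 * is still a placeholder.
 */
cell *append_lco(const cell *xs, cell *ys)
{
    cell *result = NULL;
    cell **hole = &result;

    for (; xs != NULL; xs = xs->tail) {
        cell *c = new_cell(xs->head);
        assert(*hole == NULL);   /* the slot still holds the zero placeholder */
        *hole = c;
        hole = &c->tail;
    }
    assert(*hole == NULL);
    *hole = ys;                  /* the last hole receives the second list */
    return result;
}
```

Appending the list 1, 2, 3 to the list 4, 5 with append_lco yields 1, 2, 3, 4, 5, sharing the cells of the second list rather than copying them.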
Pete.