[m-dev.] why append slows down using lco

Peter Ross petdr at cs.mu.OZ.AU
Wed May 26 14:57:17 AEST 1999

Previous message: [m-dev.] for review: sorting variables better in the debugger
Next message: [m-dev.] why append slows down using lco
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

run.lco.novn:   286434 ms
run.lco.vn:     258400 ms
run.nolco.vn:   256117 ms

As the figures above show, manually adding value numbering brings the
lco code back to approximately the same performance as the original
(nolco) code.  (Turning off value numbering for nolco doesn't help, as
the code generated is already very good, with only one excess
assignment.)

The lines of (Alpha) assembly that slow down run.lco.novn are:

cycles dmiss imiss       cpi
  2992    26     7         ?   0x12000f5e4 ldt     $f10, -1(s2)
 27541   932    50         ?   0x12000f5e8 stt     $f10, -1(v0)
  2827    10     8         ?   0x12000f5ec ldt     $f11, 56(t2)
 24719  1084    53         ?   0x12000f5f0 stt     $f11, 0(s4)

As far as I can work out, these are simply register-to-register copies
(except that the registers are really memory locations), hence the
need for value numbering.
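To illustrate why value numbering removes such copies, here is a minimal
sketch in C of local value numbering restricted to copy instructions:
every location carries a value number, and a copy dst := src is dropped
when both already carry the same number. Copy, vn_remove_copies and
NUM_LOCS are hypothetical names; this is a toy under those assumptions,
not the Mercury compiler's value numbering pass, which works on its own
low-level code and does far more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: locations 0..NUM_LOCS-1 model virtual registers,
   some of which really live in memory. */
#define NUM_LOCS 16

/* One "dst := src" copy instruction. */
typedef struct {
    int dst;
    int src;
} Copy;

/* Local value numbering over a straight-line block of copies.
   keep[i] is set to false when copy i is redundant because dst already
   holds the same value number as src. Returns the number of copies kept. */
size_t vn_remove_copies(const Copy *code, size_t n, bool *keep)
{
    int vn[NUM_LOCS];

    /* On entry every location holds its own, distinct unknown value. */
    for (int loc = 0; loc < NUM_LOCS; loc++)
        vn[loc] = loc;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int dst = code[i].dst;
        int src = code[i].src;
        assert(dst >= 0 && dst < NUM_LOCS && src >= 0 && src < NUM_LOCS);

        if (vn[dst] == vn[src]) {
            keep[i] = false;      /* the copy would not change anything */
        } else {
            vn[dst] = vn[src];    /* dst now holds src's value */
            keep[i] = true;
            kept++;
        }
    }
    return kept;
}
```

For example, in the block r1 := r2; r2 := r1; r3 := r2; r3 := r1 the
second and fourth copies are dropped, because their destinations already
hold the value being copied.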

Here are the performance figures for append with value numbering (vn)
turned on.

run.nolco.vn:
cycles  %      dmiss  %            imiss %
74691   8.37%  3975   4.58%        229   2.23% append_impl_module

run.lco.vn:
cycles  %      dmiss  %            imiss %
62910   7.00%  1418   1.73%        255   2.41% append_impl_module

So adding lco actually saves ~11800 cycles (74691 vs 62910) in append
itself.  Where we get bitten, though, is in garbage collection, in
particular the call to GC_malloc.

run.nolco.vn:
cycles  %      dmiss  %            imiss %
152170  17.04% 15997  18.42%       2826  27.50% GC_malloc

run.lco.vn:
cycles  %      dmiss  %            imiss %
172076  19.13% 13742  16.76%       2870  27.09% GC_malloc

The other main cost centres are GC_mark_from_mark_stack (417755 lco
cycles, 421265 nolco cycles) and GC_build_fl_clear2 (137467 lco cycles,
137198 nolco cycles), but both perform about the same in the two
versions, so it is the call to GC_malloc that is slowing the code down.
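Putting the append and GC_malloc rows from the profiles above side by
side (lco minus nolco, so a negative number means lco is cheaper):

```latex
\begin{align*}
\Delta_{\text{append}}    &= 62910 - 74691   = -11781 \\
\Delta_{\text{GC\_malloc}} &= 172076 - 152170 = +19906 \\
\Delta_{\text{net}}       &= -11781 + 19906  = +8125
\end{align*}
```

On these two rows alone, then, lco comes out about 8100 cycles worse.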

Adding a call to io__report_stats shows that both versions of the code
do the same number of garbage collections, so it is the allocations
themselves that are taking longer.

If you turn off garbage collection, you get the following numbers:

run.lco.vn.nogc:    10583 ms
run.nolco.vn.nogc:  13400 ms
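As relative changes, taking nolco as the baseline and using the
run.lco.vn and run.nolco.vn timings from the top of this message for
the with-GC case:

```latex
\begin{align*}
\text{with GC:}    &\quad \frac{258400 - 256117}{256117} \approx +0.009
  \quad (\text{lco about } 0.9\% \text{ slower}) \\
\text{without GC:} &\quad \frac{10583 - 13400}{13400} \approx -0.210
  \quad (\text{lco about } 21\% \text{ faster})
\end{align*}
```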

Tyson suggested that the problem is that when we create the cell on the
heap, the tail of the new cell contains garbage, which the Boehm garbage
collector may interpret as a pointer.  Setting the tail to zero gives us
the required speedup, if only just!
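A minimal C sketch of why the tail can hold garbage. It assumes, as
Tyson's explanation implies, that with lco append allocates each cons
cell before its tail has been computed, turning the recursion into a
loop that fills in the pending tail slot on the next iteration
(destination-passing style). Plain malloc stands in for Boehm's
GC_malloc so the sketch builds without libgc, and Cell, new_cell,
append_nolco and append_lco are hypothetical names, not Mercury's
generated code. The line that zeroes c->tail corresponds to the fix
described above.

```c
#include <assert.h>
#include <stdlib.h>

/* A cons cell; plain malloc stands in for GC_malloc in this sketch. */
typedef struct Cell {
    int head;
    struct Cell *tail;
} Cell;

/* Hypothetical helper: allocate a cell whose fields are both known. */
Cell *new_cell(int head, Cell *tail)
{
    Cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->head = head;
    c->tail = tail;
    return c;
}

/* Without lco: the recursive call produces the tail first, so each
   cell is allocated with both fields already known. */
Cell *append_nolco(const Cell *xs, Cell *ys)
{
    if (xs == NULL)
        return ys;
    Cell *rest = append_nolco(xs->tail, ys);
    return new_cell(xs->head, rest);
}

/* With lco: each cell is allocated before its tail is known; dest
   records the slot that the next iteration (or the end) must fill. */
Cell *append_lco(const Cell *xs, Cell *ys)
{
    Cell *result = NULL;
    Cell **dest = &result;

    for (; xs != NULL; xs = xs->tail) {
        Cell *c = malloc(sizeof *c);
        if (c == NULL)
            abort();
        c->head = xs->head;
        c->tail = NULL;   /* zero the pending tail instead of leaving garbage */
        *dest = c;
        dest = &c->tail;
    }
    assert(*dest == NULL);    /* the pending slot still holds the zero */
    *dest = ys;
    return result;
}
```

In append_lco the previous cell's tail is still pending while the next
cell is being allocated, which is exactly when a collection can run; a
conservative collector scanning that cell then sees whatever the slot
holds, so zeroing it costs one store per cell but removes the risk of a
false pointer.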

run.nolco.vn:       251950 ms
run.lco.vn.zero:    249833 ms

Pete.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------