[m-rev.] for review: arrays and the debugger

Fergus Henderson fjh at cs.mu.OZ.AU
Mon Jun 18 21:39:34 AEST 2001


On 18-Jun-2001, Zoltan Somogyi <zs at cs.mu.OZ.AU> wrote:
> On 18-Jun-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> > The new facility proposed looks ugly to me, because it violates the
> > principle of keeping the interface simple and providing just the
> > basic building blocks, together with powerful ways of combining them,
> > rather than trying to provide every individual combination of features.
> > 
> > The proposed justification for the new facility is efficiency.
> > 
> > What influences my thinking here is the desire to avoid the evils of
> > premature optimization.  Complicating the interface and implementation
> > with an extra procedure in order to speed up pretty-printing by 2% or
> > even 10% would not be a good trade-off, IMHO.  And so far I haven't seen
> > any convincing evidence that the speed up would be that great.
> 
> The reason why I *started* working on this is because of my experience with
> debugging the deep profiler, which has arrays with several hundred thousand
> elements. Doing a "print *" would take ages and fill up your xterm's entire
> record many times over. After I modified deconstruct to return array elements
> instead of ignoring them, the problem became that the test whether the array
> was too big to print took a minute or so, because the allocation of a typeinfo
> for each element caused my laptop to swap. This is not a 2% or 10% issue; this
> is a feasible vs non-feasible kind of efficiency issue.

Yes, and certainly that issue should be fixed.
The question is how. 

Two solutions have been proposed.  One is to use limited_deconstruct/5,
the other is to use functor/3 and deconstruct/4.
The difference in efficiency between the two is small.
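To make the comparison concrete, here is an illustrative sketch in Python
(not Mercury -- the `Term` class and the function names are toy stand-ins
for the real RTTI routines, not their actual implementations) of the two
proposed strategies for deconstructing a term without paying O(arity) cost
on a huge one:

```python
class Term:
    """Toy stand-in for a Mercury term: a functor name plus arguments."""
    def __init__(self, name, args):
        self.name = name
        self.args = args

def functor(term):
    """Return (name, arity) without building an argument list: O(1)."""
    return term.name, len(term.args)

def deconstruct(term):
    """Return (name, arity, argument list): O(arity) work."""
    return term.name, len(term.args), list(term.args)

def limited_deconstruct(term, limit):
    """Solution 1: a single combined primitive that fails (here,
    returns None) if the term has more than `limit` arguments."""
    if len(term.args) > limit:
        return None
    return deconstruct(term)

def guarded_deconstruct(term, limit):
    """Solution 2: compose two existing primitives -- check the arity
    via functor first, and only then build the argument list."""
    _name, arity = functor(term)
    if arity > limit:
        return None
    return deconstruct(term)
```

Both strategies give the same answer; the composed version simply reads
the functor information twice, which is where the overhead measured
below comes from.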

In the worst possible case, the difference is about a factor of two.
But I just tried benchmarking a routine which did nothing but call
deconstruct/4 and </2 against one that called functor/3, deconstruct/4,
and </2, and the version which called functor/3 was only 14% slower.
(See the attached program `decons_test.m'.)
Real applications will probably be spending much of their time doing
other things, e.g. I/O, so the speedup will be more like 2% or 10%.
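For reference, the "adjusted" percentages the attached program prints
subtract the empty-loop baseline (Time0) from both measurements before
comparing, so that loop and benchmarking overhead is not counted against
either routine.  A Python sketch of that arithmetic, with made-up
placeholder timings (not measured values):

```python
def adjusted_ratio(t_a, t_b, baseline):
    """Percentage cost of routine a relative to routine b, after
    subtracting the per-iteration benchmarking overhead."""
    return 100.0 * (t_a - baseline) / (t_b - baseline)

# Placeholder timings, chosen only to illustrate the arithmetic:
time0 = 1.2   # do_nothing baseline
time1 = 4.0   # deconstruct/4 + </2
time2 = 4.4   # functor/3 + deconstruct/4 + </2

# With these placeholders: 100 * (4.4 - 1.2) / (4.0 - 1.2) = 100 * 3.2 / 2.8,
# i.e. about 114%, corresponding to a roughly 14% slowdown.
print("adjusted Time2/Time1 = %.2f%%" % adjusted_ratio(time2, time1, time0))
```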

> > Both solutions are equally expressive.  I'm arguing for favouring
> > simplicity over unmeasured and probably unimportant performance gains.
> 
> And I am saying that the gain is not unimportant.

Well, I remain unconvinced.

> > In the long run, keeping things simple will often improve performance,
> > because it makes them easier to optimize.
> 
> Given that the implementations of deconstruct and limited_deconstruct differ
> only in the limit checks, I don't see how the existence of limited_deconstruct
> affects the set of optimizations applicable to deconstruct, or the other RTTI
> routines.

Suppose we decide to optimize deconstruct by providing an alternative
version that returns an array rather than a list.  Then we'll have four
different versions rather than two.  Next, remember that we have to
implement these in several different languages, for the different
back-ends (C, Java, and C# or Managed C++).  Pretty soon all these
different versions start to add up.

It's not a matter of preventing optimizations, it's just a matter of
making them more difficult simply by virtue of the number of lines of
source code that need to be understood and/or modified in order to
optimize things.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
-------------- next part --------------
:- module decons_test.
:- interface.
:- import_module io.

:- pred main(state::di, state::uo) is cc_multi.

:- implementation.
:- import_module benchmarking, int, float, string, std_util, list.

:- type foo --->
	f(int) ; g(foo) ; h(float) ; i(int) ; j(string) ; k.

main -->
	{ Repeats = 5000000 },
	{ In = f(42) },
	{ benchmark_det(do_nothing, In, Out0, Repeats, Time0) },
	{ benchmark_det(decons_test_1, In, Out1, Repeats, Time1) },
	{ benchmark_det(decons_test_2, In, Out2, Repeats, Time2) },
	print("In = "), print(In), nl,
	print("Out0 = "), print(Out0), nl,
	print("Out1 = "), print(Out1), nl,
	print("Out2 = "), print(Out2), nl,
	print("Repeats = "), print(Repeats), nl,
	print("Time0 = "), print(Time0), nl,
	print("Time1 = "), print(Time1), nl,
	print("Time2 = "), print(Time2), nl,
	format("adjusted Time2/Time1 = %.2f%%", [
		f(100.0 * float(Time2 - Time0) / float(Time1 - Time0))]), nl,
	format("adjusted Time1/Time2 = %.2f%%", [
		f(100.0 * float(Time1 - Time0) / float(Time2 - Time0))]), nl.

:- pred do_nothing(T::in, int::out) is det.
do_nothing(_, 0).

:- pred decons_test_1(T::in, list(univ)::out) is det.

decons_test_1(X, L) :-
	( functor(X, _F, N), N < 1000,
	  deconstruct(X, _F2, N, L0)
	->
		L = L0
	;	
		L = []
	).

:- pred decons_test_2(T::in, list(univ)::out) is det.

decons_test_2(X, L) :-
	(
		deconstruct(X, _F, N, L0),
		N < 1000
	->
		L = L0
	;	
		L = []
	).
