[m-rev.] for review: arrays and the debugger
Fergus Henderson
fjh at cs.mu.OZ.AU
Mon Jun 18 21:39:34 AEST 2001
On 18-Jun-2001, Zoltan Somogyi <zs at cs.mu.OZ.AU> wrote:
> On 18-Jun-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> > The new facility proposed looks ugly to me, because it violates the
> > principle of keeping the interface simple and providing just the
> > basic building blocks, together with powerful ways of combining them,
> > rather than trying to provide every individual combination of features.
> >
> > The proposed justification for the new facility is efficiency.
> >
> > What influences my thinking here is the desire to avoid the evils of
> > premature optimization. Complicating the interface and implementation
> > with an extra procedure in order to speed up pretty-printing by 2% or
> > even 10% would not be a good trade-off, IMHO. And so far I haven't seen
> > any convincing evidence that the speed up would be that great.
>
> The reason I *started* working on this is my experience with debugging
> the deep profiler, which has arrays with several hundred thousand
> elements. Doing a "print *" would take ages and fill up your xterm's
> entire scrollback buffer many times over. After I modified deconstruct to
> return array elements instead of ignoring them, the problem became that
> the test of whether the array was too big to print took a minute or so,
> because the allocation of a typeinfo for each element caused my laptop to
> swap. This is not a 2% or 10% issue; this is a feasible vs. non-feasible
> kind of efficiency issue.
Yes, and certainly that issue should be fixed.
The question is how.
Two solutions have been proposed. One is to use limited_deconstruct/5,
the other is to use functor/3 and deconstruct/4.
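For concreteness, here is a sketch of the two call patterns being
compared. The argument order of limited_deconstruct/5 is my assumption,
since that predicate is only proposed; the limit test is the only real
difference between the two.

```mercury
:- module bounded_args_sketch.   % hypothetical module, for illustration
:- interface.
:- import_module list, std_util.

:- pred bounded_args_1(T::in, list(univ)::out) is det.
:- pred bounded_args_2(T::in, list(univ)::out) is det.

:- implementation.
:- import_module int.

	% Solution 1: the proposed combined predicate. It fails without
	% allocating the argument list when the arity exceeds the limit.
	% (Assumed signature: limited_deconstruct(Term, Limit, Functor,
	% Arity, Args).)
bounded_args_1(X, L) :-
	( limited_deconstruct(X, 1000, _Functor, _Arity, L0) ->
		L = L0
	;
		L = []
	).

	% Solution 2: compose the existing building blocks. functor/3
	% reports the arity without touching the arguments, so the
	% expensive deconstruct/4 call is made only when the term is
	% small enough to be worth printing.
bounded_args_2(X, L) :-
	(
		functor(X, _F, N),
		N < 1000,
		deconstruct(X, _F2, N, L0)
	->
		L = L0
	;
		L = []
	).
```

Either way, the caller pays nothing per element for an oversized array;
the question is only whether the limit check lives inside the RTTI
routine or in the caller.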
The difference in efficiency between the two is small.
In the worst possible case, the difference is about a factor of two.
But I just tried benchmarking a routine which did nothing but call
deconstruct/4 and </2 against one that called functor/3, deconstruct/4,
and </2, and the version which called functor/3 was only 14% slower.
(See the attached program `decons_test.m'.)
Real applications will probably be spending much of their time doing
other things, e.g. I/O, so the speedup will be more like 2% or 10%.
> > Both solutions are equally expressive. I'm arguing for favouring
> > simplicity over unmeasured and probably unimportant performance gains.
>
> And I am saying that the gain is not unimportant.
Well, I remain unconvinced.
> > In the long run, keeping things simple will often improve performance,
> > because it makes them easier to optimize.
>
> Given that the implementations of deconstruct and limited_deconstruct differ
> only in the limit checks, I don't see how the existence of limited_deconstruct
> affects the set of optimizations applicable to deconstruct, or the other RTTI
> routines.
Suppose we decide to optimize deconstruct by providing an alternative
version that returns an array rather than a list. Then we'll have four
different versions rather than two. Next, remember that we have to
implement these in several different languages, for the different
back-ends (C, Java, and C# or Managed C++). Pretty soon all these
different versions start to add up.
It's not a matter of preventing optimizations; it's a matter of making
them more difficult, simply by virtue of the number of lines of source
code that need to be understood and/or modified in order to optimize
things.
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
-------------- next part --------------
:- module decons_test.
:- interface.
:- import_module io.
:- pred main(state::di, state::uo) is cc_multi.
:- implementation.
:- import_module benchmarking, int, float, string, std_util, list.
:- type foo
	--->	f(int) ; g(foo) ; h(float) ; i(int) ; j(string) ; k.

main -->
	{ Repeats = 5000000 },
	{ In = f(42) },
	{ benchmark_det(do_nothing, In, Out0, Repeats, Time0) },
	{ benchmark_det(decons_test_1, In, Out1, Repeats, Time1) },
	{ benchmark_det(decons_test_2, In, Out2, Repeats, Time2) },
	print("In = "), print(In), nl,
	print("Out0 = "), print(Out0), nl,
	print("Out1 = "), print(Out1), nl,
	print("Out2 = "), print(Out2), nl,
	print("Repeats = "), print(Repeats), nl,
	print("Time0 = "), print(Time0), nl,
	print("Time1 = "), print(Time1), nl,
	print("Time2 = "), print(Time2), nl,
	format("adjusted Time2/Time1 = %.2f%%", [
		f(100.0 * float(Time2 - Time0) / float(Time1 - Time0))]), nl,
	format("adjusted Time1/Time2 = %.2f%%", [
		f(100.0 * float(Time1 - Time0) / float(Time2 - Time0))]), nl.
:- pred do_nothing(T::in, int::out) is det.
do_nothing(_, 0).
:- pred decons_test_1(T::in, list(univ)::out) is det.
decons_test_1(X, L) :-
	(
		functor(X, _F, N),
		N < 1000,
		deconstruct(X, _F2, N, L0)
	->
		L = L0
	;
		L = []
	).

:- pred decons_test_2(T::in, list(univ)::out) is det.
decons_test_2(X, L) :-
	(
		deconstruct(X, _F, N, L0),
		N < 1000
	->
		L = L0
	;
		L = []
	).