[mercury-users] Notes for a talk.

Fergus Henderson fjh at cs.mu.oz.au
Fri Oct 31 01:31:25 AEDT 1997


Ralph Becket wrote:
> I've had a look at the slides for the JICSLP talk Fergus and Zoltan
> gave in 1996 and they compare gcc C against Mercury on the Uni. of
> Montreal Pseudoknot benchmark: Mercury was a factor of 5.5 slower with
> GC and 3.7 without.
> 
> I can sell Mercury's other virtues to these guys, but as they work so
> close to the metal they're obsessed with efficiency.  What I'd like to
> know is how far the gap has narrowed in the last year and what further
> improvements are anticipated.

The gap has narrowed considerably.

As of June, on a Dec Alpha, the fastest (no gc) Mercury version was
only about 35% (i.e. a factor of 1.35) slower than the gcc C version.
The GC version is about 3.95 times slower than the gcc C version.

One reason for the improvement was that the original Mercury version
of Pseudoknot had a performance bug in it (the bug was present in the
Erlang version that was used as the basis of the Mercury version).
It was performing the inner loop many more times than the C version.
So the original results you listed above were in fact just plain wrong.

Another reason was that the Erlang version used lists to do search;
changing the Mercury version to use nondeterminism and backtracking
rather than lists improved performance for the non-GC version quite
a bit.  Using lists, it was 2.1 times slower than C, but with
backtracking it is only 1.35 times slower.  I think the main reason for
this is better cache performance:  with cheap memory reclamation on
backtracking, the benchmark fits in less than 1M, but the fully
deterministic version using lists allocates around 15M.
Poor cache performance is probably also a very significant factor
in the poor performance of the version using conservative GC.

> How much better would things get if
> there was a native back-end for the compiler?

Of the remaining 35% overhead, which is only 0.12 seconds
(Mercury takes 0.47 seconds while C takes 0.35 seconds),
I think that most of it (0.09 seconds or 25%) is start-up costs in
the Mercury runtime.  I haven't analyzed the causes for this,
but this is likely to be insignificant for real applications.
0.09 seconds is on a fairly slow machine, on our fastest machine
it is only 0.04 seconds and on a state-of-the-art machine it would
be only about 0.02 seconds.  It doesn't really seem to be worth
optimizing.

The remaining 10% overhead seems to be mostly suboptimal use of
registers and a few unnecessary loads and stores.  A native back-end
could probably eliminate that.

> What about with a
> dedicated garbage collector and full support for destructive
> assignment?

A dedicated garbage collector would give you close to the performance
of the no-GC version.

Destructive assignment is probably not significant for this benchmark.

> Also, there's a lot of interest there about parallelising compilers
> and I believe people are working on one for Mercury - can anybody give
> me some information on this?

Tom Conway is currently working on a multithreaded version,
with support for "independent and-parallelism" using an
explicit parallel conjunction operator (`&' instead of `,').
He's been working on it for a while now, but he's made quite
a bit of progress recently.  There's a good chance that this
will be in the next major release (0.8).

> Finally, I recently saw a video by one of the guys who designed the
> Alpha processor.  He showed that the cache assumption (that memory
> reference i is a good indicator of memory reference i+1) was pretty
> much shot to bits when running a C compiler and an SQL database server
> and even for most of a standard benchmark suite.  His suggestion was
> that compilers should attempt to spot rambling pointer based
> structures and reduce them to more cache friendly arrays.  It seems to
> me that Mercury would be the ideal language for this kind of
> experimentation given that it's declarative and the compiler has
> access to volumes of information about the program.  Anybody have any
> thoughts on the matter that I could bring up?

I think this sort of optimization is easier said than done, even in
Mercury ;-)

-- 
Fergus Henderson <fjh at cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3         |     -- the last words of T. S. Garp.



More information about the users mailing list