[mercury-users] Notes for a talk.

Richard A. O'Keefe ok at cs.rmit.edu.au
Fri Oct 31 18:00:27 AEDT 1997


	The SRG is mostly concerned with machine architecture,
	operating systems and networking.  Their language of choice is plain
	old C because there's a very direct and obvious mapping between it and
	the machine instructions the compiler generates.

I don't think they can have looked at the output of a modern compiler.
I've got a student at the moment who is trying to make neural net
calculations go fast.  He's learning about things like loop unrolling,
blocking, strip-mining, and the 'restrict' keyword from C9x and why it
matters.  A single line of C code may correspond to 16 different
compiler-generated replicas (under at least two different circumstances).
He has a reading knowledge of SPARC assembly code, and has the SPARC V9
ISA manual and the UltraSPARC-specific manual, but it typically takes us
an hour to go through the compiler-generated code for a smallish function
and figure out how the compiler has ripped it apart and put it back
together in weird and wonderful ways.  The mapping is neither obvious
nor direct.
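
To make the "replicas" concrete, here is a minimal hand-written sketch in
C99 of one transformation of this kind.  The function names axpy_simple
and axpy_unrolled are made up for illustration, and the second function
only approximates what an optimiser might do; it is not the output of
Sun's compiler or of gcc.  A real compiler may unroll further, software
pipeline the loop, and generate separate versions for different cases.

```c
#include <assert.h>
#include <stddef.h>

/* What the programmer writes: the loop body is a single line of C.
   'restrict' is the caller's promise that x and y never overlap. */
void axpy_simple(size_t n, double a,
                 const double *restrict x, double *restrict y)
{
    assert(n == 0 || x != y);  /* catches only the most blatant broken promise */
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hand-written approximation of one thing an optimiser may do:
   four replicas of the body per trip, plus a cleanup loop. */
void axpy_unrolled(size_t n, double a,
                   const double *restrict x, double *restrict y)
{
    size_t i = 0;

    assert(n == 0 || x != y);
    for (; i + 4 <= n; i += 4) {
        /* Because stores to y cannot change x, all four results can be
           computed before any of them is stored. */
        double y0 = y[i]     + a * x[i];
        double y1 = y[i + 1] + a * x[i + 1];
        double y2 = y[i + 2] + a * x[i + 2];
        double y3 = y[i + 3] + a * x[i + 3];
        y[i]     = y0;
        y[i + 1] = y1;
        y[i + 2] = y2;
        y[i + 3] = y3;
    }
    for (; i < n; i++)          /* 0 to 3 leftover elements */
        y[i] += a * x[i];
}
```

Without 'restrict' the compiler has to allow for a store to y[i] changing
x[i+1], which limits how far it can reorder the loads and stores.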

This, by the way, is why the Mercury implementors are *right* to avoid
writing a native back end of their own.  It's only a pity that gcc isn't
better at generating SPARC code than it is; my student has to use Sun's
compiler to get the best performance (automatic inlining, automatic
unrolling, it knows the cache size of the target machine and can tune
its blocking for the actual cache sizes, it has profile feedback, &c),
and not only that, to use Sun's specially tuned 'performance library'
we have to link with 'cc' not 'gcc'.

I was hoping that this student would have the time to look at using
some of the SPARC V9 multimedia instructions, which as far as I know
aren't yet generated for portable C code, so you have to get at them
through the ".il" facility in cc.

Modern machines are giving us what look like huge
memories (512Mb on this machine), but the CPU can spend 100+
instruction-equivalents waiting for each main-memory access, so we
really need to program as if we had much less memory if we want top
performance.  Someone
recently posted in comp.arch that you can translate old algorithm
books (Knuth V3 for example) for new machines using this dictionary:
	memory => L1 cache
	disc   => L2 cache
	tape   => physical memory.
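
Read through that dictionary, the old out-of-core habit of working on one
block at a time becomes cache blocking.  The following is a minimal sketch
in C99, not anything taken from the comp.arch posting: a square matrix
transpose done tile by tile.  The name transpose_blocked and the tile size
of 32 are assumptions made for illustration; the right tile size depends
on the cache of the actual machine and has to be tuned.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge; tune it so that a source tile and a destination
   tile fit in the L1 data cache together. */
enum { TILE = 32 };

/* dst = transpose of src; both are n x n matrices in row-major order. */
void transpose_blocked(size_t n,
                       const double *restrict src, double *restrict dst)
{
    assert(n == 0 || src != dst);   /* this sketch is not an in-place transpose */
    for (size_t ib = 0; ib < n; ib += TILE) {
        size_t imax = ib + TILE < n ? ib + TILE : n;
        for (size_t jb = 0; jb < n; jb += TILE) {
            size_t jmax = jb + TILE < n ? jb + TILE : n;
            /* Everything touched in here belongs to one source tile
               and one destination tile. */
            for (size_t i = ib; i < imax; i++)
                for (size_t j = jb; j < jmax; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The unblocked version walks dst a whole row apart on every store, so for
a large matrix nearly every store can land on a different cache line.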

The rules of thumb for Mercury performance will probably be the same
as they were for Prolog:
    - keep your data structures small
    - keep frequently used data and infrequently used data apart
    - use backtracking to keep your heap small
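
The second rule is sketched below in C99, since C is the language under
discussion above; the same idea would apply to how a Mercury or Prolog
term is laid out.  All the names here (cold_info, table, lookup) are
hypothetical.  The keys that every search scans are packed together, and
the bulky fields that are only read after a hit live in a separate array.

```c
#include <assert.h>
#include <stddef.h>

/* Infrequently used data: only read once a key has been found. */
struct cold_info {
    char name[64];
    char address[128];
};

/* Parallel arrays: keys[i] and info[i] describe the same record. */
struct table {
    size_t count;
    int *keys;               /* frequently used, densely packed */
    struct cold_info *info;  /* infrequently used, kept apart */
};

/* Linear search that touches only the packed keys until it hits. */
const struct cold_info *lookup(const struct table *t, int key)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->keys[i] == key)
            return &t->info[i];
    return NULL;
}
```

If the key were stored inside each 192-byte record instead, every probe
of the search would drag a different record's cache line in; with the
split layout, consecutive probes mostly hit the line already loaded.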



