[m-dev.] tail call to loop optimisation for low-level grades

Peter Wang novalazy at gmail.com
Fri Jun 27 14:45:35 AEST 2008


Hi,

I was looking at why the asm_fast.gc implementation of string.hash is
about 85% times slower than the C version.  The loop looks like this,
after some cleanup:

    MR_def_static(shash__unchecked_hash_2_5_0)
	    if (MR_r2 >= MR_r3) {
		MR_GOTO_LAB(shash__unchecked_hash_2_5_0_i2);
	    }
	    {
		MR_String Str = (MR_String) MR_r1;
		MR_Word MR_tempr1 = Str[MR_r2];

		MR_r2 = MR_r2 + 1;
		MR_r4 = (MR_r4 ^ (MR_r4 << 5)) ^ MR_tempr1;
		MR_np_localtailcall(shash__unchecked_hash_2_5_0);
	    }

Turning the tail call into a loop (like the MLDS optimisation) allows us
to use a local variable in place of r4:

    MR_def_static(shash__unchecked_hash_2_5_0)
	MR_Integer Hash = MR_r4;
	while (1) {
	    if (MR_r2 >= MR_r3) {
		MR_r4 = Hash;
		MR_GOTO_LAB(shash__unchecked_hash_2_5_0_i2);
	    }
	    {
		MR_String Str = (MR_String) MR_r1;
		MR_Word MR_tempr1 = Str[MR_r2];

		MR_r2 = MR_r2 + 1;
		Hash = (Hash ^ (Hash << 5)) ^ MR_tempr1;
	    }
	}

This version is about 40% faster, and 15% slower than the C version.
I think a lot of tight loops could benefit from the same treatment.

Turning the tail call into a loop seems like it would be easy, but what
about the local variable?

Peter

--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at csse.unimelb.edu.au
Administrative Queries: owner-mercury-developers at csse.unimelb.edu.au
Subscriptions:          mercury-developers-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the developers mailing list