[m-dev.] tail call to loop optimisation for low-level grades
Zoltan Somogyi
zs at csse.unimelb.edu.au
Tue Jul 1 17:20:57 AEST 2008
On 27-Jun-2008, Peter Wang <novalazy at gmail.com> wrote:
> I was looking at why the asm_fast.gc implementation of string.hash is
> about 85% times slower than the C version. The loop looks like this,
> after some cleanup:
>
> MR_def_static(shash__unchecked_hash_2_5_0)
> if (MR_r2 >= MR_r3) {
> MR_GOTO_LAB(shash__unchecked_hash_2_5_0_i2);
> }
> {
> MR_String Str = (MR_String) MR_r1;
> MR_Word MR_tempr1 = Str[MR_r2];
>
> MR_r2 = MR_r2 + 1;
> MR_r4 = (MR_r4 ^ (MR_r4 << 5)) ^ MR_tempr1;
> MR_np_localtailcall(shash__unchecked_hash_2_5_0);
> }
>
> Turning the tail call into a loop (like the MLDS optimisation) allows us
> to use a local variable in place of r4:
>
> MR_def_static(shash__unchecked_hash_2_5_0)
> MR_Integer Hash = MR_r4;
> while (1) {
> if (MR_r2 >= MR_r3) {
> MR_r4 = Hash;
> MR_GOTO_LAB(shash__unchecked_hash_2_5_0_i2);
> }
> {
> MR_String Str = (MR_String) MR_r1;
> MR_Word MR_tempr1 = Str[MR_r2];
>
> MR_r2 = MR_r2 + 1;
> Hash = (Hash ^ (Hash << 5)) ^ MR_tempr1;
> }
> }
We already do something like this, during the llds to C stage (llds_out.m).
> This version is about 40% faster, and 15% slower than the C version.
> I think a lot of tight loops could benefit from the same treatment.
I will look into this.
> Turning the tail call into a loop seems like it would be easy, but what
> about the local variable?
It should be manageable.
Zoltan.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at csse.unimelb.edu.au
Administrative Queries: owner-mercury-developers at csse.unimelb.edu.au
Subscriptions: mercury-developers-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the developers
mailing list