[m-users.] uncaught Mercury exception using mdprof_create_feedback

Paul Bone paul at bone.id.au
Sun Oct 12 15:28:28 AEDT 2014

On Sat, Oct 11, 2014 at 10:18:40AM +0200, Matthias Guedemann wrote:
> Hi Paul,
> >     + The known best sequential performance, this is usually
> >       asm_fast.gc.
> >     + The parallel performance, asm_fast.gc.par.stseg with multiple
> >       threads,
> >     + And the performance of enabling parallelism, but not using it,
> >       asm_fast.gc.par.stseg with one thread.  This gives us an idea of
> >       the costs/overheads of parallelism.
> ok, on i5 dual core with HT
>     asm_fast.gc                    is around 1:30m
>     asm_fast.par.gc.stseg with -P1 is around 1:55m (104% CPU)
>     asm_fast.par.gc.stseg with -P2 is around 1:17m (160% CPU)
>     asm_fast.par.gc.stseg with -P3 is around 1:00m (292% CPU)

Nice, so both the manual and the automatic parallelisation work very well.
Memory allocation/garbage collection might be making the "-P1" version slower
than the asm_fast.gc version, because the garbage collector (and the whole
runtime) must be compiled with thread safety enabled.
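
To put rough numbers on that, here is a small Python sketch that turns the
timings quoted above into speedup ratios.  The figures were reported as
"around" values, so the results are only approximate, and the names
(REPORTED, to_seconds, speedup) are made up for this illustration.

```python
# Timings as reported in this thread ("around" values, so results are rough).
REPORTED = {
    "asm_fast.gc": "1:30",
    "asm_fast.par.gc.stseg -P1": "1:55",
    "asm_fast.par.gc.stseg -P2": "1:17",
    "asm_fast.par.gc.stseg -P3": "1:00",
}


def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '1:55' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(reference_s, measured_s):
    """How many times faster the measured run is than the reference run."""
    return reference_s / measured_s


SECONDS = {name: to_seconds(t) for name, t in REPORTED.items()}

sequential = SECONDS["asm_fast.gc"]
one_thread = SECONDS["asm_fast.par.gc.stseg -P1"]

# Compare each run against the sequential grade and against the
# parallel grade restricted to one thread.
for name, secs in SECONDS.items():
    print(f"{name:28s} {secs:4d}s  "
          f"vs sequential: {speedup(sequential, secs):.2f}x  "
          f"vs -P1: {speedup(one_thread, secs):.2f}x")
```

With these figures the -P1 run takes roughly 28% longer than asm_fast.gc,
which is the cost of enabling parallelism without using it, and the -P3 run
is about 1.9x faster than -P1 but only 1.5x faster than the sequential grade.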

> so, within the asm_fast.par.gc.stseg grade, I get almost 2x speedup, but
> the asm_fast.gc grade is faster in general (so no surprises here). My
> guess is that on a quad core, I'd get almost 3x speedup, I'll see if I
> can verify this.
> You're right of course, like with most benchmarks, the interpretation of
> the results depends on what one tries to achieve. My principal interest
> is learning more about efficient, declarative programming. Using
> parallelism and different grades comes after profiling and algorithmic
> optimization. I like very much the idea of automatic introduction of
> parallelism as a kind of last optimization step.

Yes, I agree.  I have a 4-core machine, so the maximum benefit I can get
through parallelism is 4x, maybe a little more with HT.  But I can get 10x or
100x by 1) choosing an efficient algorithm, 2) removing other sources of
Paul Bone
