[m-users.] Question about any difference in efficiency on this code...

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat Aug 26 20:45:27 AEST 2023


On 2023-08-26 20:12 +10:00 AEST, "Sean Charles (emacstheviking)" <objitsu at gmail.com> wrote:
> I started out with this:
> 
>     get_random_value(0, 2, V, !IO),
>     ( if V = 0 then
>         Speed = 0.25, Color = color(gray)
>     else if V = 1 then
>         Speed = 0.75, Color = color(skyblue)
>     else
>         Speed = 1.25, Color = color(beige)
>     ),
>     Star = star(X, StarY, Speed, to_rgba(Color)).
> 
> and then, for some half-baked reason regarding not creating Speed and Color but instead directly returning Star...
> 
>     get_random_value(0, 2, V, !IO),
>     ( if V = 0 then
>         Star = star(X, StarY, 0.25, to_rgba(color(gray)))
>     else if V = 1 then
>         Star = star(X, StarY, 0.75, to_rgba(color(skyblue)))
>     else
>         Star = star(X, StarY, 1.25, to_rgba(color(beige)))
>     ).
> 
> 
> So, is there any real difference, or did I do something good / bad / indifferent at best?

There may be a difference in the performance of those two pieces of code,
but any effect will be quite small; I would be surprised if it were more than
half a percent. On my laptop, I cannot reliably measure differences that small,
because the hardware's mechanisms for raising and lowering the CPU frequency
(to keep the CPU's heat dissipation within the required limits) have a bigger
and effectively random effect.

That difference is not worth worrying about in a user program unless profiling
indicates the predicate to be a bottleneck. It can be worth worrying about
in a compiler, because once the transformation from the usually-slightly-slower form
to the usually-slightly-faster one is implemented (and it isn't hard), there is
no point in not invoking it. In fact, the Mercury compiler does have such a transformation,
which would be invoked for the top code if both star and to_rgba are function symbols,
as opposed to executable functions. (The comment at the top of compiler/follow_code.m
explains its rationale.)

Note that I say *usually* faster or slower. This is because this transformation changes
the size of the code, in that the second form above replaces one copy of the code
that computes Star with three copies. By changing which parts of the program
collide in the instruction cache with which other parts, this can change the effectiveness
of the instruction cache. The direction and size of this effect cannot be predicted
by any reasonable algorithm in the Mercury compiler, because (a) Mercury generates
not machine code but e.g. C code, so only the target language compiler (such as gcc)
knows the sizes of the instructions it selects, and (b) even for a single target ABI, the
CPUs implementing that ABI may, and almost always will, differ in the size and other
characteristics of the cache. This usually matters only if the cache is direct mapped
(which is rare these days) *and* the predicate is part of a performance bottleneck.

All of which means that there is no way to be *sure* which of the above versions
is faster on a given machine, other than executing and timing both versions.
I wouldn't worry about it; write whichever version you like.
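
For anyone who does want to time the two versions, the following is a minimal
sketch using benchmark_det from the standard library's benchmarking module.
The type definitions, the fixed values 10 and 20 standing in for X and StarY,
the predicate names make_star_1 and make_star_2, and the repeat count are all
assumptions made up for this example. In particular, it assumes that star,
to_rgba and color are function symbols rather than executable functions, and
it passes V in as an argument instead of calling get_random_value.

```mercury
:- module star_bench.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is cc_multi.

:- implementation.
:- import_module benchmarking.
:- import_module list.
:- import_module string.

% Hypothetical stand-ins for the original program's types. Here star,
% to_rgba and color are all data constructors (function symbols).
:- type color_name ---> gray ; skyblue ; beige.
:- type color ---> color(color_name).
:- type rgba ---> to_rgba(color).
:- type star ---> star(int, int, float, rgba).

% First form: pick Speed and Color, then construct Star once at the end.
:- pred make_star_1(int::in, star::out) is det.

make_star_1(V, Star) :-
    ( if V = 0 then
        Speed = 0.25, Color = color(gray)
    else if V = 1 then
        Speed = 0.75, Color = color(skyblue)
    else
        Speed = 1.25, Color = color(beige)
    ),
    Star = star(10, 20, Speed, to_rgba(Color)).

% Second form: construct Star separately in each branch.
:- pred make_star_2(int::in, star::out) is det.

make_star_2(V, Star) :-
    ( if V = 0 then
        Star = star(10, 20, 0.25, to_rgba(color(gray)))
    else if V = 1 then
        Star = star(10, 20, 0.75, to_rgba(color(skyblue)))
    else
        Star = star(10, 20, 1.25, to_rgba(color(beige)))
    ).

main(!IO) :-
    % The repeat count is arbitrary; raise it if the times are too small.
    Repeats = 10000000,
    % benchmark_det returns the elapsed time in milliseconds.
    benchmark_det(make_star_1, 1, Star1, Repeats, Time1),
    benchmark_det(make_star_2, 1, Star2, Repeats, Time2),
    io.format("first form:  %d ms\n", [i(Time1)], !IO),
    io.format("second form: %d ms\n", [i(Time2)], !IO),
    % Sanity check that both forms compute the same term.
    ( if Star1 = Star2 then
        true
    else
        io.write_string("the two forms disagree\n", !IO)
    ).
```

Treat the numbers as indicative only. The input here is a fixed V = 1, the
compiler may already have turned the first form into the second, and CPU
frequency scaling can easily swamp a difference this small, so run it several
times before drawing any conclusion.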

Zoltan.

