[m-rev.] for review: fix possible integer overflow in binary search

Paul Bone paul at bone.id.au
Mon Dec 22 16:12:59 AEDT 2014


On Sun, Dec 21, 2014 at 09:11:48PM +1100, Zoltan Somogyi wrote:
> On Sun, 21 Dec 2014 10:34:54 +0800, Sebastian Godelet <sebastian.godelet+github at gmail.com> wrote:
> > But in the current Mercury implementation this is not the case,
> > as demonstrated in this gist:
> > https://gist.github.com/sebgod/ee1172fd736289aaeaa8
> 
> Thanks for catching that. The reason is that when computing
> (Lo + Hi) / 2 in Mercury, the intermediate value Lo + Hi is
> stored as MR_Integer, not MR_Unsigned, so the following
> division (or shift) treats it as signed, not unsigned. The C version
> does not have this problem.
> 
> > The implementations in the specific grades are taken from this
> > article:
> > http://googleresearch.blogspot.de/2006/06/extra-extra-read-all-about-it-nearly.html
> > But this diff uses the
> > > the standard fix for that is to compute it as Lo + (Hi - Lo) / 2.
> > together with the >> 1 optimisation.
> 
> Actually, in Mercury, that turns out not to be an optimization,
> because the Mercury version of >>, unlike the C version,
> checks for the right operand being out of range. Because of
> these tests, it is not a builtin, so just ">> 1' generates a
> procedure call.
> 
> I am in committing a modified version of your diff, which uses
> unchecked_right shift instead of >>, and which applies the fix
> to some more places that also compute Mid similarly.
> 
> I will see whether we can make the compiler automatically
> replace X >> Y with X unchecked_right_shift Y if Y is known
> to be a constant in the right range. (And similarly for <<,
> and maybe for some other ops, such as /.)

If the righthand argument to / is known then we could try to strength-reduce
/ before applying the unchecked optimisation.  The unchecked optimisation
should still be tried for /.

> > Maybe a mid_rounding_down/2 or similar could be included in the int
> > module.
> 
> Maybe, but any code that cares about performance would likely
> call that predicate in inner loops, and so would want to avoid
> the expense of a call. That means that it should be a builtin,
> which means that its addition needs to be in two stages:
> adding to compiler/builtin_ops.m, and only when that is
> installed, adding its declaration to library/int.m.
> 
> Any opinions about whether this should be done?

I'm not sure about this particular calculation.  It depends how often it's
used and the benefit of adding it to int.m.  If people use the
non-overflowing method, and the compiler optimises things well then I don't
see the benefit of putting it in int.m  Regarding people using the
non-overflowing code.  It may help if we had separate types for signed and
unsigned integers.  I know we've discussed this before and I know it's a
fair bit of work, but it would be helpful.

Cheers.


-- 
Paul Bone



More information about the reviews mailing list