[m-rev.] for review: speed up uint32 to string conversion

Julien Fischer jfischer at opturion.com
Wed Oct 27 19:53:11 AEDT 2021


Hi Zoltan,

On Wed, 27 Oct 2021, Zoltan Somogyi wrote:

>> +    MR_allocate_aligned_string_msg(S, num_digits, MR_ALLOC_ID);
>> +    S[num_digits] = '\\0';
>> +    int i = num_digits - 1;
>> +    do {
>> +        S[i] = \"0123456789\"[U % 10];
>> +        i--;
>> +    } while ( U /= 10 );
>> +").
>
> Please fix the line with "while" in three separate ways:
>
> Move the division by 10 out of the condition.
>
> Make the comparison with zero explicit.
>
> To conform to our usual style in C, avoid the space after '(' and before ')'.

Done.

>> +%---------------------------------------------------------------------------%
>> +
>> +% Same as alt1 except it uses Andrei Alexandrescu's digit counting method.
>> +% See: <https://www.facebook.com/notes/10158791579037200/>
>> +% This is biased in favour of small numbers, it doesn't really help for
>> +% this benchmark.
>> +%
>> +:- func alt2_uint32_to_string(uint32::in) = (string::uo) is det.
>> +:- pragma foreign_proc("C",
>> +    alt2_uint32_to_string(U::in) = (S::uo),
>> +    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
>> +"
>> +    uint32_t v = U;
>> +    int num_digits = 1;
>> +    for (;;) {
>> +        if (v < 10) { break; }
>> +        if (v < 100) { num_digits += 1; break; }
>> +        if (v < 1000) { num_digits +=2; break; }
>> +        if (v < 10000) { num_digits += 3; break; }
>> +        v /= UINT32_C(10000);
>> +        num_digits += 4;
>> +    }
>> +
>> +    if (U < 10) {
>> +        num_digits = 1;
>> +    } else if (U < 100) {
>> +        num_digits = 2;
>> +    } else if (U < 1000) {
>> +        num_digits = 3;
>> +    } else if (U < 10000) {
>> +        num_digits = 4;
>> +    } else if (U < 100000) {
>> +        num_digits = 5;
>> +    } else if (U < 1000000) {
>> +        num_digits = 6;
>> +    } else if (U < 10000000) {
>> +        num_digits = 7;
>> +    } else if (U < 100000000) {
>> +        num_digits = 8;
>> +    } else if (U < 1000000000) {
>> +        num_digits = 9;
>> +    } else {
>> +        num_digits = 10;
>> +    }
>
> Why is the second block of code there?

I accidently copied it there and had already spent too long staring at
it to realise I'd done something so silly  :-(

> The first has already
> computed num_digits; why compute it again? This voids the results
> of your benchmarking wrt alt2. In fact, I wouldn't be surprised
> if the C compiler *deleted* the first block here, which means
> that instead of measuring alt1 vs alt2, you measured alt1 twice :-(

Fortunately, they're easily rerun.  I've increased the number of repeats,
since the results with 20 sometimes vary a little.  We should use alt1
instead of alt3, however alt2 is the fastest version in asm_fast.gc but
the slowest in hlc.gc.

     n = 10000000; repeats = 100, grade = asm_fast.gc
     seed: a = 11582366378285079126, b = 5288812623518536388, c = 2329514002761010134
      Std: 122360ms
     Alt1: 41610ms ratio: 2.94
     Alt2: 40710ms ratio: 3.01
     Alt3: 42860ms ratio: 2.85

     n = 10000000; repeats = 100, grade = hlc.gc
     seed: a = 16644754529940811230, b = 13971597908403416844, c = 4937230762656381269
      Std: 113930ms
     Alt1: 34490ms ratio: 3.30
     Alt2: 36030ms ratio: 3.16
     Alt3: 34730ms ratio: 3.28

(Mind you, the main result of my original benchmarking is still
perfectly valid: dont use sprintf()!)

>> +% XXX TODO: try table driven approaches.
>
> Alternatives 1 and 2 *are* table driven approaches,
> though the table has only ten entries :-( But I presume
> you mean approaches in which you index U into a table,
> with or without having reduced its range.

No, that's not what I meant. I was principally referring to the
technqiues for computing the number decimal digits, discussed in the
following blog post

<https://lemire.me/blog/2021/05/28/computing-the-number-of-digits-of-an-integer-quickly/>

plus ...

> I would also propose a3, which would be not a *sequence* of
> if-then-elses, but a *tree* of if-then-else, that performs a
> binary search on num_digits. There could be several versions
> of this: one whose top level test is for num_digits being
> above 5, one for below 4, one for below 3 etc. The latter
> should perform better on workloads dominated by small
> numbers.
>
> We can also extend a1 by indexing not into a ten-element
> table of chars, but into a 100-element table of char pairs,
> 1000-element table of char triples, etc. I don't think these
> are likely to have any advantage over existing alternatives,
> but if you want to be sure, you could try them.

... both of those suggestions are mentioned in the post from
Facebook engineering I linked to.  I intend to try them out

That said, I think the immediate concern should be to replace calls to
sprintf() where possible.

Julien.


More information about the reviews mailing list