[m-rev.] for review: speed up uint32 to string conversion
Julien Fischer
jfischer at opturion.com
Wed Oct 27 19:53:11 AEDT 2021
Hi Zoltan,
On Wed, 27 Oct 2021, Zoltan Somogyi wrote:
>> + MR_allocate_aligned_string_msg(S, num_digits, MR_ALLOC_ID);
>> + S[num_digits] = '\\0';
>> + int i = num_digits - 1;
>> + do {
>> + S[i] = \"0123456789\"[U % 10];
>> + i--;
>> + } while ( U /= 10 );
>> +").
>
> Please fix the line with "while" in three separate ways:
>
> Move the division by 10 out of the condition.
>
> Make the comparison with zero explicit.
>
> To conform to our usual style in C, avoid the space after '(' and before ')'.
Done.
>> +%---------------------------------------------------------------------------%
>> +
>> +% Same as alt1 except it uses Andrei Alexandrescu's digit counting method.
>> +% See: <https://www.facebook.com/notes/10158791579037200/>
>> +% This is biased in favour of small numbers, it doesn't really help for
>> +% this benchmark.
>> +%
>> +:- func alt2_uint32_to_string(uint32::in) = (string::uo) is det.
>> +:- pragma foreign_proc("C",
>> + alt2_uint32_to_string(U::in) = (S::uo),
>> + [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
>> +"
>> + uint32_t v = U;
>> + int num_digits = 1;
>> + for (;;) {
>> + if (v < 10) { break; }
>> + if (v < 100) { num_digits += 1; break; }
>> + if (v < 1000) { num_digits +=2; break; }
>> + if (v < 10000) { num_digits += 3; break; }
>> + v /= UINT32_C(10000);
>> + num_digits += 4;
>> + }
>> +
>> + if (U < 10) {
>> + num_digits = 1;
>> + } else if (U < 100) {
>> + num_digits = 2;
>> + } else if (U < 1000) {
>> + num_digits = 3;
>> + } else if (U < 10000) {
>> + num_digits = 4;
>> + } else if (U < 100000) {
>> + num_digits = 5;
>> + } else if (U < 1000000) {
>> + num_digits = 6;
>> + } else if (U < 10000000) {
>> + num_digits = 7;
>> + } else if (U < 100000000) {
>> + num_digits = 8;
>> + } else if (U < 1000000000) {
>> + num_digits = 9;
>> + } else {
>> + num_digits = 10;
>> + }
>
> Why is the second block of code there?
I accidently copied it there and had already spent too long staring at
it to realise I'd done something so silly :-(
> The first has already
> computed num_digits; why compute it again? This voids the results
> of your benchmarking wrt alt2. In fact, I wouldn't be surprised
> if the C compiler *deleted* the first block here, which means
> that instead of measuring alt1 vs alt2, you measured alt1 twice :-(
Fortunately, they're easily rerun. I've increased the number of repeats,
since the results with 20 sometimes vary a little. We should use alt1
instead of alt3, however alt2 is the fastest version in asm_fast.gc but
the slowest in hlc.gc.
n = 10000000; repeats = 100, grade = asm_fast.gc
seed: a = 11582366378285079126, b = 5288812623518536388, c = 2329514002761010134
Std: 122360ms
Alt1: 41610ms ratio: 2.94
Alt2: 40710ms ratio: 3.01
Alt3: 42860ms ratio: 2.85
n = 10000000; repeats = 100, grade = hlc.gc
seed: a = 16644754529940811230, b = 13971597908403416844, c = 4937230762656381269
Std: 113930ms
Alt1: 34490ms ratio: 3.30
Alt2: 36030ms ratio: 3.16
Alt3: 34730ms ratio: 3.28
(Mind you, the main result of my original benchmarking is still
perfectly valid: dont use sprintf()!)
>> +% XXX TODO: try table driven approaches.
>
> Alternatives 1 and 2 *are* table driven approaches,
> though the table has only ten entries :-( But I presume
> you mean approaches in which you index U into a table,
> with or without having reduced its range.
No, that's not what I meant. I was principally referring to the
technqiues for computing the number decimal digits, discussed in the
following blog post
<https://lemire.me/blog/2021/05/28/computing-the-number-of-digits-of-an-integer-quickly/>
plus ...
> I would also propose a3, which would be not a *sequence* of
> if-then-elses, but a *tree* of if-then-else, that performs a
> binary search on num_digits. There could be several versions
> of this: one whose top level test is for num_digits being
> above 5, one for below 4, one for below 3 etc. The latter
> should perform better on workloads dominated by small
> numbers.
>
> We can also extend a1 by indexing not into a ten-element
> table of chars, but into a 100-element table of char pairs,
> 1000-element table of char triples, etc. I don't think these
> are likely to have any advantage over existing alternatives,
> but if you want to be sure, you could try them.
... both of those suggestions are mentioned in the post from
Facebook engineering I linked to. I intend to try them out
That said, I think the immediate concern should be to replace calls to
sprintf() where possible.
Julien.
More information about the reviews
mailing list