[m-dev.] LTO benchmarks
Peter Wang
novalazy at gmail.com
Wed Jan 29 15:02:11 AEDT 2020
Hi,
I compared the impact of link-time optimisation (LTO) in gcc and clang
on Mercury programs.
Peter
Host machine
------------
Intel i7-2600 3.40GHz (4 core, 8 thread)
Linux 5.3.18_1 (x86_64)
gcc 8.3.0
clang 9.0.1
Time to build and install compiler in one grade
-----------------------------------------------
The following are the times to run

    make PARALLEL=-j8 && make PARALLEL=-j8 install

with

    GRADE = hlc.gc
    EXTRA_MCFLAGS = -O5 --intermod-opt

The times include configuring boehm_gc, generating and installing
documentation, etc.

    gcc             10:47.941   (mins:secs)
    gcc-lto         14:42.448
    clang           12:14.119
    clang-lto       16:12.045
    clang-thinlto   14:22.325
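
To put the build times above in relative terms, here is a minimal Python sketch that converts the mins:secs figures to seconds and computes the extra build time of each LTO variant over its non-LTO baseline. The helper names (`to_seconds`, `overhead_percent`) are made up for illustration; only the timings come from the table.

```python
# Build-and-install wall-clock times copied from the table above (mins:secs).
BUILD_TIMES = {
    "gcc": "10:47.941",
    "gcc-lto": "14:42.448",
    "clang": "12:14.119",
    "clang-lto": "16:12.045",
    "clang-thinlto": "14:22.325",
}


def to_seconds(mmss: str) -> float:
    """Convert a 'mins:secs' string such as '10:47.941' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def overhead_percent(variant: str, baseline: str) -> float:
    """Extra build time of `variant` relative to `baseline`, in percent."""
    base = to_seconds(BUILD_TIMES[baseline])
    return (to_seconds(BUILD_TIMES[variant]) - base) / base * 100.0


if __name__ == "__main__":
    pairs = [("gcc-lto", "gcc"), ("clang-lto", "clang"), ("clang-thinlto", "clang")]
    for variant, baseline in pairs:
        print(f"{variant} vs {baseline}: +{overhead_percent(variant, baseline):.1f}%")
```

On these figures, full LTO adds roughly a third to the build time with either compiler, and ThinLTO roughly half of that with clang.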
Size of the resulting executables
---------------------------------

    14023824 gcc/compiler/mercury_compile
    10752312 gcc-lto/compiler/mercury_compile
    14085128 clang/compiler/mercury_compile
    10880208 clang-lto/compiler/mercury_compile
    12509432 clang-thinlto/compiler/mercury_compile
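
The size differences can be expressed as a percentage reduction relative to the non-LTO build. The following is a small sketch, assuming the listed sizes are in bytes (the unit is not stated); `reduction_percent` is a hypothetical helper name.

```python
# Sizes of compiler/mercury_compile as listed above (assumed to be bytes).
SIZES = {
    "gcc": 14023824,
    "gcc-lto": 10752312,
    "clang": 14085128,
    "clang-lto": 10880208,
    "clang-thinlto": 12509432,
}


def reduction_percent(variant: str, baseline: str) -> float:
    """How much smaller `variant` is than `baseline`, in percent."""
    return (1.0 - SIZES[variant] / SIZES[baseline]) * 100.0


if __name__ == "__main__":
    pairs = [("gcc-lto", "gcc"), ("clang-lto", "clang"), ("clang-thinlto", "clang")]
    for variant, baseline in pairs:
        print(f"{variant}: {reduction_percent(variant, baseline):.1f}% smaller than {baseline}")
```

Full LTO shrinks mercury_compile by a little under a quarter with both compilers; ThinLTO gives about half of that reduction.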
Time to compile the ten largest modules
---------------------------------------
I timed this command in an `arena' using the `hyperfine' benchmarking
tool:

    mercury_compile -s hlc.gc -O5 --intermod-opt -C libs.options \
        parse_tree.parse_pragma check_hlds.polymorphism \
        transform_hlds.table_gen ll_backend.layout_out \
        ll_backend.code_loc_dep backend_libs.compile_target_code \
        transform_hlds.intermod ll_backend.fact_table \
        transform_hlds.higher_order

Benchmark #1: ./gcc
Time (mean ± σ): 24.902 s ± 0.472 s [User: 24.612 s, System: 0.192 s]
Range (min … max): 24.100 s … 25.459 s 6 runs
Benchmark #2: ./gcc-lto
Time (mean ± σ): 20.739 s ± 0.103 s [User: 20.474 s, System: 0.185 s]
Range (min … max): 20.627 s … 20.897 s 6 runs
Benchmark #3: ./clang
Time (mean ± σ): 25.139 s ± 0.203 s [User: 24.837 s, System: 0.205 s]
Range (min … max): 24.970 s … 25.538 s 6 runs
Benchmark #4: ./clang-lto
Time (mean ± σ): 24.327 s ± 0.323 s [User: 24.036 s, System: 0.195 s]
Range (min … max): 23.880 s … 24.614 s 6 runs
Benchmark #5: ./clang-thinlto
Time (mean ± σ): 24.096 s ± 0.150 s [User: 23.806 s, System: 0.196 s]
Range (min … max): 23.915 s … 24.361 s 6 runs
Summary
'./gcc-lto' ran
1.16 ± 0.01 times faster than './clang-thinlto'
1.17 ± 0.02 times faster than './clang-lto'
1.20 ± 0.02 times faster than './gcc'
1.21 ± 0.01 times faster than './clang'
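
The "times faster" figures in the Summary can be reproduced from the means and standard deviations reported above. The sketch below assumes the usual first-order error propagation for a quotient (relative errors added in quadrature); that assumption reproduces the summary values to two decimal places, but it is a reconstruction rather than something stated in the output. The function name `relative_speed` is made up for illustration.

```python
import math

# (mean, standard deviation) in seconds, copied from the hyperfine output above.
RESULTS = {
    "gcc": (24.902, 0.472),
    "gcc-lto": (20.739, 0.103),
    "clang": (25.139, 0.203),
    "clang-lto": (24.327, 0.323),
    "clang-thinlto": (24.096, 0.150),
}


def relative_speed(slower: str, faster: str) -> tuple[float, float]:
    """Return (ratio, uncertainty) of how many times faster `faster` ran.

    Assumption: the uncertainty follows first-order error propagation for
    a quotient, i.e. the relative errors are combined in quadrature.
    """
    mean_s, sd_s = RESULTS[slower]
    mean_f, sd_f = RESULTS[faster]
    ratio = mean_s / mean_f
    uncertainty = ratio * math.sqrt((sd_s / mean_s) ** 2 + (sd_f / mean_f) ** 2)
    return ratio, uncertainty


if __name__ == "__main__":
    for name in ("clang-thinlto", "clang-lto", "gcc", "clang"):
        ratio, err = relative_speed(name, "gcc-lto")
        print(f"gcc-lto ran {ratio:.2f} ± {err:.2f} times faster than {name}")
```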
samples/diff benchmark
----------------------
I built the samples/diff program using each of the compilers produced
above, both with default optimisation and with -O5 --intermod-opt (the
`.O5im' variants below):

    mmc --mercury-linkage static -s hlc.gc -m diff [-O5 --intermod-opt]

Then timed this command:

    ./diff.$version input-1 input-2 >/dev/null

input-1 is a 10000-line text file; input-2 is a permuted version of input-1.
Benchmark #1: ./gcc
Time (mean ± σ): 2.247 s ± 0.003 s [User: 2.240 s, System: 0.004 s]
Range (min … max): 2.242 s … 2.252 s 10 runs
Benchmark #2: ./gcc.O5im
Time (mean ± σ): 2.162 s ± 0.007 s [User: 2.155 s, System: 0.004 s]
Range (min … max): 2.157 s … 2.181 s 10 runs
Benchmark #3: ./gcc-lto
Time (mean ± σ): 2.327 s ± 0.006 s [User: 2.321 s, System: 0.003 s]
Range (min … max): 2.321 s … 2.342 s 10 runs
Benchmark #4: ./gcc-lto.O5im
Time (mean ± σ): 2.299 s ± 0.003 s [User: 2.293 s, System: 0.003 s]
Range (min … max): 2.296 s … 2.306 s 10 runs
Benchmark #5: ./clang
Time (mean ± σ): 1.781 s ± 0.002 s [User: 1.774 s, System: 0.004 s]
Range (min … max): 1.778 s … 1.784 s 10 runs
Benchmark #6: ./clang.O5im
Time (mean ± σ): 1.804 s ± 0.002 s [User: 1.796 s, System: 0.005 s]
Range (min … max): 1.800 s … 1.807 s 10 runs
Benchmark #7: ./clang-lto
Time (mean ± σ): 1.018 s ± 0.003 s [User: 1.013 s, System: 0.004 s]
Range (min … max): 1.012 s … 1.021 s 10 runs
Benchmark #8: ./clang-lto.O5im
Time (mean ± σ): 1.171 s ± 0.002 s [User: 1.167 s, System: 0.003 s]
Range (min … max): 1.167 s … 1.175 s 10 runs
Benchmark #9: ./clang-thinlto
Time (mean ± σ): 1.457 s ± 0.005 s [User: 1.452 s, System: 0.003 s]
Range (min … max): 1.448 s … 1.466 s 10 runs
Benchmark #10: ./clang-thinlto.O5im
Time (mean ± σ): 1.455 s ± 0.012 s [User: 1.450 s, System: 0.003 s]
Range (min … max): 1.447 s … 1.487 s 10 runs
Summary
'./clang-lto' ran
1.15 ± 0.00 times faster than './clang-lto.O5im'
1.43 ± 0.01 times faster than './clang-thinlto.O5im'
1.43 ± 0.01 times faster than './clang-thinlto'
1.75 ± 0.01 times faster than './clang'
1.77 ± 0.01 times faster than './clang.O5im'
2.12 ± 0.01 times faster than './gcc.O5im'
2.21 ± 0.01 times faster than './gcc'
2.26 ± 0.01 times faster than './gcc-lto.O5im'
2.29 ± 0.01 times faster than './gcc-lto'
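
The summary ranks everything against the fastest binary, which hides the per-compiler effect of LTO. The following sketch compares each LTO build with the non-LTO build from the same compiler and with the same Mercury options, using the mean times above. `change_percent` is a hypothetical helper; negative values mean the LTO build ran faster.

```python
# Mean run times in seconds, copied from the hyperfine output above.
MEAN_SECONDS = {
    "gcc": 2.247,
    "gcc.O5im": 2.162,
    "gcc-lto": 2.327,
    "gcc-lto.O5im": 2.299,
    "clang": 1.781,
    "clang.O5im": 1.804,
    "clang-lto": 1.018,
    "clang-lto.O5im": 1.171,
    "clang-thinlto": 1.457,
    "clang-thinlto.O5im": 1.455,
}


def change_percent(variant: str, baseline: str) -> float:
    """Change in mean run time of `variant` relative to `baseline`, in percent."""
    base = MEAN_SECONDS[baseline]
    return (MEAN_SECONDS[variant] - base) / base * 100.0


if __name__ == "__main__":
    pairs = [
        ("gcc-lto", "gcc"),
        ("gcc-lto.O5im", "gcc.O5im"),
        ("clang-lto", "clang"),
        ("clang-lto.O5im", "clang.O5im"),
        ("clang-thinlto", "clang"),
        ("clang-thinlto.O5im", "clang.O5im"),
    ]
    for variant, baseline in pairs:
        print(f"{variant} vs {baseline}: {change_percent(variant, baseline):+.1f}%")
```

On this benchmark, gcc's LTO makes diff slightly slower (by a few percent), whereas clang's full LTO cuts the run time by roughly 35 to 43 percent and ThinLTO by roughly 18 to 19 percent.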
Here are the executable sizes:

    3020016 diff/diff.gcc
    3028208 diff/diff.gcc.O5im
     629272 diff/diff.gcc-lto
     637464 diff/diff.gcc-lto.O5im
    3617928 diff/diff.clang
    3626120 diff/diff.clang.O5im
     286112 diff/diff.clang-lto
     298400 diff/diff.clang-lto.O5im
     694760 diff/diff.clang-thinlto
     698856 diff/diff.clang-thinlto.O5im