[m-dev.] LTO benchmarks

Peter Wang novalazy at gmail.com
Wed Jan 29 15:02:11 AEDT 2020


Hi,

I compared the effect of link-time optimisation (LTO) with gcc and clang
on Mercury programs.

Peter


Host machine
------------

    Intel i7-2600 3.40GHz (4 core, 8 thread)
    Linux 5.3.18_1 (x86_64)
    gcc 8.3.0
    clang 9.0.1


Time to build and install compiler in one grade
-----------------------------------------------
The following are the times to run

    make PARALLEL=-j8 && make PARALLEL=-j8 install

with

    GRADE = hlc.gc
    EXTRA_MCFLAGS = -O5 --intermod-opt

The times include configuring boehm_gc, generating and installing
documentation, etc.

    gcc             10:47.941 (mins:secs)
    gcc-lto         14:42.448
    clang           12:14.119
    clang-lto       16:12.045
    clang-thinlto   14:22.325
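
To put the LTO build cost in relative terms, the following is a small sketch that converts the times above to seconds and divides each LTO build by its non-LTO counterpart. The helper names (`to_seconds`, `overhead`) are made up for illustration; only the timings come from the table.

```python
# Wall-clock build times copied from the table above (mins:secs).
BUILD_TIMES = {
    "gcc": "10:47.941",
    "gcc-lto": "14:42.448",
    "clang": "12:14.119",
    "clang-lto": "16:12.045",
    "clang-thinlto": "14:22.325",
}


def to_seconds(stamp):
    """Convert a 'mins:secs' string such as '10:47.941' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def overhead(variant, baseline):
    """Build time of `variant` as a multiple of `baseline` (1.0 = equal)."""
    return to_seconds(BUILD_TIMES[variant]) / to_seconds(BUILD_TIMES[baseline])


# Compare each LTO build against the plain build with the same C compiler.
for variant, baseline in [
    ("gcc-lto", "gcc"),
    ("clang-lto", "clang"),
    ("clang-thinlto", "clang"),
]:
    print(f"{variant:14s} {overhead(variant, baseline):.2f}x the {baseline} build time")
```

By this measure the LTO builds take roughly 36% (gcc-lto), 32% (clang-lto) and 17% (clang-thinlto) longer than the corresponding non-LTO builds. These are single runs that include the non-compiler steps mentioned above, so the figures are only indicative.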


Size of the resulting executables
---------------------------------

    14023824 gcc/compiler/mercury_compile
    10752312 gcc-lto/compiler/mercury_compile
    14085128 clang/compiler/mercury_compile
    10880208 clang-lto/compiler/mercury_compile
    12509432 clang-thinlto/compiler/mercury_compile


Time to compile the ten largest modules
---------------------------------------
I timed this command in an `arena' using the `hyperfine' benchmarking
tool:

    mercury_compile -s hlc.gc -O5 --intermod-opt -C libs.options \
        parse_tree.parse_pragma check_hlds.polymorphism \
        transform_hlds.table_gen ll_backend.layout_out \
        ll_backend.code_loc_dep backend_libs.compile_target_code \
        transform_hlds.intermod ll_backend.fact_table \
        transform_hlds.higher_order

Benchmark #1: ./gcc
  Time (mean ± σ):     24.902 s ±  0.472 s    [User: 24.612 s, System: 0.192 s]
  Range (min … max):   24.100 s … 25.459 s    6 runs

Benchmark #2: ./gcc-lto
  Time (mean ± σ):     20.739 s ±  0.103 s    [User: 20.474 s, System: 0.185 s]
  Range (min … max):   20.627 s … 20.897 s    6 runs

Benchmark #3: ./clang
  Time (mean ± σ):     25.139 s ±  0.203 s    [User: 24.837 s, System: 0.205 s]
  Range (min … max):   24.970 s … 25.538 s    6 runs

Benchmark #4: ./clang-lto
  Time (mean ± σ):     24.327 s ±  0.323 s    [User: 24.036 s, System: 0.195 s]
  Range (min … max):   23.880 s … 24.614 s    6 runs

Benchmark #5: ./clang-thinlto
  Time (mean ± σ):     24.096 s ±  0.150 s    [User: 23.806 s, System: 0.196 s]
  Range (min … max):   23.915 s … 24.361 s    6 runs

Summary
  './gcc-lto' ran
    1.16 ± 0.01 times faster than './clang-thinlto'
    1.17 ± 0.02 times faster than './clang-lto'
    1.20 ± 0.02 times faster than './gcc'
    1.21 ± 0.01 times faster than './clang'
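
The "times faster" figures in the summary can be recomputed from the means and standard deviations listed above. The sketch below assumes the uncertainty comes from first-order error propagation for a ratio of two independent quantities. Whether hyperfine uses exactly this formula is an assumption, although it reproduces the rounded values shown. The function name `relative_speed` is illustrative.

```python
import math

# (mean, sigma) in seconds, copied from the hyperfine output above.
RESULTS = {
    "gcc": (24.902, 0.472),
    "gcc-lto": (20.739, 0.103),
    "clang": (25.139, 0.203),
    "clang-lto": (24.327, 0.323),
    "clang-thinlto": (24.096, 0.150),
}


def relative_speed(slow, fast):
    """Return (ratio, uncertainty) for how many times faster `fast` is.

    Assumes the two measurements are independent, so the relative
    errors add in quadrature.
    """
    mean_s, sd_s = RESULTS[slow]
    mean_f, sd_f = RESULTS[fast]
    ratio = mean_s / mean_f
    err = ratio * math.sqrt((sd_s / mean_s) ** 2 + (sd_f / mean_f) ** 2)
    return ratio, err


fastest = "gcc-lto"
for name in sorted(RESULTS, key=lambda n: RESULTS[n][0]):
    if name != fastest:
        ratio, err = relative_speed(name, fastest)
        print(f"{ratio:.2f} ± {err:.2f} times faster than {name}")
```

Because the inputs are the rounded means from the report, the last digit may occasionally differ from what the tool prints from its unrounded data.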


samples/diff benchmark
----------------------
I built the samples/diff program using each of the compilers produced
above:

    mmc --mercury-linkage static -s hlc.gc -m diff [-O5 --intermod-opt]

The bracketed options were used only for the `.O5im' variants below.
I then timed this command:

    ./diff.$version input-1 input-2 >/dev/null

input-1 is a 10000-line text file; input-2 is a permuted version of input-1.


Benchmark #1: ./gcc
  Time (mean ± σ):      2.247 s ±  0.003 s    [User: 2.240 s, System: 0.004 s]
  Range (min … max):    2.242 s …  2.252 s    10 runs

Benchmark #2: ./gcc.O5im
  Time (mean ± σ):      2.162 s ±  0.007 s    [User: 2.155 s, System: 0.004 s]
  Range (min … max):    2.157 s …  2.181 s    10 runs

Benchmark #3: ./gcc-lto
  Time (mean ± σ):      2.327 s ±  0.006 s    [User: 2.321 s, System: 0.003 s]
  Range (min … max):    2.321 s …  2.342 s    10 runs

Benchmark #4: ./gcc-lto.O5im
  Time (mean ± σ):      2.299 s ±  0.003 s    [User: 2.293 s, System: 0.003 s]
  Range (min … max):    2.296 s …  2.306 s    10 runs

Benchmark #5: ./clang
  Time (mean ± σ):      1.781 s ±  0.002 s    [User: 1.774 s, System: 0.004 s]
  Range (min … max):    1.778 s …  1.784 s    10 runs

Benchmark #6: ./clang.O5im
  Time (mean ± σ):      1.804 s ±  0.002 s    [User: 1.796 s, System: 0.005 s]
  Range (min … max):    1.800 s …  1.807 s    10 runs

Benchmark #7: ./clang-lto
  Time (mean ± σ):      1.018 s ±  0.003 s    [User: 1.013 s, System: 0.004 s]
  Range (min … max):    1.012 s …  1.021 s    10 runs

Benchmark #8: ./clang-lto.O5im
  Time (mean ± σ):      1.171 s ±  0.002 s    [User: 1.167 s, System: 0.003 s]
  Range (min … max):    1.167 s …  1.175 s    10 runs

Benchmark #9: ./clang-thinlto
  Time (mean ± σ):      1.457 s ±  0.005 s    [User: 1.452 s, System: 0.003 s]
  Range (min … max):    1.448 s …  1.466 s    10 runs

Benchmark #10: ./clang-thinlto.O5im
  Time (mean ± σ):      1.455 s ±  0.012 s    [User: 1.450 s, System: 0.003 s]
  Range (min … max):    1.447 s …  1.487 s    10 runs

Summary
  './clang-lto' ran
    1.15 ± 0.00 times faster than './clang-lto.O5im'
    1.43 ± 0.01 times faster than './clang-thinlto.O5im'
    1.43 ± 0.01 times faster than './clang-thinlto'
    1.75 ± 0.01 times faster than './clang'
    1.77 ± 0.01 times faster than './clang.O5im'
    2.12 ± 0.01 times faster than './gcc.O5im'
    2.21 ± 0.01 times faster than './gcc'
    2.26 ± 0.01 times faster than './gcc-lto.O5im'
    2.29 ± 0.01 times faster than './gcc-lto'


Here are the executable sizes:

     3020016 diff/diff.gcc
     3028208 diff/diff.gcc.O5im
      629272 diff/diff.gcc-lto
      637464 diff/diff.gcc-lto.O5im
     3617928 diff/diff.clang
     3626120 diff/diff.clang.O5im
      286112 diff/diff.clang-lto
      298400 diff/diff.clang-lto.O5im
      694760 diff/diff.clang-thinlto
      698856 diff/diff.clang-thinlto.O5im
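
The size differences are easier to compare as percentages. The following sketch expresses each LTO executable as a share of the non-LTO executable built with the same C compiler, using the default-options builds only. The name `percent_of` is made up for illustration; the byte counts are from the listing above.

```python
# Executable sizes in bytes, copied from the listing above
# (builds without -O5 --intermod-opt).
DIFF_SIZES = {
    "gcc": 3020016,
    "gcc-lto": 629272,
    "clang": 3617928,
    "clang-lto": 286112,
    "clang-thinlto": 694760,
}


def percent_of(variant, baseline):
    """Size of `variant` as a percentage of the size of `baseline`."""
    return 100.0 * DIFF_SIZES[variant] / DIFF_SIZES[baseline]


for variant, baseline in [
    ("gcc-lto", "gcc"),
    ("clang-lto", "clang"),
    ("clang-thinlto", "clang"),
]:
    print(f"diff.{variant:14s} is {percent_of(variant, baseline):.1f}% "
          f"of diff.{baseline}")
```

On these numbers the gcc-lto executable is about 21% of the size of the plain gcc one, clang-lto about 8% of plain clang, and clang-thinlto about 19% of plain clang.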


