[m-users.] Link-time optimization

Richard O'Keefe raoknz at gmail.com
Sun Jun 14 18:59:55 AEST 2020


For comparison, I have a Smalltalk-to-C compiler.
Compiling practically everything together takes
about 2 minutes on a laptop that currently runs Ubuntu 18.04
but was originally shipped with Vista, and almost all of
that time is in the C compiler.
Ignoring lines that contain nothing but white space
and curly braces, the code expansion ratio is about
4.2 lines of C for each line of Smalltalk, for
650 kSLOC of C and 1.2M raw lines of C in total.
This does not include hand-written support code.
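
For comparison with the Mercury figures in the quoted message below, here is a small sketch of the same ratio arithmetic. The numbers are the rounded ones given in that message ("about 500,000", "just shy of 2,600,000"), so the results are approximate; the function name is made up for illustration.

```python
def expansion_ratio(generated, source):
    """Units of generated code per unit of source (lines or bytes)."""
    return generated / source

# Rounded figures quoted from Zoltan's message (Mercury compiler directory,
# hlc.gc grade), so treat the results as approximate.
mercury_line_ratio = expansion_ratio(2_600_000, 500_000)  # lines of C per line of Mercury
mercury_byte_ratio = expansion_ratio(110, 21)             # megabytes of C per megabyte of Mercury

# SQLite amalgamation (230,000 lines) relative to the generated Mercury C code.
sqlite_share = 230_000 / 2_600_000

if __name__ == "__main__":
    print(f"Mercury lines ratio: {mercury_line_ratio:.2f}")
    print(f"Mercury bytes ratio: {mercury_byte_ratio:.2f}")
    print(f"SQLite amalgamation / Mercury C lines: {sqlite_share:.3f}")
```

The line and byte ratios come out close to each other, and both are in the same range as the 4.2 I see for Smalltalk.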

GCC has been able to do link-time optimisation (including
cross-module inlining) by itself since the 4.5 release in
2010.  You need -flto -O3 both when compiling the individual
files and in the link step.
Clang also accepts the -flto option and a newer variant,
-flto=thin, which I have not yet tried.
I believe the corresponding option in the Microsoft C
compiler is /GL (paired with /LTCG at link time), but it's
years since I used MSVC.
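
As a minimal sketch of what "both when compiling and in the link step" means, the following builds the command lines for a GCC LTO build. The file names a.c, b.c and prog are hypothetical, and the helper lto_commands is made up for illustration; only the -flto, -O3, -c and -o flags are real. It only constructs the commands; it does not run them.

```python
def lto_commands(sources, output, cc="gcc", opt="-O3"):
    """Return argv lists for an LTO build: one compile per source, then one link.

    The same optimisation and -flto flags are passed at both stages.
    """
    flags = [opt, "-flto"]
    # Hypothetical naming convention: foo.c -> foo.o
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compiles = [[cc, *flags, "-c", src, "-o", obj]
                for src, obj in zip(sources, objects)]
    link = [cc, *flags, *objects, "-o", output]
    return compiles + [link]

if __name__ == "__main__":
    for argv in lto_commands(["a.c", "b.c"], "prog"):
        print(" ".join(argv))
```

Passing cc="clang" would give the Clang equivalent; the thin variant would need "-flto=thin" in place of "-flto", which this sketch does not handle.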

Two minutes is long enough to be annoying.
But there is another issue.
There was a machine I loved, with a compiler that I trusted
more than gcc.  The machine only had about 4GB of memory, and
when I tried to compile at high optimisation levels, the C
compiler would run out of memory and crash.  If I understand
correctly, resource consumption at LTO time is the reason
that Clang has -flto=thin.

All things considered, generating a single file is not
obviously useful for Mercury, even for distribution.
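
The concern in the quoted message below, about algorithms that are quadratic in the number of functions per file, can be made concrete with a toy cost model. This is only an illustration under the assumption that cost is exactly the square of the function count in each file and that functions are spread as evenly as possible; it is not a measurement of any real compiler, and the function name and the numbers are hypothetical.

```python
def quadratic_cost(n_functions, n_files):
    """Total cost when each file costs (functions in that file) squared.

    Functions are distributed as evenly as possible across the files.
    """
    base, extra = divmod(n_functions, n_files)
    # 'extra' files get one more function than the rest.
    sizes = [base + 1] * extra + [base] * (n_files - extra)
    return sum(size * size for size in sizes)

if __name__ == "__main__":
    n = 100_000  # hypothetical number of generated C functions
    single = quadratic_cost(n, 1)
    for files in (1, 10, 100, 1000):
        cost = quadratic_cost(n, files)
        print(f"{files:5d} files: relative cost {cost / single:.4f}")
```

Under this model, splitting the same code across k files divides the total cost by roughly k, which is why one huge file is the worst case for such algorithms.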


On Sun, 14 Jun 2020 at 08:01, Zoltan Somogyi <zoltan.somogyi at runbox.com>
wrote:

>
> 2020-06-14 04:48 GMT+10:00 Massimo Dentico<m.dentico at virgilio.it>:
> >> Putting all
> >> the C code generated by the Mercury compiler into a single .c file
> >> will generate a huge .c file for any Mercury program of a nontrivial
> >> size, especially when compiled in a debug grade.
> >
> > Obviously this single C file would be for distribution only, not
> > for development.
>
> That would help, since a long compile time wouldn't be as much of a
> problem.
>
> > So no debug grade involved (I presume here that the
> > developers using the library would not be interested in learning
> > Mercury.)
>
> And this would help too, since debug grades generate much bigger .c files
> than non-debug grades.
>
> > I have read Henderson & Somogyi, "Compiling Mercury to high-level C
> > code" (2002) and I was under the impression that the ratio of lines of
> > C code produced for one line of Mercury code was not dramatically high.
> > Do you care/have time to explain why this is not (or is no longer)
> > the case?
>
> No, the ratio is not too high, and hasn't changed all that much since that
> paper. I just checked: the compiler directory in the Mercury system has
> about 500,000 lines of Mercury code (which occupies about 21 megabytes),
> which in the hlc.gc grade is translated into just shy of 2,600,000 lines
> of C code (which occupies about 110 megabytes). So the ratio is definitely
> not dramatically high. I was just drawing attention to the point that C
> compilers are tuned for acceptable compilation times on hand-written C
> code, and a single C source file containing over two million lines of code
> would be much, much bigger than anything they are usually asked to compile.
>
> As a compiler writer myself, I know that I prefer to avoid quadratic (or
> worse) algorithms when N is unbounded, but I don't mind using them when
> N tends to be naturally limited by the way people program. Since good
> programming practice frowns on such huge amounts of code in a single
> source file, I don't expect compiler writers to avoid algorithms that are
> quadratic in this respect.
>
> > Anyway I'm not particularly worried by this because I would like to use
> > Mercury only for the inference engine part of an application. The rest
> > will be in another language.
>
> This will help by keeping down N as well, so (depending on how big
> that inference part is) my warning may not apply to your use case.
>
> >> .......................................... This will mean that you
> >> wouldn't be able to compile that .c file with any C compiler options
> >> that call for the use of any algorithm whose complexity is O(n^2) or
> >> worse in the number of functions in the file. The last time I looked,
> >> this meant that even -O2 was off the table (though admittedly
> >> that was more than a decade ago). So the resulting executable
> >> code may be compact, but would also be quite slow.
> >
> > The assumption here is that -O3 is always beneficial.
>
> I was talking about the effect of losing -O2, not -O3. -O2 is useful
> much more often than -O3.
>
> > 3. the SQLite developers distribute an easy to use amalgamation file [4]
> >    that is nearly 230,000 lines long and is 7.73 MiB in size (version
> >    3.32.2).
>
> I did not know that, but note that this is still only about one tenth
> the size of the .c files of the Mercury compiler. Automatic tools such as
> the Mercury compiler can generate much more stuff (in this case C code)
> than programmers can do by hand.
>
> Zoltan.