[m-dev.] for review: GCC back-end interface
Tyson Dowd
trd at cs.mu.OZ.AU
Tue Jan 9 17:01:26 AEDT 2001
On 05-Jan-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> Estimated hours taken: 120
>
> Connect the Mercury compiler to the GCC back-end.
> These changes give us a version of the Mercury compiler which
> compiles to assembler without going via any intermediate files.
> This new back-end for the Mercury compiler generates GCC's `tree' data
> type, and then calls functions in the GCC "middle-end" to convert that
> to GCC's RTL (Register Transfer Language) and to invoke the rest of
> the GCC middle-end and back-end to compile it to assembler.
The log message should contain a justification for this change.
Why did you do it?
> I don't plan to commit the changes to mercury_compile.m yet, since the
> new module that it imports have references to functions defined in the
> GCC back-end, and so they require you to have a copy of the gcc
> back-end built to link in to the Mercury compiler. I'm not sure what
> the best solution to that is; probably a configure option and some
> conditional compilation, like we do for the Aditi back-end, is the
> best approach.
As long as it doesn't cause grief to other developers it should be
fine.
>
> gcc/mercury:
> New directory.
> This contains the C side of the Mercury <-> GCC interface.
>
> gcc/mercury/Make-lang.in:
> gcc/mercury/config-lang.in:
> gcc/mercury/lang-specs.h:
> Makefile/configure/specs fragments (respectively)
> that are required by GCC.
>
> gcc/mercury/lang-options.h:
> Documents the Mercury-specific gcc options,
> in particular the `--mmc-flag=' option.
>
> gcc/mercury/mercury-gcc.c:
> gcc/mercury/mercury-gcc.h:
> This is the "meat" on the C side of the Mercury <-> GCC interface.
> These files provide the C code that GCC requires of each
> language front-end. They also define some routines for
> building parts of the GCC `tree' data structure that are
> used by the Mercury compiler.
>
> gcc/mercury/Makefile:
> A Makefile which just runs `make mercury' in the parent directory.
> Just for convenience.
>
> gcc/mercury/README:
> gcc/mercury/ChangeLog:
> Some (very basic) documentation.
>
> gcc/mercury/test.m:
> A sample Mercury module, to serve as a simple test case.
>
> gcc/mercury/testmercury.c:
> C driver program for the test Mercury module.
>
> mercury/compiler/gcc.m:
> New file. This is an interface to the tree data structure defined
> in gcc/tree.h, and to functions for manipulating that data structure
> which are defined in gcc/mercury/mercury-gcc.c and in other parts
> of the GCC back-end. It's almost entirely composed of simple
> pragma c_code routines that each just call a single C function.
>
> mercury/compiler/mlds_to_gcc.m:
> New file. This converts the MLDS into the gcc tree representation
> whose interface is in gcc.m, using the routines defined in gcc.m.
> This is the "meat" on the Mercury side of the Mercury <=> GCC interface.
>
> mercury/compiler/globals.m:
> Define new target `asm', for compiling directly to assembler
> (without any intermediate files), via the gcc back-end.
>
> mercury/compiler/handle_options.m:
> `--target asm' implies `--high-level-code'.
>
> mercury/compiler/mercury_compile.m:
> Handle `--target asm' by invoking mlds_to_gcc.m.
>
> mercury/main.c:
> New file, containing main() that calls mercury_main().
>
> mercury/compiler/Mmakefile:
> Add C2INITFLAGS=--library, so that we can link `libmercury_compile.a'
> as a library without main(). For the mercury_compile executable,
> get main by linking in ../main.o.
>
> Add `libmmc' target, for building libmercury_compile.a and
> mercury_compile_init.a.
>
> Add the appropriate `-D' and `-I' options to CFLAGS-gcc so that we
> can compile gcc.m.
>
> mercury/runtime/mercury.c:
> Define out-of-line copies of MR_box_float() and MR_unbox_float(),
> so that the new `--target asm' back-end can generate calls to them.
>
> mercury/runtime/mercury.h:
> mercury/runtime/mercury_heap.h:
> Add comments warning about code duplication between
> the inline and out-of-line versions of various functions.
>
> mercury/Makefile:
> mercury/Mmakefile:
> Add `libmmc' target, for use by gcc/mercury/Make-lang.in.
>
> mercury/runtime/mercury_std.h:
> When IN_GCC is defined, use safe_ctype.h rather than
> ctype.h, since the latter conflicts with the GCC headers.
>
> Comment out the definition of the `reg' macro, since
> that too conflicts with the GCC headers.
>
> mercury/runtime/mercury_dlist.c:
> mercury/runtime/mercury_hash_table.c:
> mercury/runtime/mercury_stacks.h:
> Delete unnecessary uses of the `reg' macro.
>
> Workspace: /home/pgrad/fjh/ws/gcc
> Index: mercury/compiler/gcc.m
> ===================================================================
> RCS file: gcc.m
> diff -N gcc.m
> --- /dev/null Thu Mar 30 14:06:13 2000
> +++ gcc.m Fri Jan 5 17:16:46 2001
> @@ -0,0 +1,1280 @@
> +%-----------------------------------------------------------------------------%
> +% Copyright (C) 2000 The University of Melbourne.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +
> +% File: gcc.m
> +% Main author: fjh
> +
> +% This module is the Mercury interface to the GCC compiler back-end.
> +%
> +% This module provides a thin wrapper around the C types,
> +% constants, and functions defined in gcc/tree.{c,h,def}
> +% and gcc/mercury/mercury-gcc.c in the GCC source.
> +% (The functions in gcc/mercury/mercury-gcc.c are in turn a
> +% thicker wrapper around the more complicated parts of GCC's
> +% source-language-independent back-end.)
> +%
> +% Note that we want to keep this code as simple as possible.
> +% Anything complicated, which might require changes for new versions
> +% of gcc, should go in gcc/mercury/mercury-gcc.c rather than in
> +% inline C code here.
> +%
It is interesting to note that many other developers have leaned towards
the opposite approach, which is to have a module of simple inline C
code in a single Mercury module, and a further module on top of that
that handles the more complex parts in Mercury.
Are there any technical problems that made such an approach infeasible
in this case?
Is it the complexity or the "changes for new versions of gcc" that you are
concerned about?
If so, could you document them here?
I have a few comments about generalizing the code you have written
here which might be difficult to address if you cannot look beyond the
veil of gcc/mercury/mercury-gcc.c. In that case it may be worth simply
adding comments to the effect of "It would be nice to do <blah> but
because of <bletch> we have to put <foo> in gcc/mercury/mercury-gcc.c".
> +% This module makes no attempt to be a *complete* interface to the
> +% gcc back-end; we only define interfaces to those parts of the gcc
> +% back-end that we need for compiling Mercury.
> +%
> +% REFERENCES
> +%
> +% For more information about the GCC compiler back-end,
> +% see the documentation at <http://gcc.gnu.org> and
> +% <http://gcc.gnu.org/readings.html>, in particular
> +% "Writing a Compiler Front End to GCC" by Joachim Nadler
> +% and Tim Josling <tej at melbpc.org.au>.
> +%
> +% QUOTES
> +%
> +% ``GCC is a software Vietnam.''
> +% -- Simon Peyton-Jones.
My fear is that this will become all too true if we end up maintaining
this backend, as a large part of the backend is actually just a bunch of
function calls into some "complex" C code.
> +% A GCC `tree' representing a function declaration.
> +:- type gcc__func_decl.
> +
> + % build a function declaration
> +:- type func_name == string.
> +:- type func_asm_name == string.
> +:- pred build_function_decl(func_name, func_asm_name, gcc__type,
> + gcc__param_types, gcc__param_decls, gcc__func_decl,
> + io__state, io__state).
> +:- mode build_function_decl(in, in, in, in, in, out, di, uo) is det.
> +
> + % Declarations for builtin functions
> +:- func alloc_func_decl = gcc__func_decl. % GC_malloc()
> +:- func strcmp_func_decl = gcc__func_decl. % strcmp()
> +:- func hash_string_func_decl = gcc__func_decl. % MR_hash_string()
> +:- func box_float_func_decl = gcc__func_decl. % MR_box_float()
> +:- func setjmp_func_decl = gcc__func_decl. % __builtin_setjmp()
> +:- func longjmp_func_decl = gcc__func_decl. % __builtin_longjmp()
A few of these seem very Mercury specific. I understand that this is not
supposed to be a *complete* gcc backend interface, but is it supposed to
be *completely* a gcc backend interface?
Perhaps they would be better in another module?
> + % A GCC `tree' representing a list of field declarations
> +:- type gcc__field_decls.
> +
> + % Construct an empty field list.
> +:- pred empty_field_list(gcc__field_decls, io__state, io__state).
> +:- mode empty_field_list(out, di, uo) is det.
> +
> + % Give a new field decl, cons it into the start of a field list.
> + % Note that each field decl can only be on one field list.
> +:- pred cons_field_list(gcc__field_decl, gcc__field_decls, gcc__field_decls,
> + io__state, io__state).
> +:- mode cons_field_list(in, in, out, di, uo) is det.
> +
There are now types, empty and cons declarations for constructing
lists of fields, parameters and parameter types.
It might be nice to generalize this code to use some abstraction (e.g. a
type class). But maybe just a comment to that effect would be enough.
> + % GCC represents variable expressions just by (the pointer to)
> + % their declaration tree node.
> +var_expr(Decl) = Decl.
> +
> +%
> +% stuff for function calls
> +%
> +
> + % GCC represents functions pointer expressions just as ordinary
> + % ADDR_EXPR nodes whose operand the function declaration tree node.
whose operand (is?) the function declaration tree node?
> +%
> +% Initializers
> +%
> +
> +:- type gcc__init_elem == gcc__tree.
> +
> +gcc__array_elem_initializer(Int, GCC_Int) -->
> + build_int(Int, GCC_Int).
> +
> +gcc__struct_field_initializer(FieldDecl, FieldDecl) --> [].
> +
> +:- type gcc__init_list == gcc__tree.
> +
> +:- pragma c_code(empty_init_list(InitList::out,
> + _IO0::di, _IO::uo), [will_not_call_mercury],
> +"
> + InitList = (MR_Word) merc_empty_init_list();
> +").
> +
> +:- pragma c_code(cons_init_list(Elem::in, Init::in, InitList0::in, InitList::out,
> + _IO0::di, _IO::uo), [will_not_call_mercury],
> +"
> + InitList = (MR_Word)
> + merc_cons_init_list((tree) Elem, (tree) Init, (tree) InitList0);
> +").
> +
> +:- pragma c_code(build_initializer_expr(InitList::in, Type::in,
> + Expr::out, _IO0::di, _IO::uo), [will_not_call_mercury],
> +"
> + Expr = (MR_Word) build(CONSTRUCTOR, (tree) Type, NULL_TREE,
> + (tree) InitList);
> +#if 0
> + /* XXX do we need this? */
> + TREE_STATIC ((tree) Expr) = 1;
> +#endif
> +").
Please explain the `#if 0' block above: when would TREE_STATIC need to be set?
> Index: mercury/compiler/globals.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/globals.m,v
> retrieving revision 1.38
> diff -u -d -r1.38 globals.m
> --- mercury/compiler/globals.m 2000/11/17 17:47:10 1.38
> +++ mercury/compiler/globals.m 2000/12/17 13:13:34
> @@ -22,11 +22,15 @@
> :- type globals.
>
> :- type compilation_target
> - ---> c % Generate C code
> + ---> c % Generate C code (including GNU C)
> ; il % Generate IL assembler code
> % IL is the Microsoft .NET Intermediate Language
> - ; java. % Generate Java
> + ; java % Generate Java
> % (this target is not yet implemented)
> + ; asm. % Compile directly to assembler via the GCC back-end.
> + % Do not go via C, instead generate GCC's internal
> + % `tree' data structure.
> + % (Work in progress.)
While the comment about going via the GCC backend is correct, it's
pretty irrelevant to the compilation_target.
>
> :- type gc_method
> ---> none
> @@ -186,6 +190,9 @@
> % test against known strings.
> convert_target("java", java).
> convert_target("Java", java).
> +convert_target("asm", asm).
> +convert_target("Asm", asm).
> +convert_target("ASM", asm).
> convert_target("il", il).
> convert_target("IL", il).
> convert_target("c", c).
This brings up the issue that assembler is machine specific...
Another interesting case for the foreign language interface to handle.
> Index: mercury/compiler/mlds_to_gcc.m
> ===================================================================
> RCS file: mlds_to_gcc.m
> diff -N mlds_to_gcc.m
> --- /dev/null Thu Mar 30 14:06:13 2000
> +++ mlds_to_gcc.m Thu Jan 4 04:05:30 2001
> @@ -0,0 +1,2924 @@
> +%-----------------------------------------------------------------------------%
> +% Copyright (C) 1999-2000 The University of Melbourne.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +
> +% mlds_to_gcc - Convert MLDS to the GCC back-end representation.
> +% Main author: fjh.
> +
> +% Note that this does *not* compile to GNU C -- instead it
> +% actually generates GCC's internal "Tree" representation,
> +% without going via an external file.
> +
> +% Currently this supports grade hlc.gc only.
> +%
> +% Trailing will probably work too, but since trailing
> +% is currently implemented using the C interface,
> +% it will end up compiling everything via C.
> +
> +% TODO:
> +% Fix configuration issues:
> +% - mmake support
> +% - document installation procedure
> +% - test more
> +% - support in tools/bootcheck and check that it bootchecks
> +% - set up nightly tests
> +%
> +% Implement implementation-specific features that are supported
> +% by other Mercury back-ends:
> +% - support --high-level-data (enum types, pred types, user_type)
> +% - support --profiling and --heap-profiling
> +% - support --nondet-copy-out
> +% - support --gcc-nested-functions (probably not worth it)
> +% - pragma foreign_code(asm, ...)
Might be worth noting somewhere around here that foreign_proc("C", ...)
will have to go via an external file (this is sort of mentioned in a few
places other than here), and that because of this inlining of C code
won't work.
Is this broken at the moment? I see no changes to inlining.m to disable
inlining of pragma_foreign -- if you set your preferred backend foreign
language to C, you will probably get inlining of foreign C by default.
> +build_rtti_type(notag_functor_desc, _, GCC_Type) -->
> + % typedef struct {
> + % MR_ConstString MR_notag_functor_name;
> + % MR_PseudoTypeInfo MR_notag_functor_arg_type;
> + % XXX need to add the following field when I do a cvs update:
> + % /***MR_ConstString MR_notag_functor_arg_name;***/
> + % } MR_NotagFunctorDesc;
> + build_struct_type("MR_NotagFunctorDesc",
> + ['MR_ConstString' - "MR_notag_functor_name",
> + 'MR_PseudoTypeInfo' - "MR_notag_functor_arg_type"],
> + %%% 'MR_ConstString' - "MR_notag_functor_arg_name"],
> + GCC_Type).
As mentioned in the Mercury meeting, it would be good to factor out this
code, and use it to (optionally) generate the appropriate definitions
for a header file. This way we can hopefully avoid the double update
problem.
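As a rough sketch of that factoring (not something in this diff): if the field list were kept in one place, both the C struct and a table describing it could be generated from it. The example below uses C preprocessor X-macros. The macro names, the field_desc type and the two accessor functions are hypothetical, and the two typedefs are stand-ins for the real runtime types.

```c
#include <stddef.h>

/* Stand-in typedefs; the real ones live in the Mercury runtime headers. */
typedef const char *MR_ConstString;
typedef const void *MR_PseudoTypeInfo;

/* Hypothetical single source of truth for the struct's fields. */
#define MR_NOTAG_FUNCTOR_DESC_FIELDS(FIELD) \
    FIELD(MR_ConstString, MR_notag_functor_name) \
    FIELD(MR_PseudoTypeInfo, MR_notag_functor_arg_type)

/* Expansion 1: the C struct definition itself. */
#define DECLARE_FIELD(type, name) type name;
typedef struct {
    MR_NOTAG_FUNCTOR_DESC_FIELDS(DECLARE_FIELD)
} MR_NotagFunctorDesc;

/* Expansion 2: a table of (type name, field name) pairs that a */
/* generator or the compiler's struct-building code could walk. */
typedef struct {
    const char *type_name;
    const char *field_name;
} field_desc;

#define DESCRIBE_FIELD(type, name) { #type, #name },
static const field_desc notag_functor_desc_fields[] = {
    MR_NOTAG_FUNCTOR_DESC_FIELDS(DESCRIBE_FIELD)
};

/* Number of fields, derived from the table rather than hard-coded. */
size_t notag_functor_desc_num_fields(void)
{
    return sizeof notag_functor_desc_fields
        / sizeof notag_functor_desc_fields[0];
}

/* Return the i'th field description, or NULL if i is out of range. */
const field_desc *notag_functor_desc_field(size_t i)
{
    return i < notag_functor_desc_num_fields()
        ? &notag_functor_desc_fields[i] : NULL;
}
```

With something along these lines, adding MR_notag_functor_arg_name would mean editing only the one field list, and the struct and the table would stay in step.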
> + % rtti_enum_const(Name, Value):
> + % Succeed iff Name is the name of an RTTI
> + % enumeration constant whose integer value is Value.
> + % The values here must match the definitions of the
> + % MR_TypeCtor and MR_Sectag_Locn enumerations in
> + % runtime/mercury_type_info.h.
> +:- pred rtti_enum_const(string::in, int::out) is semidet.
> +rtti_enum_const("MR_TYPECTOR_REP_ENUM", 0).
> +rtti_enum_const("MR_TYPECTOR_REP_ENUM_USEREQ", 1).
> +rtti_enum_const("MR_TYPECTOR_REP_DU", 2).
> +rtti_enum_const("MR_TYPECTOR_REP_DU_USEREQ", 3).
> +rtti_enum_const("MR_TYPECTOR_REP_NOTAG", 4).
> +rtti_enum_const("MR_TYPECTOR_REP_NOTAG_USEREQ", 5).
> +rtti_enum_const("MR_TYPECTOR_REP_EQUIV", 6).
> +rtti_enum_const("MR_TYPECTOR_REP_EQUIV_VAR", 7).
> +rtti_enum_const("MR_TYPECTOR_REP_INT", 8).
> +rtti_enum_const("MR_TYPECTOR_REP_CHAR", 9).
> +rtti_enum_const("MR_TYPECTOR_REP_FLOAT", 10).
> +rtti_enum_const("MR_TYPECTOR_REP_STRING", 11).
> +rtti_enum_const("MR_TYPECTOR_REP_PRED", 12).
> +rtti_enum_const("MR_TYPECTOR_REP_UNIV", 13).
> +rtti_enum_const("MR_TYPECTOR_REP_VOID", 14).
> +rtti_enum_const("MR_TYPECTOR_REP_C_POINTER", 15).
> +rtti_enum_const("MR_TYPECTOR_REP_TYPEINFO", 16).
> +rtti_enum_const("MR_TYPECTOR_REP_TYPECLASSINFO", 17).
> +rtti_enum_const("MR_TYPECTOR_REP_ARRAY", 18).
> +rtti_enum_const("MR_TYPECTOR_REP_SUCCIP", 19).
> +rtti_enum_const("MR_TYPECTOR_REP_HP", 20).
> +rtti_enum_const("MR_TYPECTOR_REP_CURFR", 21).
> +rtti_enum_const("MR_TYPECTOR_REP_MAXFR", 22).
> +rtti_enum_const("MR_TYPECTOR_REP_REDOFR", 23).
> +rtti_enum_const("MR_TYPECTOR_REP_REDOIP", 24).
> +rtti_enum_const("MR_TYPECTOR_REP_TRAIL_PTR", 25).
> +rtti_enum_const("MR_TYPECTOR_REP_TICKET", 26).
> +rtti_enum_const("MR_TYPECTOR_REP_NOTAG_GROUND", 27).
> +rtti_enum_const("MR_TYPECTOR_REP_NOTAG_GROUND_USEREQ", 28).
> +rtti_enum_const("MR_TYPECTOR_REP_EQUIV_GROUND", 29).
> +rtti_enum_const("MR_TYPECTOR_REP_TUPLE", 30).
> +rtti_enum_const("MR_TYPECTOR_REP_UNKNOWN", 31).
> +rtti_enum_const("MR_SECTAG_NONE", 0).
> +rtti_enum_const("MR_SECTAG_LOCAL", 1).
> +rtti_enum_const("MR_SECTAG_REMOTE", 2).
This is another one that should be factored out.
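To illustrate what I mean, here is a minimal C sketch for the three sectag constants only. It assumes the enumeration could be declared from a single list macro; the list macro, the helper macros and the enum_const table are hypothetical, and the C function below only mirrors the Mercury predicate's name and semidet behaviour. It is not how runtime/mercury_type_info.h is written today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single list of the enumeration's constants, in order. */
#define MR_SECTAG_LOCN_LIST(X) \
    X(MR_SECTAG_NONE) \
    X(MR_SECTAG_LOCAL) \
    X(MR_SECTAG_REMOTE)

/* Expansion 1: the enum, so values are assigned by the C compiler. */
#define AS_ENUMERATOR(name) name,
typedef enum {
    MR_SECTAG_LOCN_LIST(AS_ENUMERATOR)
} MR_Sectag_Locn;

/* Expansion 2: a name -> value table built from the same list. */
typedef struct {
    const char *name;
    int value;
} enum_const;

#define AS_TABLE_ENTRY(name) { #name, name },
static const enum_const sectag_locn_consts[] = {
    MR_SECTAG_LOCN_LIST(AS_TABLE_ENTRY)
};

/*
** Look up an enumeration constant by name.
** Returns 1 and stores the value in *value if the name is known,
** and returns 0 (leaving *value untouched) otherwise.
*/
int rtti_enum_const(const char *name, int *value)
{
    size_t i;
    size_t n = sizeof sectag_locn_consts / sizeof sectag_locn_consts[0];

    for (i = 0; i < n; i++) {
        if (strcmp(sectag_locn_consts[i].name, name) == 0) {
            *value = sectag_locn_consts[i].value;
            return 1;
        }
    }
    return 0;
}
```

The point of the sketch is that the numeric values are never written down twice, so the table cannot drift out of step with the enum when a constant is inserted in the middle.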
> +% The func_info holds information used while generating code
> +% inside a function.
> +% The name is a bit of a misnomer, since we also use this while
> +% generating initializers for global variable.
> +% So it should perhaps be called something like
> +% func_or_global_var_info (ugh).
definition info?
body info?
> + ( { MaybeSize = yes(SizeInBytes0) } ->
> + % Rather than generating a reference to a global variable
> + % mercury__private_builtin__SIZEOF_WORD, we ignore the
> + % word size multiplier, and instead get the word size
> + % from the bytes_per_word option.
> + % XXX This is kludgy. We should change new_object
> + % so that it has the size in words rather than in bytes.
Yes please!
If you don't I will.
> +%-----------------------------------------------------------------------------%
> +%
> +% Utility predicates.
> +%
> +
> +:- pred defn_contains_foreign_code(mlds__defn).
> +:- mode defn_contains_foreign_code(in) is semidet.
> +
> +defn_contains_foreign_code(Defn) :-
> + Defn = mlds__defn(_Name, _Context, _Flags, Body),
> + Body = function(_, _, yes(FunctionBody)),
> + statement_contains_statement(FunctionBody, Statement),
> + Statement = mlds__statement(Stmt, _),
> + Stmt = atomic(target_code(TargetLang, _)),
> + TargetLang \= lang_asm.
> +
> + % XXX This should be moved to ml_util.m
> +:- pred defn_is_type(mlds__defn).
> +:- mode defn_is_type(in) is semidet.
> +
> +defn_is_type(Defn) :-
> + Defn = mlds__defn(Name, _Context, _Flags, _Body),
> + Name = type(_, _).
*Both* of these should be moved to ml_util (defn_contains_foreign_code
needs to be parameterized on target language, however).
> Index: mercury/compiler/Mmakefile
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/Mmakefile,v
> retrieving revision 1.35
> diff -u -d -r1.35 Mmakefile
> --- mercury/compiler/Mmakefile 2000/12/11 05:38:45 1.35
> +++ mercury/compiler/Mmakefile 2000/12/20 11:44:07
> @@ -41,9 +42,11 @@
> C2INIT = MERCURY_MOD_LIB_MODS="$(LIBRARY_DIR)/$(STD_LIB_NAME).init $(RUNTIME_DIR)/$(RT_LIB_NAME).init" \
> MERCURY_TRACE_LIB_MODS="$(BROWSER_DIR)/$(BROWSER_LIB_NAME).init" \
> MERCURY_MKINIT=$(UTIL_DIR)/mkinit $(SCRIPTS_DIR)/c2init
> +C2INITFLAGS = --library
> ML = MERCURY_C_LIB_DIR=. $(SCRIPTS_DIR)/ml
> MLFLAGS = --mercury-libs none
> -MLLIBS = $(TRACE_DIR)/lib$(TRACE_LIB_NAME).$A \
> +MLLIBS = ../main.o \
> + $(TRACE_DIR)/lib$(TRACE_LIB_NAME).$A \
> $(BROWSER_DIR)/lib$(BROWSER_LIB_NAME).$A \
> $(LIBRARY_DIR)/lib$(STD_LIB_NAME).$A \
> $(RUNTIME_DIR)/lib$(RT_LIB_NAME).$A ` \
> @@ -81,6 +84,18 @@
Should this be conditionalized somehow?
> Index: gcc/mercury/lang-specs.h
> ===================================================================
> RCS file: lang-specs.h
> diff -N lang-specs.h
> --- /dev/null Thu Mar 30 14:06:13 2000
> +++ lang-specs.h Thu Dec 28 10:32:37 2000
> @@ -0,0 +1,28 @@
> +/* Definitions for specs for the GNU compiler for the Mercury language.
> + Copyright (C) 1996, 1998, 1999, 2000 Free Software Foundation, Inc.
> +
> +This file is part of GNU CC.
> +
> +GNU CC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 2, or (at your option)
> +any later version.
> +
> +GNU CC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GNU CC; see the file COPYING. If not, write to
> +the Free Software Foundation, 59 Temple Place - Suite 330,
> +Boston, MA 02111-1307, USA. */
> +
> +/* This is the contribution to the `default_compilers' array in gcc.c for
> + Mercuyy. */
s/Mercuyy/Mercury/
I should note that I just skimmed the mercury-gcc.c file.
I don't know the gcc backend at all, so I'm going to have to assume it
works ;-)
Apart from this the diff appears fine. But it's very light on
justification -- I'm sure many developers and users will be scratching
their heads and asking "Why does anyone need this backend? Has Fergus
finally flipped?" and that kind of thing ;-)
--
Tyson Dowd #
# Surreal humour isn't everyone's cup of fur.
trd at cs.mu.oz.au #
http://www.cs.mu.oz.au/~trd #
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------