[m-dev.] for review: GCC back-end interface

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Jan 9 19:17:11 AEDT 2001


On 09-Jan-2001, Tyson Dowd <trd at cs.mu.OZ.AU> wrote:
> On 05-Jan-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> > Estimated hours taken: 120
> > 
> > Connect the Mercury compiler to the GCC back-end.
> > These changes give us a version of the Mercury compiler which
> > compiles to assembler without going via any intermediate files.
> > This new back-end for the Mercury compiler generates GCC's `tree' data
> > type, and then calls functions in the GCC "middle-end" to convert that
> > to GCC's RTL (Register Transfer Language) and to invoke the rest of
> > the GCC middle-end and back-end to compile it to assembler.
> 
> The log message should contain a justification for this change.

Good point.

> Why did you do it?

The main advantage is improved compilation speed.

(The improvement is not huge, since the current front-end is so slow,
but speeding up the back-end increases the incentive to speed up the
front-end, since any gain in front-end speed then yields a bigger
overall speedup.)

Another advantage is that it gives us more opportunity to give the GCC
back-end information about how to optimize the code that we generate.
For example:
	- We can tell GCC when it is safe to treat function calls
	  as tail calls.  (The gcc back-end already has some support
	  for doing tail calls.  But its check to determine when
	  it is safe to do them is much too conservative for Mercury.
	  I've been working on extending the gcc back-end infrastructure
	  so that front ends can tell the gcc back-end when it is safe.)
	- We can use `__builtin_{set,long}jmp' rather than ordinary
	  `{set,long}jmp'.  (We can also do this for the C back-end,
	  as it happens, since those are supported in GNU C too, but
	  I wouldn't have found out about it if not for doing this
	  back-end.)  Or we can use gcc's exception handling;
	  the gcc developers have told me that using exception
	  handling may be more efficient on some platforms.
	- We can mark GC_malloc as an allocation function,
	  so that GCC knows that stuff allocated with it won't
	  alias other pointers.
	- In general we could give GCC more information about aliasing.
	  I haven't investigated this much yet, but I think the GCC
	  back-end has support for keeping track of alias sets and
	  recording which set each pointer points to and which sets
	  may be subsets of which other sets.
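
Two of the hints listed above can be sketched in plain GNU C.  This is
only an illustration under assumptions, not the code from this change:
demo_alloc is a hypothetical stand-in for GC_malloc, and all the demo_*
names are invented here.  The `malloc' attribute and the
`__builtin_{set,long}jmp' builtins are the real GNU C features.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for GC_malloc.  The GNU C `malloc' attribute
   tells the optimizer that the returned pointer cannot alias any other
   pointer that is valid when the function returns. */
__attribute__((malloc)) static void *demo_alloc(size_t bytes)
{
    void *p = malloc(bytes);
    assert(p != NULL);
    return p;
}

/* GCC documents the buffer for __builtin_setjmp as five words. */
static void *demo_jmp_buf[5];
static int demo_unwound = 0;

/* __builtin_longjmp must be called from a different function than the
   one that called __builtin_setjmp, and its second argument must be 1. */
__attribute__((noinline)) static void demo_fail(void)
{
    demo_unwound = 1;
    __builtin_longjmp(demo_jmp_buf, 1);
}

/* Returns 0 on normal completion, 1 if control came back via longjmp. */
__attribute__((noinline)) static int demo_try(int should_fail)
{
    if (__builtin_setjmp(demo_jmp_buf) == 0) {
        if (should_fail) {
            demo_fail();
        }
        return 0;
    }
    return 1;
}
```

Unlike the library `setjmp', the builtin version saves only a few
words, which is why it can be cheaper; the price is the restrictions
noted in the comments.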

Another advantage is that it provides a demonstration of how to
compile logic languages or functional languages using the GNU C
back-end.  This may be useful to researchers or open-source developers
who are working on other languages.

It is also good public relations, because it removes one more
barrier to acceptance of Mercury.  Some people want a language
implementation that has a "native code compiler", and don't want one
that compiles via C.  These people may be more willing to consider
Mercury now.

I'll put the preceding paragraphs in the log message.

> > +% File: gcc.m
> > +% Main author: fjh
> > +
> > +% This module is the Mercury interface to the GCC compiler back-end.
> > +%
> > +% This module provides a thin wrapper around the C types,
> > +% constants, and functions defined in gcc/tree.{c,h,def}
> > +% and gcc/mercury/mercury-gcc.c in the GCC source.
> > +% (The functions in gcc/mercury/mercury-gcc.c are in turn a
> > +% thicker wrapper around the more complicated parts of GCC's
> > +% source-language-independent back-end.)
> > +%
> > +% Note that we want to keep this code as simple as possible.
> > +% Anything complicated, which might require changes for new versions
> > +% of gcc, should go in gcc/mercury/mercury-gcc.c rather than in
> > +% inline C code here.
> > +%
> 
> It is interesting to note that many other developers have leaned towards
> the opposite approach, which is to have a module of simple inline C
> code in a single Mercury module, and a further module on top of that
> that handles the more complex parts in Mercury.
>
> Are there any technical problems that made such an approach infeasible
> in this case? 

Well, actually I've done that, in parts; the further module on top of
that is mlds_to_gcc.m.  For instance, all the symbol table handling is
done on the Mercury side, rather than the C side, since it was much
easier there.

What I meant by that comment is that any complicated fragments *of C code*
should go in gcc/mercury/mercury-gcc.c.  I didn't mean that everything
which is complicated should be done in C rather than in Mercury.

> Is it the complexity or the "changes for new versions of gcc" that you are
> concerned about?
> If so, could you document them here?

What I'm concerned about is minimizing the size and complexity of the
Mercury/C interface.  I'm hoping that we can get the C side of the
changes integrated into the gcc distribution / cvs repository, so that
when the gcc developers change the interface to the gcc back-end, they
can fix the Mercury front-end at the same time as they fix all the
other front-ends.  However, I don't think it is reasonable to expect
or even to hope that they would do that for Mercury code, or for
C code which is written as inline `pragma c_code' fragments in Mercury.
So to minimize the amount of maintenance needed, I want to minimize
the amount of the GCC back-end interface that the Mercury code depends
on.
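
The idea of keeping the client-visible interface small can be sketched
in C roughly as follows.  Everything here is hypothetical (the demo_*
names, the node layout, the three entry points); it is not the
interface in mercury-gcc.c.  It only shows the shape: callers see an
opaque handle and a few functions, so the layout behind them can
change without touching the callers.

```c
#include <assert.h>
#include <stdlib.h>

/* What the client side depends on: an opaque type and three
   entry points.  (All names are hypothetical.) */
typedef struct demo_node demo_node;
demo_node *demo_build_int(long value);
demo_node *demo_build_add(demo_node *left, demo_node *right);
long demo_eval(const demo_node *node);

/* Hidden side: this layout may change freely between versions. */
enum demo_kind { DEMO_INT, DEMO_ADD };

struct demo_node {
    enum demo_kind kind;
    long value;               /* used when kind == DEMO_INT */
    demo_node *left, *right;  /* used when kind == DEMO_ADD */
};

/* Allocate a zeroed node; nodes are never freed in this sketch. */
static demo_node *demo_new(enum demo_kind kind)
{
    demo_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    return n;
}

demo_node *demo_build_int(long value)
{
    demo_node *n = demo_new(DEMO_INT);
    n->value = value;
    return n;
}

demo_node *demo_build_add(demo_node *left, demo_node *right)
{
    demo_node *n = demo_new(DEMO_ADD);
    n->left = left;
    n->right = right;
    return n;
}

long demo_eval(const demo_node *node)
{
    if (node->kind == DEMO_INT) {
        return node->value;
    }
    return demo_eval(node->left) + demo_eval(node->right);
}
```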

> > +% QUOTES
> > +%
> > +%	``GCC is a software Vietnam.''
> > +%		-- Simon Peyton-Jones.
> 
> My fear is that this will become all too true if we end up maintaining
> this backend, as a large part of the backend is actually just a bunch of
> function calls into some "complex" C code.

That's possible.  However, I am optimistic that the maintenance effort
will be manageable.  Only time will tell.

> > +% A GCC `tree' representing a function declaration.
> > +:- type gcc__func_decl.
> > +
> > +	% build a function declaration
> > +:- type func_name == string.
> > +:- type func_asm_name == string.
> > +:- pred build_function_decl(func_name, func_asm_name, gcc__type,
> > +		gcc__param_types, gcc__param_decls, gcc__func_decl,
> > +		io__state, io__state).
> > +:- mode build_function_decl(in, in, in, in, in, out, di, uo) is det.
> > +
> > +	% Declarations for builtin functions
> > +:- func alloc_func_decl = gcc__func_decl.	% GC_malloc()
> > +:- func strcmp_func_decl = gcc__func_decl.	% strcmp()
> > +:- func hash_string_func_decl = gcc__func_decl.	% MR_hash_string()
> > +:- func box_float_func_decl = gcc__func_decl.	% MR_box_float()
> > +:- func setjmp_func_decl = gcc__func_decl.	% __builtin_setjmp()
> > +:- func longjmp_func_decl = gcc__func_decl.	% __builtin_longjmp()
> 
> A few of these seem very Mercury specific.  I understand that this is not
> supposed to be a *complete* gcc backend interface, but is it supposed to
> be *completely* a gcc backend interface?
>
> Perhaps they would be better in another module?

Perhaps.  But there are only three or four such functions
(GC_malloc, MR_hash_string, MR_box_float, and perhaps strcmp).
I'm not sure if it is worth creating a new module just for that.

This module is supposed to be an interface to the gcc back-end *and*
to the C side of the Mercury front-end to gcc.  The latter naturally
includes some parts which are Mercury-specific.  I don't want to split
it into two parts, one of which interfaces directly to the gcc
back-end, and the other of which interfaces to mercury-gcc.c, because
that distinction is not useful to the user of this module.  Many parts
of mercury-gcc.c are there just to encapsulate the more complicated
parts of the gcc back-end interface, and at this level I want
to hide the distinction between the native and the encapsulated parts
of the gcc interface and to just present a single unified interface
here.

On the other hand, I could perhaps split gcc/mercury/mercury-gcc.c
into two files, called say gcc/mercury/encaps.c and gcc/mercury/lang.c,
one of which does the generic encapsulation, and the other of which
provides the Mercury-specific stuff.  Then I could define two
different Mercury modules, one of which (gcc.m) interfaces with
the gcc back-end and gcc/mercury/encaps.c, while the other
(gcc_mercury.m?) interfaces with just gcc/mercury/lang.c.

That might perhaps be worth doing in the long run.
However, the payoff is pretty small, and it's a non-trivial
amount of work.  For now I think I'll leave it as is and
just mark it with an XXX comment.

> > +	% A GCC `tree' representing a list of field declarations
> > +:- type gcc__field_decls.
> > +
> > +	% Construct an empty field list.
> > +:- pred empty_field_list(gcc__field_decls, io__state, io__state).
> > +:- mode empty_field_list(out, di, uo) is det.
> > +
> > +	% Given a new field decl, cons it into the start of a field list.
> > +	% Note that each field decl can only be on one field list.
> > +:- pred cons_field_list(gcc__field_decl, gcc__field_decls, gcc__field_decls,
> > +		io__state, io__state).
> > +:- mode cons_field_list(in, in, out, di, uo) is det.
> 
> There are now types, empty and cons declarations for constructing 
> lists of fields, parameters and parameter types.

Also for lists of expressions and lists of initializers.

> It might be nice to generalize this code to use some abstraction (e.g. a
> type class).  But maybe just a comment to that effect would be enough.

The code in mlds_to_gcc.m which uses these always wants to use
a particular instance; it never wants to abstract over that.
So I think adding a type class here would complicate things
without sufficient benefit to make it worthwhile.
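
For readers wondering why "each field decl can only be on one field
list": a list whose link lives inside the node itself has exactly that
property.  The following C sketch shows the pattern with a hypothetical
demo_field type; it is not GCC's real `tree' node, and the demo_* names
are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field declaration node.  The link to the next field
   is stored in the node, so a node can be on only one list. */
typedef struct demo_field {
    const char *name;
    struct demo_field *chain;  /* next field, or NULL at the end */
} demo_field;

/* The empty list is just a null pointer. */
static demo_field *demo_empty_field_list(void)
{
    return NULL;
}

/* Cons decl onto the front of list and return the new list. */
static demo_field *demo_cons_field_list(demo_field *decl, demo_field *list)
{
    /* Catches a node that already has a successor on some other
       list; it cannot detect a node that is the last element. */
    assert(decl->chain == NULL);
    decl->chain = list;
    return decl;
}

/* Number of fields on the list. */
static size_t demo_field_count(const demo_field *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->chain) {
        n++;
    }
    return n;
}

/* Find a field by name; returns NULL if there is no such field. */
static const demo_field *demo_find_field(const demo_field *list,
                                         const char *name)
{
    for (; list != NULL; list = list->chain) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}
```

Since fields are consed onto the front, a caller that wants them in
declaration order has to add them last-first.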

> > +	% GCC represents variable expressions just by (the pointer to)
> > +	% their declaration tree node.
> > +var_expr(Decl) = Decl.
> > +
> > +%
> > +% stuff for function calls
> > +%
> > +
> > +	% GCC represents functions pointer expressions just as ordinary
> > +	% ADDR_EXPR nodes whose operand the function declaration tree node.
> 
> whose operand (is?) the function declaration tree node?

Yes.

> > +:- pragma c_code(build_initializer_expr(InitList::in, Type::in,
> > +	Expr::out, _IO0::di, _IO::uo), [will_not_call_mercury],
> > +"
> > +	Expr = (MR_Word) build(CONSTRUCTOR, (tree) Type, NULL_TREE,
> > +		(tree) InitList);
> > +#if 0
> > +	/* XXX do we need this? */
> > +	TREE_STATIC ((tree) Expr) = 1;
> > +#endif
> > +").
> 
> Please explain.

Which bit don't you understand?

The code in #if 0 ... #endif is commented out because I'm not sure if
it is needed.  

The TREE_STATIC macro is documented in tree.h in the gcc source code.

> > Index: mercury/compiler/globals.m
> > +++ mercury/compiler/globals.m	2000/12/17 13:13:34
> > @@ -22,11 +22,15 @@
> >  :- type globals.
> >  
> >  :- type compilation_target
> > -	--->	c	% Generate C code
> > +	--->	c	% Generate C code (including GNU C)
> >  	;	il	% Generate IL assembler code
> >  			% IL is the Microsoft .NET Intermediate Language
> > -	;	java.	% Generate Java
> > +	;	java	% Generate Java
> >  			% (this target is not yet implemented)
> > +	;	asm. 	% Compile directly to assembler via the GCC back-end.
> > +			% Do not go via C, instead generate GCC's internal
> > +			% `tree' data structure.
> > +			% (Work in progress.)
> 
> While the comment about going via the GCC backend is correct, it's
> pretty irrelevant to the compilation_target.

I don't agree; it is relevant, because the compiler assumes that the
compilation_target determines the path taken.  The option to
select compilation via the GCC back-end is `--target asm'.

> > Index: mercury/compiler/mlds_to_gcc.m
...
> Might be worth noting somewhere around here that foreign_proc("C", ...)
> will have to go via an external file (this is sort of mentioned in a few
> places other than here),

Done (except I've used the present tense rather than the future
tense, since this is already implemented).

> and that because of this inlining of C code won't work. 
>
> Is this broken at the moment?  I see no changes to inlining.m to disable
> inlining of pragma_foreign -- if you set your preferred backend foreign
> language to C, you will probably get inlining of foreign C by default.

Sorry, I did change inlining.m -- I forgot to include that change in
the set that I posted.  I'll post that separately.

> > +build_rtti_type(notag_functor_desc, _, GCC_Type) -->
> > +	% typedef struct {
> > +	%     MR_ConstString      MR_notag_functor_name;
> > +	%     MR_PseudoTypeInfo   MR_notag_functor_arg_type;
> > +	% XXX need to add the following field when I do a cvs update:
> > +	% /***MR_ConstString      MR_notag_functor_arg_name;***/
> > +	% } MR_NotagFunctorDesc;
> > +	build_struct_type("MR_NotagFunctorDesc",
> > +		['MR_ConstString'	- "MR_notag_functor_name",
> > +		 'MR_PseudoTypeInfo'	- "MR_notag_functor_arg_type"],
> > +		 %%% 'MR_ConstString'	- "MR_notag_functor_arg_name"],
> > +		GCC_Type).
> 
> As mentioned in the Mercury meeting, it would be good to factor out this
> code, and use it to (optionally) generate the appropriate definitions
> for a header file.  This way we can hopefully avoid the double update
> problem.

I discussed this with Zoltan during our meeting with Levi yesterday.
He managed to convince me that abstracting out these types in a way
that was language-independent would be difficult, and would require
inventing a whole new layer of abstraction machinery that might well
be more complicated to implement and maintain than just living with
the code duplication.  The basic problem is that many of these
RTTI types contain variable-sized arrays, unions, function pointers,
and other complications which would need to be mapped differently for
different target languages.

It may be worth doing this, but it is a problem that affects
all back-ends, and so I think it should be a separate change.

> > +% The func_info holds information used while generating code
> > +% inside a function.
> > +% The name is a bit of a misnomer, since we also use this while
> > +% generating initializers for global variables.
> > +% So it should perhaps be called something like
> > +% func_or_global_var_info (ugh).
> 
> definition info?

OK.

> > +	( { MaybeSize = yes(SizeInBytes0) } ->
> > +		% Rather than generating a reference to a global variable
> > +		% mercury__private_builtin__SIZEOF_WORD, we ignore the
> > +		% word size multiplier, and instead get the word size
> > +		% from the bytes_per_word option.
> > +		% XXX This is kludgy.  We should change new_object
> > +		% so that it has the size in words rather than in bytes.
> 
> Yes please!  
> 
> If you don't I will.

I'm happy to do it, but I'll do it as a separate change.

I think it is better to separate changes that add new
functionality from changes that refactor existing functionality.
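
The arithmetic behind that kludge is simple.  Here is a C sketch of it,
assuming the size is available as a count of words and the word size
comes from a configuration value; demo_alloc_bytes and its parameter
names are hypothetical and do not appear in the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of an object of size_in_words words, where
   bytes_per_word is a compile-time configuration value rather
   than a reference to a global variable in the generated code. */
static size_t demo_alloc_bytes(size_t size_in_words, size_t bytes_per_word)
{
    assert(bytes_per_word > 0);
    return size_in_words * bytes_per_word;
}
```

If new_object carried the size in words, each back-end could apply
this multiplication itself instead of having to pick the multiplier
out of a size-in-bytes expression.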

> > +:- pred defn_contains_foreign_code(mlds__defn).
> > +:- mode defn_contains_foreign_code(in) is semidet.
> > +
> > +defn_contains_foreign_code(Defn) :-
> > +	Defn = mlds__defn(_Name, _Context, _Flags, Body),
> > +	Body = function(_, _, yes(FunctionBody)),
> > +	statement_contains_statement(FunctionBody, Statement),
> > +	Statement = mlds__statement(Stmt, _),
> > +	Stmt = atomic(target_code(TargetLang, _)),
> > +	TargetLang \= lang_asm.
> > +
> > +	% XXX This should be moved to ml_util.m
> > +:- pred defn_is_type(mlds__defn).
> > +:- mode defn_is_type(in) is semidet.
> > +
> > +defn_is_type(Defn) :-
> > +	Defn = mlds__defn(Name, _Context, _Flags, _Body),
> > +	Name = type(_, _).
> 
> *Both* of these should be moved to ml_util (defn_contains_foreign_code
> needs to be parameterized on target language, however).

OK, shall do.

> > Index: mercury/compiler/Mmakefile
> > ===================================================================
> > RCS file: /home/mercury1/repository/mercury/compiler/Mmakefile,v
> > retrieving revision 1.35
> > diff -u -d -r1.35 Mmakefile
> > --- mercury/compiler/Mmakefile	2000/12/11 05:38:45	1.35
> > +++ mercury/compiler/Mmakefile	2000/12/20 11:44:07
> > @@ -41,9 +42,11 @@
> >  C2INIT =	MERCURY_MOD_LIB_MODS="$(LIBRARY_DIR)/$(STD_LIB_NAME).init $(RUNTIME_DIR)/$(RT_LIB_NAME).init" \
> >  		MERCURY_TRACE_LIB_MODS="$(BROWSER_DIR)/$(BROWSER_LIB_NAME).init" \
> >  		MERCURY_MKINIT=$(UTIL_DIR)/mkinit $(SCRIPTS_DIR)/c2init
> > +C2INITFLAGS =	--library
> >  ML	=	MERCURY_C_LIB_DIR=. $(SCRIPTS_DIR)/ml
> >  MLFLAGS =	--mercury-libs none
> > -MLLIBS  =	$(TRACE_DIR)/lib$(TRACE_LIB_NAME).$A \
> > +MLLIBS  =	../main.o \
> > +		$(TRACE_DIR)/lib$(TRACE_LIB_NAME).$A \
> >  		$(BROWSER_DIR)/lib$(BROWSER_LIB_NAME).$A \
> >  		$(LIBRARY_DIR)/lib$(STD_LIB_NAME).$A \
> >  		$(RUNTIME_DIR)/lib$(RT_LIB_NAME).$A ` \
> > @@ -81,6 +84,18 @@
> 
> Should this be conditionalized somehow?

No, there's no need. 

Defining main in main.c at the top-level mercury directory is nice for
readability of the source code, anyway; it makes it obvious where to
start reading ;-).  I made sure to put copious comments in main.c.

> > Index: gcc/mercury/lang-specs.h
> > +/* This is the contribution to the `default_compilers' array in gcc.c for
> > +   Mercuyy.  */
> 
> s/Mercuyy/Mercury/

Fixed.

> I should note that I just skimmed the mercury-gcc.c file.
> I don't know the gcc backend at all, so I'm going to have to assume it
> works ;-)

I plan to get the gcc developers to review that part.

The gcc developers mailing lists are online at gcc.gnu.org.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------


