[m-dev.] Thoughts on binary back-ends
Ralph Becket
rbeck at microsoft.com
Tue Jun 19 22:35:54 AEST 2001
I've been thinking about compiler back-ends and been looking at
C-- and MLRISC and so forth (I admit it's not something I've
studied extensively). These are some preliminary thoughts on
the subject - I'd be interested to hear other people's views
and opinions.
My first impression is that you really want to keep control over
your stack allocation and register allocation policy.
You probably don't want your back end language doing stuff like
CSE, invariant code motion, expression decomposition into
primitives etc. (that's what the front end is supposed to do.)
You do want your back end to handle register allocation, argument
passing, instruction scheduling, peephole optimization and so
forth.
However, there is something of an impedance mismatch here that
neither C--, MLRISC or any of the others that I've looked at
address.
It seems to me that you want the compiler front end to deliver
some low level code representation along the lines of
:- type code_block
---> code_block(
entry_point :: maybe(proc_name),
globals :: var_names,
implicit_params :: var_names,
procs :: list(proc)
).
:- type proc
---> proc(
name :: proc_name,
params :: var_names,
results :: var_names,
locals :: var_names,
body :: body
).
:- type proc_name == string.
% XXX We should differentiate between ints and floats...
%
:- type var_names == list(var_name).
:- type var_name == string.
:- type body == list(instr).
:- type instr
---> (val := rval) % Assignment (move or
computation).
; call(string, var_names, var_names) % Subroutine call.
; goto(string, var_names) % Tail call.
; return % Return from subroutine.
; loop(body) % Circular loop.
; break(n) % Break out of nth enclosing
loop.
; continue(n) % Jump to start of nth enclosing
loop.
; if_then(cond, body)
; if_then_else(cond, body, body).
:- type val
---> const(int)
; var(var_name)
; ind(var_name, offset). % XXX Should this be a binary
op?
:- type offset
---> b(int) % Bytes.
; w(int). % Words.
:- type rval
---> val(val)
; i(int)
; op(unary_op, val)
; op(val, binary_op, val).
:- type cond
---> (val = val)
; (val \= val)
; (val > val)
; (val >= val)
; (val < val)
; (val =< val).
(This misses all sorts of stuff out such as constant declarations,
but gives the general idea.)
The outline above covers the majority (if not all) of language
constructs that are used in the
LL code would be compiled by the back-end into a very low level
language. Pretty much all that would be required is a set of
translations, the most complex being a specification of the
calling convention (parameter order, return address specification,
caller/callee saves register sets, global register values etc.),
and a spot of register colouring.
The VLL language would be something of the order:
:- type asms == list(asm).
:- type asm
---> (val := rval)
; label(label_name) % Jump target.
; goto(target) % Unconditional jump.
; cond_goto(cond, target). % Conditional jump.
:- type target
---> addr(label_name)
; val(val).
VLL code is simple enough that targeting most architectures should
be a relatively lightweight matter (one hopes). This is where one
would apply various peephole optimizations.
The LLDS already contains code to do much of this, which makes me
wonder how hard it would be to implement a retargetable binary
back end along the lines above?
While we would certainly expect an improvement in compilation times,
how much performance might we lose by not going via gcc?
As I say, I'd be interested in hearing what people think.
- Ralph
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list