[m-dev.] for review: first cut at MLDS

Fergus Henderson fjh at cs.mu.OZ.AU
Thu Jul 1 16:41:14 AEST 1999


Hi all,

Tyson and I are going to work on a new back-end for the Mercury compiler.
The new back-end will generate higher-level code than the current back-end,
and so it will go through a new intermediate data structure called the "MLDS"
(medium level data structure).

This is nowhere near being ready to commit yet.
At this stage it is closer to a back-of-the-envelope design sketch.
But I think it is sufficiently fleshed out for me to get some useful
feedback, and I suspect Tyson and Zoltan may well want to comment on
this.

If this looks OK, then my next step will be to fill in all the details
so that I can get it to compile, and then modify mercury_to_c.m to
generate MLDS instead of C.  This would be capable of handling variables
of type `int', and det and semidet code only.

Open issues:
	- How to handle polymorphism -- for the high-level C back-end,
	  we should handle it exactly as we do in the current LLDS-based
	  backend, but for the JVM and similar back-ends we would need
	  to do something different.
	- How to handle discriminated union types.  Again, for the MLDS->C
	  back-end we could use the same data representation as our current
	  LLDS->C compiler uses, but for JVM and similar back-ends we
	  probably want to use inheritence.

%-----------------------------------------------------------------------------%
% Copyright (C) 1999 The University of Melbourne.
% This file may only be copied under the terms of the GNU General
% Public License - see the file COPYING in the Mercury distribution.
%-----------------------------------------------------------------------------%

% MLDS - The Medium-Level Data Structure.
% Main author: fjh.

% This module defines the MLDS data structure itself.
% The MLDS is an intermediate data structure used in compilation;
% we compile Mercury source -> parse tree -> HLDS -> MLDS -> target (e.g. C).
% See notes/compiler_design.html for more information about the MLDS & LLDS.
% [XXX Need to document MLDS in notes/compiler_design.html.]
%
% The MLDS is intended to be suitable for generating target code in
% languages such as Java, Java bytecode, high-level C, C++, or C--, etc.
% This is as opposed to the LLDS, which is better suited for generating
% assembler or the very low-level C or GNU C (lower level than C--) that
% the original Mercury compiler backend generates.  In the LLDS, we are
% doing all our own register allocation and stack manipulation, but with
% MLDS, those tasks are the responsibility of the target.
% The one really important simplification that we do make relative to the
% HLDS is that MLDS does not have any support for non-determinism, so the
% HLDS->MLDS compiler must compile non-deterministic code into code that
% uses continuation passing.

% The MLDS data structure is quite full-featured, including for example
% support for multiple return values and tagged pointers.  However, many
% of the intended target languages don't support all of those features. 
% Therefore the HLDS->MLDS compiler must ensure that the final MLDS code that
% it eventually generates does not use features which the target does not
% support.  This will (presumably) be accomplished by having handle_options.m
% set various flags according to the selected target, and having the
% HLDS->MLDS compiler take account of those flags and either generate simpler
% MLDS code in the first place or run some extra simplification passes over
% the MLDS code before invoking the MLDS->target compiler.
% 

%-----------------------------------------------------------------------------%

:- module mlds.

:- interface.

:- import_module hlds_pred, prog_data.
:- import_module llds.
:- import_module bool, list, std_util.

%-----------------------------------------------------------------------------%

%
% The type `mlds' is the actual MLDS.
%
:- type mlds
	---> mlds(
		module_name,		% The Mercury module name

		mlds__foreign_code,	% Code defined in some other language,
					% e.g. for `pragma c_header_code', etc.

		% The MLDS code itself
		mlds__imports,		% Packages/modules/classes to import
		mlds__defns		% 
	).

:- type mlds__imports == list(mlds__import).
:- type mlds__import.

:- type mlds__defns == list(mlds__defn).
:- type mlds__defn
	---> mlds__defn(
		mlds__entity_name,	% the name of the entity being declared
		mlds__context,		% the source location
		mlds__decl_flags,	% these contain the following:
			% mlds__access,		% public/private/protected
			% mlds__member_type,	% static/per_instance
			% mlds__virtuality,	% virtual/non_virtual
			% mlds__finality,	% final/overridable (funcs only)
			% mlds__constness,	% const/modifiable  (data only)
			% mlds__is_abstract,	% abstract/concrete
			% etc.
		mlds__entity_defn	% the definition of the entity
	).

:- type mlds__entity_name.

	% This specifies information about some entity being defined
	% The entity may be any of the following:
	%	constant or variable
	%	function
	%	class, including
	%		package (class with only static members)
	%		interface (abstract class, no data members)
	%		struct (value class)
	%		enum
:- type mlds__entity_defn
		% constants or variables
	--->	mlds__data(
			mlds__type,
			maybe(mlds__initializer)
		)
		% functions
	;	mlds__function(
			maybe(pred_proc_id),	% identifies the original
						% Mercury procedure, if any
			mlds__func_signature,	% the argument & return types
			maybe(mlds__statement)	% the function body, or `no'
						% if the function is abstract
		)
		% packages, classes, interfaces, structs, enums
	;	mlds__class(
			mlds__class
		).

:- type mlds__type.

:- type mlds__initializer == list(mlds__rval).

:- type mlds__func_signature
	---> mlds__func_signature(
		mlds__type,		% return type
		list(mlds__type)	% argument types
	).

:- type mlds__class_kind
	--->	mlds__class		% a generic class:
					% can inherit other classes and
					% interfaces
					% (but most targets will only support
					% single inheritence, so usually there
					% will be at most one class)
	;	mlds__package		% class with only static members
					% (can only inherit other packages)
	;	mlds__interface		% class with no variable data members
					% (can only inherit other interfaces)
	;	mlds__struct		% value class
					% (can only inherit other structs)
	;	mlds__enum		% class with one integer member and
					% a bunch of static consts
					% (cannot inherit anything)
	.

:- type mlds__class
	---> mlds__class(
		mlds__class_kind,
		mlds__imports,			% imports these classes (or
						% modules, packages, ...)
		list(mlds__class_id),		% inherits these base classes
		list(mlds__interface_id),	% implements these interfaces
		mlds__defns			% contains these members
	).

:- type mlds__class_id.
:- type mlds__interface_id.

%-----------------------------------------------------------------------------%

:- type mlds__decl_flags.

:- type access
	--->	public
	;	private
	;	protected
	;	default.	% Java "default" access: accessible to anything
				% defined in the same package.

:- type per_instance
	--->	one_copy	% i.e. "static" storage duration
				% (but not necessarily static linkage)
				% or static member function
	;	per_instance.	% i.e. "auto" local variable in function,
				% or non-static member of class.

:- type virtuality
	--->	virtual
	;	non_virtual.

:- type finality
	--->	final
	;	overridable.

:- type constness
	--->	const
	;	modifiable.

:- type abstractness
	--->	abstract
	;	concrete.

:- func access(mlds__decl_flags) = access.
:- func per_instance(mlds__decl_flags) = per_instance.
:- func virtuality(mlds__decl_flags) = virtuality.
:- func finality(mlds__decl_flags) = finality.
:- func constness(mlds__decl_flags) = constness.
:- func abstractness(mlds__decl_flags) = abstractness.

%-----------------------------------------------------------------------------%

	%
	% C code required for the C interface.
	% When compiling to a language other than C,
	% this part still needs to be generated as C code
	% and compiled with a C compiler.
	%
:- type mlds__foreign_code
	---> mlds__foreign_code(
		c_header_info,
		list(user_c_code),
		list(c_export)		% XXX we will need to modify
					% export.m to handle different
					% target languages
	).

%-----------------------------------------------------------------------------%

:- type mlds__statement
	--->	mlds__statement(
			mlds__stmt,
			mlds__context
		).

	% mlds__context is probably == prog_context,
	% but might also contain goal_path or other information.
:- type mlds__context.

:- type mlds__stmt
	--->

	%
	% sequence
	%
		block(mlds__defns, list(mlds__statement))

	%
	% iteration
	%
	;	while(mlds__rval, mlds__statement, bool)
			% the `bool' is true iff the loop is guaranteed
			% to iterate at least once -- in that case,
			% the compiler can generate a `do...while' loop
			% rather than a `while...' loop.

	%
	% selection (see also computed_goto)
	%
	;	if_then_else(mlds__rval, mlds__statement,
			maybe(mlds__statement))

/******
	% Is it worth including this?  We already have `computed_goto'...
	;	switch(
			mlds__rval,

			% other representations might be better...
			assoc_list(mlds__rval, mlds__statement),
			mlds__statement		% the default case
		)
******/

	%
	% transfer of control
	%

	;	label(label)
			% Defines a label that can be used as the
			% target of calls, gotos, etc.

	;	goto(label)
			% goto(Target)
			% Branch to the specified address.

	;	computed_goto(mlds__rval, list(label))
			% Evaluate rval, which should be an integer,
			% and jump to the (rval+1)th label in the list.
			% e.g. computed_goto(2, [A, B, C, D])
			% will branch to label C.


	%
	% function call/return
	%

	;	call(
			mlds__func_signature,	% signature of the func
			mlds__rval,		% the function to call
			maybe(mlds__rval),	% for method calls, this field
						% specifies the `this' object
			list(mlds__rval),	% ordinary function arguments
			is_tail_call		% indicates whether this
						% call is a tail call
		)

	;	return(list(mlds__rval))	% Some targets will not support
						% returning more than one value
						
	%
	% exception handling
	%

	/* XXX not yet implemented */

	%
	% atomic statements
	%

	;	atomic(mlds__atomic_statement)
	
	.


:- type is_tail_call
	--->	tail_call	% a tail call
	;	call		% just an ordinary call
	.


	%
	% atomic statements
	%
:- type mlds__atomic_statement

	--->	comment(string)
			% Insert a comment into the output code.

	;	assign(mlds__lval, mlds__rval)
			% assign(Location, Value):
			% Assign the value specified by rval to the location
			% specified by lval.

	%
	% heap management
	%

	;	alloc_mem(mlds__lval, maybe(tag), mlds__rval, mlds__type)
			% Get a memory block of a size given by an rval
			% and put its address in the given lval,
			% possibly after tagging it with a given tag.
			% (Some targets might not support tags.)

	;	mark_hp(mlds__lval)
			% Tell the heap sub-system to store a marker
			% (for later use in restore_hp/1 instructions)
			% in the specified lval
			%
			% It's OK for the target to treat this as a no-op,
			% and probably that is what most targets will do.

	;	restore_hp(mlds__rval)
			% The rval must be a marker as returned by mark_hp/1.
			% The effect is to deallocate all the memory which
			% was allocated since that call to mark_hp.
			%
			% It's OK for the target to treat this as a no-op,
			% and probably that is what most targets will do.

	%
	% trail management
	%

	;	trail_op(trail_op)

	%
	% foreign language interfacing
	%

	;	target_code(target_lang, string)
			% Do whatever is specified by the string,
			% which can be any piece of code in the specified
			% target language (C, assembler, IL, or whatever)
			% that does not have any non-local flow of control.

	.


	%
	% This is just a random selection of possible languages
	% that we might want to target...
	%
:- type target_lang
	--->	lang_C
	;	lang_GNU_C
	;	lang_C_minus_minus
	;	lang_asm
	;	lang_IL_bytecode
	;	lang_IL_asm
	;	lang_java_asm
	;	lang_java_bytecode
	.


	%
	% trail management
	%
:- type trail_op

	--->	store_ticket(mlds__lval)
			% Allocate a new "ticket" and store it in the lval.

	;	reset_ticket(mlds__rval, reset_trail_reason)
			% The rval must specify a ticket allocated with
			% `store_ticket' and not yet invalidated or
			% deallocated.
			% If undo_reason is `undo' or `exception', restore
			% any mutable global state to the state it was in when
			% the ticket was obtained with store_ticket();
			% invalidates any tickets allocated after this one.
			% If undo_reason is `commit' or `solve', leave the state
			% unchanged, just check that it is safe to commit
			% to this solution (i.e. that there are no outstanding
			% delayed goals -- this is the "floundering" check).
			% Note that we do not discard trail entries after
			% commits, because that would in general be unsafe.
			%
			% Any invalidated ticket is useless and should
			% be deallocated with either `discard_ticket'
			% or `discard_tickets_to'.

	;	discard_ticket
			% Deallocates the most-recently allocated ticket.

	;	mark_ticket_stack(mlds__lval)
			% Tell the trail sub-system to store a ticket counter
			% (for later use in discard_tickets_upto)
			% in the specified lval.

	;	discard_tickets_to(mlds__rval)
			% The rval must be a ticket counter obtained via
			% `mark_ticket_stack' and not yet invalidated.
			% Deallocates any trail tickets allocated after
			% the corresponding call to mark_ticket_stack.
			% Invalidates any later ticket counters.
	.

%-----------------------------------------------------------------------------%

	%
	% An lval represents a data location or variable that can be used
	% as the target of an assignment.
	%
	% XXX this probably needs work
	%

:- type field_id.

:- type mlds__var.

:- type mlds__lval 

	%
	% values on the heap
	%
	--->	field(maybe(tag), mlds__rval, field_id)
				% field(Tag, Address, FieldName)
				% selects a field of a compound term.
				% Address is a tagged pointer to a cell
				% on the heap; the offset into the cell
				% is FieldNum words. If Tag is yes, the
				% arg gives the value of the tag; if it is
				% no, the tag bits will have to be masked off.
				% The value of the tag should be given if
				% it is known, since this will lead to
				% faster code.

	%
	% values somewhere in memory
	%
	;	mem_ref(mlds__rval)	% The rval should have
				% originally come from a mem_addr rval.

	%
	% local variables
	%
	;	lvar(mlds__var)
	
	.

%-----------------------------------------------------------------------------%

	% XXX this probably needs work

	% An rval is an expression that represents a value.
:- type mlds__rval	
	--->	lval(mlds__lval)
		% The value of an `lval' rval is just the value stored in
		% the specified lval.

	;	create(tag, list(maybe(mlds__rval)), create_arg_types,
			static_or_dynamic, int, mlds__type)
		% create(Tag, Arguments, MaybeArgTypes, StaticOrDynamic,
		%	LabelNumber, CellKind):
		% [see comments in llds.m]

	;	mkword(tag, mlds__rval)
		% Given a pointer and a tag, mkword returns a tagged pointer.

	;	const(rval_const)

	;	unop(unary_op, mlds__rval)

	;	binop(binary_op, mlds__rval, mlds__rval)

	;	mem_addr(mem_ref).
		% The address of a word in the heap, etc.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list