[m-rev.] for review: first step towards MLDS->C accurate GC

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Nov 27 00:14:39 AEDT 2001


Branches: main
Estimated hours taken: 16

Preliminary steps towards support for accurate GC
in the MLDS->C back-end.

compiler/ml_elim_nested.m:
	Add support for a new pass that puts local
	variables that might contain pointers into
	a struct, and chains these structs together.
	
compiler/mercury_compile.m:
	Invoke the new pass (if `--gc accurate').

runtime/mercury.h:
runtime/mercury.c:
	Declare the `stack_chain' global variable that
	points to the head of the chain.

Workspace: /home/earth/fjh/ws-earth3/mercury
Index: compiler/ml_elim_nested.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/ml_elim_nested.m,v
retrieving revision 1.43
diff -u -d -r1.43 ml_elim_nested.m
--- compiler/ml_elim_nested.m	24 Oct 2001 05:44:01 -0000	1.43
+++ compiler/ml_elim_nested.m	26 Nov 2001 13:03:53 -0000
@@ -8,7 +8,24 @@
 % Main author: fjh
 
 % This module is an MLDS-to-MLDS transformation
-% that eliminates nested functions.
+% that has two functions:
+% (1) eliminating nested functions
+% (2) putting local variables that might contain pointers into
+%     structs, and chaining these structs together,
+%     for use with accurate garbage collection.
+%
+% The two transformations are quite similar,
+% so they're both handled by the same code;
+% a flag is passed to say which transformation
+% should be done.
+
+% XXX Would it be possible to do both in a single pass?
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+% (1) eliminating nested functions
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
 
 % Note that this module does not attempt to handle arbitrary MLDS
 % as input; it will only work with the output of the current MLDS
@@ -112,6 +129,163 @@
 % ml_code_gen puts in calls to the nested functions.
 
 %-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+% (2) accurate GC
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+%
+% SUMMARY
+%
+% This is an MLDS-to-MLDS transformation that transforms the
+% MLDS code to add the information needed to do accurate GC
+% when compiling to C (or to assembler).
+%
+% Basically what we do is to put all all local variables that might
+% contain pointers in structs, with one struct for each stack frame,
+% and chain these structs together.  At GC time, we traverse the chain
+% of structs.  This allows us to accurately scan the C stack.
+%
+%-----------------------------------------------------------------------------%
+%
+% DETAILED DESCRIPTION
+%
+% For each function, we generate a struct for that function.
+% Each such struct starts with a sub-struct containing a couple of
+% fixed fields, which allow the GC to traverse the chain:
+%
+%	struct <function_name>_pointers {
+%		struct MR_StackChain fixed_fields;
+%		...
+%	};
+%
+% The fixed fields are as follows:
+%
+%	struct MR_StackChain {
+%		struct MR_StackChain *prev;
+%		void (*traverse_frame)(void *this_frame);
+%	};
+%		
+% The prev field holds a link to the entry for this function's caller.
+% The traverse_frame field is the address of a function to
+% trace everything pointed to by this stack frame.
+%
+% To ensure that we don't try to traverse uninitialized fields,
+% we zero-initialize each struct before inserting it into the chain.
+%
+% We need to keep a link to the topmost frame on the stack.
+% There's two possible ways that we could handle this.
+% One way is to pass it down as an parameter.
+% Each function would get an extra parameter `stack_chain' which
+% points to the caller's struct.
+% An alternative approach is to just have a global variable
+% `stack_chain' that points to the top of the stack.  We need extra code
+% to set this pointer when entering and returning from functions.
+% To make this approach thread-safe, the variable would actually
+% need to be thread-local rather than global.
+% This approach would probably work best if the variable is
+% a GNU C global register variable, which would make it both
+% efficient and thread-safe.
+% XXX Currently, for simplicity, we're using a global variable.
+%
+% As an optimization, we ought to not bother allocating a struct for
+% functions that don't have any variables that might contain pointers.
+% We also ought to not bother allocating a struct for leaf functions that
+% don't contain any functions calls or memory allocations.
+% XXX These optimizations are not yet implemented!
+%
+%-----------------------------------------------------------------------------%
+%
+% EXAMPLE
+%
+% If we have a function
+%
+%	RetType
+%	foo(Arg1Type arg1, Arg2Type arg2, ...)
+%	{
+%		Local1Type local1;
+%		Local1Type local2;
+%		...
+%		local1 = MR_new_object(...);
+%		...
+%		bar(arg1, arg2, local1, &local2);
+%		...
+%	}
+%
+% where say Arg1Type and Local1Type might contain pointers,
+% but Arg2Type and Local2Type don't, then we would transform it as follows:
+%
+%	struct foo_frame {
+%		MR_StackChain fixed_fields;
+%		Arg1Type arg1;
+%		Local1Type arg1;
+%		...
+%	};
+%
+%	static void
+%	foo_traverse_frame(void *this_frame) {
+%		struct foo_frame *frame = (struct foo_frame *)this_frame;
+%		MR_GC_TRAVERSE(<TypeInfo for type of arg1>, &frame->arg1);
+%		MR_GC_TRAVERSE(<TypeInfo for type of local1>, &frame->local1);
+%		...
+%	}
+%
+%	RetType
+%	foo(Arg1Type arg1, Arg2Type arg2, ...)
+%	{
+%		struct foo_frame this_frame;
+%		Local1Type local2;
+%		
+%		this_frame.fixed_fields.prev = stack_chain;
+%		this_frame.fixed_fields.traverse_frame = foo_traverse_frame;
+%		this_frame.fixed_fields.arg1 = arg1;
+%		stack_chain = &this_frame;
+%
+%		...
+%		local1 = MR_new_object(...);
+%		...
+%		bar(this_frame.arg1, arg2, this_frame.local1, &local2);
+%		...
+%		stack_chain = stack_chain->prev;
+%	}
+%
+% Alternatively, if we were passing stack_chain as an argument,
+% rather than treating it as a global variable, then the generated
+% code for foo() would look like this:
+%
+%	RetType
+%	foo(struct MR_StackChain *stack_chain,
+%		Arg1Type arg1, Arg2Type arg2, ...)
+%	{
+%		struct foo_frame this_frame;
+%		Local1Type local2;
+%		
+%		this_frame.fixed_fields.prev = stack_chain;
+%		this_frame.fixed_fields.traverse_frame = foo_traverse_frame;
+%		this_frame.fixed_fields.arg1 = arg1;
+%
+%		...
+%		local1 = MR_new_object(&this_frame, ...);
+%		...
+%		bar(&this_frame, this_frame.arg1, arg2,
+%			this_frame.local1, &local2);
+%		...
+%		/* no need to explicitly unchain the stack frame here */
+%	}
+%
+% The code in the Mercury runtime to traverse the stack frames would
+% look something like this:
+%
+%	void
+%	MR_traverse_stack(struct MR_StackChain *stack_chain)
+%	{
+%		while (stack_chain != NULL) {
+%			(*stack_chain->traverse)(stack_chain);
+%			stack_chain = stack_chain->prev;
+%		}
+%	}
+%
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
 
 :- module ml_elim_nested.
 
@@ -123,10 +297,14 @@
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
-	% Eliminated nested functions for the whole MLDS.
-	%
-:- pred ml_elim_nested(mlds, mlds, io__state, io__state).
-:- mode ml_elim_nested(in, out, di, uo) is det.
+:- type action
+	--->	hoist_nested_funcs	% Eliminate nested functions
+	;	chain_gc_stack_frames.  % Add shadow stack for supporting
+					% accurate GC.
+
+	% Process the whole MLDS, performing the indicated action.
+:- pred ml_elim_nested(action, mlds, mlds, io__state, io__state).
+:- mode ml_elim_nested(in, in, out, di, uo) is det.
 
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
@@ -135,38 +313,46 @@
 :- import_module bool, int, list, std_util, string, require.
 
 :- import_module ml_code_util, ml_util.
+:- import_module prog_util, type_util.
 
 % the following imports are needed for mangling pred names
 :- import_module hlds_pred, prog_data, prog_out.
 
 :- import_module globals, options.
 
-	% Eliminated nested functions for the whole MLDS.
+	% Perform the specified action on the whole MLDS.
 	%
-ml_elim_nested(MLDS0, MLDS) -->
+ml_elim_nested(Action, MLDS0, MLDS) -->
 	globals__io_get_globals(Globals),
 	{ MLDS0 = mlds(ModuleName, ForeignCode, Imports, Defns0) },
 	{ MLDS = mlds(ModuleName, ForeignCode, Imports, Defns) },
 	{ MLDS_ModuleName = mercury_module_name_to_mlds(ModuleName) },
 	{ OuterVars = [] },
 	{ DefnsList = list__map(
-		ml_elim_nested_defns(MLDS_ModuleName, Globals, OuterVars),
+		ml_elim_nested_defns(Action, MLDS_ModuleName, Globals,
+			OuterVars),
 		Defns0) },
 	{ Defns = list__condense(DefnsList) }.
 
+	% Either eliminated nested functions:
 	% Hoist out any nested function occurring in a single mlds__defn.
 	% Return a list of mlds__defns that contains no nested functions.
 	%
-:- func ml_elim_nested_defns(mlds_module_name, globals, outervars,
+	% Or handle accurate GC:
+	% put all variables that might contain pointers in structs
+	% and chain these structs together into a "shadow stack".
+	%
+:- func ml_elim_nested_defns(action, mlds_module_name, globals, outervars,
 		mlds__defn) = list(mlds__defn).
-ml_elim_nested_defns(ModuleName, Globals, OuterVars, Defn0) = FlatDefns :-
+ml_elim_nested_defns(Action, ModuleName, Globals, OuterVars, Defn0)
+		= Defns :-
 	Defn0 = mlds__defn(Name, Context, Flags, DefnBody0),
 	( DefnBody0 = mlds__function(PredProcId, Params,
 			defined_here(FuncBody0), Attributes) ->
-		EnvName = ml_env_name(Name),
+		EnvName = ml_env_name(Name, Action),
 			% XXX this should be optimized to generate 
 			% EnvTypeName from just EnvName
-		ml_create_env(EnvName, [], Context, ModuleName, Globals,
+		ml_create_env(Action, EnvName, [], Context, ModuleName, Globals,
 			_EnvTypeDefn, EnvTypeName, _EnvDecls, _InitEnv),
 		
 		EnvPtrTypeName = ml_make_env_ptr_type(Globals, EnvTypeName),
@@ -175,11 +361,13 @@
 		% traverse the function body, finding (and removing)
 		% any nested functions, and fixing up any references
 		% to the arguments or to local variables or local
-		% static constants which occur in nested functions
+		% static constants that need to be put in the environment
+		% structure (e.g. because they occur in nested functions,
+		% or to make them visible to the garbage collector)
 		%
-		ElimInfo0 = elim_info_init(ModuleName, OuterVars, EnvTypeName,
-			EnvPtrTypeName),
-		Params = mlds__func_params(Arguments, _RetValues),
+		ElimInfo0 = elim_info_init(Action, ModuleName,
+			OuterVars, EnvTypeName, EnvPtrTypeName),
+		Params = mlds__func_params(Arguments, RetValues),
 		ml_maybe_add_args(Arguments, FuncBody0, ModuleName,
 			Context, ElimInfo0, ElimInfo1),
 		flatten_statement(FuncBody0, FuncBody1, ElimInfo1, ElimInfo),
@@ -195,14 +383,19 @@
 		% Fix up access flags on the statics that we're going to hoist:
 		% convert "local" to "private"
 		%
-		HoistedStatics = list__map(convert_local_to_global, LocalStatics),
+		HoistedStatics = list__map(convert_local_to_global,
+			LocalStatics),
 
 
 		%
+		% When hoisting nested functions,
 		% if there were no nested functions, then we just
 		% hoist the local static constants
 		%
-		( NestedFuncs0 = [] ->
+		(
+			Action = hoist_nested_funcs,
+			NestedFuncs0 = []
+		->
 			FuncBody = FuncBody1,
 			HoistedDefns = HoistedStatics
 		;
@@ -212,19 +405,21 @@
 			% both the containing function and the nested
 			% functions
 			%
-			ml_create_env(EnvName, LocalVars, Context, ModuleName,
-				Globals, EnvTypeDefn, _EnvTypeName, EnvDecls,
-				InitEnv),
+			ml_create_env(Action, EnvName, LocalVars, Context,
+				ModuleName, Globals, EnvTypeDefn,
+				_EnvTypeName, EnvDecls, InitEnv),
 			list__map_foldl(
 				ml_insert_init_env(EnvTypeName, ModuleName,
 					Globals), NestedFuncs0, NestedFuncs,
 					no, InsertedEnv),
 
 			% Hoist out the local statics and the nested functions
-			HoistedDefns0 = list__append(HoistedStatics, NestedFuncs),
+			HoistedDefns0 = list__append(HoistedStatics,
+				NestedFuncs),
 
 			% 
-			% It's possible that none of the nested
+			% When hoisting nested functions,
+			% it's possible that none of the nested
 			% functions reference the arguments or locals of
 			% the parent function.  In that case, there's no
 			% need to create an environment, we just need to 
@@ -238,47 +433,85 @@
 			% really a big problem, since the code
 			% that generates these arguments needs them.
 			%
-			( InsertedEnv = yes ->
+			(
+				Action = hoist_nested_funcs,
+				InsertedEnv = no
+			->
+				FuncBody = FuncBody1,
+				HoistedDefns = HoistedDefns0
+			;
 				%
 				% If the function's arguments are
-				% referenced by nested functions, then
-				% we need to copy them to local
+				% referenced by nested functions,
+				% or (for accurate GC) may contain pointers,
+				% then we need to copy them to local
 				% variables in the environment
 				% structure.
 				%
 				ml_maybe_copy_args(Arguments, FuncBody0,
-					ModuleName, EnvTypeName, EnvPtrTypeName,
+					ElimInfo, EnvTypeName, EnvPtrTypeName,
 					Context, _ArgsToCopy, CodeToCopyArgs),
 
 				%
+				% Insert code to unlink this stack frame
+				% before doing any tail calls or returning
+				% from the function, either explicitly
+				% or implicitly.
+				%
+					% add unlink statements before
+					% any explicit returns or tail calls
+				( Action = chain_gc_stack_frames ->
+					add_unchain_stack_to_statement(
+						FuncBody1, FuncBody2,
+						ElimInfo, _ElimInfo)
+				;
+					FuncBody2 = FuncBody1
+				),
+					% add a final unlink statement
+					% at the end of the function,
+					% if needed.  This is only needed if
+					% the function has no return values --
+					% if there is a return value, then the
+					% function must exit with an explicit
+					% return statement.
+				(
+					Action = chain_gc_stack_frames,
+					RetValues = []
+				->
+					UnchainFrame = [ml_gen_unchain_frame(
+						Context, ElimInfo)]
+				;
+					UnchainFrame = []
+				),
+
+				%
 				% insert the definition and
 				% initialization of the environment
 				% struct variable at the start of the
-				% top-level function's body
+				% top-level function's body,
+				% and append the final unlink statement
+				% (if any) at the end
 				%
 				FuncBody = ml_block(EnvDecls,
-					list__append(
-						[InitEnv | CodeToCopyArgs], 
-						[FuncBody1]), Context),
+						InitEnv ++ CodeToCopyArgs ++
+						[FuncBody2] ++ UnchainFrame,
+						Context),
 				%
-				% insert the environment struct type
-				% at the start of the list of hoisted definitions
+				% insert the environment struct type at
+				% the start of the list of hoisted definitions
 				% (preceding the previously nested functions
 				% and static constants in HoistedDefns0),
 				%
 				HoistedDefns = [EnvTypeDefn | HoistedDefns0]
-			;
-				FuncBody = FuncBody1,
-				HoistedDefns = HoistedDefns0
 			)
 		),
 		DefnBody = mlds__function(PredProcId, Params,
 			defined_here(FuncBody), Attributes),
 		Defn = mlds__defn(Name, Context, Flags, DefnBody),
-		FlatDefns = list__append(HoistedDefns, [Defn])
+		Defns = list__append(HoistedDefns, [Defn])
 	;
 		% leave definitions of things other than functions unchanged
-		FlatDefns = [Defn0]
+		Defns = [Defn0]
 	).
 
 	%
@@ -291,9 +524,11 @@
 
 ml_maybe_add_args([], _, _, _) --> [].
 ml_maybe_add_args([Arg|Args], FuncBody, ModuleName, Context) -->
+	=(ElimInfo),
 	(
-		{ Arg = data(var(VarName)) - _Type },
-		{ ml_should_add_local_data(ModuleName, VarName, [], [FuncBody]) }
+		{ Arg = data(var(VarName)) - Type },
+		{ ml_should_add_local_data(ElimInfo, VarName, Type,
+			[], [FuncBody]) }
 	->
 		{ ml_conv_arg_to_var(Context, Arg, ArgToCopy) },
 		elim_info_add_local_data(ArgToCopy)
@@ -307,18 +542,20 @@
 	% to the environment struct.
 	%
 :- pred ml_maybe_copy_args(mlds__arguments, mlds__statement,
-		mlds_module_name, mlds__type, mlds__type, mlds__context, 
+		elim_info, mlds__type, mlds__type, mlds__context, 
 		mlds__defns, mlds__statements).
 :- mode ml_maybe_copy_args(in, in, in, in, in, in, out, out) is det.
 
 ml_maybe_copy_args([], _, _, _, _, _, [], []).
-ml_maybe_copy_args([Arg|Args], FuncBody, ModuleName, ClassType, EnvPtrTypeName,
+ml_maybe_copy_args([Arg|Args], FuncBody, ElimInfo, ClassType, EnvPtrTypeName,
 		Context, ArgsToCopy, CodeToCopyArgs) :-
-	ml_maybe_copy_args(Args, FuncBody, ModuleName, ClassType,
+	ml_maybe_copy_args(Args, FuncBody, ElimInfo, ClassType,
 		EnvPtrTypeName,	Context, ArgsToCopy0, CodeToCopyArgs0),
+	ModuleName = elim_info_get_module_name(ElimInfo),
 	(
 		Arg = data(var(VarName)) - FieldType,
-		ml_should_add_local_data(ModuleName, VarName, [], [FuncBody])
+		ml_should_add_local_data(ElimInfo, VarName, FieldType,
+			[], [FuncBody])
 	->
 		ml_conv_arg_to_var(Context, Arg, ArgToCopy),
 
@@ -361,17 +598,20 @@
 	%	struct <EnvClassName> *env_ptr;
 	%	env_ptr = &env;
 	%
-:- pred ml_create_env(mlds__class_name, list(mlds__defn), mlds__context,
+:- pred ml_create_env(action, mlds__class_name, list(mlds__defn), mlds__context,
 		mlds_module_name, globals, mlds__defn, mlds__type,
-		list(mlds__defn), mlds__statement).
-:- mode ml_create_env(in, in, in, in, in, out, out, out, out) is det.
+		list(mlds__defn), list(mlds__statement)).
+:- mode ml_create_env(in, in, in, in, in, in, out, out, out, out) is det.
 
-ml_create_env(EnvClassName, LocalVars, Context, ModuleName, Globals,
+ml_create_env(Action, EnvClassName, LocalVars, Context, ModuleName, Globals,
 		EnvTypeDefn, EnvTypeName, EnvDecls, InitEnv) :-
 	%
 	% generate the following type:
 	%
 	%	struct <EnvClassName> {
+	%	  #ifdef ACCURATE_GC
+	%		struct MR_StackChain fixed_fields;
+	%	  #endif
 	%		<LocalVars>
 	%	};
 	%
@@ -390,7 +630,57 @@
 		EnvTypeKind),
 	EnvTypeEntityName = type(EnvClassName, 0),
 	EnvTypeFlags = env_type_decl_flags,
-	Fields = list__map(convert_local_to_field, LocalVars),
+	Fields0 = list__map(convert_local_to_field, LocalVars),
+
+	( Action = chain_gc_stack_frames ->
+		%
+		% insert the fixed fields:
+		%	void *prev;
+		%	void (*trace)(...);
+		%
+		PrevFieldName = data(var(var_name("prev", no))),
+		PrevFieldFlags = ml_gen_public_field_decl_flags,
+		PrevFieldType = mlds__generic_env_ptr_type,
+		PrevFieldDefnBody = mlds__data(PrevFieldType, no_initializer),
+		PrevFieldDecl = mlds__defn(PrevFieldName, Context,
+				PrevFieldFlags, PrevFieldDefnBody),
+
+		TraceFieldName = data(var(var_name("trace", no))),
+		TraceFieldFlags = ml_gen_public_field_decl_flags,
+		TraceFieldType = mlds__generic_type, % XXX
+		TraceFieldDefnBody = mlds__data(TraceFieldType, no_initializer),
+		TraceFieldDecl = mlds__defn(TraceFieldName, Context,
+				TraceFieldFlags, TraceFieldDefnBody),
+
+		Fields = [PrevFieldDecl, TraceFieldDecl | Fields0],
+
+		%
+		% Set the initializer for the `prev' field
+		% to the global stack chain
+		% XXX we want to zero out the rest of the environment struct;
+		%     will just not mentioning the remaining fields work?
+		%
+		StackChain = ml_stack_chain_var,
+		EnvInitializer = init_struct([init_obj(lval(StackChain))]),
+
+		%
+		% Generate code to set the global stack chain
+		% to point to the current environment.
+		%	 stack_chain = env_ptr;
+		%
+		EnvPtrTypeName = ml_make_env_ptr_type(Globals, EnvTypeName),
+		EnvPtr = lval(var(qual(ModuleName,
+				mlds__var_name("env_ptr", no)),
+			EnvPtrTypeName)),
+		AssignToStackChain = assign(StackChain, EnvPtr),
+		LinkStackChain = [mlds__statement(atomic(AssignToStackChain),
+			Context)]
+	;
+		Fields = Fields0,
+		EnvInitializer = no_initializer,
+		LinkStackChain = []
+	),
+
 	Imports = [],
 	Ctors = [], % mlds_to_il.m will add an empty constructor if needed
 	Interfaces = [],
@@ -402,11 +692,11 @@
 	%
 	% generate the following variable declaration:
 	%
-	%	struct <EnvClassName> env;
+	%	struct <EnvClassName> env; // = { ... }
 	%
 	EnvVarName = data(var(var_name("env", no))),
 	EnvVarFlags = ml_gen_local_var_decl_flags,
-	EnvVarDefnBody = mlds__data(EnvTypeName, no_initializer),
+	EnvVarDefnBody = mlds__data(EnvTypeName, EnvInitializer),
 	EnvVarDecl = mlds__defn(EnvVarName, Context, EnvVarFlags,
 		EnvVarDefnBody),
 
@@ -423,23 +713,19 @@
 	%
 	( OnHeap = yes ->
 		EnvVarAddr = lval(var(EnvVar, EnvTypeName)),
-		ml_init_env(EnvTypeName, EnvVarAddr, Context, ModuleName,
-			 Globals, EnvPtrVarDecl, InitEnv0),
-		
-		NewObj = mlds__statement(
+		NewObj = [mlds__statement(
 				atomic(new_object(
 					var(EnvVar, EnvTypeName), 
 					no, no, EnvTypeName, no, no, [], [])),
-				Context),
-		InitEnv = mlds__statement(block([], 
-			[NewObj, InitEnv0]), Context),
-		EnvDecls = [EnvVarDecl, EnvPtrVarDecl]
+				Context)]
 	;
 		EnvVarAddr = mem_addr(var(EnvVar, EnvTypeName)),
-		ml_init_env(EnvTypeName, EnvVarAddr, Context, ModuleName,
-			Globals, EnvPtrVarDecl, InitEnv),
-		EnvDecls = [EnvVarDecl, EnvPtrVarDecl]
-	).
+		NewObj = []
+	),
+	ml_init_env(EnvTypeName, EnvVarAddr, Context, ModuleName,
+		Globals, EnvPtrVarDecl, InitEnv0),
+	EnvDecls = [EnvVarDecl, EnvPtrVarDecl],
+	InitEnv = NewObj ++ [InitEnv0] ++ LinkStackChain.
 
 	% When converting local variables into fields of the
 	% environment struct, we need to change `local' access
@@ -610,6 +896,13 @@
 		mlds__statement(block(VarDecls, Statements), Context)
 	).
 		
+:- func ml_stack_chain_var = mlds__lval.
+ml_stack_chain_var = StackChain :-
+	mercury_private_builtin_module(PrivateBuiltin),
+	MLDS_Module = mercury_module_name_to_mlds(PrivateBuiltin),
+	StackChain = var(qual(MLDS_Module, var_name("stack_chain", no)),
+		mlds__generic_env_ptr_type).
+
 %-----------------------------------------------------------------------------%
 %
 % This code does some name mangling.
@@ -623,25 +916,27 @@
 
 	% Compute the name to use for the environment struct
 	% for the specified function.
-:- func ml_env_name(mlds__entity_name) = mlds__class_name.
+:- func ml_env_name(mlds__entity_name, action) = mlds__class_name.
 
-ml_env_name(type(_, _)) = _ :-
+ml_env_name(type(_, _), _) = _ :-
 	error("ml_env_name: expected function, got type").
-ml_env_name(data(_)) = _ :-
+ml_env_name(data(_), _) = _ :-
 	error("ml_env_name: expected function, got data").
-ml_env_name(function(PredLabel, ProcId, MaybeSeqNum, _PredId)) = ClassName :-
+ml_env_name(function(PredLabel, ProcId, MaybeSeqNum, _PredId),
+		Action) = ClassName :-
+	Base = (if Action = chain_gc_stack_frames then "locals" else "env"),
 	PredLabelString = ml_pred_label_name(PredLabel),
 	proc_id_to_int(ProcId, ModeNum),
 	( MaybeSeqNum = yes(SeqNum) ->
-		string__format("%s_%d_%d_env",
-			[s(PredLabelString), i(ModeNum), i(SeqNum)],
+		string__format("%s_%d_%d_%s",
+			[s(PredLabelString), i(ModeNum), i(SeqNum), s(Base)],
 			ClassName)
 	;
-		string__format("%s_%d_env",
-			[s(PredLabelString), i(ModeNum)],
+		string__format("%s_%d_%s",
+			[s(PredLabelString), i(ModeNum), s(Base)],
 			ClassName)
 	).
-ml_env_name(export(_)) = _ :-
+ml_env_name(export(_), _) = _ :-
 	error("ml_env_name: expected function, got export").
 
 :- func ml_pred_label_name(mlds__pred_label) = string.
@@ -810,8 +1105,9 @@
 %
 % flatten_nested_defns:
 % flatten_nested_defn:
-%	Hoist out nested function definitions and local variables
-%	referenced by nested functions, storing them both in the elim_info.
+%	Hoist out nested function definitions, and any local variables
+%	that need to go in the environment struct (e.g. because they are
+%	referenced by nested functions), storing them both in the elim_info.
 %
 
 :- pred flatten_nested_defns(mlds__defns, mlds__statements, mlds__defns,
@@ -870,14 +1166,13 @@
 		elim_info_add_nested_func(Defn),
 		{ Defns = [] }
 	;
-		{ DefnBody0 = mlds__data(_, _) },
+		{ DefnBody0 = mlds__data(Type, _) },
 		%
 		% for local variable definitions, if they are
 		% referenced by any nested functions, then
 		% strip them out and store them in the elim_info
 		%
 		=(ElimInfo),
-		{ ModuleName = elim_info_get_module_name(ElimInfo) },
 		(
 			(
 				% For IL and Java, we need to hoist all
@@ -890,7 +1185,8 @@
 				{ ml_decl_is_static_const(Defn0) }
 			;
 				{ Name = data(var(VarName)) },
-				{ ml_should_add_local_data(ModuleName, VarName,
+				{ ml_should_add_local_data(ElimInfo,
+					VarName, Type,
 					FollowingDefns, FollowingStatements) }
 			)
 		->
@@ -913,10 +1209,26 @@
 
 	%
 	% Succeed iff we should add the definition of this variable
-	% to the local_data field of the ml_elim_info, meaning that
+	% to the local_data field of the elim_info, meaning that
 	% it should be added to the environment struct
 	% (if it's a variable) or hoisted out to the top level
 	% (if it's a static const).
+:- pred ml_should_add_local_data(elim_info, mlds__var_name, mlds__type,
+		mlds__defns, mlds__statements).
+:- mode ml_should_add_local_data(in, in, in, in, in) is semidet.
+
+ml_should_add_local_data(ElimInfo, VarName, Type,
+		FollowingDefns, FollowingStatements) :-
+	Action = ElimInfo ^ action,
+	(
+		Action = chain_gc_stack_frames,
+		ml_type_might_contain_pointers(Type) = yes
+	;
+		Action = hoist_nested_funcs,
+		ml_need_to_hoist(ElimInfo ^ module_name, VarName,
+			FollowingDefns, FollowingStatements)
+	).
+
 	%
 	% This checks for a nested function definition
 	% or static initializer that references the variable.
@@ -931,11 +1243,11 @@
 	% so to keep things simple we do the same for the
 	% C back-end to, i.e. we always hoist all static constants.
 	%
-:- pred ml_should_add_local_data(mlds_module_name, mlds__var_name,
+:- pred ml_need_to_hoist(mlds_module_name, mlds__var_name,
 		mlds__defns, mlds__statements).
-:- mode ml_should_add_local_data(in, in, in, in) is semidet.
+:- mode ml_need_to_hoist(in, in, in, in) is semidet.
 
-ml_should_add_local_data(ModuleName, VarName,
+ml_need_to_hoist(ModuleName, VarName,
 		FollowingDefns, FollowingStatements) :-
 	QualVarName = qual(ModuleName, VarName),
 	(
@@ -955,6 +1267,59 @@
 		initializer_contains_var(Initializer, QualVarName)
 	).
 
+	% Return `yes' if the type needs to be traced by
+	% the accurate garbage collector, i.e. if it might
+	% contain pointers.
+	%
+	% It's always safe to return `yes' here, so if in doubt, we do.
+	%
+	% For floats, we can return `no' even though they might
+	% get boxed in some circumstances, because if they are
+	% boxed then they will be represented as mlds__generic_type.
+	%
+	% Note that with --gcc-nested-functions,
+	% cont_type will be a function pointer that
+	% may point to a trampoline function,
+	% which might in fact contain pointers.
+	% But in that case we can't trace them,
+	% so that combination of options isn't supported.
+	% XXX we should add code to handle_options to check it.
+	% Hence we can return `no' here for mlds__cont_type.
+
+:- func ml_type_might_contain_pointers(mlds__type) = bool.
+
+ml_type_might_contain_pointers(mercury_type(_Type, TypeCategory, _)) =
+	ml_type_category_might_contain_pointers(TypeCategory).
+ml_type_might_contain_pointers(mercury_array_type(_)) = yes.
+ml_type_might_contain_pointers(mlds__native_int_type) = no.
+ml_type_might_contain_pointers(mlds__native_float_type) = no.
+ml_type_might_contain_pointers(mlds__native_bool_type) = no.
+ml_type_might_contain_pointers(mlds__native_char_type) = no.
+ml_type_might_contain_pointers(mlds__foreign_type(_, _)) = yes.
+ml_type_might_contain_pointers(mlds__class_type(_, _, Category)) =
+	(if Category = mlds__enum then no else yes).
+ml_type_might_contain_pointers(mlds__ptr_type(_)) = yes.
+ml_type_might_contain_pointers(mlds__array_type(_)) = yes.
+ml_type_might_contain_pointers(mlds__func_type(_)) = no.
+ml_type_might_contain_pointers(mlds__generic_type) = yes.
+ml_type_might_contain_pointers(mlds__generic_env_ptr_type) = yes.
+ml_type_might_contain_pointers(mlds__pseudo_type_info_type) = yes.
+ml_type_might_contain_pointers(mlds__cont_type(_)) = no. 
+ml_type_might_contain_pointers(mlds__commit_type) = no.
+ml_type_might_contain_pointers(mlds__rtti_type(_)) = yes.
+ml_type_might_contain_pointers(mlds__unknown_type) = yes.
+
+:- func ml_type_category_might_contain_pointers(builtin_type) = bool.
+ml_type_category_might_contain_pointers(int_type) = no.
+ml_type_category_might_contain_pointers(char_type) = no.
+ml_type_category_might_contain_pointers(str_type) = yes.
+ml_type_category_might_contain_pointers(float_type) = no.
+ml_type_category_might_contain_pointers(pred_type) = yes.
+ml_type_category_might_contain_pointers(tuple_type) = yes.
+ml_type_category_might_contain_pointers(enum_type) = no.
+ml_type_category_might_contain_pointers(polymorphic_type) = yes.
+ml_type_category_might_contain_pointers(user_type) = yes.
+
 %-----------------------------------------------------------------------------%
 
 %
@@ -1562,6 +1927,144 @@
 
 %-----------------------------------------------------------------------------%
 
+:- pred add_unchain_stack_to_maybe_statement(maybe(mlds__statement),
+		maybe(mlds__statement), elim_info, elim_info).
+:- mode add_unchain_stack_to_maybe_statement(in, out, in, out) is det.
+
+add_unchain_stack_to_maybe_statement(no, no) --> [].
+add_unchain_stack_to_maybe_statement(yes(Statement0), yes(Statement)) -->
+	add_unchain_stack_to_statement(Statement0, Statement).
+	
+:- pred add_unchain_stack_to_statements(mlds__statements, mlds__statements,
+		elim_info, elim_info).
+:- mode add_unchain_stack_to_statements(in, out, in, out) is det.
+
+add_unchain_stack_to_statements(Statements0, Statements) -->
+	list__map_foldl(add_unchain_stack_to_statement,
+		Statements0, Statements).
+
+:- pred add_unchain_stack_to_statement(mlds__statement, mlds__statement,
+		elim_info, elim_info).
+:- mode add_unchain_stack_to_statement(in, out, in, out) is det.
+
+add_unchain_stack_to_statement(Statement0, Statement) -->
+	{ Statement0 = mlds__statement(Stmt0, Context) },
+	add_unchain_stack_to_stmt(Stmt0, Context, Stmt),
+	{ Statement = mlds__statement(Stmt, Context) }.
+
+:- pred add_unchain_stack_to_stmt(mlds__stmt, mlds__context, mlds__stmt,
+		elim_info, elim_info).
+:- mode add_unchain_stack_to_stmt(in, in, out, in, out) is det.
+
+add_unchain_stack_to_stmt(Stmt0, Context, Stmt) -->
+	(
+		{ Stmt0 = block(Defns, Statements0) },
+		add_unchain_stack_to_statements(Statements0, Statements),
+		{ Stmt = block(Defns, Statements) }
+	;
+		{ Stmt0 = while(Rval, Statement0, Once) },
+		add_unchain_stack_to_statement(Statement0, Statement),
+		{ Stmt = while(Rval, Statement, Once) }
+	;
+		{ Stmt0 = if_then_else(Cond, Then0, MaybeElse0) },
+		add_unchain_stack_to_statement(Then0, Then),
+		add_unchain_stack_to_maybe_statement(MaybeElse0, MaybeElse),
+		{ Stmt = if_then_else(Cond, Then, MaybeElse) }
+	;
+		{ Stmt0 = switch(Type, Val, Range, Cases0, Default0) },
+		list__map_foldl(add_unchain_stack_to_case, Cases0, Cases),
+		add_unchain_stack_to_default(Default0, Default),
+		{ Stmt = switch(Type, Val, Range, Cases, Default) }
+	;
+		{ Stmt0 = label(_) },
+		{ Stmt = Stmt0 }
+	;
+		{ Stmt0 = goto(_) },
+		{ Stmt = Stmt0 }
+	;
+		{ Stmt0 = computed_goto(_Rval, _Labels) },
+		{ Stmt = Stmt0 }
+	;
+		{ Stmt0 = call(_Sig, _Func, _Obj, _Args, _RetLvals, TailCall) },
+		( { TailCall = tail_call } ->
+			=(ElimInfo),
+			{ Stmt = prepend_unchain_frame(Stmt0, Context,
+				ElimInfo) }
+		;
+			{ Stmt = Stmt0 }
+		)
+	;
+		{ Stmt0 = return(_Rvals) },
+		=(ElimInfo),
+		{ Stmt = prepend_unchain_frame(Stmt0, Context, ElimInfo) }
+	;
+		{ Stmt0 = do_commit(_Ref) },
+		{ Stmt = Stmt0 }
+	;
+		{ Stmt0 = try_commit(Ref, Statement0, Handler0) },
+		add_unchain_stack_to_statement(Statement0, Statement),
+		add_unchain_stack_to_statement(Handler0, Handler),
+		{ Stmt = try_commit(Ref, Statement, Handler) }
+	;
+		{ Stmt0 = atomic(_AtomicStmt0) },
+		{ Stmt = Stmt0 }
+	).
+
+:- pred add_unchain_stack_to_case(mlds__switch_case, mlds__switch_case,
+		elim_info, elim_info).
+:- mode add_unchain_stack_to_case(in, out, in, out) is det.
+
+add_unchain_stack_to_case(Conds0 - Statement0, Conds - Statement) -->
+	list__map_foldl(fixup_case_cond, Conds0, Conds),
+	add_unchain_stack_to_statement(Statement0, Statement).
+
+:- pred add_unchain_stack_to_default(mlds__switch_default, mlds__switch_default,
+		elim_info, elim_info).
+:- mode add_unchain_stack_to_default(in, out, in, out) is det.
+
+add_unchain_stack_to_default(default_is_unreachable, default_is_unreachable)
+		--> [].
+add_unchain_stack_to_default(default_do_nothing, default_do_nothing) --> [].
+add_unchain_stack_to_default(default_case(Statement0), default_case(Statement))
+		-->
+	add_unchain_stack_to_statement(Statement0, Statement).
+	
+:- func prepend_unchain_frame(mlds__stmt, mlds__context, elim_info) =
+	mlds__stmt.
+prepend_unchain_frame(Stmt0, Context, ElimInfo) = Stmt :-
+	UnchainFrame = ml_gen_unchain_frame(Context, ElimInfo),
+	Statement0 = mlds__statement(Stmt0, Context),
+	Stmt = block([], [UnchainFrame, Statement0]).
+
+:- func append_unchain_frame(mlds__stmt, mlds__context, elim_info) =
+	mlds__stmt.
+append_unchain_frame(Stmt0, Context, ElimInfo) = Stmt :-
+	UnchainFrame = ml_gen_unchain_frame(Context, ElimInfo),
+	Statement0 = mlds__statement(Stmt0, Context),
+	Stmt = block([], [Statement0, UnchainFrame]).
+
+:- func ml_gen_unchain_frame(mlds__context, elim_info) = mlds__statement.
+ml_gen_unchain_frame(Context, ElimInfo) = UnchainFrame :-
+	EnvPtrTypeName = ElimInfo ^ env_ptr_type_name,
+	ModuleName = ElimInfo ^ module_name,
+	%
+	% Generate code to remove this frame from the stack chain:
+	%	stack_chain = stack_chain->prev;
+	%
+	StackChain = ml_stack_chain_var,
+	Tag = yes(0),
+	PrevFieldName = var_name("prev", no),
+	PrevFieldNameString = ml_var_name_to_string(PrevFieldName),
+	PrevFieldId = named_field(qual(ModuleName, PrevFieldNameString),
+		EnvPtrTypeName),
+	PrevFieldType = mlds__generic_env_ptr_type,
+	PrevFieldRval = lval(field(Tag, lval(StackChain), PrevFieldId,
+		PrevFieldType, EnvPtrTypeName)),
+	Assignment = assign(StackChain, PrevFieldRval),
+	UnchainFrame = mlds__statement(atomic(Assignment), Context).
+
+%-----------------------------------------------------------------------------%
+
 %
 % The elim_info type holds information that we use or accumulate
 % as we traverse through the function body.
@@ -1569,6 +2072,11 @@
 
 :- type elim_info
 	--->	elim_info(
+				% Specify whether we're eliminating nested
+				% functions, or doing the transformation
+				% needed for accurate GC.
+			action :: action,
+
 				% The name of the current module.
 			module_name :: mlds_module_name,
 
@@ -1609,10 +2117,11 @@
 	% innermost first
 :- type outervars == list(list(mlds__defn)).
 
-:- func elim_info_init(mlds_module_name, outervars, mlds__type, mlds__type)
-	= elim_info.
-elim_info_init(ModuleName, OuterVars, EnvTypeName, EnvPtrTypeName) =
-	elim_info(ModuleName, OuterVars, [], [], EnvTypeName, EnvPtrTypeName).
+:- func elim_info_init(action, mlds_module_name, outervars,
+		mlds__type, mlds__type) = elim_info.
+elim_info_init(Action, ModuleName, OuterVars, EnvTypeName, EnvPtrTypeName) =
+	elim_info(Action, ModuleName, OuterVars, [], [],
+		EnvTypeName, EnvPtrTypeName).
 
 :- func elim_info_get_module_name(elim_info) = mlds_module_name.
 elim_info_get_module_name(ElimInfo) = ElimInfo ^ module_name.
Index: runtime/mercury.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury.c,v
retrieving revision 1.25
diff -u -d -r1.25 mercury.c
--- runtime/mercury.c	31 May 2001 06:00:10 -0000	1.25
+++ runtime/mercury.c	8 Nov 2001 16:16:54 -0000
@@ -26,6 +26,8 @@
 ** Variable definitions
 */
 
+void *mercury__private_builtin____stack_chain;
+
 MR_Word mercury__private_builtin__dummy_var;
 
 /*---------------------------------------------------------------------------*/
Index: runtime/mercury.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury.h,v
retrieving revision 1.41
diff -u -d -r1.41 mercury.h
--- runtime/mercury.h	22 Nov 2001 16:22:29 -0000	1.41
+++ runtime/mercury.h	26 Nov 2001 13:06:12 -0000
@@ -273,10 +273,23 @@
 typedef struct MR_FO_PseudoTypeInfo_Struct19 MR_FO_PseudoTypeInfo_Struct19;
 typedef struct MR_FO_PseudoTypeInfo_Struct20 MR_FO_PseudoTypeInfo_Struct20;
 
+/* The chain of stack frames, used for accurate GC. */
+struct MR_StackChain {
+	struct MR_StackChain *prev;
+	void (*trace)(void *this_frame);
+};
+
 /*---------------------------------------------------------------------------*/
 /*
 ** Declarations of contants and variables
 */
+
+/*
+** This points to the start of the MR_StackChain frame list.
+** XXX Using a global variable for this is not thread-safe.
+**     We should probably use a GNU C global register variable.
+*/
+extern void *mercury__private_builtin__stack_chain;
 
 /* declare MR_TypeCtorInfo_Structs for the builtin types */
 extern const MR_TypeCtorInfo_Struct
Index: compiler/mercury_compile.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mercury_compile.m,v
retrieving revision 1.224
retrieving revision 1.225
diff -u -d -r1.224 -r1.225
--- mercury_compile.m	9 Nov 2001 18:20:48 -0000	1.224
+++ mercury_compile.m	26 Nov 2001 09:30:56 -0000	1.225
@@ -3221,11 +3259,20 @@
 	maybe_report_stats(Stats),
 	mercury_compile__maybe_dump_mlds(MLDS20, "20", "tailcalls"),
 
+	%
+	% Note that we call ml_elim_nested twice --
+	% the first time to flatten nested functions,
+	% and the second time to chain the stack frames
+	% together, for accurate GC.
+	% These two passes are quite similar,
+	% but must be done separately.
+	%
+
 	globals__io_lookup_bool_option(gcc_nested_functions, NestedFuncs),
 	( { NestedFuncs = no } ->
 		maybe_write_string(Verbose,
 			"% Flattening nested functions...\n"),
-		ml_elim_nested(MLDS20, MLDS30),
+		ml_elim_nested(hoist_nested_funcs, MLDS20, MLDS30),
 		maybe_write_string(Verbose, "% done.\n")
 	;
 		{ MLDS30 = MLDS20 }
@@ -3233,13 +3280,25 @@
 	maybe_report_stats(Stats),
 	mercury_compile__maybe_dump_mlds(MLDS30, "30", "nested_funcs"),
 
+	globals__io_get_gc_method(GC),
+	( { GC = accurate } ->
+		maybe_write_string(Verbose,
+			"% Threading GC stack frames...\n"),
+		ml_elim_nested(chain_gc_stack_frames, MLDS30, MLDS35),
+		maybe_write_string(Verbose, "% done.\n")
+	;
+		{ MLDS35 = MLDS30 }
+	),
+	maybe_report_stats(Stats),
+	mercury_compile__maybe_dump_mlds(MLDS35, "35", "gc_frames"),
+
 	globals__io_lookup_bool_option(optimize, Optimize),
 	( { Optimize = yes } ->
 		maybe_write_string(Verbose, "% Optimizing MLDS...\n"),
-		ml_optimize__optimize(MLDS30, MLDS40),
+		ml_optimize__optimize(MLDS35, MLDS40),
 		maybe_write_string(Verbose, "% done.\n")
 	;
-		{ MLDS40 = MLDS30 }
+		{ MLDS40 = MLDS35 }
 	),
 	maybe_report_stats(Stats),
 	mercury_compile__maybe_dump_mlds(MLDS40, "40", "optimize"),

Note that I accidentally committed the changes to
mercury_compile.m as part of my earlier change.
So the actually diff for that file against the current
repository just looks like this:

Index: compiler/mercury_compile.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mercury_compile.m,v
retrieving revision 1.226
diff -u -d -r1.226 mercury_compile.m
--- compiler/mercury_compile.m	26 Nov 2001 10:57:31 -0000	1.226
+++ compiler/mercury_compile.m	26 Nov 2001 12:42:37 -0000
@@ -3272,9 +3272,7 @@
 	( { NestedFuncs = no } ->
 		maybe_write_string(Verbose,
 			"% Flattening nested functions...\n"),
-		% XXX this version of ml_elim_nested doesn't exist
-		% ml_elim_nested(hoist_nested_funcs, MLDS20, MLDS30),
-		ml_elim_nested(MLDS20, MLDS30),
+		ml_elim_nested(hoist_nested_funcs, MLDS20, MLDS30),
 		maybe_write_string(Verbose, "% done.\n")
 	;
 		{ MLDS30 = MLDS20 }
@@ -3282,7 +3280,6 @@
 	maybe_report_stats(Stats),
 	mercury_compile__maybe_dump_mlds(MLDS30, "30", "nested_funcs"),
 
-	/* XXX the version of ml_elim_nested doesn't exist.
 	globals__io_get_gc_method(GC),
 	( { GC = accurate } ->
 		maybe_write_string(Verbose,
@@ -3294,8 +3291,6 @@
 	),
 	maybe_report_stats(Stats),
 	mercury_compile__maybe_dump_mlds(MLDS35, "35", "gc_frames"),
-	*/
-	{ MLDS35 = MLDS30 },
 
 	globals__io_lookup_bool_option(optimize, Optimize),
 	( { Optimize = yes } ->

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list