[m-dev.] for review: generating C layout structures

Tyson Dowd trd at cs.mu.OZ.AU
Tue Jan 9 18:16:56 AEDT 2001


On 08-Jan-2001, Zoltan Somogyi <zs at cs.mu.OZ.AU> wrote:
> For review by Tyson.
> 
> Estimated hours taken: 30
> 
> Instead of generating the layout structures of labels, procs and modules
> as rvals, generate them almost entirely as C structures. This will make
> future modifications much easier, since mismatches between what the runtime
> expects and what the compiler generates will now be pointed out by the C
> compiler. (It also reduces the size of the C source files generated with
> debugging enabled by about 5%.) Layout structures contain a few components
> that are not well-typed in C; we continue to generate these as rvals.
> 
> Closure layout structures used to have a well-typed part and a non-well-typed
> part. We now generate the well-typed part as a separate structure, pointed to
> from the other. We also extend the well-typed part, so that instead of
> just giving the name the called procedure, it also identifies the source
> location where the closure was constructed. This could be useful for
> the debugger and for deep profiling.
> 
> runtime/mercury_stack_layout.h:
> 	Reorganize the definitions of layout structures. Rename
> 	Stack_Layout_Entry structures as Proc_Layout structures,
> 	and Stack_Layout_Label structures as Label_Layout structures.
> 	(The debugger paper refers to the structures by the new names.)
> 	Fold the Stack_Layout_Vars structure into the structure that contains
> 	it, the Label_Layout structure. Add a Closure_Id structure that
> 	contains a Proc_Id structure as well as extra information identifying
> 	the source location where the closure was created.
> 
> 	Create "short" versions of the Proc_Layout structures, which contain
> 	only the first one or two of the three groups of fields. Previously,
> 	the Mercury compiler would define new C types when it generated such
> 	short structures. Since we are not defining new C types anymore, there
> 	must be a C type for every kind of structure the Mercury compiler can
> 	generate. We now also have separate variants for the layouts of
> 	user-defined and compiler-generated procedures, since the format
> 	of their procedure id information is different. While the runtime
> 	system refers to their procedure id information through a union,
> 	the C types of the structures generated by the Mercury compiler
> 	do not use a union, since a union cannot be initialized through
> 	its second member.
> 
> 	Make the constant fields of structures const, since we now generate
> 	values of those structure types, and initialize them with constant
> 	data.
> 
> 	Move the documentation of layout structures here from stack_layout.m.
> 
> runtime/mercury_ho_call.h:
> 	Instead of bodily including an MR_Proc_Id structure in closures,
> 	include a pointer to the more detailed MR_Closure_Id structure.
> 
> runtime/mercury_accurate_gc.c:
> runtime/mercury_agc_debug.c:
> runtime/mercury_init.h:
> runtime/mercury_label.[ch]:
> runtime/mercury_layout_util.[ch]:
> 	Minor updates to conform to changes in mercury_stack_layout.h.
> 
> runtime/mercury_goto.h:
> 	Use separate naming schemes for label layout structures and proc layout
> 	structures.
> 
> library/exception.m:
> 	Minor updates to conform to changes in mercury_stack_layout.h.
> 
> compiler/layout.m:
> 	A new module that defines data structures for label, proc and module
> 	layout structures and for closure id structures.
> 
> compiler/layout_out.m:
> 	A new module that converts the Mercury data structures of layout.m into
> 	declarations and definitions of C data structures.
> 
> compiler/stack_layout.m:
> 	Generate the new layout structures instead of rvals.
> 
> 	Move the documentation of layout structures from here to
> 	runtime/mercury_stack_layout.h, since this module is no longer
> 	aware of some of their details.
> 
> compiler/llds.m:
> 	Make layout structures a separate kind of compiler-generated data.
> 
> compiler/llds_out.m:
> 	Remove the code for the output of layout structures; call layout_out.m
> 	instead.
> 
> compiler/code_gen.m:
> compiler/code_info.m:
> compiler/llds.m:
> compiler/mercury_compile.m:
> compiler/unify_gen.m:
> 	Instead of handling closure layouts like other static data, handle
> 	them separately. Add a counter to the code_info structure in order
> 	to allow closure id structures to be identified uniquely by a pair
> 	consisting of the id of the procedure that generates them and a closure
> 	sequence number within that procedure.
> 
> compiler/llds_common.m:
> 	Look for common rvals among the rvals in layout structures.
> 
> compiler/opt_debug.m:
> 	Generate developer-friendly names for layout structure references.
> 
> browser/dl.m:
> 	Update the code for constructing closure layouts.

You also need to update extras/dynamic_linking/dl.m

This double update problem needs to be at least documented at the top of
dl.m, but really we have to decide how we are going to fix it in the
long term.

> Index: browser/dl.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/browser/dl.m,v
> retrieving revision 1.5
> diff -u -b -r1.5 dl.m
> --- browser/dl.m	2000/10/16 01:33:23	1.5
> +++ browser/dl.m	2001/01/07 11:32:57
> @@ -1,5 +1,5 @@
>  %-----------------------------------------------------------------------------%
> -% Copyright (C) 1998-2000 The University of Melbourne.
> +% Copyright (C) 1998-2001 The University of Melbourne.
>  % This file may only be copied under the terms of the GNU Library General
>  % Public License - see the file COPYING.LIB in the Mercury distribution.
>  %-----------------------------------------------------------------------------%
> @@ -103,20 +103,9 @@
>  #endif
>  }").
>  
> -:- type closure_layout
> -	--->	closure_layout(
> -			int,
> -			string,
> -			string,
> -			string,
> -			int,
> -			int,
> -			int
> -		).
> -
>  :- type closure
>  	--->	closure(
> -			closure_layout,
> +			int,		% really MR_Closure_Layout
>  			c_pointer,
>  			int
>  		).

I'm not sure why you used int instead of c_pointer, or
something like
	:- type closure_layout ---> closure_layout(c_pointer).

> Index: compiler/layout_out.m
> ===================================================================
> RCS file: layout_out.m
> diff -N layout_out.m
> --- /dev/null	Thu Sep  2 15:00:04 1999
> +++ layout_out.m	Tue Jan  2 18:41:13 2001
> @@ -0,0 +1,881 @@
> +%-----------------------------------------------------------------------------%
> +% Copyright (C) 2001 The University of Melbourne.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +%
> +% Author: zs.
> +
> +%-----------------------------------------------------------------------------%

This file should have a documented purpose.


> +
> +:- module layout_out.
> +
> +:- interface.
> +
> +:- import_module layout, llds, llds_out.
> +:- import_module bool, io.
> +
> +:- pred output_layout_data_defn(layout_data::in, decl_set::in, decl_set::out,
> +	io__state::di, io__state::uo) is det.
> +
> +:- pred output_layout_data_decl(layout_data::in, decl_set::in, decl_set::out,
> +	io__state::di, io__state::uo) is det.
> +
> +:- pred output_layout_addr_storage_type_name(layout_name::in, bool::in,
> +	io__state::di, io__state::uo) is det.
> +
> +:- pred output_layout_addr(layout_name::in,
> +	io__state::di, io__state::uo) is det.
> +

Depending on how detailed the purpose is, you may or may not need to add
some one-line comments to the exported output_layout_* preds to explain
what they do.

> +:- pred layout_name_would_include_code_addr(layout_name::in, bool::out) is det.
> +
> +:- pred make_label_layout_name(label::in, string::out) is det.

But these two should be commented here.

> +:- pred file_name_mangle(string::in, string::out) is det.
> +
> +file_name_mangle(FileName, MangledFileName) :-
> +	(
> +		string__remove_suffix(FileName, ".m", BaseName),
> +		string__is_alnum_or_underscore(BaseName)
> +	->
7B
> +		string__append(BaseName, "_dot_m", MangledFileName)
> +	;
> +		llds_out__name_mangle(FileName, MangledFileName)
> +	).

What is this code for? It appears to be unused.  The code that
writes file names just uses quote_and_write_string/3.

> Index: compiler/stack_layout.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/stack_layout.m,v
> retrieving revision 1.58
> diff -u -b -r1.58 stack_layout.m
> --- compiler/stack_layout.m	2000/12/06 06:05:16	1.58
> +++ compiler/stack_layout.m	2001/01/07 09:50:07
> @@ -1,292 +1,23 @@
>  %---------------------------------------------------------------------------%
> -% Copyright (C) 1997-2000 University of Melbourne.
> +% Copyright (C) 1997-2001 University of Melbourne.
>  % This file may only be copied under the terms of the GNU General
>  % Public License - see the file COPYING in the Mercury distribution.
>  %---------------------------------------------------------------------------%
>  %
> -% This module generates the LLDS code that defines global constants to
> -% hold the `stack_layout' structures of the stack frames defined by the
> -% current module.
> -%
> -% The tables generated have a number of `create' rvals within them.
> -% llds_common.m converts these into static data structures.
> -%
> -% We can create several types of stack layouts. Which kind we generate
> -% depends on the values of several options.
> -%
> +% File: stacK_layout.m.

s/stacK/stack

> @@ -604,26 +232,53 @@
>  		VarSet, VarTypes, InternalMap) },
>  	{ map__to_assoc_list(InternalMap, Internals) },
>  	stack_layout__set_cur_proc_named_vars(map__init),
> -	list__foldl(stack_layout__construct_internal_layout(EntryLabel),
> -		Internals),
> +
> +	{ code_util__extract_proc_label_from_label(EntryLabel, ProcLabel) },
> +	stack_layout__get_procid_stack_layout(ProcIdLayout0),
> +	{ bool__or(ProcIdLayout0, ForceProcIdLayout, ProcIdLayout) },
> +	( { ProcIdLayout = yes } ->
> +		{
> +			ProcLabel = proc(_, _, _, _, _, _),
> +			UserOrCompiler = user
> +		;
> +			ProcLabel = special_proc(_, _, _, _, _, _),
> +			UserOrCompiler = compiler
> +		},

proc_layout_user_or_compiler does exactly this piece of logic.
It's not a big deal, but maybe it could be lifted and put somewhere
useful?

> Index: runtime/mercury_ho_call.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_ho_call.h,v
> retrieving revision 1.3
> diff -u -b -r1.3 mercury_ho_call.h
> --- runtime/mercury_ho_call.h	2000/08/03 06:18:46	1.3
> +++ runtime/mercury_ho_call.h	2001/01/07 11:27:48
> @@ -1,5 +1,5 @@
>  /*
> -** Copyright (C) 1999, 2000 The University of Melbourne.
> +** Copyright (C) 1999-2001 The University of Melbourne.
>  ** This file may only be copied under the terms of the GNU Library General
>  ** Public License - see the file COPYING.LIB in the Mercury distribution.
>  */
> @@ -18,7 +18,7 @@
>  #ifndef	MERCURY_HO_CALL_H
>  #define	MERCURY_HO_CALL_H
>  
> -#include "mercury_stack_layout.h"
> +#include "mercury_stack_layout.h"	/* for MR_Closure_Id etc */
>  #include "mercury_type_info.h"		/* for MR_PseudoTypeInfo */
>  
>  /*
> @@ -27,7 +27,7 @@
>  ** in any closure that calls that procedure. It is represented as a
>  ** vector of words containing
>  **
> -**	a MR_Stack_Layout_Proc_Id structure
> +**	a pointer to an MR_Closure_Id structure
>  **	a pointer to information about the locations of typeinfos
>  **		for the type parameters of the procedure
>  **		(NULL if there are no type parameters)
> @@ -63,15 +63,28 @@
>  ** will not have any layout information. This will be indicated by the value
>  ** of num_all_args being negative, which says that the only field of this
>  ** structure containing valid information is proc_id.
> +**
> +** The Dyn_Link variant is for closures created by browser/dl.m. The closure_id
> +** field of such closures will contain an invalid proc_id (which you can tell
> +** from the negative arity) and a creation context that is also different from
> +** other closures: instead of specifying the source context where the closure
> +** is created, it puts a sequence number into the field that normally contains
> +** the line number.
>  */

I'm wondering whether it would be possible to get the compiler to
generate a genuine closure layout which dl.m could load.  If it has a
known (that is, regularly mangled) symbol name then couldn't we
reference it by dynamically loading it?  

I'm thinking that if you told the compiler that "somepredname" was an
"entry point" it might generate a closure layout suitable for
dynamically linking.

> Index: runtime/mercury_stack_layout.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_stack_layout.h,v
> retrieving revision 1.45
> diff -u -b -r1.45 mercury_stack_layout.h
> --- runtime/mercury_stack_layout.h	2000/12/06 06:05:45	1.45
> +++ runtime/mercury_stack_layout.h	2001/01/07 09:50:29

> +** The MR_sll_var_count field, if it is valid, is encoded by the formula
> +** (#Long << MR_SHORT_COUNT_BITS + #Short), where #Short is the number
> +** data items whose descriptions fit into an MR_Short_Lval and #Long is the
> +** number of data items whose descriptions do not. (The number of distinct
> +** values that fit into a 8 bits also fits into 8 bits, but since some
> +** locations hold the value of more than one variable at a time, not all
> +** the values need to be distinct; this is why MR_SHORT_COUNT_BITS is
> +** more than 8.)

I don't really understand the parenthetical remark.  It seems to be
missing a few words ("fit into a 8 bits also fits into 8 bits"?).

> +** The MR_sll_var_nums field may be NULL, which means that there is no
> +** information about the variable numbers of the live values. If the field
> +** is not NULL, it points to a vector of variable numbers, which has an element
> +** for each live data item. This is either the live data item's HLDS variable
> +** number, or one of two special values. Zero means that the live data item
> +** is not a variable (e.g. it is a saved copy of succip). The largest possible
> +** 16-bit number on the other hand means "the number of this variable does not
> +** fit into 16 bits". With the exception of these special values, the value
> +** in this slot uniquely identifies the variable.

Does this max 16 bit number mean very bad things have happened?  Can you
expect the data to be well formed despite this problem?  A hint as to
what kind of behaviour the runtime should adopt in this case could be
good here.

Apart from this the change looks fine.

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd at cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list