[m-dev.] for review: Infrastructure for Java Backend

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Jan 23 19:23:41 AEDT 2001


On 23-Jan-2001, Julien Fischer <juliensf at students.cs.mu.oz.au> wrote:
> 
> Two main changes:

It would be better to separate these two changes, posting separate
diffs for them, and committing them separately.

> Enable support for compiling to Java, options for same etc.
> 
> Alter the MLDS so that the definition of mlds__lval for 
> variables includes type information about the variable.  The
> Java backend uses this to distinguish between integers that are
> enumertion constants and those that are just integers.  The definition
> of ctor_name has been changed so that the names of 

That sentence is incomplete.

> Added an access specifier `local' because when methods were being
> generated there was no distinction between local and private variables,
> and the local variables were being declared private; the Java compiler
> was not particularly happy about this.  These three changes are 
> resposible for causing most of the changes below.

Above, you said two main changes, and here you say three.
Which is it?  If the idea is that there are two groups of changes,
#1 being to add options etc. for compiling to Java,
and #2 being changes to the MLDS required by
the Java back-end, then the log message should be
clearer about this.

> I'll post the actual Java backend shortly.
> 
> Julien
> 
> compiler/globals.m
> 	Removed comment about java not being implemented, replaced
> 	by one saying it is work in progress

Please include ":" after each file name.
Please end sentences with a full stop.

> compiler/options.m
> 	Add new options for compiling Java files:
> 	`--java-compiler' (`--javac'),
> 	`--java-flags',
> 	`--java-include-directory'
> 	`--java-object-file-extension'
> 	`--java-flag-to-name-object-file'

These options should be documented in doc/user_guide.texi.

> compiler/mlds.m
> 	Added new access flag, `local' because code in generated
> 	classes was unable to tell the difference between local
> 	and instance variables.
> 	Added type information for lvals that are variables because the
> 	java backend needs this information in creating enumeration objects.
> 	Redefined ctor_name from a string, to a fully module qualified 
> 	definition.  

s/java/Java/

> compiler/ml_elim_nested.m
> 	Changed code to match new definition of mlds__lval
> 
> compiler/ml_optimize.m
> 	Changed code to match new definition of mlds__lval
> 	
> compiler/ml_string_switch.m
> 	Changed code to match new definition of mlds__lval
> 
> compiler/ml_tailcall.m
> 	Changed code to match new definition of mlds__lval  

That should be formatted like so:

	compiler/ml_elim_nested.m:
	compiler/ml_optimize.m:
	compiler/ml_string_switch.m:
	compiler/ml_tailcall.m:
		Changed code to match new definition of mlds__lval.

> --- handle_options.m	2001/01/10 10:53:50	1.99
> +++ handle_options.m	2001/01/23 05:53:34
> @@ -293,7 +295,23 @@
>  	;
>  		[]
>  	),
> -
> +	% Generating Java implies high-level code, turning off nested functions,
> +	% using copy-out for both det and nondet output arguments,
> +	% using zero tags, boxing enums, disabling no_tag_types and no
> +	% static ground terms.
> +	( { Target = java } ->
> +		globals__io_set_option(highlevel_code, bool(yes)),
> +		globals__io_set_option(gcc_nested_functions, bool(no)),
> +		globals__io_set_option(nondet_copy_out, bool(yes)),
> +		globals__io_set_option(det_copy_out, bool(yes)),
> +		globals__io_set_option(num_tag_bits, int(0)),
> +		globals__io_set_option(line_numbers, bool(no))
> +		% XXX These are done for IL, are they needed for Java?
> +		% globals__io_set_option(unboxed_enums, bool(no)),
> +		% globals__io_set_option(unboxed_no_tag_types, bool(no))

The comment above doesn't match the code
(e.g. the comment says no static ground terms,
but the code doesn't mention that).

Also s/using zero tags/not using tags/

> Index: ml_code_gen.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/ml_code_gen.m,v
> retrieving revision 1.73
> diff -u -r1.73 ml_code_gen.m
> --- ml_code_gen.m	2001/01/17 17:37:15	1.73
> +++ ml_code_gen.m	2001/01/23 05:53:34
> @@ -1058,25 +1058,40 @@
>  		% value (rather than being passed by reference) and remove
>  		% them from the byref_output_vars field in the ml_gen_info.
>  		( CodeModel = model_non ->
> -			ml_set_up_initial_succ_cont(ModuleInfo,
> -				CopiedOutputVars, MLDSGenInfo0, MLDSGenInfo1)
> +		
> + 		ml_set_up_initial_succ_cont(ModuleInfo, CopiedOutputVars,
> + 			MLDSGenInfo0, MLDSGenInfo1)

That indentation change looks wrong.

> @@ -1836,21 +1850,22 @@
>  	%
>  	=(MLDSGenInfo),
>  	{ ml_gen_info_get_varset(MLDSGenInfo, VarSet) },
> -	{ ml_gen_info_get_module_info(MLDSGenInfo, ModuleInfo) },
> +	{ ml_gen_info_get_module_info(MLDSGenInfo, _ModuleInfo) },
>  	{ OutputVarName = ml_gen_var_name(VarSet, OutputVar) },
>  
>  	%
>  	% Generate a declaration for a corresponding local variable.
> -	%
> +	% XXX LocalVarDefn contained ModuleInfo as 4th argument
>  	{ string__append("local_", OutputVarName, LocalVarName) },
> -	{ LocalVarDefn = ml_gen_var_decl(LocalVarName, Type,
> -		mlds__make_context(Context), ModuleInfo) },
> -
> +	ml_gen_type(Type, MLDS_Type),
> +	{ LocalVarDefn = ml_gen_mlds_var_decl(var(LocalVarName), MLDS_Type,
> +		mlds__make_context(Context)) },

What's the reason for those changes?
They look wrong to me.

> Index: ml_code_util.m
...
> -	% Qualify the name of the specified variable
> -	% with the current module name.
> +	% Generate an lval from the variable name and type
>  	%
> -:- pred ml_qualify_var(mlds__var_name, mlds__lval,
> +:- pred ml_gen_var_lval(mlds__var_name, mlds__type, mlds__lval,

Probably the documentation for this should still mention the fact that
the variable name will be qualified with the current module name.

> +	% Generate declaration flags for a local variable
> +	%
> +	%
> +:- func ml_gen_var_decl_flags = mlds__decl_flags.

I think it would be clearer to name that `ml_gen_local_var_decl_flags'.

> @@ -1277,18 +1285,23 @@
>  		%
>  		{ mercury_private_builtin_module(PrivateBuiltin) },
>  		{ MLDS_Module = mercury_module_name_to_mlds(PrivateBuiltin) },
> -		{ Lval = var(qual(MLDS_Module, "dummy_var")) }
> +		%
> +		% XXX Change for Java Backend
> +		%
> +		ml_gen_type(Type, MLDS_Type),
> +		{ Lval = var(qual(MLDS_Module, "dummy_var"), MLDS_Type) }

The XXX comment here is not clear.
"XXX This needs to be changed for the Java backend"
would be better, but is still not adequate.
Why does it need to be changed for the Java backend?
The comment should explain.

>  	;
>  		=(MLDSGenInfo),
>  		{ ml_gen_info_get_varset(MLDSGenInfo, VarSet) },
>  		{ VarName = ml_gen_var_name(VarSet, Var) },
> -		ml_qualify_var(VarName, VarLval),
> +		ml_gen_type(Type, MLDS_Type),
> +		ml_gen_var_lval(VarName, MLDS_Type, VarLval),
>  		%
>  		% output variables may be passed by reference...
>  		%
>  		{ ml_gen_info_get_byref_output_vars(MLDSGenInfo, OutputVars) },
>  		( { list__member(Var, OutputVars) } ->
> -			ml_gen_type(Type, MLDS_Type),
> +			%ml_gen_type(Type, MLDS_Type),

That line should be deleted rather than being commented out.

>  			{ Lval = mem_ref(lval(VarLval), MLDS_Type) }
>  		;
>  			{ Lval = VarLval }
> @@ -1338,11 +1351,14 @@
>  	% Qualify the name of the specified variable
>  	% with the current module name.
>  	%
> -ml_qualify_var(VarName, QualifiedVarLval) -->
> +ml_gen_var_lval(VarName, VarType, QualifiedVarLval) -->
>  	=(MLDSGenInfo),
>  	{ ml_gen_info_get_module_name(MLDSGenInfo, ModuleName) },
>  	{ MLDS_Module = mercury_module_name_to_mlds(ModuleName) },
> -	{ QualifiedVarLval = var(qual(MLDS_Module, VarName)) }.
> +	%
> +	% XXX Change for java backend
> +	%
> +	{ QualifiedVarLval = var(qual(MLDS_Module, VarName), VarType) }.

Again the XXX comment here is not clear.

> @@ -1377,10 +1393,10 @@
>  	MLDS_Defn = mlds__defn(Name, MLDS_Context, DeclFlags, Defn).
>  
>  	% Return the declaration flags appropriate for a local variable.
> +	% Java backend needs this
> +% :- func ml_gen_var_decl_flags = mlds__decl_flags.

I don't think the comment "Java backend needs this" is helpful here.
If anything, the comment should be at the definition of the
"local" access in mlds.m.  But if the comment is to be of much
use, it should also explain *why* the Java back-end needs it.

>  ml_initial_cont(OutputVarLvals0, OutputVarTypes0, Cont) -->
> -	ml_qualify_var("cont", ContLval),
> -	ml_qualify_var("cont_env_ptr", ContEnvLval),
> +	%
> +	% XXX mlds__cont_type([]) is wrong here;
> +	% the [] should be the return types if --nondet-copy-out is enabled
> +	%
> +	ml_gen_var_lval("cont", mlds__cont_type([]), ContLval),

That is a nasty problem.  It should be fixed, or at least documented
very clearly in mlds.m.

> @@ -1638,8 +1659,13 @@
>  	ml_gen_cont_params(ArgTypes0, InnerFuncParams0),
>  	{ InnerFuncParams0 = func_params(InnerArgs0, Rets) },
>  	{ InnerArgRvals = list__map(
> -		(func(Data - _Type) 
> -		= lval(var(qual(MLDS_Module, VarName))) :-
> +		%
> +		% XXX Used to be (func(Data - _Type)
> +		(func(Data - Type) 

Why is that comment there?

I don't think it helps.
If people want to know what the code used to be, they should use cvs.

> +		%
> +		% XXX Change for java backend
> +		%
> +		= lval(var(qual(MLDS_Module, VarName), Type)) :-
>  			( Data = data(var(VarName0)) ->
>  				VarName = VarName0		
>  			;

The XXX comment here is unclear.

> @@ -1648,7 +1674,12 @@
>  		), 
>  			InnerArgs0) },
>  	{ InnerFuncArgType = mlds__cont_type(ArgTypes0) },
> -	{ InnerFuncRval = lval(var(qual(MLDS_Module, "passed_cont"))) },
> +	%
> +	% XXX Changed for java backend
> +	% Added InnerFuncArgType to lval(var(qual( .... etc
> +	%
> +	%
> +	{ InnerFuncRval = lval(var(qual(MLDS_Module, "passed_cont"), InnerFuncArgType)) },

That line is longer than 79 characters wide.

The comment doesn't seem very useful.

> @@ -1696,7 +1727,15 @@
>  	% The ml_elim_nested pass will insert the declaration
>  	% of the env_ptr variable.
>  ml_get_env_ptr(lval(EnvPtrLval)) -->
> -	ml_qualify_var("env_ptr", EnvPtrLval).
> +	%
> +	% XXX JAVA
> +	% XXX Using the type `mlds__generic_env_ptr_type' here is wrong.
> +	%     We should really use the type that the env_ptr variable
> +	%     was declared with.  But that type hasn't been generated yet,
> +	%     and it won't be generated until the ml_elim_nested pass.
> +	%     So getting this right is hard.
> +	%
> +	ml_gen_var_lval("env_ptr", mlds__generic_env_ptr_type, EnvPtrLval).

Again, that should be either fixed or at least documented clearly
in mlds.m.

> Index: ml_elim_nested.m
> @@ -400,19 +401,27 @@
>  		% generated needs to be a little different.
>  		% XXX Perhaps if we used value classes this could go
>  		% away.
> +		% XXX JAVA we probably need to do the same for Java
> +		% that we do for IL
>  	( Target = il ->
> -		EnvVarAddr = lval(var(EnvVar)),
> +		EnvVarAddr = lval(var(EnvVar, EnvTypeName)),
>  		ml_init_env(EnvTypeName, EnvVarAddr, Context, ModuleName,
>  			 Globals, EnvPtrVarDecl, InitEnv0),
> +		
> +		%
> +		% XXX changed the CtorType to no
> +		%

In general, comments should not be used to hold the revision history --
that is what cvs is for.

(An exception to this is that sometimes it is useful to document *why*
something is that used to be done is not done any more, or that used
to be done in one way is now done in a different way.  But merely
documenting *what* used to be done is not helpful.  And in general
such comments should only be left there for subtle issues, where it is
likely that someone might make the same mistake again.  The comments
should not document the complete revision history.)

> Index: ml_type_gen.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/ml_type_gen.m,v
> retrieving revision 1.4
> diff -u -r1.4 ml_type_gen.m
> --- ml_type_gen.m	2000/06/06 07:32:28	1.4
> +++ ml_type_gen.m	2001/01/23 05:53:34
> @@ -36,6 +36,14 @@
>  :- pred ml_gen_type_name(type_id, mlds__class, arity).
>  :- mode ml_gen_type_name(in, out, out) is det.
>  
> +	% The Java backend needs to use these to do some transformations
> +	% on the MLDS
> +	%
> +:- func ml_gen_type_decl_flags = mlds__decl_flags.
> +:- func ml_gen_special_member_decl_flags = mlds__decl_flags.
> +:- func ml_gen_enum_constant_decl_flags = mlds__decl_flags.
> +:- func ml_gen_member_decl_flags = mlds__decl_flags.

There should be documentation here explaining what each of these
functions does.  In particular the documentation should explain
how ml_gen_special_member_decl_flags differs from ml_gen_member_decl_flags.

> @@ -526,6 +534,7 @@
>  		MLDS_Type = mercury_type_to_mlds_type(ModuleInfo, Type)
>  	),
>  	FieldName = ml_gen_field_name(MaybeFieldName, ArgNum0),
> +	% XXX this gets the access flag wrong
>  	MLDS_Defn = ml_gen_mlds_var_decl(var(FieldName), MLDS_Type,
>  		mlds__make_context(Context)),
>  	ArgNum = ArgNum0 + 1.

That XXX should be fixed.

> Index: ml_unify_gen.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/ml_unify_gen.m,v
> retrieving revision 1.26
> diff -u -r1.26 ml_unify_gen.m
> --- ml_unify_gen.m	2000/12/14 11:48:42	1.26
> +++ ml_unify_gen.m	2001/01/23 05:53:34
> @@ -299,8 +299,9 @@
>  		% If this argument is something that would normally be allocated
>  		% on the heap, just generate a reference to the static constant
>  		% that we must have already generated for it.
> -		%
> -		ml_gen_static_const_addr(Var, ConstAddrRval),
> +		% XXX Not sure that it should by VarType
> +		ml_gen_type(VarType, MLDS_Type),
> +		ml_gen_static_const_addr(Var, MLDS_Type, ConstAddrRval),
>  		{ TagVal = 0 ->
>  			TaggedRval = ConstAddrRval
>  		;

I think that code is wrong.

The type here should match the type used in the construct_statically
case of new_object, which is currently `mlds__array_type(mlds__generic_type)':

                %
                % Generate a local static constant for this term.
                %
                ml_gen_static_const_name(Var, ConstName),
                { ConstType = mlds__array_type(mlds__generic_type) },
                { ArgInits = list__map(func(X) = init_obj(X), ArgRvals) },
                { Initializer = init_array(ArgInits) },
                { ConstDefn = ml_gen_static_const_defn(ConstName, ConstType,
                        Initializer, Context) },

However, using `mlds__array_type(mlds__generic_type)' is probably
wrong in the case where `--high-level-data' is enabled.  So both
places should be marked with an XXX comment.

> @@ -1016,7 +1020,7 @@
>  		% Assign the address of the local static constant to
>  		% the variable.
>  		%
> -		ml_gen_static_const_addr(Var, ConstAddrRval),
> +		ml_gen_static_const_addr(Var, MLDS_Type, ConstAddrRval),
>  		{ MaybeTag = no ->
>  			TaggedRval = ConstAddrRval
>  		;

I think that's wrong -- I think it should be `ConstType'
rather than `MLDS_Type' there.

>  :- pred ml_cons_name(cons_id, ctor_name, ml_gen_info, ml_gen_info).
>  :- mode ml_cons_name(in, out, in, out) is det.
>  
> -ml_cons_name(ConsId, ConsName) -->
> -	{ hlds_out__cons_id_to_string(ConsId, ConsName) }.
> +ml_cons_name(HLDS_ConsId, QualifiedConsId) -->
> +	( { HLDS_ConsId = cons(SymName, Arity),
> +	    SymName = qualified(SymModuleName, ConsName) } ->
> +		{ ConsId = ctor_id(ConsName, Arity) },
> +		{ ModuleName = mercury_module_name_to_mlds(SymModuleName) },
> +		{ QualifiedConsId = qual(ModuleName, ConsId) }
> +	;
> +		{ error("mlds_unify_gen.m: not a constructor id") }
> +	).

The indentation for that if-then-else doesn't conform to
the Mercury coding guidelines.

> +++ ml_util.m	2001/01/23 05:53:34
> @@ -68,8 +68,17 @@
>  :- pred defn_is_public(mlds__defn).
>  :- mode defn_is_public(in) is semidet.
>  
> -%-----------------------------------------------------------------------------%
> +	% Succeeds iff this definition is a function definition which
> +	% defines a special_pred
> +:- pred defn_is_special_pred(mlds__defn).
> +:- mode defn_is_special_pred(in) is semidet.

What should this do for continuation functions which form part of the
definition of special_pred?  This can occur e.g. for types with 
user-defined equality functions, if the user-defined equality function
calls nondeterministic procedures.
Should defn_is_special_pred succeed for those?

> +defn_is_special_pred(Defn) :-
> +	Defn  = mlds__defn(Name, _Context, _Flags, _Body),
> +	Name  = function(Label, _ProcID, _MaybeSeqNum, _PredID),
> +	Label = special_pred(_, _, _, _).

This ignores _MaybeSeqNum -- is that the right thing to do?

> +	% Succeeds iff this definition is a data definition which
> +	% defines RTTI 
> +:- pred defn_is_rtti(mlds__defn).
> +:- mode defn_is_rtti(in) is semidet.
...
> +defn_is_rtti(Defn) :-
> +	Defn = mlds__defn(_Name, _Context, _Flags, Body),
> +	Body = mlds__data(Type, _),
> +	Type = mlds__rtti_type(_).

What is that needed for?

> Index: mlds.m
> @@ -1052,8 +1058,12 @@
>  	;	name(mlds__qualified_entity_name)
>  	.
>  
> -	% XXX I'm not sure what representation we should use here
> -:- type ctor_name == string.
> +	% constructor id
> +	% this is the version used in the java backend
> +:- type ctor_name == mlds__qualified_ctor_id.
> +:- type mlds__ctor_id ---> ctor_id(mlds__class_name, arity).
> +:- type mlds__qualified_ctor_id ==
> +	mlds__fully_qualified_name(mlds__ctor_id).

The comment "this is the version used in the java backend"
is misleading, since if you commit this change, this will be
the version used in all the MLDS-based backends.

> @@ -1425,6 +1434,7 @@
>  access_bits(private) 	= 0x01.
>  access_bits(protected)	= 0x02.
>  access_bits(default)	= 0x03.
> +access_bits(local) 	= 0x04.
>  % 0x4 - 0x7 reserved

The comment should say "0x5 - 0x7 reserved".
                          ^

> Index: mlds_to_il.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_il.m,v
> retrieving revision 1.11
> diff -u -r1.11 mlds_to_il.m
> --- mlds_to_il.m	2001/01/22 04:20:31	1.11
> +++ mlds_to_il.m	2001/01/23 05:53:35
> @@ -520,15 +520,20 @@
>  		Tree0, Tree) --> 
>  	( 
>  		{ Name = data(DataName) },
> -		{ Entity = mlds__data(_MldsType, Initializer) }
> +		{ Entity = mlds__data(MLDSType, Initializer) }
>  	->
>  		( { Initializer = no_initializer } ->
>  			{ Tree = Tree0 }
>  		;
>  			( { DataName = var(VarName) } ->
>  				il_info_get_module_name(ModuleName),
> +				
> +				%
> +				% XXX We are assuming that using MLDSType
> +				% below is correct.  
> +				%

I think you can safely delete that `XXX' comment.

>  				get_load_store_lval_instrs(
> -					var(qual(ModuleName, VarName)), 
> +				var(qual(ModuleName, VarName), MLDSType), 
>  					LoadMemRefInstrs, StoreLvalInstrs),
>  				{ NameString = VarName }
>  			;

It would be clearer to split that up a bit:

				Lval = var(qual(ModuleName, VarName),
					MLDSType),
  				get_load_store_lval_instrs(Lval,
					LoadMemRefInstrs, StoreLvalInstrs),

OK, that completes this review.
Please post another diff when you've addressed those review comments.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list