[m-dev.] for review: The .NET MSIL backend.

Fergus Henderson fjh at cs.mu.OZ.AU
Wed Oct 4 04:40:03 AEDT 2000


On 22-Sep-2000, Tyson Dowd <trd at cs.mu.OZ.AU> wrote:
> +	% Any scope without local variables may be eliminated.
> +	% XXX we don't do this yet becuase it requires finding the
> +	% matching end_block and removing it too.  Now that block IDs
> +	% are available we can actually do this.
> +:- pred match4(instr, instrs, instrs).
> +:- mode match4(in, in, out) is semidet.
> +match4(start_block(scope([]), _), Instrs0, Instrs) :-
> +	Replacement = [],
> +	Rest = Instrs0,
> +	Instrs = list__append(Replacement, Rest).

I'm not sure what the code here is supposed to be doing,
but it doesn't seem to do what the comment describes
(eliminating the start_block and matching end_block),
and this code is not called from anywhere.

Oh, I think I understand now.  The comment could be
clearer, though.  E.g.
	s/because it requires/because it would require/
and
	s/we can actually do this/we could actually do this,
	  but currently we don't, because the code below is incomplete.
	  This procedure is not yet called from anywhere./

> +%-----------------------------------------------------------------------------%
> +
> +	% Skip over all the comments.
> +:- pred skip_comments(instrs::in, instrs::out, instrs::out) is det.
> +
> +skip_comments(Instrs0, Instrs1, Comments) :-
> +	list__takewhile((pred(X::in) is semidet :- 
> +		X = ilds__comment(_)
> +	), Instrs0, Comments, Instrs1).

That code is OK, but you could perhaps write that more concisely as

	list__takewhile(pred(ilds__comment(_)::in) is semidet,
		Instrs0, Comments, Instrs1).

or (arguably) more readably by using the name `Instr' rather than `X'.

Also s/Instrs1/Instrs/g to abide by our naming guidelines.

> +	% Skip over all the nop equivalents.
> +:- pred skip_nops(instrs::in, instrs::out, instrs::out) is det.
> +
> +skip_nops(Instrs0, Instrs1, Nops) :-
> +	list__takewhile((pred(X::in) is semidet :- 
> +		equivalent_to_nop(X)
> +	), Instrs0, Nops, Instrs1).

You should write that more concisely and clearly as

	list__takewhile(equivalent_to_nop, Instrs0, Nops, Instrs1).

Also s/Instrs1/Instrs/g to abide by our naming guidelines.

> +	% These instructions generate no actual code and do not affect
> +	% control flow, they are simply part of instr for
> +	% convenience.
> +:- pred equivalent_to_nop(instr).
> +:- mode equivalent_to_nop(in) is semidet.
> +equivalent_to_nop(end_block(scope(_), _)).
> +equivalent_to_nop(comment(_)).
> +equivalent_to_nop(start_block(scope(_), _)).
> +
> +	% These instructions can branch control flow.
> +:- pred can_branch(instr).
> +:- mode can_branch(in) is semidet.
> +can_branch(br(_)).
> +can_branch(brtrue(_)).
> +can_branch(brfalse(_)).
> +can_branch(beq(_)).
> +can_branch(bge(_, _)).
> +can_branch(bgt(_, _)).
> +can_branch(ble(_, _)).
> +can_branch(blt(_, _)).
> +can_branch(bne(_, _)).
> +can_branch(switch(_)).

Experience has shown that for these kind of things it is generally
better for maintenance to make them boolean functions rather than
predicates, so that you get determinism errors if you add a new
alternative and forget to update these procedures.

> Index: compiler/ilasm.m
> +% XXX When representing these types, we just use a convenient Mercury type.
> +% If we *really* wanted to support these types we would need to make sure
> +% all values of the type can really fit.
> +% XXX not all code uses these types yet.  It should.
> +
> +:- type int64 == int.

For int64 you should use `integer' rather than `int', I think.

> +% A top level declaration in IL assembler.
> +:- type decl
...
> +		% .method  (a global function)
> +		% there are lots of restriction on global functions so
> +		% don't get to excited about using them for anything.

s/to/too/

> +		% .assembly extern
> +		% declares an assembly name
> +	;	extern_assembly(ilds__id)
> +			% an assembly declaration

I don't think the comment "an assembly declaration" here adds
anything that isn't already clear from the other two comments.

> +:- type classdecl
> +		% .method (a class method)
> +	--->	method(
> +			methodhead,		% name, signature, attributes
> +			list(methoddecl)	% definition of method
> +		)	

That list(methoddecl) occurs in a couple of places,
so it would be a good idea to abstract it out as

	:- type methoddefn == list(methoddecl).
or
	:- type method_defn == list(method_body_decl).

> +		% comments
> +	;	comment_term(term)
> +	;	comment(string)
> +	;	some [T] comment_thing(T).

What requirements are there on `T' here?
E.g. must it be a type for which `io__print' works?
What if it is a higher-order type, for example,
or a type which is or contains c_pointer?

> +	;	wchar_ptr(string)		% a string to convert to unicode

Is the string supposed to be

	(a) composed of only 8-bit characters
	(b) composed of characters which might include unicode characters
	    if/when the underlying Mercury implementation supports
	    unicode in strings
	(c) unicode encoded in UTF-8
	(d) something else?

> +	% a list of interfaces that we implement
> +	% XXX should probably just use an empty list to represent
> +	% noimplement
> +:- type implements
> +	--->	implements(list(classname))
> +	;	noimplement. 

That XXX should be fixed at some point
(not necessarily before committing it).

> +	% declarations that can form the body of a method.
> +:- type methoddecl 

I think it would be clearer if that type was named 
`methodbodydecl' or (even better) `method_body_decl'.

> +	--->	emitbyte(byte)		% raw byte output (danger! danger!)

Are those bytes the IL binary representation, or are they x86 instructions?

> +	;	maxstack(int)		% maximum stack size (still needed?)

What are the units -- bits? bytes? something else?

> +	% attributes that a class can have.
> +	% see SDK documentation for what they all mean.

I suggest s/SDK/Microsoft .NET SDK/
or s/SDK/Microsoft SDK/
just to make it a little clearer as to where to find the
documentation.  (A URL would be great too, although I suspect
it would probably suffer URL-rot fairly quickly.)

Probably it would be good to mention the SDK documentation once
in detail at the top of the module, then you can just leave
the comments here as they are.

> +	% the body of a .data declaration
> +:- type data_body 
> +	--->	itemlist(list(data_item))
> +	;	item(data_item).

Is there any difference in semantics between `itemlist([X])'
and `item(X)'?  If so, what is it?  If not, why are there
two different ways of representing the same thing?

> +	% various constants that can be used in .data declarations
> +:- type data_item 
> +	---> 	float32(float)
> +	;	float64(float)
> +	;	int64(int)
> +	;	int32(int)
> +	;	int16(int)
> +	;	int8(int)

s/float/float{32,64}/ respectively
s/int/int{64,32,16,8}/ respectively

> +	;	char_ptr(string)
> +	;	'&'(ilds__id)
> +	;	bytearray(list(byte)).	% two digit hex, e.g. 01 F7 0A

s/two digit hex/output as two digit hex/  ?

> +	% classnames are just structured names.
> +:- type classname == structured_name.
> +
> +:- implementation.
> +
> +:- import_module require, int, term_io, varset, globals, options, bool.
> +:- import_module string, pprint, getopt.

The modules in the standard library should be listed in separate
import_module statements from the modules not in the standard library.
Here `globals' and `options' are not in the standard library.

> +ilasm__output_decl(comment(CommentStr)) --> 
> +	io__write_string("// "),
> +	io__write_string(CommentStr),
> +	io__write_string("\n").

What if CommentStr contains newlines?  If that is not allowed,
it should be documented in the data type definition.

> +ilasm__output_decl(extern_assembly(AsmName)) --> 
> +	io__write_string(".assembly extern "),
> +	output_id(AsmName),
> +	io__write_string(" { }").
> +
> +ilasm__output_decl(assembly(AsmName)) --> 
> +	io__write_string(".assembly "),
> +	output_id(AsmName),
> +	io__write_string(" { }").
> +
> +:- pred ilasm__output_classdecl(classdecl::in, io__state::di,
> +	io__state::uo) is det.
> +
> +ilasm__output_classdecl(method(MethodHead, MethodDecls)) -->
> +	io__write_string(".method "),
> +	output_methodhead(MethodHead),
> +
> +		% Don't do debug output on class constructors, since
> +		% they are automatically generated and take forever to
> +		% run.
> +	globals__io_lookup_option(debug_il_asm, DebugIlAsm),
> +	( { MethodHead = methodhead(_, cctor, _, _) } ->
> +		globals__io_set_option(debug_il_asm, bool(no))
> +	;
> +		[]
> +	),
> +	io__write_string(" {\n"),
> +	io__write_list(MethodDecls, "\n", output_methoddecl),
> +	io__write_string("}\n"),
> +	( { MethodHead = methodhead(_, cctor, _, _) } ->
> +		globals__io_set_option(debug_il_asm, DebugIlAsm)
> +	;
> +		[]
> +	).

The code for this duplicates the code for outputting global methods.
IMHO it would be better to reuse the code, e.g.

ilasm__output_classdecl(method(MethodHead, MethodDecls)) -->
		% Don't do debug output on class constructors, since
		% they are automatically generated and take forever to
		% run.
	globals__io_lookup_option(debug_il_asm, DebugIlAsm),
	( { MethodHead = methodhead(_, cctor, _, _) } ->
		globals__io_set_option(debug_il_asm, bool(no))
		ilasm__output_decl(method(MethodHead, MethodDecls)),
		globals__io_set_option(debug_il_asm, DebugIlAsm)
	;
		ilasm__output_decl(method(MethodHead, MethodDecls)),
	).

> +ilasm__output_classdecl(comment(CommentStr)) --> 
> +	io__write_string("// "),
> +	io__write_string(CommentStr),
> +	io__write_string("\n").

What if CommentStr contains newlines?

> +ilasm__output_methoddecl(label(Label)) -->
> +	io__write_string(Label),
> +	io__write_string(":").
> +
> +:- pred output_label(label::in, io__state::di, io__state::uo) is det.
> +output_label(Label) -->
> +	io__write_string(Label).

Shouldn't ilasm__output_methoddecl call output_label
rather than calling io__write_string directly?

> +output_simple_type('[]'(Type, _Bounds)) --> 
> +	output_type(Type),
> +	io__write_string("[]").  % XXX we don't output bounds!

Why not?
If that one is easy to fix, it would be good to go ahead and do so,
because otherwise this will bite us one day.
If it is not easy to fix, it would be good to document why not.

> +output_simple_type_opcode(native_float) --> 
> +	{ error("unable to create opcode for this simple type") }.

I suggest s/this/native_float/

> +output_simple_type_opcode(bool) --> io__write_string("i4").
> +output_simple_type_opcode(char) --> io__write_string("i4").
> +output_simple_type_opcode(refany) --> io__write_string("ref").
> +output_simple_type_opcode(class(_Name)) --> io__write_string("ref").
> +output_simple_type_opcode(value_class(_Name)) --> io__write_string("ref").
> +output_simple_type_opcode(interface(_Name)) --> io__write_string("ref").
> +output_simple_type_opcode('[]'(_Type, _Bounds)) --> io__write_string("ref").
> +output_simple_type_opcode('*'(_Type)) --> io__write_string("ref").
> +output_simple_type_opcode('&'(_Type)) --> io__write_string("ref").

Hmm... some comments here explaining why all those different types
map to the same opcode type might be helpful.

> +:- pred output_instr(instr::in, io__state::di,
> +	io__state::uo) is det.
> +
> +output_instr(comment(Comment)) --> 
> +	io__write_string("// "),
> +	io__write_string(Comment).

What if Comment contains newlines?

> +	% This is stuff for "Opt-IL", which was (is?) some sort of 

[... to be continued ...]

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list