[m-dev.] for review: making the debugger work on typeclasses

Tyson Dowd trd at cs.mu.OZ.AU
Mon Oct 19 16:46:45 AEST 1998


On 17-Oct-1998, Zoltan Somogyi <zs at cs.mu.OZ.AU> wrote:
> 
> This is for Tyson.
> 
> Estimated hours taken: 40
> 
> Extend the layout scheme to handle typeinfos inside typeclass infos,
> and thus enable the debugger (and later native gc) to work with programs
> that use type classes and existential types.
> 
> compiler/llds.m:
> 	Change the data structure that holds information about the locations
> 	of the typeinfo variables of the tvars active at call return sites
> 	from set(pair(tvar, lval)) to map(tvar, set(layout_locn)).
> 
> 	The change from set to map avoids the possibility of inadvertently
> 	duplicating the info for a give type variable.
> 
> 	The change to explicitly keep a set of locations in which the typeinfo
> 	var may be found allows us to use set intersection on those sets if
> 	(a) the program point may be reached via more than one path, and
> 	(b) not all paths have the same sets. Both of these can happen in
> 	programs that use type classes.
> 
> 	The change from lval to layout_locn (which encodes either an lval,
> 	or an lval representing a typeclass info and an (indirect) offset 
> 	inside that typeclass info) is necessary support programs with
> 	type classes.
> 
> compiler/continuation_info.m:
> 	Change the data structure that holds information about the locations
> 	of the typeinfo variables of the tvars active at a particular program
> 	point the same way and for the same reasons as in llds.m.
> 
> 	Take set intersections of typeinfo var locations whenever we find
> 	multiple live variable info records for the same label.
> 
> compiler/call_gen.m:
> 	Delay the construction of the return live variable information
> 	until the code generator state has been updated to reflect where
> 	things will be on return, instead of trying to cobble up this
> 	info into the code generator state that reflects the point just
> 	before the call. Apart from being cleaner, this is necessary
> 	to avoid compiler aborts for programs that use existential types.
> 	The old compiler could not find the typeinfos of any existentially
> 	quantified type vars, since they do not exist before the call.
> 
> compiler/code_info.m:
> 	Rewrite and generalize the code for generating live value information.
> 
> compiler/trace.m:
> 	Remove the specialized code for generating live value information;
> 	call code_info instead.
> 
> compiler/stack_layout.m:
> 	Pick one of several possible locations for a typeinfo var.
> 
> 	Generate the new indirect layout location descriptions.
> 
> 	Reduce the number of tag bits used to describe different kinds of
> 	lvals, to leave more room for the indirect information.
> 
> compiler/*.m:
> 	Conform to the above data structure changes.
> 
> compiler/hlds_pred.m:
> 	Clarify the documentation of type_info_locn.
> 
> compiler/llds_out.m:
> 	Use map (and thus tree234) instead of bintree_set to represent
> 	the set of things declared so far. With --trace deep, the binary
> 	trees of items were almost always sticks. This change reduces
> 	the time spent in writing out typecheck.c with --trace deep
> 	from 70 seconds to 18 seconds (on cyclone).

There are two changes in this file:  this one, plus the change the
data structure for type_info variables (although this could fall
under the compiler/*.m comment).

The bintree_set data structure changed to map doesn't really belong in
the rest of this change.  The changes are related only by the virtue
of appearing in the same file.  You should at least mention this
in the top-level log message.

> runtime/mercury_stack_layout.h:
> 	Update the section that deals with MR_Live_Lval to take
> 	indirect typeinfo locations into account.
> 
> runtime/mercury_layout_util.c:
> 	Handle indirect typeinfo locations when interpreting layout structures.
> 
> runtime/mercury_layout_util.c:
> trace/mercury_trace_internal.c:
> 	Ignore variables whose names start with TypeClassInfo.
> 
> runtime/mercury_accurate_gc.c:
> runtime/mercury_agc_debug.c:
> 	Add markers to remind Tyson to handle indirect typeinfo locations.
> 
> library/map.m:
> 	Add new predicates map__intersect and map__union for use by the
> 	new code in the compiler, and by others.

Again, a separate change, but easy to commit separately this time.

A change to the library probably needs more attention than being
buried in the middle of this change.

> 
> tests/debugger/implied_instance.{m,inp,exp}:
> tests/debugger/multi_paramster.{m,inp,exp}:
> tests/debugger/existential_type_classes.{m,inp,exp}:
> 	Copies of the tests in tests/hard_coded/typeclasses, modified to
> 	avoid or delay I/O, so that the calls to I/O preds that may or may
> 	not be traced to do not affect the output.
> 
> tests/debugger/Mmakefile:
> 	Add the new test cases.
> 
> 	Remove references to the *_lib variants of the old test cases.
> 	They are not necessary if I/O is delayed until after the last
> 	reported trace event.
> 
> tests/hard_coded/typeclasses/Mmakefile:
> 	Remove --trace deep from existential_type_classes, since that
> 	aspect of the test case is now covered in the debugger directory.
> 
> Zoltan.
> 
> cvs diff: Diffing .
> cvs diff: Diffing bindist
> cvs diff: Diffing boehm_gc
> cvs diff: Diffing boehm_gc/Mac_files
> cvs diff: Diffing boehm_gc/cord
> cvs diff: Diffing boehm_gc/cord/private
> cvs diff: Diffing boehm_gc/include
> cvs diff: Diffing boehm_gc/include/private
> cvs diff: Diffing browser
> cvs diff: Diffing bytecode
> cvs diff: Diffing bytecode/test
> cvs diff: Diffing compiler

> Index: compiler/hlds_pred.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/hlds_pred.m,v
> retrieving revision 1.53
> diff -u -u -r1.53 hlds_pred.m
> --- hlds_pred.m	1998/09/24 09:44:26	1.53
> +++ hlds_pred.m	1998/10/07 05:17:23
> @@ -250,11 +250,23 @@
>  	.
>  
>  :- type type_info_locn	
> -	--->	type_info(var)		% it is a normal type info 
> -					% (ie. the type is not constrained)
> +	--->	type_info(var)
> +				% It is a normal type info, i.e. the type

s/type info/type_info

> +				% is not constrained.
> +
>  	;	typeclass_info(var, int).
> -					% it is packed inside a typeclass_info,
> -					% and is at the given offset
> +				% The typeinfo is packed inside a

s/typeinfo/type_info

> +				% typeclass_info. If the int is N, it is
> +				% the Nth typeinfo inside the typeclass_info,

Again.

> +				% but there may be several superclass pointers
> +				% before the block of typeinfos, so it won't

And here.

> +				% be the Nth word of the typeclass_info.
> +				%
> +				% To find the typeinfo inside the

Once more here. 

The inconsistencies aren't critical, but they do reduce readability.
(it's difficult enough correctly recognizing each of type_info,
typeclass_info, base_type_info and base_typeclass_info without having
variations on each one).

> Index: compiler/llds.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/llds.m,v
> retrieving revision 1.229
> diff -u -u -r1.229 llds.m
> --- llds.m	1998/10/16 06:17:28	1.229
> +++ llds.m	1998/10/16 10:45:47
> @@ -17,7 +17,7 @@
>  :- interface.
>  
>  :- import_module hlds_pred, hlds_data, tree, prog_data, (inst).
> -:- import_module assoc_list, bool, list, set, term, std_util.
> +:- import_module bool, list, set, map, term, std_util.
>  
>  %-----------------------------------------------------------------------------%
>  
> @@ -421,19 +421,33 @@
>  	% the non-conservative garbage collector.
>  :- type liveinfo
>  	--->	live_lvalue(
> -			lval,
> -				% What stackslot/reg does
> -				% this lifeinfo structure
> +			layout_locn,
> +				% What location does this lifeinfo structure
>  				% refer to?
>  			live_value_type,
>  				% What is the type of this live value?
> -			assoc_list(tvar, lval)
> +			map(tvar, set(layout_locn))
>  				% Where are the typeinfos that determine the
>  				% types of the actual parameters of the type
>  				% parameters of this type (if it is
>  				% polymorphic), and the type variable
>  				% for each one.

Perhaps add a comment with a little of the justification for the use of
a set(layout_locn) from the log message.

>  		).
> +
> +	% Most of the time, a layout specifies a location as an lval.
> +	% However, a type_info variable may be hidden inside a typeclass_info,
> +	% In this case, accessing the type_info requires indirection.
> +	% The address of the typeclass_info is given as an lval, and
> +	% the location of the typeinfo within the typeclass_info as an index;
> +	% private_builtin:type_info_from_typeclass_info interprets the index.
> +	%
> +	% This one level of indirection is sufficient, since type_infos
> +	% cannot be nested inside typeclass_infos any deeper than this.
> +	% A more general representation that would allow more indirection
> +	% would be much harder to fit into one machine word.

The final sentence talks about the runtime representation of
this data structure, which is handled in stack_layout.m, not here.
So perhaps a pointer to stack_layout.m should be here (or remove
the comment).



> Index: library/map.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/map.m,v
> retrieving revision 1.65
> diff -u -u -r1.65 map.m
> --- map.m	1998/08/26 05:45:27	1.65
> +++ map.m	1998/10/06 09:43:37
> @@ -188,6 +188,36 @@
>  :- mode map__map_values(pred(in, in, out) is det, in, out) is det.
>  :- mode map__map_values(pred(in, in, out) is semidet, in, out) is semidet.
>  
> +	% Given two maps M1 and M2, create a third map M3 that has only the
> +	% keys that occur in both M1 and M2. For keys that occur in both M1
> +	% and M2, compute the value in the final map by applying the supplied
> +	% predicate to the values associated with the key in M1 and M2.
> +	% Fail if and only if this predicate fails on some common key.

The last line should be something more like:

% Fail if and only if this predicate fails to compute some value.

The predicate doesn't operate on the keys at all.


> +:- pred map__intersect(pred(V, V, V), map(K, V), map(K, V), map(K, V)).
> +:- mode map__intersect(pred(in, in, out) is semidet, in, in, out) is semidet.
> +:- mode map__intersect(pred(in, in, out) is det, in, in, out) is det.
> +
> +	% Calls map__intersect. Abort (with the last argument as the message)
> +	% if map__intersect fails.
> +:- pred map__det_intersect(pred(V, V, V), map(K, V), map(K, V), map(K, V),
> +	string).
> +:- mode map__det_intersect(pred(in, in, out) is semidet, in, in, out, in)
> +	is det.
> +
> +	% Given two maps M1 and M2, create a third map M3 that all the keys
> +	% that occur in either M1 and M2. For keys that occur in both M1
> +	% and M2, compute the value in the final map by applying the supplied
> +	% predicate to the values associated with the key in M1 and M2.
> +	% Fail if and only if this predicate fails on some common key.

again:

% Fail if and only if this predicate fails to compute some value.

> +:- pred map__union(pred(V, V, V), map(K, V), map(K, V), map(K, V)).
> +:- mode map__union(pred(in, in, out) is semidet, in, in, out) is semidet.
> +:- mode map__union(pred(in, in, out) is det, in, in, out) is det.
> +
> +	% Calls map__union. Abort (with the last argument as the message)
> +	% if map__intersect fails.
> +:- pred map__det_union(pred(V, V, V), map(K, V), map(K, V), map(K, V), string).
> +:- mode map__det_union(pred(in, in, out) is semidet, in, in, out, in) is det.
> +
>  %-----------------------------------------------------------------------------%
>  
>  :- import_module tree234.
> @@ -198,7 +228,7 @@
>  %-----------------------------------------------------------------------------%
>  
>  :- implementation.
> -:- import_module std_util, require, string.
> +:- import_module set, std_util, require, string.
>  
>  %-----------------------------------------------------------------------------%
>  
> @@ -423,6 +453,80 @@
>  
>  map__map_values(Pred, Map0, Map) :-
>  	tree234__map_values(Pred, Map0, Map).
> +
> +%-----------------------------------------------------------------------------%
> +
> +map__intersect(CommonPred, Map1, Map2, Common) :-
> +	map__keys(Map1, Keys1),
> +	map__keys(Map2, Keys2),
> +	set__sorted_list_to_set(Keys1, Set1),
> +	set__sorted_list_to_set(Keys2, Set2),
> +	set__intersect(Set1, Set2, Set3),
> +	set__to_sorted_list(Set3, Keys3),
> +	map__init(Common0),
> +	map__intersect_2(Keys3, Map1, Map2, CommonPred, Common0, Common).

map does not necessarily return the keys in sorted order.

You could fix this by adding map__sorted_keys and tree234__sorted_keys
since in fact for tree234 trees it will return sorted maps.

> +
> +:- pred map__intersect_2(list(K), map(K, V), map(K, V), pred(V, V, V),
> +	map(K, V), map(K, V)).
> +:- mode map__intersect_2(in, in, in, pred(in, in, out) is semidet, in, out)
> +	is semidet.
> +:- mode map__intersect_2(in, in, in, pred(in, in, out) is det, in, out)
> +	is det.
> +
> +map__intersect_2([], _, _, _, Common, Common).
> +map__intersect_2([Key | Keys], Map1, Map2, CommonPred, Common0, Common) :-
> +	map__lookup(Map1, Key, Value1),
> +	map__lookup(Map2, Key, Value2),
> +	call(CommonPred, Value1, Value2, Value),
> +	map__det_insert(Common0, Key, Value, Common1),
> +	map__intersect_2(Keys, Map1, Map2, CommonPred, Common1, Common).

I'm not convinced this is a particularly efficient implementation.

Convert both maps to assoc_lists, and then do a "merge" of those
lists while calling the pred, then convert the sorted assoc_list back.
This avoids all those map lookups.  If you add map__to_sorted_assoc_list
I believe it should be faster for large maps (no sorting, no lookups,
just conversion to list, merge, and conversion to map).

Please point out if I have missed something.

> +map__union(UnionPred, Map1, Map2, Union) :-
> +	map__keys(Map1, Keys1),
> +	map__keys(Map2, Keys2),
> +	set__sorted_list_to_set(Keys1, Set1),
> +	set__sorted_list_to_set(Keys2, Set2),
> +	set__union(Set1, Set2, Set3),
> +	set__to_sorted_list(Set3, Keys3),
> +	map__init(Union0),
> +	map__union_2(Keys3, Map1, Map2, UnionPred, Union0, Union).

Again, map__keys doesn't mean a sorted list.

> +
> +:- pred map__union_2(list(K), map(K, V), map(K, V), pred(V, V, V),
> +	map(K, V), map(K, V)).
> +:- mode map__union_2(in, in, in, pred(in, in, out) is semidet, in, out)
> +	is semidet.
> +:- mode map__union_2(in, in, in, pred(in, in, out) is det, in, out)
> +	is det.
> +
> +map__union_2([], _, _, _, Union, Union).
> +map__union_2([Key | Keys], Map1, Map2, UnionPred, Union0, Union) :-
> +	( map__search(Map1, Key, Value1) ->
> +		( map__search(Map2, Key, Value2) ->
> +			call(UnionPred, Value1, Value2, Value)
> +		;
> +			Value = Value1
> +		)
> +	;
> +		map__lookup(Map2, Key, Value)
> +	),
> +	map__det_insert(Union0, Key, Value, Union1),
> +	map__union_2(Keys, Map1, Map2, UnionPred, Union1, Union).

And again, I think an assoc_list based implementation would be faster.


> Index: runtime/mercury_stack_layout.h
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/runtime/mercury_stack_layout.h,v
> retrieving revision 1.9
> diff -u -u -r1.9 mercury_stack_layout.h
> --- mercury_stack_layout.h	1998/10/16 06:18:56	1.9
> +++ mercury_stack_layout.h	1998/10/16 06:35:46
> @@ -66,26 +66,33 @@
>  */
>  
>  /*
> -** MR_Live_Lval is a Word which describes an lval. This includes:
> -** 	- stack slots, registers, and special lvals such as succip, hp,
> -** 	  etc.
> +** MR_Live_Lval is a Word which describes an location. This includes
> +** lvals such as stack slots, general registers, and special registers
> +** such as succip, hp, etc, as well as locations whose address is given
> +** as a particular word offset from the memory address found in an lval.
>  **
> -** MR_Live_Lval is encoded using an 8 bit low tag, the rest of the word is a
> -** data field describing which stack slot number or register number.
> +** What kind of of location an MR_Live_Lval refers to is encoded using
> +** a low tag with MR_LIVE_LVAL_TAGBITS bits; the type MR_Lval_Type describes
> +** the different tag values. The interpretation of the rest of the word
> +** depends on the location type:
>  **
> -**  Lval		Tag	Rest
> -**  r(Num)		 0	Num
> -**  f(Num)		 1	Num
> -**  stackvar(Num)	 2	Num
> -**  framevar(Num)	 3	Num
> +**  Locn		Tag	Rest
> +**  r(Num)		 0	Num (register number)
> +**  f(Num)		 1	Num (register number)
> +**  stackvar(Num)	 2	Num (stack slot number)
> +**  framevar(Num)	 3	Num (stack slot number)
>  **  succip		 4
>  **  maxfr		 5
>  **  curfr		 6
>  **  hp			 7
>  **  sp			 8
> -**  unknown		 9		(The location is not known)
> +**  indirect(Base, N)	 9	See below
> +**  unknown		10	(The location is not known)
>  **
> -** The type MR_Lval_Type describes the different tag values.
> +** For indirect references, the word exclusive of the tag consists of
> +** (a) an integer with MR_LIVE_LVAL_OFFSETBITS bits giving the number of
> +** words to offset and (b) a MR_Live_Lval value giving the location of
> +** the base address. This MR_Live_Lval valud will *not* have an indirect tag.

s/valud/value

> +
> +/*
> +XXX we don't yet support `pragma c_code' for existential type class constraints
> +:- pragma c_code(my_univ_value(Univ::in) = (Value::out), will_not_call_mercury, "
> +	TypeInfo_for_T = field(mktag(0), Univ, UNIV_OFFSET_FOR_TYPEINFO);
> +	Value = field(mktag(0), Univ, UNIV_OFFSET_FOR_DATA);
> +	ClassInfo_1 = XXX;
> +").
> +*/

I don't think this XXX is true any longer -- it depends on the
definition of "support".   Perhaps fjh or dgj should revisit this
test case.


Apart from those comments, the change is fine.  I don't mind if
you commit before the issue of performance in the library is
fixed (or explained) but if you aren't going to fix it there
needs to be an XXX about it.

-- 
Those who would give up essential liberty to purchase a little temporary
safety deserve neither liberty nor safety.     - Benjamin Franklin

Tyson Dowd   <tyson at tyse.net>   http://tyse.net



More information about the developers mailing list