[m-rev.] for (style) review: [ctgc] define the structure sharing abstract domain.

Julien Fischer juliensf at cs.mu.OZ.AU
Thu Jan 19 16:55:43 AEDT 2006


On Mon, 16 Jan 2006, Nancy wrote:

> (sorry, msg in attachment, no time to figure out other procedure)
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>
>     [ Part 2: "Attached Text" ]
>
> Hi,
>
> this is a call for a "style review". This is code that has been taken from the
> reuse branch, but almost entirely rewritten (some predicate-renaming,
> extra documentation, even new implementation). It is not tested yet, as the
> analysis system needs to be transscribed still. So there is no use in really
> looking at the implementation. Yet, any style comments would already be
> helpful.
>

Hi Nancy,

I've had a look through your code.  Most of my comments are probably
better articulated in the Mercury coding standard (which lives in
compiler/notes/coding_standard.html).

> Estimated hours taken: 40
> Branches: main
>
> Definition of the structure sharing domain. The public
> representation is defined in prog_data.m, the private
> representation is defined in structure_sharing.domain.m.
>
> Structure sharing operations depend on datastructure and selector operations,
> which are defined in their own seperate modules. The modules are at the
> ctgc toplevel as they are needed in the structure reuse analysis too.
>
> ctgc.m:
> ctgc.datastruct.m:
> ctgc.selector.m:
> 	New modules.
>

Please prefix files listed in log messages with the directory name:

	compiler/ctgc.m:
	compiler/ctgc.datastruct.m:
	compiler/ctgc.selector.m:
		New modules.

etc.

> prog_data.m:
> 	Public representation for sharing.
>
> structure_sharing.m:
> structure_sharing.domain.m:
> 	Private representation for sharing.
>
>
> Index: ctgc.datastruct.m
> ===================================================================
> RCS file: ctgc.datastruct.m
> diff -N ctgc.datastruct.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ ctgc.datastruct.m	16 Jan 2006 10:38:41 -0000

...

> +	% Create an initial top-datastruct of the given variable.
> +:- func init(prog_var) = datastruct.
> +:- func init(prog_var, selector) = datastruct.
> +:- func init(prog_var, cons_id, int) = datastruct.
> +

The prevailing style is to place a blank comment between the predicate
description and the pred/func declaration (or in the case of related
preds/funcs line here each block of them).  e.g.

		% Create an initial top-datastruct of the given variable.
		%
	:- func init(prog_var) = datastruct.
	:- func init(prog_var, selector) = datastruct.
	:- func init(prog_var, cons_id, int) = datastruct.

...

> +init(V) = init(V, []).
> +init(V, Sel) = selected_cel(V, Sel).
> +init(V, ConsId, Int) = init(V, selector.init(ConsId, Int)).
> +
> +equal(D1, D2) :- D1 = D2.
> +
> +rename(ProgVarMap, MaybeSubst, Data0, Data) :-
> +	Data0 = selected_cel(Var0, Sel0),
> +	map__lookup(ProgVarMap, Var0, Var),
> +	ctgc.selector.rename_types(MaybeSubst, Sel0, Sel),
> +	Data = selected_cel(Var, Sel).

Avoid use different module qualifiers in the same predicate.

...

> Index: ctgc.selector.m
> ===================================================================
> RCS file: ctgc.selector.m
> diff -N ctgc.selector.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ ctgc.selector.m	16 Jan 2006 10:38:42 -0000
> @@ -0,0 +1,392 @@

...

> +top_selector = [].
> +init(Cons, Index) = [termsel(Cons, Index)].
> +init(Types) =
> +	list__map(func(T) = US :- US = typesel(T), Types).
> +
> +rename_types(no, S, S).
> +rename_types(yes(Subst), S0, S) :-
> +	list__map(unit_selector_rename_types(Subst), S0, S).
> +
> +:- pred unit_selector_rename_types(tsubst::in,
> +		unit_selector::in, unit_selector::out) is det.
> +

There is one too many level of indentation there.

> +unit_selector_rename_types(Subst, US0, US) :-
> +	(
> +		US0 = termsel(_,_),
> +		US = US0
> +	;
> +		US0 = typesel(Type0),
> +		prog_type_subst.apply_subst_to_type(Subst, Type0, Type),
> +		US = typesel(Type)
> +	).
> +

A lot of this code would benefit from using state variables. For
example the above code could be rewritten as:

	unit_selector_rename_types(Subst, !US) :-
		(
			!.US = termsel(_,_)
		;
			!.US = typesel(Type0),
			apply_subst_to_type(Subst, Type0, Type),
			!:US = typesel(Type)
		).

In general the use of state variables as opposed to sequences of numbered
variables is preferred for a few reasons:

(1) it leads to more concise code

(2) it makes it easier to add things to the middle of the sequence

(3) it helps to avoid bugs where the sequence gets misnumbered or
    where the sequence is reordered and the numbering not changed etc.

...

> +	% SubType = select_subtype(ModuleInfo, Type, ConsID, Position, SubType).
> +	% Determine the type of the type node selected from the type tree Type,
> +    % selecting the specific constructor (ConsId), at position Position.
> +	% Position counts starting from 1.
> +:- func select_subtype(module_info, mer_type, cons_id, int) = mer_type.
> +select_subtype(ModuleInfo, Type, ConsID, Choice) = SubType :-
> +	(
> +		type_util__get_cons_id_non_existential_arg_types(ModuleInfo,

You probably don't need to module qualify things imported from type_util.

> +	% split_upto_type_selector(Sin, S1, TS, S2):
> +	%	This predicate succeeds if there exists a typeselector
> +	% 	TS, such that Sin is equivalent to append(S1, [TS | S2])
> +	% 	and S1 contains no other type selector. It fails otherwise.
> +:- pred split_upto_type_selector(selector::in, selector::out,
> +		unit_selector::out, selector::out) is semidet.
> +
> +split_upto_type_selector(Sin, S1, TS, S2):-
> +    list__takewhile(
> +        pred(UnitSelector::in) is semidet :-
> +            ( UnitSelector = termsel(_, _) ),
> +        Sin,
> +        S1,
> +        Remainder),
> +    Remainder = [TS | S2 ].
> +

I think the above could be more clearly rewritten in either of the
following ways:

1)
	split_upto_type_selector(Sin, S1, TS, S2):-
		IsTermSelector = (pred(UnitSelector::in) is semidet :-
			UnitSelector = termsel(_, _)
		),
		list.takewhile(IsTermSelector, Sin, S1, Remainder),
		Remainder = [Ts | S2].

or 2)

	split_upto_type_selector(Sin, S1, Ts, Se) :-
		list.takewhile(is_term_selector, Sin, S1, Remainder),
		Remainder = [Ts | S2].

	:- pred is_term_selector(unit_selector::in) is semidet.

	is_term_selector(termsel(_, _)).

The same applies to lambda expressions elsewhere.

...

> +	% type_on_path(ModuleInfo, FromType, ToType, Path, Remainder):
> +	% This predicate verifies that the path Path starting from
> +	% FromType encounters at least one type node with the type ToType.
> +    % Remainder is the remainder of the Path after stripping it to the last
> +    % encounter of a node with "ToType".
> +    % XXX Changed w.r.t. original implementation!
> +	%
> +:- pred type_on_path(module_info::in, mer_type::in, mer_type::in,
> +		selector::in, selector::out) is semidet.
> +
> +type_on_path(ModuleInfo, FromType, ToType, Path, RemainderPath) :-
> +    % In checking this, at least one step of the Path must be done. Indeed, if
> +    % FromType = ToType, than RemainderPath would be equal to Path, which would
> +    % contradict the actual meaning of a type selector: A type-selector is a
> +    % shortcut notation for any non-zero (!) selector that selects a node of
> +    % the type described by the type-selector.
> +	type_on_path_2(first, ModuleInfo, FromType,
> +			ToType, Path, RemainderPath).
> +
> +:- type step ---> first ; subsequent.

The recommended formatting for types (see coding standard) is:

	:- type step
		--->	first
		;	subsequent.

Also, there needs to be a comment describing the type.

...

> +normalize_with_type_information(ModuleInfo, Type, Sel0, Sel) :-
> +	(
> +		prog_type.is_introduced_type_info_type(Type)

Predicates from prog_type.m shouldn't need to be module qualified.  The names
have been chosen so that they don't cause conflicts.

> +	->
> +		Sel = Sel0
> +	;
> +		branch_map_init(B0),
> +		branch_map_insert(Type, top_selector, B0, B1),
> +		normalize_wti(ModuleInfo, Type, B1, top_selector, Sel0, Sel)
> +	).
> +
> +
> +:- pred normalize_wti(module_info::in, mer_type::in, branch_map::in,
> +	selector::in, selector::in, selector::out) is det.
> +
> +normalize_wti(ModuleInfo, VarType, B0, Acc0, Sel0, Sel):-

I don't find the last four argument names particularly meaningful.  Something
like the following would be preferable:

	normalize_wti(ModuleInfo, VarType, BranchMap0, SelectorAcc,
			!Selector) :-

Note: I may have misunderstood what the purpose of some of those arguments
is so the above names may not be appropriate.

> +	(
> +		Sel0 = [ UnitSelector | SelRest ]
> +	->
> +		Class = type_util.classify_type(ModuleInfo, VarType),
> +		(
> +			Class = type_cat_user_ctor
> +		->
> +            % If it is either a term-selector of a non existentially typed
> +            % functor or a type-selector, construct the branch map and proceed
> +            % with normalization. If it is a term-selector of an existentially
> +            % typed functor, than normalization stops.
> +			(
> +			    (
> +                    UnitSelector = termsel(ConsId, Index),
> +				    type_util.get_cons_id_non_existential_arg_types(ModuleInfo,
> +				        VarType, ConsId, ArgTypes),
> +				    (
> +					    list__index1(ArgTypes, Index, SubType)
> +				    ->
> +					    CType = SubType
> +				    ;
> +                        Msg = "normalize_wti: accessing non-existing index.",
> +					    unexpected(this_file, Msg)

The current convention regarding string literals that don't fit on one line
should be split into two or more strings concatenated by the "++" operator.

...

> Index: prog_data.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/compiler/prog_data.m,v
> retrieving revision 1.151
> diff -u -d -r1.151 prog_data.m
> --- prog_data.m	16 Jan 2006 03:08:14 -0000	1.151
> +++ prog_data.m	16 Jan 2006 10:38:44 -0000
> @@ -320,6 +320,46 @@
>
>  %-----------------------------------------------------------------------------%
>  %
> +% Stuff for the `structure_sharing_info' pragma.
> +%
> +
> +        % Elements of the structure sharing domain lattice are either bottom
> +        % (no structure sharing), top (any kind of structure sharing), or
> +        % a list of structure sharing pairs.
> +:- type structure_sharing_domain
> +    --->    bottom
> +    ;       real(structure_sharing)
> +    ;       top(list(string)).
> +

What is the the list of strings supposed to represent here?  I suggest
using an equivalence type to give it a more meaningful name.

...

> Index: structure_sharing.domain.m
> ===================================================================
> RCS file: structure_sharing.domain.m
> diff -N structure_sharing.domain.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ structure_sharing.domain.m	16 Jan 2006 10:38:51 -0000
> @@ -0,0 +1,1518 @@

...

> +	% Operations w.r.t. the "bottom" element of the lattice.
> +:- func init = sharing_as.
> +:- pred is_bottom(sharing_as::in) is semidet.
> +
> +	% Operations w.r.t. the "top" element of the lattice. When sharing
> +	% becomes top, it is useful to know why it has become top. This can
> +	% be recorded and passed to the top-value as a string.
> +:- func top_sharing(string) = sharing_as is det.
> +:- func top_sharing(string, sharing_as) = sharing_as is det.

The determinism annotations on these functions are redundant.  Also what
does the string argument represent?

...

> +	% Returns the size of the sharing set. Fails when sharing is top.
> +:- pred size(sharing_as::in, int::out) is semidet.
> +
> +	% Projection operation.
> +	% This operation reduces the sharing information to information
> +	% regarding a given set of variables only.
> +	% Some properties of a call `project(Vars, SharingIn, SharingOut)':
> +	% 	* vars(SharingOut) is a subset of Vars.
> +	% 	* vars(SharingIn minus SharingOut) union Vars = emptyset.
> +:- pred project(list(prog_var)::in, sharing_as::in, sharing_as::out) is det.

A small thing: the equivalence prog_vars == list(prog_var) is defined
in prog_data.m.

...

> +sharing_from_unification(ModuleInfo, ProcInfo, Unification,
> +    GoalInfo) = Sharing :-
> +    (
> +        Unification = construct(Var, ConsId, Args0, _, _, _, _),
> +        list__takewhile(is_introduced_typeinfo_arg(ProcInfo), Args0,
> +            _TypeInfoArgs, Args),
> +        number_args(Args, NumberedArgs),
> +        list__foldl(
> +            add_var_arg_sharing(ModuleInfo, ProcInfo, Var, ConsId),
> +            NumberedArgs,
> +            sharing_set_init,
> +            SharingSet0),
> +        create_internal_sharing(ModuleInfo, ProcInfo, Var, ConsId,
> +            NumberedArgs, SharingSet0, SharingSet),
> +        Sharing = wrap(SharingSet)
> +    ;
> +        Unification = deconstruct(Var, ConsId, Args0, _, _, _),
> +        list__takewhile(is_introduced_typeinfo_arg(ProcInfo), Args0,
> +            _TypeInfoArgs, Args),
> +        number_args(Args, NumberedArgs),
> +        optimize_for_deconstruct(GoalInfo, NumberedArgs, ReducedNumberedArgs),
> +        list__foldl(
> +            add_var_arg_sharing(ModuleInfo, ProcInfo, Var, ConsId),
> +            ReducedNumberedArgs,
> +            sharing_set_init,
> +            SharingSet),
> +        Sharing = wrap(SharingSet)
> +    ;
> +        Unification = assign(X, Y),
> +        (
> +            arg_has_primitive_type(ModuleInfo, ProcInfo, X)
> +        ->
> +            Sharing = init
> +        ;
> +            new_entry(ModuleInfo, ProcInfo,
> +                datastruct.init(X) - datastruct.init(Y),
> +                sharing_set_init, SharingSet),
> +            Sharing = wrap(SharingSet)
> +        )
> +    ;
> +        Unification = simple_test(_, _),
> +        Sharing = init
> +    ;
> +        Unification = complicated_unify(_, _, _),
> +        % XXX needs to be studied again. For the moment generate top.
> +        Sharing = top_sharing("Encountered complicated unify")
> +    ).

When is this pass run? If it's after the "frontend_simplify" pass (stage 65)
then there shouldn't be any complicated_unifys left in the HLDS.  They should
have been expanded into calls to the appropriate unification predicate (see
simplify.process_compl_unify/10).  In this case you may just want to
call unexpected/2 here.

...

> +:- func without_doubles(sharing_set) = sharing_set.
> +
> +without_doubles(SharingSet0) = SharingSet :-
> +	SharingSet0 = sharing_set(_, Map0),
> +	map__foldl(
> +		pred(ProgVar::in, SelSharingSet::in, SS0::in, SS::out) is det :-
> +			( SelSharingSet = selector_sharing_set(_, SelMap),
> +			  map__foldl(
> +			  	pred(Selector::in, DataSet::in, SSa::in, SSb::out) is det :-
> +					(
> +					DataSet = datastructures(_, DS),
> +					set__fold(
> +						pred(Datastruct::in,
> +							SSx::in, SSy::out)
> +								is det :-
> +							( (
> +						        directed_entry_is_member(
> +                                    datastruct.init(ProgVar, Selector),
> +                                    Datastruct, SSx)
> +                            ->
> +                                SSy = SSx
> +                            ;
> +                                new_directed_entry(Datastruct,
> +                                    datastruct.init(ProgVar, Selector),
> +                                    SSx, SSy)
> +							) ),
> +						 	DS,
> +							SSa,
> +							SSb
> +						)
> +					),
> +                SelMap,
> +				SS0,
> +				SS
> +			  )),
> +		Map0,
> +		sharing_set_init,
> +		SharingSet).
> +

I suggest rewriting functions/predicates like the above using auxiliary
predicates in order to avoid nesting the closures quite so deeply.  I think
the above function is easier to understand when written as follows:

:- func without_doubles(sharing_set) = sharing_set.

without_doubles(SharingSet0) = SharingSet :-
	SharingSet0 = sharing_set(_, Map0),
	map.foldl(without_doubles_2, Map0, sharing_set_init, SharingSet).

:- pred without_doubles_2(prog_var::in, selector_sharing_set::in,
	sharing_set::in, sharing_set::out) is det.

without_doubles_2(ProgVar, SelectorSS, !SS) :-
	SelectorSS = selector_sharing_set(_, SelMap),
	map.foldl(without_doubles_3(ProgVar), SelMap, !SS).

:- pred without_doubles_3(prog_var::in, selector::in, data_set::in,
	sharing_set::in, sharing_set::out) is det.

without_doubles_3(ProgVar, Selector, DataSet, !SS) :-
	DataSet = datastructures(_, DS),
	set.foldl(without_doubles_4(ProgVar, Selector), DS, !SS)

:- pred without_doubles_4(prog_var::in, selector::in, datastruct::in,
	sharing_set::in, sharing_set::out) is det.

without_doubles_4(ProgVar::in, Selector::in, Datastruct::in, !SS) :-
	(
		directed_entry_is_member(datastruct.init(ProgVar, Selector),
			Datastruct, !.SS)
	->
		true
	;
		new_directed_entry(Datastruct,
			datastruct.init(ProgVar, Selector),
		!SS)
	).

...

> +:- pred data_set_map(pred(datastruct, datastruct), data_set, data_set).
> +:- mode data_set_map(pred(in, out) is det, in, out) is det.
> +data_set_map(Pred, DataSet0, DataSet):-
> +	DataSet0 = datastructures(_Size, Datastructs0),
> +	Datastructs = set__map(tofunc(Pred), Datastructs0),
> +	DataSet = datastructures(set__count(Datastructs), Datastructs).
> +
> +:- func tofunc(pred(X,Y), X) = Y.
> +:- mode tofunc(pred(in,out) is det, in) = out is det.
> +
> +tofunc(Pred, X) = Y :- Pred(X, Y).

Wouldn't it be easier to write predicates that have the mode
(pred(in, out) is det) as functions in the first place rather than
have to do this?

Arguably, tofunc/2 is a candidate for inclusion in the standard library
(if there isn't already have something like that.)

Cheers,
Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list