[m-rev.] for review: analysis framework (1/2)

Julien Fischer juliensf at cs.mu.OZ.AU
Mon Jan 16 18:42:09 AEDT 2006


On Mon, 16 Jan 2006, Peter Wang wrote:

> Estimated hours taken: 30
> Branches: main
>
> Some work on the intermodule analysis framework.  The main changes are that
> modules and analysis results have statuses associated with them, which are
> saved into the `.analysis' files, and there is now code to handle intermodule

s/in/into/

> dependency graphs (that record which modules are dependent on a particular
> analysis result).
>
> Automatic recompilation of modules that use out of date or invalid analysis
> results from other modules is not handled yet.
>
> analysis/analysis.m:
> analysis/analysis.file.m:
> 	Remove the `FuncInfo' type variable everywhere.  This was originally
> 	designed to be used by analyses to store "extra" information that
> 	would be passed from an analysis implementation through the analysis
> 	framework, back to methods defined by the analysis implementation
> 	itself.  One problem was that `FuncInfo' values were not designed to
> 	be saved and restored from disk.

Presumably that could be dealt with by insisting that they be members of the
to_string typeclass though?

>       Also, it made two `Call' or two
> 	`Answer' values hard to compare, as a `FuncInfo' value had to be
> 	present for a comparison call to be made, and it was not always
> 	obvious where that `FuncInfo' value would come from.  I have changed
> 	it so that any information which might be stored in a
> 	`FuncInfo' should be stored in the corresponding `Call' value itself.
>

I don't understand this one; could you provide an example?

> 	Change the format of analysis result files to include an overall
> 	status for the module and a status for each analysis result.  The
> 	statuses record whether the module or analysis result could be
> 	improved by further compilation, or if the module or analysis result
> 	is no longer valid.
>
> 	Add code to read and write intermodule dependency graphs (IMDGs).  The
> 	IMDG file for module M records all the modules which depend on an
> 	analysis result for a procedure defined in M.
>
> 	Bump analysis file format version numbers as they are incompatible
> 	with earlier versions.
>

Does the README file in the analysis directory also need updating?

> compiler/mercury_compile.m:
> 	Update to match changes in the intermodule analysis framework.
>
> compiler/mmc_analysis.m:
> 	Add the trail usage analysis to the list of analyses to be used with
> 	the intermodule analysis framework.
> 	Update the entry for unused argument elimination.

At the moment the results from the intermodule analysis framework should
correspond to the pragmas in the .opt files when compiling with
--intermodule-optimization (although not with
--transitive-intermodule-optimization).
You should check that this is the case.

>
> compiler/add_pragma.m:
> compiler/hlds_module.m:
> compiler/trailing_analysis.m:
> 	Make the trail usage analysis pass able to make use of the intermodule
> 	analysis framework.  Mainly, functions had to be converted to predicates
> 	taking I/O states.
>

Delete the last sentence - the main change was adding typeclass
instances.

> 	Associate each `trailing_status' in the `trailing_info' map with an
> 	`analysis_status', i.e. whether it is optimal or not.
>
> compiler/unused_args.m:
> 	Update to match the removal of `FuncInfo' arguments and the
> 	addition of analysis statuses.
>
> 	Record the unused argument analysis result for a procedure even if
> 	all of the procedure's arguments are used, so that callers of the
> 	procedure will know not to request more precise answers.
>
> 	Record the dependence of the current module on analysis results from
> 	other modules.
>

...

> +:- pred read_module_status(analysis_status::out, io::di, io::uo) is det.
> +
> +read_module_status(Status, !IO) :-
> +	parser__read_term(TermResult `with_type` read_term, !IO),
> +	( TermResult = term(_, term__functor(term__atom(String), [], _)) ->
> +		( analysis_status_to_string(Status0, String) ->
> +			Status = Status0
> +		;
> +			error("read_module_status: unknown status " ++ String),
> +			throw(invalid_analysis_file)

Isn't the call to throw redundant here?
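
To spell out why I think so: error/1 from the `require' module has
determinism `erroneous', so control should never reach anything that follows
it in the same conjunction.  A minimal sketch, not the code in the diff (the
module name and check_status/2 are made up for illustration):

```mercury
:- module error_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is cc_multi.

:- implementation.
:- import_module exception.
:- import_module require.
:- import_module string.

    % Hypothetical stand-in for the status lookup in read_module_status.
    %
:- pred check_status(string::in, string::out) is det.

check_status(String, Status) :-
    ( String = "optimal" ->
        Status = String
    ;
        % error/1 is `erroneous': it always throws, so a call to
        % throw/1 placed after it would never be executed.
        error("check_status: unknown status " ++ String)
    ).

main(!IO) :-
    % try/2 catches whatever error/1 throws.
    try(check_status("bogus"), Result),
    (
        Result = succeeded(Status),
        io.write_string(Status, !IO)
    ;
        Result = exception(Exception),
        io.write_string("caught: ", !IO),
        io.write(Exception, !IO)
    ),
    io.nl(!IO).
```

So either the error/1 call or the throw/1 call could go, depending on which
exception you want callers to see.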

...

> @@ -163,49 +231,122 @@
>  	throw(invalid_analysis_file)
>      ).
>
> +read_module_imdg(Info, ModuleId, ModuleEntries, !IO) :-
> +    read_analysis_file(Info ^ compiler, ModuleId, ".imdg",

Make sure that mmake realclean (clean?) and the corresponding functionality in
mmc --make know how to clean up any new filetypes you add.

> +:- pred parse_imdg_arc(Compiler::in)
> +	    `with_type` parse_entry(module_analysis_map(imdg_arc))
> +	    `with_inst` parse_entry <= compiler(Compiler).
> +
> +parse_imdg_arc(Compiler, Term, Arcs0, Arcs) :-
> +    (
> +	Term = term.functor(atom("->"),
> +	    [term.functor(string(DependentModule), [], _), ResultTerm], _),
> +	ResultTerm = functor(atom(AnalysisName),
> +	    [VersionNumberTerm, FuncIdTerm, CallPatternTerm], _),
> +	FuncIdTerm = term.functor(term.string(FuncId), [], _),
> +	CallPatternTerm = functor(string(CallPatternString), [], _),
> +	analysis_type(_ : unit(Call), _ : unit(Answer))
> +	    = analyses(Compiler, AnalysisName),
> +	CallPattern = from_string(CallPatternString) : Call
> +    ->
> +	(
> +	    VersionNumber = analysis_version_number(_ : Call, _ : Answer),
> +	    VersionNumberTerm = term.functor(
> +		term.integer(VersionNumber), [], _)
> +	->
> +	    Arc = 'new imdg_arc'(CallPattern, DependentModule),
> +	    ( AnalysisArcs0 = map.search(Arcs0, AnalysisName) ->
> +		AnalysisArcs1 = AnalysisArcs0
> +	    ;
> +		AnalysisArcs1 = map.init
> +	    ),
> +	    ( FuncArcs0 = map.search(AnalysisArcs1, FuncId) ->
> +		FuncArcs = [Arc | FuncArcs0]
> +	    ;
> +		FuncArcs = [Arc]
> +	    ),
> +	    Arcs = map.set(Arcs0, AnalysisName,
> +		map.set(AnalysisArcs1, FuncId, FuncArcs))
> +    	;
> +	    % Ignore results with an out-of-date version number.
> +	    % XXX: is that the right thing to do?
> +	    %	   do we really need a version number for the IMDG?
> +	    Arcs = Arcs0
> +	)
> +    ;
> +	throw(invalid_analysis_file)
> +    ).
> +
> +%-----------------------------------------------------------------------------%
> +
> +:- type read_header(T) == pred(T, io, io).
> +:- inst read_header == (pred(out, di, uo) is det).
> +
>  :- type parse_entry(T) == pred(term, T, T).
>  :- inst parse_entry == (pred(in, in, out) is det).
>

Maybe these types would be better named read_analysis_header and
parse_analysis_entry?  (And likewise for the write preds below.)

...

> Index: analysis/analysis.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/analysis/analysis.m,v
> retrieving revision 1.2
> diff -u -r1.2 analysis.m
> --- analysis/analysis.m	5 Apr 2004 05:06:38 -0000	1.2
> +++ analysis/analysis.m	10 Jan 2006 04:33:53 -0000


>  :- type analysis_name == string.
>
>  :- type analysis_type
> -	---> some [FuncInfo, Call, Answer]
> -		analysis_type(unit(FuncInfo), unit(Call), unit(Answer))
> -		=> analysis(FuncInfo, Call, Answer).
> -
> -	% An analysis is defined by a type describing call patterns,
> -	% a type defining answer patterns and a type giving information
> -	% about the function being analysed (e.g. arity) which should
> -	% be provided by the caller.
> -:- typeclass analysis(FuncInfo, Call, Answer) <=
> -		(call_pattern(FuncInfo, Call),
> -		answer_pattern(FuncInfo, Answer))
> +	--->	some [Call, Answer]
> +		analysis_type(unit(Call), unit(Answer))
> +		=> analysis(Call, Answer).
> +
> +	% An analysis is defined by a type describing call patterns and
> +	% a type defining answer patterns.  If the analysis needs to store
> +	% more information about the function being analysed (e.g. arity)
> +	% it should be stored as part of the type for call patterns.
> +	%
> +:- typeclass analysis(Call, Answer) <=
> +		(call_pattern(Call),
> +		answer_pattern(Answer))
>  	where
>  [
> -	func analysis_name(FuncInfo::unused, Call::unused, Answer::unused) =
> +	func analysis_name(Call::unused, Answer::unused) =
>  		(analysis_name::out) is det,
>
>  	% The version number should be changed when the Call or Answer
>  	% types are changed so that results which use the old types
>  	% can be discarded.
> -	func analysis_version_number(FuncInfo::unused, Call::unused,
> +	func analysis_version_number(Call::unused,
>  		Answer::unused) = (int::out) is det,
>
> -	func preferred_fixpoint_type(FuncInfo::unused, Call::unused,
> -		Answer::unused) = (fixpoint_type::out) is det
> +	func preferred_fixpoint_type(Call::unused,
> +		Answer::unused) = (fixpoint_type::out) is det,
> +
> +	% `top' and `bottom' should not really depend on the call pattern.
> +	% However some analyses may choose to store extra information about
> +	% the function in their `Call' types that might be needed for the
> +	% answer pattern.
> +	%
> + 	func bottom(Call) = Answer,
> + 	func top(Call) = Answer
>  ].
>
>  :- type fixpoint_type
> @@ -75,18 +91,15 @@
>  			% Can stop at any time.
>  		greatest_fixpoint.
>
> -:- typeclass call_pattern(FuncInfo, Call)
> -		<= (partial_order(FuncInfo, Call), to_string(Call)) where [].
> +:- typeclass call_pattern(Call)
> +		<= (partial_order(Call), to_string(Call)) where [].
>
> -:- typeclass answer_pattern(FuncInfo, Answer)
> -		<= (partial_order(FuncInfo, Answer), to_string(Answer)) where [
> -	func bottom(FuncInfo) = Answer,
> -	func top(FuncInfo) = Answer
> -].
> +:- typeclass answer_pattern(Answer)
> +		<= (partial_order(Answer), to_string(Answer)) where [].
>
> -:- typeclass partial_order(FuncInfo, Call) where [
> -	pred more_precise_than(FuncInfo::in, Call::in, Call::in) is semidet,
> -	pred equivalent(FuncInfo::in, Call::in, Call::in) is semidet
> +:- typeclass partial_order(T) where [
> +	pred more_precise_than(T::in, T::in) is semidet,
> +	pred equivalent(T::in, T::in) is semidet
>  ].
>
>  :- typeclass to_string(S) where [
> @@ -94,11 +107,25 @@
>  	func from_string(string) = S is semidet
>  ].
>
> +	% A call pattern that can be used by analyses that do not need
> +	% finer granularity.
> +	%
>  :- type any_call ---> any_call.
> -:- instance call_pattern(unit, any_call).
> -:- instance partial_order(unit, any_call).
> +:- instance call_pattern(any_call).
> +:- instance partial_order(any_call).
>  :- instance to_string(any_call).
>
> +	% The status of a module or a specific analysis result.
> +	%
> +:- type analysis_status
> +	--->	invalid
> +	;	suboptimal
> +	;	optimal.
> +
> +	% Least upper bound of two analysis_status values.
> +	%
> +:- func lub(analysis_status, analysis_status) = analysis_status.
> +
>  	% This will need to encode language specific details like
>  	% whether it is a predicate or a function, and the arity
>  	% and mode number.

...

>  :- type analysis_info
>  	---> some [Compiler] analysis_info(
>  		compiler :: Compiler,
> +
> +			% Holds outstanding requests for more specialised
> +			% variants of procedures.  Requests are added to this
> +			% map as analyses proceed and written out to disk
> +			% at the end of the compilation of this module.
> +			%
>  		analysis_requests :: analysis_map(analysis_request),
> -		analysis_results :: analysis_map(analysis_result)
> +
> +			% The overall status of each module.
> +			%
> +		module_statuses	:: map(module_id, analysis_status),
> +
> +			% The "old" map stores analysis results read in from
> +			% disk.  New results generated while analysing the
> +			% current module are added to the "new" map.  After
> +			% all the analyses the two maps are compared to
> +			% see which analysis results have changed.  Other
> +			% modules may need to be marked or invalidated as a
> +			% result.  Then "new" results are moved into the "old"
> +			% map, from where they can be written to disk.
> +			%
> +		old_analysis_results :: analysis_map(analysis_result),
> +		new_analysis_results :: analysis_map(analysis_result),
> +
> +			% The Inter-module Dependency Graph records dependences

s/dependences/dependencies/

> +			% of an entire module's analysis results on another
> +			% module's answer patterns. e.g. assume module M1
> +			% contains function F1 that has an analysis result that
> +			% used the answer F2:CP2->AP2 from module M2.  If AP2
> +			% changes then all of M1 will either be marked
> +			% `suboptimal' or `invalid'.  Finer-grained dependency
> +			% tracking would allow only F1 to be recompiled,
> +			% instead of all of M1, but we don't do that.
> +			%
> +			% IMDGs are loaded from disk into the old map.
> +			% During analysis any dependences of the current module
> +			% on other modules are added into the new map.
> +			% At the end of analysis all the arcs which terminate
> +			% at the current module are cleared from the old map
> +			% and replaced by those in the new map.
> +			%
> +			% XXX: check if we really need two maps
> +			%
> +		old_imdg :: analysis_map(imdg_arc),
> +		new_imdg :: analysis_map(imdg_arc)
>  	) => compiler(Compiler).
>
...

> +record_dependency(CallerModuleId, AnalysisName, CalleeModuleId, FuncId, Call,
> +	!Info) :-
> +    (if CallerModuleId = CalleeModuleId then
> +	% XXX this assertion breaks compiling the standard library with
> +	% --analyse-trail-usage at the moment

How is it breaking it?

...

> +    % The algorithm is from Nick's thesis, pp. 108-9.

I suggest putting a pointer to analysis/README there since that file contains
the details of the thesis.

> +    % Or my corruption thereof.
> +    %
> +    % For each new analysis result (P^M:DP --> Ans_new):
> +    %   Read in the registry of M if necessary
> +    %   If there is an existing analysis result (P^M:DP --> Ans_old):
> +    %	if Ans_new \= Ans_old:
> +    %	    Replace the entry in the registry with P^M:DP --> Ans_new
> +    %	    if Ans_new `more_precise_than` Ans_old
> +    %		Status = suboptimal
> +    %	    else
> +    %		Status = invalid
> +    %	    For each entry (Q^N:DQ --> P^M:DP) in the IMDG:
> +    %		% Mark Q^N:DQ --> _ (_) with Status
> +    %		Actually, we don't do that.  We only mark the
> +    %		module N's _overall_ status with the
> +    %		least upper bound of its old status and Status.
> +    %   Else (P:DP --> Ans_old) did not exist:
> +    %	Insert result (P:DP --> Ans_new) into the registry.
> +    %
> +    % Finally, clear out the "new" analysis results map.  When we write
> +    % out the analysis files we will do it from the "old" results map.
> +    %
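
For what it's worth, my reading of the comment above is that `lub' picks the
worse of two statuses, i.e. the ordering is optimal < suboptimal < invalid.
That ordering is my assumption; the diff does not show the definition.  A
minimal sketch under that assumption (the module name and main/2 are only
there to make it self-contained):

```mercury
:- module status_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.

:- type analysis_status
    --->    invalid
    ;       suboptimal
    ;       optimal.

    % Assumed ordering: optimal < suboptimal < invalid, so the least
    % upper bound is the worse of the two statuses.
    %
:- func lub(analysis_status, analysis_status) = analysis_status.

lub(invalid, _) = invalid.
lub(suboptimal, invalid) = invalid.
lub(suboptimal, suboptimal) = suboptimal.
lub(suboptimal, optimal) = suboptimal.
lub(optimal, Status) = Status.

main(!IO) :-
    % Fold lub over the statuses of a module's results, starting from
    % optimal, in the spirit of lub_result_statuses in the diff.
    Statuses = [optimal, suboptimal, optimal],
    ModuleStatus = list.foldl(lub, Statuses, optimal),
    io.write(ModuleStatus, !IO),
    io.nl(!IO).
```

With one suboptimal result in the list the module as a whole comes out as
suboptimal, which is what I would expect from the description.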

...

> +    % In this procedure we have just finished compiling module ModuleId
> +    % and will write out data currently cached in the analysis_info
> +    % structure out to disk.
> +    %
> +write_analysis_files(ModuleId, ImportedModuleIds, !Info, !IO) :-
> +    % The current module was just compiled so we set its status to the
> +    % lub of all the new analysis results generated.
> +    (if NewResults = !.Info ^ new_analysis_results ^ elem(ModuleId) then
> +	ModuleStatus = lub_result_statuses(NewResults)
> +    else
> +	ModuleStatus = optimal
> +    ),
> +
> +    update_analysis_registry(!Info, !IO),
> +
> +    !:Info = !.Info ^ module_statuses ^ elem(ModuleId) := ModuleStatus,
> +
> +    update_intermodule_dependencies(ModuleId, ImportedModuleIds,
> +	!Info, !IO),
> +    (if map.is_empty(!.Info ^ new_analysis_results) then
> +	true
> +    else
> +	io.print("Warning: new_analysis_results is not empty\n", !IO),
> +	io.print(!.Info ^ new_analysis_results, !IO),
> +	io.nl(!IO)
> +    ),
> +
> +    % Write the results for all the modules we know of.  For the
> +    % module being compiled, its analysis results may have changed.

s/the/its/

...

> +% XXX make this enableable with a command-line option.  A problem is that we
> +% don't want to make the analysis directory dependent on anything in the
> +% compiler directory.
> +
> +:- pred debug_msg(pred(io, io)::in(pred(di, uo) is det), io::di, io::uo)
> +    is det.
> +
> +debug_msg(_P, !IO) :-
> +    % P(!IO),
> +    true.

I suggest using a mutable to keep track of whether debugging traces
are enabled and then export a predicate from the analysis library that
client compilers can use to turn debugging on and off.
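
Something along these lines, perhaps.  This is only a sketch: the mutable
name, enable_debug_messages/3 and the module name are invented here, and it
assumes a compiler that supports `:- mutable' declarations.

```mercury
:- module debug_demo.
:- interface.
:- import_module bool.
:- import_module io.

    % Hypothetical switch that a client compiler could call.
    %
:- pred enable_debug_messages(bool::in, io::di, io::uo) is det.

:- pred main(io::di, io::uo) is det.

:- implementation.

    % Debugging output is off by default.
:- mutable(debug_enabled, bool, no, ground,
    [untrailed, attach_to_io_state]).

enable_debug_messages(Debug, !IO) :-
    set_debug_enabled(Debug, !IO).

:- pred debug_msg(pred(io, io)::in(pred(di, uo) is det), io::di, io::uo)
    is det.

debug_msg(P, !IO) :-
    get_debug_enabled(Debug, !IO),
    (
        Debug = yes,
        P(!IO)
    ;
        Debug = no
    ).

:- pred say(string::in, io::di, io::uo) is det.

say(Msg, !IO) :-
    io.write_string(Msg, !IO),
    io.nl(!IO).

main(!IO) :-
    % Suppressed: debugging is still off here.
    debug_msg(say("first message"), !IO),
    enable_debug_messages(yes, !IO),
    % Shown: debugging was just turned on.
    debug_msg(say("second message"), !IO).
```

That keeps the analysis directory independent of the compiler directory: the
compiler only has to call the exported predicate when it sees the relevant
option.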

To be continued ...