[m-rev.] for review: static terms in the MLDS backend

Zoltan Somogyi zs at csse.unimelb.edu.au
Mon Aug 31 11:58:18 AEST 2009
Previous message: [m-rev.] for post-commit review: speed up name mangling
Next message: [m-rev.] Re: for review: static terms in the MLDS backend
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I apologize for the size of this diff, but I was working on it at home
in a pipelined fashion, working on the next speedup while the previous
one was being bootchecked. There was never any intermediate point
where waiting for a review would have been convenient.

I intend to commit this diff tomorrow morning (sep 1). Reviews are welcome
both before and after then.

The diff code has some ZZZs in it. I will come back to them and fix them
over the next few days.

-------------------------------------------------------------------------

Rationalize the way the MLDS code generator handles global data structures.
The previous way was somewhat perverse in several respects.

The first respect was that static Mercury terms were translated into global
MLDS definitions, those MLDS definitions were turned into local definitions,
and then later turned back into global definitions. However, while they were
local definitions, they were part of the list of definitions processed
by ml_elim_nested. Since the complexity of ml_elim_nested is at least O(n^2)
and probably O(n^3) or worse, and since the definitions representing static
terms could increase the value of n from dozens or hundred to many thousands,
this was a BAD IDEA, especially since by construction, the MLDS definitions
representing ground terms cannot have other definitions nested inside them.
This problem was the main reason why previously the compiler took effectively
forever to compile any program with large tables of facts in MLDS grades.

The second respect has to do with the fact that terms are represented as tagged
pointers to memory cells. When generating an MLDS definition (a static cell)
for a ground term, ml_unify_gen.m remembered the name of the MLDS variable
representing the static cell, but forgot the tag on the pointer. So when
that ground term was used as an argument in another, bigger ground term,
it had to figure out the tag all over again. The argument of the
construct_statically functor in construct unifications had as its sole purpose
the provision of the data needed by this recomputation.

The third respect was that the code generator could generate duplicate
definitions (same variable name, same content) when two conjoined goals
referred to two or more type_infos or pseudo_type_infos for the same type.
The code generator used to handle it by wrapping a scope around the later
definitions, which (a) wasted memory, and (b) could yield warnings from
the C compiler about shadowed declarations. This problem used to prevent
the compiler from bootchecking in grade hlc_nest.gc; with this diff,
we can again bootcheck in grade hlc_nest.gc.

The fourth respect was that the code generator's record of which variables are
bound to constant terms was inaccurate, because it was never reset. If a
variable was bound to a constant term in one branch of a control structure,
the MLDS code generator state's record of this binding was still there
when the compiler generated code for the later branches of that control
structure. It did not use the record, but it was still there.

The MLDS code generator also used a horribly inefficient algorithm for figuring
out which Mercury variables should have their corresponding MLDS variables
declared at a given goal. It gathered up all the goals's variables, and then
gathered up all the variables in the goal's immediate subgoals. This meant
that the entire goal had to be traversed TWICE, with the second traversal
being needed only because the first one gathered too much information.
Those traversals were a huge performance problem on programs with large static
terms, since their representation is large conjunctions of unifications,
which have large numbers of variables. To add insult to injury, the traversals
used ordered lists to represent the sets, which meant that (given the ascending
variable numbers in from_ground_term scopes), adding the n'th unification's
new variable to the set took O(n) time, with the complexity of the whole
algorithm being quadratic.

Besides fixing these problems, this diff makes the MLDS code generator
handle from_ground_term_construct scopes specially, in a streamlined fashion,
just as the LLDS backend has done for a while now. It also fixes a bunch
of other performance problems pointed out by profiling.

This diff reduces the compilation time on the training_cars_150.m benchmark,
which has nothing but ground terms, from 94+ seconds to less than 2 seconds,
a 98% speedup. (This is with from_ground_term scopes enabled, as is
appropriate.) For ordinary programs, compilation time in grade hlc.gc
is reduced by about 3%.

compiler/ml_global_data.m:
	A new module that manages the MLDS code generator's records about
	static definitions. It separates those definitions into different
	categories based on what kind of processing they need. At the moment,
	the categories are: definitions of cells for ground terms and
	definitions of cells for RTTI, with the latter being subdivided
	into those that may need to be processed by ml_elim_nested
	and those that are known not to need such processing.

	It also records whether we have generated representation for a
	type_info or pseudo_type yet, so that we can avoid generating duplicate
	definitions. This, and the code that uses this, fixes the third
	problem.

compiler/ml_backend.m:
compiler/notes/compiler_design.html:
	Add the new module.

compiler/goal_util.m:
	Add a warning about the bad complexity of goal_vars.

compiler/hlds_goal.m:
	Remove the static_cons type, since it is no longer needed.
	Remove the static_cons argument of the construct_statically
	functor of the how_to_construct type.

	Fix an out-of-date comment about from_ground_term scopes.

compiler/mark_static_terms.m:
	Change the data structure being threaded through this module
	from being a map(prog_var, static_cons) to a set_tree234(prog_var),
	since we no longer need the information stored in static_conses.

compiler/modes.m:
	Do not compute static_conses.

compiler/mlds.m:
	Put the information about global data into a separate field of
	the MLDS, since some classes of such data can have their treatment
	optimized.

	Put the arguments of some functions into a more logical order.

	Give some function symbol names prefixes to avoid ambiguities.

	Change or add some field name prefixes to avoid ambiguities.

	Avoid the unnecessary use of higher order code in handling the flags of
	MLDS definitions. The new code is simpler as well as faster than the
	old.

compiler/ml_code_util.m:
	Make ml_global_data a part of the MLDS code generator state. This
	allows a bunch of predicates in the MLDS code generator to no longer
	return lists of definitions, since those definitions are now put
	into this new part of ml_gen_info, which ml_elim_nested won't touch.
	This is part of the solution of the first problem.

	Replace the field that mapped vars to the name of the global static
	definition involved in representing the ground term bound to that var
	with a field that maps the var to the actual rval (which will be a
	tagged pointer to that static definition), though we also remember
	the variable's type, since this is needed for making decisions about
	boxing. This is part of the solution for the second problem.

	Rename the extra_defns field of the ml_gen_info to the
	closure_wrapper_defns field, since this is the only thing that
	it is used for.

	Add fields to the ml_gen_info holding the value of the --highlevel-data
	option and the compilation target, since these are needed very often,
	and looking them up in the globals is too slow.

	Access all fields of ml_gen_info via access predicates, not via field
        notation, so that the number of accesses to each field can be measured
        by deep profiling.

        Separate the ml_gen_info structure into a main structure whose fields
        are frequently updated and which fits into an 8-word block of memory,
        and a substructure whose fields are read-only or rarely updated.
        This should help improve memory performance.

	Provide more useful access predicates to some of the counters,
	and make them harder to confuse by using fewer type synonyms.

	Provide some more convenient predicates for creating auxiliary MLDS
	variables (those that do not correspond to Mercury variables).

	Delete ml_join_decls, since fixing the third problem means that it is
	no longer needed.

	Delete some other predicates whose job has been taken over by
	ml_global_data.m.

	Put the arguments of some predicates into a more logical order.

	Rename some predicates to avoid ambiguities.

compiler/ml_code_gen.m:
	As part of the fix for the repeated traversals of parts of the
	procedure body with goal_vars, we now have to run quantification
	before generating MLDS for a procedure. Quantification does in
	one optimized pass what the old code used to do in many unoptimized
	passes. However, since quantification can change the HLDS, this
	requires a small change in the module's interface. HLDS dump 499
	now dumps to the HLDS version *output* by MLDS code generation,
	not the HLDS version that is its input.

	Use the (now guaranteed accurate) nonlocals fields of the goal and
	its immediate subgoals to figure out what variables need to declared at
	each goal. This fixes the fourth problem.

	Special case from_ground_term_construct scopes. (This code recognizes
	such scopes; the code that handles them is in ml_unify_gen.m.)

	Use trace goals to print progress messages.

	Avoid the use of io.format and string.format (see below).

	Group the predicates of this module a bit more meaningfully.
	(A really good order requires more reordering, but that will be
	a separate change.)

	Give some predicates more meaningful names.

	Add predicates that allow the convenient implementation of branched
	control structures.

	Use them in the implementation of the branched control structures
	handled in this module

compiler/ml_switch_gen.m:
compiler/ml_string_switch.m:
	Use them in the implementation of the branched control structures
	handled in these modules.

	Use the new module to generate constant data.

compiler/ml_unify_gen.m:
	Implement the changes described at the top.

	Divide ml_gen_new_object, a predicate that used to be 220 lines long,
        into several pieces, one for each different method of constructing new
	objects (memory cells).

	Delete the (extensive) code that used to recreate the tags on
	pointers to static cells. This code used to also recreate static
	arguments in their entirety if they were NOT pointers to cells,
	e.g. if they were integers or floats or strings, so we now record
	the values of such non-tagged-pointer constant terms also.

	Put the arguments of some predicates into a more logical order.
	In addition, some predicates now need ml_global_data threaded through
	them, but avoid returning lists of definitions.

compiler/ml_call_gen.m:
	Simplify the interface of the predicate that decides on boxing and
	unboxing, making it callable from places that do not have a full code
	generator state (such as the streamlined code in ml_unify_gen handling
	from_ground_term_construct scopes).

	Avoid the unnecessary use of higher order code.

	Rename a predicate to avoid ambiguity.

compiler/ml_closure_gen.m;
	Generate the closure layout information as static global data.

compiler/ml_elim_nested.m:
	Thread the globals through this module instead of the I/O state,
	since the only thing the I/O state was used for was accessing the
	globals.

	Traverse only the small subset of global data that needs to be
	traversed.

compiler/ml_type_gen.m:
	Use trace goals to print progress messages.

	Replace a semidet predicate  with a function returning a tailored type,
	for improved robustness.

	Look up stuff in the globals structure less frequently, by reusing the
	results of past lookups.

compiler/rtti_to_mlds.m:
	Restructure the code in this module in two respects.

	First, instead of having predicates returning lists of
	MLDS definitions, make them add those definitions to a ml_global_data
	data structure threaded through the module. Since the ml_global_data
	structure records which type_infos and pseudo_type_infos we have
	already generated definitions for, this allows us to avoid generating
	duplicate static cells for them. This allows us to avoid the old
	code's roundabout way of handling of such cells.

	Second, several predicates of this module returned initializers,
	which their caller then stuck into a static definition. This diff
	changes things so that wherever possible, the predicate that computes
	the initializers also creates the cell definition. This improves the
	code's cohesion.

	Also, rename some predicates where, after the above changes, their old
	names are no longer appropriate.

compiler/mlds_to_c.m:
	Avoid the use of io.format and string.format, since (after the more
	extensive changes described at the top of this log message) this
	turned out to account for a nontrivial percentage of the compiler's
	runtime.

	Instead of repeatedly looking up the values of some options
	in a copy of the globals obtained from the I/O state, thread
	through this module a term that records all the information
	that this module needs from globals.

	Threading this data through this module instead of looking something
	upo in the I/O state leads to a slight slowdown, but not having to do
	option lookups in large trees leads to a larger speedup.

	Rename some predicates to ambiguity.

	Convert some predicate definitions from multiple clauses into one,
	for clarity.

	Conform to the changes in the structure of the MLDS.

compiler/mlds_to_gcc.m:
	Conform to the changes in the structure of the MLDS.

compiler/mlds_to_il.m:
compiler/mlds_to_ilasm.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
	Conform to the changes in the structure of the MLDS.

	Pass the globals to this module instead of looking it up in the
	I/O state.

	Rename some predicates to ambiguity.

	Avoid the unnecessary use of higher order code.

	In mlds_to_managed.m, thread a previously frequently-looked-up value
	through the module.

compiler/typecheck_info.m:
	The code for expanding equivalence types in a procedure's vartypes
	field used to iterate on the keys of the vartypes map, and then
	looked up the associated values in the map. This diff replaces hat
	with new code that iterates on an association list that puts the
	value right next to its key (a list that the old code just threw away).

	The var/type assoc list we compute is still sorted on the variables,
	so create the new vartypes map not by repeated insertion, but
	directly (using map.from_sorted_assoc_list).

compiler/purity.m:
	Add a note about a possible future speedup.

compiler/prog_io_sym_name.m:
	Rename a predicate to avoid an ambiguity, and to make its purpose
	clearer.

compiler/globals.m:
compiler/handle_options.m:
compiler/mlds_to_il.m:
	Check whether the mmc option that specifies the IL version number is
	well formed at the start of the compilation, in handle_options.m,
	when all the other option values are checked if needed, rather than
	when generating IL code. This way, we can generate an error message,
	not a compiler abort.

	The IL version number is now recorded in the globals structure.

compiler/make_hlds_warn.m:
	Do not descend into from_ground_term scopes, since by construction,
	they cannot contain singleton variables.

compiler/analysis.file.m:
compiler/deep_profiling..m:
compiler/field_access.m:
compiler/hlds_out.m:
compiler/interval.m:
compiler/liveness.m:
compiler/loop_inv.m:
compiler/make.module_dep_file.m:
compiler/mercury_compile.m:
compiler/loop_inv.m:
compiler/ml_tailcall.m:
compiler/module_qual.m:
compiler/prog_ctgc.m:
compiler/prog_io_dcg.m:
compiler/prog_io_goal.m:
compiler/prog_io_pragma.m:
compiler/prog_io_typeclass.m:
compiler/prog_io_util.m:
compiler/quantification.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/recompilation.check.m:
compiler/recompilation.version.m:
compiler/structure_reuse.indirect.m:
compiler/superhomogeneous.m:
compiler/var_locn.m:
	Conform to the changes above.

compiler/unused_args.m:
	Fix an ambiguous predicate name.

tests/hard_coded/float_field.{m,exp}:
	Add some more test situations to this test case. These test
	the handling of notag types in static terms more thoroughly
	than before.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
Previous message: [m-rev.] for post-commit review: speed up name mangling
Next message: [m-rev.] Re: for review: static terms in the MLDS backend
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list