[m-rev.] for review: static terms in the MLDS backend
Zoltan Somogyi
zs at csse.unimelb.edu.au
Mon Aug 31 11:58:18 AEST 2009
I apologize for the size of this diff, but I was working on it at home
in a pipelined fashion, working on the next speedup while the previous
one was being bootchecked. There was never any intermediate point
where waiting for a review would have been convenient.
I intend to commit this diff tomorrow morning (sep 1). Reviews are welcome
both before and after then.
The diff code has some ZZZs in it. I will come back to them and fix them
over the next few days.
-------------------------------------------------------------------------
Rationalize the way the MLDS code generator handles global data structures.
The previous way was somewhat perverse in several respects.
The first respect was that static Mercury terms were translated into global
MLDS definitions, those MLDS definitions were turned into local definitions,
and then later turned back into global definitions. However, while they were
local definitions, they were part of the list of definitions processed
by ml_elim_nested. Since the complexity of ml_elim_nested is at least O(n^2)
and probably O(n^3) or worse, and since the definitions representing static
terms could increase the value of n from dozens or hundred to many thousands,
this was a BAD IDEA, especially since by construction, the MLDS definitions
representing ground terms cannot have other definitions nested inside them.
This problem was the main reason why previously the compiler took effectively
forever to compile any program with large tables of facts in MLDS grades.
The second respect has to do with the fact that terms are represented as tagged
pointers to memory cells. When generating an MLDS definition (a static cell)
for a ground term, ml_unify_gen.m remembered the name of the MLDS variable
representing the static cell, but forgot the tag on the pointer. So when
that ground term was used as an argument in another, bigger ground term,
it had to figure out the tag all over again. The argument of the
construct_statically functor in construct unifications had as its sole purpose
the provision of the data needed by this recomputation.
The third respect was that the code generator could generate duplicate
definitions (same variable name, same content) when two conjoined goals
referred to two or more type_infos or pseudo_type_infos for the same type.
The code generator used to handle it by wrapping a scope around the later
definitions, which (a) wasted memory, and (b) could yield warnings from
the C compiler about shadowed declarations. This problem used to prevent
the compiler from bootchecking in grade hlc_nest.gc; with this diff,
we can again bootcheck in grade hlc_nest.gc.
The fourth respect was that the code generator's record of which variables are
bound to constant terms was inaccurate, because it was never reset. If a
variable was bound to a constant term in one branch of a control structure,
the MLDS code generator state's record of this binding was still there
when the compiler generated code for the later branches of that control
structure. It did not use the record, but it was still there.
The MLDS code generator also used a horribly inefficient algorithm for figuring
out which Mercury variables should have their corresponding MLDS variables
declared at a given goal. It gathered up all the goals's variables, and then
gathered up all the variables in the goal's immediate subgoals. This meant
that the entire goal had to be traversed TWICE, with the second traversal
being needed only because the first one gathered too much information.
Those traversals were a huge performance problem on programs with large static
terms, since their representation is large conjunctions of unifications,
which have large numbers of variables. To add insult to injury, the traversals
used ordered lists to represent the sets, which meant that (given the ascending
variable numbers in from_ground_term scopes), adding the n'th unification's
new variable to the set took O(n) time, with the complexity of the whole
algorithm being quadratic.
Besides fixing these problems, this diff makes the MLDS code generator
handle from_ground_term_construct scopes specially, in a streamlined fashion,
just as the LLDS backend has done for a while now. It also fixes a bunch
of other performance problems pointed out by profiling.
This diff reduces the compilation time on the training_cars_150.m benchmark,
which has nothing but ground terms, from 94+ seconds to less than 2 seconds,
a 98% speedup. (This is with from_ground_term scopes enabled, as is
appropriate.) For ordinary programs, compilation time in grade hlc.gc
is reduced by about 3%.
compiler/ml_global_data.m:
A new module that manages the MLDS code generator's records about
static definitions. It separates those definitions into different
categories based on what kind of processing they need. At the moment,
the categories are: definitions of cells for ground terms and
definitions of cells for RTTI, with the latter being subdivided
into those that may need to be processed by ml_elim_nested
and those that are known not to need such processing.
It also records whether we have generated representation for a
type_info or pseudo_type yet, so that we can avoid generating duplicate
definitions. This, and the code that uses this, fixes the third
problem.
compiler/ml_backend.m:
compiler/notes/compiler_design.html:
Add the new module.
compiler/goal_util.m:
Add a warning about the bad complexity of goal_vars.
compiler/hlds_goal.m:
Remove the static_cons type, since it is no longer needed.
Remove the static_cons argument of the construct_statically
functor of the how_to_construct type.
Fix an out-of-date comment about from_ground_term scopes.
compiler/mark_static_terms.m:
Change the data structure being threaded through this module
from being a map(prog_var, static_cons) to a set_tree234(prog_var),
since we no longer need the information stored in static_conses.
compiler/modes.m:
Do not compute static_conses.
compiler/mlds.m:
Put the information about global data into a separate field of
the MLDS, since some classes of such data can have their treatment
optimized.
Put the arguments of some functions into a more logical order.
Give some function symbol names prefixes to avoid ambiguities.
Change or add some field name prefixes to avoid ambiguities.
Avoid the unnecessary use of higher order code in handling the flags of
MLDS definitions. The new code is simpler as well as faster than the
old.
compiler/ml_code_util.m:
Make ml_global_data a part of the MLDS code generator state. This
allows a bunch of predicates in the MLDS code generator to no longer
return lists of definitions, since those definitions are now put
into this new part of ml_gen_info, which ml_elim_nested won't touch.
This is part of the solution of the first problem.
Replace the field that mapped vars to the name of the global static
definition involved in representing the ground term bound to that var
with a field that maps the var to the actual rval (which will be a
tagged pointer to that static definition), though we also remember
the variable's type, since this is needed for making decisions about
boxing. This is part of the solution for the second problem.
Rename the extra_defns field of the ml_gen_info to the
closure_wrapper_defns field, since this is the only thing that
it is used for.
Add fields to the ml_gen_info holding the value of the --highlevel-data
option and the compilation target, since these are needed very often,
and looking them up in the globals is too slow.
Access all fields of ml_gen_info via access predicates, not via field
notation, so that the number of accesses to each field can be measured
by deep profiling.
Separate the ml_gen_info structure into a main structure whose fields
are frequently updated and which fits into an 8-word block of memory,
and a substructure whose fields are read-only or rarely updated.
This should help improve memory performance.
Provide more useful access predicates to some of the counters,
and make them harder to confuse by using fewer type synonyms.
Provide some more convenient predicates for creating auxiliary MLDS
variables (those that do not correspond to Mercury variables).
Delete ml_join_decls, since fixing the third problem means that it is
no longer needed.
Delete some other predicates whose job has been taken over by
ml_global_data.m.
Put the arguments of some predicates into a more logical order.
Rename some predicates to avoid ambiguities.
compiler/ml_code_gen.m:
As part of the fix for the repeated traversals of parts of the
procedure body with goal_vars, we now have to run quantification
before generating MLDS for a procedure. Quantification does in
one optimized pass what the old code used to do in many unoptimized
passes. However, since quantification can change the HLDS, this
requires a small change in the module's interface. HLDS dump 499
now dumps to the HLDS version *output* by MLDS code generation,
not the HLDS version that is its input.
Use the (now guaranteed accurate) nonlocals fields of the goal and
its immediate subgoals to figure out what variables need to declared at
each goal. This fixes the fourth problem.
Special case from_ground_term_construct scopes. (This code recognizes
such scopes; the code that handles them is in ml_unify_gen.m.)
Use trace goals to print progress messages.
Avoid the use of io.format and string.format (see below).
Group the predicates of this module a bit more meaningfully.
(A really good order requires more reordering, but that will be
a separate change.)
Give some predicates more meaningful names.
Add predicates that allow the convenient implementation of branched
control structures.
Use them in the implementation of the branched control structures
handled in this module
compiler/ml_switch_gen.m:
compiler/ml_string_switch.m:
Use them in the implementation of the branched control structures
handled in these modules.
Use the new module to generate constant data.
compiler/ml_unify_gen.m:
Implement the changes described at the top.
Divide ml_gen_new_object, a predicate that used to be 220 lines long,
into several pieces, one for each different method of constructing new
objects (memory cells).
Delete the (extensive) code that used to recreate the tags on
pointers to static cells. This code used to also recreate static
arguments in their entirety if they were NOT pointers to cells,
e.g. if they were integers or floats or strings, so we now record
the values of such non-tagged-pointer constant terms also.
Put the arguments of some predicates into a more logical order.
In addition, some predicates now need ml_global_data threaded through
them, but avoid returning lists of definitions.
compiler/ml_call_gen.m:
Simplify the interface of the predicate that decides on boxing and
unboxing, making it callable from places that do not have a full code
generator state (such as the streamlined code in ml_unify_gen handling
from_ground_term_construct scopes).
Avoid the unnecessary use of higher order code.
Rename a predicate to avoid ambiguity.
compiler/ml_closure_gen.m;
Generate the closure layout information as static global data.
compiler/ml_elim_nested.m:
Thread the globals through this module instead of the I/O state,
since the only thing the I/O state was used for was accessing the
globals.
Traverse only the small subset of global data that needs to be
traversed.
compiler/ml_type_gen.m:
Use trace goals to print progress messages.
Replace a semidet predicate with a function returning a tailored type,
for improved robustness.
Look up stuff in the globals structure less frequently, by reusing the
results of past lookups.
compiler/rtti_to_mlds.m:
Restructure the code in this module in two respects.
First, instead of having predicates returning lists of
MLDS definitions, make them add those definitions to a ml_global_data
data structure threaded through the module. Since the ml_global_data
structure records which type_infos and pseudo_type_infos we have
already generated definitions for, this allows us to avoid generating
duplicate static cells for them. This allows us to avoid the old
code's roundabout way of handling of such cells.
Second, several predicates of this module returned initializers,
which their caller then stuck into a static definition. This diff
changes things so that wherever possible, the predicate that computes
the initializers also creates the cell definition. This improves the
code's cohesion.
Also, rename some predicates where, after the above changes, their old
names are no longer appropriate.
compiler/mlds_to_c.m:
Avoid the use of io.format and string.format, since (after the more
extensive changes described at the top of this log message) this
turned out to account for a nontrivial percentage of the compiler's
runtime.
Instead of repeatedly looking up the values of some options
in a copy of the globals obtained from the I/O state, thread
through this module a term that records all the information
that this module needs from globals.
Threading this data through this module instead of looking something
upo in the I/O state leads to a slight slowdown, but not having to do
option lookups in large trees leads to a larger speedup.
Rename some predicates to ambiguity.
Convert some predicate definitions from multiple clauses into one,
for clarity.
Conform to the changes in the structure of the MLDS.
compiler/mlds_to_gcc.m:
Conform to the changes in the structure of the MLDS.
compiler/mlds_to_il.m:
compiler/mlds_to_ilasm.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
Conform to the changes in the structure of the MLDS.
Pass the globals to this module instead of looking it up in the
I/O state.
Rename some predicates to ambiguity.
Avoid the unnecessary use of higher order code.
In mlds_to_managed.m, thread a previously frequently-looked-up value
through the module.
compiler/typecheck_info.m:
The code for expanding equivalence types in a procedure's vartypes
field used to iterate on the keys of the vartypes map, and then
looked up the associated values in the map. This diff replaces hat
with new code that iterates on an association list that puts the
value right next to its key (a list that the old code just threw away).
The var/type assoc list we compute is still sorted on the variables,
so create the new vartypes map not by repeated insertion, but
directly (using map.from_sorted_assoc_list).
compiler/purity.m:
Add a note about a possible future speedup.
compiler/prog_io_sym_name.m:
Rename a predicate to avoid an ambiguity, and to make its purpose
clearer.
compiler/globals.m:
compiler/handle_options.m:
compiler/mlds_to_il.m:
Check whether the mmc option that specifies the IL version number is
well formed at the start of the compilation, in handle_options.m,
when all the other option values are checked if needed, rather than
when generating IL code. This way, we can generate an error message,
not a compiler abort.
The IL version number is now recorded in the globals structure.
compiler/make_hlds_warn.m:
Do not descend into from_ground_term scopes, since by construction,
they cannot contain singleton variables.
compiler/analysis.file.m:
compiler/deep_profiling..m:
compiler/field_access.m:
compiler/hlds_out.m:
compiler/interval.m:
compiler/liveness.m:
compiler/loop_inv.m:
compiler/make.module_dep_file.m:
compiler/mercury_compile.m:
compiler/loop_inv.m:
compiler/ml_tailcall.m:
compiler/module_qual.m:
compiler/prog_ctgc.m:
compiler/prog_io_dcg.m:
compiler/prog_io_goal.m:
compiler/prog_io_pragma.m:
compiler/prog_io_typeclass.m:
compiler/prog_io_util.m:
compiler/quantification.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/recompilation.check.m:
compiler/recompilation.version.m:
compiler/structure_reuse.indirect.m:
compiler/superhomogeneous.m:
compiler/var_locn.m:
Conform to the changes above.
compiler/unused_args.m:
Fix an ambiguous predicate name.
tests/hard_coded/float_field.{m,exp}:
Add some more test situations to this test case. These test
the handling of notag types in static terms more thoroughly
than before.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the reviews
mailing list