[m-dev.] Propagation solver implementation of constranints based mode analysis of Mercury.

Richard Fothergill fothergill at gmail.com
Sun Mar 26 23:21:05 AEDT 2006
Previous message: [m-dev.] Re: CBMA project
Next message: [m-dev.] semantics with any insts
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hey everyone,

I've written up an overview of everything I've worked on over the past
couple of summers. My plans for next year are unsure - they range from
going to Japan for three months to becoming a hermit. Whatever it is,
I'll need a move from intellectual stimulus because I've got a tough
year ahead of me (I'm doing 340, but that's just the start of it).
Hopefully this will give enough of an idea about the workings of the
propagation based implementation of constraints based mode analysis of
Mercury to someone, and they will be able to work on it for the while.

See you all soon I hope,

-Richard

P.S.
Ralph, the information you requested is mostly near the end; Julien,
the information you requested is mostly at the beginning.

P.P.S.

Some information I've missed in the main document:
The propagation solver is activated with the command line argument
--prop-mode-constraints

I'll have missed more, and I haven't proof read it. I'm actually a
little bit sick, and pretty tired, so I'm going to go to bed.


P.P.P.S.

Here it is:

==============================================


This is a description of the propagation solver implementation for
constraints based mode analysis of Mercury. It works on a simplified
constraints based mode analysis system, with no subtyping, or higher
order code.

The constraints approach works by first finding, for each goal of the
program a program variable is used in, where its value is produced, and
where it's value is consumed (i.e. used). It does this by creating
constraint variables that represent propositions about the program,
constraining them, and then solving for them with a propagation-type
solver. The next stage involves ordering conjunctions based on the
results of the producer/consumer relationship. To do this, a partial
order is created by noting that a conjunct that produces a variable must
come before any conjunct that consumes it. Then a full order is
determined by a simple topological sort, while maintaining the minimum
ordering criteria.

A good deal of work remains to be done on the propagation solver
implementation to constraints based mode analysis, including:
improvement of nondeterministic behaviour,
full subtyping/partial instantiation,
the handling of higher order code,
mode inference, and
various other small situations that can arise in programs.
More advanced concepts such as uniqueness might also be considered.

The constraint variables used in the first stage take the following
form:
First, we represent program variables uniquely for mode analysis by the
mc_progvar (read: mode constraints progvar) type, which stores the
original program variable ID along with the pred_id of the predicate it
occurs in. Now, considering some mc_progvar, it will appear in some
goals of its predicate. We assign to each of those goals a unique label
called a goal_path which records the path to that goal through nested
goals of the predicate (see notes 1 at the end on goal path notation).
For each goal in which a mc_progvar occurs we create the proposition
that it is produced there, by means of a constraint variable of type
mc_var and a corresponding abstract representation of the proposition,
of type mc_rep_var. The mc_rep_var containts the mc_progvar, and a
goal_path.

The constructor for the mc_progvar type is mostly used in infix
notation, e.g. Progvar `in` PredId, where Progvar is a program variable
and PredId a predicate it occurs in. Infix notation is also used for the
mc_rep_var type, so the proposition that Progvar is produced at goal
path GoalPath in PredId is given by:
   ProgVar `in` PredId `at` GoalPath
(in its abstract mc_rep_var representation). A bimap, of type
mc_var_map is used to convert freely between these and the the solver
variables of type mc_var. The former is more readily understood by the
ordering phase and by the programmer, while the latter should be more
efficient for the number of times it will be inspected during constraint
solving.

The constraints placed on these variables are built up goal by goal, as
follows.

Conjunction:
If a program variable is produced at a conjunct, it is produced at the
conjunction as a whole. If a program variable is not produced at any
conjunct, it is not produced at the conjunction. A program variable can
be produced at at most one conjunct. The constraints for a parallel
conjunction have not been implemented.

Disjunction:
If a program variable is produced at some disjunct then all disjuncts
must be able to produce it in this mode, for we don't know which
disjunct will succeed first. If a program variable is produced at some
disjunct it is produced at the disjunction as a whole.

If Then Else:
Variables nonlocal to the condition cannot be produced by the condition,
since they are in a negated context if the else part gets executed.
The then- and else-parts together act as a disjunction. The condition
and the then-part together act as a conjunction.

Scope:
Variables produced at the sub-goal are produced at this path.

Unification:
For variable-variable unifications at most one of them is produced.
For variable-functor unifications, either the variable is produced, or
every argument of the functor is produced, but not both. The constraints
for higher-order unifications have not been implemented.

Predicate Call:
If the callee has mode declarations, then the constraints follow
immediately from the modes. Otherwise, an argument variable is produced
when the corresponding head variable of the callee is produced.

Generic Call:
The constraints for higher order calls have not been implemented.
The constraints for class method calls have not been implemented.
There are no constraints for a cast, as the value of no program variable
is produced or consumed.

Foreign Procedure:
The procedure should have a mode declaration, and should be handled like
a predicate call as such.

Negation:
No nonlocal is produced inside a negation.

No other goal type has constraints implemented for it.

Predicates with declared modes can be constrained by their mode
declarations - as with a predicate call the constraints follow
immediately from the modes.

As most of these constraints simplify quickly into conjunctions of
variable bindings as soon as even one of the variables involved is
bound, a propagation solver was implemented to solve them.

First, constraints are prepared in a structure of type prep_cstrts. Most
are simplified completely at this stage into a propagation graph - a
graph of variable bindings. Once a binding is made the graph can be
followed around to discover the immediately implied subsequent bindings.
A few of the constraints do not simplify immediately and are stored more
abstractly.

Once a system of constraints is ready for solving, all equivalences
between variables are taken out and represented as equivalence classes
of variables. A representative member of each class is used to replace
all other members of the same class in the constraints and the
constraints are simplified accordingly. The constraints are copied into
a solver_cstrts type data structure to mark that they are ready for
solving.

The solver works simply: as variables are bound, constraints they
participate in are checked - if any new bindings are implied they are
made and followed through, if any of the more complex constraints can be
simplified into the propagation graph, or into bindings, they are. Every
solution to the producer consumer constraints corresponds (but not
uniquely) to a mode for the predicate or SCC being analysed.

Once the producer consumer constraints have been solved, the results are
used to create the conjunct ordering constraints. The only ordering
constraint at this stage is a bindary ordering between conjuncts. They
are produced by noting that any conjunct that produces a program
variable must come before any conjunct that consumes it. These
constraints are used to build up a directed graph that acts as a partial
order between conjuncts in the conjunction being ordered.

The ordering constraints are solved with a simple topological sort.
The first step is to collect a list of all the conjuncts that are not
required to come after some unscheduled conjunct. Then, the one that
appeared first in the original ordering is selected and scheduled (i.e.
added to the ordered conjunction we are building up). The process is
then repeated for the remaining unscheduled conjuncts.

This is in keeping with the minimum reordering requirement, but is only
a local minimum with respect to a single solution to the
produce/consumer constraints.  That another producer/consumer solution
may result in less reordering is not checked (because it would be time
consuming - all producer/consumer solutions would have to be found).

Conjunction reordering is applied to predicates with declared modes,
but not yet to predicates that require mode inference.


The propagation solver implementation of constraints based mode analysis
is still incomplete, and has some quirks. For example, if a bug arises
that causes correct solutions to the producer consumer relationship to
be missed, this sometimes results in seemingly unending nondeterministic
behaviour of the propagation solver. This could mean that some
incorrectly moded programs will take a long time to analyse too,
although this has not been tested enough.

A first step towards alleviating this problem would be to incorporate
the ordering constraints into the producer/consumer constraints. This
has the potential to discard many computation paths quicker, but more
importantly would catch the cases where producer/consumer relationships
exist but an ordering for them doesn't a lot quicker as well. One of the
producer/consumer constraints for a conjunction is that a program
variable can be produced at at most one conjunct. As soon as that
conjunct is determined, the ordering constraints for that program
variable on that conjunct can be created. This means that loops in the
conjunct partial order could be detected immediately.

A rather time consuming but reasonably simple expansion of this mode
analysis pass would be the expansion to full subtyping. When this is
done, the producer/consumer constraints will no longer be on program
variables, but on positions in program variables. The transformation to
hhf used by the original constraints based mode analysis currently
builds the instantiation graph required for this, but also makes
requantification necessary, which would in turn obsolete the inst graph.
The way around this would be to separate the calculation of the inst
graph from any hlds transform, and then run it only after
requantification.

The calculation of the instantiation graph should also be used to
determine higher order insts, as these will be needed for creating
producer/consumer constraints for higher order calls. The constraints
for a higher order call will be simple - and much the same as the
constraints for any other call. However, the mode information will be
required, and this means the instantiation information of the higher
order variable must be known. This is not immediately available, but in
most cases is should be determinable by examining unifications, and the
mode declarations of predicates that the higher order variable is used
in the head of. The producer/consumer constraints for lambda
unifications should be easy and has not been done simply as a matter of
priority. Only the higher order variable should be produced by the
unification. Other nonlocals should never be.

Another reasonably straightforward extension will be mode inference. The
basic premise is to perform solve (in the least) the producer consumer
relationship for predicates with declared modes first (e.g., exported
predicates must have their modes declared). These will imply directly
the modes needed for the predicates to be mode inferred. In fact, their
producer/consumer solutions should contain bindings for the head
variables of the predicates to be mode inferred, due to the nature of
the call goal constraints. It is suggested that mode inference for a
required mode be performed as soon as it is known to be required, since
mode inference failure will require backtracking into the
producer/consumer constraints solution that required it. Furthermore,
mode inference should be performed for an entire SCC at once, as
allowing all posible combinations of modes within a SCC to be mode
inferred will result in very weak constraints and therefore a long
running time.

There are a number of small loose ends that remain to be tidied up,
including, but not limited to:
requantification failure is dealt with simply by a call to 'unexpected'
- but this may be reasonable;
constraints for switches and parallel conjuctions are not implemented -
this time the call is to 'sorry';
other such problems can be found by looking for XXX comments in the
code.

Once all this is done the constraints based mode analysis system should
be able to take over most of the work of the current mode analysis
system. After that, the next thing to consider is unique modes. There's
not much to say on the matter at this point, except this: when the
ordering constraints are solved at this point, they come to a stage
where they pick the earliest originally occuring conjunct out of a
number of valid choices. If a nondeterministic predicate like
list.member were to select the conjunct instead, it could (presumably) be
made to try the earliest occuring first, but then move on to others if
more complicated uniqueness constraints failed.





Notes:

1: Goal Paths, notation

A goal path is a list of steps you have to take from goal to sub-goal in
order to arrive a specific goal in the body of a predicate. For example,
consider the following append predicate:

append(Xs, Ys, Zs) :-
   (
       Xs = [],
       Zs = Ys
   ;
       Xs = [Xh|Xt],
       Zs = [Xh|Zt],
       append(Xt, Ys, Zt)
   ).

Then the call goal "append(Xt, Ys, Zt)" is the third conjunct of the
second disjunct of the body, so it is given the goal path [];d2;c3;.
Here [] represents starting at the head of the predicate, d2 taking the
second disjunct of that, etc. Similarly, the conjunction goal
(Xs = [], Zs = Ys) is at goal path [];d1;.

Here are the string representations used for a few different goals:
[]  The empty goal path; the predicate head.
cN  The Nth conjunct of a conjunction.
dN  The Nth disjunct of a disjunction.
?   The condition of an if-then-else.
t   The then-part of an if-then-else.
e   The else-part of an if-then-else.

Normally when writing out goal paths, the [] is dropped as it only ever
appears at the start, and in practice relative goal paths are rarely
used - only absolute ones. (Thus the two goal paths listed above would
appear d2;c3; and d1; respectively.)

The data structure for storing goal paths stores the goal path steps in
reverse order in a list - thus the symbol [] is automatically used for
the top level goal. The data structures themselves can be found in
goal_path.m in the compiler directory.

--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
Previous message: [m-dev.] Re: CBMA project
Next message: [m-dev.] semantics with any insts
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the developers mailing list