[m-rev.] for review: don't allow nondefault mode functions in terms

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Nov 2 02:08:31 AEDT 2015



On Sun, 1 Nov 2015 14:36:03 +1100, Mark Brown <mark at mercurylang.org> wrote:
> The ordering here is in part an information theoretic one, but the
> instantiatedness tree, at least as described in the ref man, doesn't
> include all the relevant information.

Of course it doesn't. It can't, because if it did, noone would
understand it.

I have long said that the reason why the code that handles insts
is so complex is because insts are complex. The inst of a variable
at a given program point encodes *all* of the following information:

- Is the program point reachable?
- Which nodes of the type tree of the variable are bound?
- Of the nodes which are bound, which are unique or clobbered?
- Of the nodes which are bound, which are known to be bound to
   only a subset of the function symbols of the type of the affected
   type tree node?
- Of the nodes which are bound, and which correspond to a higher
   order type, what is the determinism and what are the modes
   of the arguments of that higher order value?

Each one of those can be handled reasonably easily. It is their
combination that leads to the explosion of complexity.

The term "ground" clearly* answers the second question: all nodes
are bound. The different meanings of "ground" differ in what
restrictions, if any, they impose on the answers to the third, fourth
and fifth. Just by combining the answers to "do we care about each
of those three kinds of distinctions or not", we have 2^3=8 possible
meanings of "ground". And they are NOT quite independent; in code
such as valid/ho_func_call.m, some function symbols of a type
have higher order arguments while other function symbols do not,
which couples the fourth and fifth questions together, in a way.

The first question is the easiest to handle, in that its answer depends
only on the program point, and not the variable. Yet answers to the
third question for a variable at a program point can affect whether
a later program point is reachable or not (by affecting which tests
may succeed and which may not), and answers to the first question
affect all the others (since branches of e.g. disjunctions whose endpoint
is unreachable don't have to bind variables that are bound by the
other branches), so the first question cannot be considered separately
from the others either :-(

* (Actually, even this isn't TOTALLY clear. If you have an inst i1
that says X is either f(ground), g(ground) or h(free), but then you
learn that X can only be bound to f or g, do you consider the new inst
to be "ground"? All the type tree nodes that may exist in *X* are bound,
but some nodes in the type tree of the *type of X* are not.)

> As I said, I
> haven't reached my pain threshold yet.

Has the above helped you reach it?

I think that in the medium term, the best solution to this problem
would be to take the answer to the fourth question out of insts
and to put them into types instead. It would be an aspect of types
that would have to checked during mode analysis, not type checking.
This would complicate the relationship between the two analyses,
which is why it wasn't done that way originally. However, if it was
done correctly, it would guarantee that the information cannot
get lost. With that system, if you took a higher order value out
of a term, its type would tell you its signature, even if the term
was (at some intermediate point in time) polymorphic.

Unfortunately, this proposal has been a "medium term solution"
for way more than a decade now :-(

Zoltan.





More information about the reviews mailing list