[m-users.] A more precise determinism checker
Zoltan Somogyi
zoltan.somogyi at runbox.com
Sun Mar 20 04:07:59 AEDT 2022
2022-03-20 03:12 GMT+11:00 "Philip White" <philip at pswhite.org>:
> I've been working in my spare time on a compiler for a relational
> programming language heavily inspired by Mercury - it has strong and
> statically checked types, modes, and determinism.
That is nice to hear.
> The determinism checking approach of this language is totally different
> than Mercury's approach, and I believe it yields much more precise
> results, albeit via a more expensive algorithm. I'm curious if the
> compiler writers ever considered something like this. If not, would you
> ever consider using it in the current Mercury compiler?
No, and no.
We, the implementors of the Mercury compiler, consider compilation
speed to be very important in making a language practical. If, after changing
the code of a module, the process of rebuilding the executable takes long enough
for you to lose your focus, your productivity will suffer, and we don't want that.
The Mercury determinism analysis algorithm was purposely designed to be simple.
Its simplicity not only makes it fast, it also makes it *easily understandable*.
From time to time, we get proposals to extend it, e.g. by teaching it that if the
type of X contains only the function symbols f, g and h, then in the else case
of an if-then-else of the form "( if X = f then ... else ...)", X can only be g or h,
so a switch on X with only those two arms should be considered "det".
We have always turned such proposals down, not because of the impact
on compiler speed (which would probably be small), but because it would make
the rules of determinism analysis both harder to learn and harder to predict
even once learned.
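To make that concrete, here is a sketch of the kind of code involved
(the type and the predicate are made up purely for illustration):

    :- type t
        --->    f
        ;       g
        ;       h.

    :- pred p(t::in, int::out) is det.

    p(X, N) :-
        ( if X = f then
            N = 0
        else
            % In this else case, X can only be g or h, but since
            % determinism analysis does not propagate that fact,
            % it considers this two-arm switch to be incomplete,
            % and therefore reports that p cannot be det.
            (
                X = g,
                N = 1
            ;
                X = h,
                N = 2
            )
        ).

As the rules stand, you have to write code like this as a single
complete switch on X if you want the compiler to infer det for it.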
> Predicates written in this language go through a mode-resolution phase
> in which clauses in a conjunction are re-ordered so that variables are
> properly instantiated for each predicate that needs to be invoked; this
> phase also selects the mode to use at each invocation. It's after this
> phase that I run the determinism checker. I imagine the Mercury
> compiler does something pretty similar.
Yes, it does. You can see it yourself by invoking mmc with the -v (verbose)
option, which tells the compiler to print a line for each phase of compilation.
You will see that type and mode analysis are followed by two passes
that gather information for determinism analysis, and then determinism
analysis itself.
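(For concreteness: something like "mmc -v hello.m", for any module
hello.m of yours, will print those per-phase lines as it compiles.)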
> Naively, we "execute" the predicate with each combination of
> constructors that each variable can have. This guarantees that we will
> hit every possible code-path of the predicate. The result of each
> execution is a determinism. For example, if the predicate finishes
> twice, then the result of that execution is "multi". The final
> determinism of the predicate is the join of all the execution results
> (i.e. the join operation of the determinism lattice). For example, if
> one path yields "multi" and another yields "failure", then the final
> determinism is "nondet".
>
> This naive approach would be prohibitively expensive for predicates
> with lots of variables, especially if those variables have types with many
> constructors.
And some types, such as integers and floats, have billions of constructors,
and strings even more.
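As for the join itself: it becomes a one-liner if you represent each
determinism as a pair of properties, which is, in spirit, what the Mercury
compiler does internally. A sketch only; the names here are illustrative,
not the compiler's actual code:

    :- module detism_join.
    :- interface.

    :- type can_fail
        --->    cannot_fail
        ;       can_fail.

    :- type soln_count
        --->    at_most_zero
        ;       at_most_one
        ;       at_most_many.

        % det is detism(cannot_fail, at_most_one),
        % multi is detism(cannot_fail, at_most_many),
        % failure is detism(can_fail, at_most_zero), and so on.
    :- type detism
        --->    detism(can_fail, soln_count).

    :- func join(detism, detism) = detism.

    :- implementation.

        % The join takes the worst value of each component, so
        % join(multi, failure) = detism(can_fail, at_most_many),
        % i.e. nondet, as in your example.
    join(detism(CF1, SC1), detism(CF2, SC2)) =
        detism(join_can_fail(CF1, CF2), join_soln_count(SC1, SC2)).

    :- func join_can_fail(can_fail, can_fail) = can_fail.

    join_can_fail(cannot_fail, CF) = CF.
    join_can_fail(can_fail, _) = can_fail.

    :- func join_soln_count(soln_count, soln_count) = soln_count.

    join_soln_count(at_most_zero, SC) = SC.
    join_soln_count(at_most_one, at_most_zero) = at_most_one.
    join_soln_count(at_most_one, at_most_one) = at_most_one.
    join_soln_count(at_most_one, at_most_many) = at_most_many.
    join_soln_count(at_most_many, _) = at_most_many.

The join itself costs nothing; it is the number of execution results
being joined that blows up.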
> We can improve it by initially treating every variable as
> abstract; every time the predicate attempts to pattern match a
> variable, we restart execution once for each of that variable's
> constructors. The algorithm is roughly the same, but we only end up
> running variable combinations on-demand, so we save a lot of work
> for variables that are merely moved around and never inspected.
This approach multiplies the amount of work you have to do by *many*
orders of magnitude. Using techniques from areas such as model checking
or concolic testing, you may be able to get most of those orders of magnitude
back, but you would have to do a *lot* of work to achieve even 1% of the
speed of the Mercury determinism analysis algorithm.
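For what it is worth, the cheap part of your scheme, analysing a goal once
per constructor of the inspected variable and joining the per-case results,
is in essence just a fold over the join above (a fragment meant to sit in
the same illustrative module as the previous sketch; AnalyseCase stands for
whatever your analysis does for one case):

    :- import_module list.

    :- func split_on_var(list(C), (func(C) = detism)) = detism.

    split_on_var(Constructors, AnalyseCase) =
        list.foldl(
            (func(Cons, Detism0) = join(AnalyseCase(Cons), Detism0)),
            Constructors,
            % Start from the bottom of the lattice,
            % i.e. detism(cannot_fail, at_most_zero), aka erroneous.
            detism(cannot_fail, at_most_zero)).

The expensive part is everything else: such splits multiply with every
variable that is actually inspected, which is exactly where the orders
of magnitude above come from.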
> There are a few details that I've glossed over, but this is the general
> idea. It's definitely a more expensive algorithm than something that is
> more syntax directed, but I think it will notice switch statements more
> often than Mercury does, and it will also determine that a set of
> variables is exhaustive more often than the Mercury compiler. (I make
> this claim mostly based on my understanding of the Mercury compiler
> from the documentation, rather than from experience; I bet I can
> furnish examples if you would like to see any, though)
I believe the claim that you could infer more precise determinisms
than the Mercury compiler does. I don't believe that the effort would be worth it,
for a simple reason: the current Mercury determinism analysis algorithm,
simple as it is, is *already* smart enough in practice. It is very rare for me
to get an error message from the compiler about a determinism error
where that error is caused by the compiler not being smart enough to see
the actual determinism of the code. In virtually all cases, when the error
message says that the inferred determinism of this predicate is e.g. semidet,
that really is the determinism of the code, due to a bug
in the code itself, such as a missing case in a switch. Very rarely, the cause
is a bug in the mode declaration: I forgot to tell the compiler that e.g.
this predicate expects its caller to specify a value for an argument,
a value that makes the "missing" case in that switch unreachable. Having the cause of the
error message be a limitation in determinism analysis is even rarer,
and the only such limitation that I bump into more than about once a year
is that switch detection does not look deeply enough into disjunctions.
That is a problem so rare you probably don't even know what I am talking about :-(
> I'd love to hear what people think of this approach. I haven't yet
> tested it at large scale with huge predicates, so my biggest concern is
> that it will be really, really slow.
I can tell you right away that I don't believe anyone can make your approach
fast enough for people to accept, simply because the threshold of how long
people are willing to wait for compilation is so low nowadays. Look at the
Rust community, which is struggling to make the speed of the Rust compiler
acceptable, even though it uses algorithms with much lower complexity
than what you are proposing.
Sorry to be the bearer of bad news, but you did ask ...
Zoltan.