[m-users.] A more precise determinism checker
Richard O'Keefe
raoknz at gmail.com
Sun Mar 20 17:30:58 AEDT 2022
If I may take the liberty, I'd like to paraphrase part of what
I understand Zoltan to be saying:
- as a rule, you get a particular determinism in Mercury
because that is the determinism you *intend* to get,
- we programmers need to understand the connection between
what determinism we intend to get and how to get it,
- if the compiler's rules are too complex, *we* won't
understand them, and will be left poking hopelessly at
the code hoping the error message will go away.
Don't let any of this discourage you.
Try to prove Zoltan wrong.
I will say that there are some application areas where high
levels of compile-time checking are worth it. I am talking
to some people about a multi-year nationally distributed embedded
system, and I am trying to persuade them that the costs of
bugs in *this* environment make SPARK+Ada the right choice at
the edge, but it's hardly a typical application domain.
On Sun, 20 Mar 2022 at 06:08, Zoltan Somogyi <zoltan.somogyi at runbox.com>
wrote:
>
> 2022-03-20 03:12 GMT+11:00 "Philip White" <philip at pswhite.org>:
> > I've been working in my spare time on a compiler for a relational
> > programming language heavily inspired by mercury - it has strong and
> > statically checked types, modes, and determinism.
>
> That is nice to hear.
>
> > The determinism checking approach of this language is totally different
> > than Mercury's approach, and I believe it yields much more precise
> > results, albeit via a more expensive algorithm. I'm curious if the
> > compiler writers ever considered something like this. If not, would you
> > every consider using it in the current mercury compiler?
>
> No, and no.
>
> We the implementors of the Mercury compiler consider that compilation
> speed is very important in making a language practical. If, after changing
> the code of a module, the process of rebuilding the executable takes long
> enough
> for you to lose your focus, your productivity will suffer, and we don't
> want that.
>
> The Mercury determinism analysis algorithm was purposely designed to be
> simple.
> Its simplicity does not only make it fast, it also makes it *easily
> understandable*.
> From time to time, we get proposals to extend it, e.g. by teaching it that
> if the
> type of X contains only the function symbols f, g and h, then in the else
> case
> of an if-then-else of the form "( if X = f then ... else ...)", X can only
> be g or h,
> so a switch on X with only those two arms should be considered "det".
> We have always turned such proposals down, not because of the impact
> on compiler speed (which would probably be small), but because it would
> make
> the rules of determinism analysis both harder to learn and harder to
> predict
> even once learned.
>
> > Predicates written in this language go through a mode-resolution phase
> > in which clauses in a conjunction are re-ordered so that variables are
> > properly instantiated for each predicate that needs to be invoked; this
> > phase also selects the mode to use at each invocation. It's after this
> > phase that I run the determinism checker. I imagine the Mercury
> > compiler does something pretty similar.
>
> Yes, it does. You can see it yourself by invoking mmc with the -v (verbose)
> option, which tells the compiler to print a line for each phase of
> compilation.
> You will see that type and mode analysis are followed by two passes
> that gather information for determinism analysis, and then determinism
> analysis itself.
>
> > Naively, we "execute" the predicate with each combination of
> > constructors that each variable can have. This guarantees that we will
> > hit every possible code-path of the predicate. The result of each
> > execution is a determinism. For example, if the predicate finishes
> > twice, then the result of that execution is "multi". The final
> > determinism of the predicate is the join of all the execution results
> > (i.e. the join operation of the determinism lattice). For example, if
> > one path yields "multi" and another yields "failure", then the final
> > determinism is "nondet".
> >
> > This naive approach would be prohibitively expensive for predicates
> > with lots variables, especially if those variables have types with many
> > constructors.
>
> And some types, such as integers and floats, have billions of constructors,
> and strings even more.
>
> > We can improve it by initially treating every variable as
> > abstract; every time the predicate attempts to pattern match a
> > variable, we restart execution once for each of that variable's
> > constructors. The algorithm is roughly the same, but we only end up
> > running variable combinations on-demand, so we save a lot of work
> > for variables that are merely moved around and never inspected.
>
> This approach multiplies the amount of work you have to do by *many*
> orders of magnitude. Using techniques from areas such as model checking
> or concolic testing, you may be able to get most of those orders of
> magnitude
> back, but you would have to do a *lot* of work to achieve even 1% of the
> speed of the Mercury determinism analysis algorithm.
>
> > There are a few details that I've glossed over, but this is the general
> > idea. It's definitely a more expensive algorithm than something that is
> > more syntax directed, but I think it will notice switch statements more
> > often than Mercury does, and it will also determine that a set of
> > variables is exhaustive more often than the Mercury compiler. (I make
> > this claim mostly based on my understanding of the Mercury compiler
> > from the documentation, rather than from experience; I bet I can
> > furnish examples if you would like to see any, though)
>
> I believe the claim that you could get more precise determinisms
> than the Mercury compiler. I don't believe that the effort will be worth
> it,
> for a simple reason: the current Mercury determinism analysis algorithm,
> simple as it is, is *already* smart enough in practice. It is very rare
> for me
> to get an error message from the compiler about a determinism error
> where that error is caused by the compiler not being smart enough to see
> the actual determinism of the code. In virtually all cases, when the error
> message says that the inferred determinism of this predicate is e.g.
> semidet,
> then that will be the actual determinism of the code, because of a bug
> in the actual code, such as a missing case in a switch. Very rarely, the
> cause
> is a bug in the mode declaration; I forgot to tell the compiler that e.g.
> this predicate expects its caller to specify a value for an argument
> that makes unreachable the "missing" case in that switch. Having the cause
> of the
> error message be a limitation in determinism analysis is even rarer,
> and the only such limitation that I bump into more than about once a year
> is that switch detection does not look into disjunctions deep enough.
> That is a problem so rare you probably don't even know what I am talking
> about :-(
>
> > I'd love to hear what people think of this approach. I haven't yet
> > tested it at large scale with huge predicates, so my biggest concern is
> > that it will be really, really slow.
>
> I can tell you right away that I don't believe anyone can make your
> approach
> fast enough for people to accept, simply because the threshold of how long
> people are willing to wait for compilation is so low nowadays. Look at the
> Rust community, which is struggling to make the speed of the Rust compiler
> acceptable,
> despite the fact that they are using algorithms with much lower
> complexities
> than what you are proposing.
>
> Sorry to be the bearer of bad news, but you did ask ...
>
> Zoltan.
> _______________________________________________
> users mailing list
> users at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/users/attachments/20220320/07fcbc16/attachment.html>
More information about the users
mailing list