<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">If I may take the liberty, I'd like to paraphrase part of what</div><div class="gmail_default" style="font-family:monospace,monospace">I understand Zoltan to be saying:</div><div class="gmail_default" style="font-family:monospace,monospace"> - as a rule, you get a particular determinism in Mercury</div><div class="gmail_default" style="font-family:monospace,monospace"> because that is the determinism you *intend* to get,</div><div class="gmail_default" style="font-family:monospace,monospace"> - we programmers need to understand the connection between</div><div class="gmail_default" style="font-family:monospace,monospace"> what determinism we intend to get and how to get it,</div><div class="gmail_default" style="font-family:monospace,monospace"> - if the compiler's rules are too complex, *we* won't</div><div class="gmail_default" style="font-family:monospace,monospace"> understand them, and will be left poking hopelessly at</div><div class="gmail_default" style="font-family:monospace,monospace"> the code hoping the error message will go away.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">Don't let any of this discourage you.<br></div><div class="gmail_default" style="font-family:monospace,monospace">Try to prove Zoltan wrong.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">I will say that there are some application areas where high</div><div class="gmail_default" style="font-family:monospace,monospace">levels of compile-time checking are worth it. I am talking</div><div class="gmail_default" style="font-family:monospace,monospace">to some people about a multi-year nationally distributed embedded</div><div class="gmail_default" style="font-family:monospace,monospace">system, and I am trying to persuade them that the costs of</div><div style="font-family:monospace,monospace" class="gmail_default">bugs in *this* environment make SPARK+Ada the right choice at</div><div style="font-family:monospace,monospace" class="gmail_default">the edge, but it's hardly a typical application domain.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 20 Mar 2022 at 06:08, Zoltan Somogyi <<a href="mailto:zoltan.somogyi@runbox.com">zoltan.somogyi@runbox.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br> 2022-03-20 03:12 GMT+11:00 "Philip White" <<a href="mailto:philip@pswhite.org" target="_blank">philip@pswhite.org</a>>:<br> > I've been working in my spare time on a compiler for a relational<br> > programming language heavily inspired by mercury - it has strong and<br> > statically checked types, modes, and determinism.<br> <br> That is nice to hear.<br> <br> > The determinism checking approach of this language is totally different<br> > than Mercury's approach, and I believe it yields much more precise<br> > results, albeit via a more expensive algorithm. I'm curious if the<br> > compiler writers ever considered something like this. If not, would you<br> > every consider using it in the current mercury compiler?<br> <br> No, and no.<br> <br> We the implementors of the Mercury compiler consider that compilation<br> speed is very important in making a language practical. If, after changing<br> the code of a module, the process of rebuilding the executable takes long enough<br> for you to lose your focus, your productivity will suffer, and we don't want that.<br> <br> The Mercury determinism analysis algorithm was purposely designed to be simple.<br> Its simplicity does not only make it fast, it also makes it *easily understandable*.<br> >From time to time, we get proposals to extend it, e.g. by teaching it that if the<br> type of X contains only the function symbols f, g and h, then in the else case<br> of an if-then-else of the form "( if X = f then ... else ...)", X can only be g or h,<br> so a switch on X with only those two arms should be considered "det".<br> We have always turned such proposals down, not because of the impact<br> on compiler speed (which would probably be small), but because it would make<br> the rules of determinism analysis both harder to learn and harder to predict<br> even once learned.<br> <br> > Predicates written in this language go through a mode-resolution phase<br> > in which clauses in a conjunction are re-ordered so that variables are<br> > properly instantiated for each predicate that needs to be invoked; this<br> > phase also selects the mode to use at each invocation. It's after this<br> > phase that I run the determinism checker. I imagine the Mercury<br> > compiler does something pretty similar.<br> <br> Yes, it does. You can see it yourself by invoking mmc with the -v (verbose)<br> option, which tells the compiler to print a line for each phase of compilation.<br> You will see that type and mode analysis are followed by two passes<br> that gather information for determinism analysis, and then determinism<br> analysis itself.<br> <br> > Naively, we "execute" the predicate with each combination of<br> > constructors that each variable can have. This guarantees that we will<br> > hit every possible code-path of the predicate. The result of each<br> > execution is a determinism. For example, if the predicate finishes<br> > twice, then the result of that execution is "multi". The final<br> > determinism of the predicate is the join of all the execution results<br> > (i.e. the join operation of the determinism lattice). For example, if<br> > one path yields "multi" and another yields "failure", then the final<br> > determinism is "nondet".<br> > <br> > This naive approach would be prohibitively expensive for predicates<br> > with lots variables, especially if those variables have types with many<br> > constructors.<br> <br> And some types, such as integers and floats, have billions of constructors,<br> and strings even more.<br> <br> > We can improve it by initially treating every variable as<br> > abstract; every time the predicate attempts to pattern match a<br> > variable, we restart execution once for each of that variable's<br> > constructors. The algorithm is roughly the same, but we only end up<br> > running variable combinations on-demand, so we save a lot of work<br> > for variables that are merely moved around and never inspected.<br> <br> This approach multiplies the amount of work you have to do by *many*<br> orders of magnitude. Using techniques from areas such as model checking<br> or concolic testing, you may be able to get most of those orders of magnitude<br> back, but you would have to do a *lot* of work to achieve even 1% of the<br> speed of the Mercury determinism analysis algorithm.<br> <br> > There are a few details that I've glossed over, but this is the general<br> > idea. It's definitely a more expensive algorithm than something that is<br> > more syntax directed, but I think it will notice switch statements more<br> > often than Mercury does, and it will also determine that a set of<br> > variables is exhaustive more often than the Mercury compiler. (I make<br> > this claim mostly based on my understanding of the Mercury compiler<br> > from the documentation, rather than from experience; I bet I can<br> > furnish examples if you would like to see any, though)<br> <br> I believe the claim that you could get more precise determinisms<br> than the Mercury compiler. I don't believe that the effort will be worth it,<br> for a simple reason: the current Mercury determinism analysis algorithm,<br> simple as it is, is *already* smart enough in practice. It is very rare for me<br> to get an error message from the compiler about a determinism error<br> where that error is caused by the compiler not being smart enough to see<br> the actual determinism of the code. In virtually all cases, when the error<br> message says that the inferred determinism of this predicate is e.g. semidet,<br> then that will be the actual determinism of the code, because of a bug<br> in the actual code, such as a missing case in a switch. Very rarely, the cause<br> is a bug in the mode declaration; I forgot to tell the compiler that e.g.<br> this predicate expects its caller to specify a value for an argument<br> that makes unreachable the "missing" case in that switch. Having the cause of the<br> error message be a limitation in determinism analysis is even rarer,<br> and the only such limitation that I bump into more than about once a year<br> is that switch detection does not look into disjunctions deep enough.<br> That is a problem so rare you probably don't even know what I am talking about :-(<br> <br> > I'd love to hear what people think of this approach. I haven't yet<br> > tested it at large scale with huge predicates, so my biggest concern is<br> > that it will be really, really slow.<br> <br> I can tell you right away that I don't believe anyone can make your approach<br> fast enough for people to accept, simply because the threshold of how long<br> people are willing to wait for compilation is so low nowadays. Look at the<br> Rust community, which is struggling to make the speed of the Rust compiler acceptable,<br> despite the fact that they are using algorithms with much lower complexities<br> than what you are proposing.<br> <br> Sorry to be the bearer of bad news, but you did ask ...<br> <br> Zoltan.<br> _______________________________________________<br> users mailing list<br> <a href="mailto:users@lists.mercurylang.org" target="_blank">users@lists.mercurylang.org</a><br> <a href="https://lists.mercurylang.org/listinfo/users" rel="noreferrer" target="_blank">https://lists.mercurylang.org/listinfo/users</a><br> </blockquote></div>