<div dir="auto">Hi,<div dir="auto">The latest conversation between emacstheviking and Zoltan about how the compiler handles determinism in disjuncts is quite illuminating and useful to learners, I think. I went through all this about a year ago when I began learning Mercury more seriously and I think all learners do or will experience such situations.</div><div dir="auto">I think it would be a Good Idea to add a separate practical example to J. Fondren's excellent crash course to drive the point home.</div><div dir="auto">Perhaps it would also help to strongly underline that the way Mercury works in disjuncts is a design consequence of the elimination of prolog cuts, and not just a form of compiler blindness. I now that prolog is not the core interest of the Mercury dev team but practically people tackling Mercury are coming from a prolog experience and the prolog to Mercury transition guide is a bit short on practical issues in this respect. </div><div dir="auto">Having read R. O'Keefe's The Craft of Prolog in the past does help (Mercury's design is a direct response to prolog shortcomings reviewed in this book), but not all learners read it and a few more Mercury examples would not harm.</div><div dir="auto">Fabrice</div><div dir="auto"> </div></div> <div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le dim. 4 juil. 2021 à 7:34 PM, <<a href="mailto:users-request@lists.mercurylang.org">users-request@lists.mercurylang.org</a>> a écrit : </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send users mailing list submissions to <a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a> To subscribe or unsubscribe via the World Wide Web, visit <a href="https://lists.mercurylang.org/listinfo/users" rel="noreferrer noreferrer" target="_blank">https://lists.mercurylang.org/listinfo/users</a> or, via email, send a message with subject or body 'help' to <a href="mailto:users-request@lists.mercurylang.org" target="_blank" rel="noreferrer">users-request@lists.mercurylang.org</a> You can reach the person managing the list at <a href="mailto:users-owner@lists.mercurylang.org" target="_blank" rel="noreferrer">users-owner@lists.mercurylang.org</a> When replying, please edit your Subject line so it is more specific than "Re: Contents of users digest..." Today's Topics: 1. Re: Determinism issue with parser (Sean Charles (emacstheviking)) 2. Re: Determinism issue with parser (Zoltan Somogyi) 3. Re: Determinism issue with parser (Volker Wysk) 4. Re: Determinism issue with parser (Zoltan Somogyi) ---------------------------------------------------------------------- Message: 1 Date: Sun, 4 Jul 2021 17:31:59 +0100 From: "Sean Charles (emacstheviking)" <<a href="mailto:objitsu@gmail.com" target="_blank" rel="noreferrer">objitsu@gmail.com</a>> To: Zoltan Somogyi <<a href="mailto:zoltan.somogyi@runbox.com" target="_blank" rel="noreferrer">zoltan.somogyi@runbox.com</a>> Cc: users <<a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a>> Subject: Re: [m-users.] Determinism issue with parser Message-ID: <<a href="mailto:F5C1D814-E1A5-46C5-BE48-E938C47B7C7E@gmail.com" target="_blank" rel="noreferrer">F5C1D814-E1A5-46C5-BE48-E938C47B7C7E@gmail.com</a>> Content-Type: text/plain; charset="utf-8" Zoltan, your replies have always helped and continue to do so. If by "one" you mean one determinism declaration, they why do you "change one and recompile"? The answer; simple frustration at being fully cognisant of my own ignorance, that’s why. I —understand— the reasons why it is telling me I am wrong but then chaos seems to erupt inside my head. It’s —my— decision as to the determinism of the code I write, IIUIC that’s the point of declaring it, then the compiler tells me otherwise when it knows better (often). As for using solutions/1, it crossed my mind but for me that’s a biiig code smell and I refused to do it. Part of my confusion is understanding what it means to have more than one solution if the inputs are the same?!?! How can that be? For example, in this predicate where I check for a body form: :- pred parse_body_term(snode::out, astcon::in, astcon::out) is multi. parse_body_term(Term, !A) :- ( parse_atom(Term, !A) ; parse_list(Term, !A) ; parse_map(Term, !A) ; parse_feltcall(Term, !A) ; parse_fncall(Term, !A) ; parse_sexp(Term, !A) ). I thought that this was semidet because either one of those disjuncts succeed or they all fail, I don’t understand why it says multi, multi is ‘can’t fail, >1 solutions’ but if I consume tokens in the parse_map() disjunct branch, how can it then come back for more? I just don’t get it, that’s something that really confuses me at the moment. Hence my question ‘what does it mean to have more than one solution?’. det: for example ‘*’, given x*y, the result is deterministic. semidet: for example, given x/y, if y is nonzero the result is deterministic, if y is zero presumably it fails. pedagogical usage here that’s all. for me the confusion starts with `multi`, to have more than one solution surely it has to be called more than once with different inputs or is stepping through a set of facts? I think if I had managed to get mdb to work it might help to single step the code but I tried for a few hours and emacs version was wrong, lots of problems etc so I went back to ’thinking’ about it and trace[] all over the place. Not a good place to be at the moment but things worthwhile are never easy to attain. Thanks. > On 4 Jul 2021, at 16:56, Zoltan Somogyi <<a href="mailto:zoltan.somogyi@runbox.com" target="_blank" rel="noreferrer">zoltan.somogyi@runbox.com</a>> wrote: > > > 2021-07-05 01:21 GMT+10:00 "Volker Wysk" <<a href="mailto:post@volker-wysk.de" target="_blank" rel="noreferrer">post@volker-wysk.de</a>>: >> I've built a big parser for my project, too. It should always produce one >> solution; it can't fail. But the compiler can't infer that it is >> deterministic, for many parts of it. (That would be undecidable.) >> >> So I've resigned to this state of affairs. I just collect all the (exactly >> one) solutions with solutions/2 and produce an error message in case there >> should be more than one, which would be a bug in my parser. > > That works, but I would consider it fragile. The reason is that convincing > someone (the next maintainer of the parser, perhaps) that there will be > exactly one overall solution requires a single correctness argument over > the entire code, and humans are not very reliable as reasoning machines; > the longer the argument, the higher the probability of error. > > I would instead add disambiguation code to the places in the parser > than can introduce (as far as the compiler is concerned) the possibility > of nondeterminism. You can you can see why there is no such possibility; > I would try to make the compiler that too. In most cases, this is not too hard, > at least in my experience. > > This approach replaces a single bug, global correctness argument with lots of > small, local correctness arguments, which can be understood, documented, > and if need be fixed, much more easily. Modularity in correctness arguments, > if you will. > > Zoltan. -------------- next part -------------- An HTML attachment was scrubbed... URL: <<a href="http://lists.mercurylang.org/archives/users/attachments/20210704/043b2328/attachment-0001.html" rel="noreferrer noreferrer" target="_blank">http://lists.mercurylang.org/archives/users/attachments/20210704/043b2328/attachment-0001.html</a>> ------------------------------ Message: 2 Date: Mon, 05 Jul 2021 03:09:42 +1000 (AEST) From: "Zoltan Somogyi" <<a href="mailto:zoltan.somogyi@runbox.com" target="_blank" rel="noreferrer">zoltan.somogyi@runbox.com</a>> To: "Sean Charles" <<a href="mailto:objitsu@gmail.com" target="_blank" rel="noreferrer">objitsu@gmail.com</a>> Cc: users <<a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a>> Subject: Re: [m-users.] Determinism issue with parser Message-ID: <E1m05ck-0002LB-Kd@rmmprod06.runbox> Content-Type: text/plain; charset="utf-8" 2021-07-05 02:31 GMT+10:00 "Sean Charles (emacstheviking)" <<a href="mailto:objitsu@gmail.com" target="_blank" rel="noreferrer">objitsu@gmail.com</a>>: > Part of my confusion is understanding what it means to have more than one solution if the inputs are the same?!?! I am not sure what you are trying to say with "if the inputs are the same". Determinism is about the number of solutions you can get *for a given set of input values*. > How can that be? For example, in this predicate where I check for a body form: > > :- pred parse_body_term(snode::out, astcon::in, astcon::out) is multi. > > parse_body_term(Term, !A) :- > ( > parse_atom(Term, !A) > ; > parse_list(Term, !A) > ; > parse_map(Term, !A) > ; > parse_feltcall(Term, !A) > ; > parse_fncall(Term, !A) > ; > parse_sexp(Term, !A) > ). > > I thought that this was semidet because either one of those disjuncts succeed or they all fail, You may know that, but the compiler does not. As I mentioned in my previous email, the compiler does not even *try* to figure out whether the success of one of these calls rules out the success of the other calls. Since I presume none of the called predicates are declared to have determinism "failure" or "erroneous", it simply assumes that they can all succeed. As far as the compiler is concerned, any given call to parse_body_term can generate six solutions, one from each disjunct, even if each call generates a maximum of one solution. > I don’t understand why it says multi, multi is ‘can’t fail, >1 solutions’ but if I consume tokens > in the parse_map() disjunct branch, how can it then come back for more? There are two answers to that. First, a call to parse_x, whatever x is, may consume tokens in the sense of stepping over them, but this has effect only during forward execution. Upon backtracking, the value of !.A that the next disjunct will start with *will* be the same as its original value on entry to the disjunction, and thus will contain the "consumed" tokens as well as all the following tokens. Second, determinism analysis does not know anything about tokens, consumed or not, and would not care if it did. The predicates called in the later disjuncts have declared modes that say they can succeed, so *regardless of anything else*, determinism analysis assumes they can succeed. That's it. Determinism analysis of disjunctions is *that* simple. > I just don’t get it, that’s something that really confuses me at the moment. Hence my question ‘what does it mean to have more than one solution?’. Section 6.2 gives the *complete* algorithm. Your problem is that you attributing to determinism analysis powers and reasoning skills it simply does not have. In the case of the above disjunction, if there is a later disjunct than e.g. the one that calls parse_atom, then the compiler will assume it *can* succeed and generate a solution. That's it; there is nothing more. Determinism analysis does not know, nor does it care, whether *you* know, or at least believe, that no later disjunct would generate a solution. > det: for example ‘*’, given x*y, the result is deterministic. > semidet: for example, given x/y, if y is nonzero the result is deterministic, if y is zero presumably it fails. pedagogical usage here that’s all. I have no idea what you are trying to say here. In any case, integer division is det, though it throws an exception if the divisor is zero. > for me the confusion starts with `multi`, to have more than one solution surely it has to be called more than once with different inputs No, that is wrong. Determinism is about how many solutions you can get for *one* call. Given a predicate p(in, out) whose definition is p(1, 11). p(1, 12). the one call p(1, X) will generate two solutions, X = 11 and X = 12. > or is stepping through a set of facts? Determinism analysis does not know, or care, whether a solution comes from a fact or not. > I think if I had managed to get mdb to work You can't run mdb until your program has been compiled, and it won't compile until you fix all the determinism errors. I see two ways to fix your code above. The easier but worse approach would to write something like parse_body_term(Term, !A) :- ( if parse_atom(TermPrime, !A) then Term = TermPrime else if parse_list(TermPrime, !A) then Term = TermPrime else if parse_map(Term, !A) then ... else parse_sexp(Term, !A) ). This will tell the compiler that if you can find e.g. a list in the input, then you want to keep the list, and not look for any more solutions. The harder but better solution is available only if the sets of tokens that atoms, lists, maps etc can start with is disjoint. If that is the case, then I would write parse_body_term as (a) get the first token, then (b) switch on that token. The second approach is always available in LL(1) grammars, but of course not all grammars are LL(1). Zoltan. ------------------------------ Message: 3 Date: Sun, 04 Jul 2021 19:21:11 +0200 From: Volker Wysk <<a href="mailto:post@volker-wysk.de" target="_blank" rel="noreferrer">post@volker-wysk.de</a>> To: <a href="mailto:zoltan.somogyi@runbox.com" target="_blank" rel="noreferrer">zoltan.somogyi@runbox.com</a>, Sean Charles <<a href="mailto:objitsu@gmail.com" target="_blank" rel="noreferrer">objitsu@gmail.com</a>>, users <<a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a>> Subject: Re: [m-users.] Determinism issue with parser Message-ID: <<a href="mailto:ecfa13776f3097af6bfcb6e85d2a4db79a5a1488.camel@volker-wysk.de" target="_blank" rel="noreferrer">ecfa13776f3097af6bfcb6e85d2a4db79a5a1488.camel@volker-wysk.de</a>> Content-Type: text/plain; charset="utf-8" Am Montag, den 05.07.2021, 01:56 +1000 schrieb Zoltan Somogyi: > 2021-07-05 01:21 GMT+10:00 "Volker Wysk" <<a href="mailto:post@volker-wysk.de" target="_blank" rel="noreferrer">post@volker-wysk.de</a>>: > > I've built a big parser for my project, too. It should always produce one > > solution; it can't fail. But the compiler can't infer that it is > > deterministic, for many parts of it. (That would be undecidable.) > > > > So I've resigned to this state of affairs. I just collect all the (exactly > > one) solutions with solutions/2 and produce an error message in case there > > should be more than one, which would be a bug in my parser. > > That works, but I would consider it fragile. The reason is that convincing > someone (the next maintainer of the parser, perhaps) that there will be > exactly one overall solution requires a single correctness argument over > the entire code, and humans are not very reliable as reasoning machines; > the longer the argument, the higher the probability of error. The parser has fallbacks built in. A paragraph that isn't something else, is a text paragraph. An inline (something which occurs inside a paragraph) that isn't something special, is a simple word. A bug, which would cause a special case not to be recognized, causes the respective text to end up in the fallback data. The special cases are all in if-then-else-if-then-else etc. constructs. So there can be only be one matching. It's a parser for the markup language of a wiki engine. I think this is robust because of the language being parsed is like that. But I'd love to have the compiler understand the fact that it's deterministic. :-) > I would instead add disambiguation code to the places in the parser > than can introduce (as far as the compiler is concerned) the possibility > of nondeterminism. You can you can see why there is no such possibility; > I would try to make the compiler that too. In most cases, this is not too hard, > at least in my experience. Hmm... I use my meta-predicate "repeatmax0" (and some variations of it) a lot. But see further down first, you don't really need all the details: % Something repeated zero to several times. Return the collected return % values as a list. :- pred repeat0(pred(T, list(char), list(char)), list(T), list(char), list(char)). :- mode repeat0(pred(out, in, out) is det, out, in, out) is multi. :- mode repeat0(pred(out, in, out) is semidet, out, in, out) is multi. :- mode repeat0(pred(out, in, out) is multi, out, in, out) is multi. :- mode repeat0(pred(out, in, out) is nondet, out, in, out) is multi. % The longest sequence of repetitions of the specified thing, which returns % something. Its return values are returned as a list. Succeeds for zero % repetitions (with an empty list). :- pred repeatmax0(pred(T, list(char), list(char)), list(T), list(char), list(char)). :- mode repeatmax0(pred(out, in, out) is det, out, in, out) is nondet. :- mode repeatmax0(pred(out, in, out) is semidet, out, in, out) is nondet. :- mode repeatmax0(pred(out, in, out) is multi, out, in, out) is nondet. :- mode repeatmax0(pred(out, in, out) is nondet, out, in, out) is nondet. The compiler told me that repeatmax0 is nondet. The implementation is simple: repeat0(_, []) --> []. repeat0(Pars, Values) --> { Values = [Val|Values1] }, Pars(Val), repeat0(Pars, Values1). repeatmax0(Pars, Values) --> repeat0(Pars, Values), % XXX ( if not Pars(_) then [] else end ). And there's the parser for the end of the input: :- pred end(list(char)::in, list(char)::out) is semidet. end([], []). The compiler told me, that the call to repeat0 (marked with "% XXX" above) can succeed more than once, which is correct. Taking your hint about disambiguation code, I was able to reformulate the repeatmax0 predicate such that, repeatmax0 becomes deterministic for a deterministic input parser: :- mode repeatmax0(pred(out, in, out) is det, out, in, out) is det. repeatmax0(Pars, Values) --> ( if Pars(Val) then { Values = [Val|Vals] }, repeatmax0(Pars, Vals) else { Values = [] } ). I'm still not sure what you mean by "disambiguation code", though. > This approach replaces a single bug, global correctness argument with lots of > small, local correctness arguments, which can be understood, documented, > and if need be fixed, much more easily. Modularity in correctness arguments, > if you will. Thanks! Volker -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: <<a href="http://lists.mercurylang.org/archives/users/attachments/20210704/672d4de7/attachment-0001.sig" rel="noreferrer noreferrer" target="_blank">http://lists.mercurylang.org/archives/users/attachments/20210704/672d4de7/attachment-0001.sig</a>> ------------------------------ Message: 4 Date: Mon, 05 Jul 2021 03:34:16 +1000 (AEST) From: "Zoltan Somogyi" <<a href="mailto:zoltan.somogyi@runbox.com" target="_blank" rel="noreferrer">zoltan.somogyi@runbox.com</a>> To: "Volker Wysk" <<a href="mailto:post@volker-wysk.de" target="_blank" rel="noreferrer">post@volker-wysk.de</a>>, "Sean Charles" <<a href="mailto:objitsu@gmail.com" target="_blank" rel="noreferrer">objitsu@gmail.com</a>>, "users" <<a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a>> Subject: Re: [m-users.] Determinism issue with parser Message-ID: <E1m060W-0004Bs-VW@rmmprod06.runbox> Content-Type: text/plain; charset="utf-8" 2021-07-05 03:21 GMT+10:00 "Volker Wysk" <<a href="mailto:post@volker-wysk.de" target="_blank" rel="noreferrer">post@volker-wysk.de</a>>: > The compiler told me that repeatmax0 is nondet. The implementation is > simple: > > > repeat0(_, []) --> []. > > repeat0(Pars, Values) --> > { Values = [Val|Values1] }, > Pars(Val), > repeat0(Pars, Values1). > > repeatmax0(Pars, Values) --> > repeat0(Pars, Values), % XXX > ( if not Pars(_) > then [] > else end > ). This is nondet because the compiler can't see any reason why the second and third clauses could not both generate a solution. > Taking your hint about disambiguation code, I was able to reformulate the > repeatmax0 predicate such that, repeatmax0 becomes deterministic for a > deterministic input parser: > > > :- mode repeatmax0(pred(out, in, out) is det, out, in, out) is det. > > repeatmax0(Pars, Values) --> > ( if Pars(Val) > then { Values = [Val|Vals] }, > repeatmax0(Pars, Vals) > else { Values = [] } > ). > > > I'm still not sure what you mean by "disambiguation code", though. I meant any code that tells the compiler that there will be only one solution even though its old version would be able to generate more than one solution as far as the compiler is concerned. Your code immediately above does this: by replacing the original second and third clauses (an implicit disjunction) with an if-then-else, you told the compiler that there will be a solution from either then then-part or the else-part, BUT NOT BOTH. So it seems you understood my hint sufficiently well, even if you did not think so :-) Zoltan. ------------------------------ Subject: Digest Footer _______________________________________________ users mailing list <a href="mailto:users@lists.mercurylang.org" target="_blank" rel="noreferrer">users@lists.mercurylang.org</a> <a href="https://lists.mercurylang.org/listinfo/users" rel="noreferrer noreferrer" target="_blank">https://lists.mercurylang.org/listinfo/users</a> ------------------------------ End of users Digest, Vol 83, Issue 2 ************************************ </blockquote></div>