[m-users.] Determinism issue with parser
Zoltan Somogyi
zoltan.somogyi at runbox.com
Mon Jul 5 03:09:42 AEST 2021
2021-07-05 02:31 GMT+10:00 "Sean Charles (emacstheviking)" <objitsu at gmail.com>:
> Part of my confusion is understanding what it means to have more than one solution if the inputs are the same?!?!
I am not sure what you are trying to say with "if the inputs are the same".
Determinism is about the number of solutions you can get *for a given set of input values*.
> How can that be? For example, in this predicate where I check for a body form:
>
> :- pred parse_body_term(snode::out, astcon::in, astcon::out) is multi.
>
> parse_body_term(Term, !A) :-
> (
> parse_atom(Term, !A)
> ;
> parse_list(Term, !A)
> ;
> parse_map(Term, !A)
> ;
> parse_feltcall(Term, !A)
> ;
> parse_fncall(Term, !A)
> ;
> parse_sexp(Term, !A)
> ).
>
> I thought that this was semidet because either one of those disjuncts succeed or they all fail,
You may know that, but the compiler does not. As I mentioned in my previous email,
the compiler does not even *try* to figure out whether the success of one of these calls
rules out the success of the other calls. Since I presume none of the called predicates
are declared to have determinism "failure" or "erroneous", it simply assumes that
they can all succeed. As far as the compiler is concerned, any given call to parse_body_term
can generate six solutions, one from each disjunct, even if each call generates a maximum of one solution.
> I don’t understand why it says multi, multi is ‘can’t fail, >1 solutions’ but if I consume tokens
> in the parse_map() disjunct branch, how can it then come back for more?
There are two answers to that.
First, a call to parse_x, whatever x is, may consume tokens in the sense of stepping over them,
but this has effect only during forward execution. Upon backtracking, the value of !.A that the
next disjunct will start with *will* be the same as its original value on entry to the disjunction,
and thus will contain the "consumed" tokens as well as all the following tokens.
Second, determinism analysis does not know anything about tokens, consumed or not,
and would not care if it did. The predicates called in the later disjuncts have declared modes
that say they can succeed, so *regardless of anything else*, determinism analysis assumes
they can succeed. That's it. Determinism analysis of disjunctions is *that* simple.
> I just don’t get it, that’s something that really confuses me at the moment. Hence my question ‘what does it mean to have more than one solution?’.
Section 6.2 gives the *complete* algorithm.
Your problem is that you attributing to determinism analysis
powers and reasoning skills it simply does not have.
In the case of the above disjunction, if there is a later disjunct than e.g. the one that calls parse_atom,
then the compiler will assume it *can* succeed and generate a solution. That's it; there is
nothing more. Determinism analysis does not know, nor does it care, whether *you* know,
or at least believe, that no later disjunct would generate a solution.
> det: for example ‘*’, given x*y, the result is deterministic.
> semidet: for example, given x/y, if y is nonzero the result is deterministic, if y is zero presumably it fails. pedagogical usage here that’s all.
I have no idea what you are trying to say here. In any case, integer division is det, though it throws
an exception if the divisor is zero.
> for me the confusion starts with `multi`, to have more than one solution surely it has to be called more than once with different inputs
No, that is wrong. Determinism is about how many solutions you can get for *one* call.
Given a predicate p(in, out) whose definition is
p(1, 11).
p(1, 12).
the one call p(1, X) will generate two solutions, X = 11 and X = 12.
> or is stepping through a set of facts?
Determinism analysis does not know, or care, whether a solution comes from a fact or not.
> I think if I had managed to get mdb to work
You can't run mdb until your program has been compiled, and it won't compile until
you fix all the determinism errors.
I see two ways to fix your code above. The easier but worse approach would to write
something like
parse_body_term(Term, !A) :-
( if parse_atom(TermPrime, !A) then
Term = TermPrime
else if parse_list(TermPrime, !A) then
Term = TermPrime
else if parse_map(Term, !A) then
...
else
parse_sexp(Term, !A)
).
This will tell the compiler that if you can find e.g. a list in the input,
then you want to keep the list, and not look for any more solutions.
The harder but better solution is available only if the sets of tokens
that atoms, lists, maps etc can start with is disjoint. If that is the case,
then I would write parse_body_term as (a) get the first token, then
(b) switch on that token.
The second approach is always available in LL(1) grammars, but
of course not all grammars are LL(1).
Zoltan.
More information about the users
mailing list