[m-users.] Determinism issue with parser

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Jul 5 01:44:31 AEST 2021


2021-07-04 23:17 GMT+10:00 "Sean Charles (emacstheviking)" <objitsu at gmail.com>:
> I am having terrible push—pull whack-a-mole issues with my parser and it is literally driving me to despair, I have read 6, 6.1 6,2 over and over until the words have become one long unintelligible mantra of confusion.
> 
> I thought —I— decided the determinism then wrote the code and the compiler moans at me to tell me I got it wrong.

You decide the *intended* determinism. The compiler tells you the *actual* determinism,
if it does not match the declared determinism.

> I can accept that. But I just don’t ‘get it’ yet. Case in point, of the of many errors I now have… …and
> once again I failed to represent my problem so nobody can help me. Perhaps I can ask some more
> general questions and then apply the answers to my brain and then to my code? Except I am so confused I can’t frame a question.
> 
> ast.m:040: In `on_lexer'(in, in, out):
> ast.m:040:   error: determinism declaration not satisfied.
> ast.m:040:   Declared `det', inferred `multi'.
> ast.m:115:   Call to `ast.parse_all'(in, out, in, out) can succeed more than
> ast.m:115:   once.
> ast.m:254: In `parse_body_term'(out, in, out):
> ast.m:254:   warning: determinism declaration could be tighter.
> ast.m:254:   Declared `nondet', inferred `multi'.
> ast.m:275: In `parse_fncall'(out, in, out):
> ast.m:275:   error: determinism declaration not satisfied.
> ast.m:275:   Declared `semidet', inferred `nondet'.
> ast.m:283:   Call to `ast.parse_body'(in, out, in, out) can succeed more than
> ast.m:283:   once.
> ast.m:431: In `sexp_or_error'(out, in, in, out):
> ast.m:431:   error: determinism declaration not satisfied.
> 
> So I change one and then recompile and it pops up somewhere else until eventually, the ‘det’ entry point into the module wants to be multi… sigh.

If by "one" you mean one determinism declaration, they why do you
"change one and recompile"?

Determinism is design level information. It says how many solutions you *want*
the predicate to be able to have. If you intend a predicate in a given mode to be det,
but the compiler tells you it can have more than solution, the answer is not to change
the predicate from det to multi, but to fix the *reason* why it can have more than
one solution.

If the top level predicate is declared to be det, but somewhere in its call tree
there is some code that can have more than one solution, then changing
determinism declarations changes only where the compiler detects and reports
the mismatch between the declared and inferred determinism. It is like squeezing
toothpaste in a tube. If the problem is "there should not be toothpaste (nondeterminism)
in this tube, but there is", then moving it around does not help; only squeezing it
completely out of the tube helps.

Look at the error message for line 40. It tells you that on_lexer, which you declared
to be det, is not det. The second part of that same error message tells you
*the reason* why it is not det: the fact that the call to parse_all on line 115
can succeed more than once. (You can tell that this a continuation of the
previous message, and not a new message, by the fact that it continues the
same indentation level.) The fix is either to change parse_all to be det,
or call parse_all in solutions(), getting all its solutions, and then choose
one somehow. Overall, the first is far more likely to be the right fix.

In general, when mmc tells you that a determinism declaration is not satisfied,
it will then give you the place and the nature of all the violations.

> How does the compiler know something is ‘multi’. I guess I don’t know how to interpret the manual properly.

Implicit in section 6.2 is the fact that two main things make something multi.

- disjunctions that are not switches, and
- calls to predicates in modes that are themselves multi.

In the case of on_lexer, the compiler tells you it is because
of the latter, on line 115.
 
> If all possible calls to a particular mode of a predicate or function which return to the caller (calls which terminate, do not throw an exception and do not cause a fatal runtime error)
> 
> 	• have exactly one solution, then that mode is deterministic (det);
> 	• either have no solutions or have one solution, then that mode is semideterministic
> (semidet);
> 	• have at least one solution but may have more, then that mode is multi-solution(multi);
> 	• have zero or more solutions, then that mode is nondeterministic (nondet);
> 	• fail without producing a solution, then that mode has a determinism of failure.

What you just wrote would be the case in an ideal world, where the halting problem
was solvable. In the world we live in, the halting problem is not solvable, and since
figuring out whether a predicate has exactly one solution requires solving the
halting problem, neither is determinism analysis is perfect. To see why, consider

  ( if turing_machine_halts(TM, In, Out) then
        Result = Out
  else
       ( Result = 42 ; Result = 43 )
  )

Since there can exist no perfect determinism inference algorithm, no matter how complex,
we designed our determinism inference algorithm to be as simple as possible while
still performing well on real code. We chose simplicity because a complex algorithm
is inherently non-predicable to humans. Hence the rules in 6.2.

> I parse a file into a stream of tokens, there can be only one correct way to parse those into a well formed program.

That statement is true for some grammars, but not others: ambiguous grammars
can have two or more parses for the same input string. In the absence of some code
to restrict the number of parses, and thus the number of ASTs, to one, the parser
for such a grammar will be multi, not det. It is possible that this is what happened here.

A grammar can be ambiguous for many reasons. For example, if-then-elses where
the else part can be omitted are inherently ambiguous. The standard example
of this dangling-else problem is

  if c1 then if c2 then t2 else ex

This can be parsed as either

  ( if c1 then (if c2 then t2 else ex))

or as

  (if c1 then (if c2 then t2) else ex)

The standard solution is to write the parser to *explicitly* choose one parse,
and thus one AST, over all possible others.

> One. So the ast builder either fails or it doesnt, that is trapped in the code and a maybe_error is handed back, hence I figure the predicate, as seen by the caller (the REPL atm) should be `det`.

That sounds correct.

> What does it ‘mean’ to be able to have more than one solution?

It means that your implementation has a bug, in that it does not match your design.
As mentioned above, the error message tells you where and what the proximate bug is.

> I am not really writing Prolog like code i.e. swathes of facts and rules being interrogated, 

A disjunction with two arms that can both succeed introduces nondeterminism,
even though it does not qualify as "swaths of facts".

In your place, I would

- look at every predicate in the parser, and give it what determinism
  I think it should have (not what determinism the compiler has inferred for it),

- and then fix the resulting errors *without* changing any more determinism
  declarations.

> If I had some good clean examples I think I’d be fine,

Unfortunately, I don't think we have any. Most of our sample programs
do not contain parsers. The ones that do, the parsers are either too simple
(and thus not useful as models), too complex (too hard to understand
to be models), or they parse something other than a list of lexemes.
The closest thing I can think of is parser.m in the Mercury standard
library, but I do warn you, it is significantly more complex than
what I would expect the parser for an average DSL to be.

I hope that helps.

Zoltan.


More information about the users mailing list