[m-users.] Determinism issue with parser

Volker Wysk post at volker-wysk.de
Mon Jul 5 03:21:11 AEST 2021


Am Montag, den 05.07.2021, 01:56 +1000 schrieb Zoltan Somogyi:
> 2021-07-05 01:21 GMT+10:00 "Volker Wysk" <post at volker-wysk.de>:
> > I've built a big parser for my project, too. It should always produce one
> > solution; it can't fail. But the compiler can't infer that it is
> > deterministic, for many parts of it. (That would be undecidable.) 
> > 
> > So I've resigned to this state of affairs. I just collect all the (exactly
> > one) solutions with solutions/2 and produce an error message in case there
> > should be more than one, which would be a bug in my parser.
> 
> That works, but I would consider it fragile. The reason is that convincing
> someone (the next maintainer of the parser, perhaps) that there will be
> exactly one overall solution requires a single correctness argument over
> the entire code, and humans are not very reliable as reasoning machines;
> the longer the argument, the higher the probability of error.

The parser has fallbacks built in. A paragraph that isn't something else, is
a text paragraph. An inline (something which occurs inside a paragraph) that
isn't something special, is a simple word. A bug, which would cause a
special case not to be recognized, causes the respective text to end up in
the fallback data.

The special cases are all in if-then-else-if-then-else etc. constructs. So
there can be only be one matching. It's a parser for the markup language of
a wiki engine. I think this is robust because of the language being parsed
is like that. 

But I'd love to have the compiler understand the fact that it's
deterministic. :-)

> I would instead add disambiguation code to the places in the parser
> than can introduce (as far as the compiler is concerned) the possibility
> of nondeterminism. You can you can see why there is no such possibility;
> I would try  to make the compiler that too. In most cases, this is not too hard,
> at least in my experience.

Hmm... I use my meta-predicate "repeatmax0" (and some variations of it) a
lot. But see further down first, you don't really need all the details:


% Something repeated zero to several times. Return the collected return 
% values as a list.

:- pred repeat0(pred(T, list(char), list(char)), list(T), 
                list(char), list(char)).
:- mode repeat0(pred(out, in, out) is det,     out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is semidet, out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is multi,   out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is nondet,  out, in, out) is multi.


% The longest sequence of repetitions of the specified thing, which returns
% something. Its return values are returned as a list. Succeeds for zero 
% repetitions (with an empty list).

:- pred repeatmax0(pred(T, list(char), list(char)), list(T), 
                   list(char), list(char)).
:- mode repeatmax0(pred(out, in, out) is det,     out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is semidet, out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is multi,   out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is nondet,  out, in, out) is nondet.


The compiler told me that repeatmax0 is nondet. The implementation is
simple:


repeat0(_, []) --> [].

repeat0(Pars, Values) -->
    { Values = [Val|Values1] },
    Pars(Val),
    repeat0(Pars, Values1).

repeatmax0(Pars, Values) -->
    repeat0(Pars, Values),   % XXX
    ( if   not Pars(_)
      then []
      else end
    ).


And there's the parser for the end of the input:


:- pred end(list(char)::in, list(char)::out) is semidet.

end([], []).


The compiler told me, that the call to repeat0 (marked with "% XXX" above)
can succeed more than once, which is correct.

Taking your hint about disambiguation code, I was able to reformulate the
repeatmax0 predicate such that, repeatmax0 becomes deterministic for a
deterministic input parser:


:- mode repeatmax0(pred(out, in, out) is det, out, in, out) is det.

repeatmax0(Pars, Values) -->
    ( if   Pars(Val)
      then { Values = [Val|Vals] },
           repeatmax0(Pars, Vals)
      else { Values = [] }
    ).


I'm still not sure what you mean by "disambiguation code", though.


> This approach replaces a single bug, global correctness argument with lots of
> small, local correctness arguments, which can be understood, documented,
> and if need be fixed, much more easily. Modularity in correctness arguments,
> if you will.

Thanks!

Volker
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.mercurylang.org/archives/users/attachments/20210704/672d4de7/attachment.sig>


More information about the users mailing list