[m-users.] Determinism issue with parser
Volker Wysk
post at volker-wysk.de
Mon Jul 5 03:21:11 AEST 2021
Am Montag, den 05.07.2021, 01:56 +1000 schrieb Zoltan Somogyi:
> 2021-07-05 01:21 GMT+10:00 "Volker Wysk" <post at volker-wysk.de>:
> > I've built a big parser for my project, too. It should always produce one
> > solution; it can't fail. But the compiler can't infer that it is
> > deterministic, for many parts of it. (That would be undecidable.)
> >
> > So I've resigned to this state of affairs. I just collect all the (exactly
> > one) solutions with solutions/2 and produce an error message in case there
> > should be more than one, which would be a bug in my parser.
>
> That works, but I would consider it fragile. The reason is that convincing
> someone (the next maintainer of the parser, perhaps) that there will be
> exactly one overall solution requires a single correctness argument over
> the entire code, and humans are not very reliable as reasoning machines;
> the longer the argument, the higher the probability of error.
The parser has fallbacks built in. A paragraph that isn't something else, is
a text paragraph. An inline (something which occurs inside a paragraph) that
isn't something special, is a simple word. A bug, which would cause a
special case not to be recognized, causes the respective text to end up in
the fallback data.
The special cases are all in if-then-else-if-then-else etc. constructs. So
there can be only be one matching. It's a parser for the markup language of
a wiki engine. I think this is robust because of the language being parsed
is like that.
But I'd love to have the compiler understand the fact that it's
deterministic. :-)
> I would instead add disambiguation code to the places in the parser
> than can introduce (as far as the compiler is concerned) the possibility
> of nondeterminism. You can you can see why there is no such possibility;
> I would try to make the compiler that too. In most cases, this is not too hard,
> at least in my experience.
Hmm... I use my meta-predicate "repeatmax0" (and some variations of it) a
lot. But see further down first, you don't really need all the details:
% Something repeated zero to several times. Return the collected return
% values as a list.
:- pred repeat0(pred(T, list(char), list(char)), list(T),
list(char), list(char)).
:- mode repeat0(pred(out, in, out) is det, out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is semidet, out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is multi, out, in, out) is multi.
:- mode repeat0(pred(out, in, out) is nondet, out, in, out) is multi.
% The longest sequence of repetitions of the specified thing, which returns
% something. Its return values are returned as a list. Succeeds for zero
% repetitions (with an empty list).
:- pred repeatmax0(pred(T, list(char), list(char)), list(T),
list(char), list(char)).
:- mode repeatmax0(pred(out, in, out) is det, out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is semidet, out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is multi, out, in, out) is nondet.
:- mode repeatmax0(pred(out, in, out) is nondet, out, in, out) is nondet.
The compiler told me that repeatmax0 is nondet. The implementation is
simple:
repeat0(_, []) --> [].
repeat0(Pars, Values) -->
{ Values = [Val|Values1] },
Pars(Val),
repeat0(Pars, Values1).
repeatmax0(Pars, Values) -->
repeat0(Pars, Values), % XXX
( if not Pars(_)
then []
else end
).
And there's the parser for the end of the input:
:- pred end(list(char)::in, list(char)::out) is semidet.
end([], []).
The compiler told me, that the call to repeat0 (marked with "% XXX" above)
can succeed more than once, which is correct.
Taking your hint about disambiguation code, I was able to reformulate the
repeatmax0 predicate such that, repeatmax0 becomes deterministic for a
deterministic input parser:
:- mode repeatmax0(pred(out, in, out) is det, out, in, out) is det.
repeatmax0(Pars, Values) -->
( if Pars(Val)
then { Values = [Val|Vals] },
repeatmax0(Pars, Vals)
else { Values = [] }
).
I'm still not sure what you mean by "disambiguation code", though.
> This approach replaces a single bug, global correctness argument with lots of
> small, local correctness arguments, which can be understood, documented,
> and if need be fixed, much more easily. Modularity in correctness arguments,
> if you will.
Thanks!
Volker
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.mercurylang.org/archives/users/attachments/20210704/672d4de7/attachment.sig>
More information about the users
mailing list