[m-rev.] for review: better error messages for lambda exprs

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed May 18 04:22:40 AEST 2016

On Wed, 18 May 2016 02:56:44 +1000, Mark Brown <mark at mercurylang.org> wrote:
> > Regardless of how desirable that proposal may be, it cannot be
> > implemented without huge pain, because the "possibly generating
> > an error message" is a task that belongs to the typechecker,
> > the rest of that code belongs to the parser, and the two cannot
> > be combined without huge pain.
> That step is the same one that appears in your second template. So for
> this lambda expression where the mode is missing:
>    (pred(Y) is semidet :- X > Y)
> If ':-'/2 is not defined by the user then you get the message your
> above diff would generate. If ':-'/2 is defined but 'is'/2 isn't, you
> get a similar error but it would be reported as coming from the first
> argument to ':-'/2. If both ':-'/2 and 'is'/2 are defined, then both
> pred/N and semidet/0 could still be recognised as part of a lambda
> construct.

By "defined", I *think* you mean e.g. "if the user has defined
an entity named :-/2", but I am not sure.

Assuming I am right about your meaning, the above is wrong.
If the parser, when it sees :-/2, does not commit to parsing the
above malformed lambda expression as a lambda expression,
then you would get an error but it would NOT be a syntax error
at all; it would come later, from the typechecker, and would almost
certainly be misleading, precisely because it would *not* be
recognised as being part of a lambda expression. (Unless you
replicated the parser inside the typechecker.)
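
To make the difference concrete, here is a rough sketch in Python rather
than Mercury. The term representation and every name in it are hypothetical
and are not taken from the compiler; it only illustrates what committing
(or not committing) on the top function symbol means for the error you get.

```python
# Terms are modelled as (functor, [args]) tuples; variables as plain strings.
# All names here are hypothetical and only illustrate the commit decision.

def functor(term):
    return term[0] if isinstance(term, tuple) else None

def looks_like_lambda(term):
    """True if the top of the term has the shape  :-(is(pred(...), _), _)."""
    if functor(term) != ":-" or len(term[1]) != 2:
        return False
    head = term[1][0]
    if functor(head) != "is" or len(head[1]) != 2:
        return False
    return functor(head[1][0]) == "pred"

def parse_rhs(term, commit):
    """Return ("lambda", args), ("error", msg), or ("functor", term)."""
    if commit and looks_like_lambda(term):
        pred_args = term[1][0][1][0][1]
        for arg in pred_args:
            # Each lambda argument is expected to look like Var::Mode.
            if functor(arg) != "::":
                return ("error", f"lambda argument {arg} has no mode")
        return ("lambda", pred_args)
    # Without commitment the term is just an ordinary function symbol
    # application; any complaint comes later, from the typechecker.
    return ("functor", term)

# (pred(Y) is semidet :- X > Y), with the mode of Y missing.
bad = (":-", [("is", [("pred", ["Y"]), ("semidet", [])]), (">", ["X", "Y"])])

print(parse_rhs(bad, commit=True))
print(parse_rhs(bad, commit=False))
```

With commit=True the malformed term is reported directly as a bad lambda
expression. With commit=False it is passed through as a plain term with
top functor :-/2, and whatever error eventually appears has to come from
a later pass that no longer knows it was meant to be a lambda expression.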

> > This diff changes things so that when the top function symbol on
> > the RHS of a unification is a Mercury "keyword" used only in one kind
> > of construct, we commit to parsing its arguments as part of that construct.
> > (If the keyword can be part of more than one construct, we look deeper
> > into its structure, and commit to parsing it as a given construct only
> > when we have seen enough of the structure to rule out the other possible
> > constructs it could be a part of.) If the parsing fails, we now generate
> > an error message that describes the problem *directly*.
> I'd like to propose that the commits only occur if the keyword is not
> also defined by the user.

While thinking about this, I just realised that with the current architecture
of the parser, neither your proposal nor the pragma will actually work
in the vast majority of cases. Consider this a withdrawal of the pragma proposal.

Code that parses terms is invoked in two phases of the parser.
Almost all terms are parsed when they are converted to items,
but this process simply copies some terms unchanged into clause
items. Those terms are then parsed when the clause item is added
to the HLDS.

Code that parses clause items during the creation of the HLDS
is invoked after the compiler has seen and processed all the
items that define types, insts, modes etc, and all the pragmas
that don't act as clauses (i.e. most pragmas other than foreign_proc).
Its behaviour therefore can be made dependent on the presence
of a pragma or on the names of entities defined by the program.

However, for the parts of the parser that convert terms into items,
neither is possible. When this part of the parser tries to discover
the structure of a term, the terms that contain the rest of the program
may not have been read in yet, much less parsed. Its behavior
cannot depend on stuff it hasn't seen yet, and that includes
pragmas as well as e.g. type definitions. We could in theory
require that the pragma appear first, but that is far too error
prone for my taste.
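
The asymmetry between the two phases can be sketched as follows. This is
a toy model in Python, not the compiler's actual data structures; the item
kinds, the function names and the pragma name "some_pragma" are all
assumptions made up for the illustration.

```python
# Hypothetical two-phase model; none of these names come from the compiler.

def terms_to_items(terms):
    """Phase 1: convert terms to items one at a time, as they are read."""
    seen_pragmas = set()
    items = []
    for kind, payload in terms:
        if kind == "pragma":
            seen_pragmas.add(payload)
            items.append((kind, payload))
        else:
            # Record what phase 1 could have known when it saw this term:
            # only the pragmas that appeared textually earlier.
            items.append((kind, payload, frozenset(seen_pragmas)))
    return items

def add_clauses_to_hlds(items):
    """Phase 2: parse clause bodies after every item has been processed."""
    all_pragmas = {item[1] for item in items if item[0] == "pragma"}
    return [
        (item[1], item[2], all_pragmas)
        for item in items
        if item[0] == "clause"
    ]

# The pragma appears textually after the clause it is meant to affect.
program = [("clause", "p"), ("pragma", "some_pragma")]

for name, known_in_phase1, known_in_phase2 in add_clauses_to_hlds(
        terms_to_items(program)):
    print(name, sorted(known_in_phase1), sorted(known_in_phase2))
```

For the clause p, the first phase knows about no pragmas at all, while
the second phase sees some_pragma even though it comes later in the file.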

Probably the reason why this hasn't occurred to me before
is that the part of the parser that brought on this discussion,
the part that parses lambda expressions, is done while
creating the HLDS, and thus it has access to all definitions
and pragmas in the program, even the textually later ones.

The current architecture of the parser is not the cleanest:
I think it would make sense to do *all* parsing of terms
when creating (the now structured) item list. However,
that is a separate issue, and would make any pragma
unable to affect *any* part of the parser.

All this discussion started with my changes to superhomogeneous.m,
which is a part of the parser that operates during the construction
of the HLDS. With this change, if the top function symbol of a term
indicates that the term is a lambda expression, the parser commits
to parsing it as one, even if it is malformed.
I hope no one is objecting to this change, because without it,
the error messages you get from the typechecker for any
malformed lambda expressions are *really* weird.
I expect that the original objection was to the other
changes I made to the neighboring code, which made
stricter the parsing of terms whose top function symbols
were @, ^, := and the like.

Are there any objections to committing the changes
to the parsing of the lambda expressions only,
without the changes to the parsing of @, ^, := etc?

