[m-rev.] for review: better error messages for lambda exprs

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun May 8 17:04:58 AEST 2016

On Sun, 8 May 2016 15:11:16 +1000, Peter Wang <novalazy at gmail.com> wrote:
> Let me try to summarise it.  The compiler will reject the use of these
> names as data-functors and compound data-terms:
>     :           (still allowed as a data-functor)
>     with_type
>     @           (still allowed as a data-functor)
>     ^           (still allowed as a data-functor)
>     :=
>     :-
>     -->
>     is

After this change, the one-character names above are still allowed
to be used with arity 0, because the lexer generates the same output
when it sees e.g. @ as when it sees '@'. Programs must be able to
use every character as a character constant. If it weren't for this,
I would want to disallow their use with arity 0 as well.
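
As a rough illustration of the lexer point, here is a toy sketch in Python.
It is not the Mercury lexer: the `tokenize` function, the token shape and the
small set of symbol characters are all assumptions made up for this example.
It only shows why later stages cannot tell a bare @ from a quoted '@' once
both have become the same token.

```python
# Toy lexer, NOT Mercury's actual one (hypothetical names throughout).
# A quoted name and the same name written bare both become an identical
# ("name", text) token, so later compiler stages cannot tell them apart.
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        '(?P<quoted>[^']*)'            # quoted name, e.g. '@'
      | (?P<word>[a-z][A-Za-z0-9_]*)   # alphanumeric name
      | (?P<symbol>[@^:])              # a few one-character symbol names
    )
""", re.VERBOSE)


def tokenize(text):
    """Split text into ("name", text) tokens, dropping any quotes."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        name = m.group("quoted")
        if name is None:
            name = m.group("word") or m.group("symbol")
        tokens.append(("name", name))
        pos = m.end()
    return tokens
```

In this sketch, tokenize("@") and tokenize("'@'") return the same token list,
which is the situation described above for arity 0 uses.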

> Those names are not rejected as part of type definitions.  It is
> possible to construct/deconstruct terms with those names through the
> standard library, just not directly in Mercury code.

I would be happy to change that, and reject any attempt to use
any operator or word that has a meaning in Mercury syntax as the
name of a data constructor, or as the name of any other user-defined
entity, such as type name, mode name, predicate name etc. This includes
@, ^ and :.

Can people please vote: would you support or oppose such a change?

> My preference is to keep the definition as-is.  The relevant rules might
> be stated simply as:
>     Any name can be used as a non-compound data-term.
>     Any name can be used as the top-level functor of a compound data-term,
>     which can be any compound term that is not a special data-term.

And I would like to change both occurrences of "any name" above to
"any name except a Mercury operator or keyword", and then you can
delete the last third. I think that is even simpler. But more to the point,
I think it would make programming easier. (See the next part below
for the why.)

I am also wondering: if you want the above rule, then why did you
not say anything about the several similar changes I made to the compiler
in the last several months? For example, the one that prevented the
use of inst names such as free/2 and clobbered/3, or the one that
prevented the use of e.g. else/2 outside if-then-elses?

> I like that there are no exceptions about which names are allowed where.
> Mercury doesn't have keywords and non-keywords so, while it would be
> possible to explain why *these* names are excluded and not *those*,
> it still feels arbitrary.

"feels arbitrary": does this mean that your objection is philosophical,
and not practical? What I mean is: would your code be affected?

I am not sure what "*these*" names refers to.

If you mean "why would we want to prohibit the use of *some* Mercury
keywords as data functors and not others", then I say we *should*
prohibit all of them.

If you mean "why don't we allow people to use *any* Mercury keywords
as the names of data functors", my answer is that the flexibility it buys
programmers is worth very little (it is trivial to use non-keyword function
symbols), while the cost in making code hard to parse, for programmers
as well as the compiler, is much bigger. Most of the cost is due to the
fact that 99.99999% of the uses of these operators and keywords are as part
of the Mercury constructs that these operators and keywords belong to,
but if the compiler cannot rely on this fact, then it cannot generate
good error messages for mistakes in those constructs.

I still haven't heard anyone say that the flexibility of being able to use
any function symbol anywhere is of practical use to them when writing
Mercury code. (As opposed to making translating Prolog code to Mercury
slightly easier.)

> Can we improve error messages another way, like generating special error
> messages if there is a type failure involving the names above?

Mistakes in the uses of these names usually lead to more than one type error;
have a look at the old version of lambda_syntax_error.err_exp. Some of
these involve the badly-used keywords, but some do not: instead, they
report errors for the terms below them, such as the ones whose function
symbols are "::", "in" and "semidet".

If I understand you correctly, you are asking whether we can modify
the typechecker to replace the several type errors that currently result
from e.g. a malformed lambda expression with a single error message
that says in effect "these type errors result from a malformed lambda
expression". My answer is that while that is theoretically possible,
the resulting code would *not* be just a type checker: it would be a
combination of type checker and forensic parser. I say "forensic"
because its task would be even harder than the task of the parser:
it would have to parse not a single term that is intended to be
a lambda expression but isn't, but a *sequence* of the superhomogeneous
unifications that result from the flattening of the original malformed term.
To know which type errors it should include in the above umbrella
message and which it shouldn't, it would have to find out exactly which
unifications are part of the original malformed-lambda-expression term,
which is conceptually equivalent to reconstructing that term.
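
To make the "sequence of unifications" point concrete, here is a toy sketch
in Python. It is only a model of flattening into superhomogeneous form, not
the compiler's code: the `flatten` function, the tuple representation of
terms and the V_n variable names are assumptions invented for the example,
and the order in which the real compiler emits unifications may differ.

```python
# Toy model of flattening a term into superhomogeneous form, i.e. into
# unifications "Var = functor(Var1, ..., VarN)" whose arguments are all
# variables. Hypothetical representation: a term is (functor, [args]),
# and a variable is a plain string.
from itertools import count


def flatten(var, term, fresh, out):
    """Append to `out` the unifications that make `var` equal to `term`."""
    functor, args = term
    arg_vars = []
    for arg in args:
        if isinstance(arg, str):       # already a variable
            arg_vars.append(arg)
        else:                          # nested term: give it a fresh variable
            v = f"V_{next(fresh)}"
            flatten(v, arg, fresh, out)
            arg_vars.append(v)
    out.append((var, functor, arg_vars))
    return out


# One term, written here as is(pred('::'(X, in)), semidet).
term = ("is", [("pred", [("::", ["X", ("in", [])])]), ("semidet", [])])
unifications = flatten("Y", term, count(1), [])
for lhs, functor, arg_vars in unifications:
    print(f"{lhs} = {functor}({', '.join(arg_vars)})")
```

The single term becomes five separate unifications, including ones for
"::", "in" and "semidet". The parser sees the one term; a checker working
after flattening would have to work out which of the unifications in a
clause body came from it.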

I would much rather directly generate an error message that says
"you shot this term" than indirectly generate one that says "the blood
spatter seems to indicate that you shot this term". And I would
much rather generate error messages about syntax errors in
the parser than in the type checker.

