[m-rev.] for post-commit review: better diagnostic for missing higher order insts

Tue Jul 25 20:15:28 AEST 2023

On 2023-07-25 07:10 +02:00 CEST, "Peter Wang" <novalazy at gmail.com> wrote:
> On Tue, 25 Jul 2023 07:35:35 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
>> +Consider @samp{list.foldl}, one of the standard fold predicates on lists.
>> +The types of its arguments are given by this predicate declaration:
>> +@example
>> +:- pred foldl(pred(L, A, A), list(L), A, A).
>> +@end example
>> +
>> +The first argument is a higher order value (a predicate in this case),
>> +whose types are the types bound by the caller
>> +to the type variables @var{L}, @var{A} and @var{A} respectively,
>> +where @var{L} is the type of the elements in the list in the second argument,
>> +and @var{A} is the type of the accumulator
>> +whose initial and final values are the third and fourth arguments.
> 
> I suggest T as the variable for the elements.

I copied the declarations verbatim from list.m, and I think
the examples here should stay in sync with the originals.
Either we should keep using L here, or we should switch
list.m to use T as well. I prefer the former. We use T for "type",
but for pred/func declarations that have more than one
type variable, it helps if the name of the type variable
reminds readers of the role of the type variable in the declaration.
T, standing for type but not saying *which* type, is too generic for that.

(All that argument about one letter :-()

>> -The language contains builtin @samp{inst} values
>> +@example
>> +in(pred(in, in, out) is det)
>> +out(pred(in, in, out) is det)
>> +@end example
>> +
>> +@noindent
>> +which are each equivalent to the corresponding example just above.
>> +
> 
> examples

There are two examples of modes above, but each of these lines
corresponds to just one of the two.

>> +@example
>> +:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
>> +:- mode foldl(in(pred(in, mdi, muo) is det), in, mdi, muo) is det.
>> +:- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det.
>> +:- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet.
>> +:- mode foldl(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet.
>> +:- mode foldl(in(pred(in, di, uo) is semidet), in, di, uo) is semidet.
>> +:- mode foldl(in(pred(in, in, out) is multi), in, in, out) is multi.
>> +:- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet.
>> +:- mode foldl(in(pred(in, mdi, muo) is nondet), in, mdi, muo) is nondet.
>> +:- mode foldl(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi.
>> +:- mode foldl(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi.
>> +@end example

I kept five of these: the (in, in, out) and (in, di, uo) modes for det,
the same two for cc_multi, and the (in, in, out) mode for semidet.

I just noticed this mode:

:- mode foldl(in(pred(in, di, uo) is semidet), in, di, uo) is semidet.

How useful can this be if the failure of any invocation of the predicate
destroys the accumulator without the possibility of getting it back
on backtracking?

>> +@footnote{If there is a single inst or a single mode
>> +that all instances of a given type are expected to use,
>> +programmers will often give that inst or mode
>> +the same name as the name of the type.
>> +The compiler looks up names in its type table when it expects a type,
>> +in its inst table when it expects an inst,
>> +and in its mode table when it expects a mode,
>> +so this does not confuse the compiler.
>> +However, this practice @emph{can} confuse Mercury programmers
>> +who (a) do not know this fact, or (b) are not familiar with this convention.}
> 
> I'm not sure we should encourage the use of trivial mode definitions.
> I think writing a mode with in() or out() around a named inst is
> clearer, and convenient enough.

Agreed. However, the reference manual must explain the constructs,
because people may come across such mode definitions written by
other people.

Would deleting the "single mode" part of the footnote, keeping
only the "single type" and "single inst" parts, work for you?
Or do you think we need to add some explicit recommendation
against mode definitions that are there just to avoid in/1 or out/1
wrappers?
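
For concreteness, here is a sketch of the two styles under discussion.
The names acc_pred, acc_pred_in and my_foldl are made up for this
illustration; they do not come from list.m or from the diff.

```mercury
    % Hypothetical named inst for an accumulator-updating predicate.
:- inst acc_pred == (pred(in, in, out) is det).

    % Style 1: a trivial mode definition that exists
    % only to avoid writing the in/1 wrapper at each use.
:- mode acc_pred_in == in(acc_pred).

:- pred my_foldl(pred(L, A, A), list(L), A, A).
:- mode my_foldl(acc_pred_in, in, in, out) is det.

    % Style 2: no mode definition; wrap the named inst in in/1 directly.
    % (Commented out, since it would declare the same mode a second time.)
% :- mode my_foldl(in(acc_pred), in, in, out) is det.
```

Style 2 costs four extra characters per use, and the reader does not need
to look up a second definition to see that the argument is an input.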

>> +@example
>> +:- mode foldl(pred(in, in, out) is det, in, in, out) is det.
>> +@end example
>> +
>> +@noindent
>> +and is actually written in the source code like this.
>> +@c ZZZ *Should* it be written like that?
>> +@c Arguably, it makes the code harder to read for novices.
>> +
> 
> Yes, I think so. The fact that (pred(in, in, out) is det)
> can both be an inst or a mode could be a source of confusion,
> and it's only slightly more convenient than using the in() wrapper.

Are you saying "yes" to "Should it be written in the source code like this",
or to "makes it harder to read"? The previous question is the former,
but what follows "yes" suggests the latter.

> We might want to remove the builtin higher-order modes eventually.
> I will admit it is tempting to use them to make a predicate's mode
> declarations align with the type declaration...

I could modify parse_higher_order_mode to return an error indication,
instead of the parsed higher order mode, if an option is set. That would
allow us to replace all the uses of this shortcut in the library, and therefore
in the library manual. Would that be useful?

>> +@c ZZZ can you omit the in(...) wrapper around named insts?
>> +@c Answer: NO. it is not even true if the inst is an inst name that expands
>> +@c to a higher order inst, rather than being an *explicit* higher order inst.
>> +@c ZZZ Next question: *why* is the answer NO?
> 
> My understanding is that, by design, the syntax of a builtin
> higher-order mode makes them look very much like (i.e. exactly like)
> a higher-order inst. But an inst is an inst, and a mode is a mode.
> The compiler isn't "expecting to find a mode, but finding an inst",
> it's just "expecting to find a mode".

I agree that is how it looks to the compiler. I don't think that is
the best way to explain things to readers of the manual.
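
Whichever explanation the manual ends up using, the behaviour it has to
explain can be summarised in a small sketch. The names acc_pred and my_foldl
are hypothetical; the accepted/rejected outcomes are the ones stated in the
ZZZ comments quoted above.

```mercury
    % Hypothetical named inst that expands to a higher order inst.
:- inst acc_pred == (pred(in, in, out) is det).

:- pred my_foldl(pred(L, A, A), list(L), A, A).

    % Accepted: an *explicit* higher order inst written where a mode
    % is expected. This is the shorthand for in(pred(in, in, out) is det).
:- mode my_foldl(pred(in, in, out) is det, in, in, out) is det.

    % Not accepted: a bare inst *name* where a mode is expected,
    % even though the name expands to the same higher order inst.
% :- mode my_foldl(acc_pred, in, in, out) is det.

    % The named inst has to be wrapped in in/1 instead.
% :- mode my_foldl(in(acc_pred), in, in, out) is det.
```

From a reader's point of view, the first and second declarations look like
they should mean the same thing, which is why I think the manual needs to
say more than "a mode was expected".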

> I suggest restoring the idea/description of builtin insts.
> They are "builtin" in the same way that predicate/function/tuple types
> are described as "builtin".

I don't think that description helps people understand higher order insts/modes,
or higher order types, for that matter.

> As for builtin higher-order modes, we could describe them as "syntactic
> sugar", a "shorthand" or "convenience" (as before),

Agreed, but the manual already calls them a convenience, so I am not sure
what change, if any, you are calling for.

> and refrain from
> calling them "builtin".

Agreed :-)

>> +@c ZZZ Note that nontrivial examples for functions
>> +@c are much harder to find than for predicates, because
>> +@c - any functions with the default arg modes and determinism need no decl
>> +@c - we don't want to encourage people to write functions which are not det
>> +@c - we don't want to encourage people to write functions whose args
>> +@c   have uniqueness requirements
>> +@c This leaves functions whose arg modes restrict the allowed function symbols
>> +@c (which would be better expressed using subtypes) and functions whose arg
>> +@c modes include higher order insts/modes, which would be too complex to
>> +@c be useful as an example to novices.
> 
> I don't think an example for functions is necessary.

The initial impetus for this change to the manual was that the
error message change in the other diff wanted to direct people
to this section of the manual for an example of the syntax of
a higher order function inst.

>> +@c ZZZ should we have an example of a pred or func of arity zero?
>> +@c If so, we could use the force function from lazy.m.
> 
> I don't think it's necessary.

I deleted that ZZZ. I also followed your other suggestions.

Thanks for the review.

Zoltan.