[m-rev.] for review: add "did you mean" reminders for mistyped predicate names

Tue Sep 26 01:22:23 AEST 2023

On Mon, 25 Sep 2023, Zoltan Somogyi wrote:

> On 2023-09-25 21:14 +10:00 AEST, "Julien Fischer" <jfischer at opturion.com> wrote:
>>
>> On Mon, 25 Sep 2023, Zoltan Somogyi wrote:
>>
>
>>>> compiler/typecheck_error_undef.m:
>>>     If we are reporting an error for a reference an undefined predicate,
>>
>>     *to* an undefined predicate
>>
>>>     check whether any predicates exist whose names are "close enough"
>>>     to the name of the referenced predicate, and if yes, add to the
>>>     error message a line containing "(Did you mean x, y or z?)".
>>>
>>>     (Doing the same for references to undefined functions is future work.
>>>     The two are separate because things that look like function references
>>>     can also refer to e.g. data constructors.)
>>
>> There's quite a bit of scope for doing something similar for other undefined
>> entities. (Although, I'm sure you are aware of that.)
>
> Yep, though it is worth doing only for the entities that people generate
> undefined references to reasonably often. Besides predicates and function
> symbols, I think only type names, and maybe module names, may be
> frequent enough. I am pretty sure that inst, mode and event names are not.
> (Field names count as function names, due to field access syntax.)
> Can you think of any others?

Typeclass names.

>>> +    ( if
>>> +        AddeddumPieces = [],
>>> +        KindQualMsgs = []
>>> +    then
>>> +        KindQualMsgs = [],
>>> +        % It seems that regardless of missing qualifications or equal signs,
>>> +        % the reference is to the wrong name. See if we can mention some
>>> +        % similar names that could be the one they intended.
>>> +        get_known_pred_names(PredicateTable, KnownPredNames),
>>> +        BaseName = unqualify_name(SymName),
>>> +        % Note: name_is_close_enough below depends on all costs here
>>> +        % except for case changes being 2u.
>>> +        Params = edit_params(2u, 2u, case_sensitive_replacement_cost, 2u),
>>> +        string.to_char_list(BaseName, BaseNameChars),
>>> +        list.map(string.to_char_list, KnownPredNames, KnownPredNamesChars),
>>> +        find_closest_seqs(Params, BaseNameChars, KnownPredNamesChars,
>>> +            Cost, HeadBestNameChars, TailBestNamesChars),
>>> +        BestNamesChars = [HeadBestNameChars | TailBestNamesChars],
>>> +        list.map(string.from_char_list, BestNamesChars, BestNames),
>>> +        ( if
>>> +            % Don't offer a string as a replacement for itself.
>>> +            Cost > 0u,
>>> +            % Don't offer a string as a replacement if it is too far
>>> +            % from the original, either.
>>> +            list.filter(name_is_close_enough(Cost, BaseName),
>>> +                BestNames, CloseEnoughBestNames),
>>> +            CloseEnoughBestNames = [_ | _]
>>> +        then
>>
>> I suggest imposing a limit on the number of close enough names that are
>> printed, around a dozen should suffice; I can't imagine the resulting error
>> messages being particularly meaningful if there are more.
>
> The code at the moment prints all the candidate strings that have
> the same edit distance from the undefined reference. This means that
> *if* there were ever more than 12 of them, we would have no useful
> way to pick the 12 most likely to be the one the programmer wants.
> I would punt this problem to the first time we ever encounter it for real.
> I don't think that will ever happen on non-contrived code.

It seems like the kind of thing you could stumble into with machine
generated Mercury code.  (Whether that counts as contrived or not ...)

...

>>> +%---------------------------------------------------------------------------%
>>> diff --git a/tests/invalid/qual_basic_test2.err_exp b/tests/invalid/qual_basic_test2.err_exp
>>> index 65e27c4a1..249d29c01 100644
>>> --- a/tests/invalid/qual_basic_test2.err_exp
>>> +++ b/tests/invalid/qual_basic_test2.err_exp
>>> @@ -1,2 +1,3 @@
>>>  qual_basic_test2.m:019: In clause for predicate `main'/2:
>>>  qual_basic_test2.m:019:   error: undefined predicate `io.io__write_string'/3.
>>> +qual_basic_test2.m:019:   (Did you mean `write_string'?)
>>
>> That seems a little confusing; the actual error is that the module io.io
>> doesn't exist.
>
> I agree, but the problem is not in the new code; it is given the sym_name
> as qualified(unqualified(io"), "io__write_string"), i.e. it is given a string with
> a double underscore in the *base* name. Given that __s as module separators
> have gone the way of the dodo in  current code, I don't really feel like
> chasing down the reason for that.

I don't think mixing module qualifiers has ever worked. We should
consider dropping support for __ as a module qualifier.

...

> There was a question about edit_distance.m that I forgot to ask in my
> original email. Should we add a version of the find_closest_seq predicate
> that explicitly works on strings? It would do all the required conversions
> between strings and char lists, so users wouldn't have to. We could call it
> find_closest_string.

I don't see a problem with adding.

Julien.