[m-rev.] for review: add "did you mean" reminders for mistyped predicate names
Julien Fischer
jfischer at opturion.com
Mon Sep 25 21:14:20 AEST 2023
On Mon, 25 Sep 2023, Zoltan Somogyi wrote:
> For review by anyone. I am particularly looking for feedback
> on whether any part of the change to edit_seq.m should stay.
> I originally intended to base the new error message functionality
> on new code in edit_seq.m, but this turned out to be significantly
> harder than writing new code, in a new module (edit_distance.m)
> using the gcc code Julien linked to as the basis.
>
> I can see three options:
>
> - Keeping edit_seq.m as it was.
>
> - Keeping only the addition of the new predicate for finding the
> closest candidate sequence, but undoing the generalization
> of the edit_params type.
>
> - Keeping both the changes to edit_seq.m in the attached diff:
> the new predicate, *and* the generalization.
>
> Note that the generalization requires changes in modules
> that use this existing module if they pass parameters around,
> so it is a breaking change, and the breakage does not buy you
> anything helpful if your use case is the existing use case for
> this module, i.e. the generation of diffs. This is why I would
> prefer one of the first two options.
1 or 2 is fine. I don't see a particularly strong reason for 2,
but don't have any objections to it.
> WIth either of the last two options, I would of course document
> the chosen option in NEWS.md, and with any of the options,
> I would add a note to edit_seq.m about the existence of
> edit_distance.m. (The form of the note would depend on
> the chosen option, which is why I haven't written it yet.)
> The fate of the new test cases for edit_seq.m also depends on this
> decision.
>
> There is also a new XXX in typecheck_error_undef.m
> I would like your opinions on: should we print "did you mean"
> hints in the presence of hints about pred vs func and/or
> missing module qualification?
> Add "did you mean ..." to undef pred reports.
>
> compiler/typecheck_error_undef.m:
> If we are reporting an error for a reference an undefined predicate,
*to* an undefined predicate
> check whether any predicates exist whose names are "close enough"
> to the name of the referenced predicate, and if yes, add to the
> error message a line containing "(Did you mean x, y or z?)".
>
> (Doing the same for references to undefined functions is future work.
> The two are separate because things that look like function references
> can also refer to e.g. data constructors.)
There's quite a bit of scope for doing something similar for other undefined
entities. (Although, I'm sure you are aware of that.)
> library/edit_distance.m:
> Add this new module to implement the "close enough" check.
> This new module is similar to the existing edit_seq.m module,
> but it is designed to serve different requirements, and it seems to me
> to be far from trivial to write code to meet both sets of requirements
> at once.
>
> library/edit_seq.m:
> Expand this module to serve one of the two requirements
> now met by edit_distance.m that the the old edit_seq.m did not meet.
> This is the requirement to treat case changes differently than
> other replacements. The other requirement, the need to treat transpositions
> differently than insert/delete pairs, is what is hard to reconcile
> with the need to generate output that looks like it came from "diff -u".
...
> diff --git a/compiler/typecheck_error_undef.m b/compiler/typecheck_error_undef.m
> index bb93c2694..96d96e9ae 100644
> --- a/compiler/typecheck_error_undef.m
> +++ b/compiler/typecheck_error_undef.m
> @@ -346,11 +349,11 @@ report_apply_instead_of_pred = Components :-
>
> %---------------------%
>
> -:- func report_error_pred_wrong_name(type_error_clause_context, prog_context,
> - predicate_table, pf_sym_name_arity, list(module_name), list(format_piece))
> - = error_spec.
> +:- func report_error_pred_wrong_full_name(type_error_clause_context,
> + prog_context, predicate_table, pf_sym_name_arity, list(module_name),
> + list(format_piece)) = error_spec.
>
> -report_error_pred_wrong_name(ClauseContext, Context, PredicateTable,
> +report_error_pred_wrong_full_name(ClauseContext, Context, PredicateTable,
> PFSymNameArity, MissingImportModules, AddeddumPieces) = Spec :-
> InClauseForPieces = in_clause_for_pieces(ClauseContext),
> InClauseForComponent = always(InClauseForPieces),
> @@ -379,7 +382,51 @@ report_error_pred_wrong_name(ClauseContext, Context, PredicateTable,
> PossibleModuleQualsSet0, PossibleModuleQualsSet),
> QualMsgs = report_any_missing_module_qualifiers(ClauseContext,
> Context, "predicate", PossibleModuleQualsSet),
> - Msgs = [UndefMsg] ++ KindMsgs ++ QualMsgs,
> + KindQualMsgs = KindMsgs ++ QualMsgs,
> + ( if
> + AddeddumPieces = [],
> + KindQualMsgs = []
> + then
> + KindQualMsgs = [],
> + % It seems that regardless of missing qualifications or equal signs,
> + % the reference is to the wrong name. See if we can mention some
> + % similar names that could be the one they intended.
> + get_known_pred_names(PredicateTable, KnownPredNames),
> + BaseName = unqualify_name(SymName),
> + % Note: name_is_close_enough below depends on all costs here
> + % except for case changes being 2u.
> + Params = edit_params(2u, 2u, case_sensitive_replacement_cost, 2u),
> + string.to_char_list(BaseName, BaseNameChars),
> + list.map(string.to_char_list, KnownPredNames, KnownPredNamesChars),
> + find_closest_seqs(Params, BaseNameChars, KnownPredNamesChars,
> + Cost, HeadBestNameChars, TailBestNamesChars),
> + BestNamesChars = [HeadBestNameChars | TailBestNamesChars],
> + list.map(string.from_char_list, BestNamesChars, BestNames),
> + ( if
> + % Don't offer a string as a replacement for itself.
> + Cost > 0u,
> + % Don't offer a string as a replacement if it is too far
> + % from the original, either.
> + list.filter(name_is_close_enough(Cost, BaseName),
> + BestNames, CloseEnoughBestNames),
> + CloseEnoughBestNames = [_ | _]
> + then
I suggest imposing a limit on the number of close enough names that are
printed, around a dozen should suffice; I can't imagine the resulting error
messages being particularly meaningful if there are more.
> + SuggestionPieces = list_to_quoted_pieces_or(CloseEnoughBestNames),
> + SuggestedNamePieces =
> + [words("(Did you mean")] ++ SuggestionPieces ++
> + [suffix("?)"), nl],
> + SuggestedNamesMsg = simplest_msg(Context, SuggestedNamePieces),
> + Msgs = [UndefMsg, SuggestedNamesMsg]
> + else
> + Msgs = [UndefMsg]
> + )
> + else
> + % The AddeddumPieces part of UndefMsg and/or KindQualMsgs
> + % offer hints about the error that allow for the base name being right.
> + % Print just those.
> + % XXX Should we print SuggestedNamesMsg as well even in this case?
My inclination would be not to.
> + Msgs = [UndefMsg] ++ KindQualMsgs
> + ),
> Spec = error_spec($pred, severity_error, phase_type_check, Msgs).
...
> diff --git a/library/edit_distance.m b/library/edit_distance.m
> index e69de29bb..9ec45a54b 100644
> --- a/library/edit_distance.m
> +++ b/library/edit_distance.m
> @@ -0,0 +1,451 @@
> +%---------------------------------------------------------------------------%
> +% vim: ft=mercury ts=4 sw=4 et
> +%---------------------------------------------------------------------------%
> +% Copyright (C) 2023 The Mercury team.
> +% This file is distributed under the terms specified in COPYING.LIB.
> +%---------------------------------------------------------------------------%
> +%
> +% File: edit_distance.m.
> +% Stability: medium.
> +%
> +% This module computes the edit distance between two sequences of items.
> +% Its job is both similar to, and distinct from, the job of edit_seq.m.
> +% It is similar in that both modules work out the simplest, cheapest way
> +% to transform one sequence into another. It is distinct because the two
> +% modules aim to solve different problems.
> +%
> +% - edit_seq.m aims to solve the problem of displaying the difference
> +% between two given sequences in a way that makes their differences
> +% as easy to understand as possible.
> +%
> +% - edit_distance.m aims to solve the problem of finding in a pool of
> +% candidate sequences the candidate that is closest to a given query
> +% candidate sequence.
Delete that last occurrence of "candidate" there.
> +% Doing a second job with the second problem requires a mechanism that
> +% allows callers to specify that a transposition of two elements (such as
> +% replacing "bc" with "cb", thus transforming e.g. "abcd" into "acbd")
> +% has a different cost than deleting one element in one place in the sequence
> +% and inserting it back at another place. This mechanism does not help
> +% with the first problem at all (since the simplest way to display
> +% a transposition *is* as a delete/insert pair), and in fact its presence
> +% in the system would unnecessarily complicate the algorithm.
Somewhere here, I suggest we should say that this module computes the
Damerau–Levenshtein instance. I suspect potential users of this moulde who are
aware of the different types of edit distance may be looking out for this.
> +%---------------------------------------------------------------------------%
> +%
> +% The code in this module was derived from the spellcheck.cc in gcc.
I suggest moving this comment to the beginning of the implementation;
I would also word it as follows:
The code in this module uses a similar approach to that of
spellcheck.cc in gcc.
...
> +%---------------------------------------------------------------------------%
> diff --git a/tests/invalid/qual_basic_test2.err_exp b/tests/invalid/qual_basic_test2.err_exp
> index 65e27c4a1..249d29c01 100644
> --- a/tests/invalid/qual_basic_test2.err_exp
> +++ b/tests/invalid/qual_basic_test2.err_exp
> @@ -1,2 +1,3 @@
> qual_basic_test2.m:019: In clause for predicate `main'/2:
> qual_basic_test2.m:019: error: undefined predicate `io.io__write_string'/3.
> +qual_basic_test2.m:019: (Did you mean `write_string'?)
That seems a little confusing; the actual error is that the module io.io
doesn't exist.
The diff looks fine otherwise.
Julien.
More information about the reviews
mailing list