[m-rev.] for review: Implement modechecking of coerce more efficiently.

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun May 2 13:51:07 AEST 2021



On Mon, 26 Apr 2021 14:13:51 +1000, Peter Wang <novalazy at gmail.com> wrote:
> --- a/compiler/mode_errors.m
> +++ b/compiler/mode_errors.m
> @@ -165,9 +165,9 @@
>      ;       mode_error_coerce_input_not_ground(prog_var, mer_inst)
>              % The argument in a coerce expression has a non-ground inst.
>  
> -    ;       mode_error_coerce_bad_result_inst(mer_inst, mer_type)
> -            % The result of a coercion would have an inst that is not valid
> -            % for the type.
> +    ;       mode_error_coerce_invalid(mer_inst, mer_type)
> +            % The argument in a coerce expression has a ground inst
> +            % that is not valid for the result type.

Am I correct in guessing that you would generate coerce_invalid errors
only when the input inst is ground? If yes, then I would change "invalid"
to "invalid_ground" or "ground_invalid" to emphasize the fact that this error
and the previous one effectively partition the set of coerce errors.


> +            % Y is bound so bind the coercion result to a fresh variable
> +            % YPrime, then unify Y = YPrime.

I would call this var Z, since it is sort-of-unified with both X and Y.

> +    % Try to produce the resulting inst of a coercion from TypeX to TypeY.
> +    % InstX is the initial inst of the input term,
> +    % InstY is the final inst of the result term.
> +    %
> +:- pred modecheck_coerce_make_inst(module_info::in, tvarset::in, is_live::in,
> +    mer_type::in, mer_type::in, expanded_insts::in,
> +    mer_inst::in, maybe(mer_inst)::out) is det.
> +
> +modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX, TypeX, TypeY,
> +        ExpandedInsts0, InstX, Res) :-

This predicate has no argument named InstY, only Res; update the comment.

> +        InstX = ground(UniqX, HOInstInfoX),
> +        (
> +            HOInstInfoX = none_or_default_func,
> +            modecheck_coerce_from_ground_make_inst(ModuleInfo, TVarSet,
> +                LiveX, UniqX, TypeX, TypeY, Res)
> +        ;
> +            HOInstInfoX = higher_order(_PredInstInfoX),
> +            UniqY = make_result_uniq(LiveX, UniqX),
> +            % Coerce cannot change the calling convention.
> +            InstY = ground(UniqY, HOInstInfoX),
> +            Res = yes(InstY)
> +        )

You made the proposal for coercions when I was in hospital with no internet
connectivity, so I couldn't follow the proposal at the time, so excuse my
ignorance, but cannot a coerce expression restrict the set of allowed
function symbols in an input arg in a higher order value? If it can, then this
code needs more work.

> +        InstX = not_reached,
> +        InstY = not_reached,
> +        Res = yes(InstY)
> +    ;
> +        InstX = inst_var(_),
> +        unexpected($pred, "uninstantiated inst parameter")
> +    ;
> +        InstX = constrained_inst_vars(InstVars, SubInstX),
> +        % The term of TypeX is approximated by a ground inst SubInstX.

What does "the term of" refer to here and in the next sentence?

> +        InstX = defined_inst(InstNameX),
> +        % Avoid infinite loops. We track inst-type pairs because a user-defined
> +        % inst can be used with different types.
> +        ( if
> +            set.insert_new(InstNameX - TypeX, ExpandedInsts0, ExpandedInsts1)
> +        then
> +            inst_lookup(ModuleInfo, InstNameX, InstX1),
> +            modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX,
> +                TypeX, TypeY, ExpandedInsts1, InstX1, Res)
> +        else
> +            % If we would enter an infinite loop, return the inst unexpanded.

I would explicitly document the reason why this is justified: that a recursive
check, by definition, cannot find any errors that the nonrecursive part of
the algorithm wouldn't find.

> +:- func make_result_uniq(is_live, uniqueness) = uniqueness.
> +
> +make_result_uniq(LiveX, UniqX) = UniqY :-

I would give this function a different name. The current one implies
that the return value is *always* unique.

> +:- pred modecheck_coerce_make_bound_functors(module_info::in, tvarset::in,
> +    is_live::in, mer_type::in, mer_type::in, expanded_insts::in,
> +    list(bound_inst)::in, list(bound_inst)::out) is semidet.
> +
> +modecheck_coerce_make_bound_functors(ModuleInfo, TVarSet, LiveX,
> +        TypeX, TypeY, ExpandedInsts0, FunctorsX, FunctorsY) :-
> +    (
> +        FunctorsX = [],
> +        FunctorsY = []
> +    ;
> +        FunctorsX = [HeadFunctorX | TailFunctorsX],
> +        modecheck_coerce_make_bound_functor(ModuleInfo, TVarSet, LiveX,
> +            TypeX, TypeY, ExpandedInsts0, HeadFunctorX, yes(HeadFunctorY)),
> +        modecheck_coerce_make_bound_functors(ModuleInfo, TVarSet, LiveX,
> +            TypeX, TypeY, ExpandedInsts0, TailFunctorsX, TailFunctorsY),
> +        FunctorsY = [HeadFunctorY | TailFunctorsY]
> +    ).

This code finds an error if the call to modecheck_coerce_make_bound_functor
returns "no". It registers the error by failing, and letting code above generate
the error message. But that code cannot generate an  *informative* error message
because it does not know where exactly the error is.

If I were writing this, I would pass down to this predicate the path from the top level X
down to this FunctorsX (the path being a list of "this arg of this functor"), and then
replace the last arg with two args: a bound_inst, and a list of erroneous positions.
(The bound_inst would be meaningful only if the list of errors is empty, which means
that the arg pair would be semantically equivalent to a version of the current
maybe(bound_inst) arg with the error positions added as arguments to the "no",
but with less memory allocation.) If coercion for HeadFunctorX failed, I would add
the current position, paired with HeadFunctorX's cons_id, to the list of errors,
and continue processing TailFunctorsX to look for any more errors. Then at the
top level, I would check whether the error position list is empty, and if not,
convert them to an informative error message.

> +modecheck_coerce_from_bound_make_bound_functor_arg_insts(ModuleInfo, TVarSet,
> +        LiveX, ExpandedInsts0, ArgTypesX, ArgTypesY, ArgInstsX, ArgInstsY) :-

What is the "from_bound" in the name for?

Edit: I see that it distinguised it from the "from_ground" variant,
which does not pass an X inst. Document this, please.

> +    ( if
> +        ArgTypesX = [],
> +        ArgTypesY = [],
> +        ArgInstsX = []
> +    then
> +        ArgInstsY = []
> +    else if
> +        ArgTypesX = [HeadArgTypeX | TailArgTypesX],
> +        ArgTypesY = [HeadArgTypeY | TailArgTypesY],
> +        ArgInstsX = [HeadArgInstX | TailArgInstsX]
> +    then
> +        modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX,
> +            HeadArgTypeX, HeadArgTypeY, ExpandedInsts0,
> +            HeadArgInstX, yes(HeadArgInstY)),

Again, a place to add to the list of errors if there is no good HeadArgInstY.

> +:- pred modecheck_coerce_from_ground_make_inst(module_info::in, tvarset::in,
> +    is_live::in, uniqueness::in, mer_type::in, mer_type::in,
> +    maybe(mer_inst)::out) is det.
> +
> +modecheck_coerce_from_ground_make_inst(ModuleInfo, TVarSet, LiveX, UniqX,
> +        TypeX, TypeY, MaybeInstY) :-
> +    ( if TypeX = TypeY then
> +        % This should be a common case.
> +        UniqY = make_result_uniq(LiveX, UniqX),
> +        InstY = ground(UniqY, none_or_default_func),
> +        MaybeInstY = yes(InstY)
> +    else if check_is_subtype(ModuleInfo, TVarSet, TypeX, TypeY) then
> +        set.init(SeenTypes0),
> +        modecheck_coerce_from_ground_make_inst_2(ModuleInfo, TVarSet,
> +            LiveX, UniqX, SeenTypes0, TypeX, TypeY, InstY),
> +        MaybeInstY = yes(InstY)
> +    else
> +        MaybeInstY = no
> +    ).

If I am not mistaken, this is called from exactly one place. I would inline it there,
and remove the _2 from the name of the next predicate.

> +:- func ground_or_unique_inst(uniqueness) = mer_inst.
> +
> +ground_or_unique_inst(Uniq) =
> +    ( if Uniq = shared then
> +        ground_inst
> +    else
> +        ground(Uniq, none_or_default_func)
> +    ).

Since prog_mode.ground_inst just returns "ground(shared, none_or_default_func)",
the body of this function could be replaced by the else arm.

> +%---------------------------------------------------------------------------%
> +
> +:- pred make_cons_id_for_type_ctor(type_ctor::in, cons_id::in(cons),
> +    cons_id::out(cons)) is det.
> +
> +make_cons_id_for_type_ctor(TypeCtor, ConsId0, ConsId) :-
> +    ConsId0 = cons(SymName0, Arity, TypeCtor0),
> +    ModuleName = type_ctor_module(TypeCtor),
> +    maybe_change_module_qualifier(ModuleName, SymName0, SymName),
> +    ( if TypeCtor0 = cons_id_dummy_type_ctor then
> +        % Keep the dummy type_ctor from the original cons_id.
> +        % This probably makes no difference.
> +        ConsId = cons(SymName, Arity, TypeCtor0)
> +    else
> +        ConsId = cons(SymName, Arity, TypeCtor)
> +    ).

TypeCtor0 should not be a dummy after post-typecheck.
Have you done a bootcheck in which the then part aborts?

> +:- pred get_ctor_arg_type(constructor_arg::in, mer_type::out) is det.
> +
> +get_ctor_arg_type(ctor_arg(_, Type, _), Type).

I would add this function to prog_data.m. It is almost certainly NOT
the only place in the compiler with this functionality.

> --- a/doc/reference_manual.texi
> +++ b/doc/reference_manual.texi
> @@ -7147,7 +7147,7 @@ if @var{Ti} is a subtype of @var{Si},
>  according to visible subtype definitions
>  @end itemize
>  
> -Otherwise, the coerce expression is not type-correct.
> +Otherwise, the @code{coerce} expression is not type-correct.
>  @c NOTE: we deliberately disallow coercion between arbitrary phantom types.
>  
>  Furthermore,
> @@ -7176,63 +7176,87 @@ other than a recursive type of the exact form @samp{base(B1, ..., Bn)}
>  @heading Mode checking
>  
>  Type conversion expressions must also be mode-correct.
> -For an expression where:
> +Intuitively, conversion from a subtype to supertype is safe,

to ITS supertype

> +but a conversion from a supertype to subtype is safe only if

to ITS subtype, or to ONE OF ITS subtypes

> - at itemize @bullet
> +Mode checking proceeds by simultaneously traversing
> +the inst tree of the input term,
> +the type tree of the input term,
> +and the type tree of the result term,
> +and producing the inst tree of the result term
> +if the conversion is valid.
> +Let
> + at itemize

Would you mind repeating for me what is the purpose of describing the algorithm
in the manual?

>  coerce_mode_error.m:052: In clause for `bad_nelist_f_to_list_c(in, out)':
>  coerce_mode_error.m:052:   in coerce:
> -coerce_mode_error.m:052:   mode error: the result would have instantiatedness
> -coerce_mode_error.m:052:     bound(
> -coerce_mode_error.m:052:       '[|]'(
> -coerce_mode_error.m:052:         bound(
> -coerce_mode_error.m:052:           apple
> -coerce_mode_error.m:052:         ;
> -coerce_mode_error.m:052:           lemon
> -coerce_mode_error.m:052:         ;
> -coerce_mode_error.m:052:           orange
> -coerce_mode_error.m:052:         ),
> -coerce_mode_error.m:052:         bound(
> -coerce_mode_error.m:052:           []
> -coerce_mode_error.m:052:         ;
> -coerce_mode_error.m:052:           '[|]'(
> -coerce_mode_error.m:052:             bound(
> -coerce_mode_error.m:052:               apple
> -coerce_mode_error.m:052:             ;
> -coerce_mode_error.m:052:               lemon
> -coerce_mode_error.m:052:             ;
> -coerce_mode_error.m:052:               orange
> -coerce_mode_error.m:052:             ),
> -coerce_mode_error.m:052:             ground
> -coerce_mode_error.m:052:           )
> -coerce_mode_error.m:052:         )
> -coerce_mode_error.m:052:       )
> -coerce_mode_error.m:052:     ),
> -coerce_mode_error.m:052:   which is not valid for the type
> +coerce_mode_error.m:052:   mode error: the input term has instantiatedness
> +coerce_mode_error.m:052:   `ground',
> +coerce_mode_error.m:052:   and cannot be converted to the type
>  coerce_mode_error.m:052:   `coerce_mode_error.list(coerce_mode_error.citrus)'.

All the error messages here give just one side of the story.
They give the actual inst, but not the expected inst, or better yet,
the diff between those two insts.

The diff is otherwise fine. I would be surprised if it had the bad performance
of the earlier attempt, but did you measure its impact?

If I were you, I would do only the easy fixes asked-for above before commit,
and then address the harder ones in a later commit.

Zoltan.





More information about the reviews mailing list