[m-rev.] for review: Implement modechecking of coerce more efficiently.

Peter Wang novalazy at gmail.com
Wed May 5 16:41:28 AEST 2021


On Sun, 02 May 2021 13:51:07 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> 
> On Mon, 26 Apr 2021 14:13:51 +1000, Peter Wang <novalazy at gmail.com> wrote:
> > --- a/compiler/mode_errors.m
> > +++ b/compiler/mode_errors.m
> > @@ -165,9 +165,9 @@
> >      ;       mode_error_coerce_input_not_ground(prog_var, mer_inst)
> >              % The argument in a coerce expression has a non-ground inst.
> >  
> > -    ;       mode_error_coerce_bad_result_inst(mer_inst, mer_type)
> > -            % The result of a coercion would have an inst that is not valid
> > -            % for the type.
> > +    ;       mode_error_coerce_invalid(mer_inst, mer_type)
> > +            % The argument in a coerce expression has a ground inst
> > +            % that is not valid for the result type.
> 
> Am I correct in guessing that you would generate coerce_invalid errors
> only when the input inst is ground? If yes, then I would change "invalid"
> to "invalid_ground" or "ground_invalid" to emphasize the fact that this error
> and the previous one effectively partition the set of coerce errors.

Done.

> 
> 
> > +            % Y is bound so bind the coercion result to a fresh variable
> > +            % YPrime, then unify Y = YPrime.
> 
> I would call this var Z, since it is sort-of-unified with both X and Y.
> 

Hmm, I think it reads more clearly with YPrime as it goes with TypeY and
FinalInstY, so I'll leave it.

> > +    % Try to produce the resulting inst of a coercion from TypeX to TypeY.
> > +    % InstX is the initial inst of the input term,
> > +    % InstY is the final inst of the result term.
> > +    %
> > +:- pred modecheck_coerce_make_inst(module_info::in, tvarset::in, is_live::in,
> > +    mer_type::in, mer_type::in, expanded_insts::in,
> > +    mer_inst::in, maybe(mer_inst)::out) is det.
> > +
> > +modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX, TypeX, TypeY,
> > +        ExpandedInsts0, InstX, Res) :-
> 
> This predicate has no argument named InstY, only Res; update the comment.

Done.

> > +        InstX = ground(UniqX, HOInstInfoX),
> > +        (
> > +            HOInstInfoX = none_or_default_func,
> > +            modecheck_coerce_from_ground_make_inst(ModuleInfo, TVarSet,
> > +                LiveX, UniqX, TypeX, TypeY, Res)
> > +        ;
> > +            HOInstInfoX = higher_order(_PredInstInfoX),
> > +            UniqY = make_result_uniq(LiveX, UniqX),
> > +            % Coerce cannot change the calling convention.
> > +            InstY = ground(UniqY, HOInstInfoX),
> > +            Res = yes(InstY)
> > +        )
> 
> You made the proposal for coercions when I was in hospital with no internet
> connectivity, so I couldn't follow the proposal at the time, so excuse my
> ignorance, but cannot a coerce expression restrict the set of allowed
> function symbols in an input arg in a higher order value? If it can, then this
> code needs more work.

Subtyping is not allowed in higher-order constructor arguments,
e.g. subfoo is not allowed:

    :- type foo
	--->	foo(pred(fruit::in) is semidet).

    :- type subfoo =< foo
	--->	foo(pred(citrus::in) is semidet).

Also, if you have a type:

    :- type wrap(T)
	--->	wrap(T).

coerce does not allow a conversion from `wrap(pred(citrus))' to
`wrap(pred(fruit))' because it requires there to be a subtype
relationship on the type parameter T on the input and result types.
`pred(citrus)' is not a subtype of `pred(fruit)', or vice versa,
by our definition of subtypes.

However, I could well have missed something. If you have an example
in mind then please post it.

> > +        InstX = not_reached,
> > +        InstY = not_reached,
> > +        Res = yes(InstY)
> > +    ;
> > +        InstX = inst_var(_),
> > +        unexpected($pred, "uninstantiated inst parameter")
> > +    ;
> > +        InstX = constrained_inst_vars(InstVars, SubInstX),
> > +        % The term of TypeX is approximated by a ground inst SubInstX.
> 
> What does "the term of" refer to here and in the next sentence?

I have clarified both as "the input term" and "the result term".

> 
> > +        InstX = defined_inst(InstNameX),
> > +        % Avoid infinite loops. We track inst-type pairs because a user-defined
> > +        % inst can be used with different types.
> > +        ( if
> > +            set.insert_new(InstNameX - TypeX, ExpandedInsts0, ExpandedInsts1)
> > +        then
> > +            inst_lookup(ModuleInfo, InstNameX, InstX1),
> > +            modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX,
> > +                TypeX, TypeY, ExpandedInsts1, InstX1, Res)
> > +        else
> > +            % If we would enter an infinite loop, return the inst unexpanded.
> 
> I would explicitly document the reason why this is justified: that a recursive
> check, by definition, cannot find any errors that the nonrecursive part of
> the algorithm wouldn't find.

Done.

> 
> > +:- func make_result_uniq(is_live, uniqueness) = uniqueness.
> > +
> > +make_result_uniq(LiveX, UniqX) = UniqY :-
> 
> I would give this function a different name. The current one implies
> that the return value is *always* unique.

Ah, true. I've renamed it to uniqueness_for_coerce_result.

> 
> > +:- pred modecheck_coerce_make_bound_functors(module_info::in, tvarset::in,
> > +    is_live::in, mer_type::in, mer_type::in, expanded_insts::in,
> > +    list(bound_inst)::in, list(bound_inst)::out) is semidet.
> > +
> > +modecheck_coerce_make_bound_functors(ModuleInfo, TVarSet, LiveX,
> > +        TypeX, TypeY, ExpandedInsts0, FunctorsX, FunctorsY) :-
> > +    (
> > +        FunctorsX = [],
> > +        FunctorsY = []
> > +    ;
> > +        FunctorsX = [HeadFunctorX | TailFunctorsX],
> > +        modecheck_coerce_make_bound_functor(ModuleInfo, TVarSet, LiveX,
> > +            TypeX, TypeY, ExpandedInsts0, HeadFunctorX, yes(HeadFunctorY)),
> > +        modecheck_coerce_make_bound_functors(ModuleInfo, TVarSet, LiveX,
> > +            TypeX, TypeY, ExpandedInsts0, TailFunctorsX, TailFunctorsY),
> > +        FunctorsY = [HeadFunctorY | TailFunctorsY]
> > +    ).
> 
> This code finds an error if the call to modecheck_coerce_make_bound_functor
> returns "no". It registers the error by failing, and letting code above generate
> the error message. But that code cannot generate an  *informative* error message
> because it does not know where exactly the error is.
> 

I have added an XXX.

> If I were writing this, I would pass down to this predicate the path from the top level X
> down to this FunctorsX (the path being a list of "this arg of this functor"), and then
> replace the last arg with two args: a bound_inst, and a list of erroneous positions.
> (The bound_inst would be meaningful only if the list of errors is empty, which means
> that the arg pair would be semantically equivalent to a version of the current
> maybe(bound_inst) arg with the error positions added as arguments to the "no",
> but with less memory allocation.) If coercion for HeadFunctorX failed, I would add
> the current position, paired with HeadFunctorX's cons_id, to the list of errors,
> and continue processing TailFunctorsX to look for any more errors. Then at the
> top level, I would check whether the error position list is empty, and if not,
> convert them to an informative error message.
> 
> > +modecheck_coerce_from_bound_make_bound_functor_arg_insts(ModuleInfo, TVarSet,
> > +        LiveX, ExpandedInsts0, ArgTypesX, ArgTypesY, ArgInstsX, ArgInstsY) :-
> 
> What is the "from_bound" in the name for?
> 
> Edit: I see that it distinguised it from the "from_ground" variant,
> which does not pass an X inst. Document this, please.
> 

I have added "from_bound" to its ancestor predicates, and added comments
to explain what "from_ground" and "from_bound" are referring to.

> > +    ( if
> > +        ArgTypesX = [],
> > +        ArgTypesY = [],
> > +        ArgInstsX = []
> > +    then
> > +        ArgInstsY = []
> > +    else if
> > +        ArgTypesX = [HeadArgTypeX | TailArgTypesX],
> > +        ArgTypesY = [HeadArgTypeY | TailArgTypesY],
> > +        ArgInstsX = [HeadArgInstX | TailArgInstsX]
> > +    then
> > +        modecheck_coerce_make_inst(ModuleInfo, TVarSet, LiveX,
> > +            HeadArgTypeX, HeadArgTypeY, ExpandedInsts0,
> > +            HeadArgInstX, yes(HeadArgInstY)),
> 
> Again, a place to add to the list of errors if there is no good HeadArgInstY.
> 
> > +:- pred modecheck_coerce_from_ground_make_inst(module_info::in, tvarset::in,
> > +    is_live::in, uniqueness::in, mer_type::in, mer_type::in,
> > +    maybe(mer_inst)::out) is det.
> > +
> > +modecheck_coerce_from_ground_make_inst(ModuleInfo, TVarSet, LiveX, UniqX,
> > +        TypeX, TypeY, MaybeInstY) :-
> > +    ( if TypeX = TypeY then
> > +        % This should be a common case.
> > +        UniqY = make_result_uniq(LiveX, UniqX),
> > +        InstY = ground(UniqY, none_or_default_func),
> > +        MaybeInstY = yes(InstY)
> > +    else if check_is_subtype(ModuleInfo, TVarSet, TypeX, TypeY) then
> > +        set.init(SeenTypes0),
> > +        modecheck_coerce_from_ground_make_inst_2(ModuleInfo, TVarSet,
> > +            LiveX, UniqX, SeenTypes0, TypeX, TypeY, InstY),
> > +        MaybeInstY = yes(InstY)
> > +    else
> > +        MaybeInstY = no
> > +    ).
> 
> If I am not mistaken, this is called from exactly one place. I would inline it there,
> and remove the _2 from the name of the next predicate.
> 

It's called recursively as well.

> > +:- func ground_or_unique_inst(uniqueness) = mer_inst.
> > +
> > +ground_or_unique_inst(Uniq) =
> > +    ( if Uniq = shared then
> > +        ground_inst
> > +    else
> > +        ground(Uniq, none_or_default_func)
> > +    ).
> 
> Since prog_mode.ground_inst just returns "ground(shared, none_or_default_func)",
> the body of this function could be replaced by the else arm.
> 

I've deleted the function. It was present from the old implementation,
where the intention was to opportunistically use the static term
returned by ground_inst.

> > +%---------------------------------------------------------------------------%
> > +
> > +:- pred make_cons_id_for_type_ctor(type_ctor::in, cons_id::in(cons),
> > +    cons_id::out(cons)) is det.
> > +
> > +make_cons_id_for_type_ctor(TypeCtor, ConsId0, ConsId) :-
> > +    ConsId0 = cons(SymName0, Arity, TypeCtor0),
> > +    ModuleName = type_ctor_module(TypeCtor),
> > +    maybe_change_module_qualifier(ModuleName, SymName0, SymName),
> > +    ( if TypeCtor0 = cons_id_dummy_type_ctor then
> > +        % Keep the dummy type_ctor from the original cons_id.
> > +        % This probably makes no difference.
> > +        ConsId = cons(SymName, Arity, TypeCtor0)
> > +    else
> > +        ConsId = cons(SymName, Arity, TypeCtor)
> > +    ).
> 
> TypeCtor0 should not be a dummy after post-typecheck.
> Have you done a bootcheck in which the then part aborts?
> 

The then part is reachable in at least
    hard_coded/subtype_pack.m
    invalid/coerce_implied_mode.m
    invalid/coerce_mode_error.m

I have added an XXX for investigation later.

> > +:- pred get_ctor_arg_type(constructor_arg::in, mer_type::out) is det.
> > +
> > +get_ctor_arg_type(ctor_arg(_, Type, _), Type).
> 
> I would add this function to prog_data.m. It is almost certainly NOT
> the only place in the compiler with this functionality.
> 

I have added an XXX for later.

> > --- a/doc/reference_manual.texi
> > +++ b/doc/reference_manual.texi
> > @@ -7147,7 +7147,7 @@ if @var{Ti} is a subtype of @var{Si},
> >  according to visible subtype definitions
> >  @end itemize
> >  
> > -Otherwise, the coerce expression is not type-correct.
> > +Otherwise, the @code{coerce} expression is not type-correct.
> >  @c NOTE: we deliberately disallow coercion between arbitrary phantom types.
> >  
> >  Furthermore,
> > @@ -7176,63 +7176,87 @@ other than a recursive type of the exact form @samp{base(B1, ..., Bn)}
> >  @heading Mode checking
> >  
> >  Type conversion expressions must also be mode-correct.
> > -For an expression where:
> > +Intuitively, conversion from a subtype to supertype is safe,
> 
> to ITS supertype

Fixed.

> 
> > +but a conversion from a supertype to subtype is safe only if
> 
> to ITS subtype, or to ONE OF ITS subtypes

Fixed.

> 
> > - at itemize @bullet
> > +Mode checking proceeds by simultaneously traversing
> > +the inst tree of the input term,
> > +the type tree of the input term,
> > +and the type tree of the result term,
> > +and producing the inst tree of the result term
> > +if the conversion is valid.
> > +Let
> > + at itemize
> 
> Would you mind repeating for me what is the purpose of describing the algorithm
> in the manual?
> 

I want users to be able to understand (precisely) what is allowed or not
by reading the reference manual. Describing the algorithm is an attempt
to impart that information, but perhaps not the best way.

Writing down the algorithm for the reference manual and comparing it
with the code did help clarify my understanding while developing this
change, so even if the text is replaced later, it was useful for that.

> >  coerce_mode_error.m:052: In clause for `bad_nelist_f_to_list_c(in, out)':
> >  coerce_mode_error.m:052:   in coerce:
> > -coerce_mode_error.m:052:   mode error: the result would have instantiatedness
> > -coerce_mode_error.m:052:     bound(
> > -coerce_mode_error.m:052:       '[|]'(
> > -coerce_mode_error.m:052:         bound(
> > -coerce_mode_error.m:052:           apple
> > -coerce_mode_error.m:052:         ;
> > -coerce_mode_error.m:052:           lemon
> > -coerce_mode_error.m:052:         ;
> > -coerce_mode_error.m:052:           orange
> > -coerce_mode_error.m:052:         ),
> > -coerce_mode_error.m:052:         bound(
> > -coerce_mode_error.m:052:           []
> > -coerce_mode_error.m:052:         ;
> > -coerce_mode_error.m:052:           '[|]'(
> > -coerce_mode_error.m:052:             bound(
> > -coerce_mode_error.m:052:               apple
> > -coerce_mode_error.m:052:             ;
> > -coerce_mode_error.m:052:               lemon
> > -coerce_mode_error.m:052:             ;
> > -coerce_mode_error.m:052:               orange
> > -coerce_mode_error.m:052:             ),
> > -coerce_mode_error.m:052:             ground
> > -coerce_mode_error.m:052:           )
> > -coerce_mode_error.m:052:         )
> > -coerce_mode_error.m:052:       )
> > -coerce_mode_error.m:052:     ),
> > -coerce_mode_error.m:052:   which is not valid for the type
> > +coerce_mode_error.m:052:   mode error: the input term has instantiatedness
> > +coerce_mode_error.m:052:   `ground',
> > +coerce_mode_error.m:052:   and cannot be converted to the type
> >  coerce_mode_error.m:052:   `coerce_mode_error.list(coerce_mode_error.citrus)'.
> 
> All the error messages here give just one side of the story.
> They give the actual inst, but not the expected inst, or better yet,
> the diff between those two insts.

Right. The hard part may be how to report a problem somewhere deep
in an inst tree.

> 
> The diff is otherwise fine. I would be surprised if it had the bad performance
> of the earlier attempt, but did you measure its impact?

The occurrences I have take no time at all.

> 
> If I were you, I would do only the easy fixes asked-for above before commit,
> and then address the harder ones in a later commit.

Thanks for the review.

Peter


More information about the reviews mailing list