[m-rev.] for review: better lookahead for switch detection

Julien Fischer jfischer at opturion.com
Sat Nov 15 01:07:52 AEDT 2025


On Fri, 14 Nov 2025 at 18:50, Zoltan Somogyi <zoltan.somogyi at runbox.com> wrote:

> Let switch detection handle deeper disjunctions.
>
> compiler/switch_detection.m:
>     The existing switch detection algorithm does a single forward
>     traversal of the procedure body, doing only very limited lookahead.
>     This prevents it from recognizing some switches. To fix this, add
>     a new prepass that provides unlimited lookahead.
>
>     Add one use of this lookahead information. Later diffs should add
>     more uses.
>
> compiler/print_help.m:
>     Fix the code that served as motivation for this change.
>     It was a block of unifications that was intended to supply
>     the lookahead that the old algorithm needed but did not have,
>     but it actually lied: it said that the the following disjunction
>     covered cons_ids that it actually did not cover. The fact that
>     the compiler did not diagnose this lie until now was a bug.
>     (After this diff, the compiler now reports an error.)
>
> deep_profiler/message.m:
>     Avoid the limitation of the new algorithm that was discussed today
>     on m-dev.

...

> diff --git a/compiler/switch_detection.m b/compiler/switch_detection.m
> index 458ed7193..cdceef692 100644
> --- a/compiler/switch_detection.m
> +++ b/compiler/switch_detection.m

...

> @@ -1583,6 +1696,837 @@ count_covered_cons_ids([Case | Cases]) = CaseCount + CasesCount :-
>      CaseCount = 1 + list.length(OtherConsIds),
>      CasesCount = count_covered_cons_ids(Cases).
>
> +%---------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> +%
> +% The data structures we use in scout_disjunctions_in_goal,

Finish this first sentence here ...

> which is
> +% a pre-pass before the main procedure body traversal that actually decides
> +% which disjunctions can be turned into switches, and they can possibly

... and turn this into a separate sentence.

Also: *if* they can possibly be turned ...

> +% be turned into more than one kind of switch, which one we should choose.
> +% The result of scouting is information about the terrain ahead, which
> +% in this case means information about deconstruction unifications
> +% and disjunctions that the main traversal has not yet seen.
> +%
> +% At the moment, we use scouting results at only point in the main traversal.
> +% However, this may change in the future.
> +%
> +
> +:- type scout_disj_info
> +    --->    scout_disj_info(
> +                % Conceptually, both of these are read-only, though in
> +                % actuality, we update module_info when we handle cases
> +                % inside switches.
> +                sdi_module_info             :: module_info,
> +                sdi_var_table               :: var_table,
> +
> +                % These are the data structures we are constructing.
> +                % The one we really want is the disjunction_info_map;
> +                % we build the disjunct_info_map as a stepping stone to it.
> +                sdi_disjunction_info_map    :: disjunction_info_map,
> +                sdi_disjunct_info_map       :: disjunct_info_map
> +            ).
> +
> +%---------------------%
> +
> +    % Maps the id of a disjunction to information about that disjunction.
> +:- type disjunction_info_map == map(disjunction_id, disjunction_info).
> +
> +    % Map the id of a disjunct to information about that disjunct.
> +:- type disjunct_info_map == map(disjunct_id, disjunct_info).
> +
> +%---------------------%
> +
> +    % The type whose values we use to identify a disjunction.
> +    % The goal specified by the given goal_id will be a disj(...) goal.
> +:- type disjunction_id
> +    --->    disjunction_id(goal_id).
> +
> +    % The type whose values we use to identify a disjunct in a disjunction.
> +    % The goal specified by the given goal_id will have a disj(...) goal
> +    % as its immediate parent.
> +:- type disjunct_id
> +    --->    disjunct_id(goal_id).
> +
> +%---------------------%
> +
> +    % The information that scouting finds about a disjunction.
> +    % There should be one of these in the disjunction_info_map
> +    % for every disjunction in the procedure body.
> +:- type disjunction_info
> +    --->    disjunction_info(
> +                dni_arms                    :: one_or_more(disjunct_id_info),
> +                % The list of disjuncts in the disjunction.
> +                % This field is not yet used.
> +
> +                dni_summary_map             :: all_arms_summary_map
> +            ).
> +
> +:- type all_arms_summary_map == map(prog_var, var_all_arms_summary).
> +
> +:- type var_all_arms_summary
> +    --->    var_all_arms_summary(
> +                set(switchable_cons_id),
> +                is_sub_disj_needed
> +            ).
> +
> +:- type one_arm_summary_map == map(prog_var, var_one_arm_summary).
> +
> +:- type var_one_arm_summary
> +    --->    voas_deconstruct(deconstruct_info)
> +    ;       voas_sub_disjunction(var_all_arms_summary).

Document the types related to arms here.

> +:- type is_sub_disj_needed
> +    --->    sub_disj_is_not_needed
> +    ;       sub_disj_is_needed.
> +
> +:- type switchable_cons_id =< cons_id
> +    --->    du_data_ctor(du_ctor)
> +    ;       some_int_const(some_int_const)
> +    ;       float_const(float)
> +    ;       char_const(char)
> +    ;       string_const(string).
> +
> +%---------------------%
> +
> +:- type maybe_in_zone
> +    --->    in_zone(disjunct_id)
> +            % We are in one of the disjuncts of a disjunction; the argument
> +            % specifies the disjunct. And we are within the initial sequence
> +            % of unifications within that disjunct. (We treat calls from the
> +            % clause head as unifications for this purpose.)
> +            %
> +            % As soon as we leave this initial part of a disjunct,
> +            % we switch to not_in_zone. The only deconstruction unifications
> +            % we consider for switch detection are the ones that occur
> +            % "in the zone".
> +            %
> +            % We originally adopted this rule to reduce the cost (in compile
> +            % time) of searching for deconstruction unifications that denote
> +            % switch arms. However, converting such a deconstruction
> +            % unification into the test for a switch arm can also change
> +            % the order execution of the disjunct's conjuncts, and restricting
> +            % the reordering to happend only among unifications eliminates

s/happend/happen/

> +            % any concerns about changing the operational semantics of the
> +            % procedure in terms of exceptions being raised or nontermination
> +            % being introduced. (Function calls from clause heads do not have
> +            % a clearly specified order with respect to goals in the clause
> +            % body, which is why we allow reordering with respect to them.)
> +            %
> +            % The effect on compile times is no longer meaningful, but the
> +            % effect on operational semantics is still relevant.
> +            %
> +            % There is also an ergonomic argument here: requiring unifications
> +            % that effectively serve as case constants in C switch statements
> +            % to be near the start of their switch arms keeps code readable,
> +            % compared to a hypothetical alternative arrangement in which
> +            % we allow unifications from the ends of possibly-long disjuncts
> +            % to provide the cons_id that identifies a switch arm.
> +    ;       not_in_zone.

> +
> +:- type disjunct_id_info
> +    --->    disjunct_id_info(disjunct_id, disjunct_info).
> +
> +    % The information that scouting finds about a disjunct.
> +    % There should be one of these in the disjunct_info_map
> +    % for every disjunct in the procedure body.
> +:- type disjunct_info
> +    --->    disjunct_info(
> +                di_iz_deconstruct_map       :: in_zone_deconstruct_map,
> +                % We record info about at most one disjunction, since
> +                % a disjunction ends the zone.
> +                di_iz_sub_disjunctions      :: maybe(disjunction_id)
> +            ).
> +
> +    % Note that we map not just the deconstructed variable
> +    % to a deconstruct_info, but also all variables equivalent to it.
> +:- type in_zone_deconstruct_map == map(prog_var, deconstruct_info).
> +
> +:- type deconstruct_info
> +    --->    deconstruct_info(
> +                % This goal ...
> +                goal_id,
> +
> +                % ... deconstructs this variable ...
> +                prog_var,
> +
> +                % ... which is part of this equivalence class ...
> +                set(prog_var),
> +
> +                % ... with this cons_id.
> +                switchable_cons_id
> +            ).
> +
> +%---------------------------------------------------------------------------%
> +
> +:- pred scout_disjunctions_in_goal(hlds_goal::in, instmap::in, instmap::out,
> +    maybe_in_zone::in, maybe_in_zone::out, subst_db::in, subst_db::out,
> +    scout_disj_info::in, scout_disj_info::out) is det.
> +
> +scout_disjunctions_in_goal(Goal, InstMap0, InstMap,
> +        !InZone, !SubstDb, !ScoutInfo) :-
> +    Goal = hlds_goal(GoalExpr, GoalInfo),
> +    (
> +        GoalExpr = unify(_, _, _, _, _),
> +        scout_disjunctions_in_unify_expr(GoalExpr, GoalInfo, InstMap0,
> +            !.InZone, !SubstDb, !ScoutInfo)
> +    ;
> +        ( GoalExpr = generic_call(_, _, _, _, _)
> +        ; GoalExpr = plain_call(_, _, _, _, _, _)
> +        ; GoalExpr = call_foreign_proc(_, _, _, _, _, _, _)
> +        ),
> +        ( if goal_info_has_feature(GoalInfo, feature_from_head) then
> +            true
> +        else
> +            !:InZone = not_in_zone
> +        )
> +    ;
> +        GoalExpr = conj(ConjType, Conjuncts),
> +        (
> +            ConjType = plain_conj,
> +            scout_disjunctions_in_conjuncts(Conjuncts, InstMap0,
> +                !InZone, !SubstDb, !ScoutInfo)
> +        ;
> +            ConjType = parallel_conj,
> +            (
> +                Conjuncts = []
> +            ;
> +                Conjuncts = [HeadConjunct | TailConjuncts],
> +                % The first parallel conjunct can be in the zone;
> +                % any later conjuncts cannot be in the zone.
> +                scout_disjunctions_in_goal(HeadConjunct, InstMap0, InstMap1,
> +                    !.InZone, _, !SubstDb, !ScoutInfo),
> +                scout_disjunctions_in_conjuncts(TailConjuncts, InstMap1,
> +                    not_in_zone, _, !SubstDb, !ScoutInfo),
> +                !:InZone = not_in_zone
> +            )
> +        )
> +    ;
> +        GoalExpr = disj(Disjuncts),
> +        (
> +            Disjuncts = []
> +        ;
> +            Disjuncts = [HeadDisjunct | TailDisjuncts],
> +            scout_disjunctions_in_disjuncts(HeadDisjunct, TailDisjuncts,
> +                InstMap0, !.SubstDb,
> +                HeadDisjunctIdInfo, TailDisjunctIdInfos, !ScoutInfo),
> +
> +            OoMDisjunctIdsInfos =
> +                one_or_more(HeadDisjunctIdInfo, TailDisjunctIdInfos),
> +            construct_disjunction_info(!.ScoutInfo, OoMDisjunctIdsInfos,
> +                DisjunctionInfo),
> +
> +            DisjunctionGoalId = goal_info_get_goal_id(GoalInfo),
> +            DisjunctionId = disjunction_id(DisjunctionGoalId),
> +            DisjunctionInfoMap0 = !.ScoutInfo ^ sdi_disjunction_info_map,
> +            map.det_insert(DisjunctionId, DisjunctionInfo,
> +                DisjunctionInfoMap0, DisjunctionInfoMap),
> +            !ScoutInfo ^ sdi_disjunction_info_map := DisjunctionInfoMap,
> +
> +            (
> +                !.InZone = in_zone(DisjunctId),
> +                DisjunctInfoMap0 = !.ScoutInfo ^ sdi_disjunct_info_map,
> +                map.lookup(DisjunctInfoMap0, DisjunctId, DisjunctInfo0),
> +                DisjunctInfo0 =
> +                    disjunct_info(DeconstructMap0, SubDisjunctions0),
> +                expect(unify(SubDisjunctions0, no), $pred,
> +                    "SubDisjunctions0 != no"),
> +                SubDisjunctions = yes(DisjunctionId),
> +                DisjunctInfo = disjunct_info(DeconstructMap0, SubDisjunctions),
> +                map.det_update(DisjunctId, DisjunctInfo,
> +                    DisjunctInfoMap0, DisjunctInfoMap),
> +                !ScoutInfo ^ sdi_disjunct_info_map := DisjunctInfoMap
> +            ;
> +                !.InZone = not_in_zone
> +            ),
> +            !:InZone = not_in_zone
> +        )
> +    ;
> +        GoalExpr = switch(Var, _CanFail, Cases),
> +        scout_disjunctions_in_cases(Var, Cases, InstMap0,
> +            !.SubstDb, !ScoutInfo),
> +        !:InZone = not_in_zone
> +    ;
> +        GoalExpr = if_then_else(_Vars, Cond, Then, Else),
> +        scout_disjunctions_in_goal(Cond, InstMap0, InstMap1,
> +            not_in_zone, _, !.SubstDb, SubstDbCond, !ScoutInfo),
> +        scout_disjunctions_in_goal(Then, InstMap1, _,
> +            not_in_zone, _, SubstDbCond, _, !ScoutInfo),
> +        scout_disjunctions_in_goal(Else, InstMap0, _,
> +            not_in_zone, _, !.SubstDb, _, !ScoutInfo),
> +        !:InZone = not_in_zone
> +    ;
> +        GoalExpr = negation(SubGoal),
> +        scout_disjunctions_in_goal(SubGoal, InstMap0, _,
> +            not_in_zone, _, !.SubstDb, _, !ScoutInfo),
> +        !:InZone = not_in_zone
> +    ;
> +        GoalExpr = scope(Reason, SubGoal),
> +        (
> +            Reason = from_ground_term(_, FgtKind),
> +            (
> +                FgtKind = from_ground_term_deconstruct,
> +                SubGoal = hlds_goal(SubGoalExpr, _),
> +                ( if
> +                    SubGoalExpr = conj(plain_conj, [HeadSubGoal | _]),
> +                    HeadSubGoal = hlds_goal(HeadSubGoalExpr, HeadSubGoalInfo),
> +                    HeadSubGoalExpr = unify(_, _, _, _, _)
> +                then
> +                    scout_disjunctions_in_unify_expr(HeadSubGoalExpr,
> +                        HeadSubGoalInfo, InstMap0, !.InZone,
> +                        !SubstDb, !ScoutInfo)
> +                    % Ignore the goals after HeadSubGoal; nothing in them
> +                    % could interest us.
> +                else
> +                    unexpected($pred, "unexpected goal in fgt scope")
> +                )
> +            ;
> +                ( FgtKind = from_ground_term_initial
> +                ; FgtKind = from_ground_term_construct
> +                ; FgtKind = from_ground_term_other
> +                )
> +                % Ignore the scope; nothing in it could interest us.
> +            )
> +        ;
> +            ( Reason = exist_quant(_, _)
> +            ; Reason = disable_warnings(_, _)
> +            ; Reason = promise_solutions(_, _)
> +            ; Reason = promise_purity(_)
> +            ; Reason = require_detism(_)
> +            ; Reason = commit(_)
> +            ; Reason = barrier(_)
> +            ; Reason = trace_goal(_, _, _, _, _)
> +            ; Reason = loop_control(_, _, _)
> +            ),
> +            scout_disjunctions_in_goal(SubGoal, InstMap0, _,
> +                !.InZone, _, !.SubstDb, _, !ScoutInfo)
> +        ;
> +            ( Reason = require_complete_switch(_RequiredVar)
> +            ; Reason = require_switch_arms_detism(_RequiredVar, _)
> +            ),
> +            scout_disjunctions_in_goal(SubGoal, InstMap0, _,
> +                not_in_zone, !:InZone, !.SubstDb, _, !ScoutInfo),
> +            expect(unify(!.InZone, not_in_zone), $pred,
> +                "in_zone after switch-related reason")
> +        )
> +    ;
> +        GoalExpr = shorthand(ShortHand),
> +        (
> +            ShortHand = atomic_goal(_GoalType, _Outer, _Inner,
> +                _MaybeOutputVars, MainGoal, OrElseGoals, _OrElseInners),
> +            scout_disjunctions_in_goal(MainGoal, InstMap0, _,
> +                not_in_zone, _, !.SubstDb, _, !ScoutInfo),
> +            scout_disjunctions_in_orelse_goals(OrElseGoals, InstMap0,
> +                !.SubstDb, !ScoutInfo),
> +            !:InZone = not_in_zone
> +        ;
> +            ShortHand = try_goal(_MaybeIO, _ResultVar, SubGoal),
> +            scout_disjunctions_in_goal(SubGoal, InstMap0, _,
> +                not_in_zone, _, !.SubstDb, _, !ScoutInfo),
> +            !:InZone = not_in_zone
> +        ;
> +            ShortHand = bi_implication(_, _),
> +            % These should have been expanded out by now.
> +            unexpected($pred, "bi_implication")
> +        )
> +    ),
> +    apply_goal_instmap_delta(Goal, InstMap0, InstMap).
> +
> +:- pred scout_disjunctions_in_unify_expr(hlds_goal_expr::in(goal_expr_unify),
> +    hlds_goal_info::in, instmap::in, maybe_in_zone::in,
> +    subst_db::in, subst_db::out,
> +    scout_disj_info::in, scout_disj_info::out) is det.
> +
> +scout_disjunctions_in_unify_expr(GoalExpr, GoalInfo, InstMap0,
> +        InZone0, !SubstDb, !ScoutInfo) :-
> +    GoalExpr = unify(XVar, RHS, _, Unification, _),
> +    % For both rhs_var and rhs_functor, we record the effect of the
> +    % unification on the substitution database even when we are
> +    % outside the zone. This extra info won't help us find more
> +    % aliases for in-zone deconstructions this disjunct (since there

*in* this disjunct

> +    % aren't any more past the end of the zone), but the extra information
> +    % in the substitution database may help us find more aliases inside
> +    % nested disjunctions.
> +    (
> +        RHS = rhs_lambda_goal(_, _, _, _, VarsModes, _, LambdaGoal),
> +        % We need to insert the initial insts for the lambda variables
> +        % in the instmap before processing the lambda goal.

s/in/into/

> +        ModuleInfo = !.ScoutInfo ^ sdi_module_info,
> +        instmap.pre_lambda_update(ModuleInfo, VarsModes, InstMap0, InstMap1),
> +        % LambdaGoal is may be in_zone from the point of view of the code

Delete "is".

> +        % outside this unification, but the proper perspective for
> +        % this call is the code *inside* the lambda goal. And from that
> +        % point of view, LambdaGoal is not inside *any* disjunction.
> +        scout_disjunctions_in_goal(LambdaGoal, InstMap1, _,
> +            not_in_zone, _, !.SubstDb, _, !ScoutInfo)
> +    ;
> +        RHS = rhs_var(YVar),
> +        record_var_var_unify(XVar, YVar, !SubstDb)
> +    ;
> +        RHS = rhs_functor(ConsId, _IsExistConstr, YVars),
> +        (
> +            ( ConsId = du_data_ctor(_)
> +            ; ConsId = some_int_const(_)
> +            ; ConsId = float_const(_)
> +            ; ConsId = char_const(_)
> +            ; ConsId = string_const(_)
> +            ),
> +            (
> +                Unification = assign(_, _),
> +                unexpected($pred, "assign")
> +            ;
> +                Unification = simple_test(_, _),
> +                unexpected($pred, "simple_test")
> +            ;
> +                Unification = construct(_, _, _, _, _, _, _)
> +            ;
> +                Unification = deconstruct(_, _, _, _, _, _),
> +                (
> +                    InZone0 = in_zone(DisjunctId),
> +                    GoalId = goal_info_get_goal_id(GoalInfo),
> +                    record_deconstruct(GoalId, XVar, coerce(ConsId),
> +                        !.SubstDb, DisjunctId, !ScoutInfo)
> +                ;
> +                    InZone0 = not_in_zone
> +                )
> +            ;
> +                Unification = complicated_unify(_, _, _),
> +                unexpected($pred, "test")

s/test/complicated_unify/

That looks fine otherwise.

Julien.


More information about the reviews mailing list