[m-rev.] for review: string trie switches for the LLDS, non-lookup edition

Peter Wang novalazy at gmail.com
Tue Mar 26 17:04:07 AEDT 2024


On Mon, 25 Mar 2024 21:02:53 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review by anyone. I would appreciate a review
> by tomorrow evening, because I want to commit this before
> starting work on the lookup version.
> 
> Besides a review of the meat of the diff, I am looking for feedback
> on two peripheral issues.
> 
> One is that this diff exports a test for the use of utf8 by
> the platform the compiler is running on. This works for
> what this diff needs, but I wonder whether people would
> prefer that I replace that with a predicate in (say) globals.m
> that is implemented by three foreign procs, each of which
> returns the compilation_target of the language of the
> foreign_proc.

I expect user programs would also find it useful if we export a function
(from the string module) to take the place of internal_encoding_is_utf8,
e.g.

    :- type string_encoding
	--->	utf8
	;	utf16.

    :- func internal_string_encoding = string_encoding.

> 
> The other is: does anyone know why --no-inlining does not
> actually inhibit inlining? Was this a deliberate choice, or just
> an accident?

I don't know.

> diff --git a/compiler/ml_switch_gen.m b/compiler/ml_switch_gen.m
> index feeaec348..a1e8d3d88 100644
> --- a/compiler/ml_switch_gen.m
> +++ b/compiler/ml_switch_gen.m
> @@ -349,7 +350,16 @@ ml_gen_smart_string_switch(Globals, SwitchVar, SwitchVarEntry,
>      ml_is_lookup_switch(SwitchVar, FilteredTaggedCases, GoalInfo,
>          CodeModel, !.Info, MaybeLookupSwitchInfo),
>      globals.get_opt_tuple(Globals, OptTuple),
> +    globals.get_target(Globals, Target),
>      ( if
> +        % The trie switch may use string operations on non-well-formed strings,
> +        % including non-well-formed strings that it constructs for itself.
> +        % While the C operations we use to do comparisons, strcmp/strncmp,
> +        % do not care about that, the comparison operations we use on the
> +        % C# and Java backends do (or at least may). We therefore require
> +        % that both the host machine and the target machine use the C backend.
> +        internal_encoding_is_utf8,  % host is C
> +        Target = target_c,          % target is C
>          StringTrieSwitchSize = OptTuple ^ ot_string_trie_switch_size,
>          NumConsIds >= StringTrieSwitchSize
>      then

The comment is wrong about the C# and Java backends. Just say that trie
switches have only been implemented with C host and target.

> diff --git a/compiler/string_switch.m b/compiler/string_switch.m
> index 553d1fa86..02f984a14 100644
> --- a/compiler/string_switch.m
> +++ b/compiler/string_switch.m
...
> +:- pred convert_trie_to_nested_switches(string_trie_switch_info::in, rval::in,
> +    map(case_id, label)::in, int::in, trie_node::in,
> +    label::out, llds_code::out, code_info::in, code_info::out) is det.
> +
> +convert_trie_to_nested_switches(TrieSwitchInfo, VarRval, CaseIdToLabelMap,
> +        NumMatched, TrieNode, TrieNodeLabel, Code, !CI) :-
> +    get_next_label(TrieNodeLabel, !CI),
> +    (
> +        TrieNode = trie_leaf(RevMatchedCodeUnits, NotYetMatchedCodeUnits,
> +            CaseId),
> +        list.reverse(RevMatchedCodeUnits, MatchedCodeUnits),
> +        AllCodeUnits = MatchedCodeUnits ++ NotYetMatchedCodeUnits,
> +        list.length(MatchedCodeUnits, NumMatchedCodeUnits),
> +        expect(unify(NumMatchedCodeUnits, NumMatched), $pred,
> +            "NumevMatchedCodeUnits != NumMatched"),
> +        Encoding = TrieSwitchInfo ^ stsi_encoding,
> +        det_from_code_unit_list_in_encoding_allow_ill_formed(Encoding,
> +            AllCodeUnits, EndStr),
> +        CondRval = binop(offset_str_eq(NumMatched, no_size),
> +            VarRval, const(llconst_string(EndStr))),
> +        map.lookup(CaseIdToLabelMap, CaseId, StrCaseLabel),
> +        StrCaseCodeAddr = code_label(StrCaseLabel),
> +        FailLabel = TrieSwitchInfo ^ stsi_fail_label,
> +        Code = from_list([
> +            llds_instr(label(TrieNodeLabel), ""),
> +            llds_instr(if_val(CondRval, StrCaseCodeAddr), "match; goto case"),
> +            llds_instr(goto(code_label(FailLabel)), "no match; goto fail")
> +        ])
> +    ;
> +        TrieNode = trie_choice(RevMatchedCodeUnits, _ChoiceMap, MaybeEnd),
> +                list.length(RevMatchedCodeUnits, NumRevMatchedCodeUnits),

Fix indentation.

> +        expect(unify(NumRevMatchedCodeUnits, NumMatched), $pred,
> +            "NumRevMatchedCodeUnits != NumMatched"),
> +
> +        CodeUnitLval = TrieSwitchInfo ^ stsi_code_unit_reg,
> +        GetCurCodeUnitRval = binop(string_unsafe_index_code_unit,
> +            VarRval, const(llconst_int(NumMatched))),
> +        LabelCode = singleton(
> +            llds_instr(label(TrieNodeLabel), "")
> +        ),
> +        SetCodeUnitCode = singleton(
> +            llds_instr(assign(CodeUnitLval, GetCurCodeUnitRval), "")
> +        ),
> +        CodeUnitRval = lval(CodeUnitLval),
> +        FailLabel = TrieSwitchInfo ^ stsi_fail_label,
> +        GotoFailCode = singleton(
> +            llds_instr(goto(code_label(FailLabel)), "no match; goto fail")
> +        ),

> diff --git a/compiler/switch_gen.m b/compiler/switch_gen.m
> index 21ead087a..70f77bf9a 100644
> --- a/compiler/switch_gen.m
> +++ b/compiler/switch_gen.m
> @@ -299,6 +299,23 @@ generate_smart_string_switch(Globals, CodeModel, CanFail, FilteredCanFail,
>      goal_info_get_store_map(GoalInfo, StoreMap),
>      globals.get_opt_tuple(Globals, OptTuple),
>      ( if
> +        % The trie switch may use string operations on non-well-formed strings,
> +        % including non-well-formed strings that it constructs for itself.
> +        % While the C operations we use to do comparisons, strcmp/strncmp,
> +        % do not care about that, the comparison operations we use on the
> +        % C# and Java backends do (or at least may). We therefore require
> +        % that both the host machine and the target machine use the C backend.
> +        % Unlike ml_switch_gen.m, we don't need to check whether
> +        % we are targeting C.
> +        internal_encoding_is_utf8,  % host is C
> +        StringTrieSwitchSize = OptTuple ^ ot_string_hash_switch_size,

Same as the other comment.

> +        NumConsIds >= StringTrieSwitchSize,
> +        MaybeLookupSwitchInfo = no  % yes is not yet implemented
> +    then
> +        generate_string_trie_switch(TaggedCases, SwitchVarRval,
> +            SwitchVarName, CodeModel, CanFail, GoalInfo,
> +            EndLabel, MaybeEnd, SwitchCode, !CI, !.CLD)
> +    else if
>          StringHashSwitchSize = OptTuple ^ ot_string_hash_switch_size,
>          NumConsIds >= StringHashSwitchSize
>      then

We should have a test in which a trie node causes a code point to be
split in the middle of its code unit sequence.

I didn't every line that closely, but it looks fine.

Peter


More information about the reviews mailing list