[m-rev.] for review: string trie switches for the LLDS, non-lookup edition
Peter Wang
novalazy at gmail.com
Tue Mar 26 17:04:07 AEDT 2024
On Mon, 25 Mar 2024 21:02:53 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review by anyone. I would appreciate a review
> by tomorrow evening, because I want to commit this before
> starting work on the lookup version.
>
> Besides a review of the meat of the diff, I am looking for feedback
> on two peripheral issues.
>
> One is that this diff exports a test for the use of utf8 by
> the platform the compiler is running on. This works for
> what this diff needs, but I wonder whether people would
> prefer that I replace that with a predicate in (say) globals.m
> that is implemented by three foreign procs, each of which
> returns the compilation_target of the language of the
> foreign_proc.
I expect user programs would also find it useful if we export a function
(from the string module) to take the place of internal_encoding_is_utf8,
e.g.
:- type string_encoding
---> utf8
; utf16.
:- func internal_string_encoding = string_encoding.
>
> The other is: does anyone know why --no-inlining does not
> actually inhibit inlining? Was this a deliberate choice, or just
> an accident?
I don't know.
> diff --git a/compiler/ml_switch_gen.m b/compiler/ml_switch_gen.m
> index feeaec348..a1e8d3d88 100644
> --- a/compiler/ml_switch_gen.m
> +++ b/compiler/ml_switch_gen.m
> @@ -349,7 +350,16 @@ ml_gen_smart_string_switch(Globals, SwitchVar, SwitchVarEntry,
> ml_is_lookup_switch(SwitchVar, FilteredTaggedCases, GoalInfo,
> CodeModel, !.Info, MaybeLookupSwitchInfo),
> globals.get_opt_tuple(Globals, OptTuple),
> + globals.get_target(Globals, Target),
> ( if
> + % The trie switch may use string operations on non-well-formed strings,
> + % including non-well-formed strings that it constructs for itself.
> + % While the C operations we use to do comparisons, strcmp/strncmp,
> + % do not care about that, the comparison operations we use on the
> + % C# and Java backends do (or at least may). We therefore require
> + % that both the host machine and the target machine use the C backend.
> + internal_encoding_is_utf8, % host is C
> + Target = target_c, % target is C
> StringTrieSwitchSize = OptTuple ^ ot_string_trie_switch_size,
> NumConsIds >= StringTrieSwitchSize
> then
The comment is wrong about the C# and Java backends. Just say that trie
switches have only been implemented with C host and target.
> diff --git a/compiler/string_switch.m b/compiler/string_switch.m
> index 553d1fa86..02f984a14 100644
> --- a/compiler/string_switch.m
> +++ b/compiler/string_switch.m
...
> +:- pred convert_trie_to_nested_switches(string_trie_switch_info::in, rval::in,
> + map(case_id, label)::in, int::in, trie_node::in,
> + label::out, llds_code::out, code_info::in, code_info::out) is det.
> +
> +convert_trie_to_nested_switches(TrieSwitchInfo, VarRval, CaseIdToLabelMap,
> + NumMatched, TrieNode, TrieNodeLabel, Code, !CI) :-
> + get_next_label(TrieNodeLabel, !CI),
> + (
> + TrieNode = trie_leaf(RevMatchedCodeUnits, NotYetMatchedCodeUnits,
> + CaseId),
> + list.reverse(RevMatchedCodeUnits, MatchedCodeUnits),
> + AllCodeUnits = MatchedCodeUnits ++ NotYetMatchedCodeUnits,
> + list.length(MatchedCodeUnits, NumMatchedCodeUnits),
> + expect(unify(NumMatchedCodeUnits, NumMatched), $pred,
> + "NumevMatchedCodeUnits != NumMatched"),
> + Encoding = TrieSwitchInfo ^ stsi_encoding,
> + det_from_code_unit_list_in_encoding_allow_ill_formed(Encoding,
> + AllCodeUnits, EndStr),
> + CondRval = binop(offset_str_eq(NumMatched, no_size),
> + VarRval, const(llconst_string(EndStr))),
> + map.lookup(CaseIdToLabelMap, CaseId, StrCaseLabel),
> + StrCaseCodeAddr = code_label(StrCaseLabel),
> + FailLabel = TrieSwitchInfo ^ stsi_fail_label,
> + Code = from_list([
> + llds_instr(label(TrieNodeLabel), ""),
> + llds_instr(if_val(CondRval, StrCaseCodeAddr), "match; goto case"),
> + llds_instr(goto(code_label(FailLabel)), "no match; goto fail")
> + ])
> + ;
> + TrieNode = trie_choice(RevMatchedCodeUnits, _ChoiceMap, MaybeEnd),
> + list.length(RevMatchedCodeUnits, NumRevMatchedCodeUnits),
Fix indentation.
> + expect(unify(NumRevMatchedCodeUnits, NumMatched), $pred,
> + "NumRevMatchedCodeUnits != NumMatched"),
> +
> + CodeUnitLval = TrieSwitchInfo ^ stsi_code_unit_reg,
> + GetCurCodeUnitRval = binop(string_unsafe_index_code_unit,
> + VarRval, const(llconst_int(NumMatched))),
> + LabelCode = singleton(
> + llds_instr(label(TrieNodeLabel), "")
> + ),
> + SetCodeUnitCode = singleton(
> + llds_instr(assign(CodeUnitLval, GetCurCodeUnitRval), "")
> + ),
> + CodeUnitRval = lval(CodeUnitLval),
> + FailLabel = TrieSwitchInfo ^ stsi_fail_label,
> + GotoFailCode = singleton(
> + llds_instr(goto(code_label(FailLabel)), "no match; goto fail")
> + ),
> diff --git a/compiler/switch_gen.m b/compiler/switch_gen.m
> index 21ead087a..70f77bf9a 100644
> --- a/compiler/switch_gen.m
> +++ b/compiler/switch_gen.m
> @@ -299,6 +299,23 @@ generate_smart_string_switch(Globals, CodeModel, CanFail, FilteredCanFail,
> goal_info_get_store_map(GoalInfo, StoreMap),
> globals.get_opt_tuple(Globals, OptTuple),
> ( if
> + % The trie switch may use string operations on non-well-formed strings,
> + % including non-well-formed strings that it constructs for itself.
> + % While the C operations we use to do comparisons, strcmp/strncmp,
> + % do not care about that, the comparison operations we use on the
> + % C# and Java backends do (or at least may). We therefore require
> + % that both the host machine and the target machine use the C backend.
> + % Unlike ml_switch_gen.m, we don't need to check whether
> + % we are targeting C.
> + internal_encoding_is_utf8, % host is C
> + StringTrieSwitchSize = OptTuple ^ ot_string_hash_switch_size,
Same as the other comment.
> + NumConsIds >= StringTrieSwitchSize,
> + MaybeLookupSwitchInfo = no % yes is not yet implemented
> + then
> + generate_string_trie_switch(TaggedCases, SwitchVarRval,
> + SwitchVarName, CodeModel, CanFail, GoalInfo,
> + EndLabel, MaybeEnd, SwitchCode, !CI, !.CLD)
> + else if
> StringHashSwitchSize = OptTuple ^ ot_string_hash_switch_size,
> NumConsIds >= StringHashSwitchSize
> then
We should have a test in which a trie node causes a code point to be
split in the middle of its code unit sequence.
I didn't every line that closely, but it looks fine.
Peter
More information about the reviews
mailing list