[m-rev.] for review: maybe reorder last two cases in try/try-me-else chains

Tue Mar 19 15:24:06 AEDT 2024

On Tue, 19 Mar 2024, Zoltan Somogyi wrote:

> Optimize try and try_me_else chains' last tests.
> 
> compiler/tag_switch.m:
>     In try_chains and try_me_else chains, we can omit the test for the last
>     case if the switch covers all possible sectag values. Optimize try_chain
>     and try_me_else chain switches on sectags by switching the order of the
>     last two cases if this puts the more expensive test of the two
>     into the last position.
>
>     Document what we do for ptags in this regard.
>
>     Factor our the commonalities between try_chains and try_me_else chains,
>     since these commonalities are now nontrivial.
>
>     Use neg_rval to negate tests, since it now does better with the
>     tag test code we now generate than simply adding a logical_not op.
> 
> compiler/code_util.m:
>     Extend the implementation of neg_rval to be able to handle
>     the code we now generate for tag tests.
>

> diff --git a/compiler/tag_switch.m b/compiler/tag_switch.m
> index afa9b98d9..9d59f608c 100644
> --- a/compiler/tag_switch.m
> +++ b/compiler/tag_switch.m

...

> @@ -1062,6 +1073,157 @@ generate_secondary_binary_search(SectagRval, MaybeFailLabel,
>  %
>  % Infrastructure needed for both try-me-else and try chain switches
>  % that test whether a tag (ptag or sectag) is in a given set.
> +%
> +
> +    % This predicate is designed to try to reduce the cost of the last test
> +    % in a chain of tests. The scenario it is designed for is when we are
> +    % generating code for a switch on sectags that cannot fail. (This means
> +    % we have a case for every possible value of the secondary tag, even if
> +    % the tag switch as a whole can fail, which would have to be because
> +    % we don't have a case for a primary tag value)
> +    %
> +    % Obviously we have at least two cases, because if we had only one,
> +    % we wouldn't need a switch at all. In general, we have two or more.
> +    % If two cases, say A and B, have NumSectagsA and NumSectagsB sectag values
> +    % corresponding to them respectively, we prioritize getting to the code
> +    % of case A more cheaply that case B, simply because that minimizes

s/that/than/

> +    % the expected *average* cost. This is why we order sectga groups

s/sectga/sectag/

> +    % in descending order of number of sectags, leading to code structures
> +    % such as the try chain
> +    %
> +    %   if sectag is in Case A's set, goto code of case A
> +    %   if sectag is in Case B's set, goto code of case B
> +    %   if sectag is in Case C's set, goto code of case B
> +    %   ...
> +    %
> +    % where NumSectagsA >= NumSectagsB >= NumSectagsC >= ...
> +    %
> +    % (Try-me-else chains follow the same logic, but switch the role of
> +    % the branch away and the fallthrough.)
> +    %
> +    % If the switch on the sectag cannot fail, then the test on the last
> +    % case can be optimized away, since the failure of the previous tests
> +    % guarantees its success. However, our ordering of the cases guarantees
> +    % that the last case will correspond to at most as many sectags as
> +    % the next-to-last case, and (since the cost of the set membership test
> +    % *may* be higher for a larger ser than for a smaller one), this means

s/ser/set/

> +    % that optimizing away the test for the last case may leave some
> +    % performance on the table. If indeed, the cost of the last case
> +    % (call it case F) is cheaper than the cost of the next-to-last case
> +    % (call it case E), then we don't want to generate code such as
> +    %
> +    %   ... code for previous cases ...
> +    %   if sectag is in case E's set, goto code of case E
> +    %   goto code of case F
> +    %
> +    % Instead, we want to generate code such as
> +    %
> +    %   ... code for previous cases ...
> +    %   if sectag is in case F's set, goto code of case F
> +    %   goto code of case E
> +    %
> +    % They both involve a test and a conditional branch. We do this
> +    % transformation only if the test for F is cheaper than the test for E,
> +    % and the cost of the conditional branch will depend on the performance
> +    % of the CPU's branch prediction mechanisms either way. If for some reason
> +    % it turned out that the structure that branches off to E is better
> +    % for performance, we can still get that effect by using code such as
> +    %
> +    %   ... code for previous cases ...
> +    %   if sectag is NOT in case F's set, goto code of case E
> +    %   goto code of case F
> +    %
> +    % Note that the extra negation exists only in this pseudo-code.
> +    % The actual code generated by neg_rval would replace each comparison
> +    % operation by its opposite (e.g. replacing "eq" with "ne"), and update
> +    % any connectives between the comparisons accordingly, keeping its cost
> +    % the same.
> +    %
> +    % For ptags, the scope for this optimization is smaller, since
> +    % the smaller set of possible values also contrains the number of cases.

s/contrains/constrains/

Looks ok otherwise.

Julien.