[m-rev.] For review by Fergus: Loop Invariant Hoisting

Ralph Becket rafe at cs.mu.OZ.AU
Wed Oct 30 12:48:37 AEDT 2002


Fergus Henderson, Wednesday, 16 October 2002:
> On 16-Oct-2002, Ralph Becket <rafe at cs.mu.OZ.AU> wrote:
> > I've just done a bootcheck on an updated tree and all is well.
> 
> I presume you bootchecked with the optimization enabled?
> (E.g. -O4 or greater?)

Bootchecked in -O2 to -O6 in both asm_fast.gc and hlc.gc, each with
--no-loop-invariants and --loop-invariants.

> Also, what effect does this have on performance?  (E.g. for the compiler
> itself, or the benchmarks in the `bench' directory).  We don't want to
> enable the optimization by default if it ends up making performance worse.
> It would be a good idea to run some benchmarks to double-check that the
> intended optimization really does improve things.

In asm_fast.gc:
mercury_compile.01 average of 12 with ignore=1     52.88
mercury_compile.02 average of 12 with ignore=1     52.31
mercury_compile.03 average of 12 with ignore=1     51.40
mercury_compile.04 average of 12 with ignore=1     51.96
mercury_compile.05 average of 12 with ignore=1     50.72
mercury_compile.06 average of 12 with ignore=1     50.62
mercury_compile.07 average of 12 with ignore=1     50.37
mercury_compile.08 average of 12 with ignore=1     50.55
mercury_compile.09 average of 12 with ignore=1     56.99
mercury_compile.10 average of 12 with ignore=1     57.46

In hlc.gc:
mercury_compile.11 average of 12 with ignore=1     50.39
mercury_compile.12 average of 12 with ignore=1     50.44
mercury_compile.13 average of 12 with ignore=1     52.62
mercury_compile.14 average of 12 with ignore=1     50.80
mercury_compile.15 average of 12 with ignore=1     50.38
mercury_compile.16 average of 12 with ignore=1     50.36
mercury_compile.17 average of 12 with ignore=1     51.91
mercury_compile.18 average of 12 with ignore=1     50.67
mercury_compile.19 average of 12 with ignore=1     51.61
mercury_compile.20 average of 12 with ignore=1     50.45

So it's not pessimizing.  The reason we see no general improvement is, I
suspect, that almost all code written in the compiler has had any loop
invariants already hoisted by hand.

> How much space cost does it have?
> Should it be disabled if --optimize-space is specified?

The space cost is reasonably small (with/without optimization):

linv2.mercury_compile.01 7141956 bytes
linv2.mercury_compile.02 7232452 bytes

linv2.mercury_compile.03 7125124 bytes
linv2.mercury_compile.04 7209316 bytes

linv2.mercury_compile.05 6862404 bytes
linv2.mercury_compile.06 6945092 bytes

linv2.mercury_compile.07 7055428 bytes
linv2.mercury_compile.08 7146084 bytes

linv2.mercury_compile.09 6518852 bytes
linv2.mercury_compile.10 6609444 bytes

linv2.mercury_compile.11 4942144 bytes
linv2.mercury_compile.12 5003520 bytes

linv2.mercury_compile.13 4954368 bytes
linv2.mercury_compile.14 5007904 bytes

linv2.mercury_compile.15 5011712 bytes
linv2.mercury_compile.16 5064960 bytes

linv2.mercury_compile.17 5175552 bytes
linv2.mercury_compile.18 5236992 bytes

linv2.mercury_compile.19 5175552 bytes
linv2.mercury_compile.20 5236992 bytes

I've addressed your review comments:

Estimated hours taken: 900
Branches: main

Added loop-invariant hoisting optimization.

liveness.m:
	Improved the debugging output to show HLDS before and after
	liveness analysis.

loop_inv.m:
	New module containing the implementation of the loop invariant
	hoisting optimization.

mercury_compile.m:
	Added loop_inv at step 34, moving inlining to step 35.

options.m:
	Added bool option --loop-invariants (default `no').
	This optimization is set at -O4.

passes_aux.m:
	Minor changes to support introduction of the loop
	invariant hoisting optmimization.

follow_code.m:
	Updated to reflect the new interface to passes_aux.m.

transform_hlds.m:
	Added include_module declaration for loop_inv.

Relative diff:

diff -u compiler/liveness.m compiler/liveness.m
--- compiler/liveness.m	16 Oct 2002 01:58:25 -0000
+++ compiler/liveness.m	25 Oct 2002 03:53:54 -0000
@@ -215,20 +215,20 @@
 	globals__lookup_int_option(Globals, debug_liveness, DebugLiveness),
 	pred_id_to_int(PredId, PredIdInt),
 	maybe_write_progress_message("\nbefore liveness",
-		DebugLiveness, PredIdInt, Goal0, VarSet, ModuleInfo, IO0, IO0b),
+		DebugLiveness, PredIdInt, Goal0, VarSet, ModuleInfo, IO0, IO1),
 
 	initial_liveness(ProcInfo1, PredId, ModuleInfo, Liveness0),
 	detect_liveness_in_goal(Goal0, Liveness0, LiveInfo,
 		_, Goal1),
 
 	maybe_write_progress_message("\nafter liveness",
-		DebugLiveness, PredIdInt, Goal1, VarSet, ModuleInfo, IO0b, IO1),
+		DebugLiveness, PredIdInt, Goal1, VarSet, ModuleInfo, IO1, IO2),
 
 	initial_deadness(ProcInfo1, LiveInfo, ModuleInfo, Deadness0),
 	detect_deadness_in_goal(Goal1, Deadness0, Liveness0, LiveInfo,
 		_, Goal2),
 	maybe_write_progress_message("\nafter deadness",
-		DebugLiveness, PredIdInt, Goal2, VarSet, ModuleInfo, IO1, IO2),
+		DebugLiveness, PredIdInt, Goal2, VarSet, ModuleInfo, IO2, IO3),
 
 	(
 		globals__get_trace_level(Globals, TraceLevel),
@@ -240,10 +240,10 @@
 		delay_death_proc_body(Goal2, VarSet, Liveness0, Goal3),
 		maybe_write_progress_message("\nafter delay death",
 			DebugLiveness, PredIdInt, Goal3, VarSet, ModuleInfo,
-			IO2, IO3)
+			IO3, IO4)
 	;
 		Goal3 = Goal2,
-		IO3 = IO2
+		IO4 = IO3
 	),
 
 	globals__get_trace_level(Globals, TraceLevel),
@@ -255,7 +255,7 @@
 	detect_resume_points_in_goal(Goal3, Liveness0, LiveInfo,
 		ResumeVars0, Goal, _),
 	maybe_write_progress_message("\nafter resume point",
-		DebugLiveness, PredIdInt, Goal, VarSet, ModuleInfo, IO3, IO),
+		DebugLiveness, PredIdInt, Goal, VarSet, ModuleInfo, IO4, IO),
 	proc_info_set_goal(ProcInfo1, Goal, ProcInfo2),
 	proc_info_set_liveness_info(ProcInfo2, Liveness0, ProcInfo).
 
diff -u compiler/loop_inv.m compiler/loop_inv.m
--- compiler/loop_inv.m	16 Oct 2002 04:33:38 -0000
+++ compiler/loop_inv.m	25 Oct 2002 06:33:15 -0000
@@ -2,7 +2,7 @@
 % loop_inv.m
 % Ralph Becket <rbeck at microsoft.com>
 % Thu Nov  8 12:05:29 EST 2001
-% vim: ft=mercury ff=unix ts=4 sw=4 et tw=0 wm=0
+% vim: ft=mercury ts=4 sw=4 et tw=0 wm=0
 %
 %------------------------------------------------------------------------------%
 %
@@ -138,7 +138,7 @@
     ( if
 
             % We only want to apply this optimization to pure preds (e.g.
-            % benchmark_det_loop).
+            % not benchmark_det_loop).
             %
         hlds_pred__pred_info_get_purity(PredInfo, pure),
 
@@ -615,8 +615,6 @@
     % A constant construction is a construction unification with no
     % arguments or which is constructed from a statically initialized
     % constant.
-    %
-    % NOTE: we could do better, but it's not clear it's worth it.
     %
 :- pred const_construction(hlds_goal).
 :- mode const_construction(in) is semidet.
diff -u compiler/mercury_compile.m compiler/mercury_compile.m
--- compiler/mercury_compile.m	16 Oct 2002 01:58:28 -0000
+++ compiler/mercury_compile.m	25 Oct 2002 07:09:36 -0000
@@ -1869,48 +1869,51 @@
 			Verbose, Stats, HLDS33),
 	mercury_compile__maybe_dump_hlds(HLDS33, "33", "accum"),
 
-	mercury_compile__maybe_loop_inv(HLDS33, Verbose, Stats, HLDS34),
-	mercury_compile__maybe_dump_hlds(HLDS34, "34", "loop_inv"),
+	% Hoisting loop invariants first invokes pass 34, "mark_static".
+	% "mark_static" is also run at stage 60.
+	%
+	mercury_compile__maybe_loop_inv(HLDS33, Verbose, Stats, HLDS35),
+	mercury_compile__maybe_dump_hlds(HLDS36, "35", "loop_inv"),
 
-	mercury_compile__maybe_do_inlining(HLDS34, Verbose, Stats, HLDS35),
-	mercury_compile__maybe_dump_hlds(HLDS35, "35", "inlining"),
+	mercury_compile__maybe_do_inlining(HLDS35, Verbose, Stats, HLDS36),
+	mercury_compile__maybe_dump_hlds(HLDS36, "36", "inlining"),
 
-	mercury_compile__maybe_deforestation(HLDS35, Verbose, Stats, HLDS36),
-	mercury_compile__maybe_dump_hlds(HLDS36, "36", "deforestation"),
+	mercury_compile__maybe_deforestation(HLDS36, Verbose, Stats, HLDS37),
+	mercury_compile__maybe_dump_hlds(HLDS37, "37", "deforestation"),
 
-	mercury_compile__maybe_delay_construct(HLDS36, Verbose, Stats, HLDS37),
-	mercury_compile__maybe_dump_hlds(HLDS37, "37", "delay_construct"),
+	mercury_compile__maybe_delay_construct(HLDS37, Verbose, Stats, HLDS38),
+	mercury_compile__maybe_dump_hlds(HLDS38, "38", "delay_construct"),
 
-	mercury_compile__maybe_unused_args(HLDS37, Verbose, Stats, HLDS39),
-	mercury_compile__maybe_dump_hlds(HLDS39, "39", "unused_args"),
+	mercury_compile__maybe_unused_args(HLDS38, Verbose, Stats, HLDS40),
+	mercury_compile__maybe_dump_hlds(HLDS40, "40", "unused_args"),
 
-	mercury_compile__maybe_unneeded_code(HLDS39, Verbose, Stats, HLDS40),
-	mercury_compile__maybe_dump_hlds(HLDS40, "40", "unneeded_code"),
+	mercury_compile__maybe_unneeded_code(HLDS40, Verbose, Stats, HLDS41),
+	mercury_compile__maybe_dump_hlds(HLDS41, "41", "unneeded_code"),
 
-	mercury_compile__maybe_lco(HLDS40, Verbose, Stats, HLDS42),
-	mercury_compile__maybe_dump_hlds(HLDS42, "42", "lco"),
+	mercury_compile__maybe_lco(HLDS41, Verbose, Stats, HLDS43),
+	mercury_compile__maybe_dump_hlds(HLDS43, "43", "lco"),
 
 	% DNF transformations should be after inlining.
-	mercury_compile__maybe_transform_dnf(HLDS40, Verbose, Stats, HLDS44),
-	mercury_compile__maybe_dump_hlds(HLDS44, "44", "dnf"),
+	mercury_compile__maybe_transform_dnf(HLDS41, Verbose, Stats, HLDS45),
+	mercury_compile__maybe_dump_hlds(HLDS45, "45", "dnf"),
 
 	% Magic sets should be the last thing done to Aditi procedures
 	% before RL code generation, and must come immediately after DNF.
-	mercury_compile__maybe_magic(HLDS44, Verbose, Stats, HLDS46),
-	mercury_compile__maybe_dump_hlds(HLDS46, "46", "magic"),
+	mercury_compile__maybe_magic(HLDS45, Verbose, Stats, HLDS47),
+	mercury_compile__maybe_dump_hlds(HLDS47, "47", "magic"),
 
-	mercury_compile__maybe_dead_procs(HLDS46, Verbose, Stats, HLDS48),
-	mercury_compile__maybe_dump_hlds(HLDS48, "48", "dead_procs"),
+	mercury_compile__maybe_dead_procs(HLDS47, Verbose, Stats, HLDS49),
+	mercury_compile__maybe_dump_hlds(HLDS49, "49", "dead_procs"),
 
 	% Deep profiling transformation should be done late in the piece
 	% since it munges the code a fair amount and introduces strange
 	% disjunctions that might confuse other hlds->hlds transformations.
-	mercury_compile__maybe_deep_profiling(HLDS48, Verbose, Stats, HLDS49,
+	mercury_compile__maybe_deep_profiling(HLDS49, Verbose, Stats, HLDS50,
 		DeepProfilingStructures),
-	mercury_compile__maybe_dump_hlds(HLDS49, "49", "deep_profiling"),
+	mercury_compile__maybe_dump_hlds(HLDS50, "50", "deep_profiling"),
 
-	{ HLDS49 = HLDS50 },
-	mercury_compile__maybe_dump_hlds(HLDS50, "50", "middle_pass").
+	{ HLDS50 = HLDS51 },
+	mercury_compile__maybe_dump_hlds(HLDS51, "51", "middle_pass").
 
 %-----------------------------------------------------------------------------%
 
@@ -2838,12 +2841,20 @@
 mercury_compile__maybe_loop_inv(HLDS0, Verbose, Stats, HLDS) -->
 	globals__io_lookup_bool_option(loop_invariants, LoopInv),
 	( { LoopInv = yes } ->
+
+			% We run the mark_static pass because we need
+			% the construct_how flag to be valid.
+			%
+		mercury_compile__maybe_mark_static_terms(HLDS0, Verbose, Stats,
+			HLDS1),
+		mercury_compile__maybe_dump_hlds(HLDS1, "34", "mark_static"),
+
 		maybe_write_string(Verbose,
 			"% Hoisting loop invariants...\n"),
 		maybe_flush_output(Verbose),
 		process_all_nonimported_procs(
 			update_module(hoist_loop_invariants),
-			HLDS0, HLDS),
+			HLDS1, HLDS),
 		maybe_write_string(Verbose, "% done.\n"),
 		maybe_report_stats(Stats)
 	;
diff -u compiler/options.m compiler/options.m
--- compiler/options.m	16 Oct 2002 01:58:35 -0000
+++ compiler/options.m	25 Oct 2002 07:22:03 -0000
@@ -414,7 +414,6 @@
 		;	optimize_saved_vars_cell_all_path_node_ratio
 		;	optimize_saved_vars_cell_include_all_candidates
 		;	optimize_saved_vars
-		;	loop_invariants
 		;	delay_construct
 		;	follow_code
 		;	prev_code
@@ -435,6 +434,7 @@
 		;	optimize_saved_vars_cell_all_path_node_ratio
 		;	optimize_saved_vars_cell_include_all_candidates
 		;	optimize_saved_vars
+		;	loop_invariants
 		;	delay_construct
 		;	follow_code
 		;	prev_code
@@ -923,7 +923,6 @@
 	optimize_duplicate_calls -	bool(no),
 	constant_propagation	-	bool(no),
 	excess_assign		-	bool(no),
-	loop_invariants		-	bool(no),
 	optimize_saved_vars_const	-	bool(no),
 	optimize_saved_vars_cell	-	bool(no),
 	optimize_saved_vars_cell_loop	-	bool(yes),
@@ -944,6 +943,7 @@
 	optimize_duplicate_calls -	bool(no),
 	constant_propagation	-	bool(no),
 	excess_assign		-	bool(no),
+	loop_invariants		-	bool(no),
 	optimize_saved_vars_const	-	bool(no),
 	optimize_saved_vars_cell	-	bool(no),
 	optimize_saved_vars_cell_loop	-	bool(yes),
@@ -1469,7 +1469,6 @@
 long_option("optimize-constant-propagation", constant_propagation).
 long_option("optimize-saved-vars",	optimize_saved_vars).
 long_option("optimise-saved-vars",	optimize_saved_vars).
-long_option("loop-invariants",		loop_invariants).
 long_option("optimize-saved-vars-const",	optimize_saved_vars_const).
 long_option("optimise-saved-vars-const",	optimize_saved_vars_const).
 long_option("optimize-saved-vars-cell",	optimize_saved_vars_cell).
@@ -1489,6 +1488,7 @@
 long_option("optimize-constant-propagation", constant_propagation).
 long_option("optimize-saved-vars",	optimize_saved_vars).
 long_option("optimise-saved-vars",	optimize_saved_vars).
+long_option("loop-invariants",		loop_invariants).
 long_option("optimize-saved-vars-const",	optimize_saved_vars_const).
 long_option("optimise-saved-vars-const",	optimize_saved_vars_const).
 long_option("optimize-saved-vars-cell",	optimize_saved_vars_cell).
@@ -2053,8 +2053,7 @@
 	use_local_vars		-	bool(yes),
 	inline_simple_threshold	-	int(8),
 	inline_compound_threshold -	int(20),
-	higher_order_size_limit -	int(30),
-	loop_invariants		-	bool(yes)
+	higher_order_size_limit -	int(30)
 ]).
 
 % Optimization level 5: apply optimizations which may have some
@@ -2073,7 +2072,8 @@
 	use_local_vars		-	bool(yes),
 	inline_simple_threshold	-	int(8),
 	inline_compound_threshold -	int(20),
-	higher_order_size_limit -	int(30)
+	higher_order_size_limit -	int(30),
+	loop_invariants		-	bool(yes)
 ]).
 
 % Optimization level 5: apply optimizations which may have some
@@ -3160,8 +3160,6 @@
 		"--optimize-duplicate-calls",
 		"\tOptimize away multiple calls to a predicate",
 		"\twith the same input arguments.",
-		"--loop-invariants",
-		"\tHoist loop invariants out of loops.",
 		"--delay-constructs",
 		"\tReorder goals to move construction unifications after",
 		"\tprimitive goals that can fail.",
@@ -3190,6 +3188,8 @@
 		"--optimize-duplicate-calls",
 		"\tOptimize away multiple calls to a predicate",
 		"\twith the same input arguments.",
+		"--loop-invariants",
+		"\tHoist loop invariants out of loops.",
 		"--delay-constructs",
 		"\tReorder goals to move construction unifications after",
 		"\tprimitive goals that can fail.",
only in patch2:
--- doc/user_guide.texi	22 Oct 2002 04:36:03 -0000	1.332
+++ doc/user_guide.texi	25 Oct 2002 07:22:05 -0000
@@ -5603,6 +5603,12 @@
 containing large numbers of variables can cause
 slow compilation.
 
+ at item --loop-invariants
+ at findex --loop-invariants
+Optimize loop invariants by moving computations within a loop that are
+the same on every iteration to the outside so they are only calculated
+once.  (This is a separate optimization to @samp{--optimize-rl-invariants}.)
+
 @sp 1
 @item --no-common-struct
 @findex --no-common-struct
only in patch2:
--- compiler/notes/compiler_design.html	15 Apr 2002 05:04:15 -0000	1.78
+++ compiler/notes/compiler_design.html	25 Oct 2002 03:51:15 -0000
@@ -639,10 +639,17 @@
   <li>
   pd_util.m contains various utility predicates.
   </ul>
+
+<li> loop_inv.m: loop invariant hoisting.  This transformation moves
+  computations within loops that are the same on every iteration to the outside
+  of the loop so that the invariant computations are only computed once.  The
+  transformation turns a single looping predicate containing invariant
+  computations into two: one that computes the invariants on the first
+  iteration and then loops by calling the second predicate with extra arguments
+  for the invariant values.
  
 <li> issue warnings about unused arguments from predicates, and create
-  specialized versions without them (unused_args.m); type_infos are
-  often unused.
+specialized versions without them (unused_args.m); type_infos are often unused.
 
 <li> delay_construct.m pushes construction unifications to the right in
   semidet conjunctions, in an effort to reduce the probability that it will
only in patch2:
--- NEWS	22 Oct 2002 13:10:33 -0000	1.274
+++ NEWS	25 Oct 2002 07:21:56 -0000
@@ -18,6 +18,7 @@
 * A new `--smart-recompilation' option, for fine-grained dependency tracking.
 * A new optional warning: `--warn-non-tail-recursion'.
 * A new optimization: `--constraint-propagation'.
+* A new optimization: `--loop-invariants'.
 * Support for arbitrary mappings from module name to source file name. 
 
 Major improvements to the Mercury debugger, including:
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list