[m-dev.] GC & profiling

Fergus Henderson fjh at cs.mu.OZ.AU
Fri Mar 9 22:12:29 AEDT 2001


Zoltan asked about getting the GC to set a flag when it was doing a
collection, to provide more accurate information for profiling -- time
spent during garbage collection should be really be credited to the code
which allocated the memory, rather than to the code which happens to
do some small allocation which causes a garbage collection.

I had a long look at the GC sources.  There are a lot of entry points.
And even deciding exactly when you're "in" the garbage collector is a
bit difficult, e.g. do we have to worry about finalization functions,
and if so, should they be counted as part of the collection?  It is also
complicated a lot by the possibility of incremental collection.

I found several different possible solutions.  I'm not sure which one is
right.  I'll probably run this by the GC mailing list <gclist at iecc.com>.

Anyway, for the record, I've enclosed below a description of the different
alternatives I've considered.

----------

There's macros ENTER_GC() and EXIT_GC(), which [in some configurations]
set/unset a volatile global variable GC_collecting -- exactly what
we want.  Changing things so that they do that unconditionally would be
easy enough (see diff 2).  Unfortunately, the purpose of GC_collecting
is slightly different -- it is documented as "A hint that we're in the
collector and holding the allocation lock for an extended period.", and
ENTER_GC() doesn't seem to be called in all the places that we want it
to be.  So I guess that is not an ideal approach.

There's also a function GC_collection_in_progress() that looked a bit
more promising.  This looks at the GC_mark_state variable, which is set
at various stages of the marking process.  The comments for that function
mention that it does the wrong thing if a collection has been abandoned,

 | /* Is a collection in progress?  Note that this can return true in */
 | /* the nonincremental case, if a collection has been abandoned and */
 | /* the mark state is now MS_INVALID.				      */
 | GC_bool GC_collection_in_progress(void);

but I think that can only happen if you explicitly call GC_try_to_collect(),
which currently we don't do.  So it would be sufficient to just modify the
documentation of GC_try_to_collect() to mention that using that function will
break the Mercury profiler (see diff 2).

To use this function, the profiler would contain code like so:

	#include "gc.h" /* for GC_bool */
	extern GC_bool GC_collection_in_progress(void);
	
	handle_profiling_interrupt() {
		...
		if (GC_collection_in_progress()) {
			... credit this timer tick to GC ...
		} else {
			... credit this timer tick to the application ...
		}
		...
	}

On the other hand, GC_collection_in_progress() still doesn't count 100% of the
time spent in the collector, e.g. it misses out on calls to finalization
routines, and I think also on some other parts of the collection.

There's also a callback function GC_start_call_back() which looked quite
promising.  However, that is only called at the start of full collections;
it doesn't get called for partial collections, which can happen either when
doing incremental collection or when explicitly calling GC_collect_a_little().
And there's no corresponding GC_finish_call_back().
On the other hand, adding one is not hard (see diff 1).
I could also add a GC_start_partial_call_back() function,
which would be called at the start of partial collections.

*******************************************************************************
diff 1: add GC_finish_call_back
*******************************************************************************

boehm_gc/gc_priv.h:
boehm_gc/misc.c:
	Add declaration/definition for GC_finish_call_back().

boehm_gc/alloc.c:
	Call GC_finish_call_back() from all the appropriate points.

cvs diff: Diffing .
Index: alloc.c
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/alloc.c,v
retrieving revision 1.10
diff -u -d -u -r1.10 alloc.c
--- alloc.c	2000/09/18 11:08:49	1.10
+++ alloc.c	2001/03/09 10:22:36
@@ -225,6 +225,13 @@
     }
 }
 
+void GC_notify_gc_finish()
+{
+    if (GC_finish_call_back != (void (*)())0) {
+	(*GC_finish_call_back)();
+    }
+}
+
 GC_bool GC_is_full_gc = FALSE;
 
 /* 
@@ -242,6 +249,7 @@
 	    GC_notify_full_gc();
             GC_gcollect_inner();
             n_partial_gcs = 0;
+            GC_notify_gc_finished();
             return;
         } else if (GC_need_full_gc || n_partial_gcs >= GC_full_freq) {
 #   	    ifdef PRINTSTATS
@@ -274,6 +282,7 @@
 	        GC_n_attempts++;
 	    }
 	}
+	GC_notify_gc_finished();
     }
 }
 
@@ -631,6 +640,7 @@
 {
     GC_notify_full_gc();
     (void)GC_try_to_collect(GC_never_stop_func);
+    GC_notify_gc_finished();
 }
 
 word GC_n_heap_sects = 0;	/* Number of sections currently in heap. */
@@ -826,6 +836,7 @@
 	(GC_dont_expand && GC_words_allocd > 0 || GC_should_collect())) {
       GC_notify_full_gc();
       GC_gcollect_inner();
+      GC_notify_gc_finished();
     } else {
       word blocks_to_get = GC_heapsize/(HBLKSIZE*GC_free_space_divisor)
       			   + needed_blocks;
@@ -851,6 +862,7 @@
       	    WARN("Out of Memory!  Trying to continue ...\n", 0);
 	    GC_notify_full_gc();
 	    GC_gcollect_inner();
+	    GC_notify_gc_finished();
 	} else {
 	    WARN("Out of Memory!  Returning NIL!\n", 0);
 	    return(FALSE);
Index: gc_priv.h
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc_priv.h,v
retrieving revision 1.12
diff -u -d -u -r1.12 gc_priv.h
--- gc_priv.h	2000/09/18 11:08:51	1.12
+++ gc_priv.h	2001/03/09 10:26:26
@@ -1539,6 +1548,8 @@
 			/* Not called if 0.  Called with allocation 	*/
 			/* lock held.					*/
 			/* 0 by default.				*/
+extern void (*GC_finish_call_back)(/* void */);
+			/* Called at end of full or partial collections.*/
 void GC_push_regs();	/* Push register contents onto mark stack.	*/
 			/* If NURSERY is defined, the default push	*/
 			/* action can be overridden with GC_push_proc	*/
Index: misc.c
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/misc.c,v
retrieving revision 1.11
diff -u -d -u -r1.11 misc.c
--- misc.c	2000/09/18 11:08:53	1.11
+++ misc.c	2001/03/09 10:33:12
@@ -70,6 +70,7 @@
 void (*GC_check_heap)() = (void (*)())0;
 
 void (*GC_start_call_back)() = (void (*)())0;
+void (*GC_finish_call_back)() = (void (*)())0;
 
 ptr_t GC_stackbottom = 0;
 
*******************************************************************************
diff 2: use ENTER_GC/EXIT_GC
*******************************************************************************

boehm_gc/gc_priv.h:
	If PROFILE_TIME is defined, define ENTER_GC()
	and EXIT_GC() to always set/unset GC_collecting.

Index: gc_priv.h
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc_priv.h,v
retrieving revision 1.12
diff -u -d -u -r1.12 gc_priv.h
--- gc_priv.h	2000/09/18 11:08:51	1.12
+++ gc_priv.h	2001/03/09 10:26:26
@@ -667,8 +667,17 @@
 		/* so it doesn't matter if we lie.			*/
 # endif
 # ifndef ENTER_GC
-#   define ENTER_GC()
-#   define EXIT_GC()
+#   ifdef PROFILE_TIME
+      extern VOLATILE GC_bool GC_collecting;
+#     define ENTER_GC() \
+		{ \
+		    GC_collecting = 1; \
+		}
+#     define EXIT_GC() GC_collecting = 0;
+#   else
+#     define ENTER_GC()
+#     define EXIT_GC()
+#   endif
 # endif
 
 # ifndef DCL_LOCK_STATE

*******************************************************************************
diff 3: use GC_collection_in_progress and document issues for GC_try_to_collect
*******************************************************************************

boehm_gc/gc.h:
	Document that GC_try_to_collect() causes problems for
	time profiling, since Mercury's time profiling relies
	(XXX or should rely) on the GC_collection_in_progress()
	function, which does the wrong thing if GC_try_to_collect()
	is called and the collection is aborted.

Index: gc.h
===================================================================
RCS file: /home/mercury1/repository/mercury/boehm_gc/gc.h,v
retrieving revision 1.29
diff -u -d -u -r1.29 gc.h
--- gc.h	2000/10/04 04:51:02	1.29
+++ gc.h	2001/03/09 10:57:16
@@ -322,6 +322,9 @@
 /* aborted collections do no useful work; the next collection needs	*/
 /* to start from the beginning.						*/
 /* Return 0 if the collection was aborted, 1 if it succeeded.		*/
+/* WARNING: calling this function may break Mercury's time profiling;   */
+/* after an aborted collection, the profiler may think that the         */
+/* garbage collection is still running.                                 */
 typedef int (* GC_stop_func) GC_PROTO((void));
 GC_API int GC_try_to_collect GC_PROTO((GC_stop_func stop_func));
 
-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list