[m-rev.] LLDS accurate GC improvements

Fergus Henderson fjh at cs.mu.OZ.AU
Wed Oct 22 14:31:19 AEST 2003


Estimated hours taken: 12
Branches: main

Various improvements for accurate GC in LLDS grades:

    - allow the active heap size to vary
      (XXX currently we still allocate a fixed-size heap, and allow
      the active heap size to vary within that; it would be better
      to avoid that, by reallocating a bigger heap if the heap
      fills up)

    - some bug fixes.

runtime/mercury_wrapper.c:
	Increase the default heap size from 4Mb to 32Mb.
	This is OK because we only touch the parts of it that we use.

	If accurate GC is enabled, set the other parameters
	(MR_heap_redzone_size for LLDS, MR_heap_margin for MLDS)
	so that initially we still only use 4Mb of the heap
	before doing (the first) garbage collection.

runtime/mercury_wrapper.h:
runtime/mercury_wrapper.c:
	Add a new runtime option --heap-expansion-factor,
	for use by mercury_accurate_gc.c.

runtime/mercury_accurate_gc.c:
	Only allocate forwarding pointer bitmap entries for the part of
	the heap which has been used, not the whole heap.  This is to
	avoid allocating (and touching) an unnecessarily large bitmap
	in cases where we have allocated a large amount of address space
	to the heap, but are only using a small part of it before doing
	each GC.

	For the LLDS back-end, recompute the heap red zone size after
	each collection (using MR_heap_expansion_factor).

	In MR_schedule_agc(), handle the case where entry_label and/or
	proc_layout are NULL.

	Also, add some comments.

runtime/mercury_memory_zones.c:
	Change MR_reset_redzone() so that it can handle changes in the
	size of the red zone.  In particular, make sure that we unprotect
	all of the normal area, as well as protecting the red zone.

runtime/mercury_label.c:
	Fix two bugs in MR_prev_entry_by_addr(): it was not correctly
	handling the case where the entry table was empty, and it was
	not correctly handling the case where the address searched for
	was higher than any address in the entry table.

runtime/mercury_goto.h:
	For native GC grades, record the addresses of the end of modules
	in the entry table, so that we know where each procedure finishes when
	mapping from instruction pointer values to stack layout entries.
	Without this, we might think that the following C function was
	actually part of the preceding Mercury procedure, and then incorrectly
	use the stack layout of the Mercury procedure if we happened to
	get a heap overflow signal (SIGSEGV) while in that C function.

Workspace: /home/ceres/fjh/mercury
Index: runtime/mercury_accurate_gc.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_accurate_gc.c,v
retrieving revision 1.24
diff -u -d -r1.24 mercury_accurate_gc.c
--- runtime/mercury_accurate_gc.c	13 Oct 2003 12:23:02 -0000	1.24
+++ runtime/mercury_accurate_gc.c	22 Oct 2003 04:15:13 -0000
@@ -6,6 +6,24 @@
 
 /*
 ** This module contains the accurate garbage collector.
+**
+** Currently we are just using a simple copying collector.
+**
+** For the LLDS back-end, we detect heap overflow
+** using the virtual memory system; the page fault
+** handler will set the return address to call the
+** garbage collector.
+**
+** XXX this may cause problems for tight loops that
+**     allocate memory and don't have any procedure
+**     returns in them... either because they
+**     loop via tail calls and/or retries
+**     (failure in nondet code).
+**
+** For the MLDS back-end, we can't use this technique
+** of overwriting the return address, so instead we
+** just explicitly call MR_GC_check() to check for
+** heap overflow before each heap allocation.
 */
 
 #include "mercury_imp.h"
@@ -65,14 +83,14 @@
 ** with all bits unset (zero).
 */
 static void
-init_forwarding_pointer_bitmap(const MR_MemoryZone *old_heap)
+init_forwarding_pointer_bitmap(const MR_MemoryZone *old_heap, MR_Word *old_hp)
 {
     size_t			    heap_size_in_words;
     size_t			    num_words_for_bitmap;
     size_t			    num_bytes_for_bitmap;
     static size_t		    prev_num_bytes_for_bitmap;
 
-    heap_size_in_words = old_heap->hardmax - old_heap->min;
+    heap_size_in_words = old_hp - old_heap->min;
     num_words_for_bitmap = (heap_size_in_words + MR_WORDBITS - 1) / MR_WORDBITS;
     num_bytes_for_bitmap = num_words_for_bitmap * sizeof(MR_Word);
     if (MR_has_forwarding_pointer == NULL
@@ -129,7 +147,7 @@
     /*
     ** Initialize the forwarding pointer bitmap.
     */
-    init_forwarding_pointer_bitmap(old_heap);
+    init_forwarding_pointer_bitmap(old_heap, old_hp);
 
     /*
     ** Swap the two heaps.
@@ -225,6 +243,12 @@
 ** 	that the code will return to).
 */
 
+/*
+** XXX we should use write() rather than fprintf() here,
+**     since this code is called from a signal handler,
+**     and stdio is not guaranteed to be reentrant.
+**/
+
 void
 MR_schedule_agc(MR_Code *pc_at_signal, MR_Word *sp_at_signal, 
 	MR_Word *curfr_at_signal)
@@ -269,23 +293,30 @@
 	/* Search for the entry label */
 
 	entry_label = MR_prev_entry_by_addr(pc_at_signal);
-	proc_layout = entry_label->e_layout;
+	proc_layout = (entry_label != NULL ? entry_label->e_layout : NULL);
 
-	determinism = proc_layout->MR_sle_detism;
+	determinism = (proc_layout != NULL ? proc_layout->MR_sle_detism : -1);
 
 	if (determinism < 0) {
 		/*
 		** This means we have reached some handwritten code that has
 		** no further information about the stack frame.
 		*/
-		fprintf(stderr, "Mercury runtime: the label ");
-		if (entry_label->e_name != NULL) {
-			fprintf(stderr, "%s has no stack layout info\n",
-				entry_label->e_name);
+		fprintf(stderr, "Mercury runtime: "
+			"attempt to schedule garbage collection failed\n");
+		if (entry_label != NULL) {
+			fprintf(stderr, "Mercury runtime: the label ");
+			if (entry_label->e_name != NULL) {
+				fprintf(stderr, "%s has no stack layout info\n",
+					entry_label->e_name);
+			} else {
+				fprintf(stderr, "at address %p "
+					"has no stack layout info\n",
+					entry_label->e_addr);
+			}
 		} else {
-			fprintf(stderr, "at address %p "
-				"has no stack layout info\n",
-				entry_label->e_addr);
+			fprintf(stderr, "Mercury runtime: no entry label ");
+			fprintf(stderr, "for PC address %p\n", pc_at_signal);
 		}
 
 		fprintf(stderr, "Mercury runtime: Trying to continue...\n");
@@ -335,10 +366,10 @@
 		saved_success = (MR_Code *) *saved_success_location;
 	} else {
 		/*
-		** XXX we don't support nondet stack frames yet.
-		MR_fatal_error("cannot schedule in nondet stack frame");
+		** XXX we ought to also overwrite the redoip,
+		**     otherwise we might miss failure-driven loops
+		**     which don't contain any returns.
 		*/
-
 		saved_success_location = &MR_based_framevar(curfr_at_signal,
 			number);
 		saved_success = (MR_Code *) *saved_success_location;
@@ -442,7 +473,7 @@
     /*
     ** Initialize the forwarding pointer bitmap.
     */
-    init_forwarding_pointer_bitmap(old_heap);
+    init_forwarding_pointer_bitmap(old_heap, old_hp);
 
     /*
     ** Swap the two heaps.
@@ -719,8 +750,28 @@
             ((char *) MR_virtual_hp - (char *) new_heap->min));
     }
 
-    /* Reset the redzone on the old heap */
-    MR_reset_redzone(old_heap);
+    /* Reset the redzone on the new heap */
+    {
+      /* These counts include some wasted space between ->min and ->bottom. */
+      size_t old_heap_space =
+	      	(char *) old_heap->redzone_base - (char *) old_heap->bottom;
+      size_t new_heap_usage =
+	      	(char *) MR_virtual_hp - (char *) new_heap->bottom;
+      size_t gc_heap_size;
+
+      /*
+      ** Set the size at which to GC to be MR_heap_expansion_factor
+      ** (which defaults to two) times the current usage,
+      ** or the size at which we GC'd last time, whichever is larger.
+      */
+      gc_heap_size = (size_t) (MR_heap_expansion_factor * new_heap_usage);
+      if (gc_heap_size < old_heap_space) {
+	      gc_heap_size = old_heap_space;
+      }
+      old_heap->redzone_base = (MR_Word *)
+	      	((char *) old_heap->bottom + gc_heap_size);
+      MR_reset_redzone(old_heap);
+    }
 
     if (MR_agc_debug) {
         fprintf(stderr, "garbage_collect() done.\n\n");
Index: runtime/mercury_goto.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_goto.h,v
retrieving revision 1.36
diff -u -d -r1.36 mercury_goto.h
--- runtime/mercury_goto.h	7 May 2003 03:21:45 -0000	1.36
+++ runtime/mercury_goto.h	22 Oct 2003 03:49:45 -0000
@@ -108,6 +108,23 @@
   #define MR_MODULE_STATIC_OR_EXTERN static
 #endif
 
+/*
+** For LLDS native GC, we need to record the addresses of the end of modules
+** in the entry table, so that we know where each procedure finishes when
+** mapping from instruction pointer values to stack layout entries.
+** Without this, we might think that the following C function was
+** actually part of the preceding Mercury procedure, and then incorrectly
+** use the stack layout of the Mercury procedure if we happened to
+** get a heap overflow signal (SIGSEGV) while in that C function.
+*/
+#ifdef MR_NATIVE_GC
+  #define MR_MAYBE_DEFINE_MODULE_END_LABEL MR_define_local(module_end_label)
+  #define MR_MAYBE_INIT_MODULE_END_LABEL() MR_init_local(module_end_label)
+#else
+  #define MR_MAYBE_DEFINE_MODULE_END_LABEL /* nothing */
+  #define MR_MAYBE_INIT_MODULE_END_LABEL() /* nothing */
+#endif
+
 /*---------------------------------------------------------------------------*/
 
 /* MACHINE SPECIFIC STUFF REQUIRED FOR NON-LOCAL GOTOS */
@@ -538,7 +555,8 @@
 		goto *MR_dummy_identify_function(			\
 			&&MR_PASTE2(module_name,_dummy_label));		\
 		MR_PASTE2(module_name,_dummy_label):			\
-		{
+		{							\
+			MR_MAYBE_INIT_MODULE_END_LABEL();
   #else /* gcc version <= egcs 1.1.2 */
     #define MR_BEGIN_MODULE(module_name)				\
 	MR_MODULE_STATIC_OR_EXTERN void module_name(void);		\
@@ -547,13 +565,14 @@
 		MR_PRETEND_ADDRESS_IS_USED(				\
 			&&MR_PASTE2(module_name,_dummy_label));		\
 		MR_PASTE2(module_name,_dummy_label):			\
-		{
+		{							\
+			MR_MAYBE_INIT_MODULE_END_LABEL();
   #endif /* gcc version <= egcs 1.1.2 */
   /* initialization code for module goes between MR_BEGIN_MODULE */
   /* and MR_BEGIN_CODE */
   #define MR_BEGIN_CODE } return; {
   /* body of module goes between MR_BEGIN_CODE and MR_END_MODULE */
-  #define MR_END_MODULE } }
+  #define MR_END_MODULE MR_MAYBE_DEFINE_MODULE_END_LABEL; } }
 
 
   #if defined(MR_USE_ASM_LABELS)
@@ -685,9 +704,10 @@
 
   #define MR_BEGIN_MODULE(module_name)	\
 	MR_MODULE_STATIC_OR_EXTERN MR_Code* module_name(void); \
-	MR_MODULE_STATIC_OR_EXTERN MR_Code* module_name(void) {
+	MR_MODULE_STATIC_OR_EXTERN MR_Code* module_name(void) { \
+		MR_MAYBE_INIT_MODULE_END_LABEL();
   #define MR_BEGIN_CODE			return 0;
-  #define MR_END_MODULE			}
+  #define MR_END_MODULE			MR_MAYBE_DEFINE_MODULE_END_LABEL; }
 
   #define MR_declare_entry(label)		extern MR_Code *label(void)
   #define MR_declare_static(label)		static MR_Code *label(void)
Index: runtime/mercury_label.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_label.c,v
retrieving revision 1.22
diff -u -d -r1.22 mercury_label.c
--- runtime/mercury_label.c	18 Feb 2002 07:01:17 -0000	1.22
+++ runtime/mercury_label.c	22 Oct 2003 01:41:14 -0000
@@ -177,7 +177,7 @@
 	lo = 0;
 	hi = entry_array_next-1;
 
-	if (addr < entry_array[lo].e_addr) {
+	if (lo > hi || addr < entry_array[lo].e_addr) {
 		return NULL;
 	}
 
@@ -192,7 +192,7 @@
 		}
 	}
 
-	if (entry_array[lo].e_addr < addr) {
+	if (lo < entry_array_next && entry_array[lo].e_addr < addr) {
 		return &entry_array[lo];
 	} else {
 		return &entry_array[lo - 1];
Index: runtime/mercury_memory_zones.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_memory_zones.c,v
retrieving revision 1.22
diff -u -d -r1.22 mercury_memory_zones.c
--- runtime/mercury_memory_zones.c	18 Nov 2002 06:47:03 -0000	1.22
+++ runtime/mercury_memory_zones.c	17 Oct 2003 09:37:47 -0000
@@ -159,6 +159,8 @@
 */
 #ifdef MR_PROTECTPAGE
 
+  #define NORMAL_PROT (PROT_READ|PROT_WRITE)
+
   #ifdef MR_CONSERVATIVE_GC
     /*
     ** The conservative garbage collectors scans through
@@ -167,9 +169,9 @@
     ** too much memory for the GC to scan, and it probably
     ** all gets paged in.
     */
-    #define MY_PROT PROT_READ
+    #define REDZONE_PROT PROT_READ
   #else
-    #define MY_PROT PROT_NONE
+    #define REDZONE_PROT PROT_NONE
   #endif
 
   /* The BSDI BSD/386 1.1 headers don't define PROT_NONE */
@@ -398,7 +400,7 @@
 			MR_round_up((MR_Unsigned)base + size - redsize,
 				MR_unit);
 	if (MR_protect_pages((char *)zone->redzone,
-			redsize + MR_unit, MY_PROT) < 0)
+			redsize + MR_unit, REDZONE_PROT) < 0)
 	{
 		char buf[2560];
 		sprintf(buf, "unable to set %s#%d redzone\n"
@@ -414,7 +416,7 @@
 #if	defined(MR_PROTECTPAGE)
 	zone->hardmax = (MR_Word *) MR_round_up(
 			(MR_Unsigned)zone->top - MR_unit, MR_unit);
-	if (MR_protect_pages((char *)zone->hardmax, MR_unit, MY_PROT) < 0) {
+	if (MR_protect_pages((char *)zone->hardmax, MR_unit, REDZONE_PROT) < 0) {
 		char buf[2560];
 		sprintf(buf, "unable to set %s#%d hardmax\n"
 			"base=%p, hardmax=%p top=%p",
@@ -437,8 +439,22 @@
 #ifdef	MR_CHECK_OVERFLOW_VIA_MPROTECT
 	zone->redzone = zone->redzone_base;
 
+	/* unprotect the non-redzone area */
+	if (MR_protect_pages((char *)zone->bottom,
+		((char *)zone->redzone) - ((char *) zone->bottom),
+		NORMAL_PROT) < 0)
+	{
+		char buf[2560];
+		sprintf(buf, "unable to reset %s#%d normal area\n"
+			"base=%p, redzone=%p",
+			zone->name, zone->id, zone->bottom, zone->redzone);
+		MR_fatal_error(buf);
+	}
+	/* protect the redzone area */
 	if (MR_protect_pages((char *)zone->redzone,
-		((char *)zone->top) - ((char *) zone->redzone), MY_PROT) < 0) {
+		((char *)zone->top) - ((char *) zone->redzone),
+		REDZONE_PROT) < 0)
+	{
 		char buf[2560];
 		sprintf(buf, "unable to reset %s#%d redzone\n"
 			"base=%p, redzone=%p",
Index: runtime/mercury_wrapper.c
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.c,v
retrieving revision 1.125
diff -u -d -r1.125 mercury_wrapper.c
--- runtime/mercury_wrapper.c	20 Oct 2003 07:45:09 -0000	1.125
+++ runtime/mercury_wrapper.c	22 Oct 2003 04:10:12 -0000
@@ -55,12 +55,29 @@
 
 /* command-line options */
 
-/* size of data areas (including redzones), in kilobytes */
-/* (but we later multiply by 1024 to convert to bytes) */
+/*
+** size of data areas (including redzones), in kilobytes
+** (but we later multiply by 1024 to convert to bytes)
+**
+** Note that it is OK to allocate a large heap, since
+** we will only touch the part of it that we use;
+** we're really only allocating address space,
+** not physical memory.
+** But the other areas should be kept small, at least
+** in the case when conservative GC is enabled, since
+** the conservative GC will scan them.
+**
+** Note that for the accurate collector, the total heap size
+** that we use will be twice the heap size specified here,
+** since it is a two-space collector.
+**
+** Changes to MR_heap_size may also require changing MR_heap_zone_size
+** and/or the MR_heap_margin_size, which are defined below.
+*/
 #ifdef MR_DEBUG_AGC_SMALL_HEAP
   size_t	MR_heap_size =			  52;
 #else
-  size_t	MR_heap_size =			4096;
+  size_t	MR_heap_size =		       32768; /* 16 Mb */
 #endif
 size_t		MR_detstack_size =		4096;
 size_t		MR_nondstack_size =	 	 256;
@@ -72,9 +89,29 @@
 size_t		MR_cutstack_size =		  32;
 size_t		MR_pnegstack_size =		  32;
 
-/* size of the redzones at the end of data areas, in kilobytes */
-/* (but we later multiply by 1024 to convert to bytes) */
-size_t		MR_heap_zone_size =		  16;
+/*
+** size of the redzones at the end of data areas, in kilobytes
+**
+** For accurate GC, although we start out with a big heap (32 Mb -- see above),
+** we don't want to touch all of it unless we really need to.
+** So with accurate GC in LLDS grades, we start out with a 28 Mb redzone,
+** leaving an active heap size of 4Mb.
+** The collector should (XXX it currently doesn't) resize this redzone
+** automatically at the end of each collection.
+**
+** For MLDS grades, we don't use redzones to schedule GC;
+** instead GCs are scheduled, based on MR_heap_margin_size (see below),
+** by explicit calls to MR_GC_check()
+*/
+#if defined(MR_ACCURATE_GC) && !defined(MR_HIGHLEVEL_CODE)
+  #ifdef MR_DEBUG_AGC_SMALL_HEAP
+    size_t		MR_heap_zone_size =	  32;
+  #else
+    size_t		MR_heap_zone_size =	  16 + 28 * 1024;
+  #endif
+#else
+  size_t		MR_heap_zone_size =	  16;
+#endif
 size_t		MR_detstack_zone_size =		  16;
 size_t		MR_nondstack_zone_size =	  16;
 size_t		MR_solutions_heap_zone_size =	  16;
@@ -93,10 +130,19 @@
 ** amount of heap space still available, and if not, we call
 ** MR_garbage_collect().
 **
+** The collector should (XXX it currently doesn't) recompute this
+** and/or heap_zone->gc_threshold automatically at the end of each collection.
+**
 ** Like the sizes above, it is measured in kilobytes
 ** (but we later multiply by 1024 to convert to bytes).
 */
-size_t		MR_heap_margin_size =		  16;
+#ifdef MR_DEBUG_AGC_SMALL_HEAP
+  size_t	MR_heap_margin_size =		  16;
+#else
+  size_t	MR_heap_margin_size =		  28 * 1024;
+#endif
+
+double MR_heap_expansion_factor = 2.0;
 
 /* primary cache size to optimize for, in bytes */
 size_t		MR_pcache_size =	        8192;
@@ -859,6 +905,7 @@
 	MR_SOLUTIONS_HEAP_REDZONE_SIZE,
 	MR_TRAIL_REDZONE_SIZE,
 	MR_HEAP_MARGIN_SIZE,
+	MR_HEAP_EXPANSION_FACTOR,
 	MR_MDB_TTY,
 	MR_MDB_IN,
 	MR_MDB_OUT,
@@ -881,6 +928,7 @@
 	{ "solutions-heap-redzone-size",1, 0, MR_SOLUTIONS_HEAP_REDZONE_SIZE },
 	{ "trail-redzone-size", 	1, 0, MR_TRAIL_REDZONE_SIZE },
 	{ "heap-margin-size", 		1, 0, MR_HEAP_MARGIN_SIZE },
+	{ "heap-expansion-factor", 	1, 0, MR_HEAP_EXPANSION_FACTOR },
 	{ "mdb-tty", 			1, 0, MR_MDB_TTY },
 	{ "mdb-in", 			1, 0, MR_MDB_IN },
 	{ "mdb-out", 			1, 0, MR_MDB_OUT },
@@ -979,6 +1027,14 @@
 				usage();
 
 			MR_heap_margin_size = size;
+			break;
+
+		case MR_HEAP_EXPANSION_FACTOR:
+			if (sscanf(MR_optarg, "%f", &MR_heap_expansion_factor)
+					!= 1)
+			{
+				usage();
+			}
 			break;
 
 		case 'i':
Index: runtime/mercury_wrapper.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_wrapper.h,v
retrieving revision 1.61
diff -u -d -r1.61 mercury_wrapper.h
--- runtime/mercury_wrapper.h	20 Oct 2003 07:45:09 -0000	1.61
+++ runtime/mercury_wrapper.h	22 Oct 2003 04:12:40 -0000
@@ -213,6 +213,9 @@
 /* heap margin for MLDS->C accurate GC (documented in mercury_wrapper.c) */
 extern	size_t		MR_heap_margin_size;
 
+/* heap expansion factor for accurate GC (see mercury_accurate_gc.c) */
+extern  double		MR_heap_expansion_factor;
+
 /* file names for the mdb debugging streams */
 extern	const char	*MR_mdb_in_filename;
 extern	const char	*MR_mdb_out_filename;

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list