[m-rev.] for review: memory attribution profiling

Peter Wang novalazy at gmail.com
Thu Apr 21 16:54:43 AEST 2011


Branches: main

Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run.  This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site.  The attribution, or "allocation id", is an address to an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.

The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions.  The mprof tool is
extended with a new mode to parse and present that data.

Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".

Currently this profiling mode only works in conjunction wtih the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects.  Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.

The output from `mprof -s' looks like this:

------ [1] some label ------
   cells            words         cumul  procedure / type (location)
   14150            38872                total

*   1949/ 13.8%      4872/ 12.5%  12.5%  <predicate `parser.parse_rest/7' mode 0>
     975/  6.9%      1950/  5.0%         list.list/1 (parser.m:502)
     487/  3.4%      1948/  5.0%         term.term/1 (parser.m:501)
     487/  3.4%       974/  2.5%         term.const/0 (parser.m:501)

*   1424/ 10.1%      4272/ 11.0%  23.5%  <predicate `parser.parse_simple_term_2/6' mode 0>
     708/  5.0%      2832/  7.3%         term.term/1 (parser.m:643)
     708/  5.0%      1416/  3.6%         term.const/0 (parser.m:643)
...


boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
	Add a callback function to be called for every live object after a GC.

	Add a function to write out the GC_size_map array.

compiler/layout.m:
	Define the alloc_site_info type which is equivalent to the
	MR_AllocSiteInfo C structure.

	Add alloc_site_array as a kind of "layout" array.

compiler/llds.m:
	Add allocation sites to `cfile' structure.

	Replace TypeMsg argument (which was also for profiling) on `incr_hp'
	instructions by an allocation site identifier.  

	Add a new foreign_proc_component for allocation site ids.

compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
	Keep the set of allocation sites in the code_info and global_data
	structures.

compiler/unify_gen.m:
	Add allocation sites to LLDS allocation instructions.

compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
	Output MR_AllocSiteInfo arrays in generated C files.

	Output code to register the MR_AllocSiteInfo array with the Mercury
	runtime.

	Output allocation site ids for memory allocation instructions.

compiler/llds_out_util.m:
	Add allocation sites to llds_out_info.

compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
	Generate a macro MR_ALLOC_ID which resolves to an allocation site
	structure, for every foreign_proc whose C code contains the string
	"MR_ALLOC_ID".  This is to be used by hand-written C code which
	allocates memory.

	MR_PROC_LABELs are retained for backwards compatibility.  Though
	they were introduced for profiling, they seem to have been co-opted
	for printf-debugging since then.

compiler/ml_global_data.m:
	Add allocation site structures to the MLDS global data.

compiler/mlds.m:
compiler/ml_unify_gen.m:
	Add allocation site id to `new_object' instruction.

compiler/mlds_to_c.m:
	Output allocation site arrays and allocation ids in high-level C code.

	Output a call to register the allocation site array with the Mercury
	runtime.

	Delete an unused predicate.

compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
	Conform to changes.

compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
	Conform to changes in memory allocation macros.

library/benchmarking.m:
	Add the `report_memory_attribution' instrumentation predicates.

	Conform to changes to MR_memprof_record.

library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
	Use attributed memory allocation throughout the standard library so
	that objects don't show up in the memory profile as "unknown".

	Replace MR_PROC_LABEL by MR_ALLOC_ID.

mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
	Replace MR_PROC_LABEL by MR_ALLOC_ID.

profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
	Add a new mode to `mprof' to parse and present the data from
	`Prof.Snapshots' files.

	Add options for the new profiling mode.

profiler/process_file.m:
	Fix a typo.

runtime/mercury_conf_param.h:
	#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
	is enabled and we are using Boehm GC.

runtime/mercury.h:
	Make MR_new_object take an allocation id argument.

	Conform to changes in memory allocation macros.

runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
	Define MR_AllocSiteInfo.

	Add memory allocation functions and macros which take into the
	account the additional word necessary for the new profiling mode.
	These should be used in preferences to the raw memory allocation
	functions wherever possible so that objects do not show up in the
	profile as "unknown".

	Add analogues of realloc/free which take into account the offset
	introduced by the attribution word.

	Add function versions of the MR_new_object macros, which can't be
	written in standard C.  They are only used when necessary.

	Add built-in allocation site ids, to be used in the runtime and
	other hand-written code when context-specific ids are unavailable.

runtime/mercury_heap.h:
	Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
	allocate an extra word when memory attribution is desired, and store
	the allocation id there.

	Similarly for MR_create{1,2,3}_msg.

	Replace proclabel arguments in allocation macros by alloc_id
	arguments.

	Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg.  It was only
	used for boxing floats.

	Conform to change to MR_new_object macro.

runtime/mercury_bootstrap.h:
	Delete obsolete macro hp_alloc_atomic.

runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
	Add the code to summarise the live objects on the Boehm GC heap and
	writes out the data to `Prof.Snapshots', for display by mprof.

	Don't store the procedure name in MR_memprof_record: the procedure
	address is enough and faster to compare.

runtime/mercury_prof.c:
	Finish and close the `Prof.Snapshots' file when the program
	terminates.

	Conform to changes in MR_memprof_record.

runtime/mercury_misc.h:
	Add a macro to expand to the name of the allocation sites array
	in LLDS grades.

runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
	Pass allocation id through bitmap allocation functions.

	Delete unused function MR_string_to_bitmap.

runtime/mercury_string.h:
	Add MR_make_aligned_string_copy_msg.

	Make string allocation macros take allocation id arguments.

runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
	Use attributed memory allocation throughout the runtime so that
	objects don't show up in the profile as "unknown".

runtime/mercury_memory_zones.c:
	Attribute memory zones to the Mercury runtime.

runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
	Use attributed memory allocation macros for tabling structures.

	Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.

runtime/mercury_deep_copy_body.h:
	Try to retain the original attribution word when copying values.

runtime/mercury_ml_expand_body.h:
	Conform to changes in memory allocation macros.

runtime/mercury_tags.h:
	Replace proclabel arguments by alloc_id arguments in allocation macros.

runtime/mercury_wrapper.c:
	If memory attribution is enabled, tell Boehm GC that pointers may be
	displaced by an extra word.

trace/mercury_trace.c:
trace/mercury_trace_tables.c:
	Conform to changes in memory allocation macros.

extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
	Conform to changes in memory allocation macros.

doc/user_guide.texi:
	Document the new profiling mode.

doc/reference_manual.texi:
	Update a commented out example.


diff --git a/boehm_gc/alloc.c b/boehm_gc/alloc.c
index e181301..e0dcba5 100644
--- a/boehm_gc/alloc.c
+++ b/boehm_gc/alloc.c
@@ -68,6 +68,7 @@ void (*GC_mercury_callback_start_collect)(void) = NULL;
 void (*GC_mercury_callback_stop_collect)(void) = NULL;
 void (*GC_mercury_callback_pause_thread)(void) = NULL;
 void (*GC_mercury_callback_resume_thread)(void) = NULL;
+void (*GC_mercury_callback_reachable_object)(GC_word *, size_t) = NULL;
                            /* Callbacks for mercury to notify   */
                            /* the runtime of certain events     */
 
@@ -824,6 +825,9 @@ STATIC void GC_finish_collection(void)
         }
 #   endif
     COND_DUMP;
+    if (GC_mercury_callback_reachable_object) {
+	GC_mercury_enumerate_reachable_objects();
+    }
     if (GC_find_leak) {
       /* Mark all objects on the free list.  All objects should be */
       /* marked when we're done.                                   */
diff --git a/boehm_gc/include/gc.h b/boehm_gc/include/gc.h
index 52ebed9..90e46c7 100644
--- a/boehm_gc/include/gc.h
+++ b/boehm_gc/include/gc.h
@@ -38,6 +38,7 @@
         /* the latter.                                                  */
 
 #include "gc_config_macros.h"
+#include <stdio.h>
 
 #ifdef __cplusplus
   extern "C" {
@@ -357,6 +358,8 @@ GC_API void (*GC_mercury_callback_pause_thread)(void);
                  */
 GC_API void (*GC_mercury_callback_resume_thread)(void);
                 /* This thread is about to be resumed */
+GC_API void (*GC_mercury_callback_reachable_object)(GC_word *, size_t);
+                /* This object on the heap is reachable. */
 
 /* Public procedures */
 
@@ -1355,6 +1358,9 @@ GC_API void GC_CALL GC_register_has_static_roots_callback(
 GC_API void GC_CALL GC_set_force_unmap_on_gcollect(int);
 GC_API int GC_CALL GC_get_force_unmap_on_gcollect(void);
 
+/* Print out the elements of GC_size_map to the given file.		*/
+GC_API void GC_CALL GC_mercury_write_size_map(FILE *fp);
+
  /* Fully portable code should call GC_INIT() from the main program     */
  /* before making any other GC_ calls.  On most platforms this is a     */
  /* no-op and the collector self-initializes.  But a number of          */
diff --git a/boehm_gc/misc.c b/boehm_gc/misc.c
index cebfc63..088ba83 100644
--- a/boehm_gc/misc.c
+++ b/boehm_gc/misc.c
@@ -1687,3 +1687,19 @@ GC_API int GC_CALL GC_get_force_unmap_on_gcollect(void)
 {
     return (int)GC_force_unmap_on_gcollect;
 }
+
+GC_API void GC_CALL GC_mercury_write_size_map(FILE *fp)
+{
+    size_t bytes;
+    size_t limit;
+
+    for (limit = MAXOBJBYTES; limit >= 0; limit--) {
+	if (GC_size_map[limit] != 0) {
+	    break;
+	}
+    }
+
+    for (bytes = 1; bytes <= limit; bytes += BYTES_PER_WORD) {
+	fprintf(fp, " %u", GRANULES_TO_WORDS(GC_size_map[bytes]));
+    }
+}
diff --git a/boehm_gc/reclaim.c b/boehm_gc/reclaim.c
index 3c52d1f..88b05fc 100644
--- a/boehm_gc/reclaim.c
+++ b/boehm_gc/reclaim.c
@@ -604,3 +604,39 @@ GC_INNER GC_bool GC_reclaim_all(GC_stop_func stop_func, GC_bool ignore_old)
 #   endif
     return(TRUE);
 }
+
+STATIC void GC_mercury_do_enumerate_reachable_objects(struct hblk *hbp,
+    word dummy)
+{
+    struct hblkhdr * hhdr = HDR(hbp);
+    size_t sz = hhdr -> hb_sz;
+    size_t bit_no;
+    char *p, *plim;
+
+    if (GC_block_empty(hhdr)) {
+        return;
+    }
+
+    p = hbp->hb_body;
+    bit_no = 0;
+    if (sz > MAXOBJBYTES) { /* one big object */
+        plim = p;
+    } else {
+        plim = hbp->hb_body + HBLKSIZE - sz;
+    }
+    /* Go through all words in block. */
+    while (p <= plim) {
+        if (mark_bit_from_hdr(hhdr, bit_no)) {
+            GC_mercury_callback_reachable_object((GC_word *)p,
+		BYTES_TO_WORDS(sz));
+        }
+        bit_no += MARK_BIT_OFFSET(sz);
+        p += sz;
+    }
+}
+
+GC_INNER void GC_mercury_enumerate_reachable_objects(void)
+{
+    GC_ASSERT(GC_mercury_callback_reachable_object);
+    GC_apply_to_all_blocks(GC_mercury_do_enumerate_reachable_objects, (word)0);
+}
diff --git a/compiler/code_info.m b/compiler/code_info.m
index fb32645..96dc4e3 100644
--- a/compiler/code_info.m
+++ b/compiler/code_info.m
@@ -56,6 +56,7 @@
 :- import_module map.
 :- import_module maybe.
 :- import_module set.
+:- import_module set_tree234.
 :- import_module term.
 
 %----------------------------------------------------------------------------%
@@ -221,6 +222,12 @@
 :- pred set_static_cell_info(static_cell_info::in,
     code_info::in, code_info::out) is det.
 
+:- pred get_alloc_sites(code_info::in, set_tree234(alloc_site_info)::out)
+    is det.
+
+:- pred set_alloc_sites(set_tree234(alloc_site_info)::in,
+    code_info::in, code_info::out) is det.
+
 :- pred get_used_env_vars(code_info::in, set(string)::out) is det.
 
 :- pred set_used_env_vars(set(string)::in, code_info::in, code_info::out)
@@ -465,6 +472,8 @@
 
                 cip_static_cell_info        :: static_cell_info,
 
+                cip_alloc_sites             :: set_tree234(alloc_site_info),
+
                 cip_used_env_vars           :: set(string),
 
                 % A counter and table for allocating and maintaining slots
@@ -591,6 +600,7 @@ code_info_init(SaveSuccip, Globals, PredId, ProcId, PredInfo, ProcInfo,
             -1,
             no,
             StaticCellInfo,
+            set_tree234.init,
             set.init,
             TSStringTableSize,
             TSRevStringTable
@@ -721,6 +731,7 @@ get_closure_layouts(CI, CI ^ code_info_persistent ^ cip_closure_layouts).
 get_max_reg_in_use_at_trace(CI, CI ^ code_info_persistent ^ cip_max_reg_used).
 get_created_temp_frame(CI, CI ^ code_info_persistent ^ cip_created_temp_frame).
 get_static_cell_info(CI, CI ^ code_info_persistent ^ cip_static_cell_info).
+get_alloc_sites(CI, CI ^ code_info_persistent ^ cip_alloc_sites).
 get_used_env_vars(CI, CI ^ code_info_persistent ^ cip_used_env_vars).
 
 %---------------------------------------------------------------------------%
@@ -756,6 +767,8 @@ set_created_temp_frame(MR, CI,
     CI ^ code_info_persistent ^ cip_created_temp_frame := MR).
 set_static_cell_info(SCI, CI,
     CI ^ code_info_persistent ^ cip_static_cell_info := SCI).
+set_alloc_sites(ASI, CI,
+    CI ^ code_info_persistent ^ cip_alloc_sites := ASI).
 set_used_env_vars(UEV, CI,
     CI ^ code_info_persistent ^ cip_used_env_vars := UEV).
 
@@ -918,6 +931,9 @@ get_containing_goal_map_det(CI, ContainingGoalMap) :-
 :- pred add_vector_static_cell(list(llds_type)::in, list(list(rval))::in,
     data_id::out, code_info::in, code_info::out) is det.
 
+:- pred add_alloc_site_info(prog_context::in, string::in, int::in,
+    alloc_site_id::out, code_info::in, code_info::out) is det.
+
 %---------------------------------------------------------------------------%
 
 :- implementation.
@@ -1212,6 +1228,14 @@ add_vector_static_cell(Types, Vector, DataAddr, !CI) :-
         StaticCellInfo0, StaticCellInfo),
     set_static_cell_info(StaticCellInfo, !CI).
 
+add_alloc_site_info(Context, Type, Size, AllocId, !CI) :-
+    get_cur_proc_label(!.CI, ProcLabel),
+    AllocSite = alloc_site_info(ProcLabel, Context, Type, Size),
+    AllocId = alloc_site_id(AllocSite),
+    get_alloc_sites(!.CI, AllocSites0),
+    set_tree234.insert(AllocSite, AllocSites0, AllocSites),
+    set_alloc_sites(AllocSites, !CI).
+
 %---------------------------------------------------------------------------%
 %---------------------------------------------------------------------------%
 
@@ -3675,8 +3699,8 @@ should_add_region_ops(CodeInfo, _GoalInfo) = AddRegionOps :-
     %
 :- pred assign_cell_to_var(prog_var::in, bool::in, tag::in,
     list(maybe(rval))::in, how_to_construct::in, maybe(term_size_value)::in,
-    list(int)::in, string::in, may_use_atomic_alloc::in, llds_code::out,
-    code_info::in, code_info::out) is det.
+    list(int)::in, maybe(alloc_site_id)::in, may_use_atomic_alloc::in,
+    llds_code::out, code_info::in, code_info::out) is det.
 
 :- pred save_reused_cell_fields(prog_var::in, lval::in, llds_code::out,
     list(lval)::out, code_info::in, code_info::out) is det.
@@ -3819,14 +3843,14 @@ assign_expr_to_var(Var, Rval, Code, !CI) :-
     set_var_locn_info(VarLocnInfo, !CI).
 
 assign_cell_to_var(Var, ReserveWordAtStart, Ptag, MaybeRvals, HowToConstruct,
-        MaybeSize, FieldAddrs, TypeMsg, MayUseAtomic, Code, !CI) :-
+        MaybeSize, FieldAddrs, MaybeAllocId, MayUseAtomic, Code, !CI) :-
     get_next_label(Label, !CI),
     get_var_locn_info(!.CI, VarLocnInfo0),
     get_static_cell_info(!.CI, StaticCellInfo0),
     get_module_info(!.CI, ModuleInfo),
     get_exprn_opts(!.CI, ExprnOpts),
     var_locn_assign_cell_to_var(ModuleInfo, ExprnOpts, Var, ReserveWordAtStart,
-        Ptag, MaybeRvals, HowToConstruct, MaybeSize, FieldAddrs, TypeMsg,
+        Ptag, MaybeRvals, HowToConstruct, MaybeSize, FieldAddrs, MaybeAllocId,
         MayUseAtomic, Label, Code, StaticCellInfo0, StaticCellInfo,
         VarLocnInfo0, VarLocnInfo),
     set_static_cell_info(StaticCellInfo, !CI),
diff --git a/compiler/exprn_aux.m b/compiler/exprn_aux.m
index 60b7dc4..32cb86c 100644
--- a/compiler/exprn_aux.m
+++ b/compiler/exprn_aux.m
@@ -447,6 +447,9 @@ transform_lval_in_component(Transform, Component0, Component, !Acc) :-
         Component0 = foreign_proc_fail_to(_),
         Component = Component0
     ;
+        Component0 = foreign_proc_alloc_id(_),
+        Component = Component0
+    ;
         Component0 = foreign_proc_noop,
         Component = Component0
     ).
diff --git a/compiler/global_data.m b/compiler/global_data.m
index a5181a5..0a77154 100644
--- a/compiler/global_data.m
+++ b/compiler/global_data.m
@@ -27,6 +27,8 @@
 :- import_module assoc_list.
 :- import_module bool.
 :- import_module list.
+:- import_module map.
+:- import_module set_tree234.
 
 %-----------------------------------------------------------------------------%
 
@@ -87,6 +89,13 @@
 :- pred add_scalar_static_cell_natural_types(list(rval)::in, data_id::out,
     static_cell_info::in, static_cell_info::out) is det.
 
+:- pred global_data_add_new_alloc_sites(set_tree234(alloc_site_info)::in,
+    global_data::in, global_data::out) is det.
+
+:- pred global_data_get_all_alloc_sites(global_data::in,
+    list(alloc_site_info)::out, map(alloc_site_id, layout_slot_name)::out)
+    is det.
+
 :- pred find_general_llds_types(have_unboxed_floats::in, list(mer_type)::in,
     list(list(rval))::in, list(llds_type)::out) is semidet.
 
@@ -149,7 +158,6 @@
 :- import_module bimap.
 :- import_module counter.
 :- import_module int.
-:- import_module map.
 :- import_module maybe.
 :- import_module pair.
 :- import_module require.
@@ -188,14 +196,17 @@
 
                 % Information about all the statically allocated cells
                 % created so far.
-                gd_static_cell_info         :: static_cell_info
+                gd_static_cell_info         :: static_cell_info,
+
+                % Information about all allocation sites in this module.
+                gd_alloc_sites              :: set_tree234(alloc_site_info)
             ).
 
 global_data_init(StaticCellInfo, GlobalData) :-
     map.init(EmptyDataMap),
     map.init(EmptyLayoutMap),
     GlobalData = global_data(EmptyDataMap, EmptyLayoutMap, [],
-        0, [], StaticCellInfo).
+        0, [], StaticCellInfo, set_tree234.init).
 
 global_data_add_new_proc_var(PredProcId, ProcVar, !GlobalData) :-
     ProcVarMap0 = !.GlobalData ^ gd_proc_var_map,
@@ -256,6 +267,25 @@ global_data_get_static_cell_info(GlobalData, StaticCellInfo) :-
 global_data_set_static_cell_info(StaticCellInfo, !GlobalData) :-
     !GlobalData ^ gd_static_cell_info := StaticCellInfo.
 
+global_data_add_new_alloc_sites(NewAllocSites, !GlobalData) :-
+    AllocSites0 = !.GlobalData ^ gd_alloc_sites,
+    set_tree234.union(NewAllocSites, AllocSites0, AllocSites),
+    !GlobalData ^ gd_alloc_sites := AllocSites.
+
+global_data_get_all_alloc_sites(GlobalData, AllocSites, AllocIdMap) :-
+    AllocSitesSet = GlobalData ^ gd_alloc_sites,
+    AllocSites = set_tree234.to_sorted_list(AllocSitesSet),
+    list.foldl2(make_alloc_id_map, AllocSites, 0, _Slot, map.init, AllocIdMap).
+
+:- pred make_alloc_id_map(alloc_site_info::in, int::in, int::out,
+    map(alloc_site_id, layout_slot_name)::in,
+    map(alloc_site_id, layout_slot_name)::out) is det.
+
+make_alloc_id_map(AllocSite, Slot, Slot + 1, !Map) :-
+    AllocId = alloc_site_id(AllocSite),
+    ArraySlot = layout_slot(alloc_site_array, Slot),
+    svmap.det_insert(AllocId, ArraySlot, !Map).
+
 %-----------------------------------------------------------------------------%
 
     % There is one scalar_cell_group for every group of scalar cells that
@@ -678,9 +708,11 @@ bump_type_num_counter(Increment, !GlobalData) :-
 
 merge_global_datas(GlobalDataA, GlobalDataB, GlobalData, GlobalDataRemap) :-
     GlobalDataA = global_data(ProcVarMapA, ProcLayoutMapA, ClosureLayoutsA,
-        TSStringSlotCounterA, TSRevStringTableA, StaticCellInfoA),
+        TSStringSlotCounterA, TSRevStringTableA, StaticCellInfoA,
+        AllocSitesA),
     GlobalDataB = global_data(ProcVarMapB, ProcLayoutMapB, ClosureLayoutsB,
-        TSStringSlotCounterB, TSRevStringTableB, StaticCellInfoB),
+        TSStringSlotCounterB, TSRevStringTableB, StaticCellInfoB,
+        AllocSitesB),
     ProcVarMap = map.old_merge(ProcVarMapA, ProcVarMapB),
     ProcLayoutMap = map.old_merge(ProcLayoutMapA, ProcLayoutMapB),
     ClosureLayouts = ClosureLayoutsA ++ ClosureLayoutsB,
@@ -689,8 +721,9 @@ merge_global_datas(GlobalDataA, GlobalDataB, GlobalData, GlobalDataRemap) :-
         TSRevStringTable, TSStringSlotCounter, MaybeTSStringTableRemap),
     merge_static_cell_infos(StaticCellInfoA, StaticCellInfoB, StaticCellInfo,
         StaticCellRemap),
+    set_tree234.union(AllocSitesA, AllocSitesB, AllocSites),
     GlobalData = global_data(ProcVarMap, ProcLayoutMap, ClosureLayouts,
-        TSStringSlotCounter, TSRevStringTable, StaticCellInfo),
+        TSStringSlotCounter, TSRevStringTable, StaticCellInfo, AllocSites),
     GlobalDataRemap =
         global_data_remapping(MaybeTSStringTableRemap, StaticCellRemap).
 
@@ -1134,6 +1167,7 @@ remap_foreign_proc_component(Remap, Comp0, Comp) :-
         ( Comp0 = foreign_proc_raw_code(_, _, _, _)
         ; Comp0 = foreign_proc_user_code(_, _, _)
         ; Comp0 = foreign_proc_fail_to(_)
+        ; Comp0 = foreign_proc_alloc_id(_)
         ; Comp0 = foreign_proc_noop
         ),
         Comp = Comp0
diff --git a/compiler/jumpopt.m b/compiler/jumpopt.m
index 5d666d8..6eb37ca 100644
--- a/compiler/jumpopt.m
+++ b/compiler/jumpopt.m
@@ -1213,6 +1213,8 @@ short_foreign_proc_component(InstrMap, !Component, !Redirect) :-
             !:Redirect = yes
         )
     ;
+        !.Component = foreign_proc_alloc_id(_)
+    ;
         !.Component = foreign_proc_noop
     ).
 
diff --git a/compiler/layout.m b/compiler/layout.m
index 361a509..d41b258 100644
--- a/compiler/layout.m
+++ b/compiler/layout.m
@@ -294,6 +294,20 @@
 
 %-----------------------------------------------------------------------------%
 %
+% Allocation site information
+%
+
+:- type alloc_site_info
+    --->    alloc_site_info(
+                % define MR_AllocSiteInfo
+                as_proc_label       :: proc_label,
+                as_context          :: prog_context,
+                as_type             :: string,
+                as_size             :: int
+            ).
+
+%-----------------------------------------------------------------------------%
+%
 % Global variables that hold arrays of layout structures.
 %
 
@@ -329,7 +343,8 @@
     ;       proc_table_io_decl_array
     ;       proc_event_layouts_array
     ;       proc_exec_trace_array
-    ;       threadscope_string_table_array.
+    ;       threadscope_string_table_array
+    ;       alloc_site_array.
 
 %-----------------------------------------------------------------------------%
 %
diff --git a/compiler/layout_out.m b/compiler/layout_out.m
index 9164706..60c1d07 100644
--- a/compiler/layout_out.m
+++ b/compiler/layout_out.m
@@ -47,7 +47,7 @@
     list(proc_layout_proc_static)::in,
     list(int)::in, list(int)::in, list(int)::in, list(table_io_decl_data)::in,
     list(layout_slot_name)::in, list(proc_layout_exec_trace)::in,
-    io::di, io::uo) is det.
+    list(alloc_site_info)::in, io::di, io::uo) is det.
 
 :- pred output_layout_array_defns(llds_out_info::in,
     list(rval)::in, list(int)::in, list(int)::in, list(int)::in,
@@ -58,7 +58,7 @@
     list(proc_layout_proc_static)::in,
     list(int)::in, list(int)::in, list(int)::in, list(table_io_decl_data)::in,
     list(layout_slot_name)::in, list(proc_layout_exec_trace)::in,
-    list(string)::in,
+    list(string)::in, list(alloc_site_info)::in,
     decl_set::in, decl_set::out, io::di, io::uo) is det.
 
 %-----------------------------------------------------------------------------%
@@ -190,6 +190,7 @@
 :- import_module pair.
 :- import_module require.
 :- import_module string.
+:- import_module term.
 :- import_module varset.
 
 %-----------------------------------------------------------------------------%
@@ -199,7 +200,7 @@ output_layout_array_decls(Info, PseudoTypeInfos, HLDSVarNums,
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, !IO) :-
+        ProcEventLayouts, ExecTraces, AllocSites, !IO) :-
     MangledModuleName = Info ^ lout_mangled_module_name,
     (
         PseudoTypeInfos = []
@@ -368,6 +369,15 @@ output_layout_array_decls(Info, PseudoTypeInfos, HLDSVarNums,
         output_layout_array_name_storage_type_name(MangledModuleName,
             ProcEventLayoutArrayName, not_being_defined, !IO),
         io.write_string("[];\n", !IO)
+    ),
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        AllocSiteArrayName = alloc_site_array,
+        output_layout_array_name_storage_type_name(MangledModuleName,
+            AllocSiteArrayName, not_being_defined, !IO),
+        io.write_string("[];\n", !IO)
     ).
 
 output_layout_array_defns(Info, PseudoTypeInfos, HLDSVarNums,
@@ -375,7 +385,8 @@ output_layout_array_defns(Info, PseudoTypeInfos, HLDSVarNums,
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, TSStringTable, !DeclSet, !IO) :-
+        ProcEventLayouts, ExecTraces, TSStringTable, AllocSites,
+        !DeclSet, !IO) :-
     (
         PseudoTypeInfos = []
     ;
@@ -492,6 +503,12 @@ output_layout_array_defns(Info, PseudoTypeInfos, HLDSVarNums,
     ;
         TSStringTable = [_ | _],
         output_threadscope_string_table_array(Info, TSStringTable, !IO)
+    ),
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        output_alloc_sites_array(Info, AllocSites, !IO)
     ).
 
 %-----------------------------------------------------------------------------%
@@ -1550,6 +1567,42 @@ output_threadscope_string_table_slot(Info, String, !Slot, !IO) :-
     io.write_string(", 0},\n", !IO).
 
 %-----------------------------------------------------------------------------%
+%
+% Definition of array #21: allocation site structures.
+%
+
+:- pred output_alloc_sites_array(llds_out_info::in, list(alloc_site_info)::in,
+    io::di, io::uo) is det.
+
+output_alloc_sites_array(Info, AllocSites, !IO) :-
+    ModuleName = Info ^ lout_mangled_module_name,
+    output_layout_array_name_storage_type_name(ModuleName, alloc_site_array,
+        being_defined, !IO),
+    list.length(AllocSites, NumAllocSitess),
+    io.format("[%d] = {\n", [i(NumAllocSitess)], !IO),
+    list.foldl2(output_alloc_site_slot(Info), AllocSites, 0, _, !IO),
+    io.write_string("};\n\n", !IO).
+
+:- pred output_alloc_site_slot(llds_out_info::in, alloc_site_info::in, int::in,
+    int::out, io::di, io::uo) is det.
+
+output_alloc_site_slot(_Info, AllocSite, !Slot, !IO) :-
+    AllocSite = alloc_site_info(ProcLabel, Context, TypeMsg, Words),
+    term.context_file(Context, FileName),
+    term.context_line(Context, LineNumber),
+    io.write_string("\t{ ", !IO),
+    output_proc_label(ProcLabel, !IO),
+    io.write_string(", ", !IO),
+    quote_and_write_string(FileName, !IO),
+    io.write_string(", ", !IO),
+    io.write_int(LineNumber, !IO),
+    io.write_string(", ", !IO),
+    quote_and_write_string(TypeMsg, !IO),
+    io.write_string(", ", !IO),
+    io.write_int(Words, !IO),
+    io.write_string("},\n", !IO).
+
+%-----------------------------------------------------------------------------%
 
 output_layout_name_decl(LayoutName, !IO) :-
     output_layout_name_storage_type_name(LayoutName, not_being_defined, !IO),
@@ -1639,6 +1692,9 @@ output_layout_array_name(UseMacro, ModuleName, ArrayName, !IO) :-
         ;
             ArrayName = threadscope_string_table_array,
             io.write_string("MR_threadscope_strings", !IO)
+        ;
+            ArrayName = alloc_site_array,
+            io.write_string("MR_alloc_sites", !IO)
         ),
         io.write_string("(", !IO),
         io.write_string(ModuleName, !IO),
@@ -1706,6 +1762,9 @@ output_layout_array_name(UseMacro, ModuleName, ArrayName, !IO) :-
             ArrayName = threadscope_string_table_array,
             io.write_string("mercury_data__threadscope_string_table_array__",
                 !IO)
+        ;
+            ArrayName = alloc_site_array,
+            io.write_string("mercury_data__alloc_sites_array__", !IO)
         ),
         io.write_string(ModuleName, !IO)
     ).
@@ -1957,9 +2016,11 @@ output_layout_array_name_storage_type_name(ModuleName, Name, _BeingDefined,
             Name, !IO)
     ;
         Name = threadscope_string_table_array,
-        io.write_string(
-            "static MR_Threadscope_String ",
-            !IO),
+        io.write_string("static MR_Threadscope_String ", !IO)
+    ;
+        Name = alloc_site_array,
+        % The type field may be updated at runtime so this array is not const.
+        io.write_string("static MR_AllocSiteInfo ", !IO),
         output_layout_array_name(do_not_use_layout_macro, ModuleName,
             Name, !IO)
     ).
@@ -3042,6 +3103,7 @@ output_layout_slots_in_vector(ModuleName, [SlotName | SlotNames], !IO) :-
         ; ArrayName = proc_event_layouts_array
         ; ArrayName = proc_exec_trace_array
         ; ArrayName = threadscope_string_table_array
+        ; ArrayName = alloc_site_array
         ),
         output_layout_slot_addr(use_layout_macro, ModuleName, SlotName, !IO),
         io.write_string(",\n", !IO),
diff --git a/compiler/livemap.m b/compiler/livemap.m
index 016b8e0..f19dec8 100644
--- a/compiler/livemap.m
+++ b/compiler/livemap.m
@@ -396,6 +396,8 @@ build_livemap_foreign_proc_components([Component | Components],
     ;
         Component = foreign_proc_fail_to(_)
     ;
+        Component = foreign_proc_alloc_id(_)
+    ;
         Component = foreign_proc_noop
     ),
     build_livemap_foreign_proc_components(Components,
diff --git a/compiler/llds.m b/compiler/llds.m
index 81bae1e..e4a1535 100644
--- a/compiler/llds.m
+++ b/compiler/llds.m
@@ -102,6 +102,9 @@
                 cfile_proc_layouts          :: list(proc_layout_data),
                 cfile_module_layout_data    :: list(module_layout_data),
                 cfile_closure_layout_data   :: list(closure_proc_id_data),
+                cfile_alloc_sites           :: list(alloc_site_info),
+                cfile_alloc_site_map        :: map(alloc_site_id,
+                                                layout_slot_name),
                 cfile_code                  :: list(comp_gen_c_module),
                 cfile_user_init_c_names     :: list(string),
                 cfile_user_final_c_names    :: list(string),
@@ -363,16 +366,17 @@
             % Restore maxfr from the saved copy in the given lval. Assumes the
             % lval was saved with save_maxfr.
 
-    ;       incr_hp(lval, maybe(tag), maybe(int), rval, string,
+    ;       incr_hp(lval, maybe(tag), maybe(int), rval, maybe(alloc_site_id),
                 may_use_atomic_alloc, maybe(rval), llds_reuse)
-            % incr_hp(Target, MaybeTag, MaybeOffset, SizeRval, TypeMsg,
+            % incr_hp(Target, MaybeTag, MaybeOffset, SizeRval, MaybeAllocId,
             %   MayUseAtomicAlloc, MaybeRegionId, MaybeReuse)
             %
             % Get a memory block of a size given by SizeRval and put its
             % address in Target, possibly after incrementing it by Offset words
             % (if MaybeOffset = yes(Offset)) and/or after tagging it with Tag
-            % (if MaybeTag = yes(Tag)). TypeMsg gives the name of the type
-            % constructor of the memory cell for use in memory profiling.
+            % (if MaybeTag = yes(Tag)).
+            % If MaybeAllocId = yes(AllocId) then AllocId identifies the
+            % allocation site, for use in memory profiling.
             % MayUseAtomicAlloc says whether we can use the atomic variants
             % of the Boehm gc allocator calls. If MaybeRegionId =
             % yes(RegionId), then the block should be allocated in the region
@@ -645,6 +649,9 @@
     --->    foreign_proc_code(ground, ground, ground, ground, ground,
                 ground, ground, ground, ground, ground).
 
+:- type alloc_site_id
+    --->    alloc_site_id(alloc_site_info).
+
 :- type stack_incr_kind
     --->    stack_incr_leaf         % The incr_sp creates the stack frame
                                     % of a leaf procedure.
@@ -745,6 +752,7 @@
     ;       foreign_proc_raw_code(can_branch_away, proc_affects_liveness,
                 c_code_live_lvals, string)
     ;       foreign_proc_fail_to(label)
+    ;       foreign_proc_alloc_id(alloc_site_id)
     ;       foreign_proc_noop.
 
 :- type can_branch_away
diff --git a/compiler/llds_out_file.m b/compiler/llds_out_file.m
index 431e6b3..6e71168 100644
--- a/compiler/llds_out_file.m
+++ b/compiler/llds_out_file.m
@@ -161,9 +161,11 @@ output_single_c_file(Globals, CFile, FileStream, !DeclSet, !IO) :-
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TSStringTable,
         TableIoDecls, TableIoDeclMap, ProcEventLayouts, ExecTraces,
         ProcLayoutDatas, ModuleLayoutDatas, ClosureLayoutDatas,
+        AllocSites, AllocSiteMap,
         Modules, UserInitPredCNames, UserFinalPredCNames, ComplexityProcs),
     Info = init_llds_out_info(ModuleName, Globals,
-        InternalLabelToLayoutMap, EntryLabelToLayoutMap, TableIoDeclMap),
+        InternalLabelToLayoutMap, EntryLabelToLayoutMap, TableIoDeclMap,
+        AllocSiteMap),
     library.version(Version),
     io.set_output_stream(FileStream, OutputStream, !IO),
     module_name_to_file_name(Globals, ModuleName, ".m", do_not_create_dirs,
@@ -202,7 +204,7 @@ output_single_c_file(Globals, CFile, FileStream, !DeclSet, !IO) :-
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, !IO),
+        ProcEventLayouts, ExecTraces, AllocSites, !IO),
 
     list.foldl2(output_proc_layout_data_defn(Info), ProcLayoutDatas,
         !DeclSet, !IO),
@@ -218,7 +220,8 @@ output_single_c_file(Globals, CFile, FileStream, !DeclSet, !IO) :-
         NoVarLabelLayouts, SVarLabelLayouts, LVarLabelLayouts,
         CallSiteStatics, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TableIoDecls,
-        ProcEventLayouts, ExecTraces, TSStringTable, !DeclSet, !IO),
+        ProcEventLayouts, ExecTraces, TSStringTable, AllocSites,
+        !DeclSet, !IO),
 
     list.foldl2(output_comp_gen_c_module(Info), Modules, !DeclSet, !IO),
     list.foldl(output_user_foreign_code(Info), UserForeignCode, !IO),
@@ -226,7 +229,8 @@ output_single_c_file(Globals, CFile, FileStream, !DeclSet, !IO) :-
     io.write_string("\n", !IO),
     output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
         ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs, TSStringTable,
-        UserInitPredCNames, UserFinalPredCNames, !DeclSet, !IO),
+        AllocSites, UserInitPredCNames, UserFinalPredCNames,
+        !DeclSet, !IO),
     io.set_output_stream(OutputStream, _, !IO).
 
 :- pred module_gather_env_var_names(list(comp_gen_c_module)::in,
@@ -248,12 +252,13 @@ proc_gather_env_var_names([Proc | Procs], !EnvVarNames) :-
 :- pred output_c_module_init_list(llds_out_info::in, module_name::in,
     list(comp_gen_c_module)::in, list(rtti_data)::in,
     list(proc_layout_data)::in, list(module_layout_data)::in,
-    list(complexity_proc_info)::in, list(string)::in, list(string)::in,
-    list(string)::in, decl_set::in, decl_set::out, io::di, io::uo) is det.
+    list(complexity_proc_info)::in, list(string)::in,
+    list(alloc_site_info)::in, list(string)::in, list(string)::in,
+    decl_set::in, decl_set::out, io::di, io::uo) is det.
 
 output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
         ProcLayoutDatas, ModuleLayoutDatas, ComplexityProcs, TSStringTable,
-        InitPredNames, FinalPredNames, !DeclSet, !IO) :-
+        AllocSites, InitPredNames, FinalPredNames, !DeclSet, !IO) :-
     MustInit = (pred(Module::in) is semidet :-
         module_defines_label_with_layout(Info, Module)
     ),
@@ -331,7 +336,7 @@ output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
     io.write_string("\t}\n", !IO),
     io.write_string("\tdone = MR_TRUE;\n", !IO),
 
-    output_init_bunch_calls(Info, "always", 0,AlwaysInitModuleBunches, !IO),
+    output_init_bunch_calls(Info, "always", 0, AlwaysInitModuleBunches, !IO),
 
     (
         MaybeInitModuleBunches = []
@@ -341,6 +346,8 @@ output_c_module_init_list(Info, ModuleName, Modules, RttiDatas,
     ),
 
     output_c_data_init_list(RttiDatas, !IO),
+    output_alloc_sites_init(Info, AllocSites, !IO),
+
     % The call to the debugger initialization function is for bootstrapping;
     % once the debugger has been modified to call do_init_modules_debugger()
     % and all debuggable object files created before this change have been
@@ -548,6 +555,28 @@ output_c_data_init_list([Data | Datas], !IO) :-
     rtti_out.init_rtti_data_if_nec(Data, !IO),
     output_c_data_init_list(Datas, !IO).
 
+    % Output code to register the allocation sites defined in this module.
+    %
+:- pred output_alloc_sites_init(llds_out_info::in, list(alloc_site_info)::in,
+    io::di, io::uo) is det.
+
+output_alloc_sites_init(Info, AllocSites, !IO) :-
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        MangledModuleName = Info ^ lout_mangled_module_name,
+        NumAllocSites = list.length(AllocSites),
+        io.write_string("#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION\n", !IO),
+        io.write_string("\tMR_register_alloc_sites(", !IO),
+        output_layout_array_name(do_not_use_layout_macro, MangledModuleName,
+            alloc_site_array, !IO),
+        io.write_string(", ", !IO),
+        io.write_int(NumAllocSites, !IO),
+        io.write_string(");\n", !IO),
+        io.write_string("#endif\n", !IO)
+    ).
+
     % Output code to register each type_ctor_info defined in this module.
     %
 :- pred output_type_tables_init_list(list(rtti_data)::in, io::di, io::uo)
diff --git a/compiler/llds_out_instr.m b/compiler/llds_out_instr.m
index bd56b5f..13c2748 100644
--- a/compiler/llds_out_instr.m
+++ b/compiler/llds_out_instr.m
@@ -255,6 +255,7 @@ output_record_foreign_proc_component_decls(Info, Component, !DeclSet, !IO) :-
         ( Component = foreign_proc_raw_code(_, _, _, _)
         ; Component = foreign_proc_user_code(_, _, _)
         ; Component = foreign_proc_fail_to(_)
+        ; Component = foreign_proc_alloc_id(_)
         ; Component = foreign_proc_noop
         )
     ).
@@ -629,13 +630,13 @@ output_instruction(Info, Instr, ProfInfo, !IO) :-
         output_lval(Info, Lval, !IO),
         io.write_string(");\n", !IO)
     ;
-        Instr = incr_hp(Lval, MaybeTag, MaybeOffset, SizeRval, TypeMsg,
+        Instr = incr_hp(Lval, MaybeTag, MaybeOffset, SizeRval, MaybeAllocId,
             MayUseAtomicAlloc, MaybeRegionRval, MaybeReuse),
         io.write_string("\t", !IO),
         (
             MaybeReuse = no_llds_reuse,
             output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset,
-                SizeRval, TypeMsg, MayUseAtomicAlloc, MaybeRegionRval,
+                SizeRval, MaybeAllocId, MayUseAtomicAlloc, MaybeRegionRval,
                 ProfInfo, !IO)
         ;
             MaybeReuse = llds_reuse(ReuseRval, MaybeFlagLval),
@@ -674,7 +675,7 @@ output_instruction(Info, Instr, ProfInfo, !IO) :-
             output_rval(Info, ReuseRval, !IO),
             io.write_string(", ", !IO),
             output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset,
-                SizeRval, TypeMsg, MayUseAtomicAlloc, MaybeRegionRval,
+                SizeRval, MaybeAllocId, MayUseAtomicAlloc, MaybeRegionRval,
                 ProfInfo, !IO),
             io.write_string(")", !IO)
         ),
@@ -1511,12 +1512,12 @@ output_label_or_not_reached(MaybeLabel, !IO) :-
 %
 
 :- pred output_incr_hp_no_reuse(llds_out_info::in, lval::in, maybe(tag)::in,
-    maybe(int)::in, rval::in, string::in, may_use_atomic_alloc::in,
-    maybe(rval)::in, pair(label, set_tree234(label))::in, io::di, io::uo)
-    is det.
+    maybe(int)::in, rval::in, maybe(alloc_site_id)::in,
+    may_use_atomic_alloc::in, maybe(rval)::in,
+    pair(label, set_tree234(label))::in, io::di, io::uo) is det.
 
-output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset, Rval, TypeMsg,
-        MayUseAtomicAlloc, MaybeRegionRval, ProfInfo, !IO) :-
+output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset, Rval, MaybeAllocId,
+        MayUseAtomicAlloc, MaybeRegionRval, _ProfInfo, !IO) :-
     (
         MaybeRegionRval = yes(RegionRval),
         (
@@ -1575,11 +1576,8 @@ output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset, Rval, TypeMsg,
             ),
             output_rval_as_type(Info, Rval, lt_word, !IO),
             io.write_string(", ", !IO),
-            ProfInfo = CallerLabel - _,
-            output_label(CallerLabel, !IO),
-            io.write_string(", """, !IO),
-            c_util.output_quoted_string(TypeMsg, !IO),
-            io.write_string(""")", !IO)
+            output_maybe_alloc_site_id(Info, MaybeAllocId, !IO),
+            io.write_string(", NULL)", !IO)
         ;
             ProfMem = no,
             (
@@ -1646,6 +1644,28 @@ output_incr_hp_no_reuse(Info, Lval, MaybeTag, MaybeOffset, Rval, TypeMsg,
         )
     ).
 
+:- pred output_maybe_alloc_site_id(llds_out_info::in, maybe(alloc_site_id)::in,
+    io::di, io::uo) is det.
+
+output_maybe_alloc_site_id(Info, MaybeAllocId, !IO) :-
+    (
+        MaybeAllocId = yes(AllocId),
+        output_alloc_site_id(Info, AllocId, !IO)
+    ;
+        MaybeAllocId = no,
+        io.write_string("NULL", !IO)
+    ).
+
+:- pred output_alloc_site_id(llds_out_info::in, alloc_site_id::in,
+    io::di, io::uo) is det.
+
+output_alloc_site_id(Info, AllocId, !IO) :-
+    MangledModuleName = Info ^ lout_mangled_module_name,
+    AllocSiteMap = Info ^ lout_alloc_site_map,
+    map.lookup(AllocSiteMap, AllocId, AllocSiteSlotName),
+    output_layout_slot_addr(use_layout_macro, MangledModuleName,
+        AllocSiteSlotName, !IO).
+
 %----------------------------------------------------------------------------%
 %
 % Code for the output of push_region_frame, region_fill_frame,
@@ -1743,6 +1763,9 @@ output_foreign_proc_component(Info, Component, !IO) :-
         output_label_no_prefix(Label, !IO),
         io.write_string(");\n", !IO)
     ;
+        Component = foreign_proc_alloc_id(AllocId),
+        output_alloc_site_id(Info, AllocId, !IO)
+    ;
         Component = foreign_proc_noop
     ).
 
diff --git a/compiler/llds_out_util.m b/compiler/llds_out_util.m
index 1009be1..471685a 100644
--- a/compiler/llds_out_util.m
+++ b/compiler/llds_out_util.m
@@ -39,6 +39,8 @@
                 lout_entry_label_to_layout      :: map(label, data_id),
                 lout_table_io_decl_map          :: map(pred_proc_id,
                                                     layout_slot_name),
+                lout_alloc_site_map             :: map(alloc_site_id,
+                                                    layout_slot_name),
                 lout_auto_comments              :: bool,
                 lout_line_numbers               :: bool,
                 lout_emit_c_loops               :: bool,
@@ -57,7 +59,8 @@
 
 :- func init_llds_out_info(module_name, globals,
     map(label, layout_slot_name), map(label, data_id),
-    map(pred_proc_id, layout_slot_name)) = llds_out_info.
+    map(pred_proc_id, layout_slot_name),
+    map(alloc_site_id, layout_slot_name)) = llds_out_info.
 
 :- pred output_set_line_num(llds_out_info::in, prog_context::in,
     io::di, io::uo) is det.
@@ -110,8 +113,8 @@
 %----------------------------------------------------------------------------%
 
 init_llds_out_info(ModuleName, Globals,
-        InternalLabelToLayoutMap, EntryLabelToLayoutMap, TableIoDeclMap)
-        = Info :-
+        InternalLabelToLayoutMap, EntryLabelToLayoutMap, TableIoDeclMap,
+        AllocSiteMap) = Info :-
     MangledModuleName = sym_name_mangle(ModuleName),
     globals.lookup_bool_option(Globals, auto_comments, AutoComments),
     globals.lookup_bool_option(Globals, line_numbers, LineNumbers),
@@ -131,6 +134,7 @@ init_llds_out_info(ModuleName, Globals,
     globals.get_trace_level(Globals, TraceLevel),
     Info = llds_out_info(ModuleName, MangledModuleName,
         InternalLabelToLayoutMap, EntryLabelToLayoutMap, TableIoDeclMap,
+        AllocSiteMap,
         AutoComments, LineNumbers,
         EmitCLoops, GenerateBytecode, LocalThreadEngineBase,
         ProfileCalls, ProfileTime, ProfileMemory, ProfileDeep,
diff --git a/compiler/mercury_compile_llds_back_end.m b/compiler/mercury_compile_llds_back_end.m
index 8119d5a..b8e37f6 100644
--- a/compiler/mercury_compile_llds_back_end.m
+++ b/compiler/mercury_compile_llds_back_end.m
@@ -592,6 +592,7 @@ llds_output_pass(HLDS, GlobalData0, Procs, ModuleName, CompileErrors,
     global_data_get_static_cell_info(GlobalData, StaticCellInfo),
     get_static_cells(StaticCellInfo,
         ScalarCommonCellDatas, VectorCommonCellDatas),
+    global_data_get_all_alloc_sites(GlobalData, AllocSites, AllocIdMap),
     global_data_get_threadscope_string_table(GlobalData, TSStringTable),
 
     % Next we put it all together and output it to one or more C files.
@@ -641,7 +642,8 @@ llds_output_pass(HLDS, GlobalData0, Procs, ModuleName, CompileErrors,
         CallSites, CoveragePoints, ProcStatics,
         ProcHeadVarNums, ProcVarNames, ProcBodyBytecodes, TSStringTable,
         TableIoDecls, TableIoDeclMap, ProcEventLayouts, ExecTraces,
-        ProcLayoutDatas, ModuleLayoutDatas, ClosureLayoutDatas, ChunkedModules,
+        ProcLayoutDatas, ModuleLayoutDatas, ClosureLayoutDatas,
+        AllocSites, AllocIdMap, ChunkedModules,
         UserInitPredCNames, UserFinalPredCNames, ComplexityProcs),
 
     output_llds_file(Globals, ModuleName, CFile, Verbose, Stats, !IO),
diff --git a/compiler/middle_rec.m b/compiler/middle_rec.m
index f6bcb82..fd2a999 100644
--- a/compiler/middle_rec.m
+++ b/compiler/middle_rec.m
@@ -616,6 +616,7 @@ find_used_registers_component(foreign_proc_outputs(Out), !Used) :-
 find_used_registers_component(foreign_proc_user_code(_, _, _), !Used).
 find_used_registers_component(foreign_proc_raw_code(_, _, _, _), !Used).
 find_used_registers_component(foreign_proc_fail_to(_), !Used).
+find_used_registers_component(foreign_proc_alloc_id(_), !Used).
 find_used_registers_component(foreign_proc_noop, !Used).
 
 :- pred find_used_registers_lvals(list(lval)::in,
diff --git a/compiler/ml_accurate_gc.m b/compiler/ml_accurate_gc.m
index f5d44d1..0c9ee5f 100644
--- a/compiler/ml_accurate_gc.m
+++ b/compiler/ml_accurate_gc.m
@@ -535,7 +535,7 @@ fixup_newobj_in_atomic_statement(AtomicStatement0, Stmt, !Fixup) :-
     (
         AtomicStatement0 = new_object(Lval, MaybeTag, _ExplicitSecTag,
             PointerType, _MaybeSizeInWordsRval, _MaybeCtorName,
-            ArgRvals, _ArgTypes, _MayUseAtomic)
+            ArgRvals, _ArgTypes, _MayUseAtomic, _AllocId)
     ->
         % Generate the declaration of the new local variable.
         %
diff --git a/compiler/ml_elim_nested.m b/compiler/ml_elim_nested.m
index 08eadcf..02e0545 100644
--- a/compiler/ml_elim_nested.m
+++ b/compiler/ml_elim_nested.m
@@ -882,9 +882,11 @@ ml_create_env(Action, EnvClassName, EnvTypeName, LocalVars, Context,
         % OnHeap should be "yes" only on for the IL backend, for which
         % the value of MayUseAtomic is immaterial.
         MayUseAtomic = may_not_use_atomic_alloc,
+        MaybeAllocId = no,
         NewObj = [statement(
             ml_stmt_atomic(new_object(ml_var(EnvVar, EnvTypeName),
-                no, no, EnvTypeName, no, no, [], [], MayUseAtomic)),
+                no, no, EnvTypeName, no, no, [], [], MayUseAtomic,
+                MaybeAllocId)),
             Context)]
     ;
         OnHeap = no,
@@ -1841,11 +1843,13 @@ fixup_atomic_stmt(Action, Info, Atomic0, Atomic) :-
         Atomic = delete_object(Rval)
     ;
         Atomic0 = new_object(Target0, MaybeTag, ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args0, ArgTypes, MayUseAtomic),
+            MaybeSize, MaybeCtorName, Args0, ArgTypes, MayUseAtomic,
+            MaybeAllocId),
         fixup_lval(Action, Info, Target0, Target),
         fixup_rvals(Action, Info, Args0, Args),
         Atomic = new_object(Target, MaybeTag, ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic)
+            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic,
+            MaybeAllocId)
     ;
         Atomic0 = mark_hp(Lval0),
         fixup_lval(Action, Info, Lval0, Lval),
@@ -1917,6 +1921,7 @@ fixup_target_code_component(Action, Info, Component0, Component) :-
         ; Component0 = user_target_code(_Code, _Context, _Attrs)
         ; Component0 = target_code_type(_Type)
         ; Component0 = target_code_name(_Name)
+        ; Component0 = target_code_alloc_id(_AllocId)
         ),
         Component = Component0
     ;
diff --git a/compiler/ml_foreign_proc_gen.m b/compiler/ml_foreign_proc_gen.m
index 7121a4f..fd451cb 100644
--- a/compiler/ml_foreign_proc_gen.m
+++ b/compiler/ml_foreign_proc_gen.m
@@ -56,6 +56,7 @@
 :- import_module libs.globals.
 :- import_module libs.options.
 :- import_module ml_backend.ml_code_util.
+:- import_module ml_backend.ml_global_data.
 :- import_module parse_tree.builtin_lib_types.
 
 :- import_module bool.
@@ -108,6 +109,7 @@ ml_generate_runtime_cond_code(Expr, CondRval, !Info) :-
     % For model_non pragma c_code,
     % we generate code of the following form:
     %
+    %   #define MR_ALLOC_ID <allocation id>
     %   #define MR_PROC_LABEL <procedure name>
     %   <declaration of locals needed for boxing/unboxing>
     %   {
@@ -146,15 +148,13 @@ ml_generate_runtime_cond_code(Expr, CondRval, !Info) :-
     %       #undef SUCCEED_LAST
     %       #undef LOCALS
     %   }
+    %   #undef MR_ALLOC_ID
     %   #undef MR_PROC_LABEL
     %
-    % We insert a #define for MR_PROC_LABEL, so that the C code in the Mercury
-    % standard library that allocates memory manually can use MR_PROC_LABEL
-    % as the procname argument to incr_hp_msg(), for memory profiling.
-    % Hard-coding the procname argument in the C code would be wrong,
-    % since it wouldn't handle the case where the original pragma foreign_proc
-    % procedure gets inlined and optimized away. Of course we also need to
-    % #undef it afterwards.
+    % We insert a #define MR_ALLOC_ID so that the C code in the Mercury
+    % standard library that allocates memory manually can use MR_ALLOC_ID as an
+    % argument to incr_hp_msg(), for memory profiling.  It replaces an older
+    % macro MR_PROC_LABEL, which is retained only for backwards compatibility.
     %
 ml_gen_nondet_pragma_foreign_proc(CodeModel, Attributes, PredId, _ProcId,
         Args, Context, LocalVarsDecls, LocalVarsContext,
@@ -198,13 +198,18 @@ ml_gen_nondet_pragma_foreign_proc(CodeModel, Attributes, PredId, _ProcId,
     ml_gen_obtain_release_global_lock(!.Info, ThreadSafe, PredId,
         ObtainLock, ReleaseLock),
 
+    % Generate the MR_ALLOC_ID #define.
+    ml_gen_hash_define_mr_alloc_id([FirstCode, SharedCode], Context,
+        HashDefineAllocId, HashUndefAllocId, !Info),
+
     % Generate the MR_PROC_LABEL #define.
-    ml_gen_hash_define_mr_proc_label(!.Info, HashDefine),
+    ml_gen_hash_define_mr_proc_label(!.Info, HashDefineProcLabel),
 
     % Put it all together.
     Starting_C_Code = list.condense([
         [raw_target_code("{\n", [])],
-        HashDefine,
+        HashDefineAllocId,
+        HashDefineProcLabel,
         ArgDeclsList,
         [raw_target_code("\tstruct {\n", []),
         user_target_code(LocalVarsDecls, LocalVarsContext, []),
@@ -223,8 +228,9 @@ ml_gen_nondet_pragma_foreign_proc(CodeModel, Attributes, PredId, _ProcId,
         raw_target_code("\twhile (1) {\n", []),
         raw_target_code("\t\t{\n", []),
         user_target_code(SharedCode, SharedContext, []),
-        raw_target_code("\n\t\t;}\n", []),
-        raw_target_code("#undef MR_PROC_LABEL\n", []),
+        raw_target_code("\n\t\t;}\n", [])],
+        HashUndefAllocId,
+        [raw_target_code("#undef MR_PROC_LABEL\n", []),
         raw_target_code(ReleaseLock, []),
         raw_target_code("\t\tif (MR_succeeded) {\n", [])],
         AssignOutputsList
@@ -737,6 +743,7 @@ ml_gen_pragma_il_proc_var_decl_defn(ModuleInfo, MLDSModuleName, ArgMap, VarSet,
     %
     % model_det pragma_c_proc:
     %
+    %   #define MR_ALLOC_ID <allocation id>
     %   #define MR_PROC_LABEL <procedure name>
     %   <declaration of locals needed for boxing/unboxing>
     %   {
@@ -749,10 +756,12 @@ ml_gen_pragma_il_proc_var_decl_defn(ModuleInfo, MLDSModuleName, ArgMap, VarSet,
     %       <release global lock>
     %       <assign output args>
     %   }
+    %   #undef MR_ALLOC_ID
     %   #undef MR_PROC_LABEL
     %
     % model_semi pragma_c_proc:
     %
+    %   #define MR_ALLOC_ID <allocation id>
     %   #define MR_PROC_LABEL <procedure name>
     %   <declaration of locals needed for boxing/unboxing>
     %   {
@@ -770,16 +779,13 @@ ml_gen_pragma_il_proc_var_decl_defn(ModuleInfo, MLDSModuleName, ArgMap, VarSet,
     %
     %       <succeeded> = SUCCESS_INDICATOR;
     %   }
+    %   #undef MR_ALLOC_ID
     %   #undef MR_PROC_LABEL
     %
-    % We insert a #define for MR_PROC_LABEL, so that the C code in
-    % the Mercury standard library that allocates memory manually
-    % can use MR_PROC_LABEL as the procname argument to
-    % incr_hp_msg(), for memory profiling. Hard-coding the procname
-    % argument in the C code would be wrong, since it wouldn't
-    % handle the case where the original pragma c_code procedure
-    % gets inlined and optimized away. Of course we also need to
-    % #undef it afterwards.
+    % We insert a #define MR_ALLOC_ID so that the C code in the Mercury
+    % standard library that allocates memory manually can use MR_ALLOC_ID as an
+    % argument to incr_hp_msg(), for memory profiling.  It replaces an older
+    % macro MR_PROC_LABEL, which is retained only for backwards compatibility.
     %
     % Note that we generate this code directly as
     % `target_code(lang_C, <string>)' instructions in the MLDS.
@@ -816,23 +822,29 @@ ml_gen_ordinary_pragma_c_proc(OrdinaryKind, Attributes, PredId, _ProcId,
     ml_gen_obtain_release_global_lock(!.Info, ThreadSafe, PredId,
         ObtainLock, ReleaseLock),
 
+    % Generate the MR_ALLOC_ID #define.
+    ml_gen_hash_define_mr_alloc_id([C_Code], Context,
+        HashDefineAllocId, HashUndefAllocId, !Info),
+
     % Generate the MR_PROC_LABEL #define.
-    ml_gen_hash_define_mr_proc_label(!.Info, HashDefine),
+    ml_gen_hash_define_mr_proc_label(!.Info, HashDefineProcLabel),
 
     % Put it all together.
     (
         OrdinaryKind = kind_det,
         Starting_C_Code = list.condense([
             [raw_target_code("{\n", [])],
-            HashDefine,
+            HashDefineAllocId,
+            HashDefineProcLabel,
             ArgDeclsList,
             [raw_target_code("\n", [])],
             AssignInputsList,
             [raw_target_code(ObtainLock, []),
             raw_target_code("\t\t{\n", []),
             user_target_code(C_Code, yes(Context), []),
-            raw_target_code("\n\t\t;}\n", []),
-            raw_target_code("#undef MR_PROC_LABEL\n", []),
+            raw_target_code("\n\t\t;}\n", [])],
+            HashUndefAllocId,
+            [raw_target_code("#undef MR_PROC_LABEL\n", []),
             raw_target_code(ReleaseLock, [])],
             AssignOutputsList
         ]),
@@ -846,15 +858,17 @@ ml_gen_ordinary_pragma_c_proc(OrdinaryKind, Attributes, PredId, _ProcId,
         ml_success_lval(!.Info, SucceededLval),
         Starting_C_Code = list.condense([
             [raw_target_code("{\n", [])],
-            HashDefine,
+            HashDefineAllocId,
+            HashDefineProcLabel,
             ArgDeclsList,
             [raw_target_code("\n", [])],
             AssignInputsList,
             [raw_target_code(ObtainLock, []),
             raw_target_code("\t\t{\n", []),
             user_target_code(C_Code, yes(Context), []),
-            raw_target_code("\n\t\t;}\n", []),
-            raw_target_code("#undef MR_PROC_LABEL\n", []),
+            raw_target_code("\n\t\t;}\n", [])],
+            HashUndefAllocId,
+            [raw_target_code("#undef MR_PROC_LABEL\n", []),
             raw_target_code(ReleaseLock, [])]
         ]),
         Ending_C_Code = [
@@ -867,7 +881,8 @@ ml_gen_ordinary_pragma_c_proc(OrdinaryKind, Attributes, PredId, _ProcId,
         ml_success_lval(!.Info, SucceededLval),
         Starting_C_Code = list.condense([
             [raw_target_code("{\n", [])],
-            HashDefine,
+            HashDefineAllocId,
+            HashDefineProcLabel,
             ArgDeclsList,
             [raw_target_code("\tMR_bool SUCCESS_INDICATOR;\n", []),
             raw_target_code("\n", [])],
@@ -875,8 +890,9 @@ ml_gen_ordinary_pragma_c_proc(OrdinaryKind, Attributes, PredId, _ProcId,
             [raw_target_code(ObtainLock, []),
             raw_target_code("\t\t{\n", []),
             user_target_code(C_Code, yes(Context), []),
-            raw_target_code("\n\t\t;}\n", []),
-            raw_target_code("#undef MR_PROC_LABEL\n", []),
+            raw_target_code("\n\t\t;}\n", [])],
+            HashUndefAllocId,
+            [raw_target_code("#undef MR_PROC_LABEL\n", []),
             raw_target_code(ReleaseLock, []),
             raw_target_code("\tif (SUCCESS_INDICATOR) {\n", [])],
             AssignOutputsList
@@ -925,6 +941,37 @@ ml_gen_obtain_release_global_lock(Info, ThreadSafe, PredId,
         ReleaseLock = ""
     ).
 
+:- pred ml_gen_hash_define_mr_alloc_id(list(string)::in, prog_context::in,
+    list(target_code_component)::out, list(target_code_component)::out,
+    ml_gen_info::in, ml_gen_info::out) is det.
+
+ml_gen_hash_define_mr_alloc_id(C_Codes, Context, HashDefine, HashUndef,
+        !Info) :-
+    ml_gen_info_get_globals(!.Info, Globals),
+    globals.lookup_bool_option(Globals, profile_memory, ProfileMemory),
+    (
+        ProfileMemory = yes,
+        list.member(C_Code, C_Codes),
+        string.sub_string_search(C_Code, "MR_ALLOC_ID", _)
+    ->
+        ml_gen_info_get_module_info(!.Info, ModuleInfo),
+        ml_gen_info_get_pred_id(!.Info, PredId),
+        ml_gen_info_get_proc_id(!.Info, ProcId),
+        ml_gen_info_get_global_data(!.Info, GlobalData0),
+        ml_gen_proc_label(ModuleInfo, PredId, ProcId, ProcLabel, _Module),
+        ml_gen_alloc_site(ProcLabel, no, 0, Context, AllocId,
+            GlobalData0, GlobalData),
+        ml_gen_info_set_global_data(GlobalData, !Info),
+        HashDefine = [
+            raw_target_code("#define MR_ALLOC_ID ", []),
+            target_code_alloc_id(AllocId),
+            raw_target_code("\n", [])],
+        HashUndef = [raw_target_code("#undef MR_ALLOC_ID\n", [])]
+    ;
+        HashDefine = [],
+        HashUndef = []
+    ).
+
 :- pred ml_gen_hash_define_mr_proc_label(ml_gen_info::in,
     list(target_code_component)::out) is det.
 
diff --git a/compiler/ml_global_data.m b/compiler/ml_global_data.m
index 24884d8..c7a9f23 100644
--- a/compiler/ml_global_data.m
+++ b/compiler/ml_global_data.m
@@ -23,11 +23,13 @@
 :- import_module ml_backend.mlds.
 :- import_module parse_tree.prog_data.
 
+:- import_module assoc_list.
 :- import_module bimap.
 :- import_module cord.
 :- import_module counter.
 :- import_module list.
 :- import_module map.
+:- import_module maybe.
 
     % This abstract type represents the MLDS code generator's repository of
     % data structures that are "born global", i.e. the ones for which we
@@ -66,6 +68,14 @@
                 mvcg_rows           :: cord(mlds_initializer)
             ).
 
+:- type ml_alloc_site_data
+    --->    ml_alloc_site_data(
+                masd_proc_label     :: mlds_entity_name,
+                masd_context        :: prog_context,
+                masd_type           :: string,
+                masd_size           :: int
+            ).
+
     % Initialize the ml_global_data structure to a value that represents
     % no global data structures known yet.
     %
@@ -90,8 +100,9 @@
     % Note that this order may still require forward declarations.
     %
 :- pred ml_global_data_get_all_global_defns(ml_global_data::in,
-    ml_scalar_cell_map::out, ml_vector_cell_map::out, list(mlds_defn)::out)
-    is det.
+    ml_scalar_cell_map::out, ml_vector_cell_map::out,
+    assoc_list(mlds_alloc_id, ml_alloc_site_data)::out,
+    list(mlds_defn)::out) is det.
 
     % This type maps the names of rtti data structures that have already been
     % generated to the rval that refers to that data structure, and its type.
@@ -180,11 +191,19 @@
     ml_vector_common_type_num::in, list(mlds_initializer)::in,
     mlds_vector_common::out, ml_global_data::in, ml_global_data::out) is det.
 
+    % Generate or look up an allocation site.
+    %
+:- pred ml_gen_alloc_site(mlds_entity_name::in, maybe(cons_id)::in, int::in,
+    prog_context::in, mlds_alloc_id::out,
+    ml_global_data::in, ml_global_data::out) is det.
+
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
 :- implementation.
 
+:- import_module hlds.hlds_out.
+:- import_module hlds.hlds_out.hlds_out_util.
 :- import_module ml_backend.ml_type_gen.
 
 :- import_module int.
@@ -203,6 +222,9 @@
 :- type ml_vector_cell_type_map
     == map(list(mlds_type), ml_vector_common_type_num).
 
+:- type ml_alloc_id_map
+    == bimap(mlds_alloc_id, ml_alloc_site_data).
+
 :- type ml_global_data
     --->    ml_global_data(
                 mgd_pdup_rval_type_map          :: ml_rtti_rval_type_map,
@@ -218,7 +240,10 @@
                 mgd_scalar_cell_group_map       :: ml_scalar_cell_map,
 
                 mgd_vector_type_num_map         :: ml_vector_cell_type_map,
-                mgd_vector_cell_group_map       :: ml_vector_cell_map
+                mgd_vector_cell_group_map       :: ml_vector_cell_map,
+
+                mgd_alloc_id_counter            :: counter,
+                mgd_alloc_id_map                :: ml_alloc_id_map
             ).
 
 %-----------------------------------------------------------------------------%
@@ -226,7 +251,8 @@
 ml_global_data_init(UseCommonCells) = GlobalData :-
     GlobalData = ml_global_data(map.init, UseCommonCells,
         counter.init(1), [], [], [],
-        counter.init(1), map.init, map.init, map.init, map.init).
+        counter.init(1), map.init, map.init, map.init, map.init,
+        counter.init(0), bimap.init).
 
 ml_global_data_get_global_defns(GlobalData,
         ScalarCellGroupMap, VectorCellGroupMap,
@@ -236,16 +262,19 @@ ml_global_data_get_global_defns(GlobalData,
         RevFlatCellDefns, RevFlatRttiDefns, RevMaybeNonFlatDefns,
         _TypeNumCounter,
         _ScalarTypeNumMap, ScalarCellGroupMap,
-        _VectorTypeNumMap, VectorCellGroupMap).
+        _VectorTypeNumMap, VectorCellGroupMap,
+        _AllocIdNumCounter, _AllocIdMap).
 
 ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, Defns) :-
+        ScalarCellGroupMap, VectorCellGroupMap, AllocIds, Defns) :-
     GlobalData = ml_global_data(_PDupRvalTypeMap, _UseCommonCells,
         _ConstCounter,
         RevFlatCellDefns, RevFlatRttiDefns, RevMaybeNonFlatDefns,
         _TypeNumCounter,
         _ScalarTypeNumMap, ScalarCellGroupMap,
-        _VectorTypeNumMap, VectorCellGroupMap),
+        _VectorTypeNumMap, VectorCellGroupMap,
+        _AllocIdNumCounter, AllocIdMap),
+    bimap.to_assoc_list(AllocIdMap, AllocIds),
     % RevFlatRttiDefns are type_ctor_infos and the like, while
     % RevNonFlatDefns are type_infos and pseudo_type_infos.
     % They refer to each other, so neither order is obviously better.
@@ -572,6 +601,64 @@ ml_gen_static_vector_defn(MLDS_ModuleName, TypeNum, RowInitializers, Common,
 
 %-----------------------------------------------------------------------------%
 
+ml_gen_alloc_site(ProcLabel, MaybeConsId, Size, Context, AllocId,
+        !GlobalData) :-
+    (
+        MaybeConsId = yes(ConsId),
+        TypeStr = cons_id_to_alloc_site_string(ConsId)
+    ;
+        MaybeConsId = no,
+        TypeStr = "unknown"
+    ),
+    AllocData = ml_alloc_site_data(ProcLabel, Context, TypeStr, Size),
+    Map0 = !.GlobalData ^ mgd_alloc_id_map,
+    ( bimap.search(Map0, AllocId0, AllocData) ->
+        AllocId = AllocId0
+    ;
+        Counter0 = !.GlobalData ^ mgd_alloc_id_counter,
+        counter.allocate(AllocIdNum, Counter0, Counter),
+        AllocId = mlds_alloc_id(AllocIdNum),
+        bimap.det_insert(Map0, AllocId, AllocData, Map),
+        !GlobalData ^ mgd_alloc_id_counter := Counter,
+        !GlobalData ^ mgd_alloc_id_map := Map
+    ).
+
+:- func cons_id_to_alloc_site_string(cons_id) = string.
+
+cons_id_to_alloc_site_string(ConsId) = TypeStr :-
+    (
+        ConsId = cons(_, _, TypeCtor),
+        TypeStr = type_ctor_to_string(TypeCtor)
+    ;
+        ConsId = tuple_cons(Arity),
+        TypeStr = "{}/" ++ string.from_int(Arity)
+    ;
+        ConsId = closure_cons(_, _),
+        TypeStr = "closure"
+    ;
+        ConsId = type_info_cell_constructor(_),
+        TypeStr = "private_builtin.type_info/0"
+    ;
+        ConsId = typeclass_info_cell_constructor,
+        TypeStr = "typeclass_info"
+    ;
+        ( ConsId = int_const(_)
+        ; ConsId = float_const(_)
+        ; ConsId = char_const(_)
+        ; ConsId = string_const(_)
+        ; ConsId = impl_defined_const(_)
+        ; ConsId = type_ctor_info_const(_, _, _)
+        ; ConsId = base_typeclass_info_const(_, _, _, _)
+        ; ConsId = tabling_info_const(_)
+        ; ConsId = table_io_decl(_)
+        ; ConsId = deep_profiling_proc_layout(_)
+        ),
+        unexpected(this_file,
+            "cons_id_to_alloc_site_string: unexpected cons_id")
+    ).
+
+%-----------------------------------------------------------------------------%
+
 :- func this_file = string.
 
 this_file = "ml_global_data.m".
diff --git a/compiler/ml_optimize.m b/compiler/ml_optimize.m
index 55f029e..724224b 100644
--- a/compiler/ml_optimize.m
+++ b/compiler/ml_optimize.m
@@ -1221,11 +1221,13 @@ eliminate_var_in_atomic_stmt(Stmt0, Stmt, !VarElimInfo) :-
         Stmt = delete_object(Rval)
     ;
         Stmt0 = new_object(Target0, MaybeTag, ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args0, ArgTypes, MayUseAtomic),
+            MaybeSize, MaybeCtorName, Args0, ArgTypes, MayUseAtomic,
+            MaybeAllocId),
         eliminate_var_in_lval(Target0, Target, !VarElimInfo),
         eliminate_var_in_rvals(Args0, Args, !VarElimInfo),
         Stmt = new_object(Target, MaybeTag, ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic)
+            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic,
+            MaybeAllocId)
     ;
         Stmt0 = mark_hp(Lval0),
         eliminate_var_in_lval(Lval0, Lval, !VarElimInfo),
@@ -1275,6 +1277,7 @@ eliminate_var_in_target_code_component(Component0, Component, !VarElimInfo) :-
         ; Component0 = user_target_code(_Code, _Context, _Attrs)
         ; Component0 = target_code_type(_Type)
         ; Component0 = target_code_name(_Name)
+        ; Component0 = target_code_alloc_id(_AllocId)
         ),
         Component = Component0
     ;
diff --git a/compiler/ml_unify_gen.m b/compiler/ml_unify_gen.m
index 260b156..40b74e4 100644
--- a/compiler/ml_unify_gen.m
+++ b/compiler/ml_unify_gen.m
@@ -685,12 +685,29 @@ ml_gen_new_object_dynamically(MaybeConsId, MaybeCtorName, MaybeTag, ExplicitSecT
     list.length(ArgRvals, NumArgs),
     SizeInWordsRval = ml_const(mlconst_int(NumArgs)),
 
+    % Generate an allocation site id.
+    globals.lookup_bool_option(Globals, profile_memory, ProfileMemory),
+    (
+        ProfileMemory = yes,
+        ml_gen_info_get_pred_id(!.Info, PredId),
+        ml_gen_info_get_proc_id(!.Info, ProcId),
+        ml_gen_info_get_global_data(!.Info, GlobalData0),
+        ml_gen_proc_label(ModuleInfo, PredId, ProcId, ProcLabel, _Module),
+        ml_gen_alloc_site(ProcLabel, MaybeConsId, NumArgs, Context, AllocId,
+            GlobalData0, GlobalData),
+        ml_gen_info_set_global_data(GlobalData, !Info),
+        MaybeAllocId = yes(AllocId)
+    ;
+        ProfileMemory = no,
+        MaybeAllocId = no
+    ),
+
     % Generate a `new_object' statement to dynamically allocate the memory
     % for this term from the heap. The `new_object' statement will also
     % initialize the fields of this term with the specified arguments.
     MakeNewObject = new_object(VarLval, MaybeTag, ExplicitSecTag, MLDS_Type,
         yes(SizeInWordsRval), MaybeCtorName, ArgRvals, MLDS_ArgTypes,
-        MayUseAtomic),
+        MayUseAtomic, MaybeAllocId),
     Stmt = ml_stmt_atomic(MakeNewObject),
     Statement = statement(Stmt, mlds_make_context(Context)),
 
diff --git a/compiler/ml_util.m b/compiler/ml_util.m
index b2f1ac6..3da5ecf 100644
--- a/compiler/ml_util.m
+++ b/compiler/ml_util.m
@@ -468,7 +468,8 @@ atomic_stmt_contains_var(AtomicStmt, DataName) = ContainsVar :-
         ContainsVar = rval_contains_var(Rval, DataName)
     ;
         AtomicStmt = new_object(Target, _MaybeTag, _ExplicitSecTag, _Type,
-            _MaybeSize, _MaybeCtorName, Args, _ArgTypes, _MayUseAtomic),
+            _MaybeSize, _MaybeCtorName, Args, _ArgTypes, _MayUseAtomic,
+            _AllocId),
         ( lval_contains_var(Target, DataName) = yes ->
             ContainsVar = yes
         ;
@@ -542,6 +543,7 @@ target_code_component_contains_var(TargetCode, DataName) = ContainsVar :-
         ( TargetCode = user_target_code(_, _, _)
         ; TargetCode = raw_target_code(_, _)
         ; TargetCode = target_code_type(_)
+        ; TargetCode = target_code_alloc_id(_)
         ),
         ContainsVar = no
     ;
diff --git a/compiler/mlds.m b/compiler/mlds.m
index c953eb6..343a30e 100644
--- a/compiler/mlds.m
+++ b/compiler/mlds.m
@@ -1283,8 +1283,8 @@
             % deallocate) the memory used by the lval.
 
     ;       new_object(
-                % new_object(Target, Tag, Type,
-                %   Size, CtorName, Args, ArgTypes, MayUseAtomic):
+                % new_object(Target, Tag, Type, Size, CtorName,
+                %   Args, ArgTypes, MayUseAtomic, MaybeAllocId):
                 % Allocate a memory block of the given size,
                 % initialize it with a new object of the given
                 % type by calling the constructor with the specified
@@ -1332,7 +1332,10 @@
 
                 % Can we use a cell allocated with GC_malloc_atomic to hold
                 % this object in the C backend?
-                may_use_atomic_alloc
+                may_use_atomic_alloc,
+
+                % The allocation site identifier.
+                maybe(mlds_alloc_id)
             )
 
     ;       gc_check
@@ -1446,7 +1449,8 @@
     ;       target_code_input(mlds_rval)
     ;       target_code_output(mlds_lval)
     ;       target_code_type(mlds_type)
-    ;       target_code_name(mlds_qualified_entity_name).
+    ;       target_code_name(mlds_qualified_entity_name)
+    ;       target_code_alloc_id(mlds_alloc_id).
 
 :- type target_code_attributes == list(target_code_attribute).
 
@@ -1460,6 +1464,9 @@
             % `IL' foreign language interface, and is measured in units
             % of stack items.
 
+:- type mlds_alloc_id
+    --->    mlds_alloc_id(int).
+
     % Constructor id.
     %
 :- type ctor_name == mlds_qualified_ctor_id.
diff --git a/compiler/mlds_to_c.m b/compiler/mlds_to_c.m
index 60c9feb..aeda1f6 100644
--- a/compiler/mlds_to_c.m
+++ b/compiler/mlds_to_c.m
@@ -243,7 +243,7 @@ mlds_output_hdr_file(Opts, Indent, MLDS, !IO) :-
     MLDS = mlds(ModuleName, AllForeignCode, Imports, GlobalData, PlainDefns,
         InitPreds, FinalPreds, ExportEnums),
     ml_global_data_get_all_global_defns(GlobalData,
-        _ScalarCellGroupMap, _VectorCellGroupMap, GlobalDefns),
+        _ScalarCellGroupMap, _VectorCellGroupMap, _AllocSites, GlobalDefns),
     Defns = GlobalDefns ++ PlainDefns,
 
     mlds_output_hdr_start(Opts, Indent, ModuleName, !IO),
@@ -361,7 +361,7 @@ mlds_output_src_file(Opts, Indent, MLDS, !IO) :-
     MLDS = mlds(ModuleName, AllForeignCode, Imports, GlobalData, PlainDefns,
         InitPreds, FinalPreds, _ExportEnums),
     ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, AllocSites, GlobalDefns),
     Defns = GlobalDefns ++ PlainDefns,
     map.to_assoc_list(ScalarCellGroupMap, ScalarCellGroups),
     map.to_assoc_list(VectorCellGroupMap, VectorCellGroups),
@@ -421,6 +421,8 @@ mlds_output_src_file(Opts, Indent, MLDS, !IO) :-
     mlds_output_vector_cell_group_decls(Opts, Indent, MLDS_ModuleName,
         MangledModuleName, VectorCellGroups, !IO),
     io.nl(!IO),
+    mlds_output_alloc_site_decls(Indent, AllocSites, !IO),
+    io.nl(!IO),
 
     mlds_output_scalar_cell_group_defns(Opts, Indent, MangledModuleName,
         ScalarCellGroups, !IO),
@@ -428,13 +430,16 @@ mlds_output_src_file(Opts, Indent, MLDS, !IO) :-
     mlds_output_vector_cell_group_defns(Opts, Indent, MangledModuleName,
         VectorCellGroups, !IO),
     io.nl(!IO),
+    mlds_output_alloc_site_defns(Opts, Indent, MLDS_ModuleName, AllocSites,
+        !IO),
+    io.nl(!IO),
 
     mlds_output_c_defns(Opts, MLDS_ModuleName, Indent, ForeignCode, !IO),
     io.nl(!IO),
     mlds_output_defns(Opts, Indent, yes, MLDS_ModuleName, NonTypeDefns, !IO),
     io.nl(!IO),
     mlds_output_init_fn_defns(Opts, MLDS_ModuleName, FuncDefns,
-        TypeCtorInfoDefns, InitPreds, FinalPreds, !IO),
+        TypeCtorInfoDefns, AllocSites, InitPreds, FinalPreds, !IO),
     io.nl(!IO),
     mlds_output_grade_var(!IO),
     io.nl(!IO),
@@ -689,7 +694,6 @@ mlds_get_c_foreign_code(AllForeignCode) = ForeignCode :-
     % then output the functions: `mercury__<modulename>__required_init()' and
     % `mercury__<modulename>__required_final()' as necessary.
     %
-    %
 :- pred mlds_output_init_fn_decls(mlds_module_name::in, list(string)::in,
     list(string)::in, io::di, io::uo) is det.
 
@@ -717,10 +721,11 @@ mlds_output_init_fn_decls(ModuleName, InitPreds, FinalPreds, !IO) :-
 
 :- pred mlds_output_init_fn_defns(mlds_to_c_opts::in, mlds_module_name::in,
     list(mlds_defn)::in, list(mlds_defn)::in,
+    assoc_list(mlds_alloc_id, ml_alloc_site_data)::in,
     list(string)::in, list(string)::in, io::di, io::uo) is det.
 
 mlds_output_init_fn_defns(Opts, ModuleName, FuncDefns, TypeCtorInfoDefns,
-        InitPreds, FinalPreds, !IO) :-
+        AllocSites, InitPreds, FinalPreds, !IO) :-
     output_init_fn_name(ModuleName, "", !IO),
     io.write_string("\n{\n", !IO),
     NeedToInit = Opts ^ m2co_need_to_init,
@@ -731,7 +736,8 @@ mlds_output_init_fn_defns(Opts, ModuleName, FuncDefns, TypeCtorInfoDefns,
         io.write_strings(["\tstatic MR_bool initialised = MR_FALSE;\n",
             "\tif (initialised) return;\n",
             "\tinitialised = MR_TRUE;\n\n"], !IO),
-        mlds_output_calls_to_init_entry(ModuleName, FuncDefns, !IO)
+        mlds_output_calls_to_init_entry(ModuleName, FuncDefns, !IO),
+        mlds_output_call_to_register_alloc_sites(AllocSites, !IO)
     ;
         true
     ),
@@ -859,6 +865,22 @@ mlds_output_calls_to_register_tci(ModuleName,
     io.write_string(");\n", !IO),
     mlds_output_calls_to_register_tci(ModuleName, TypeCtorInfoDefns, !IO).
 
+    % Generate call to MR_register_alloc_sites.
+    %
+:- pred mlds_output_call_to_register_alloc_sites(
+    assoc_list(mlds_alloc_id, ml_alloc_site_data)::in, io::di, io::uo) is det.
+
+mlds_output_call_to_register_alloc_sites(AllocSites, !IO) :-
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        list.length(AllocSites, Length),
+        io.write_string("\tMR_register_alloc_sites(MR_alloc_sites, ", !IO),
+        io.write_int(Length, !IO),
+        io.write_string(");\n", !IO)
+    ).
+
 %-----------------------------------------------------------------------------%
 %
 % Foreign language interface stuff.
@@ -1664,6 +1686,59 @@ mlds_output_cell(Opts, Indent, Initializer, !RowNum, !IO) :-
     mlds_output_initializer_body(Opts, Indent, Initializer, !IO),
     io.write_string(",\n", !IO).
 
+:- pred mlds_output_alloc_site_decls(indent::in,
+    assoc_list(mlds_alloc_id, ml_alloc_site_data)::in, io::di, io::uo) is det.
+
+mlds_output_alloc_site_decls(Indent, AllocSites, !IO) :-
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        mlds_indent(Indent, !IO),
+        io.write_string("static MR_AllocSiteInfo MR_alloc_sites[];\n", !IO)
+    ).
+
+:- pred mlds_output_alloc_site_defns(mlds_to_c_opts::in, indent::in,
+    mlds_module_name::in, assoc_list(mlds_alloc_id, ml_alloc_site_data)::in,
+    io::di, io::uo) is det.
+
+mlds_output_alloc_site_defns(Opts, Indent, MLDS_ModuleName, AllocSites, !IO) :-
+    (
+        AllocSites = []
+    ;
+        AllocSites = [_ | _],
+        mlds_indent(Indent, !IO),
+        io.write_string("static MR_AllocSiteInfo MR_alloc_sites[] = {\n", !IO),
+        list.foldl(
+            mlds_output_alloc_site_defn(Opts, Indent + 1, MLDS_ModuleName),
+            AllocSites, !IO),
+        mlds_indent(Indent, !IO),
+        io.write_string("};\n", !IO)
+    ).
+
+:- pred mlds_output_alloc_site_defn(mlds_to_c_opts::in, indent::in,
+    mlds_module_name::in, pair(mlds_alloc_id, ml_alloc_site_data)::in,
+    io::di, io::uo) is det.
+
+mlds_output_alloc_site_defn(_Opts, Indent, MLDS_ModuleName,
+        _AllocId - AllocData, !IO) :-
+    AllocData = ml_alloc_site_data(ProcLabel, Context, Type, Size),
+    QualProcLabel = qual(MLDS_ModuleName, module_qual, ProcLabel),
+    term.context_file(Context, FileName),
+    term.context_line(Context, LineNumber),
+    mlds_indent(Indent, !IO),
+    io.write_string("{ ", !IO),
+    mlds_output_fully_qualified_name(QualProcLabel, !IO),
+    io.write_string(", """, !IO),
+    c_util.output_quoted_string(FileName, !IO),
+    io.write_string(""", ", !IO),
+    io.write_int(LineNumber, !IO),
+    io.write_string(", """, !IO),
+    c_util.output_quoted_string(Type, !IO),
+    io.write_string(""", ", !IO),
+    io.write_int(Size, !IO),
+    io.write_string("},\n", !IO).
+
 :- pred mlds_output_type_forward_decls(mlds_to_c_opts::in, indent::in,
     list(mlds_type)::in, io::di, io::uo) is det.
 
@@ -3513,47 +3588,6 @@ mlds_output_switch_default(Opts, Indent, FuncInfo, Context, Default, !IO) :-
 
 %-----------------------------------------------------------------------------%
 
-    % If memory profiling is turned on, output an instruction to
-    % record the heap allocation.
-    %
-:- pred mlds_maybe_output_heap_profile_instr(mlds_to_c_opts::in,
-    mlds_context::in, indent::in, list(mlds_rval)::in,
-    mlds_qualified_entity_name::in, maybe(ctor_name)::in, io::di, io::uo)
-    is det.
-
-mlds_maybe_output_heap_profile_instr(Opts, Context, Indent, Args, FuncName,
-        MaybeCtorName, !IO) :-
-    ProfileMemory = Opts ^ m2co_profile_calls,
-    (
-        ProfileMemory = yes,
-        output_context_opts(Opts, Context, !IO),
-        mlds_indent(Indent, !IO),
-        io.write_string("MR_record_allocation(", !IO),
-        io.write_int(list.length(Args), !IO),
-        io.write_string(", ", !IO),
-        mlds_output_fully_qualified_name(FuncName, !IO),
-        io.write_string(", """, !IO),
-        mlds_output_fully_qualified_name(FuncName, !IO),
-        io.write_string(""", ", !IO),
-        (
-            MaybeCtorName = yes(CtorId),
-            io.write_char('"', !IO),
-            CtorId = qual(_ModuleName, _QualKind, CtorDefn),
-            CtorDefn = ctor_id(CtorName, _CtorArity),
-            c_util.output_quoted_string(CtorName, !IO),
-            io.write_char('"', !IO)
-        ;
-            MaybeCtorName = no,
-            % Just use an empty string.  Note that we can't use a null pointer
-            % here, because MR_record_allocation() requires its string
-            % arguments to not be NULL.
-            io.write_string("\"\"", !IO)
-        ),
-        io.write_string(");\n", !IO)
-    ;
-        ProfileMemory = no
-    ).
-
     % If call profiling is turned on output an instruction to record
     % an arc in the call profile between the callee and caller.
     %
@@ -3607,7 +3641,7 @@ mlds_output_label_name(LabelName, !IO) :-
 :- pred mlds_output_atomic_stmt(mlds_to_c_opts::in, indent::in, func_info::in,
     mlds_atomic_statement::in, mlds_context::in, io::di, io::uo) is det.
 
-mlds_output_atomic_stmt(Opts, Indent, FuncInfo, Statement, Context, !IO) :-
+mlds_output_atomic_stmt(Opts, Indent, _FuncInfo, Statement, Context, !IO) :-
     (
         Statement = comment(Comment),
         % XXX We should escape any "*/"'s in the Comment. We should also split
@@ -3639,7 +3673,8 @@ mlds_output_atomic_stmt(Opts, Indent, FuncInfo, Statement, Context, !IO) :-
         io.write_string(");\n", !IO)
     ;
         Statement = new_object(Target, MaybeTag, _ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic),
+            MaybeSize, _MaybeCtorName, Args, ArgTypes, MayUseAtomic,
+            MaybeAllocId),
         mlds_indent(Indent, !IO),
         io.write_string("{\n", !IO),
 
@@ -3703,10 +3738,6 @@ mlds_output_atomic_stmt(Opts, Indent, FuncInfo, Statement, Context, !IO) :-
             )
         ),
 
-        FuncInfo = func_info(FuncName, _FuncSignature),
-        mlds_maybe_output_heap_profile_instr(Opts, Context, Indent + 1, Args,
-            FuncName, MaybeCtorName, !IO),
-
         output_context_opts(Opts, Context, !IO),
         mlds_indent(Indent + 1, !IO),
         write_lval_or_string(Opts, Base, !IO),
@@ -3748,18 +3779,8 @@ mlds_output_atomic_stmt(Opts, Indent, FuncInfo, Statement, Context, !IO) :-
             io.write_int(-1, !IO)
         ),
         io.write_string(", ", !IO),
-        (
-            MaybeCtorName = yes(QualifiedCtorId),
-            io.write_char('"', !IO),
-            QualifiedCtorId = qual(_ModuleName, _QualKind, CtorDefn),
-            CtorDefn = ctor_id(CtorName, _CtorArity),
-            c_util.output_quoted_string(CtorName, !IO),
-            io.write_char('"', !IO)
-        ;
-            MaybeCtorName = no,
-            io.write_string("NULL", !IO)
-        ),
-        io.write_string(")", !IO),
+        mlds_output_maybe_alloc_id(MaybeAllocId, !IO),
+        io.write_string(", NULL)", !IO),
         io.write_string(EndMkword, !IO),
         io.write_string(";\n", !IO),
         (
@@ -3817,6 +3838,18 @@ mlds_output_atomic_stmt(Opts, Indent, FuncInfo, Statement, Context, !IO) :-
         unexpected(this_file, "outline_foreign_proc is not used in C backend")
     ).
 
+:- pred mlds_output_maybe_alloc_id(maybe(mlds_alloc_id)::in, io::di, io::uo)
+    is det.
+
+mlds_output_maybe_alloc_id(MaybeAllocId, !IO) :-
+    (
+        MaybeAllocId = yes(mlds_alloc_id(Num)),
+        io.format("&MR_alloc_sites[%d]", [i(Num)], !IO)
+    ;
+        MaybeAllocId = no,
+        io.write_string("NULL", !IO)
+    ).
+
 :- pred mlds_output_target_code_component(mlds_to_c_opts::in, mlds_context::in,
     target_code_component::in, io::di, io::uo) is det.
 
@@ -3864,6 +3897,9 @@ mlds_output_target_code_component(Opts, Context, TargetCode, !IO) :-
         TargetCode = target_code_name(Name),
         mlds_output_fully_qualified_name(Name, !IO),
         io.write_string("\n", !IO)
+    ;
+        TargetCode = target_code_alloc_id(AllocId),
+        mlds_output_maybe_alloc_id(yes(AllocId), !IO)
     ).
 
 :- func type_needs_forwarding_pointer_space(mlds_type) = bool.
diff --git a/compiler/mlds_to_cs.m b/compiler/mlds_to_cs.m
index 686b8b0..a56ac26 100644
--- a/compiler/mlds_to_cs.m
+++ b/compiler/mlds_to_cs.m
@@ -136,7 +136,7 @@ output_csharp_src_file(ModuleInfo, Indent, MLDS, !IO) :-
     MLDS = mlds(ModuleName, AllForeignCode, Imports, GlobalData, Defns0,
         InitPreds, FinalPreds, ExportedEnums),
     ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, _AllocIdMap, GlobalDefns),
     Defns = GlobalDefns ++ Defns0,
 
     % Find all methods which would have their addresses taken to be used as a
@@ -537,7 +537,8 @@ method_ptrs_in_stmt(CallStmt, !CodeAddrs) :-
 method_ptrs_in_stmt(ml_stmt_atomic(AtomicStatement), !CodeAddrs) :-
     (
         AtomicStatement = new_object(Lval, _MaybeTag, _Bool,
-            _Type, _MemRval, _MaybeCtorName, Rvals, _Types, _MayUseAtomic)
+            _Type, _MemRval, _MaybeCtorName, Rvals, _Types, _MayUseAtomic,
+            _AllocId)
     ->
         % We don't need to check "_MemRval" since this just stores
         % the amount of memory needed for the new object.
@@ -3085,7 +3086,8 @@ output_atomic_stmt(Info, Indent, AtomicStmt, _Context, !IO) :-
         unexpected(this_file, "delete_object not supported in C#.")
     ;
         AtomicStmt = new_object(Target, _MaybeTag, ExplicitSecTag, Type,
-            _MaybeSize, MaybeCtorName, Args, ArgTypes, _MayUseAtomic),
+            _MaybeSize, MaybeCtorName, Args, ArgTypes, _MayUseAtomic,
+            _AllocId),
         (
             ExplicitSecTag = yes,
             unexpected(this_file, "output_atomic_stmt: explicit secondary tag")
@@ -3199,6 +3201,9 @@ output_target_code_component(Info, TargetCode, !IO) :-
     ;
         TargetCode = target_code_name(Name),
         output_maybe_qualified_name(Info, Name, !IO)
+    ;
+        TargetCode = target_code_alloc_id(_),
+        unexpected(this_file, "target_code_alloc_id not implemented")
     ).
 
 %-----------------------------------------------------------------------------%
diff --git a/compiler/mlds_to_gcc.m b/compiler/mlds_to_gcc.m
index e05292d..6fa0651 100644
--- a/compiler/mlds_to_gcc.m
+++ b/compiler/mlds_to_gcc.m
@@ -3014,7 +3014,8 @@ gen_atomic_stmt(DefnInfo, AtomicStmt, Context, !IO) :-
         sorry(this_file, "NYI delete_object")
     ;
         AtomicStmt = new_object(Target, MaybeTag, _ExplicitSecTag, Type,
-            MaybeSize, _MaybeCtorName, Args, ArgTypes, _MayUseAtomic),
+            MaybeSize, _MaybeCtorName, Args, ArgTypes, _MayUseAtomic,
+            _AllocId),
 
         % Calculate the size that we're going to allocate.
         (
diff --git a/compiler/mlds_to_il.m b/compiler/mlds_to_il.m
index f8a97fe..ef09f0c 100644
--- a/compiler/mlds_to_il.m
+++ b/compiler/mlds_to_il.m
@@ -258,7 +258,7 @@ generate_il(Globals, MLDS0, Version, ILAsm, ForeignLangs) :-
     MLDS = mlds(MercuryModuleName, ForeignCode, Imports, GlobalData, Defns0,
         _, _, _),
     ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, _AllocIdMap, GlobalDefns),
     expect(map.is_empty(ScalarCellGroupMap), this_file,
         "generate_il: nonempty ScalarCellGroupMap"),
     expect(map.is_empty(VectorCellGroupMap), this_file,
@@ -373,7 +373,7 @@ il_transform_mlds(MLDS0, MLDS) :-
 
     % We take all the definitions out of the global data field of the MLDS.
     ml_global_data_get_all_global_defns(GlobalData0,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, _AllocIdMap, GlobalDefns),
     expect(map.is_empty(ScalarCellGroupMap), this_file,
         "il_transform_mlds: nonempty ScalarCellGroupMap"),
     expect(map.is_empty(VectorCellGroupMap), this_file,
@@ -547,9 +547,9 @@ rename_atomic(assign(L, R)) = assign(rename_lval(L), rename_rval(R)).
 rename_atomic(assign_if_in_heap(L, R)) = assign(rename_lval(L), rename_rval(R)).
 rename_atomic(delete_object(O)) = delete_object(rename_rval(O)).
 rename_atomic(new_object(L, Tag, ExplicitSecTag, Type, MaybeSize, Ctxt, Args,
-        Types, MayUseAtomic))
+        Types, MayUseAtomic, AllocId))
     = new_object(rename_lval(L), Tag, ExplicitSecTag, Type, MaybeSize,
-        Ctxt, list.map(rename_rval, Args), Types, MayUseAtomic).
+        Ctxt, list.map(rename_rval, Args), Types, MayUseAtomic, AllocId).
 rename_atomic(gc_check) = gc_check.
 rename_atomic(mark_hp(L)) = mark_hp(rename_lval(L)).
 rename_atomic(restore_hp(R)) = restore_hp(rename_rval(R)).
@@ -2039,8 +2039,9 @@ atomic_statement_to_il(delete_object(_Target), Instrs, !Info) :-
     % Instrs = LoadInstrs ++ singleton(ldnull) ++ StoreInstrs.
     Instrs = empty.
 
-atomic_statement_to_il(new_object(Target0, _MaybeTag, ExplicitSecTag, Type,
-        Size, MaybeCtorName, Args, ArgTypes, _MayUseAtomic), Instrs, !Info) :-
+atomic_statement_to_il(NewObject, Instrs, !Info) :-
+    NewObject = new_object(Target0, _MaybeTag, ExplicitSecTag, Type,
+        Size, MaybeCtorName, Args, ArgTypes, _MayUseAtomic, _AllocId),
     (
         ExplicitSecTag = yes,
         unexpected(this_file, "new_object has explicit secondary tag")
@@ -2226,6 +2227,9 @@ inline_code_to_il_asm([T | Ts]) = Instrs ++ Rest :-
     ;
         T = target_code_name(_),
         Instrs = empty
+    ;
+        T = target_code_alloc_id(_),
+        unexpected(this_file, "target_code_alloc_id not implemented")
     ),
     Rest = inline_code_to_il_asm(Ts).
 
diff --git a/compiler/mlds_to_java.m b/compiler/mlds_to_java.m
index cf9db1c..7972d7b 100644
--- a/compiler/mlds_to_java.m
+++ b/compiler/mlds_to_java.m
@@ -342,7 +342,7 @@ output_java_src_file(ModuleInfo, Indent, MLDS, !IO) :-
     MLDS = mlds(ModuleName, AllForeignCode, Imports, GlobalData, Defns0,
         InitPreds, FinalPreds, ExportedEnums),
     ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, _AllocIdMap, GlobalDefns),
 
     % Do NOT enforce the outermost "mercury" qualifier here.  This module
     % name is compared with other module names in the MLDS, to avoid
@@ -844,7 +844,8 @@ method_ptrs_in_stmt(CallStmt, !CodeAddrs) :-
 method_ptrs_in_stmt(ml_stmt_atomic(AtomicStatement), !CodeAddrs) :-
     (
         AtomicStatement = new_object(Lval, _MaybeTag, _Bool,
-            _Type, _MemRval, _MaybeCtorName, Rvals, _Types, _MayUseAtomic)
+            _Type, _MemRval, _MaybeCtorName, Rvals, _Types, _MayUseAtomic,
+            _AllocId)
     ->
         % We don't need to check "_MemRval" since this just stores
         % the amount of memory needed for the new object.
@@ -1672,13 +1673,13 @@ rename_class_names_atomic(Renaming, !Statement) :-
         !:Statement = delete_object(Rval)
     ;
         !.Statement = new_object(TargetLval0, MaybeTag, ExplicitSecTag, Type0,
-            MaybeSize, MaybeCtorName, Args0, ArgTypes0, MayUseAtomic),
+            MaybeSize, MaybeCtorName, Args0, ArgTypes0, MayUseAtomic, AllocId),
         rename_class_names_lval(Renaming, TargetLval0, TargetLval),
         rename_class_names_type(Renaming, Type0, Type),
         list.map(rename_class_names_rval(Renaming), Args0, Args),
         list.map(rename_class_names_type(Renaming), ArgTypes0, ArgTypes),
         !:Statement = new_object(TargetLval, MaybeTag, ExplicitSecTag, Type,
-            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic)
+            MaybeSize, MaybeCtorName, Args, ArgTypes, MayUseAtomic, AllocId)
     ;
         !.Statement = inline_target_code(Lang, Components0),
         (
@@ -1840,6 +1841,8 @@ rename_class_names_target_code_component(Renaming, !Component) :-
     ;
         !.Component = raw_target_code(_, _)
     ;
+        !.Component = target_code_alloc_id(_)
+    ;
         !.Component = target_code_input(Rval0),
         rename_class_names_rval(Renaming, Rval0, Rval),
         !:Component = target_code_input(Rval)
@@ -4460,7 +4463,8 @@ output_atomic_stmt(Info, Indent, AtomicStmt, Context, !IO) :-
         unexpected(this_file, "delete_object not supported in Java.")
     ;
         AtomicStmt = new_object(Target, _MaybeTag, ExplicitSecTag, Type,
-            _MaybeSize, MaybeCtorName, Args, ArgTypes, _MayUseAtomic),
+            _MaybeSize, MaybeCtorName, Args, ArgTypes, _MayUseAtomic,
+            _AllocId),
         (
             ExplicitSecTag = yes,
             unexpected(this_file, "output_atomic_stmt: explicit secondary tag")
@@ -4570,6 +4574,9 @@ output_target_code_component(Info, TargetCode, !IO) :-
     ;
         TargetCode = target_code_name(Name),
         output_maybe_qualified_name(Info, Name, !IO)
+    ;
+        TargetCode = target_code_alloc_id(_),
+        unexpected(this_file, "target_code_alloc_id not implemented")
     ).
 
 %-----------------------------------------------------------------------------%
diff --git a/compiler/mlds_to_managed.m b/compiler/mlds_to_managed.m
index c3c02a3..74ba714 100644
--- a/compiler/mlds_to_managed.m
+++ b/compiler/mlds_to_managed.m
@@ -68,7 +68,7 @@ output_csharp_code(Globals, MLDS, !IO) :-
     MLDS = mlds(ModuleName, AllForeignCode, _Imports, GlobalData, Defns0,
         _InitPreds, _FinalPreds, _ExportedEnums),
     ml_global_data_get_all_global_defns(GlobalData,
-        ScalarCellGroupMap, VectorCellGroupMap, GlobalDefns),
+        ScalarCellGroupMap, VectorCellGroupMap, _AllocIdMap, GlobalDefns),
     expect(map.is_empty(ScalarCellGroupMap), this_file,
         "output_csharp_code: nonempty ScalarCellGroupMap"),
     expect(map.is_empty(VectorCellGroupMap), this_file,
diff --git a/compiler/opt_debug.m b/compiler/opt_debug.m
index ae20e9b..2117eb3 100644
--- a/compiler/opt_debug.m
+++ b/compiler/opt_debug.m
@@ -569,6 +569,9 @@ dump_layout_array_name(ArrayName) = Str :-
     ;
         ArrayName = threadscope_string_table_array,
         Str = "threadscope_string_table_array"
+    ;
+        ArrayName = alloc_site_array,
+        Str = "alloc_site_array"
     ).
 
 dump_layout_name(proc_layout(RttiProcLabel, _)) =
@@ -1170,6 +1173,7 @@ dump_component(_, foreign_proc_raw_code(_, AL, _, Code)) =
     ).
 dump_component(MaybeProcLabel, foreign_proc_fail_to(Label)) =
     "fail to " ++ dump_label(MaybeProcLabel, Label) ++ "\n".
+dump_component(_, foreign_proc_alloc_id(_)) = "<alloc_id>".
 dump_component(_, foreign_proc_noop) = "".
 
 :- func dump_affects_liveness(proc_affects_liveness) = string.
diff --git a/compiler/opt_util.m b/compiler/opt_util.m
index a86d8f7..1f2121f 100644
--- a/compiler/opt_util.m
+++ b/compiler/opt_util.m
@@ -985,6 +985,7 @@ foreign_proc_component_refers_stackvars(Component) = Refers :-
         ( Component = foreign_proc_user_code(_, _, _)
         ; Component = foreign_proc_raw_code(_, _, _, _)
         ; Component = foreign_proc_fail_to(_)
+        ; Component = foreign_proc_alloc_id(_)
         ; Component = foreign_proc_noop
         ),
         Refers = no
@@ -1153,6 +1154,7 @@ can_component_branch_away(foreign_proc_raw_code(CanBranchAway, _, _, _))
     ).
 can_component_branch_away(foreign_proc_user_code(_, _, _)) = no.
 can_component_branch_away(foreign_proc_fail_to(_)) = yes.
+can_component_branch_away(foreign_proc_alloc_id(_)) = no.
 can_component_branch_away(foreign_proc_noop) = no.
 
 can_instr_fall_through(comment(_)) = yes.
@@ -1579,6 +1581,8 @@ foreign_proc_component_get_rvals_and_lvals(foreign_proc_raw_code(_, _, _, _),
         !Rvals, !Lvals).
 foreign_proc_component_get_rvals_and_lvals(foreign_proc_fail_to(_),
         !Rvals, !Lvals).
+foreign_proc_component_get_rvals_and_lvals(foreign_proc_alloc_id(_),
+        !Rvals, !Lvals).
 foreign_proc_component_get_rvals_and_lvals(foreign_proc_noop,
         !Rvals, !Lvals).
 
@@ -1754,6 +1758,8 @@ count_temps_component(Comp, !R, !F) :-
     ;
         Comp = foreign_proc_fail_to(_)
     ;
+        Comp = foreign_proc_alloc_id(_)
+    ;
         Comp = foreign_proc_noop
     ).
 
@@ -2088,6 +2094,7 @@ touches_nondet_ctrl_component(foreign_proc_outputs(_)) = no.
 touches_nondet_ctrl_component(foreign_proc_raw_code(_, _, _, _)) = no.
 touches_nondet_ctrl_component(foreign_proc_user_code(_, _, _)) = yes.
 touches_nondet_ctrl_component(foreign_proc_fail_to(_)) = no.
+touches_nondet_ctrl_component(foreign_proc_alloc_id(_)) = no.
 touches_nondet_ctrl_component(foreign_proc_noop) = no.
 
 %-----------------------------------------------------------------------------%
@@ -2570,6 +2577,7 @@ replace_labels_comp(Comp0, Comp, ReplMap) :-
         ; Comp0 = foreign_proc_outputs(_)
         ; Comp0 = foreign_proc_user_code(_, _, _)
         ; Comp0 = foreign_proc_raw_code(_, _, _, _)
+        ; Comp0 = foreign_proc_alloc_id(_)
         ; Comp0 = foreign_proc_noop
         ),
         Comp = Comp0
diff --git a/compiler/pickle.m b/compiler/pickle.m
index 9d1402a..4b3aa3d 100644
--- a/compiler/pickle.m
+++ b/compiler/pickle.m
@@ -436,7 +436,7 @@ unpickle_string_2(Handle, Index, Length, !String, !State) :-
     allocate_string(Length::in, Str::uo),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
-    MR_allocate_aligned_string_msg(Str, Length, ""pickle.allocate_string"");
+    MR_allocate_aligned_string_msg(Str, Length, MR_ALLOC_ID);
     Str[Length] = '\\0';
 ").
 
diff --git a/compiler/pragma_c_gen.m b/compiler/pragma_c_gen.m
index f642643..f6705c8 100644
--- a/compiler/pragma_c_gen.m
+++ b/compiler/pragma_c_gen.m
@@ -67,6 +67,7 @@
 :- import_module hlds.hlds_pred.
 :- import_module hlds.instmap.
 :- import_module libs.globals.
+:- import_module libs.options.
 :- import_module ll_backend.code_util.
 :- import_module ll_backend.llds_out.
 :- import_module ll_backend.llds_out.llds_out_code_addr.
@@ -92,7 +93,8 @@
 %   <save live variables onto the stack> /* see note (1) below */
 %   {
 %       <declaration of one local variable for each arg>
-%       #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%       #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%       #define MR_PROC_LABEL <procedure label>
 %
 %       <assignment of input values from registers to local variables>
 %       MR_save_registers(); /* see notes (1) and (2) below */
@@ -103,7 +105,8 @@
 %       #endif
 %       <assignment of the output values from local variables to registers>
 %
-%       #undef MR_PROC_LABEL /* see note (5) below */
+%       #undef MR_ALLOC_ID /* see note (5) below */
+%       #undef MR_PROC_LABEL
 %   }
 %
 % In the case of a semidet foreign_proc, the above is followed by
@@ -142,7 +145,8 @@
 %   <assignment of input values from registers to local variables>
 %   <assignment to save struct pointer>
 %   MR_save_registers(); /* see notes (1) and (2) below */
-%   #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%   #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%   #define MR_PROC_LABEL <procedure label>
 %   #define SUCCEED()   goto callsuccesslabel
 %   #define SUCCEED_LAST()  goto calllastsuccesslabel
 %   #define FAIL()      fail()
@@ -159,7 +163,8 @@
 %   #undef SUCCEED
 %   #undef SUCCEED_LAST
 %   #undef FAIL
-%   #undef MR_PROC_LABEL /* see note (5) below */
+%   #undef MR_ALLOC_ID /* see note (5) below */
+%   #undef MR_PROC_LABEL
 % }
 % MR_define_label(xxx_i1)
 % <code for entry to a later disjunct>
@@ -168,7 +173,8 @@
 %   <declaration of one local variable to point to save struct>
 %   <assignment to save struct pointer>
 %   MR_save_registers(); /* see notes (1) and (2) below */
-%   #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%   #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%   #define MR_PROC_LABEL <procedure label>
 %   #define SUCCEED()   goto retrysuccesslabel
 %   #define SUCCEED_LAST()  goto retrylastsuccesslabel
 %   #define FAIL()      fail()
@@ -185,7 +191,8 @@
 %   #undef SUCCEED
 %   #undef SUCCEED_LAST
 %   #undef FAIL
-%   #undef MR_PROC_LABEL /* see note (5) below */
+%   #undef MR_ALLOC_ID /* see note (5) below */
+%   #undef MR_PROC_LABEL
 % }
 % <--- boundary between code generated here and epilog --->
 % <#undef MR_ORDINARY_SLOTS>
@@ -203,7 +210,8 @@
 %   <assignment of input values from registers to local variables>
 %   <assignment to save struct pointer>
 %   MR_save_registers(); /* see notes (1) and (2) below */
-%   #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%   #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%   #define MR_PROC_LABEL <procedure label>
 %   #define SUCCEED()   goto callsuccesslabel
 %   #define SUCCEED_LAST()  goto calllastsuccesslabel
 %   #define FAIL()      fail()
@@ -220,7 +228,8 @@
 %   #undef SUCCEED
 %   #undef SUCCEED_LAST
 %   #undef FAIL
-%   #undef MR_PROC_LABEL /* see note (5) below */
+%   #undef MR_ALLOC_ID /* see note (5) below */
+%   #undef MR_PROC_LABEL
 % }
 % MR_define_label(xxx_i1)
 % <code for entry to a later disjunct>
@@ -229,7 +238,8 @@
 %   <declaration of one local variable to point to save struct>
 %   <assignment to save struct pointer>
 %   MR_save_registers(); /* see notes (1) and (2) below */
-%   #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%   #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%   #define MR_PROC_LABEL <procedure label>
 %   #define SUCCEED()   goto retrysuccesslabel
 %   #define SUCCEED_LAST()  goto retrylastsuccesslabel
 %   #define FAIL()      fail()
@@ -246,14 +256,16 @@
 %   #undef SUCCEED
 %   #undef SUCCEED_LAST
 %   #undef FAIL
-%   #undef MR_PROC_LABEL /* see note (5) below */
+%   #undef MR_ALLOC_ID /* see note (5) below */
+%   #undef MR_PROC_LABEL
 % }
 % MR_define_label(xxx_i2)
 % {
 %   <declaration of one local variable for each output arg>
 %   <declaration of one local variable to point to save struct>
 %   <assignment to save struct pointer>
-%   #define MR_PROC_LABEL <procedure label> /* see note (5) below */
+%   #define MR_ALLOC_ID <allocation id> /* see note (5) below */
+%   #define MR_PROC_LABEL <procedure label>
 %   #define SUCCEED()   goto sharedsuccesslabel
 %   #define SUCCEED_LAST()  goto sharedlastsuccesslabel
 %   #define FAIL()      fail()
@@ -269,7 +281,8 @@
 %   #undef SUCCEED
 %   #undef SUCCEED_LAST
 %   #undef FAIL
-%   #undef MR_PROC_LABEL /* see note (5) below */
+%   #undef MR_ALLOC_ID /* see note (5) below */
+%   #undef MR_PROC_LABEL
 % }
 % <--- boundary between code generated here and epilog --->
 % <#undef MR_ORDINARY_SLOTS>
@@ -337,12 +350,10 @@
 %   these macros can be invoked from other macros, and thus we do not have
 %   a sure test of whether the code fragments invoke the macros.
 %
-% 5 We insert a #define for MR_PROC_LABEL, so that the C code in the Mercury
-%   standard library that allocates memory manually can use MR_PROC_LABEL as
-%   the procname argument to incr_hp_msg(), for memory profiling. Hard-coding
-%   the procname argument in the C code would be wrong, since it wouldn't
-%   handle the case where the original foreign_proc gets inlined and optimized
-%   away. Of course we also need to #undef it afterwards.
+% 5 We insert a #define for MR_ALLOC_ID so that the C code that allocates
+%   memory manually can use MR_ALLOC_ID as an argument to incr_hp_msg(), for
+%   memory profiling. It replaces an older macro MR_PROC_LABEL, which is
+%   retained only for backwards compatibility.
 
 %---------------------------------------------------------------------------%
 
@@ -489,8 +500,12 @@ generate_ordinary_foreign_proc_code(CodeModel, Attributes, PredId, ProcId,
     % Generate <declaration of one local variable for each arg>.
     make_foreign_proc_decls(CArgs, ModuleInfo, CanOptAwayUnnamedArgs, Decls),
 
-    % Generate #define MR_PROC_LABEL <procedure label> /* see note (5) */
-    % and #undef MR_PROC_LABEL.
+    % Generate #define MR_ALLOC_ID and #undef MR_ALLOC_ID /* see note (5) */
+    make_alloc_id_hash_define(C_Code, Context, AllocIdHashDefine,
+        AllocIdHashUndef, !CI),
+
+    % Generate #define MR_PROC_LABEL and #undef MR_PROC_LABEL
+    % for backwards compatibility with older hand-written code.
     get_pred_id(!.CI, CallerPredId),
     get_proc_id(!.CI, CallerProcId),
     make_proc_label_hash_define(ModuleInfo, CallerPredId, CallerProcId,
@@ -606,10 +621,12 @@ generate_ordinary_foreign_proc_code(CodeModel, Attributes, PredId, ProcId,
     OutputComp = foreign_proc_outputs(OutputDescs),
 
     % Join all the components of the foreign_proc_code together.
-    Components = [ProcLabelHashDefine, DefSuccessComp, InputComp,
+    Components = [ProcLabelHashDefine | AllocIdHashDefine] ++
+        [DefSuccessComp, InputComp,
         SaveRegsComp, ObtainLock, C_Code_Comp, ReleaseLock,
         CheckSuccess_Comp, RestoreRegsComp,
-        OutputComp, UndefSuccessComp, ProcLabelHashUndef],
+        OutputComp, UndefSuccessComp,
+        ProcLabelHashUndef | AllocIdHashUndef],
     MaybeMayDupl = get_may_duplicate(Attributes),
     (
         MaybeMayDupl = yes(MayDupl)
@@ -679,12 +696,10 @@ generate_ordinary_foreign_proc_code(CodeModel, Attributes, PredId, ProcId,
 
 make_proc_label_hash_define(ModuleInfo, PredId, ProcId,
         ProcLabelHashDef, ProcLabelHashUndef) :-
-    ProcLabelHashDef = foreign_proc_raw_code(cannot_branch_away,
-        proc_does_not_affect_liveness, live_lvals_info(set.init),
-        "#define\tMR_PROC_LABEL\t" ++
-            make_proc_label_string(ModuleInfo, PredId, ProcId) ++ "\n"),
-    ProcLabelHashUndef = foreign_proc_raw_code(cannot_branch_away,
-        proc_does_not_affect_liveness, live_lvals_info(set.init),
+    ProcLabelStr = make_proc_label_string(ModuleInfo, PredId, ProcId),
+    ProcLabelHashDef = simple_foreign_proc_raw_code(
+        "#define\tMR_PROC_LABEL\t" ++ ProcLabelStr ++ "\n"),
+    ProcLabelHashUndef = simple_foreign_proc_raw_code(
         "#undef\tMR_PROC_LABEL\n").
 
 :- func make_proc_label_string(module_info, pred_id, proc_id) = string.
@@ -699,6 +714,44 @@ make_proc_label_string(ModuleInfo, PredId, ProcId) = ProcLabelString :-
         unexpected(this_file, "code_addr in make_proc_label_hash_define")
     ).
 
+:- pred make_alloc_id_hash_define(string::in, maybe(prog_context)::in,
+    list(foreign_proc_component)::out, list(foreign_proc_component)::out,
+    code_info::in, code_info::out) is det.
+
+make_alloc_id_hash_define(C_Code, MaybeContext,
+        AllocIdHashDefine, AllocIdHashUndef, !CI) :-
+    code_info.get_globals(!.CI, Globals),
+    globals.lookup_bool_option(Globals, profile_memory, ProfileMemory),
+    (
+        ProfileMemory = yes,
+        string.sub_string_search(C_Code, "MR_ALLOC_ID", _)
+    ->
+        (
+            MaybeContext = yes(Context)
+        ;
+            MaybeContext = no,
+            Context = term.context_init
+        ),
+        add_alloc_site_info(Context, "unknown", 0, AllocId, !CI),
+        AllocIdHashDefine = [
+            simple_foreign_proc_raw_code("#define\tMR_ALLOC_ID\t"),
+            foreign_proc_alloc_id(AllocId),
+            simple_foreign_proc_raw_code("\n")
+        ],
+        AllocIdHashUndef = [
+            simple_foreign_proc_raw_code("#undef\tMR_ALLOC_ID\n")
+        ]
+    ;
+        AllocIdHashDefine = [],
+        AllocIdHashUndef = []
+    ).
+
+:- func simple_foreign_proc_raw_code(string) = foreign_proc_component.
+
+simple_foreign_proc_raw_code(Code) =
+    foreign_proc_raw_code(cannot_branch_away, proc_does_not_affect_liveness,
+        live_lvals_info(set.init), Code).
+
 %-----------------------------------------------------------------------------%
 
 :- type c_arg
diff --git a/compiler/proc_gen.m b/compiler/proc_gen.m
index d8e8f75..19144b2 100644
--- a/compiler/proc_gen.m
+++ b/compiler/proc_gen.m
@@ -491,6 +491,9 @@ generate_proc_code(PredInfo, ProcInfo0, PredId, ProcId, ModuleInfo0,
     global_data_add_new_closure_layouts(ClosureLayouts, !GlobalData),
     ProcLabel = make_proc_label(ModuleInfo, PredId, ProcId),
 
+    get_alloc_sites(CodeInfo, AllocSites),
+    global_data_add_new_alloc_sites(AllocSites, !GlobalData),
+
     Name = pred_info_name(PredInfo),
     Arity = pred_info_orig_arity(PredInfo),
 
diff --git a/compiler/prog_event.m b/compiler/prog_event.m
index 22d06d5..822f481 100644
--- a/compiler/prog_event.m
+++ b/compiler/prog_event.m
@@ -154,12 +154,13 @@ read_event_set(SpecsFileName, EventSetName, EventSpecMap, ErrorSpecs, !IO) :-
 #include ""mercury_event_spec.h""
 #include <stdio.h>
 
-MR_String   read_specs_file_2(MR_Code *proc_label, MR_String specs_file_name,
-    MR_String term_file_name);
-MR_String   read_specs_file_3(MR_Code *proc_label, MR_String specs_file_name,
-    MR_String term_file_name, int spec_fd);
-MR_String   read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
-    MR_String term_file_name, int spec_fd, size_t size, char *spec_buf);
+MR_String   read_specs_file_2(MR_AllocSiteInfoPtr alloc_id,
+    MR_String specs_file_name, MR_String term_file_name);
+MR_String   read_specs_file_3(MR_AllocSiteInfoPtr alloc_id,
+    MR_String specs_file_name, MR_String term_file_name, int spec_fd);
+MR_String   read_specs_file_4(MR_AllocSiteInfoPtr alloc_id,
+    MR_String specs_file_name, MR_String term_file_name, int spec_fd,
+    size_t size, char *spec_buf);
 ").
 
 :- pragma foreign_proc("C",
@@ -172,7 +173,7 @@ MR_String   read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
     ** value on Mercury's heap if necessary.
     */
     MR_save_transient_hp();
-    Problem = read_specs_file_2(MR_PROC_LABEL, SpecsFileName, TermFileName);
+    Problem = read_specs_file_2(MR_ALLOC_ID, SpecsFileName, TermFileName);
     MR_restore_transient_hp();
 ").
 
@@ -182,7 +183,7 @@ read_specs_file(_, _, _, _, _) :-
 :- pragma foreign_code("C", "
 
 MR_String
-read_specs_file_2(MR_Code *proc_label, MR_String specs_file_name,
+read_specs_file_2(MR_AllocSiteInfoPtr alloc_id, MR_String specs_file_name,
     MR_String term_file_name)
 {
     int         spec_fd;
@@ -196,10 +197,10 @@ read_specs_file_2(MR_Code *proc_label, MR_String specs_file_name,
 
     spec_fd = open(specs_file_name, O_RDONLY);
     if (spec_fd < 0) {
-        problem = MR_make_string(proc_label, ""could not open %s: %s"",
+        problem = MR_make_string(alloc_id, ""could not open %s: %s"",
             specs_file_name, strerror(errno));
     } else {
-        problem = read_specs_file_3(proc_label, specs_file_name,
+        problem = read_specs_file_3(alloc_id, specs_file_name,
             term_file_name, spec_fd);
         (void) close(spec_fd);
     }
@@ -207,25 +208,25 @@ read_specs_file_2(MR_Code *proc_label, MR_String specs_file_name,
 }
 
 MR_String
-read_specs_file_3(MR_Code *proc_label, MR_String specs_file_name,
+read_specs_file_3(MR_AllocSiteInfoPtr alloc_id, MR_String specs_file_name,
     MR_String term_file_name, int spec_fd)
 {
     struct stat stat_buf;
     MR_String   problem;
 
     if (fstat(spec_fd, &stat_buf) != 0) {
-        problem = MR_make_string(proc_label, ""could not stat %s"",
+        problem = MR_make_string(alloc_id, ""could not stat %s"",
             specs_file_name);
     } else {
         char        *spec_buf;
 
         spec_buf = malloc(stat_buf.st_size + 1);
         if (spec_buf == NULL) {
-            problem = MR_make_string(proc_label,
+            problem = MR_make_string(alloc_id,
                 ""could not allocate memory for a copy of %s"",
                 specs_file_name);
         } else {
-            problem = read_specs_file_4(proc_label, specs_file_name,
+            problem = read_specs_file_4(alloc_id, specs_file_name,
                 term_file_name, spec_fd, stat_buf.st_size, spec_buf);
             free(spec_buf);
         }
@@ -234,7 +235,7 @@ read_specs_file_3(MR_Code *proc_label, MR_String specs_file_name,
 }
 
 MR_String
-read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
+read_specs_file_4(MR_AllocSiteInfoPtr alloc_id, MR_String specs_file_name,
     MR_String term_file_name, int spec_fd, size_t size, char *spec_buf)
 {
     size_t      num_bytes_read;
@@ -245,7 +246,7 @@ read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
         num_bytes_read = read(spec_fd, spec_buf, size);
     } while (num_bytes_read == -1 && MR_is_eintr(errno));
     if (num_bytes_read != size) {
-        problem = MR_make_string(proc_label, ""could not read in %s"",
+        problem = MR_make_string(alloc_id, ""could not read in %s"",
             specs_file_name);
     } else {
         MR_EventSet event_set;
@@ -254,14 +255,14 @@ read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
         spec_buf[num_bytes_read] = '\\0';
         event_set = MR_read_event_set(specs_file_name, spec_buf);
         if (event_set == NULL) {
-            problem = MR_make_string(proc_label, ""could not parse %s"",
+            problem = MR_make_string(alloc_id, ""could not parse %s"",
                 specs_file_name);
         } else {
             FILE *term_fp;
 
             term_fp = fopen(term_file_name, ""w"");
             if (term_fp == NULL) {
-                problem = MR_make_string(proc_label, ""could not open %s: %s"",
+                problem = MR_make_string(alloc_id, ""could not open %s: %s"",
                     term_file_name, strerror(errno));
             } else {
                 MR_print_event_set(term_fp, event_set);
@@ -270,7 +271,7 @@ read_specs_file_4(MR_Code *proc_label, MR_String specs_file_name,
                 /*
                 ** Our caller tests Problem against the empty string, not NULL.
                 */
-                problem = MR_make_string(proc_label, """");
+                problem = MR_make_string(alloc_id, """");
             }
         }
     }
diff --git a/compiler/timestamp.m b/compiler/timestamp.m
index 81be210..dbfec23 100644
--- a/compiler/timestamp.m
+++ b/compiler/timestamp.m
@@ -105,7 +105,7 @@ gmtime_to_timestamp(tm(Year, Month, MD, Hrs, Min, Sec, YD, WD, DST)) =
     t.tm_isdst = N;
 
     size = sizeof ""yyyy-mm-dd hh:mm:ss"";
-    MR_allocate_aligned_string_msg(Result, size - 1, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Result, size - 1, MR_ALLOC_ID);
 
     strftime(Result, size, ""%Y-%m-%d %H:%M:%S"", &t);
 }").
diff --git a/compiler/unify_gen.m b/compiler/unify_gen.m
index 7f07f7a..f97a7c4 100644
--- a/compiler/unify_gen.m
+++ b/compiler/unify_gen.m
@@ -512,8 +512,9 @@ generate_construction_2(ConsTag, Var, Args, Modes, HowToConstruct,
         var_types(!.CI, Args, ArgTypes),
         generate_cons_args(Args, ArgTypes, Modes, 0, 1, TakeAddr, !.CI,
             MaybeRvals, FieldAddrs, MayUseAtomic),
+        Context = goal_info_get_context(GoalInfo),
         construct_cell(Var, Ptag, MaybeRvals, HowToConstruct,
-            MaybeSize, FieldAddrs, MayUseAtomic, Code, !CI)
+            MaybeSize, FieldAddrs, Context, MayUseAtomic, Code, !CI)
     ;
         ConsTag = shared_remote_tag(Ptag, Sectag),
         var_types(!.CI, Args, ArgTypes),
@@ -521,8 +522,9 @@ generate_construction_2(ConsTag, Var, Args, Modes, HowToConstruct,
             MaybeRvals0, FieldAddrs, MayUseAtomic),
         % The first field holds the secondary tag.
         MaybeRvals = [yes(const(llconst_int(Sectag))) | MaybeRvals0],
+        Context = goal_info_get_context(GoalInfo),
         construct_cell(Var, Ptag, MaybeRvals, HowToConstruct,
-            MaybeSize, FieldAddrs, MayUseAtomic, Code, !CI)
+            MaybeSize, FieldAddrs, Context, MayUseAtomic, Code, !CI)
     ;
         ConsTag = shared_local_tag(Ptag, Sectag),
         assign_const_to_var(Var,
@@ -687,6 +689,9 @@ generate_closure(PredId, ProcId, EvalMethod, Var, Args, GoalInfo, Code, !CI) :-
             NumNewArgsPlusThree = NumNewArgs + 3,
             NumNewArgsPlusThree_Rval = const(llconst_int(NumNewArgsPlusThree)),
             produce_variable(CallPred, OldClosureCode, OldClosure, !CI),
+            Context = goal_info_get_context(GoalInfo),
+            maybe_add_alloc_site_info(Context, "closure", NumNewArgsPlusThree,
+                MaybeAllocId, !CI),
             % The new closure contains a pointer to the old closure.
             NewClosureMayUseAtomic = may_not_use_atomic_alloc,
             NewClosureCode = from_list([
@@ -696,7 +701,7 @@ generate_closure(PredId, ProcId, EvalMethod, Var, Args, GoalInfo, Code, !CI) :-
                     "get number of arguments"),
                 llds_instr(incr_hp(NewClosure, no, no,
                     binop(int_add, lval(NumOldArgs), NumNewArgsPlusThree_Rval),
-                    "closure", NewClosureMayUseAtomic, no, no_llds_reuse),
+                    MaybeAllocId, NewClosureMayUseAtomic, no, no_llds_reuse),
                     "allocate new closure"),
                 llds_instr(assign(field(yes(0), lval(NewClosure), Zero),
                     lval(field(yes(0), OldClosure, Zero))),
@@ -785,8 +790,10 @@ generate_closure(PredId, ProcId, EvalMethod, Var, Args, GoalInfo, Code, !CI) :-
         ],
         % XXX construct_dynamically is just a dummy value. We just want
         % something which is not construct_in_region(_).
+        maybe_add_alloc_site_info(Context, "closure", length(Vector),
+            MaybeAllocId, !CI),
         assign_cell_to_var(Var, no, 0, Vector, construct_dynamically, no, [],
-            "closure", MayUseAtomic, Code, !CI)
+            MaybeAllocId, MayUseAtomic, Code, !CI)
     ).
 
 :- pred generate_extra_closure_args(list(prog_var)::in, lval::in,
@@ -935,11 +942,11 @@ initial_may_use_atomic(ModuleInfo) = InitMayUseAtomic :-
 
 :- pred construct_cell(prog_var::in, tag::in, list(maybe(rval))::in,
     how_to_construct::in, maybe(term_size_value)::in,
-    assoc_list(int, prog_var)::in, may_use_atomic_alloc::in,
+    assoc_list(int, prog_var)::in, prog_context::in, may_use_atomic_alloc::in,
     llds_code::out, code_info::in, code_info::out) is det.
 
 construct_cell(Var, Ptag, MaybeRvals, HowToConstruct, MaybeSize, FieldAddrs,
-        MayUseAtomic, Code, !CI) :-
+        Context, MayUseAtomic, Code, !CI) :-
     VarType = variable_type(!.CI, Var),
     var_type_msg(VarType, VarTypeMsg),
     % If we're doing accurate GC, then for types which hold RTTI that
@@ -959,8 +966,10 @@ construct_cell(Var, Ptag, MaybeRvals, HowToConstruct, MaybeSize, FieldAddrs,
         ReserveWordAtStart = no
     ),
     FieldNums = list.map(fst, FieldAddrs),
+    Size = list.length(MaybeRvals),
+    maybe_add_alloc_site_info(Context, VarTypeMsg, Size, MaybeAllocId, !CI),
     assign_cell_to_var(Var, ReserveWordAtStart, Ptag, MaybeRvals,
-        HowToConstruct, MaybeSize, FieldNums, VarTypeMsg, MayUseAtomic,
+        HowToConstruct, MaybeSize, FieldNums, MaybeAllocId, MayUseAtomic,
         CellCode, !CI),
     (
         FieldAddrs = [],
@@ -976,6 +985,21 @@ construct_cell(Var, Ptag, MaybeRvals, HowToConstruct, MaybeSize, FieldAddrs,
         Code = CellCode ++ FieldCode
     ).
 
+:- pred maybe_add_alloc_site_info(prog_context::in, string::in, int::in,
+    maybe(alloc_site_id)::out, code_info::in, code_info::out) is det.
+
+maybe_add_alloc_site_info(Context, VarTypeMsg, Size, MaybeAllocId, !CI) :-
+    get_globals(!.CI, Globals),
+    globals.lookup_bool_option(Globals, profile_memory, ProfileMemory),
+    (
+        ProfileMemory = yes,
+        add_alloc_site_info(Context, VarTypeMsg, Size, AllocId, !CI),
+        MaybeAllocId = yes(AllocId)
+    ;
+        ProfileMemory = no,
+        MaybeAllocId = no
+    ).
+
 :- pred generate_field_take_address_assigns(assoc_list(int, prog_var)::in,
     prog_var::in, int::in, llds_code::out, code_info::in, code_info::out)
     is det.
diff --git a/compiler/use_local_vars.m b/compiler/use_local_vars.m
index 0c7c93b..1ad8b9e 100644
--- a/compiler/use_local_vars.m
+++ b/compiler/use_local_vars.m
@@ -522,6 +522,7 @@ substitute_lval_in_defn_components(OldLval, NewLval,
         ; Comp0 = foreign_proc_user_code(_, _, _)
         ; Comp0 = foreign_proc_raw_code(_, _, _, _)
         ; Comp0 = foreign_proc_fail_to(_)
+        ; Comp0 = foreign_proc_alloc_id(_)
         ; Comp0 = foreign_proc_noop
         ),
         Comp = Comp0
@@ -734,6 +735,7 @@ component_updates_oldlval(Component, Lval) = Updates :-
     ;
         ( Component = foreign_proc_inputs(_)
         ; Component = foreign_proc_fail_to(_)
+        ; Component = foreign_proc_alloc_id(_)
         ; Component = foreign_proc_noop
         ; Component = foreign_proc_user_code(_, _, _)
         ; Component = foreign_proc_raw_code(_, _, _, _)
@@ -759,6 +761,7 @@ component_affects_liveness(Component) = Affects :-
         ( Component = foreign_proc_inputs(_)
         ; Component = foreign_proc_outputs(_)
         ; Component = foreign_proc_fail_to(_)
+        ; Component = foreign_proc_alloc_id(_)
         ; Component = foreign_proc_noop
         ),
         Affects = no
diff --git a/compiler/var_locn.m b/compiler/var_locn.m
index 53dba7e..82dc781 100644
--- a/compiler/var_locn.m
+++ b/compiler/var_locn.m
@@ -183,8 +183,8 @@
 :- pred var_locn_assign_cell_to_var(module_info::in, exprn_opts::in,
     prog_var::in, bool::in, tag::in, list(maybe(rval))::in,
     how_to_construct::in, maybe(term_size_value)::in, list(int)::in,
-    string::in, may_use_atomic_alloc::in, label::in, llds_code::out,
-    static_cell_info::in, static_cell_info::out,
+    maybe(alloc_site_id)::in, may_use_atomic_alloc::in, label::in,
+    llds_code::out, static_cell_info::in, static_cell_info::out,
     var_locn_info::in, var_locn_info::out) is det.
 
     % var_locn_save_cell_fields(ModuleInfo, Var, VarLval, Code,
@@ -811,7 +811,7 @@ add_use_ref(ContainedVar, UsingVar, !VarStateMap) :-
 %----------------------------------------------------------------------------%
 
 var_locn_assign_cell_to_var(ModuleInfo, ExprnOpts, Var, ReserveWordAtStart,
-        Ptag, MaybeRvals0, HowToConstruct, MaybeSize, FieldAddrs, TypeMsg,
+        Ptag, MaybeRvals0, HowToConstruct, MaybeSize, FieldAddrs, MaybeAllocId,
         MayUseAtomic, Label, Code, !StaticCellInfo, !VLI) :-
     (
         MaybeSize = yes(SizeSource),
@@ -846,16 +846,16 @@ var_locn_assign_cell_to_var(ModuleInfo, ExprnOpts, Var, ReserveWordAtStart,
     ;
         var_locn_assign_dynamic_cell_to_var(ModuleInfo, Var,
             ReserveWordAtStart, Ptag, MaybeRvals, HowToConstruct,
-            MaybeOffset, TypeMsg, MayUseAtomic, Label, Code, !VLI)
+            MaybeOffset, MaybeAllocId, MayUseAtomic, Label, Code, !VLI)
     ).
 
 :- pred var_locn_assign_dynamic_cell_to_var(module_info::in, prog_var::in,
     bool::in, tag::in, list(maybe(rval))::in, how_to_construct::in,
-    maybe(int)::in, string::in, may_use_atomic_alloc::in, label::in,
-    llds_code::out, var_locn_info::in, var_locn_info::out) is det.
+    maybe(int)::in, maybe(alloc_site_id)::in, may_use_atomic_alloc::in,
+    label::in, llds_code::out, var_locn_info::in, var_locn_info::out) is det.
 
 var_locn_assign_dynamic_cell_to_var(ModuleInfo, Var, ReserveWordAtStart, Ptag,
-        Vector, HowToConstruct, MaybeOffset, TypeMsg, MayUseAtomic, Label,
+        Vector, HowToConstruct, MaybeOffset, MaybeAllocId, MayUseAtomic, Label,
         Code, !VLI) :-
     check_var_is_unknown(!.VLI, Var),
 
@@ -952,7 +952,7 @@ var_locn_assign_dynamic_cell_to_var(ModuleInfo, Var, ReserveWordAtStart, Ptag,
     CellCode = singleton(
         llds_instr(
             incr_hp(Lval, yes(Ptag), TotalOffset,
-                const(llconst_int(TotalSize)), TypeMsg, MayUseAtomic,
+                const(llconst_int(TotalSize)), MaybeAllocId, MayUseAtomic,
                 MaybeRegionRval, MaybeReuse),
             LldsComment ++ VarName)
     ),
diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
index 1da72e2..3594eae 100644
--- a/doc/reference_manual.texi
+++ b/doc/reference_manual.texi
@@ -6604,7 +6604,7 @@ based on factors such as the size of the code fragment.
     @c [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
     @c sharing(yes(int, T, array(T)), [cel(Item,[]) - cel(Array,[T])])],
 @c "
-    @c ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+    @c ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
     @c ML_init_array(Array, Size, Item);
 @c ").
 @c @end example
diff --git a/doc/user_guide.texi b/doc/user_guide.texi
index a89d68f..5f7de70 100644
--- a/doc/user_guide.texi
+++ b/doc/user_guide.texi
@@ -5576,8 +5576,11 @@ then a progress message will be displayed as each file is read.
 * Creating profiles::               How to create profile data.
 * Using mprof for time profiling::  How to analyze the time performance of a
                                     program with mprof.
-* Using mprof for memory profiling::How to analyze the memory performance of a
+* Using mprof for profiling memory allocation::
+                                    How to analyze the memory performance of a
                                     program with mprof.
+* Using mprof -s for profiling memory retention::
+                                    How to analyze what memory is on the heap.
 * Using mdprof::                    How to analyze the time and/or memory
                                     performance of a program with mdprof.
 * Using threadscope::               How to analyse the parallel
@@ -5965,15 +5968,14 @@ time represent the proportion of the current procedure's self and descendent
 time due to that parent.  These times are obtained using the assumption that
 each call contributes equally to the total time of the current procedure.
 
- at node Using mprof for memory profiling
- at section Using mprof for memory profiling
+ at node Using mprof for profiling memory allocation
+ at section Using mprof for profiling memory allocation
 @pindex mprof
 @cindex Memory profiling
 @cindex Allocation profiling
- at cindex Heap profiling
 @cindex Profiling memory allocation
 
-To create a memory profile, you can invoke @samp{mprof}
+To create a profile of memory allocations, you can invoke @samp{mprof}
 with the @samp{-m} (@samp{--profile memory-words}) option.
 This will profile the amount of memory allocated, measured in units of words.
 (A word is 4 bytes on a 32-bit architecture,
@@ -5992,13 +5994,72 @@ With memory profiling, just as with time profiling,
 you can use the @samp{-c} (@samp{--call-graph}) option to display
 call graph profiles in addition to flat profiles.
 
-Note that Mercury's memory profiler will only tell you about allocation,
+The options so far will only tell you about allocation,
 not about deallocation (garbage collection).
 It can tell you how much memory was allocated by each procedure,
 but it won't tell you how long the memory was live for,
 or how much of that memory was garbage-collected.
 This is also true for @samp{mdprof}.
 
+The @samp{mprof -s} tool described in the next section can tell
+you which memory cells remain on the heap.
+
+ at node Using mprof -s for profiling memory retention
+ at section Using mprof -s for profiling memory retention
+ at pindex mprof -s
+ at cindex Memory attribution
+ at cindex Memory retention
+ at cindex Heap profiling
+
+When a program is built with memory profiling and uses the Boehm
+garbage collector, i.e. a grade with @samp{.memprof.gc} modifiers,
+each memory cell is ``attributed'' with information about where it
+was originated, and its type constructor.  This information can be
+collated to tell you what kinds of objects are being retained when
+the program executes.
+
+To you this, you must instrument the program by adding calls to
+ at code{benchmarking.report_memory_attribution/1} or
+ at code{benchmarking.report_memory_attribution/3}
+at points of interest, passing an appropriate label for
+your reference.  For example, if a program operates in distinct phases
+you may want to add a call in between the phases.
+The @samp{report_memory_attribution} predicates do nothing in other grades,
+so are safe to leave in the program.  You may want to call them from
+within @samp{trace} goals:
+
+ at example
+trace [run_time(env("SNAPSHOTS")), io(!IO)] (
+    benchmarking.report_memory_attribution("Phase 2", !IO)
+)
+ at end example
+
+Next, build the program in a @samp{.memprof.gc} grade.
+After the program has finished executing, it will generate a file
+called @samp{Prof.Snapshots} in the current directory.
+Run @samp{mprof -s} to view the profile.
+You will see the memory cells which were on the heap at each time
+that @samp{report_memory_attribution} was called: the origin of the cells, and
+their type constructors.
+
+Passing the additional option @samp{-g type} will group the profile first by
+type constructors, then by procedure.  The @samp{-H} option hides the secondary
+level of information.  Memory allocated by the Mercury runtime system itself
+are normally excluded from the profile; they can be viewed by passing the
+ at samp{-r} option.
+
+Mercury values which are dead may in fact be still reachable from the various
+execution stacks. This is particularly noticeable on the high-level C back-end,
+as the C compiler does not take conservative garbage collection into account;
+the values of Mercury variables may linger on the C stack for much longer than
+necessary.
+The low-level C grades should suffer to a lesser extent.
+
+The attribution requires an extra word of memory per cell, which
+is then rounded up by the memory allocator.
+This is accounted for in @samp{mprof} output, but the memory usage
+of the program may be significantly higher than in non-memory profiling grades.
+
 @node Using mdprof
 @section Using mdprof
 @pindex mdprof
@@ -7793,7 +7854,9 @@ This option is only supported by the C back-ends.
 Enable memory profiling.  Insert memory profiling hooks in the
 generated code, and also output some profiling
 information (the static call graph) to the file
- at samp{@var{module}.prof}.  @xref{Using mprof for memory profiling}.
+ at samp{@var{module}.prof}.
+ at xref{Using mprof for profiling memory allocation}.
+ at xref{Using mprof -s for profiling memory retention}.
 This option is only supported by the C back-ends.
 
 @sp 1
diff --git a/extras/net/tcp.m b/extras/net/tcp.m
index ee2b41f..75ffdf7 100755
--- a/extras/net/tcp.m
+++ b/extras/net/tcp.m
@@ -556,7 +556,7 @@ socket_fd(Tcp) = socket_fd_c(Tcp ^ handle).
         MR_Word ret_string_word;
         MR_offset_incr_hp_atomic_msg(ret_string_word,
             0, TCP_IO_BYTES_TO_WORDS((i + 1) * sizeof(MR_Char)),
-            MR_PROC_LABEL, ""string.string/0"");
+            MR_ALLOC_ID, ""string.string/0"");
         RetString = (MR_String) ret_string_word;
         MR_memcpy(RetString, read_buffer, i * sizeof(MR_Char));
         RetString[i] = '\\0';
diff --git a/extras/solver_types/library/any_array.m b/extras/solver_types/library/any_array.m
index 483f07f..065725d 100644
--- a/extras/solver_types/library/any_array.m
+++ b/extras/solver_types/library/any_array.m
@@ -361,11 +361,11 @@
 **   the any_array, and updating the any_array size accordingly.
 */
 
-#define ML_alloc_any_array(newarray, any_arraysize, proclabel)              \
+#define ML_alloc_any_array(newarray, any_arraysize, alloc_id)               \
     do {                                                                    \
         MR_Word newarray_word;                                              \
         MR_offset_incr_hp_msg(newarray_word, 0, (any_arraysize),            \
-            proclabel, ""any_array:any_array/1"");                          \
+            alloc_id, ""any_array.any_array/1"");                           \
         (newarray) = (MR_ArrayPtr) newarray_word;                           \
     } while (0)
 ").
@@ -405,7 +405,7 @@ any_array.init(Size, Item, Array) :-
     any_array.init_2(Size::in, Item::ia, Array::any_array_uo),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
-    ML_alloc_any_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_any_array(Array, Size + 1, MR_ALLOC_ID);
     ML_init_any_array(Array, Size, Item);
 ").
 
@@ -413,7 +413,7 @@ any_array.init(Size, Item, Array) :-
     any_array.make_empty_array(Array::any_array_uo),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
-    ML_alloc_any_array(Array, 1, MR_PROC_LABEL);
+    ML_alloc_any_array(Array, 1, MR_ALLOC_ID);
     ML_init_any_array(Array, 0, 0);
 ").
 
@@ -577,7 +577,7 @@ ML_resize_any_array(MR_ArrayPtr any_array, MR_ArrayPtr old_array,
     if ((Array0)->size == Size) {
         Array = Array0;
     } else {
-        ML_alloc_any_array(Array, Size + 1, MR_PROC_LABEL);
+        ML_alloc_any_array(Array, Size + 1, MR_ALLOC_ID);
         ML_resize_any_array(Array, Array0, Size, Item);
     }
 ").
@@ -633,7 +633,7 @@ any_array.shrink(Array0, Size, Array) :-
     any_array.shrink_2(Array0::any_array_di, Size::in, Array::any_array_uo),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
-    ML_alloc_any_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_any_array(Array, Size + 1, MR_ALLOC_ID);
     ML_shrink_any_array(Array, Array0, Size);
 ").
 
@@ -671,7 +671,7 @@ ML_copy_any_array(MR_ArrayPtr any_array, MR_ConstArrayPtr old_array)
     any_array.copy(Array0::any_array_ui, Array::any_array_uo),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
-    ML_alloc_any_array(Array, Array0->size + 1, MR_PROC_LABEL);
+    ML_alloc_any_array(Array, Array0->size + 1, MR_ALLOC_ID);
     ML_copy_any_array(Array, (MR_ConstArrayPtr) Array0);
 ").
 
diff --git a/extras/trailed_update/tr_array.m b/extras/trailed_update/tr_array.m
index a16765c..db3ba76 100644
--- a/extras/trailed_update/tr_array.m
+++ b/extras/trailed_update/tr_array.m
@@ -358,7 +358,7 @@ ML_tr_resize_array(MR_ArrayType *array, const MR_ArrayType *old_array,
         Array::array_uo),
     [promise_pure, will_not_call_mercury],
 "
-    ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
     ML_tr_resize_array((MR_ArrayType *)Array, (const MR_ArrayType *)Array0,
         Size, Item);
 ").
@@ -367,7 +367,7 @@ ML_tr_resize_array(MR_ArrayType *array, const MR_ArrayType *old_array,
 %   tr_array.resize(Array0::in, Size::in, Item::in, Array::array_uo),
 %   [promise_pure, will_not_call_mercury],
 % "
-%   MR_incr_hp_msg(Array, Size + 1, MR_PROC_LABEL, ""array:array/1"");
+%   MR_incr_hp_msg(Array, Size + 1, MR_ALLOC_ID, ""array.array/1"");
 %   ML_tr_resize_array((MR_ArrayType *)Array, (const MR_ArrayType *)Array0,
 %       Size, Item);
 % ").
@@ -409,7 +409,7 @@ ML_tr_shrink_array(MR_ArrayType *array, const MR_ArrayType *old_array,
     tr_array.shrink(Array0::array_mui, Size::in, Array::array_uo),
     [promise_pure, will_not_call_mercury],
 "
-    ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
     ML_tr_shrink_array((MR_ArrayType *) Array,
         (const MR_ArrayType *) Array0, Size);
 ").
@@ -418,7 +418,7 @@ ML_tr_shrink_array(MR_ArrayType *array, const MR_ArrayType *old_array,
 %   tr_array.shrink(Array0::in, Size::in, Array::array_uo),
 %   [promise_pure, will_not_call_mercury],
 % "
-%   MR_incr_hp_msg(Array, Size + 1, MR_PROC_LABEL, ""array:array/1"");
+%   MR_incr_hp_msg(Array, Size + 1, MR_ALLOC_ID, ""array.array/1"");
 %   ML_tr_shrink_array((MR_ArrayType *) Array,
 %       (const MR_ArrayType *) Array0, Size);
 % ").
@@ -457,7 +457,7 @@ ML_tr_copy_array(MR_ArrayType *array, const MR_ArrayType *old_array)
     [promise_pure, will_not_call_mercury],
 "
     ML_alloc_array(Array, ((const MR_ArrayType *)Array0)->size + 1,
-        MR_PROC_LABEL);
+        MR_ALLOC_ID);
     ML_tr_copy_array((MR_ArrayType *)Array, (const MR_ArrayType *)Array0);
 ").
 
@@ -466,7 +466,7 @@ ML_tr_copy_array(MR_ArrayType *array, const MR_ArrayType *old_array)
 %   [promise_pure, will_not_call_mercury],
 % "
 %   MR_incr_hp_msg(Array, ((const MR_ArrayType *)Array0)->size + 1,
-%       MR_PROC_LABEL, ""array:array/1"");
+%       MR_ALLOC_ID, ""array.array/1"");
 %   ML_tr_copy_array((MR_ArrayType *)Array, (const MR_ArrayType *)Array0);
 % ").
 
diff --git a/library/array.m b/library/array.m
index e309122..8044e38 100644
--- a/library/array.m
+++ b/library/array.m
@@ -732,11 +732,11 @@ array.compare_elements(N, Size, Array1, Array2, Result) :-
 **   the array, and updating the array size accordingly.
 */
 
-#define ML_alloc_array(newarray, arraysize, proclabel)                  \
+#define ML_alloc_array(newarray, arraysize, alloc_id)                   \
     do {                                                                \
         MR_Word newarray_word;                                          \
         MR_offset_incr_hp_msg(newarray_word, 0, (arraysize),            \
-            proclabel, ""array:array/1"");                              \
+            alloc_id, ""array.array/1"");                               \
         (newarray) = (MR_ArrayPtr) newarray_word;                       \
     } while (0)
 ").
@@ -995,7 +995,7 @@ array.init(Size, Item, Array) :-
         ])
     ],
 "
-    ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
     ML_init_array(Array, Size, Item);
 ").
 
@@ -1004,7 +1004,7 @@ array.init(Size, Item, Array) :-
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness, no_sharing],
 "
-    ML_alloc_array(Array, 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, 1, MR_ALLOC_ID);
     ML_init_array(Array, 0, 0);
 ").
 
@@ -1403,7 +1403,7 @@ ML_resize_array(MR_ArrayPtr array, MR_ArrayPtr old_array,
     ** deallocate the storage for it.
     */
 #ifdef MR_CONSERVATIVE_GC
-    GC_FREE(old_array);
+    MR_GC_free_attrib(old_array);
 #endif
 }
 ").
@@ -1421,7 +1421,7 @@ ML_resize_array(MR_ArrayPtr array, MR_ArrayPtr old_array,
     if ((Array0)->size == Size) {
         Array = Array0;
     } else {
-        ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+        ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
         ML_resize_array(Array, Array0, Size, Item);
     }
 ").
@@ -1486,7 +1486,7 @@ ML_shrink_array(MR_ArrayPtr array, MR_ArrayPtr old_array,
     ** deallocate the storage for it.
     */
 #ifdef MR_CONSERVATIVE_GC
-    GC_FREE(old_array);
+    MR_GC_free_attrib(old_array);
 #endif
 }
 ").
@@ -1513,7 +1513,7 @@ array.shrink(Array0, Size, Array) :-
         ])
     ],
 "
-    ML_alloc_array(Array, Size + 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, Size + 1, MR_ALLOC_ID);
     ML_shrink_array(Array, Array0, Size);
 ").
 
@@ -1594,7 +1594,7 @@ ML_copy_array(MR_ArrayPtr array, MR_ConstArrayPtr old_array)
         ])
     ],
 "
-    ML_alloc_array(Array, Array0->size + 1, MR_PROC_LABEL);
+    ML_alloc_array(Array, Array0->size + 1, MR_ALLOC_ID);
     ML_copy_array(Array, (MR_ConstArrayPtr) Array0);
 ").
 
@@ -1923,7 +1923,7 @@ array.append(A, B) = C :-
     MR_Integer offset;
 
     sizeC = ArrayA->size + ArrayB->size;
-    ML_alloc_array(ArrayC, sizeC + 1, MR_PROC_LABEL);
+    ML_alloc_array(ArrayC, sizeC + 1, MR_ALLOC_ID);
 
     ArrayC->size = sizeC;
     for (i = 0; i < ArrayA->size; i++) {
diff --git a/library/benchmarking.m b/library/benchmarking.m
index b2bd1ab..78dc8bb 100644
--- a/library/benchmarking.m
+++ b/library/benchmarking.m
@@ -42,6 +42,19 @@
     %
 :- impure pred report_full_memory_stats is det.
 
+    % report_memory_attribution(Label, !IO) is a procedure intended for use in
+    % profiling the memory usage by a program. On `memprof.gc' grades it has
+    % the side-effect of forcing a garbage collection and reporting a summary
+    % of the objects on the heap to a data file. See ``Using mprof -s for
+    % profiling memory retention'' in the Mercury User's Guide. The label is
+    % for your reference.
+    %
+    % On other grades this procedure does nothing.
+    %
+:- pred report_memory_attribution(string::in, io::di, io::uo) is det.
+
+:- impure pred report_memory_attribution(string::in) is det.
+
     % benchmark_det(Pred, In, Out, Repeats, Time) is for benchmarking the det
     % predicate Pred. We call Pred with the input In and the output Out, and
     % return Out so that the caller can check the correctness of the
@@ -207,6 +220,25 @@ extern void ML_report_full_memory_stats(void);
     'ML_report_stats'()
 ").
 
+:- pragma foreign_proc("C",
+    report_memory_attribution(Label::in),
+    [will_not_call_mercury],
+"
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    MR_report_memory_attribution(Label);
+#else
+    (void) Label;
+#endif
+").
+
+report_memory_attribution(_) :-
+    impure_true.
+
+:- pragma promise_pure(report_memory_attribution/3).
+
+report_memory_attribution(Label, !IO) :-
+    impure report_memory_attribution(Label).
+
 %-----------------------------------------------------------------------------%
 
 :- pragma foreign_code("C", "
@@ -587,7 +619,11 @@ ML_memory_profile_top_table(MR_memprof_record *node,
         next_slot = ML_memory_profile_top_table(node->left,
             table, table_size, next_slot);
 
-        new_entry.name = node->name;
+        if (node->type_name != NULL) {
+            new_entry.name = node->type_name;
+        } else {
+            new_entry.name = MR_lookup_entry_or_internal(node->proc);
+        }
         ML_update_counter(&node->counter, &new_entry.counter);
         next_slot = ML_insert_into_table(&new_entry,
             table, table_size, next_slot);
@@ -615,7 +651,11 @@ ML_memory_profile_fill_table(MR_memprof_record *node,
         next_slot = ML_memory_profile_fill_table(node->left,
             table, next_slot);
 
-        table[next_slot].name = node->name;
+        if (node->type_name != NULL) {
+            table[next_slot].name = node->type_name;
+        } else {
+            table[next_slot].name = MR_lookup_entry_or_internal(node->proc);
+        }
         ML_update_counter(&node->counter, &table[next_slot].counter);
         next_slot++;
 
@@ -1070,7 +1110,7 @@ repeat(N) :-
     [will_not_call_mercury],
 "
     MR_offset_incr_hp_msg(Ref, MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1,
-        MR_PROC_LABEL, ""benchmarking:int_reference/1"");
+        MR_ALLOC_ID, ""benchmarking.int_reference/1"");
     MR_define_size_slot(0, Ref, 1);
     * (MR_Integer *) Ref = X;
 ").
diff --git a/library/bit_buffer.m b/library/bit_buffer.m
index a61e650..c15bbcf 100644
--- a/library/bit_buffer.m
+++ b/library/bit_buffer.m
@@ -173,14 +173,15 @@ new_buffer_2(BM, Pos, Size, UseStream, Stream, State, ReadStatus) =
         Stream::in, State::in, ReadStatus::in) = (Buffer::out),
     [will_not_call_mercury, promise_pure],
 "{
-    Buffer = MR_GC_NEW(ML_BitBuffer);
+    MR_incr_hp_type_msg(Buffer, ML_BitBuffer, MR_ALLOC_ID,
+        ""bit_buffer.bit_buffer/3"");
     Buffer->ML_bit_buffer_bitmap = BM;
     Buffer->ML_bit_buffer_pos = Pos;
     Buffer->ML_bit_buffer_size = Size;
     Buffer->ML_bit_buffer_use_stream = UseStream;
     Buffer->ML_bit_buffer_stream = Stream;
     Buffer->ML_bit_buffer_state = State;
-    Buffer->ML_bit_buffer_filled_bitmaps = MR_list_empty();
+    Buffer->ML_bit_buffer_filled_bitmaps = MR_list_empty_msg(MR_ALLOC_ID);
     Buffer->ML_bit_buffer_read_status = ReadStatus;
 }").
 
diff --git a/library/bitmap.m b/library/bitmap.m
index efbd0cd..ff819a8 100644
--- a/library/bitmap.m
+++ b/library/bitmap.m
@@ -1952,7 +1952,7 @@ _ ^ unsafe_byte(_) = _ :- private_builtin.sorry("bitmap.unsafe_byte").
     allocate_bitmap(N::in) = (BM::bitmap_uo),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
 "
-    MR_allocate_bitmap_msg(BM, N, MR_PROC_LABEL);
+    MR_allocate_bitmap_msg(BM, N, MR_ALLOC_ID);
 ").
 
 :- pragma foreign_proc("Java",
@@ -1989,7 +1989,7 @@ resize_bitmap(OldBM, N) =
     copy(BM0::in) = (BM::bitmap_uo),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
 "
-    MR_allocate_bitmap_msg(BM, BM0->num_bits, MR_PROC_LABEL);
+    MR_allocate_bitmap_msg(BM, BM0->num_bits, MR_ALLOC_ID);
     MR_copy_bitmap(BM, BM0);
 ").
 
diff --git a/library/construct.m b/library/construct.m
index 672252a..ed06104 100644
--- a/library/construct.m
+++ b/library/construct.m
@@ -334,7 +334,7 @@ get_functor_with_names_internal(TypeDesc, FunctorNumber, FunctorName, Arity,
             ArgNameList = MR_list_empty();
             for (i = 0; i < Arity; i++) {
                 ArgNameList = MR_string_list_cons_msg((MR_Word) NULL,
-                    ArgNameList, MR_PROC_LABEL);
+                    ArgNameList, MR_ALLOC_ID);
             }
             MR_restore_transient_registers();
         } else {
@@ -689,7 +689,7 @@ find_functor_2(TypeInfo, Functor, Arity, Num0, FunctorNumber, ArgTypes) :-
 
                     MR_tag_offset_incr_hp_msg(new_data, ptag,
                         MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1 + arity,
-                        MR_PROC_LABEL, ""<created by construct.construct/3>"");
+                        MR_ALLOC_ID, ""<created by construct.construct/3>"");
 
                     size = MR_cell_size(arity);
                     MR_field(ptag, new_data, 0) =
@@ -714,7 +714,7 @@ find_functor_2(TypeInfo, Functor, Arity, Num0, FunctorNumber, ArgTypes) :-
 
                     MR_tag_offset_incr_hp_msg(new_data, ptag,
                         MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + arity,
-                        MR_PROC_LABEL, ""<created by construct.construct/3>"");
+                        MR_ALLOC_ID, ""<created by construct.construct/3>"");
 
                     size = MR_cell_size(arity);
                     for (i = 0; i < arity; i++) {
@@ -765,7 +765,7 @@ find_functor_2(TypeInfo, Functor, Arity, Num0, FunctorNumber, ArgTypes) :-
                     new_data = (MR_Word) NULL;
                 } else {
                     MR_offset_incr_hp_msg(new_data, MR_SIZE_SLOT_SIZE,
-                        MR_SIZE_SLOT_SIZE + arity, MR_PROC_LABEL,
+                        MR_SIZE_SLOT_SIZE + arity, MR_ALLOC_ID,
                         ""<created by construct.construct/3>"");
 
                     size = MR_cell_size(arity);
@@ -1020,7 +1020,7 @@ construct_tuple(Args) =
         new_data = (MR_Word) NULL;
     } else {
         MR_offset_incr_hp_msg(new_data, MR_SIZE_SLOT_SIZE,
-            MR_SIZE_SLOT_SIZE + Arity, MR_PROC_LABEL,
+            MR_SIZE_SLOT_SIZE + Arity, MR_ALLOC_ID,
             ""<created by construct.construct_tuple/1>"");
 
         size = MR_cell_size(Arity);
diff --git a/library/deconstruct.m b/library/deconstruct.m
index ca6b3d5..e312131 100644
--- a/library/deconstruct.m
+++ b/library/deconstruct.m
@@ -857,6 +857,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
 #define ARITY_ARG               Arity
 #define ARGUMENTS_ARG           Arguments
 #define NONCANON                MR_NONCANON_ABORT
+/* This comment tells the compiler to define MR_ALLOC_ID. */
 #include ""mercury_ml_deconstruct_body.h""
 #undef  EXPAND_INFO_TYPE
 #undef  EXPAND_INFO_CALL
@@ -882,6 +883,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
 #define ARITY_ARG               Arity
 #define ARGUMENTS_ARG           Arguments
 #define NONCANON                MR_NONCANON_ALLOW
+/* This comment tells the compiler to define MR_ALLOC_ID. */
 #include ""mercury_ml_deconstruct_body.h""
 #undef  EXPAND_INFO_TYPE
 #undef  EXPAND_INFO_CALL
@@ -907,6 +909,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
 #define ARITY_ARG               Arity
 #define ARGUMENTS_ARG           Arguments
 #define NONCANON                MR_NONCANON_CC
+/* This comment tells the compiler to define MR_ALLOC_ID. */
 #include ""mercury_ml_deconstruct_body.h""
 #undef  EXPAND_INFO_TYPE
 #undef  EXPAND_INFO_CALL
@@ -934,6 +937,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
 #define ARGUMENTS_ARG           Arguments
 #define NONCANON                MR_NONCANON_ABORT
 #define SAVE_SUCCESS
+/* This comment tells the compiler to define MR_ALLOC_ID. */
 #include ""mercury_ml_deconstruct_body.h""
 #undef  EXPAND_INFO_TYPE
 #undef  EXPAND_INFO_CALL
@@ -962,6 +966,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
 #define ARGUMENTS_ARG           Arguments
 #define NONCANON                MR_NONCANON_ALLOW
 #define SAVE_SUCCESS
+/* This comment tells the compiler to define MR_ALLOC_ID. */
 #include ""mercury_ml_deconstruct_body.h""
 #undef  EXPAND_INFO_TYPE
 #undef  EXPAND_INFO_CALL
@@ -989,6 +994,7 @@ univ_named_arg_idcc(Term, Name, DummyUniv, Argument, Success) :-
     #define ARITY_ARG           Arity
     #define ARGUMENTS_ARG       Arguments
     #define NONCANON            MR_NONCANON_CC
+    /* This comment tells the compiler to define MR_ALLOC_ID. */
     #include ""mercury_ml_deconstruct_body.h""
     #undef  EXPAND_INFO_TYPE
     #undef  EXPAND_INFO_CALL
diff --git a/library/dir.m b/library/dir.m
index 640f4c5..d4d34f2 100644
--- a/library/dir.m
+++ b/library/dir.m
@@ -853,11 +853,15 @@ dir.relative_path_name_from_components(Components) = PathName :-
     */
 
     size_t      size = 256;
+    MR_Word     ptr;
     char        *buf;
     MR_String   str;
 
     while (1) {
-        buf = MR_GC_NEW_ARRAY(char, size);
+        MR_offset_incr_hp_atomic_msg(ptr, 0,
+            (size + sizeof(MR_Word) - 1) / sizeof(MR_Word),
+            MR_ALLOC_ID, ""string.string/0"");
+        buf = (char *) ptr;
         if (getcwd(buf, size)) {
             MR_make_aligned_string(str, buf);
             Res = ML_make_io_res_1_ok_string(str);
@@ -1804,7 +1808,7 @@ copy_c_string(_) = _ :-
     [will_not_call_mercury, promise_pure, thread_safe,
         will_not_modify_trail, does_not_affect_liveness],
 "
-    MR_make_aligned_string_copy(Str, (char *) Ptr);
+    MR_make_aligned_string_copy_msg(Str, (char *) Ptr, MR_ALLOC_ID);
 ").
 
 :- func make_dir_open_result_eof = io.result({dir.stream, string}).
@@ -1938,7 +1942,8 @@ dir.read_entry(Dir0, Res, !IO) :-
     IO = IO0;
     if (FindNextFile(Dir, &file_data)) {
         Status = 1;
-        MR_make_aligned_string_copy(FileName, file_data.cFileName);
+        MR_make_aligned_string_copy_msg(FileName, file_data.cFileName,
+            MR_ALLOC_ID);
     } else {
         Error = GetLastError();
         Status = (Error == ERROR_NO_MORE_FILES ? -1 : 0);
@@ -1957,7 +1962,8 @@ dir.read_entry(Dir0, Res, !IO) :-
         FileName = NULL;
         Status = (Error == 0 ? -1 : 0);
     } else {
-        MR_make_aligned_string_copy(FileName, dir_entry->d_name);
+        MR_make_aligned_string_copy_msg(FileName, dir_entry->d_name,
+            MR_ALLOC_ID);
         Error = 0;
         Status = 1;
     }
diff --git a/library/io.m b/library/io.m
index babb155..106a406 100644
--- a/library/io.m
+++ b/library/io.m
@@ -2208,7 +2208,7 @@ io.read_line_as_string(input_stream(Stream), Result, !IO) :-
         MR_Word ret_string_word;
         MR_offset_incr_hp_atomic_msg(ret_string_word,
             0, ML_IO_BYTES_TO_WORDS((i + 1) * sizeof(char)),
-            MR_PROC_LABEL, ""string.string/0"");
+            MR_ALLOC_ID, ""string.string/0"");
         RetString = (MR_String) ret_string_word;
         MR_memcpy(RetString, read_buffer, i * sizeof(char));
         RetString[i] = '\\0';
@@ -2550,7 +2550,7 @@ io.check_err(Stream, Res, !IO) :-
     }
 
     ML_maybe_make_err_msg(RetVal != 0, errno, ""read failed: "",
-        ""io.ferror/5"", MR_TRUE, RetStr);
+        MR_ALLOC_ID, MR_TRUE, RetStr);
     MR_update_io(IO0, IO);
 ").
 
@@ -2635,8 +2635,7 @@ io.make_err_msg(Msg0, Msg, !IO) :-
     [will_not_call_mercury, promise_pure, tabled_for_io,
         does_not_affect_liveness, no_sharing],
 "
-    ML_maybe_make_err_msg(MR_TRUE, Error, Msg0, ""io.make_err_msg/5"",
-        MR_FALSE, Msg);
+    ML_maybe_make_err_msg(MR_TRUE, Error, Msg0, MR_ALLOC_ID, MR_FALSE, Msg);
     MR_update_io(IO0, IO);
 ").
 
@@ -2727,7 +2726,7 @@ make_win32_err_msg(_, _, "", !IO) :-
     [will_not_call_mercury, promise_pure, tabled_for_io,
         does_not_affect_liveness, no_sharing],
 "
-    ML_maybe_make_win32_err_msg(MR_TRUE, Error, Msg0, MR_PROC_LABEL, Msg);
+    ML_maybe_make_win32_err_msg(MR_TRUE, Error, Msg0, MR_ALLOC_ID, Msg);
     MR_update_io(IO0, IO);
 ").
 
@@ -2853,7 +2852,7 @@ io.file_modification_time(File, Result, !IO) :-
         Status = 1;
     } else {
         ML_maybe_make_err_msg(MR_TRUE, errno, ""stat() failed: "",
-            ""io.file_modification_time_2/6"", MR_TRUE, Msg);
+            MR_ALLOC_ID, MR_TRUE, Msg);
         Status = 0;
     }
 #else
@@ -3938,7 +3937,7 @@ io.file_id(FileName, Result, !IO) :-
         Status = 1;
     } else {
         ML_maybe_make_err_msg(MR_TRUE, errno, ""stat() failed: "",
-            ""io.file_id_2/6"", MR_TRUE, Msg);
+            MR_ALLOC_ID, MR_TRUE, Msg);
         Status = 0;
     }
     MR_update_io(IO0, IO);
@@ -4033,7 +4032,7 @@ have_file_ids :- semidet_fail.
     MR_Word buf;
     MR_offset_incr_hp_atomic_msg(buf, 0,
         (Size * sizeof(char) + sizeof(MR_Word) - 1) / sizeof(MR_Word),
-        MR_PROC_LABEL, ""io:buffer/0"");
+        MR_ALLOC_ID, ""io.buffer/0"");
     Buffer = (char *) buf;
 }").
 
@@ -4061,7 +4060,7 @@ io.alloc_buffer(Size, buffer(Array)) :-
         MR_offset_incr_hp_atomic_msg(next, 0,
             (NewSize * sizeof(char) + sizeof(MR_Word) - 1)
                 / sizeof(MR_Word),
-            MR_PROC_LABEL, ""io:buffer/0"");
+            MR_ALLOC_ID, ""io.buffer/0"");
         assert(Buffer0 + OldSize == (char *) next);
         Buffer = Buffer0;
     } else {
@@ -4070,7 +4069,7 @@ io.alloc_buffer(Size, buffer(Array)) :-
         MR_offset_incr_hp_atomic_msg(buf, 0,
             (NewSize * sizeof(char) + sizeof(MR_Word) - 1)
                 / sizeof(MR_Word),
-            MR_PROC_LABEL, ""io:buffer/0"");
+            MR_ALLOC_ID, ""io.buffer/0"");
         Buffer = (char *) buf;
         if (OldSize > NewSize) {
             MR_memcpy(Buffer, Buffer0, NewSize);
@@ -5696,7 +5695,8 @@ MercuryFilePtr  mercury_current_text_output(void);
 MercuryFilePtr  mercury_current_binary_input(void);
 MercuryFilePtr  mercury_current_binary_output(void);
 int             mercury_next_stream_id(void);
-MercuryFilePtr  mercury_open(const char *filename, const char *openmode);
+MercuryFilePtr  mercury_open(const char *filename, const char *openmode,
+                    MR_AllocSiteInfoPtr alloc_id);
 void            mercury_io_error(MercuryFilePtr mf, const char *format, ...);
 void            mercury_output_error(MercuryFilePtr mf);
 void            mercury_print_string(MercuryFilePtr mf, const char *s);
@@ -6921,7 +6921,8 @@ mercury_set_current_binary_output(Stream) ->
 :- pragma foreign_code("C", "
 
 MercuryFilePtr
-mercury_open(const char *filename, const char *openmode)
+mercury_open(const char *filename, const char *openmode,
+    MR_AllocSiteInfoPtr alloc_id)
 {
     MercuryFilePtr  mf;
     FILE            *f;
@@ -6930,7 +6931,7 @@ mercury_open(const char *filename, const char *openmode)
     if (f == NULL) {
         return NULL;
     }
-    mf = MR_GC_NEW(MercuryFile);
+    MR_incr_hp_type_msg(mf, MercuryFile, alloc_id, ""MercuryFile"");
     MR_mercuryfile_init(f, 1, mf);
     return mf;
 }
@@ -9297,7 +9298,7 @@ io.set_binary_output_stream(binary_output_stream(NewStream),
     [will_not_call_mercury, promise_pure, tabled_for_io, thread_safe,
         does_not_affect_liveness, no_sharing],
 "
-    Stream = mercury_open(FileName, Mode);
+    Stream = mercury_open(FileName, Mode, MR_ALLOC_ID);
     if (Stream != NULL) {
         ResultCode = 0;
         StreamId = mercury_next_stream_id();
@@ -9314,7 +9315,7 @@ io.set_binary_output_stream(binary_output_stream(NewStream),
     [will_not_call_mercury, promise_pure, tabled_for_io, thread_safe,
         does_not_affect_liveness, no_sharing],
 "
-    Stream = mercury_open(FileName, Mode);
+    Stream = mercury_open(FileName, Mode, MR_ALLOC_ID);
     if (Stream != NULL) {
         ResultCode = 0;
         StreamId = mercury_next_stream_id();
@@ -9549,10 +9550,10 @@ io.close_binary_output(binary_output_stream(Stream), !IO) :-
     ** Convert mercury_argv from a vector to a list.
     */
     i = mercury_argc;
-    Args = MR_list_empty_msg(MR_PROC_LABEL);
+    Args = MR_list_empty_msg(MR_ALLOC_ID);
     while (--i >= 0) {
         Args = MR_string_list_cons_msg((MR_Word) mercury_argv[i], Args,
-            MR_PROC_LABEL);
+            MR_ALLOC_ID);
     }
     MR_update_io(IO0, IO);
 }").
@@ -9654,8 +9655,8 @@ io.close_binary_output(binary_output_stream(Stream), !IO) :-
         /* Spawn failed. */
         Status = 127;
         ML_maybe_make_err_msg(MR_TRUE, errno,
-            ""error invoking system command: "", ""io.call_system_code/5"",
-            MR_TRUE, Msg);
+            ""error invoking system command: "",
+            MR_ALLOC_ID, MR_TRUE, Msg);
     } else {
         /* Wait for the spawned process to exit. */
         do {
@@ -9664,8 +9665,8 @@ io.close_binary_output(binary_output_stream(Stream), !IO) :-
         if (err == -1) {
             Status = 127;
             ML_maybe_make_err_msg(MR_TRUE, errno,
-                ""error invoking system command: "", ""io.call_system_code/5"",
-                MR_TRUE, Msg);
+                ""error invoking system command: "",
+                MR_ALLOC_ID, MR_TRUE, Msg);
         } else {
             Status = st;
             Msg = MR_make_string_const("""");
@@ -9683,8 +9684,8 @@ io.close_binary_output(binary_output_stream(Stream), !IO) :-
         */
         Status = 127;
         ML_maybe_make_err_msg(MR_TRUE, errno,
-            ""error invoking system command: "", ""io.call_system_code/5"",
-            MR_TRUE, Msg);
+            ""error invoking system command: "",
+            MR_ALLOC_ID, MR_TRUE, Msg);
     } else {
         Msg = MR_make_string_const("""");
     }
@@ -10208,21 +10209,21 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
 #ifdef MR_HAVE_MKSTEMP
     int err, fd;
 
-    FileName = MR_make_string(MR_PROC_LABEL, ""%s%s%.5sXXXXXX"",
+    FileName = MR_make_string(MR_ALLOC_ID, ""%s%s%.5sXXXXXX"",
         Dir, Sep, Prefix);
     fd = mkstemp(FileName);
     if (fd == -1) {
         ML_maybe_make_err_msg(MR_TRUE, errno,
-            ""error opening temporary file: "", ""io.do_make_temp/8"", MR_TRUE,
-            ErrorMessage);
+            ""error opening temporary file: "", MR_ALLOC_ID,
+            MR_TRUE, ErrorMessage);
         Error = -1;
     } else {
         do {
             err = close(fd);
         } while (err == -1 && MR_is_eintr(errno));
         ML_maybe_make_err_msg(err, errno,
-            ""error closing temporary file: "", ""io.do_make_temp/8"", MR_TRUE,
-            ErrorMessage);
+            ""error closing temporary file: "", MR_ALLOC_ID,
+            MR_TRUE, ErrorMessage);
         Error = err;
     }
 #else
@@ -10241,7 +10242,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
     /* Dir + / + Prefix + counter_high + . + counter_low + \\0 */
     MR_offset_incr_hp_atomic_msg(filename_word, 0,
         (len + sizeof(MR_Word)) / sizeof(MR_Word),
-        MR_PROC_LABEL, ""string:string/0"");
+        MR_ALLOC_ID, ""string.string/0"");
     FileName = (MR_String) filename_word;
     if (ML_io_tempnam_counter == 0) {
         ML_io_tempnam_counter = getpid();
@@ -10264,7 +10265,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
         num_tries < ML_MAX_TEMPNAME_TRIES);
     if (fd == -1) {
         ML_maybe_make_err_msg(MR_TRUE, errno,
-            ""error opening temporary file: "", ""io.do_make_temp/8"",
+            ""error opening temporary file: "", MR_ALLOC_ID,
             MR_TRUE, ErrorMessage);
         Error = -1;
     }  else {
@@ -10272,7 +10273,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
             err = close(fd);
         } while (err == -1 && MR_is_eintr(errno));
         ML_maybe_make_err_msg(err, errno,
-            ""error closing temporary file: "", ""io.do_make_temp/8"",
+            ""error closing temporary file: "", MR_ALLOC_ID,
             MR_TRUE, ErrorMessage);
         Error = err;
     }
@@ -10430,7 +10431,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
 #include <errno.h>
 
 /*
-** ML_maybe_make_err_msg(was_error, errno, msg, procname, req_lock, error_msg):
+** ML_maybe_make_err_msg(was_error, errno, msg, alloc_id, req_lock, error_msg):
 ** if `was_error' is true, then append `msg' and `strerror(errno)'
 ** to give `error_msg'; otherwise, set `error_msg' to "".
 ** `req_lock' must be true iff the caller is marked `thread_safe' as the
@@ -10443,43 +10444,41 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
 ** This is defined as a macro rather than a C function
 ** to avoid worrying about the `hp' register being
 ** invalidated by the function call.
-** It also needs to be a macro because MR_offset_incr_hp_atomic_msg()
-** stringizes the procname argument.
 */
 
-#define ML_maybe_make_err_msg(was_error, error, msg, procname, req_lock,    \\
-            error_msg)                                                      \\
-    do {                                                                    \\
-        char    *errno_msg;                                                 \\
-        size_t  total_len;                                                  \\
-        MR_Word tmp;                                                        \\
-                                                                            \\
-        if (was_error) {                                                    \\
-            if (req_lock) {                                                 \\
-                MR_OBTAIN_GLOBAL_LOCK(procname);                            \\
-            }                                                               \\
-            errno_msg = strerror(error);                                    \\
-            total_len = strlen(msg) + strlen(errno_msg);                    \\
-            MR_offset_incr_hp_atomic_msg(tmp, 0,                            \\
-                (total_len + sizeof(MR_Word)) / sizeof(MR_Word),            \\
-                procname, ""string.string/0"");                             \\
-            (error_msg) = (char *) tmp;                                     \\
-            strcpy((error_msg), msg);                                       \\
-            strcat((error_msg), errno_msg);                                 \\
-            if (req_lock) {                                                 \\
-                MR_RELEASE_GLOBAL_LOCK(procname);                           \\
-            }                                                               \\
-        } else {                                                            \\
-            /*                                                              \\
-            ** We can't just return NULL here, because otherwise mdb        \\
-            ** will break when it tries to print the string.                \\
-            */                                                              \\
-            (error_msg) = MR_make_string_const("""");                       \\
-        }                                                                   \\
+#define ML_maybe_make_err_msg(was_error, error, msg, alloc_id, req_lock,   \\
+            error_msg)                                                     \\
+    do {                                                                   \\
+        char    *errno_msg;                                                \\
+        size_t  total_len;                                                 \\
+        MR_Word tmp;                                                       \\
+                                                                           \\
+        if (was_error) {                                                   \\
+            if (req_lock) {                                                \\
+                MR_OBTAIN_GLOBAL_LOCK(""ML_maybe_make_err_msg"");          \\
+            }                                                              \\
+            errno_msg = strerror(error);                                   \\
+            total_len = strlen(msg) + strlen(errno_msg);                   \\
+            MR_offset_incr_hp_atomic_msg(tmp, 0,                           \\
+                (total_len + sizeof(MR_Word)) / sizeof(MR_Word),           \\
+                (alloc_id), ""string.string/0"");                          \\
+            (error_msg) = (char *) tmp;                                    \\
+            strcpy((error_msg), msg);                                      \\
+            strcat((error_msg), errno_msg);                                \\
+            if (req_lock) {                                                \\
+                MR_RELEASE_GLOBAL_LOCK(""ML_maybe_make_err_msg"");         \\
+            }                                                              \\
+        } else {                                                           \\
+            /*                                                             \\
+            ** We can't just return NULL here, because otherwise mdb       \\
+            ** will break when it tries to print the string.               \\
+            */                                                             \\
+            (error_msg) = MR_make_string_const("""");                      \\
+        }                                                                  \\
     } while(0)
 
 /*
-** ML_maybe_make_win32_err_msg(was_error, error, msg, procname, error_msg):
+** ML_maybe_make_win32_err_msg(was_error, error, msg, alloc_id, error_msg):
 ** if `was_error' is true, then append `msg' and the string
 ** returned by the Win32 API function FormatMessage() for the
 ** last error to give `error_msg'; otherwise, set `error_msg' to "".
@@ -10492,14 +10491,13 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
 ** This is defined as a macro rather than a C function
 ** to avoid worrying about the `hp' register being
 ** invalidated by the function call.
-** It also needs to be a macro because MR_incr_hp_atomic_msg()
-** stringizes the procname argument.
 */
 #ifdef MR_WIN32
 
 #include <windows.h>
 
-#define ML_maybe_make_win32_err_msg(was_error, error, msg, procname, error_msg) \\
+#define ML_maybe_make_win32_err_msg(was_error, error, msg, alloc_id,        \\
+        error_msg)                                                          \\
     do {                                                                    \\
         size_t total_len;                                                   \\
         MR_Word tmp;                                                        \\
@@ -10524,7 +10522,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
             total_len = strlen(msg) + strlen((char *)err_buf);              \\
             MR_incr_hp_atomic_msg(tmp,                                      \\
                 (total_len + sizeof(MR_Word)) / sizeof(MR_Word),            \\
-                procname, ""string.string/0"");                             \\
+                (alloc_id), ""string.string/0"");                           \\
             (error_msg) = (char *) tmp;                                     \\
             strcpy((error_msg), msg);                                       \\
             strcat((error_msg), (char *)err_buf);                           \\
@@ -10542,7 +10540,7 @@ io.make_temp(Dir, Prefix, Name, !IO) :-
 
 #else /* !MR_WIN32 */
 
-#define ML_maybe_make_win32_err_msg(was_error, error, msg, procname, error_msg) \\
+#define ML_maybe_make_win32_err_msg(was_error, error, msg, alloc_id, error_msg) \\
     MR_fatal_error(""ML_maybe_make_win32_err_msg called on non-Windows platform"")
 
 #endif /* !MR_WIN32 */
@@ -10568,7 +10566,7 @@ io.remove_file(FileName, Result, !IO) :-
 "{
     RetVal = remove(FileName);
     ML_maybe_make_err_msg(RetVal != 0, errno, ""remove failed: "",
-        ""io.remove_file_2/5"", MR_TRUE, RetStr);
+        MR_ALLOC_ID, MR_TRUE, RetStr);
     MR_update_io(IO0, IO);
 }").
 
@@ -10707,7 +10705,7 @@ io.rename_file(OldFileName, NewFileName, Result, IO0, IO) :-
 "{
     RetVal = rename(OldFileName, NewFileName);
     ML_maybe_make_err_msg(RetVal != 0, errno, ""rename failed: "",
-        ""io.rename_file_2/6"", MR_TRUE, RetStr);
+        MR_ALLOC_ID, MR_TRUE, RetStr);
     MR_update_io(IO0, IO);
 }").
 
@@ -10896,7 +10894,8 @@ io.read_symlink(FileName, Result, !IO) :-
             Status = 0;
         } else {
             buffer2[num_chars] = '\\0';
-            MR_make_aligned_string_copy(TargetFileName, buffer2);
+            MR_make_aligned_string_copy_msg(TargetFileName, buffer2,
+                MR_ALLOC_ID);
             Status = 1;
         }
         MR_free(buffer2);
@@ -10906,7 +10905,7 @@ io.read_symlink(FileName, Result, !IO) :-
         Status = 0;
     } else {
         buffer[num_chars] = '\\0';
-        MR_make_aligned_string_copy(TargetFileName, buffer);
+        MR_make_aligned_string_copy_msg(TargetFileName, buffer, MR_ALLOC_ID);
         Status = 1;
     }
 #else /* !MR_HAVE_READLINK */
diff --git a/library/mutvar.m b/library/mutvar.m
index f2c55e6..057a7be 100644
--- a/library/mutvar.m
+++ b/library/mutvar.m
@@ -89,7 +89,7 @@ new_mutvar(X, Ref) :-
     [will_not_call_mercury, thread_safe],
 "
     MR_offset_incr_hp_msg(Ref, MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1,
-        MR_PROC_LABEL, ""mutvar.mutvar/1"");
+        MR_ALLOC_ID, ""mutvar.mutvar/1"");
     MR_define_size_slot(0, Ref, 1);
 ").
 
diff --git a/library/store.m b/library/store.m
index 6c9ed0a..ca6e0fd 100644
--- a/library/store.m
+++ b/library/store.m
@@ -321,7 +321,7 @@ store.new(S) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail],
 "
     MR_offset_incr_hp_msg(Mutvar, MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1,
-        MR_PROC_LABEL, ""store.mutvar/2"");
+        MR_ALLOC_ID, ""store.mutvar/2"");
     MR_define_size_slot(0, Mutvar, 1);
     * (MR_Word *) Mutvar = Val;
     S = S0;
@@ -431,7 +431,7 @@ copy_mutvar(Mutvar, Copy, !S) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail],
 "
     MR_offset_incr_hp_msg(Mutvar, MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1,
-        MR_PROC_LABEL, ""store.mutvar/2"");
+        MR_ALLOC_ID, ""store.mutvar/2"");
     MR_define_size_slot(0, Mutvar, 1);
     S = S0;
 ").
@@ -588,7 +588,7 @@ store.new_cyclic_mutvar(Func, MutVar, !Store) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail],
 "
     MR_offset_incr_hp_msg(Ref, MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 1,
-        MR_PROC_LABEL, ""store:ref/2"");
+        MR_ALLOC_ID, ""store.ref/2"");
     MR_define_size_slot(0, Ref, 1);
     * (MR_Word *) Ref = Val;
     S = S0;
@@ -765,7 +765,7 @@ ref_functor(Ref, Functor, Arity, !Store) :-
 
     if (arg_ref == &Val) {
         MR_offset_incr_hp_msg(ArgRef, MR_SIZE_SLOT_SIZE,
-            MR_SIZE_SLOT_SIZE + 1, MR_PROC_LABEL, ""store:ref/2"");
+            MR_SIZE_SLOT_SIZE + 1, MR_ALLOC_ID, ""store.ref/2"");
         MR_define_size_slot(0, ArgRef, 1);
         * (MR_Word *) ArgRef = Val;
     } else {
diff --git a/library/string.m b/library/string.m
index 2aca710..03bbbc9 100644
--- a/library/string.m
+++ b/library/string.m
@@ -1551,14 +1551,14 @@ string.to_char_list(Str::uo, CharList::in) :-
     int pos = strlen(Str);
     int c;
 
-    CharList = MR_list_empty_msg(MR_PROC_LABEL);
+    CharList = MR_list_empty_msg(MR_ALLOC_ID);
     for (;;) {
         c = MR_utf8_prev_get(Str, &pos);
         if (c <= 0) {
             break;
         }
         CharList = MR_char_list_cons_msg((MR_UnsignedChar) c, CharList,
-            MR_PROC_LABEL);
+            MR_ALLOC_ID);
     }
 }").
 
@@ -1664,7 +1664,7 @@ string.from_char_list(Chars::in, Str::uo) :-
     /*
     ** Allocate heap space for string
     */
-    MR_allocate_aligned_string_msg(Str, size, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
 
     /*
     ** Loop to copy the characters from the char_list to the string.
@@ -1784,7 +1784,7 @@ string.from_rev_char_list(Chars, Str) :-
     /*
     ** Allocate heap space for string
     */
-    MR_allocate_aligned_string_msg(Str, size, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
 
     /*
     ** Set size to be the offset of the end of the string
@@ -1883,7 +1883,7 @@ string.to_code_unit_list_2(String, Index, End, List) :-
         list_ptr = MR_list_tail(list_ptr);
     }
 
-    MR_allocate_aligned_string_msg(Str, size, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
 
     SUCCESS_INDICATOR = MR_TRUE;
     size = 0;
@@ -2236,7 +2236,7 @@ string.append_list(Lists, string.append_list(Lists)).
     }
 
     /* Allocate enough word aligned memory for the string */
-    MR_allocate_aligned_string_msg(Str, len, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
 
     /* Copy the strings into the new memory */
     len = 0;
@@ -2325,7 +2325,7 @@ string.append_list(Strs::in) = (Str::uo) :-
         add_sep = MR_TRUE;
     }
 
-    MR_allocate_aligned_string_msg(Str, len, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
 
     /* Copy the strings into the new memory */
     len = 0;
@@ -3147,7 +3147,7 @@ int_length_modifer = _ :-
         does_not_affect_liveness, no_sharing],
 "{
     MR_save_transient_hp();
-    Str = MR_make_string(MR_PROC_LABEL, FormatStr, (double) Val);
+    Str = MR_make_string(MR_ALLOC_ID, FormatStr, (double) Val);
     MR_restore_transient_hp();
 }").
 native_format_float(_, _) = _ :-
@@ -3167,7 +3167,7 @@ native_format_float(_, _) = _ :-
         does_not_affect_liveness, no_sharing],
 "{
     MR_save_transient_hp();
-    Str = MR_make_string(MR_PROC_LABEL, FormatStr, Val);
+    Str = MR_make_string(MR_ALLOC_ID, FormatStr, Val);
     MR_restore_transient_hp();
 }").
 native_format_int(_, _) = _ :-
@@ -3187,7 +3187,7 @@ native_format_int(_, _) = _ :-
         does_not_affect_liveness, no_sharing],
 "{
     MR_save_transient_hp();
-    Str = MR_make_string(MR_PROC_LABEL, FormatStr, Val);
+    Str = MR_make_string(MR_ALLOC_ID, FormatStr, Val);
     MR_restore_transient_hp();
 }").
 native_format_string(_, _) = _ :-
@@ -3207,7 +3207,7 @@ native_format_string(_, _) = _ :-
         does_not_affect_liveness, no_sharing],
 "{
     MR_save_transient_hp();
-    Str = MR_make_string(MR_PROC_LABEL, FormatStr, Val);
+    Str = MR_make_string(MR_ALLOC_ID, FormatStr, Val);
     MR_restore_transient_hp();
 }").
 native_format_char(_, _) = _ :-
@@ -4045,7 +4045,7 @@ string.from_float(Flt) = string.float_to_string(Flt).
     ** For efficiency reasons we duplicate the C implementation
     ** of string.lowlevel_float_to_string.
     */
-    MR_float_to_string(Flt, Str);
+    MR_float_to_string(Flt, Str, MR_ALLOC_ID);
 }").
 
 :- pragma foreign_proc("C#",
@@ -4135,7 +4135,7 @@ max_precision = min_precision + 2.
     ** Note any changes here will require the same changes in
     ** string.float_to_string.
     */
-    MR_float_to_string(Flt, Str);
+    MR_float_to_string(Flt, Str, MR_ALLOC_ID);
 }").
 
 :- pragma foreign_proc("C#",
@@ -4713,7 +4713,7 @@ string.set_char(Char, Index, !Str) :-
     } else if (MR_is_ascii(Str0[Index]) && MR_is_ascii(Ch)) {
         /* Fast path. */
         SUCCESS_INDICATOR = MR_TRUE;
-        MR_allocate_aligned_string_msg(Str, len, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
         strcpy(Str, Str0);
         MR_set_code_unit(Str, Index, Ch);
     } else {
@@ -4727,7 +4727,7 @@ string.set_char(Char, Index, !Str) :-
             size_t tailofs;
 
             newlen = len - oldwidth + newwidth;
-            MR_allocate_aligned_string_msg(Str, newlen, MR_PROC_LABEL);
+            MR_allocate_aligned_string_msg(Str, newlen, MR_ALLOC_ID);
             MR_memcpy(Str, Str0, Index);
             MR_utf8_encode(Str + Index, Ch);
             strcpy(Str + Index + newwidth, Str0 + Index + oldwidth);
@@ -4822,7 +4822,7 @@ string.unsafe_set_char(Char, Index, !Str) :-
     size_t len = strlen(Str0);
     if (MR_is_ascii(Str0[Index]) && MR_is_ascii(Ch)) {
         /* Fast path. */
-        MR_allocate_aligned_string_msg(Str, len, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
         strcpy(Str, Str0);
         MR_set_code_unit(Str, Index, Ch);
     } else {
@@ -4833,7 +4833,7 @@ string.unsafe_set_char(Char, Index, !Str) :-
         size_t tailofs;
 
         newlen = len - oldwidth + newwidth;
-        MR_allocate_aligned_string_msg(Str, newlen, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Str, newlen, MR_ALLOC_ID);
         MR_memcpy(Str, Str0, Index);
         MR_utf8_encode(Str + Index, Ch);
         strcpy(Str + Index + newwidth, Str0 + Index + oldwidth);
@@ -5142,7 +5142,7 @@ string.append_iii(X, Y, Z) :-
         /*
         ** We need to make a copy to ensure that the pointer is word-aligned.
         */
-        MR_allocate_aligned_string_msg(S2, len_2, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(S2, len_2, MR_ALLOC_ID);
         strcpy(S2, S3 + len_1);
         SUCCESS_INDICATOR = MR_TRUE;
     }
@@ -5188,7 +5188,7 @@ string.append_ioi(X, Y, Z) :-
     size_t len_1, len_2;
     len_1 = strlen(S1);
     len_2 = strlen(S2);
-    MR_allocate_aligned_string_msg(S3, len_1 + len_2, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(S3, len_1 + len_2, MR_ALLOC_ID);
     strcpy(S3, S1);
     strcpy(S3 + len_1, S2);
 }").
@@ -5246,10 +5246,10 @@ string.append_ooi_2(NextS1Len, S3Len, S1, S2, S3) :-
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness, may_not_duplicate, no_sharing],
 "{
-    MR_allocate_aligned_string_msg(S1, S1Len, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(S1, S1Len, MR_ALLOC_ID);
     MR_memcpy(S1, S3, S1Len);
     S1[S1Len] = '\\0';
-    MR_allocate_aligned_string_msg(S2, S3Len - S1Len, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(S2, S3Len - S1Len, MR_ALLOC_ID);
     strcpy(S2, S3 + S1Len);
 }").
 
@@ -5339,7 +5339,7 @@ strchars(I, End, Str) = Chars :-
         if (Count > len - Start) {
             Count = len - Start;
         }
-        MR_allocate_aligned_string_msg(SubString, Count, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(SubString, Count, MR_ALLOC_ID);
         MR_memcpy(SubString, Str + Start, Count);
         SubString[Count] = '\\0';
     }
@@ -5434,7 +5434,7 @@ string.substring_by_codepoint(Str, Start, Count, SubString) :-
 "{
     MR_Integer len;
 
-    MR_allocate_aligned_string_msg(SubString, Count, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(SubString, Count, MR_ALLOC_ID);
     MR_memcpy(SubString, Str + Start, Count);
     SubString[Count] = '\\0';
 }").
@@ -5475,13 +5475,13 @@ string.substring_by_codepoint(Str, Start, Count, SubString) :-
             Count = len;
         }
 
-        MR_allocate_aligned_string_msg(Left, Count, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Left, Count, MR_ALLOC_ID);
         MR_memcpy(Left, Str, Count);
         Left[Count] = '\\0';
         /*
         ** We need to make a copy to ensure that the pointer is word-aligned.
         */
-        MR_allocate_aligned_string_msg(Right, len - Count, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Right, len - Count, MR_ALLOC_ID);
         strcpy(Right, Str + Count);
     }
 }").
@@ -5708,7 +5708,7 @@ string.split_by_codepoint(Str, Count, Left, Right) :-
         /*
         ** We need to make a copy to ensure that the pointer is word-aligned.
         */
-        MR_allocate_aligned_string_msg(Rest, strlen(Str), MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Rest, strlen(Str), MR_ALLOC_ID);
         strcpy(Rest, Str);
         SUCCESS_INDICATOR = MR_TRUE;
     }
@@ -5776,7 +5776,7 @@ string.split_by_codepoint(Str, Count, Left, Right) :-
         /*
         ** We need to make a copy to ensure that the pointer is word-aligned.
         */
-        MR_allocate_aligned_string_msg(Rest, strlen(Str), MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(Rest, strlen(Str), MR_ALLOC_ID);
         strcpy(Rest, Str);
         SUCCESS_INDICATOR = MR_TRUE;
     }
@@ -5841,7 +5841,7 @@ string.split_by_codepoint(Str, Count, Left, Right) :-
 "{
     size_t firstw = MR_utf8_width(First);
     size_t len = firstw + strlen(Rest);
-    MR_allocate_aligned_string_msg(Str, len, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
     MR_utf8_encode(Str, First);
     strcpy(Str + firstw, Rest);
 }").
@@ -6218,7 +6218,7 @@ value_to_revstrings_prio(NonCanon, OpsTable, Priority, X, !Rs) :-
     ; dynamic_cast(X, C_Pointer) ->
         add_revstring(c_pointer_to_string(C_Pointer), !Rs)
     ;
-        % Check if the type is array:array/1. We can't just use dynamic_cast
+        % Check if the type is array.array/1. We can't just use dynamic_cast
         % here since array.array/1 is a polymorphic type.
         %
         % The calls to type_ctor_name and type_ctor_module_name are not really
diff --git a/library/thread.semaphore.m b/library/thread.semaphore.m
index 3cc048d..f50a21a 100644
--- a/library/thread.semaphore.m
+++ b/library/thread.semaphore.m
@@ -113,12 +113,10 @@ new(Semaphore, !IO) :-
     new(Count::in) = (Semaphore::uo),
     [promise_pure, will_not_call_mercury, thread_safe],
 "
-    MR_Word         sem_mem;
     ML_Semaphore    *sem;
 
-    MR_alloc_heap(sem_mem,
-        MR_round_up(sizeof(ML_Semaphore), sizeof(MR_Word)));
-    sem = (ML_Semaphore *) sem_mem;
+    MR_incr_hp_type_msg(sem, ML_Semaphore,
+        MR_ALLOC_ID, ""thread.semaphore.semaphore/0"");
     sem->count = Count;
 #ifndef MR_HIGHLEVEL_CODE
     sem->suspended_head = NULL;
diff --git a/library/version_array.m b/library/version_array.m
index 006ba4d..18c2584 100644
--- a/library/version_array.m
+++ b/library/version_array.m
@@ -485,15 +485,20 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    VA = MR_GC_NEW(struct ML_va);
+    MR_Word array;
+
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array, 1,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
 
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = 0;
 
 #ifdef MR_THREAD_SAFE
-    VA->lock             = MR_GC_NEW(MercuryLock);
+    MR_incr_hp_type_msg(VA->lock, MercuryLock, MR_ALLOC_ID, NULL);
     pthread_mutex_init(VA->lock, MR_MUTEX_ATTR);
 #endif
 ").
@@ -519,11 +524,16 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    VA = MR_GC_NEW(struct ML_va);
+    MR_Word array;
+
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array, 1,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
 
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = 0;
 
 #ifdef MR_THREAD_SAFE
@@ -553,11 +563,16 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
         does_not_affect_liveness, may_not_duplicate],
 "
     MR_Integer  i;
+    MR_Word     array;
+
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array, N + 1,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
 
-    VA = MR_GC_NEW(struct ML_va);
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = N;
 
     for (i = 0; i < N; i++) {
@@ -565,7 +580,7 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     }
 
 #ifdef MR_THREAD_SAFE
-    VA->lock             = MR_GC_NEW(MercuryLock);
+    MR_incr_hp_type_msg(VA->lock, MercuryLock, MR_ALLOC_ID, NULL);
     pthread_mutex_init(VA->lock, MR_MUTEX_ATTR);
 #endif
 ").
@@ -592,11 +607,16 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
         does_not_affect_liveness, may_not_duplicate],
 "
     MR_Integer  i;
+    MR_Word     array;
+
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array,  N + 1,
+        MR_ALLOC_ID, ""version_array.version_array/1"");
 
-    VA = MR_GC_NEW(struct ML_va);
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = N;
 
     for (i = 0; i < N; i++) {
@@ -629,7 +649,7 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    VA = ML_va_resize_dolock(VA0, N, X);
+    VA = ML_va_resize_dolock(VA0, N, X, MR_ALLOC_ID);
 ").
 
 :- pragma foreign_proc("C#",
@@ -720,7 +740,7 @@ resize(N, X, VA, resize(VA, N, X)).
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    SUCCESS_INDICATOR = ML_va_set_dolock(VA0, I, X, &VA);
+    SUCCESS_INDICATOR = ML_va_set_dolock(VA0, I, X, &VA, MR_ALLOC_ID);
 ").
 
 :- pragma foreign_proc("C#",
@@ -824,7 +844,8 @@ ML_va_get_dolock(ML_va_ptr, MR_Integer, MR_Word *);
 ** returns MR_TRUE.  Otherwise it returns MR_FALSE.
 */
 extern MR_bool
-ML_va_set_dolock(ML_va_ptr, MR_Integer, MR_Word, ML_va_ptr *);
+ML_va_set_dolock(ML_va_ptr, MR_Integer, MR_Word, ML_va_ptr *,
+    MR_AllocSiteInfoPtr);
 
 /*
 ** `Rewinds' a version array, invalidating all extant successors
@@ -837,7 +858,7 @@ ML_va_rewind_dolock(ML_va_ptr);
 ** Resize a version array.
 */
 extern ML_va_ptr
-ML_va_resize_dolock(ML_va_ptr, MR_Integer, MR_Word);
+ML_va_resize_dolock(ML_va_ptr, MR_Integer, MR_Word, MR_AllocSiteInfoPtr);
 
 ").
 
@@ -863,13 +884,14 @@ ML_va_get(ML_va_ptr VA, MR_Integer I, MR_Word *Xptr);
 ** returns MR_TRUE.  Otherwise it returns MR_FALSE.
 */
 static MR_bool
-ML_va_set(ML_va_ptr, MR_Integer, MR_Word, ML_va_ptr *);
+ML_va_set(ML_va_ptr, MR_Integer, MR_Word, ML_va_ptr *,
+    MR_AllocSiteInfoPtr alloc_id);
     
 /*
 ** Create a copy of VA0 as a new array.
 */
 static ML_va_ptr
-ML_va_flat_copy(const ML_va_ptr VA0);
+ML_va_flat_copy(const ML_va_ptr VA0, MR_AllocSiteInfoPtr alloc_id);
     
 /*
 ** Update the array VA using the override values in VA0
@@ -889,7 +911,7 @@ ML_va_rewind(ML_va_ptr VA);
 ** Resize a version array.
 */
 static ML_va_ptr
-ML_va_resize(ML_va_ptr, MR_Integer, MR_Word);
+ML_va_resize(ML_va_ptr, MR_Integer, MR_Word, MR_AllocSiteInfoPtr);
 
 ").
 
@@ -989,7 +1011,8 @@ ML_va_get(ML_va_ptr VA, MR_Integer I, MR_Word *Xptr)
 }
 
 int
-ML_va_set_dolock(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
+ML_va_set_dolock(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr,
+    MR_AllocSiteInfoPtr alloc_id)
 {
 #ifdef MR_THREAD_SAFE
     MercuryLock *lock = VA0->lock;
@@ -998,7 +1021,7 @@ ML_va_set_dolock(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
 
     ML_maybe_lock(lock);
 
-    ret = ML_va_set(VA0, I, X, VAptr);
+    ret = ML_va_set(VA0, I, X, VAptr, alloc_id);
 
     ML_maybe_unlock(lock);
 
@@ -1006,7 +1029,8 @@ ML_va_set_dolock(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
 }
 
 static int
-ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
+ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr,
+    MR_AllocSiteInfoPtr alloc_id)
 {
     ML_va_ptr VA1;
 
@@ -1015,7 +1039,8 @@ ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
             return MR_FALSE;
         }
 
-        VA1 = MR_GC_NEW(struct ML_va);
+        MR_incr_hp_type_msg(VA1, struct ML_va, alloc_id,
+            ""version_array.version_array/1"");
         VA1->index      = -1;
         VA1->value      = (MR_Word) NULL;
         VA1->rest.array = VA0->rest.array;
@@ -1029,7 +1054,7 @@ ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
 
         VA1->rest.array->elements[I] = X;
     } else {
-        VA1 = ML_va_flat_copy(VA0);
+        VA1 = ML_va_flat_copy(VA0, alloc_id);
 
         if (I < 0 || I >= VA1->rest.array->size) {
             return MR_FALSE;
@@ -1043,20 +1068,25 @@ ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
 }
 
 static ML_va_ptr
-ML_va_flat_copy(const ML_va_ptr VA0)
+ML_va_flat_copy(const ML_va_ptr VA0, MR_AllocSiteInfoPtr alloc_id)
 {
     ML_va_ptr   latest;
     ML_va_ptr   VA;
+    MR_Word     array;
     MR_Integer  N;
     MR_Integer  i;
 
     latest = ML_va_get_latest(VA0);
     N = latest->rest.array->size;
 
-    VA = MR_GC_NEW(struct ML_va);
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        alloc_id, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array, N + 1,
+        alloc_id, ""version_array.version_array/1"");
+
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = N;
 
     for (i = 0; i < N; i++) {
@@ -1065,7 +1095,7 @@ ML_va_flat_copy(const ML_va_ptr VA0)
 
 #ifdef MR_THREAD_SAFE
     if (VA0->lock != NULL) {
-        VA->lock = MR_GC_NEW(MercuryLock);
+        MR_incr_hp_type_msg(VA->lock, MercuryLock, alloc_id, NULL);
         pthread_mutex_init(VA->lock, MR_MUTEX_ATTR);
     } else {
         VA->lock = NULL;
@@ -1130,7 +1160,8 @@ ML_va_rewind(ML_va_ptr VA)
 }
 
 ML_va_ptr
-ML_va_resize_dolock(ML_va_ptr VA0, MR_Integer N, MR_Word X)
+ML_va_resize_dolock(ML_va_ptr VA0, MR_Integer N, MR_Word X,
+    MR_AllocSiteInfoPtr alloc_id)
 {
 #ifdef MR_THREAD_SAFE
     MercuryLock *lock = VA0->lock;
@@ -1139,7 +1170,7 @@ ML_va_resize_dolock(ML_va_ptr VA0, MR_Integer N, MR_Word X)
 
     ML_maybe_lock(lock);
 
-    VA = ML_va_resize(VA0, N, X);
+    VA = ML_va_resize(VA0, N, X, alloc_id);
 
     ML_maybe_unlock(lock);
 
@@ -1147,23 +1178,28 @@ ML_va_resize_dolock(ML_va_ptr VA0, MR_Integer N, MR_Word X)
 }
 
 static ML_va_ptr
-ML_va_resize(ML_va_ptr VA0, MR_Integer N, MR_Word X)
+ML_va_resize(ML_va_ptr VA0, MR_Integer N, MR_Word X,
+    MR_AllocSiteInfoPtr alloc_id)
 {
     ML_va_ptr   latest;
     ML_va_ptr   VA;
     MR_Integer  i;
     MR_Integer  size_VA0;
     MR_Integer  min;
+    MR_Word     array;
 
     latest = ML_va_get_latest(VA0);
 
     size_VA0 = ML_va_size(latest);
     min      = (N <= size_VA0 ? N : size_VA0);
-    VA       = MR_GC_NEW(struct ML_va);
+    MR_incr_hp_type_msg(VA, struct ML_va,
+        alloc_id, ""version_array.version_array/1"");
+    MR_incr_hp_msg(array, N + 1,
+        alloc_id, ""version_array.version_array/1"");
 
     VA->index            = -1;
     VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
+    VA->rest.array       = (MR_ArrayPtr) array;
     VA->rest.array->size = N;
 
     for (i = 0; i < min; i++) {
@@ -1172,7 +1208,7 @@ ML_va_resize(ML_va_ptr VA0, MR_Integer N, MR_Word X)
 
 #ifdef MR_THREAD_SAFE
     if (VA0->lock != NULL) {
-        VA->lock = MR_GC_NEW(MercuryLock);
+        MR_incr_hp_type_msg(VA->lock, MercuryLock, alloc_id, NULL);
         pthread_mutex_init(VA->lock, MR_MUTEX_ATTR);
     } else {
         VA->lock = NULL;
diff --git a/mdbcomp/program_representation.m b/mdbcomp/program_representation.m
index 591accf..55cd975 100644
--- a/mdbcomp/program_representation.m
+++ b/mdbcomp/program_representation.m
@@ -923,7 +923,7 @@ read_file_as_bytecode(FileName, Result, !IO) :-
         char    *buf;
 
         Size = statbuf.st_size;
-        MR_allocate_aligned_string_msg(buf, Size, MR_PROC_LABEL);
+        MR_allocate_aligned_string_msg(buf, Size, MR_ALLOC_ID);
         fd = open(FileName, O_RDONLY, 0);
         if (fd < 0) {
             Bytes = NULL;
diff --git a/mdbcomp/rtti_access.m b/mdbcomp/rtti_access.m
index dd7d9cf..fe1051a 100644
--- a/mdbcomp/rtti_access.m
+++ b/mdbcomp/rtti_access.m
@@ -762,7 +762,7 @@ read_string_table(ByteCode, StringTable, !Pos) :-
     char    *table;
     int     i;
 
-    MR_allocate_aligned_string_msg(buf, Size, MR_PROC_LABEL);
+    MR_allocate_aligned_string_msg(buf, Size, MR_ALLOC_ID);
     table = ((char *) Bytes) + Offset;
     for (i = 0; i < Size; i++) {
         buf[i] = table[i];
diff --git a/profiler/Mercury.options b/profiler/Mercury.options
index e82c4aa..51d2536 100644
--- a/profiler/Mercury.options
+++ b/profiler/Mercury.options
@@ -5,3 +5,5 @@
 #-----------------------------------------------------------------------------#
 # Mercury.options - module-specific flags for Mmake and `mmc --make'.
 #-----------------------------------------------------------------------------#
+
+MCFLAGS-snapshots = --optimise-constructor-last-call
diff --git a/profiler/globals.m b/profiler/globals.m
index dbc34cf..aa0ca4e 100644
--- a/profiler/globals.m
+++ b/profiler/globals.m
@@ -34,6 +34,7 @@
 :- type what_to_profile
     --->    memory_words
     ;       memory_cells
+    ;       memory_snapshots
     ;       user_plus_system_time
     ;       user_time
     ;       real_time.
@@ -107,6 +108,7 @@
 
 what_to_profile("memory-words", memory_words).
 what_to_profile("memory-cells", memory_cells).
+what_to_profile("snapshots", memory_snapshots).
 what_to_profile("user-plus-system-time", user_plus_system_time).
 what_to_profile("user-time", user_time).
 what_to_profile("real-time", real_time).
diff --git a/profiler/mercury_profile.m b/profiler/mercury_profile.m
index d08abaa..6a805b6 100644
--- a/profiler/mercury_profile.m
+++ b/profiler/mercury_profile.m
@@ -38,6 +38,7 @@
 :- import_module output.
 :- import_module process_file.
 :- import_module propagate.
+:- import_module snapshots.
 
 :- import_module bool.
 :- import_module getopt.
@@ -53,7 +54,13 @@ main(!IO) :-
         special_handler),
     getopt.process_options(OptionOps, Args0, Args, Result0),
     postprocess_options(Result0, Args, Result, !IO),
-    main_2(Result, Args, !IO).
+    (
+        Result = yes(ErrorMessage),
+        usage_error(ErrorMessage, !IO)
+    ;
+        Result = no,
+        main_2(Args, !IO)
+    ).
 
 :- pred postprocess_options(maybe_option_table(option)::in, list(string)::in,
     maybe(string)::out, io::di, io::uo) is det.
@@ -136,48 +143,59 @@ long_usage(!IO) :-
 
 %-----------------------------------------------------------------------------%
 
-:- pred main_2(maybe(string)::in, list(string)::in, io::di, io::uo) is det.
+:- pred main_2(list(string)::in, io::di, io::uo) is det.
 
-main_2(yes(ErrorMessage), _, !IO) :-
-    usage_error(ErrorMessage, !IO).
-main_2(no, Args, !IO) :-
-    io.stderr_stream(StdErr, !IO),
-    io.set_output_stream(StdErr, StdOut, !IO),
-    globals.io_lookup_bool_option(call_graph, CallGraphOpt, !IO),
-    globals.io_lookup_bool_option(help, Help, !IO),
+main_2(Args, !IO) :-
+    globals.io_get_globals(Globals, !IO),
+    globals.lookup_bool_option(Globals, help, Help),
     (
         Help = yes,
         long_usage(!IO)
     ;
         Help = no,
-        globals.io_lookup_bool_option(verbose, Verbose, !IO),
-
-        maybe_write_string(Verbose, "% Processing input files...", !IO),
-        process_profiling_data_files(Prof0, CallGraph0, !IO),
-        maybe_write_string(Verbose, " done\n", !IO),
-
+        globals.lookup_bool_option(Globals, snapshots, Snapshots),
         (
-            CallGraphOpt = yes,
-            maybe_write_string(Verbose, "% Building call graph...", !IO),
-            build_call_graph(Args, CallGraph0, CallGraph, !IO),
-            maybe_write_string(Verbose, " done\n", !IO),
-
-            maybe_write_string(Verbose, "% Propagating counts...", !IO),
-            propagate_counts(CallGraph, Prof0, Prof, !IO),
-            maybe_write_string(Verbose, " done\n", !IO)
+            Snapshots = yes,
+            show_snapshots(!IO)
         ;
-            CallGraphOpt = no,
-            Prof = Prof0
-        ),
+            Snapshots = no,
+            main_3(Args, !IO)
+        )
+    ).
+
+:- pred main_3(list(string)::in, io::di, io::uo) is det.
 
-        maybe_write_string(Verbose, "% Generating output...", !IO),
-        generate_prof_output(Prof, IndexMap, OutputProf, !IO),
+main_3(Args, !IO) :-
+    io.stderr_stream(StdErr, !IO),
+    io.set_output_stream(StdErr, StdOut, !IO),
+    globals.io_lookup_bool_option(verbose, Verbose, !IO),
+
+    maybe_write_string(Verbose, "% Processing input files...", !IO),
+    process_profiling_data_files(Prof0, CallGraph0, !IO),
+    maybe_write_string(Verbose, " done\n", !IO),
+
+    globals.io_lookup_bool_option(call_graph, CallGraphOpt, !IO),
+    (
+        CallGraphOpt = yes,
+        maybe_write_string(Verbose, "% Building call graph...", !IO),
+        build_call_graph(Args, CallGraph0, CallGraph, !IO),
         maybe_write_string(Verbose, " done\n", !IO),
 
-        io.set_output_stream(StdOut, _, !IO),
-        output_profile(OutputProf, IndexMap, !IO),
-        io.nl(!IO)
-    ).
+        maybe_write_string(Verbose, "% Propagating counts...", !IO),
+        propagate_counts(CallGraph, Prof0, Prof, !IO),
+        maybe_write_string(Verbose, " done\n", !IO)
+    ;
+        CallGraphOpt = no,
+        Prof = Prof0
+    ),
+
+    maybe_write_string(Verbose, "% Generating output...", !IO),
+    generate_prof_output(Prof, IndexMap, OutputProf, !IO),
+    maybe_write_string(Verbose, " done\n", !IO),
+
+    io.set_output_stream(StdOut, _, !IO),
+    output_profile(OutputProf, IndexMap, !IO),
+    io.nl(!IO).
 
 %-----------------------------------------------------------------------------%
 :- end_module mercury_profile.
diff --git a/profiler/options.m b/profiler/options.m
index 50b8cbc..2ecad4b 100644
--- a/profiler/options.m
+++ b/profiler/options.m
@@ -41,6 +41,13 @@
     ;       pairfile
     ;       declfile
     ;       libraryfile
+    % Snapshot (memory attribution profiling) options
+    ;       snapshots
+    ;       snapshots_file
+    ;       snapshots_grouping
+    ;       snapshots_hide_details
+    ;       snapshots_include_runtime
+    ;       snapshots_recalc_size       % developers only
     % Miscellaneous Options
     ;       help.
 
@@ -84,6 +91,12 @@ option_default(pairfile,                string("Prof.CallPair")).
 option_default(declfile,                string("Prof.Decl")).
 option_default(libraryfile,             string("")).
 option_default(demangle,                bool(yes)).
+option_default(snapshots,               bool(no)).
+option_default(snapshots_file,          string("Prof.Snapshots")).
+option_default(snapshots_grouping,      string("proc")).
+option_default(snapshots_hide_details,  bool(no)).
+option_default(snapshots_include_runtime, bool(no)).
+option_default(snapshots_recalc_size,   bool(yes)).
 
     % Miscellaneous Options
 option_default(help,                    bool(no)).
@@ -94,12 +107,16 @@ short_option('C', countfile).
 short_option('c', call_graph).
 short_option('d', dynamic_cg).
 short_option('D', declfile).
+short_option('g', snapshots_grouping).
 short_option('h', help).
+short_option('H', snapshots_hide_details).
 short_option('L', libraryfile).
 short_option('m', profile_memory_words).
 short_option('M', profile_memory_cells).
 short_option('p', profile).
 short_option('P', pairfile).
+short_option('r', snapshots_include_runtime).
+short_option('s', snapshots).
 short_option('t', profile_time).
 short_option('v', verbose).
 short_option('V', very_verbose).
@@ -116,6 +133,12 @@ long_option("profile",              profile).
 long_option("profile-memory-words", profile_memory_words).
 long_option("profile-memory-cells", profile_memory_cells).
 long_option("profile-time",         profile_time).
+long_option("snapshots",            snapshots).
+long_option("snapshots-file",       snapshots_file).
+long_option("snapshots-grouping",   snapshots_grouping).
+long_option("snapshots-hide-details", snapshots_hide_details).
+long_option("snapshots-include-runtime", snapshots_include_runtime).
+long_option("snapshots-recalc-size", snapshots_recalc_size).
 long_option("use-dynamic",          dynamic_cg).
 long_option("verbose",              verbose).
 long_option("very-verbose",         very_verbose).
@@ -173,6 +196,22 @@ options_help -->
     io.write_string("\t\tName of the file which contains the call graph for\n"),
     io.write_string("\t\tthe library modules.\n"),
 
+    io.write_string("\nSnapshot options:\n"),
+    io.write_string("\t-s, --snapshots\n"),
+    io.write_string("\t\tShow summary of heap objects at the times\n"),
+    io.write_string("\t\t`benchmarking.report_memory_attribution' was called.\n"),
+    io.write_string("\t\tThis overrides other profiler modes.\n"),
+    io.write_string("\t--snapshot-file <file>\n"),
+    io.write_string("\t\tName of the snapshots file. Usually `Prof.Snapshots'.\n"),
+    io.write_string("\t-g <type>, --snapshots-grouping <type>\n"),
+    io.write_string("\t\tSelect method of grouping results.\n"),
+    io.write_string("\t\tMay be by ""type"" or ""proc"" (default).\n"),
+    io.write_string("\t-H, --snapshots-hide-details\n"),
+    io.write_string("\t\tGenerate a brief profile.\n"),
+    io.write_string("\t-i, --snapshots-include-runtime\n"),
+    io.write_string("\t\tInclude internal Mercury runtime structures in the\n"),
+    io.write_string("\t\tprofile. These are excluded by default.\n"),
+
     io.write_string("\nVerbosity Options:\n"),
     io.write_string("\t-v, --verbose\n"),
     io.write_string("\t\tOutput progress messages at each stage.\n"),
diff --git a/profiler/output.m b/profiler/output.m
index fdc38dd..939f6bd 100644
--- a/profiler/output.m
+++ b/profiler/output.m
@@ -81,6 +81,7 @@ classify_profile(user_plus_system_time) = time_headers.
 classify_profile(real_time) = time_headers.
 classify_profile(memory_words) = memory_words_headers.
 classify_profile(memory_cells) = memory_cells_headers.
+classify_profile(memory_snapshots) = memory_cells_headers.  % dummy
 
 :- pred units(header_category::in, string::out, string::out, string::out,
     string::out, string::out, string::out, string::out, string::out) is det.
diff --git a/profiler/process_file.m b/profiler/process_file.m
index dbd71b4..b8a0c37 100644
--- a/profiler/process_file.m
+++ b/profiler/process_file.m
@@ -9,7 +9,7 @@
 % File: process_file.m
 % Main author: petdr.
 %
-% Processs the files that contain the label declarations, label counts and
+% Process the files that contain the label declarations, label counts and
 % the caller-callee pairs, also builds the dynamic call graph if the option
 % set.
 %
diff --git a/profiler/snapshots.m b/profiler/snapshots.m
new file mode 100644
index 0000000..d38e7a6
--- /dev/null
+++ b/profiler/snapshots.m
@@ -0,0 +1,581 @@
+%-----------------------------------------------------------------------------%
+% vim: ft=mercury ts=4 sw=4 et
+%-----------------------------------------------------------------------------%
+% Copyright (C) 2011 The University of Melbourne.
+% This file may only be copied under the terms of the GNU General
+% Public License - see the file COPYING in the Mercury distribution.
+%-----------------------------------------------------------------------------%
+%
+% File: snapshots.m.
+% Main author: wangp.
+%
+% This module summarises and outputs the data for memory attribution profiling.
+% This is a distinct mode from the rest of the mprof tool.
+%
+%-----------------------------------------------------------------------------%
+
+:- module snapshots.
+:- interface.
+
+:- import_module io.
+
+:- pred show_snapshots(io::di, io::uo) is det.
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+
+:- implementation.
+
+:- import_module demangle.
+:- import_module globals.
+:- import_module options.
+
+:- import_module bool.
+:- import_module float.
+:- import_module int.
+:- import_module list.
+:- import_module map.
+:- import_module require.
+:- import_module string.
+:- import_module svmap.
+
+%-----------------------------------------------------------------------------%
+
+:- type alloc_site_map ==  map(alloc_id, alloc_site).
+
+:- type alloc_id
+    --->    alloc_id(int).
+
+:- type alloc_site
+            % Field order affects secondary sort order.
+    --->    alloc_site(
+                alloc_proc          :: string,
+                alloc_type          :: string,
+                alloc_file_name     :: string,
+                alloc_line_number   :: int,
+                alloc_words         :: int
+                % If non-zero, the fixed number of words allocated at this
+                % allocation site.  If zero, the size is unknown or variable.
+            ).
+
+:- type alloc_site_counts
+            % alloc_site field must be first for secondary sort order.
+    --->    alloc_site_counts(
+                asc_alloc_site  :: alloc_site,
+                asc_num_cells   :: int,
+                asc_num_words   :: int
+                % The number of words allocated during the profiling run.
+                % If possible, we do NOT use this number as is inflated by
+                % the extra attribution word plus roundup.
+            ).
+
+:- type group
+    --->    group(
+                g_total_cells   :: int,
+                g_total_words   :: int,
+                g_representative:: alloc_site,
+                g_details       :: list(alloc_site_counts)
+            ).
+
+    % size_map[ReqWords] = ActualWords.
+    % ReqWords is the number of words that was requested to be allocated.
+    % ActualWords is the number of words that Boehm GC would have rounded the
+    % request up to.
+    %
+:- type size_map == list(int).
+
+:- type snapshot_options
+    --->    snapshot_options(
+                major_axis      :: major_axis,
+                hide_details    :: bool,
+                recalc_words    :: bool,
+                include_runtime :: bool
+            ).
+
+:- type major_axis
+    --->    major_axis_proc
+    ;       major_axis_type.
+
+%-----------------------------------------------------------------------------%
+
+show_snapshots(!IO) :-
+    globals.io_lookup_string_option(snapshots_file, SnapshotsFile, !IO),
+    globals.io_lookup_string_option(snapshots_grouping, SnapshotsGroup, !IO),
+    globals.io_lookup_bool_option(snapshots_hide_details, HideDetails, !IO),
+    globals.io_lookup_bool_option(snapshots_recalc_size, RecalcSize, !IO),
+    globals.io_lookup_bool_option(snapshots_include_runtime, InclRuntime, !IO),
+    ( valid_grouping(SnapshotsGroup, MajorAxis0) ->
+        MajorAxis = MajorAxis0
+    ;
+        error("unexpected grouping option `" ++ SnapshotsGroup ++ "'")
+    ),
+    Options = snapshot_options(MajorAxis, HideDetails, RecalcSize, InclRuntime),
+    io.open_input(SnapshotsFile, OpenDeclRes, !IO),
+    (
+        OpenDeclRes = ok(DeclStream),
+        parse_alloc_site_decls(DeclStream, AllocSiteMap, SizeMap, !IO),
+        io.close_input(DeclStream, !IO)
+    ;
+        OpenDeclRes = error(DeclError),
+        DeclErrorStr = "error opening file `" ++ SnapshotsFile ++
+            "': " ++ io.error_message(DeclError) ++ "\n",
+        error(DeclErrorStr)
+    ),
+    io.open_input(SnapshotsFile, OpenRes, !IO),
+    (
+        OpenRes = ok(Stream),
+        show_all_snapshots(Stream, Options, AllocSiteMap, SizeMap, !IO),
+        io.close_input(Stream, !IO)
+    ;
+        OpenRes = error(Error),
+        ErrorStr = "error opening file `" ++ SnapshotsFile ++
+            "': " ++ io.error_message(Error) ++ "\n",
+        error(ErrorStr)
+    ).
+
+:- pred valid_grouping(string::in, major_axis::out) is semidet.
+
+valid_grouping("type", major_axis_type).
+valid_grouping("proc", major_axis_proc).
+valid_grouping("procedure", major_axis_proc).
+
+%-----------------------------------------------------------------------------%
+
+:- pred parse_alloc_site_decls(io.input_stream::in,
+    alloc_site_map::out, size_map::out, io::di, io::uo) is det.
+
+parse_alloc_site_decls(Stream, AllocSiteMap, SizeMap, !IO) :-
+    io.read_line_as_string(Stream, LineRes, !IO),
+    (
+        LineRes = ok(Line),
+        % Search for the size_map, which indicates the start of allocation site
+        % declarations.
+        ( string.prefix(Line, "size_map ") ->
+            parse_size_map(Line, SizeMap),
+            parse_alloc_site_lines(Stream, map.init, AllocSiteMap, !IO)
+        ;
+            parse_alloc_site_decls(Stream, AllocSiteMap, SizeMap, !IO)
+        )
+    ;
+        LineRes = eof,
+        unexpected($module, $pred, "format error: cannot find declarations")
+    ;
+        LineRes = error(Error),
+        unexpected($module, $pred, io.error_message(Error))
+    ).
+
+:- pred parse_size_map(string::in, list(int)::out) is det.
+
+parse_size_map(Line, SizeMap) :-
+    (
+        string.words(Line) = ["size_map" | Words],
+        list.map(string.to_int, Words, Ints)
+    ->
+        SizeMap = Ints
+    ;
+        unexpected($module, $pred, "format error: bad size_map line")
+    ).
+
+:- pred parse_alloc_site_lines(io.input_stream::in,
+    alloc_site_map::in, alloc_site_map::out, io::di, io::uo) is det.
+
+parse_alloc_site_lines(Stream, !AllocSiteMap, !IO) :-
+    io.read_line_as_string(Stream, LineRes, !IO),
+    (
+        LineRes = ok(Line),
+        parse_alloc_site_line(Line, !AllocSiteMap, !IO),
+        parse_alloc_site_lines(Stream, !AllocSiteMap, !IO)
+    ;
+        LineRes = eof
+    ;
+        LineRes = error(Error),
+        unexpected($module, $pred, io.error_message(Error))
+    ).
+
+:- pred parse_alloc_site_line(string::in,
+    alloc_site_map::in, alloc_site_map::out, io::di, io::uo) is det.
+
+parse_alloc_site_line(Line0, !AllocSiteMap, !IO) :-
+    Line = string.chomp(Line0),
+    Words = string.split_at_char('\t', Line),
+    (
+        Words = [IdStr, MangledProcName, FileName, LineNumStr, Type,
+            NumWordsStr],
+        string.to_int(IdStr, Id),
+        string.to_int(LineNumStr, LineNum),
+        string.to_int(NumWordsStr, NumWords)
+    ->
+        demangle(MangledProcName, ProcName),
+        AllocSite = alloc_site(ProcName, Type, FileName, LineNum, NumWords),
+        svmap.det_insert(alloc_id(Id), AllocSite, !AllocSiteMap)
+    ;
+        unexpected($module, $pred, "format error: bad alloc site declaration")
+    ).
+
+%-----------------------------------------------------------------------------%
+
+:- pred show_all_snapshots(io.input_stream::in, snapshot_options::in,
+    alloc_site_map::in, size_map::in, io::di, io::uo) is det.
+
+show_all_snapshots(Stream, Options, AllocSiteMap, SizeMap, !IO) :-
+    io.read_line_as_string(Stream, LineRes, !IO),
+    (
+        LineRes = ok(Line),
+        ( string.remove_prefix("start ", Line, SnapshotName0) ->
+            SnapshotName = string.chomp(SnapshotName0),
+            output_snapshot_title(SnapshotName, !IO),
+            show_single_snapshot(Stream, Options, AllocSiteMap, SizeMap, !IO),
+            show_all_snapshots(Stream, Options, AllocSiteMap, SizeMap, !IO)
+        ;
+            true
+        )
+    ;
+        LineRes = eof
+    ;
+        LineRes = error(Error),
+        unexpected($module, $pred, io.error_message(Error))
+    ).
+
+:- pred show_single_snapshot(io.input_stream::in, snapshot_options::in,
+    alloc_site_map::in, size_map::in, io::di, io::uo) is det.
+
+show_single_snapshot(Stream, Options, AllocSiteMap, SizeMap, !IO) :-
+    parse_snapshot(Stream, Options, AllocSiteMap, SizeMap, AllocCounts, !IO),
+    MajorAxis = Options ^ major_axis,
+    make_sorted_groups(MajorAxis, AllocCounts, Groups),
+    output_snapshot(Options, Groups, !IO).
+
+:- pred parse_snapshot(io.input_stream::in, snapshot_options::in,
+    alloc_site_map::in, size_map::in, list(alloc_site_counts)::out,
+    io::di, io::uo) is det.
+
+parse_snapshot(Stream, Options, AllocSiteMap, SizeMap, AllocCounts, !IO) :-
+    io.read_line_as_string(Stream, LineRes, !IO),
+    (
+        LineRes = ok(Line),
+        (
+            string.prefix(Line, "end ")
+        ->
+            AllocCounts = []
+        ;
+            parse_alloc_site(Options, AllocSiteMap, SizeMap, Line, Counts)
+        ->
+            parse_snapshot(Stream, Options, AllocSiteMap, SizeMap,
+                RestCounts, !IO),
+            AllocCounts = [Counts | RestCounts]
+        ;
+            parse_snapshot(Stream, Options, AllocSiteMap, SizeMap,
+                AllocCounts, !IO)
+        )
+    ;
+        LineRes = eof,
+        unexpected($module, $pred, "format error")
+    ;
+        LineRes = error(Error),
+        unexpected($module, $pred, io.error_message(Error))
+    ).
+
+:- pred parse_alloc_site(snapshot_options::in, alloc_site_map::in,
+    size_map::in, string::in, alloc_site_counts::out) is semidet.
+
+parse_alloc_site(Options, AllocSiteMap, SizeMap, Line, Counts) :-
+    string.words(Line) = [IdStr, NumCellsStr, NumWordsStr0],
+    string.to_int(NumCellsStr, NumCells),
+    string.to_int(NumWordsStr0, NumWords0),
+    ( string.to_int(IdStr, Id) ->
+        get_alloc_site(AllocSiteMap, alloc_id(Id), AllocSite),
+        RecalcSize = Options ^ recalc_words
+    ;
+        IdStr = "runtime",
+        Options ^ include_runtime = yes,
+        string.format("runtime struct (%d words)", [i(NumWords0)], Type),
+        AllocSite = alloc_site("unknown", Type, "unknown", 0, NumWords0),
+        RecalcSize = no
+    ;
+        IdStr = "unknown",
+        string.format("unknown (%d words)", [i(NumWords0)], Type),
+        AllocSite = alloc_site("unknown", Type, "unknown", 0, NumWords0),
+        RecalcSize = no
+    ),
+    (
+        RecalcSize = yes,
+        WordsPerCell = AllocSite ^ alloc_words,
+        WordsPerCell > 0,
+        list.index1(SizeMap, WordsPerCell, SizeMapWords)
+    ->
+        NumWords = NumCells * SizeMapWords
+    ;
+        NumWords = NumWords0
+    ),
+    Counts = alloc_site_counts(AllocSite, NumCells, NumWords).
+
+:- pred get_alloc_site(alloc_site_map::in, alloc_id::in, alloc_site::out)
+    is det.
+
+get_alloc_site(AllocSiteMap, AllocId, AllocSite) :-
+    ( map.search(AllocSiteMap, AllocId, AllocSite0) ->
+        AllocSite = AllocSite0
+    ;
+        AllocSite = alloc_site("unknown", "unknown", "(unknown)", 0, 0)
+    ).
+
+%-----------------------------------------------------------------------------%
+
+:- pred make_sorted_groups(major_axis::in, list(alloc_site_counts)::in,
+    list(group)::out) is det.
+
+make_sorted_groups(MajorAxis, Counts, SortedGroups) :-
+    (
+        MajorAxis = major_axis_proc,
+        Compare = counts_by_proc
+    ;
+        MajorAxis = major_axis_type,
+        Compare = counts_by_type
+    ),
+    sort(Compare, Counts, SortedCounts),
+    make_groups(Compare, SortedCounts, Groups),
+    sort(group_by_words, Groups, SortedGroups).
+
+:- pred make_groups(comparison_pred(alloc_site_counts)::in(comparison_pred),
+    list(alloc_site_counts)::in, list(group)::out) is det.
+
+make_groups(Compare, Counts, Groups) :-
+    (
+        Counts = [],
+        Groups = []
+    ;
+        Counts = [_ | _],
+        takewhile(Compare, Counts, First, Rest),
+        make_group(First, FirstGroup),
+        make_groups(Compare, Rest, RestGroups),
+        Groups = [FirstGroup | RestGroups]
+    ).
+
+:- pred make_group(list(alloc_site_counts)::in, group::out) is det.
+
+make_group(Counts, Group) :-
+    % This relies on the alloc_site being the first field of alloc_site_counts,
+    % and on the default ordering of alloc_site.
+    sort(Counts, SortedCounts0),
+    sort(counts_by_words, SortedCounts0, SortedCounts),
+    list.foldl2(sum_counts, SortedCounts, 0, TotalCells, 0, TotalWords),
+    FirstSite = list.det_head(SortedCounts) ^ asc_alloc_site,
+    Group = group(TotalCells, TotalWords, FirstSite, SortedCounts).
+
+%-----------------------------------------------------------------------------%
+
+:- pred output_snapshot_title(string::in, io::di, io::uo) is det.
+
+output_snapshot_title(SnapshotName, !IO) :-
+    io.write_string("------ ", !IO),
+    io.write_string(SnapshotName, !IO),
+    io.write_string(" ------\n", !IO).
+
+:- pred output_snapshot(snapshot_options::in, list(group)::in, io::di, io::uo)
+    is det.
+
+output_snapshot(Options, Grouped, !IO) :-
+    output_column_names(Options, !IO),
+    list.foldl2(sum_groups, Grouped, 0, TotalCells, 0, TotalWords),
+    io.format(" %7d%17d%14s  %s\n",
+        [i(TotalCells), i(TotalWords), s(""), s("total")], !IO),
+    HideDetails = Options ^ hide_details,
+    (
+        HideDetails = yes
+    ;
+        HideDetails = no,
+        io.nl(!IO)
+    ),
+    list.foldl2(output_group(Options, TotalCells, TotalWords),
+        Grouped, 0, _CumulWords, !IO).
+
+:- pred output_column_names(snapshot_options::in, io::di, io::uo) is det.
+
+output_column_names(Options, !IO) :-
+    MajorAxis = Options ^ major_axis,
+    (
+        MajorAxis = major_axis_proc,
+        RightLabel = "procedure / type (location)"
+    ;
+        MajorAxis = major_axis_type,
+        RightLabel = "type / procedure (location)"
+    ),
+    io.format(" %7s%17s%14s  %s\n",
+        [s("cells"),
+        s("words"),
+        s("cumul"),
+        s(RightLabel)], !IO).
+
+:- pred output_group(snapshot_options::in, int::in, int::in, group::in,
+    int::in, int::out, io::di, io::uo) is det.
+
+output_group(Options, TotalCells, TotalWords, Group, !CumulWords, !IO) :-
+    Group = group(NumCells, NumWords, AllocSite, Counts),
+    !:CumulWords = !.CumulWords + NumWords,
+    CellsPercent = percentage(NumCells, TotalCells),
+    WordsPercent = percentage(NumWords, TotalWords),
+    CumulPercent = percentage(!.CumulWords, TotalWords),
+    (
+        CellsPercent =< min_percentage_major,
+        WordsPercent =< min_percentage_major
+    ->
+        true
+    ;
+        MajorAxis = Options ^ major_axis,
+        HideDetails = Options ^ hide_details,
+        (
+            MajorAxis = major_axis_proc,
+            RightLabel = AllocSite ^ alloc_proc
+        ;
+            MajorAxis = major_axis_type,
+            RightLabel = AllocSite ^ alloc_type
+        ),
+        (
+            HideDetails = yes,
+            Star = ' '
+        ;
+            HideDetails = no,
+            Star = ('*')
+        ),
+        io.format("%c%7d/%5.1f%% %9d/%5.1f%% %5.1f%%  %s\n",
+            [c(Star),
+            i(NumCells), f(CellsPercent),
+            i(NumWords), f(WordsPercent), f(CumulPercent),
+            s(RightLabel)], !IO),
+        (
+            HideDetails = yes
+        ;
+            HideDetails = no,
+            Single = ( Counts = [_] -> yes ; no ),
+            list.foldl(output_site(MajorAxis, TotalCells, TotalWords, Single),
+                Counts, !IO),
+            io.nl(!IO)
+        )
+    ).
+
+:- pred output_site(major_axis::in, int::in, int::in, bool::in,
+    alloc_site_counts::in, io::di, io::uo) is det.
+
+output_site(MajorAxis, TotalCells, TotalWords, Single, AllocCounts, !IO) :-
+    AllocCounts = alloc_site_counts(AllocSite, NumCells, NumWords),
+    AllocSite = alloc_site(Proc, Type, File, LineNum, _),
+    CellsPercent = percentage(NumCells, TotalCells),
+    WordsPercent = percentage(NumWords, TotalWords),
+    (
+        MajorAxis = major_axis_proc,
+        RightLabel = Type
+    ;
+        MajorAxis = major_axis_type,
+        RightLabel = Proc
+    ),
+    (
+        Single = yes,
+        io.format(" %38s  %s (%s:%d)\n",
+            [s(""), s(RightLabel), s(File), i(LineNum)], !IO)
+    ;
+        Single = no,
+        (
+            CellsPercent =< min_percentage_major,
+            WordsPercent =< min_percentage_major
+        ->
+            true
+        ;
+            io.format(" %7d/%5.1f%% %9d/%5.1f%%  %5s  %s (%s:%d)\n", [
+                i(NumCells), f(CellsPercent),
+                i(NumWords), f(WordsPercent), s(""),
+                s(RightLabel), s(File), i(LineNum)
+            ], !IO)
+        )
+    ).
+
+%-----------------------------------------------------------------------------%
+
+:- pred group_by_words(group::in, group::in, comparison_result::out) is det.
+
+group_by_words(GroupA, GroupB, Result) :-
+    A = GroupA ^ g_total_words,
+    B = GroupB ^ g_total_words,
+    compare(Result, B, A).
+
+:- pred counts_by_proc(alloc_site_counts::in, alloc_site_counts::in,
+    comparison_result::out) is det.
+
+counts_by_proc(CountsA, CountsB, Result) :-
+    A = CountsA ^ asc_alloc_site ^ alloc_proc,
+    B = CountsB ^ asc_alloc_site ^ alloc_proc,
+    compare(Result, B, A).
+
+:- pred counts_by_type(alloc_site_counts::in, alloc_site_counts::in,
+    comparison_result::out) is det.
+
+counts_by_type(CountsA, CountsB, Result) :-
+    A = CountsA ^ asc_alloc_site ^ alloc_type,
+    B = CountsB ^ asc_alloc_site ^ alloc_type,
+    compare(Result, B, A).
+
+:- pred counts_by_words(alloc_site_counts::in, alloc_site_counts::in,
+    comparison_result::out) is det.
+
+counts_by_words(CountsA, CountsB, Result) :-
+    A = CountsA ^ asc_num_words,
+    B = CountsB ^ asc_num_words,
+    compare(Result, B, A).
+
+:- pred sum_groups(group::in, int::in, int::out, int::in, int::out) is det.
+
+sum_groups(Group, !TotalCells, !TotalWords) :-
+    !:TotalCells = !.TotalCells + Group ^ g_total_cells,
+    !:TotalWords = !.TotalWords + Group ^ g_total_words.
+
+:- pred sum_counts(alloc_site_counts::in, int::in, int::out,
+    int::in, int::out) is det.
+
+sum_counts(Site, !TotalCells, !TotalWords) :-
+    !:TotalCells = !.TotalCells + (Site ^ asc_num_cells),
+    !:TotalWords = !.TotalWords + (Site ^ asc_num_words).
+
+%-----------------------------------------------------------------------------%
+
+:- func percentage(int, int) = float.
+
+percentage(N, Total) = 100.0 * float(N) / float(Total).
+
+:- func min_percentage_major = float.
+
+min_percentage_major = 0.1.
+
+:- func min_percentage_minor = float.
+
+min_percentage_minor = 0.05.
+
+%-----------------------------------------------------------------------------%
+
+:- pred takewhile(comparison_pred(T)::in(comparison_pred),
+    list(T)::in, list(T)::out, list(T)::out) is det.
+
+takewhile(Compare, List, Upto, After) :-
+    (
+        List = [],
+        Upto = [],
+        After = []
+    ;
+        List = [_],
+        Upto = List,
+        After = []
+    ;
+        List = [A, B | Cs],
+        Compare(A, B, Cmp),
+        (
+            Cmp = (=),
+            takewhile(Compare, [B | Cs], Upto0, After),
+            Upto = [A | Upto0]
+        ;
+            ( Cmp = (<)
+            ; Cmp = (>)
+            ),
+            Upto = [A],
+            After = [B | Cs]
+        )
+    ).
+
+%-----------------------------------------------------------------------------%
diff --git a/runtime/mercury.c b/runtime/mercury.c
index 495e5f6..9aaf16a 100644
--- a/runtime/mercury.c
+++ b/runtime/mercury.c
@@ -49,8 +49,8 @@ MR_OUTLINE_DEFN(
     MR_create1_func(MR_Word w1) 
 ,
     {
-        MR_Word *p = (MR_Word *) MR_new_object(MR_Word,
-            1 * sizeof(MR_Word), "create1");
+        MR_Word *p = (MR_Word *) MR_new_object(MR_Word, 1 * sizeof(MR_Word),
+            NULL, "create1");
         p[0] = w1;
         return (MR_Word) p;
     }
@@ -61,8 +61,8 @@ MR_OUTLINE_DEFN(
     MR_create2_func(MR_Word w1, MR_Word w2) 
 ,
     {
-        MR_Word *p = (MR_Word *) MR_new_object(MR_Word,
-            2 * sizeof(MR_Word), "create2");
+        MR_Word *p = (MR_Word *) MR_new_object(MR_Word, 2 * sizeof(MR_Word),
+            NULL, "create2");
         p[0] = w1;
         p[1] = w2;
         return (MR_Word) p;
@@ -74,8 +74,8 @@ MR_OUTLINE_DEFN(
     MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3) 
 ,
     {
-        MR_Word *p = (MR_Word *) MR_new_object(MR_Word,
-            3 * sizeof(MR_Word), "create3");
+        MR_Word *p = (MR_Word *) MR_new_object(MR_Word, 3 * sizeof(MR_Word),
+            NULL, "create3");
         p[0] = w1;
         p[1] = w2;
         p[2] = w3;
@@ -93,7 +93,8 @@ MR_OUTLINE_DEFN(
         MR_Float *ptr;
 
         MR_make_hp_float_aligned();
-        ptr = (MR_Float *) MR_new_object(MR_Float, sizeof(MR_Float), "float");
+        ptr = (MR_Float *) MR_new_object(MR_Float, sizeof(MR_Float),
+            MR_ALLOC_SITE_FLOAT, NULL);
         *ptr = f;
         return (MR_Box) ptr;
     }
@@ -126,7 +127,8 @@ MR_asm_box_float(MR_Float f)
     MR_Float *ptr;
 
     MR_make_hp_float_aligned();
-    ptr = (MR_Float *) MR_new_object(MR_Float, sizeof(MR_Float), "float");
+    ptr = (MR_Float *) MR_new_object(MR_Float, sizeof(MR_Float),
+        MR_ALLOC_SITE_FLOAT, NULL);
     *ptr = f;
     return (MR_Box) ptr;
 }
diff --git a/runtime/mercury.h b/runtime/mercury.h
index 6000920..90827e2 100644
--- a/runtime/mercury.h
+++ b/runtime/mercury.h
@@ -202,21 +202,29 @@ extern	MR_Word	mercury__private_builtin__dummy_var;
           })                                                            \
         : GC_MALLOC(bytes)                         			\
         )
-    #define MR_new_object(type, size, name) \
-  		((type *) MR_GC_MALLOC_INLINE(size))
+    #define MR_new_object(type, size, alloc_id, name) \
+		((type *) MR_GC_MALLOC_INLINE(size))
     /*
     ** Since the Boehm collector defined GC_MALLOC_WORDS but not
     ** GC_MALLOC_WORDS_ATOMIC, we can define MR_new_object_atomic here
     ** to call either MR_GC_MALLOC_ATOMIC or MR_GC_MALLOC_INLINE,
     ** depending on whether we value atomicity or inline expansion more.
     */
-    #define MR_new_object_atomic(type, size, name) \
-  		((type *) MR_GC_MALLOC_ATOMIC(size))
+    #define MR_new_object_atomic(type, size, alloc_id, name) \
+		((type *) MR_GC_MALLOC_ATOMIC(size))
   #else /* !MR_INLINE_ALLOC */
-    #define MR_new_object(type, size, name) \
-  		((type *) GC_MALLOC(size)) 
-    #define MR_new_object_atomic(type, size, name) \
-  		((type *) GC_MALLOC_ATOMIC(size)) 
+
+    #ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+	#define MR_new_object(type, size, alloc_id, name)		\
+		((type *) MR_new_object_func(size, alloc_id, name))
+	#define MR_new_object_atomic(type, size, alloc_id, name)	\
+		((type *) MR_new_object_atomic_func(size, alloc_id, name))
+    #else
+	#define MR_new_object(type, size, alloc_id, name)		\
+		((type *) GC_MALLOC(size))
+	#define MR_new_object_atomic(type, size, alloc_id, name)	\
+		((type *) GC_MALLOC_ATOMIC(size))
+    #endif
   #endif /* !MR_INLINE_ALLOC */
 
 #else /* !MR_CONSERVATIVE_GC */
@@ -236,18 +244,18 @@ extern	MR_Word	mercury__private_builtin__dummy_var;
   ** if they don't fit in a word. This would need to change if we ever start
   ** using unboxed fields whose alignment requirement is greater than one word.
   */
-  #define MR_new_object(type, size, name)				\
+  #define MR_new_object(type, size, alloc_id, name)			\
     ({ 									\
-        size_t 	MR_new_object_num_words;				\
-        MR_Word MR_new_object_ptr;					\
+	size_t 	MR_new_object_num_words;				\
+	MR_Word MR_new_object_ptr;					\
 									\
-	MR_new_object_num_words = 					\
-		((size) + sizeof(MR_Word) - 1) / sizeof(MR_Word);	\
-	MR_incr_hp(MR_new_object_ptr, MR_new_object_num_words);		\
+	MR_new_object_num_words = MR_bytes_to_words(size);		\
+	MR_incr_hp_msg(MR_new_object_ptr, MR_new_object_num_words,	\
+		(alloc_id), (name));					\
 	/* return */ (type *) MR_new_object_ptr;			\
     })
-  #define MR_new_object_atomic(type, size, name)			\
-    MR_new_object(type, size, name)
+  #define MR_new_object_atomic(type, size, alloc_id, name)		\
+    MR_new_object(type, size, alloc_id, name)
 
 #endif
 
@@ -263,8 +271,8 @@ extern	MR_Word	mercury__private_builtin__dummy_var;
 	MR_Float *MR_box_float_ptr;					\
 									\
 	MR_make_hp_float_aligned();					\
-	MR_box_float_ptr = 						\
-		MR_new_object_atomic(MR_Float, sizeof(MR_Float), "float"); \
+	MR_box_float_ptr = MR_new_object_atomic(MR_Float,		\
+	    sizeof(MR_Float), MR_ALLOC_SITE_FLOAT, NULL);		\
 	*MR_box_float_ptr = (f);					\
 	/* return */ (MR_Box) MR_box_float_ptr;				\
   })
@@ -276,7 +284,8 @@ extern	MR_Word	mercury__private_builtin__dummy_var;
 	MR_Float *ptr;
 
 	MR_make_hp_float_aligned();
-	ptr = MR_new_object_atomic(MR_Float, sizeof(MR_Float), "float");
+	ptr = MR_new_object_atomic(MR_Float, sizeof(MR_Float),
+	    MR_ALLOC_SITE_FLOAT, NULL);
 	*ptr = f;
 	return (MR_Box) ptr;
   }
diff --git a/runtime/mercury_array_macros.h b/runtime/mercury_array_macros.h
index c83935e..6a7b63f 100644
--- a/runtime/mercury_array_macros.h
+++ b/runtime/mercury_array_macros.h
@@ -50,7 +50,7 @@
 **   which _is_ scanned by the collector).
 **
 ** It is the caller's responsibility to deallocate the memory for the array
-** if/when it is no longer needed, using MR_free() or MR_GC_free()
+** if/when it is no longer needed, using MR_free() or MR_GC_free_attrib()
 ** respectively.
 */
 
@@ -66,15 +66,17 @@
             }                                                           \
         }                                                               \
     } while(0)
-#define MR_GC_ensure_room_for_next(base, type, init)                    \
+#define MR_GC_ensure_room_for_next(base, type, init, alloc_id)          \
     do {                                                                \
         if (base##_next >= base##_max) {                                \
             if (base##_max == 0) {                                      \
                 base##_max = (init);                                    \
-                base##s = MR_GC_NEW_ARRAY(type, base##_max);            \
+                base##s = MR_GC_NEW_ARRAY_ATTRIB(type, base##_max,      \
+                    (alloc_id));                                         \
             } else {                                                    \
                 base##_max *= 2;                                        \
-                base##s = MR_GC_RESIZE_ARRAY(base##s, type, base##_max);\
+                base##s = MR_GC_RESIZE_ARRAY_ATTRIB(base##s, type,      \
+                    base##_max);                                        \
             }                                                           \
         }                                                               \
     } while(0)
diff --git a/runtime/mercury_bitmap.c b/runtime/mercury_bitmap.c
index 95c9095..a20e737 100644
--- a/runtime/mercury_bitmap.c
+++ b/runtime/mercury_bitmap.c
@@ -15,7 +15,8 @@
 #include <stdio.h>
 
 static int MR_hex_char_to_int(char digit);
-static MR_String MR_do_bitmap_to_string(MR_ConstBitmapPtr, MR_bool, MR_bool);
+static MR_String MR_do_bitmap_to_string(MR_ConstBitmapPtr, MR_bool, MR_bool,
+    MR_AllocSiteInfoPtr);
 
 /*
 ** Note that MR_bitmap_cmp and MR_hash_bitmap are actually defined
@@ -69,20 +70,15 @@ MR_hex_char_to_int(char digit)
 }
 
 MR_String
-MR_bitmap_to_string(MR_ConstBitmapPtr b)
+MR_bitmap_to_quoted_string_saved_hp(MR_ConstBitmapPtr b,
+    MR_AllocSiteInfoPtr alloc_id)
 {
-    return MR_do_bitmap_to_string(b, MR_FALSE, MR_TRUE);
-}
-
-MR_String
-MR_bitmap_to_quoted_string_saved_hp(MR_ConstBitmapPtr b)
-{
-    return MR_do_bitmap_to_string(b, MR_TRUE, MR_TRUE);
+    return MR_do_bitmap_to_string(b, MR_TRUE, MR_TRUE, alloc_id);
 }
 
 static MR_String
 MR_do_bitmap_to_string(MR_ConstBitmapPtr b,
-    MR_bool quote, MR_bool use_saved_hp)
+    MR_bool quote, MR_bool use_saved_hp, MR_AllocSiteInfoPtr alloc_id)
 {
     MR_String result;
     int i;
@@ -101,9 +97,9 @@ MR_do_bitmap_to_string(MR_ConstBitmapPtr b,
     }
 
     if (use_saved_hp) {
-        MR_allocate_aligned_string_saved_hp(result, len);
+        MR_allocate_aligned_string_saved_hp(result, len, alloc_id);
     } else {
-        MR_allocate_aligned_string_msg(result, len, "MR_do_bitmap_to_string");
+        MR_allocate_aligned_string_msg(result, len, alloc_id);
     }
 
     if (quote) {
@@ -127,40 +123,3 @@ MR_do_bitmap_to_string(MR_ConstBitmapPtr b,
     result[len] = '\0';
     return result;
 }
-
-MR_BitmapPtr
-MR_string_to_bitmap(MR_ConstString s)
-{
-    MR_BitmapPtr result;
-    int i;
-    int len;
-    int start;
-    int res;
-    unsigned int result_bits;
-
-    len = strlen(s);
-    if (len < 4 || s[0] != '<' || s[len - 1] != '>') {
-        return NULL;
-    }
-    res = sscanf(s, "<%u:%n", &result_bits, &start);
-    if (res != 1) {
-        return NULL;
-    }
-    MR_allocate_bitmap_msg(result, (MR_Integer) result_bits,
-        "MR_string_to_bitmap");
-    result->num_bits = result_bits;
-    for (i = 0; i < MR_bitmap_length_in_bytes(result_bits); i++) {
-        int h1, h2;
-        if (start + 1 >= len - 1) {
-            return NULL;
-        }
-        h1 = MR_hex_char_to_int(s[start++]);
-        h2 = MR_hex_char_to_int(s[start++]);
-        if (h1 < 0 || h2 < 0) {
-            return NULL;
-        }
-        result->elements[i] = (MR_uint_least8_t) (h1 << 4) | h2;
-    }
-    return result;
-}
-
diff --git a/runtime/mercury_bitmap.h b/runtime/mercury_bitmap.h
index c2dc536..8fc5cb1 100644
--- a/runtime/mercury_bitmap.h
+++ b/runtime/mercury_bitmap.h
@@ -134,25 +134,12 @@ MR_Integer    MR_hash_bitmap(MR_ConstBitmapPtr);
 
 /*
 ** Convert a bitmap to a string consisting of a length followed by a colon
-** and a string of hexadecimal digits surrounded by angle brackets
-** (e.g. "<24:12A>").
-**
-*/
-MR_String MR_bitmap_to_string(MR_ConstBitmapPtr);
-
-/*
-** Convert a bitmap to a string consisting of a length followed by a colon
 ** and a string of hexadecimal digits surrounded by angle brackets and
 ** double quotes (e.g. "\"<24:12A>\"").  Used by `deconstruct.functor/3'.
 **
 */
-MR_String MR_bitmap_to_quoted_string_saved_hp(MR_ConstBitmapPtr);
-
-/*
-** Convert the output of MR_bitmap_to_string back into a bitmap.
-** Returns NULL if the string can't be converted.
-*/
-MR_BitmapPtr MR_string_to_bitmap(MR_ConstString);
+MR_String MR_bitmap_to_quoted_string_saved_hp(MR_ConstBitmapPtr,
+        MR_AllocSiteInfoPtr alloc_id);
 
 /*
 ** Return the length of the element array in words.
@@ -185,24 +172,25 @@ MR_BitmapPtr MR_string_to_bitmap(MR_ConstString);
 ** MR_{save/restore}_transient_hp().
 */
 
-#define MR_allocate_bitmap_msg(ptr, bits, proclabel)                    \
+#define MR_allocate_bitmap_msg(ptr, bits, alloc_id)                     \
     do {                                                                \
         MR_Word    make_bitmap_tmp;                                     \
         MR_BitmapPtr make_bitmap_ptr;                                   \
         MR_offset_incr_hp_atomic_msg(make_bitmap_tmp, 0,                \
-            MR_bitmap_length_in_words(bits) + 1, proclabel,             \
-            "bitmap:bitmap/0");                                         \
+            MR_bitmap_length_in_words(bits) + 1, (alloc_id),            \
+            "bitmap.bitmap/0");                                         \
         make_bitmap_ptr = (MR_BitmapPtr) make_bitmap_tmp;               \
         make_bitmap_ptr->num_bits = bits;                               \
         (ptr) = make_bitmap_ptr;                                        \
     } while(0)
 
-#define MR_allocate_bitmap_saved_hp(ptr, bits)                          \
+#define MR_allocate_bitmap_saved_hp(ptr, bits, alloc_id)                \
     do {                                                                \
         MR_Word    make_bitmap_tmp;                                     \
         MR_BitmapPtr make_bitmap_ptr;                                   \
         MR_offset_incr_saved_hp_atomic(make_bitmap_tmp, 0,              \
-            MR_bitmap_length_in_words(bits) + 1);                       \
+            MR_bitmap_length_in_words(bits) + 1, (alloc_id),            \
+            "bitmap.bitmap/0");                                         \
         make_bitmap_ptr = (MR_BitmapPtr) make_bitmap_tmp;               \
         make_bitmap_ptr->num_bits = bits;                               \
         (ptr) = make_bitmap_ptr;                                        \
diff --git a/runtime/mercury_bootstrap.h b/runtime/mercury_bootstrap.h
index 4bb94e6..dda3098 100644
--- a/runtime/mercury_bootstrap.h
+++ b/runtime/mercury_bootstrap.h
@@ -233,7 +233,6 @@ typedef MR_Bool			Bool;
 #define	mark_hp(d)			MR_mark_hp((d))
 #define	restore_hp(d)			MR_restore_hp((d))
 #define	hp_alloc(c)			MR_hp_alloc((c))
-#define	hp_alloc_atomic(c)		MR_hp_alloc_atomic((c))
 #define	tag_incr_hp_msg(d, t, c, p, ty)	MR_tag_incr_hp_msg((d), (t),	    \
 						(c), p, (ty))
 #define	tag_incr_hp_atomic_msg(d, t, c, p, ty)				    \
diff --git a/runtime/mercury_conf_param.h b/runtime/mercury_conf_param.h
index 05f8584..c28c382 100644
--- a/runtime/mercury_conf_param.h
+++ b/runtime/mercury_conf_param.h
@@ -823,6 +823,18 @@
   #define MR_PROFILE_PARALLEL_EXECUTION_SUPPORT
 #endif
 
+/*
+** Memory attribution profiling requires the procedure names from
+** memory profiling and hooks in Boehm GC.
+*/
+#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+  #error "MR_MPROF_PROFILE_MEMORY_ATTRIBUTION should not be defined " \
+    "on the command line"
+#endif
+#if defined(MR_BOEHM_GC) && defined(MR_MPROF_PROFILE_MEMORY)
+  #define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+#endif
+
 /* XXX document MR_BYTECODE_CALLABLE */
 
 /*
diff --git a/runtime/mercury_context.c b/runtime/mercury_context.c
index ca0c822..2d38cdf 100644
--- a/runtime/mercury_context.c
+++ b/runtime/mercury_context.c
@@ -799,7 +799,7 @@ MR_create_context(const char *id, MR_ContextSize ctxt_size, MR_Generator *gen)
 #endif
 
     if (c == NULL) {
-        c = MR_GC_NEW(MR_Context);
+        c = MR_GC_NEW_ATTRIB(MR_Context, MR_ALLOC_SITE_RUNTIME);
 #ifdef MR_DEBUG_STACK_SEGMENTS
         if (c) {
             MR_debug_log_message("Creating new context: %p", c);
diff --git a/runtime/mercury_deconstruct.c b/runtime/mercury_deconstruct.c
index f62a54e..7e826d3 100644
--- a/runtime/mercury_deconstruct.c
+++ b/runtime/mercury_deconstruct.c
@@ -329,7 +329,7 @@ MR_expand_type_name(MR_TypeCtorInfo tci, MR_bool wrap)
     }
 
     MR_restore_transient_hp();
-    MR_allocate_aligned_string_msg(str, len, "MR_expand_type_name");
+    MR_allocate_aligned_string_msg(str, len, MR_ALLOC_SITE_STRING);
     MR_save_transient_hp();
 
     sprintf(str, wrap? "<<%s.%s/%d>>" : "%s.%s/%d",
diff --git a/runtime/mercury_deconstruct_macros.h b/runtime/mercury_deconstruct_macros.h
index fe02bda..d117a1a 100644
--- a/runtime/mercury_deconstruct_macros.h
+++ b/runtime/mercury_deconstruct_macros.h
@@ -53,7 +53,7 @@
     do {                                                            \
         int     i;                                                  \
                                                                     \
-        var = MR_list_empty_msg(MR_PROC_LABEL);                     \
+        var = MR_list_empty_msg(MR_ALLOC_ID);                       \
         i = (ei).arity;                                             \
                                                                     \
         while (--i >= 0) {                                          \
@@ -66,7 +66,7 @@
                     (ei).args_field.num_extra_args]);               \
                                                                     \
                 /* Join the argument to the front of the list */    \
-            var = MR_univ_list_cons_msg(arg, var, MR_PROC_LABEL);   \
+            var = MR_univ_list_cons_msg(arg, var, MR_ALLOC_ID);     \
         }                                                           \
     } while (0)
 
diff --git a/runtime/mercury_deep_copy_body.h b/runtime/mercury_deep_copy_body.h
index 90cd43e..11c17c4 100644
--- a/runtime/mercury_deep_copy_body.h
+++ b/runtime/mercury_deep_copy_body.h
@@ -36,6 +36,11 @@ static  MR_PseudoTypeInfo   copy_pseudo_type_info(
 static  MR_Word             copy_typeclass_info(MR_Word typeclass_info,
                                 const MR_Word *lower_limit,
                                 const MR_Word *upper_limit);
+#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+static MR_AllocSiteInfoPtr  maybe_attrib(MR_Word *data_value);
+#else
+#define maybe_attrib(x)     NULL
+#endif
 
 /*
 ** We need to make sure that we don't clobber any part of
@@ -214,6 +219,7 @@ try_again:
                 {                                                           \
                     const MR_DuFunctorDesc  *functor_desc;                  \
                     const MR_DuExistInfo    *exist_info;                    \
+                    MR_AllocSiteInfoPtr     attrib;                         \
                     int                     sectag;                         \
                     int                     cell_size;                      \
                     int                     cur_slot;                       \
@@ -241,8 +247,9 @@ try_again:
                     cell_size += MR_SIZE_SLOT_SIZE;                         \
                                                                             \
                     if (exist_info == NULL) {                               \
+                        attrib = maybe_attrib(data_value);                  \
                         MR_offset_incr_saved_hp(new_data, MR_SIZE_SLOT_SIZE, \
-                                cell_size);                                 \
+                                cell_size, attrib, NULL);                   \
                                                                             \
                         MR_copy_size_slot(0, new_data, ptag, data);         \
                         MR_get_first_slot(have_sectag);                     \
@@ -254,8 +261,9 @@ try_again:
                         num_tci = exist_info->MR_exist_tcis;                \
                         cell_size += num_ti_plain + num_tci;                \
                                                                             \
+                        attrib = maybe_attrib(data_value);                  \
                         MR_offset_incr_saved_hp(new_data, MR_SIZE_SLOT_SIZE, \
-                                cell_size);                                 \
+                                cell_size, attrib, NULL);                   \
                                                                             \
                         MR_copy_size_slot(0, new_data, ptag, data);         \
                         MR_get_first_slot(have_sectag);                     \
@@ -398,9 +406,11 @@ try_again:
             RETURN_IF_OUT_OF_RANGE(data, (MR_Word *) data, 0, MR_Word);
 
             {
-                MR_String   new_string;
+                MR_String           new_string;
+                MR_AllocSiteInfoPtr attrib;
+                attrib = maybe_attrib((MR_Word *) data);
                 MR_make_aligned_string_copy_saved_hp(new_string,
-                        (MR_String) data);
+                        (MR_String) data, attrib);
                 new_data = (MR_Word) new_string;
                 leave_forwarding_pointer(data, 0, new_data);
             }
@@ -432,13 +442,16 @@ try_again:
                 MR_Word             new_closure_word;
                 MR_Closure_Layout   *closure_layout;
                 MR_TypeInfo         *type_info_arg_vector;
+                MR_AllocSiteInfoPtr attrib;
 
                 old_closure = (MR_Closure *) data_value;
                 closure_layout = old_closure->MR_closure_layout;
                 args = old_closure->MR_closure_num_hidden_args;
 
                 /* create new closure */
-                MR_offset_incr_saved_hp(new_closure_word, 0, args + 3);
+                attrib = maybe_attrib(data_value);
+                MR_offset_incr_saved_hp(new_closure_word, 0, args + 3,
+                    attrib, NULL);
                 new_closure = (MR_Closure *) new_closure_word;
 
                 /* copy the fixed fields */
@@ -490,6 +503,7 @@ try_again:
             {
                 MR_Word *new_data_ptr;
                 MR_TypeInfo *arg_typeinfo_vector;
+                MR_AllocSiteInfoPtr attrib;
 
                 arity = MR_TYPEINFO_GET_VAR_ARITY_ARITY(type_info);
 
@@ -497,8 +511,9 @@ try_again:
                     new_data = (MR_Word) NULL;
                 } else {
                     /* allocate space for the new tuple */
+                    attrib = maybe_attrib(data_value);
                     MR_offset_incr_saved_hp(new_data, MR_SIZE_SLOT_SIZE,
-                        MR_SIZE_SLOT_SIZE + arity);
+                        MR_SIZE_SLOT_SIZE + arity, attrib, NULL);
                     MR_copy_size_slot(0, new_data, 0, data);
                     new_data_ptr = (MR_Word *) new_data;
 
@@ -536,10 +551,13 @@ try_again:
                 MR_ArrayType *new_array;
                 MR_ArrayType *old_array;
                 MR_Integer array_size;
+                MR_AllocSiteInfoPtr attrib;
 
                 old_array = (MR_ArrayType *) data_value;
                 array_size = old_array->size;
-                MR_offset_incr_saved_hp(new_data, 0, array_size + 1);
+                attrib = maybe_attrib(data_value);
+                MR_offset_incr_saved_hp(new_data, 0, array_size + 1,
+                    attrib, NULL);
                 new_array = (MR_ArrayType *) new_data;
                 new_array->size = array_size;
                 for (i = 0; i < array_size; i++) {
@@ -568,7 +586,8 @@ try_again:
                 MR_BitmapPtr old_array;
 
                 old_array = (MR_BitmapPtr) data_value;
-                MR_allocate_bitmap_saved_hp(new_array, old_array->num_bits);
+                MR_allocate_bitmap_saved_hp(new_array, old_array->num_bits,
+                    NULL);
                 MR_copy_bitmap(new_array, old_array);
                 new_data = (MR_Word) new_array;
                 leave_forwarding_pointer(data, 0, new_data);
@@ -658,13 +677,15 @@ try_again:
             MR_Word *ref;
             MR_Word *new_ref;
             int     i;
+            MR_AllocSiteInfoPtr attrib;
 
             assert(MR_tag(data) == 0);
             ref = (MR_Word *) MR_body(data, MR_mktag(0));
 
             RETURN_IF_OUT_OF_RANGE(data, ref, 0, MR_Word);
 
-            MR_offset_incr_saved_hp(new_data, 0, 1);
+            attrib = maybe_attrib(ref);
+            MR_offset_incr_saved_hp(new_data, 0, 1, attrib, NULL);
             new_ref = (MR_Word *) new_data;
             *new_ref = copy_arg(NULL, *ref, NULL,
                         MR_TYPEINFO_GET_FIXED_ARITY_ARG_VECTOR(type_info),
@@ -792,6 +813,7 @@ copy_type_info(MR_TypeInfo type_info,
         int             arity;
         int             i;
         int             forwarding_pointer_size;
+        MR_AllocSiteInfoPtr attrib;
         
         /*
         ** Note that we assume type_ctor_infos will always be
@@ -821,18 +843,23 @@ copy_type_info(MR_TypeInfo type_info,
             arity = MR_TYPEINFO_GET_VAR_ARITY_ARITY(type_info);
             type_info_args =
                 MR_TYPEINFO_GET_VAR_ARITY_ARG_VECTOR(type_info);
+            attrib = maybe_attrib((MR_Word *) type_info);
             MR_offset_incr_saved_hp(new_type_info_arena_word,
                 forwarding_pointer_size,
-                MR_var_arity_type_info_size(arity) + forwarding_pointer_size);
+                MR_var_arity_type_info_size(arity) + forwarding_pointer_size,
+                attrib, NULL
+            );
             new_type_info_arena = (MR_Word *) new_type_info_arena_word;
             MR_fill_in_var_arity_type_info(new_type_info_arena,
                 type_ctor_info, arity, new_type_info_args);
         } else {
             arity = type_ctor_info->MR_type_ctor_arity;
             type_info_args = MR_TYPEINFO_GET_FIXED_ARITY_ARG_VECTOR(type_info);
+            attrib = maybe_attrib((MR_Word *) type_info);
             MR_offset_incr_saved_hp(new_type_info_arena_word,
                 forwarding_pointer_size,
-                MR_fixed_arity_type_info_size(arity) + forwarding_pointer_size
+                MR_fixed_arity_type_info_size(arity) + forwarding_pointer_size,
+                attrib, NULL
             );
             new_type_info_arena = (MR_Word *) new_type_info_arena_word;
             MR_fill_in_fixed_arity_type_info(new_type_info_arena,
@@ -875,6 +902,7 @@ copy_pseudo_type_info(MR_PseudoTypeInfo pseudo_type_info,
         int                 arity;
         int                 i;
         int                 forwarding_pointer_size;
+        MR_AllocSiteInfoPtr attrib;
         
         /*
         ** Note that we assume type_ctor_infos will always be
@@ -905,10 +933,13 @@ copy_pseudo_type_info(MR_PseudoTypeInfo pseudo_type_info,
             arity = MR_PSEUDO_TYPEINFO_GET_VAR_ARITY_ARITY(pseudo_type_info);
             pseudo_type_info_args =
                 MR_PSEUDO_TYPEINFO_GET_VAR_ARITY_ARG_VECTOR(pseudo_type_info);
+            attrib = maybe_attrib((MR_Word *) pseudo_type_info);
             MR_offset_incr_saved_hp(new_pseudo_type_info_arena_word,
                 forwarding_pointer_size,
                 MR_var_arity_pseudo_type_info_size(arity)
-                    + forwarding_pointer_size);
+                    + forwarding_pointer_size,
+                attrib, NULL
+            );
             new_pseudo_type_info_arena = (MR_Word *)
                 new_pseudo_type_info_arena_word;
             MR_fill_in_var_arity_pseudo_type_info(new_pseudo_type_info_arena,
@@ -917,10 +948,12 @@ copy_pseudo_type_info(MR_PseudoTypeInfo pseudo_type_info,
             arity = type_ctor_info->MR_type_ctor_arity;
             pseudo_type_info_args =
                 MR_PSEUDO_TYPEINFO_GET_FIXED_ARITY_ARG_VECTOR(pseudo_type_info);
+            attrib = maybe_attrib((MR_Word *) pseudo_type_info);
             MR_offset_incr_saved_hp(new_pseudo_type_info_arena_word,
                 forwarding_pointer_size,
                 MR_fixed_arity_pseudo_type_info_size(arity)
-                    + forwarding_pointer_size
+                    + forwarding_pointer_size,
+                attrib, NULL
             );
             new_pseudo_type_info_arena = (MR_Word *)
                 new_pseudo_type_info_arena_word;
@@ -985,7 +1018,8 @@ copy_typeclass_info(MR_Word typeclass_info_param,
         MR_offset_incr_saved_hp(new_typeclass_info_word,
             forwarding_pointer_size,
             forwarding_pointer_size + 1 /* for basetypeclass_info */
-            + num_instance_constraints + num_super + num_arg_typeinfos);
+            + num_instance_constraints + num_super + num_arg_typeinfos,
+            NULL, NULL);
         new_typeclass_info = (MR_Word *) new_typeclass_info_word;
 
         new_typeclass_info[0] = (MR_Word) base_typeclass_info;
@@ -1030,3 +1064,26 @@ copy_typeclass_info(MR_Word typeclass_info_param,
         return (MR_Word) new_typeclass_info;
     }
 }
+
+/*
+** Try to return the allocation identifier for the given object, or NULL.
+** If present, allocation identifiers always occupy the first word of an
+** allocated object, with the Mercury cell starting at the second word.
+*/
+#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+static MR_AllocSiteInfoPtr
+maybe_attrib(MR_Word *data_value)
+{
+    MR_Word *base;
+
+    /* Strings are not always aligned, for example. */
+    if (MR_tag(data_value) == 0) {
+        base = GC_base(data_value);
+        if (&base[1] == data_value) {
+            return (MR_AllocSiteInfoPtr) *base;
+        }
+    }
+
+    return NULL;
+}
+#endif
diff --git a/runtime/mercury_dlist.c b/runtime/mercury_dlist.c
index c1330a8..92da651 100644
--- a/runtime/mercury_dlist.c
+++ b/runtime/mercury_dlist.c
@@ -21,7 +21,7 @@ MR_dlist_makelist0(void)
 {
     MR_Dlist    *list;
 
-    list = MR_GC_NEW(MR_Dlist);
+    list = MR_GC_NEW_ATTRIB(MR_Dlist, MR_ALLOC_SITE_RUNTIME);
     MR_dlist_data(list) = NULL;
     MR_dlist_next(list) = list;
     MR_dlist_prev(list) = list;
@@ -49,7 +49,7 @@ MR_dlist_addhead(MR_Dlist *list, const void *data)
         list = MR_dlist_makelist0();
     }
 
-    item = MR_GC_NEW(MR_Dlist);
+    item = MR_GC_NEW_ATTRIB(MR_Dlist, MR_ALLOC_SITE_RUNTIME);
     MR_dlist_data(item) = data;
     MR_dlist_length_field(list)++;
 
@@ -72,7 +72,7 @@ MR_dlist_addtail(MR_Dlist *list, const void *data)
         list = MR_dlist_makelist0();
     }
 
-    item = MR_GC_NEW(MR_Dlist);
+    item = MR_GC_NEW_ATTRIB(MR_Dlist, MR_ALLOC_SITE_RUNTIME);
     MR_dlist_data(item) = data;
     MR_dlist_length_field(list)++;
 
@@ -147,7 +147,7 @@ MR_dlist_insert_before(MR_Dlist *list, MR_Dlist *where, const void *data)
 {
     MR_Dlist    *item;
 
-    item = MR_GC_NEW(MR_Dlist);
+    item = MR_GC_NEW_ATTRIB(MR_Dlist, MR_ALLOC_SITE_RUNTIME);
     MR_dlist_data(item) = data;
     MR_dlist_length_field(list)++;
 
@@ -164,7 +164,7 @@ MR_dlist_insert_after(MR_Dlist *list, MR_Dlist *where, const void *data)
 {
     MR_Dlist    *item;
 
-    item = MR_GC_NEW(MR_Dlist);
+    item = MR_GC_NEW_ATTRIB(MR_Dlist, MR_ALLOC_SITE_RUNTIME);
     MR_dlist_data(item) = data;
     MR_dlist_length_field(list)++;
 
diff --git a/runtime/mercury_engine.c b/runtime/mercury_engine.c
index 1a6a270..dd9c479 100644
--- a/runtime/mercury_engine.c
+++ b/runtime/mercury_engine.c
@@ -190,7 +190,8 @@ MR_create_engine(void)
     ** since the engine pointer will normally be stored in thread-local
     ** storage, which is not traced by the conservative garbage collector.
     */
-    eng = MR_GC_NEW_UNCOLLECTABLE(MercuryEngine);
+    eng = MR_GC_NEW_UNCOLLECTABLE_ATTRIB(MercuryEngine,
+        MR_ALLOC_SITE_RUNTIME);
     MR_init_engine(eng);
     return eng;
 }
@@ -199,7 +200,7 @@ void
 MR_destroy_engine(MercuryEngine *eng)
 {
     MR_finalize_engine(eng);
-    MR_GC_free(eng);
+    MR_GC_free_attrib(eng);
 }
 
 /*---------------------------------------------------------------------------*/
@@ -485,7 +486,7 @@ dummy_label:
     if (MR_ENGINE(MR_eng_this_context) != NULL) {
         MR_SavedOwner *owner;
 
-        owner = MR_GC_NEW(MR_SavedOwner);
+        owner = MR_GC_NEW_ATTRIB(MR_SavedOwner, MR_ALLOC_SITE_RUNTIME);
         owner->MR_saved_owner_engine = MR_ENGINE(MR_eng_id);
         owner->MR_saved_owner_c_depth = MR_ENGINE(MR_eng_c_depth);
         owner->MR_saved_owner_next =
@@ -521,7 +522,7 @@ MR_define_label(engine_done);
         if ((owner->MR_saved_owner_engine == MR_ENGINE(MR_eng_id)) &&
             owner->MR_saved_owner_c_depth == MR_ENGINE(MR_eng_c_depth))
         {
-            MR_GC_free(owner);
+            MR_GC_free_attrib(owner);
             MR_GOTO_LABEL(engine_done_2);
         }
 
@@ -533,7 +534,7 @@ MR_define_label(engine_done);
         this_ctxt->MR_ctxt_resume_owner_engine = owner->MR_saved_owner_engine;
         this_ctxt->MR_ctxt_resume_c_depth = owner->MR_saved_owner_c_depth;
         this_ctxt->MR_ctxt_resume_engine_required = MR_TRUE;
-        MR_GC_free(owner);
+        MR_GC_free_attrib(owner);
         MR_schedule_context(this_ctxt);
 
         MR_ENGINE(MR_eng_this_context) = NULL;
diff --git a/runtime/mercury_float.h b/runtime/mercury_float.h
index bb61b98..b358b15 100644
--- a/runtime/mercury_float.h
+++ b/runtime/mercury_float.h
@@ -26,19 +26,21 @@
 #endif
 #define MR_FLT_MAX_PRECISION	(MR_FLT_MIN_PRECISION + 2)
 
+#define MR_FLOAT_WORDS		((sizeof(MR_Float) + sizeof(MR_Word) - 1) \
+					/ sizeof(MR_Word))
+
 #ifdef MR_BOXED_FLOAT 
 
 #define MR_word_to_float(w)	(* (MR_Float *) (w))
 
-#define MR_FLOAT_WORDS		((sizeof(MR_Float) + sizeof(MR_Word) - 1) \
-					/ sizeof(MR_Word))
-
 #ifdef MR_CONSERVATIVE_GC
-  #define MR_float_to_word(f) ( \
-		MR_hp_alloc_atomic(MR_FLOAT_WORDS), \
-		* (MR_Float *) (void *) (MR_hp - MR_FLOAT_WORDS) = (f), \
-		/* return */ (MR_Word) (MR_hp - MR_FLOAT_WORDS) \
-	)
+  #define MR_float_to_word(f)                                               \
+    (                                                                       \
+	MR_hp_alloc_atomic_msg(MR_FLOAT_WORDS,                              \
+	    (MR_AllocSiteInfoPtr) MR_ALLOC_SITE_FLOAT, NULL),               \
+	* (MR_Float *) (void *) (MR_hp - MR_FLOAT_WORDS) = (f),             \
+	/* return */ (MR_Word) (MR_hp - MR_FLOAT_WORDS)                     \
+    )
   #define MR_make_hp_float_aligned() ((void)0)
 #else
   /*
@@ -48,16 +50,19 @@
   ** XXX This code assumes that sizeof(MR_Float) is a power of two,
   ** and not greater than 2 * sizeof(MR_Word).
   */
-  #define MR_make_hp_float_aligned() ( \
-		( (MR_Word) MR_hp & (sizeof(MR_Float) - 1) ? \
-			MR_hp_alloc_atomic(1) : (void)0 ) \
-	)
-  #define MR_float_to_word(f) ( \
-		MR_make_hp_float_aligned(), \
-		MR_hp_alloc_atomic(MR_FLOAT_WORDS), \
-		* (MR_Float *) (void *)(MR_hp - MR_FLOAT_WORDS) = (f), \
-		/* return */ (MR_Word) (MR_hp - MR_FLOAT_WORDS) \
-	)
+  #define MR_make_hp_float_aligned()                                        \
+    ( (MR_Word) MR_hp & (sizeof(MR_Float) - 1) ?                            \
+	MR_hp_alloc_atomic_msg(1, MR_ALLOC_SITE_FLOAT, NULL)                \
+    :                                                                       \
+	(void)0                                                             \
+    )
+  #define MR_float_to_word(f)                                               \
+    (                                                                       \
+	MR_make_hp_float_aligned(),                                         \
+	MR_hp_alloc_atomic_msg(MR_FLOAT_WORDS, MR_ALLOC_SITE_FLOAT, NULL),  \
+	* (MR_Float *) (void *)(MR_hp - MR_FLOAT_WORDS) = (f),              \
+	/* return */ (MR_Word) (MR_hp - MR_FLOAT_WORDS)                     \
+    )
 #endif
 
 #ifdef __GNUC__
@@ -108,12 +113,12 @@
 */
 #define MR_SPRINTF_FLOAT_BUF_SIZE   80
 
-#define MR_float_to_string(Float, String)			\
-	do {							\
-		char buf[MR_SPRINTF_FLOAT_BUF_SIZE];		\
-		MR_sprintf_float(buf, Float);			\
-		MR_make_aligned_string_copy(Str, buf);		\
-	} while (0)
+#define MR_float_to_string(Float, String, alloc_id)                         \
+    do {                                                                    \
+	char buf[MR_SPRINTF_FLOAT_BUF_SIZE];                                \
+	MR_sprintf_float(buf, Float);                                       \
+	MR_make_aligned_string_copy_msg(Str, buf, (alloc_id));              \
+    } while (0)
 
 void MR_sprintf_float(char *buf, MR_Float f);
 
diff --git a/runtime/mercury_hash_table.c b/runtime/mercury_hash_table.c
index 1b3391a..fe4dc84 100644
--- a/runtime/mercury_hash_table.c
+++ b/runtime/mercury_hash_table.c
@@ -28,7 +28,8 @@ MR_ht_init_table(MR_Hash_Table *table)
 {
     int i;
 
-    table->MR_ht_store = MR_GC_NEW_ARRAY(MR_Dlist *, table->MR_ht_size);
+    table->MR_ht_store = MR_GC_NEW_ARRAY_ATTRIB(MR_Dlist *, table->MR_ht_size,
+        MR_ALLOC_SITE_RUNTIME);
 
     for (i = 0; i < table->MR_ht_size; i++) {
         table->MR_ht_store[i] = NULL;
diff --git a/runtime/mercury_heap.h b/runtime/mercury_heap.h
index a19e293..c2fc74e 100644
--- a/runtime/mercury_heap.h
+++ b/runtime/mercury_heap.h
@@ -18,10 +18,6 @@
 ** reason for this is MR_float_to_word, which is used not just as an operand
 ** in expressions, but also as an initializer in static cells generated
 ** by the compiler.
-**
-** Note: the macros that take a proclabel as argument do not put parentheses
-** around it. The reason is that we may need to put the `_entry_' prefix
-** in front of the label name, which wouldn't work if it was parenthesized.
 */
 
 #ifndef MERCURY_HEAP_H
@@ -111,7 +107,7 @@
                         + (offset))),                                       \
                 MR_debug_tag_offset_incr_hp_base((dest), (tag), (offset),   \
                     (count), (is_atomic)),                                  \
-                /* return */ (dest)                                         \
+                ((void) 0)                                                  \
             )
 
   #define   MR_tag_offset_incr_hp_n(dest, tag, offset, count)               \
@@ -150,6 +146,9 @@
       */
       #error "MR_INLINE_ALLOC requires the use of GCC"
     #endif
+    #ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+      #error "MR_INLINE_ALLOC and MR_MPROF_PROFILE_MEMORY_ATTRIBUTION both defined"
+    #endif
 
     #include "gc_inline.h"
     #define MR_tag_offset_incr_hp(dest, tag, offset, count)                 \
@@ -240,30 +239,62 @@
 
 /*
 ** The second level of heap allocation macros. These are concerned with
-** recording profiling information for profmem grades.
+** recording profiling information for memory profiling grades. Memory
+** attribution profiling adds an extra word at the start of each object.
 */
 
 #if defined(MR_MPROF_PROFILE_MEMORY)
-  #define   MR_profmem_record_allocation(count, proclabel, type)            \
-            MR_record_allocation((count), MR_ENTRY(proclabel),              \
-                MR_STRINGIFY(proclabel), (type))
+  #define   MR_profmem_record_allocation(count, alloc_id, type)             \
+            MR_record_allocation((count), (alloc_id), (type))
 #else
-  #define   MR_profmem_record_allocation(count, proclabel, type)            \
+  #define   MR_profmem_record_allocation(count, alloc_id, type)             \
+            ((void) 0)
+#endif
+
+#if defined(MR_MPROF_PROFILE_MEMORY_ATTRIBUTION)
+  #define   MR_profmem_attrib_word   (1)
+  #define   MR_profmem_set_attrib(dest, tag, alloc_id)                      \
+            ((MR_Word *) MR_strip_tag(dest))[-1] = (MR_Word) (alloc_id)
+            /*
+            ** XXX This version causes gcc 4.4.4 on x86 to abort when
+            ** compiling mercury_bitmap.c.
+            */
+            /* MR_field((tag), (dest), 0) = (MR_Word) (alloc_id) */
+            /*
+            ** Hand-written code must set the MR_asi_type field at runtime.
+            ** When the type argument is NULL, as it is for generated code,
+            ** the C compiler can optimise away the condition and assignment.
+            */
+  #define   MR_profmem_set_alloc_type(alloc_id, type)                       \
+            ((alloc_id) != NULL && (type) != NULL &&                        \
+             (((MR_AllocSiteInfo *) (alloc_id))->MR_asi_type = (type)))
+#else
+  #define   MR_profmem_attrib_word   (0)
+  #define   MR_profmem_set_attrib(dest, tag, alloc_id)                      \
+            ((void) 0)
+  #define   MR_profmem_set_alloc_type(alloc_id, type)                       \
             ((void) 0)
 #endif
 
 #define     MR_tag_offset_incr_hp_msg(dest, tag, offset, count,             \
-                proclabel, type)                                            \
+                alloc_id, type)                                             \
             (                                                               \
-                MR_profmem_record_allocation((count), proclabel, (type)),   \
-                MR_tag_offset_incr_hp((dest), (tag), (offset), (count))     \
+                MR_tag_offset_incr_hp((dest), (tag),                        \
+                    (offset) + MR_profmem_attrib_word,                      \
+                    (count) + MR_profmem_attrib_word),                      \
+                MR_profmem_set_attrib((dest), (tag), (alloc_id)),           \
+                MR_profmem_set_alloc_type((alloc_id), (type)),              \
+                MR_profmem_record_allocation((count), (alloc_id), (type))   \
             )
 #define     MR_tag_offset_incr_hp_atomic_msg(dest, tag, offset, count,      \
-                proclabel, type)                                            \
+                alloc_id, type)                                             \
             (                                                               \
-                MR_profmem_record_allocation((count), proclabel, (type)),   \
-                MR_tag_offset_incr_hp_atomic((dest), (tag), (offset),       \
-                    (count))                                                \
+                MR_tag_offset_incr_hp_atomic((dest), (tag),                 \
+                    (offset) + MR_profmem_attrib_word,                      \
+                    (count) + MR_profmem_attrib_word),                      \
+                MR_profmem_set_attrib((dest), (tag), (alloc_id)),           \
+                MR_profmem_set_alloc_type((alloc_id), (type)),              \
+                MR_profmem_record_allocation((count), (alloc_id), (type))   \
             )
 
 /***************************************************************************/
@@ -277,12 +308,12 @@
             MR_tag_offset_incr_hp((dest), (tag), 0, (count))
 #define     MR_tag_incr_hp_atomic(dest, tag, count)                         \
             MR_tag_offset_incr_hp_atomic((dest), (tag), 0, (count))
-#define     MR_tag_incr_hp_msg(dest, tag, count, proclabel, type)           \
+#define     MR_tag_incr_hp_msg(dest, tag, count, alloc_id, type)            \
             MR_tag_offset_incr_hp_msg((dest), (tag), 0, (count),            \
-                proclabel, (type))
-#define     MR_tag_incr_hp_atomic_msg(dest, tag, count, proclabel, type)    \
+                (alloc_id), (type))
+#define     MR_tag_incr_hp_atomic_msg(dest, tag, count, alloc_id, type)     \
             MR_tag_offset_incr_hp_atomic_msg((dest), (tag), 0, (count),     \
-                proclabel, (type))
+                (alloc_id), (type))
 
 /*
 ** The MR_offset_incr_hp*() macros are defined in terms of the
@@ -291,16 +322,16 @@
 
 #define     MR_offset_incr_hp(dest, offset, count)                          \
             MR_tag_offset_incr_hp((dest), MR_mktag(0), (offset), (count))
-#define     MR_offset_incr_hp_msg(dest, offset, count, proclabel, type)     \
+#define     MR_offset_incr_hp_msg(dest, offset, count, alloc_id, type)      \
             MR_tag_offset_incr_hp_msg((dest), MR_mktag(0),                  \
-                (offset), (count), proclabel, (type))
+                (offset), (count), (alloc_id), (type))
 #define     MR_offset_incr_hp_atomic(dest, offset, count)                   \
             MR_tag_offset_incr_hp_atomic((dest), MR_mktag(0), (offset),     \
                 (count))
-#define     MR_offset_incr_hp_atomic_msg(dest, offset, count, proclabel,    \
+#define     MR_offset_incr_hp_atomic_msg(dest, offset, count, alloc_id,     \
                 type)                                                       \
             MR_tag_offset_incr_hp_atomic_msg((dest), MR_mktag(0), (offset), \
-                (count), proclabel, (type))
+                (count), (alloc_id), (type))
 
 #ifdef  MR_CONSERVATIVE_GC
             /* we use `MR_hp_word' as a convenient temporary here */
@@ -309,18 +340,19 @@
                 MR_hp_word = (MR_Word) (MR_hp + (count)),                   \
                 (void) 0                                                    \
             )
-  #define   MR_hp_alloc_atomic(count) (                                     \
-                MR_offset_incr_hp_atomic(MR_hp_word, 0, (count)),           \
+  #define   MR_hp_alloc_atomic_msg(count, alloc_id, type) (                 \
+                MR_offset_incr_hp_atomic_msg(MR_hp_word, 0, (count),        \
+                    (alloc_id), (type)),                                    \
                 MR_hp_word = (MR_Word) (MR_hp + (count)),                   \
                 (void) 0                                                    \
             )
-
 #else /* !MR_CONSERVATIVE_GC */
 
   #define   MR_hp_alloc(count)                                              \
             MR_offset_incr_hp(MR_hp_word, 0, (count))
-  #define   MR_hp_alloc_atomic(count)                                       \
-            MR_offset_incr_hp_atomic(MR_hp_word, 0, (count))
+  #define   MR_hp_alloc_atomic_msg(count, alloc_id, type)                   \
+            MR_offset_incr_hp_atomic_msg(MR_hp_word, 0, (count),            \
+                (alloc_id), (type))
 
 #endif /* MR_CONSERVATIVE_GC */
 
@@ -333,14 +365,15 @@
 
 #ifndef MR_RECORD_TERM_SIZES
 
-#define     MR_incr_hp(dest, count)                                         \
+#define     MR_incr_hp(dest, count)                                          \
             MR_offset_incr_hp((dest), 0, (count))
-#define     MR_incr_hp_msg(dest, count, proclabel, type)                    \
-            MR_offset_incr_hp_msg((dest), 0, (count), proclabel, (type))
-#define     MR_incr_hp_atomic(dest, count)                                  \
+#define     MR_incr_hp_msg(dest, count, alloc_id, type)                      \
+            MR_offset_incr_hp_msg((dest), 0, (count), (alloc_id), (type))
+#define     MR_incr_hp_atomic(dest, count)                                   \
             MR_offset_incr_hp_atomic((dest), 0, (count))
-#define     MR_incr_hp_atomic_msg(dest, count, proclabel, type)             \
-            MR_offset_incr_hp_atomic_msg((dest), 0, (count), proclabel, (type))
+#define     MR_incr_hp_atomic_msg(dest, count, alloc_id, type)               \
+            MR_offset_incr_hp_atomic_msg((dest), 0, (count), (alloc_id),     \
+                (type))
 
 #endif
 
@@ -351,15 +384,19 @@
                     (MR_bytes_to_words(sizeof(typename))));                 \
                 (dest) = (typename *) tmp;                                  \
             } while (0)
-#define     MR_incr_hp_type_msg(dest, typename, proclabel, type)            \
+#define     MR_incr_hp_type_msg(dest, typename, alloc_id, type)             \
             do {                                                            \
                 MR_Word tmp;                                                \
                 MR_tag_incr_hp_msg(tmp, MR_mktag(0),                        \
                     (MR_bytes_to_words(sizeof(typename))),                  \
-                    proclabel, (type));                                     \
+                    (alloc_id), (type));                                    \
                 (dest) = (typename *) tmp;                                  \
             } while (0)
 
+/*
+** These are only used by the compiler in non-memory profiling grades,
+** so do not have _msg equivalents.  Avoid these in hand-written code.
+*/
 #define     MR_alloc_heap(dest, count)                                      \
             MR_tag_offset_incr_hp((dest), MR_mktag(0), 0, (count))
 #define     MR_alloc_heap_atomic(dest, count)                               \
@@ -492,11 +529,12 @@
                 ** This assumes that we don't keep term sizes               \
                 ** in grades that use boxes.                                \
                 */                                                          \
-                MR_offset_incr_hp(box_word, 0, size_in_words);              \
+                MR_offset_incr_hp_msg(box_word, 0, size_in_words,           \
+                    MR_ALLOC_SITE_FOREIGN, NULL);                    \
                 box = (MR_Box) box_word;                                    \
                 MR_assign_structure(*(T *)(box), (value));                  \
-                MR_profmem_record_allocation(size_in_words,                   \
-                    "", "foreign type: " MR_STRINGIFY(T));                  \
+                MR_profmem_record_allocation(size_in_words, NULL,           \
+                    "foreign type: " MR_STRINGIFY(T));                      \
             } else {                                                        \
                 /* We can't take the address of `box' here, */              \
                 /* since it might be a global register. */                  \
@@ -560,7 +598,8 @@ MR_create1_func(MR_Word w1)
 {
     MR_Word *p;
 
-    p = (MR_Word *) MR_new_object(MR_Word, 1 * sizeof(MR_Word), "create1");
+    p = (MR_Word *) MR_new_object(MR_Word, 1 * sizeof(MR_Word),
+        NULL, "create1");
     p[0] = w1;
     return (MR_Word) p;
 }
@@ -570,7 +609,8 @@ MR_create2_func(MR_Word w1, MR_Word w2)
 {
     MR_Word *p;
 
-    p = (MR_Word *) MR_new_object(MR_Word, 2 * sizeof(MR_Word), "create2");
+    p = (MR_Word *) MR_new_object(MR_Word, 2 * sizeof(MR_Word),
+        NULL, "create2");
     p[0] = w1;
     p[1] = w2;
     return (MR_Word) p;
@@ -581,7 +621,8 @@ MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3)
 {
     MR_Word *p;
 
-    p = (MR_Word *) MR_new_object(MR_Word, 3 * sizeof(MR_Word), "create3");
+    p = (MR_Word *) MR_new_object(MR_Word, 3 * sizeof(MR_Word),
+        NULL, "create3");
     p[0] = w1;
     p[1] = w2;
     p[2] = w3;
@@ -595,11 +636,11 @@ MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3)
   #define   MR_create3(ti1, w1, ti2, w2, ti3, w3)                           \
             MR_create3_func((w1), (w2), (w3))
 
-  #define   MR_create1_msg(ti1, w1, proclabel, type)                        \
+  #define   MR_create1_msg(ti1, w1, alloc_id, type)                         \
             MR_create1((ti1), (w1))
-  #define   MR_create2_msg(ti1, w1, ti2, w2, proclabel, type)               \
+  #define   MR_create2_msg(ti1, w1, ti2, w2, alloc_id, type)                \
             MR_create2((ti1), (w1), (ti2), (w2))
-  #define   MR_create3_msg(ti1, w1, ti2, w2, ti3, w3, proclabel, type)      \
+  #define   MR_create3_msg(ti1, w1, ti2, w2, ti3, w3, alloc_id, type)       \
             MR_create3((ti1), (w1), (ti2), (w2), (ti3), (w3))
 
 #else /* ! MR_HIGHLEVEL_CODE */
@@ -625,6 +666,19 @@ MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3)
     #define     MR_fill_create3_size(hp, ti1, w1, ti2, w2, ti3, w3)   0
   #endif
 
+  #ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    #define     MR_fill_create1_origin(hp, alloc_id)                        \
+                (hp[-2] = (MR_Word) (alloc_id))
+    #define     MR_fill_create2_origin(hp, alloc_id)                        \
+                (hp[-3] = (MR_Word) (alloc_id))
+    #define     MR_fill_create3_origin(hp, alloc_id)                        \
+                (hp[-4] = (MR_Word) (alloc_id))
+  #else
+    #define     MR_fill_create1_origin(hp, alloc_id)     ((void) 0)
+    #define     MR_fill_create2_origin(hp, alloc_id)     ((void) 0)
+    #define     MR_fill_create3_origin(hp, alloc_id)     ((void) 0)
+  #endif
+
 /*
 ** Note that gcc optimizes `hp += 2; return hp - 2;'
 ** to `tmp = hp; hp += 2; return tmp;', so we don't need to use
@@ -661,44 +715,50 @@ MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3)
                 MR_hp[-1] = (MR_Word) (w3),                                 \
                 MR_fill_create3_size(MR_hp, ti1, w1, ti2, w2, ti3, w3),     \
                 MR_debugcr3(MR_hp),                                         \
-                /* return */ (MR_Word) (MR_hp - 3)                              \
+                /* return */ (MR_Word) (MR_hp - 3)                          \
             )
 
 /* used only by hand-written code not by the automatically generated code */
-  #define   MR_create1_msg(ti1, w1, proclabel, type)                        \
+  #define   MR_create1_msg(ti1, w1, alloc_id, type)                         \
             (                                                               \
                 MR_profmem_record_allocation(MR_SIZE_SLOT_SIZE + 1,         \
-                    proclabel, (type)),                                     \
-                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 1),                         \
+                    (alloc_id), (type)),                                    \
+                MR_profmem_set_alloc_type((alloc_id), (type)),              \
+                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 1 + MR_profmem_attrib_word),\
                 MR_hp[-1] = (MR_Word) (w1),                                 \
                 MR_fill_create1_size(MR_hp, ti1, w1),                       \
+                MR_fill_create1_origin(MR_hp, (alloc_id)),                  \
                 MR_debugcr1(MR_hp),                                         \
                 /* return */ (MR_Word) (MR_hp - 1)                          \
             )
 
 /* used only by hand-written code not by the automatically generated code */
-  #define   MR_create2_msg(ti1, w1, ti2, w2, proclabel, type)               \
+  #define   MR_create2_msg(ti1, w1, ti2, w2, alloc_id, type)                \
             (                                                               \
                 MR_profmem_record_allocation(MR_SIZE_SLOT_SIZE + 2,         \
-                    proclabel, (type)),                                     \
-                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 2),                         \
+                    (alloc_id), (type)),                                    \
+                MR_profmem_set_alloc_type((alloc_id), (type)),              \
+                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 2 + MR_profmem_attrib_word),\
                 MR_hp[-2] = (MR_Word) (w1),                                 \
                 MR_hp[-1] = (MR_Word) (w2),                                 \
                 MR_fill_create2_size(MR_hp, ti1, w1, ti2, w2),              \
+                MR_fill_create2_origin(MR_hp, (alloc_id)),                  \
                 MR_debugcr2(MR_hp),                                         \
                 /* return */ (MR_Word) (MR_hp - 2)                          \
             )
 
 /* used only by hand-written code not by the automatically generated code */
-  #define   MR_create3_msg(ti1, w1, ti2, w2, ti3, w3, proclabel, type)      \
+  #define   MR_create3_msg(ti1, w1, ti2, w2, ti3, w3, alloc_id, type)       \
             (                                                               \
                 MR_profmem_record_allocation(MR_SIZE_SLOT_SIZE + 3,         \
-                    proclabel, (type)),                                     \
-                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 3),                         \
+                    (alloc_id), (type)),                                    \
+                MR_profmem_set_alloc_type((alloc_id), (type)),              \
+                MR_hp_alloc(MR_SIZE_SLOT_SIZE + 3 + MR_profmem_attrib_word),\
                 MR_hp[-3] = (MR_Word) (w1),                                 \
                 MR_hp[-2] = (MR_Word) (w2),                                 \
                 MR_hp[-1] = (MR_Word) (w3),                                 \
                 MR_fill_create3_size(MR_hp, ti1, w1, ti2, w2, ti3, w3),     \
+                MR_fill_create3_origin(MR_hp, (alloc_id)),                  \
                 MR_debugcr3(MR_hp),                                         \
                 /* return */ (MR_Word) (MR_hp - 3)                          \
             )
@@ -717,17 +777,19 @@ MR_create3_func(MR_Word w1, MR_Word w2, MR_Word w3)
 ** to think about the implications of their code for term size profiling.
 */
 
-#define MR_offset_incr_saved_hp(dest, offset, count)                        \
+#define MR_offset_incr_saved_hp(dest, offset, count, alloc_id, type)        \
         do {                                                                \
             MR_restore_transient_hp();                                      \
-            MR_offset_incr_hp((dest), (offset), (count));                   \
+            MR_offset_incr_hp_msg((dest), (offset), (count),                \
+                (alloc_id), (type));                                        \
             MR_save_transient_hp();                                         \
         } while (0)
 
-#define MR_offset_incr_saved_hp_atomic(dest, offset, count)                 \
+#define MR_offset_incr_saved_hp_atomic(dest, offset, count, alloc_id, type) \
         do {                                                                \
             MR_restore_transient_hp();                                      \
-            MR_offset_incr_hp_atomic((dest), (offset), (count));            \
+            MR_offset_incr_hp_atomic_msg((dest), (offset), (count),         \
+                (alloc_id), (type));                                        \
             MR_save_transient_hp();                                         \
         } while (0)
 
diff --git a/runtime/mercury_heap_profile.c b/runtime/mercury_heap_profile.c
index 06dcc0a..30880e1 100644
--- a/runtime/mercury_heap_profile.c
+++ b/runtime/mercury_heap_profile.c
@@ -2,14 +2,14 @@
 ** vim: ts=4 sw=4 expandtab
 */
 /*
-** Copyright (C) 1997, 1999-2001, 2006 The University of Melbourne.
+** Copyright (C) 1997, 1999-2001, 2006, 2011 The University of Melbourne.
 ** This file may only be copied under the terms of the GNU Library General
 ** Public License - see the file COPYING.LIB in the Mercury distribution.
 */
 
 /*
 ** File: mercury_heap_profile.c.
-** Main authors: zs, fjh
+** Main authors: zs, fjh, wangp.
 **
 ** This module records information about the allocations of cells on the heap.
 **
@@ -27,6 +27,7 @@
 #include "mercury_prof_mem.h"
 #include "mercury_dword.h"
 #include "mercury_heap_profile.h"
+#include "mercury_runtime_util.h"
 
 /* all fields of these variables are initialized to 0 */
 MR_memprof_counter  MR_memprof_overall;
@@ -67,8 +68,8 @@ MR_increment_counter(MR_memprof_counter *counter, int size)
 */
 
 static void
-MR_increment_table_entry(MR_memprof_table *table, const char *name,
-    MR_Code *addr, int size)
+MR_increment_table_entry(MR_memprof_table *table,
+    const MR_Code *proc, const char *type_name, int size)
 {
     MR_bool             found;
     int                 diff;
@@ -76,29 +77,41 @@ MR_increment_table_entry(MR_memprof_table *table, const char *name,
     MR_memprof_record   *node;
 
     /*
-    ** Search the tree to find the node with this name.
+    ** Search the tree either by procedure address or by type name.
     */
     found = MR_FALSE;
     node_addr = &table->root;
-    while ((node = *node_addr) != NULL) {
-        diff = strcmp(name, node->name);
-        if (diff < 0) {
-            node_addr = &node->left;
-        } else if (diff > 0) {
-            node_addr = &node->right;
-        } else {
-            found = MR_TRUE;
-            break;
+    if (proc != NULL) {
+        while ((node = *node_addr) != NULL) {
+            diff = proc - node->proc;
+            if (diff < 0) {
+                node_addr = &node->left;
+            } else if (diff > 0) {
+                node_addr = &node->right;
+            } else {
+                found = MR_TRUE;
+                break;
+            }
+        }
+    } else {
+        while ((node = *node_addr) != NULL) {
+            diff = strcmp(type_name, node->type_name);
+            if (diff < 0) {
+                node_addr = &node->left;
+            } else if (diff > 0) {
+                node_addr = &node->right;
+            } else {
+                found = MR_TRUE;
+                break;
+            }
         }
     }
 
     /*
-    ** If the tree didn't already contain a node with this name,
-    ** create a new node for it.
+    ** If the tree didn't already contain a node with this procedure address or
+    ** type name, create a new node for it.
     */
     if (!found) {
-        char *copy_of_name;
-
         node = MR_PROF_NEW(MR_memprof_record);
         /*
         ** We need to make a fresh copy of the name, rather than just copying
@@ -107,10 +120,15 @@ MR_increment_table_entry(MR_memprof_table *table, const char *name,
         ** be a string literal from a dlopen()'ed module which will later
         ** get dlclose()'d.
         */
-        copy_of_name = MR_PROF_NEW_ARRAY(char, strlen(name) + 1);
-        strcpy(copy_of_name, name);
-        node->name = copy_of_name;
-        node->addr = addr;
+        if (type_name != NULL) {
+            size_t len = strlen(type_name);
+            char *copy_of_name = MR_PROF_NEW_ARRAY(char, len + 1);
+            MR_memcpy(copy_of_name, type_name, len + 1);
+            node->type_name = copy_of_name;
+        } else {
+            node->type_name = NULL;
+        }
+        node->proc = proc;
         node->left = NULL;
         node->right = NULL;
         MR_init_counter(&node->counter);
@@ -130,10 +148,10 @@ MR_increment_table_entry(MR_memprof_table *table, const char *name,
 */
 
 void
-MR_record_allocation(int size, MR_Code *proc_addr,
-    const char *proc_name, const char *type)
+MR_record_allocation(int size, const MR_AllocSiteInfoPtr alloc_id,
+    const char *type)
 {
-    if (!profile_heap) {
+    if (!profile_heap || alloc_id == NULL || alloc_id->MR_asi_proc == NULL) {
         return;
     }
 
@@ -143,8 +161,15 @@ MR_record_allocation(int size, MR_Code *proc_addr,
     ** record the allocation in the per-type table.
     */
     MR_increment_counter(&MR_memprof_overall, size);
-    MR_increment_table_entry(&MR_memprof_procs, proc_name, proc_addr, size);
-    MR_increment_table_entry(&MR_memprof_types, type, NULL, size);
+    MR_increment_table_entry(&MR_memprof_procs, alloc_id->MR_asi_proc, NULL,
+        size);
+    if (type == NULL) {
+        type = alloc_id->MR_asi_type;
+        if (type == NULL) {
+            type = "unknown";
+        }
+    }
+    MR_increment_table_entry(&MR_memprof_types, NULL, type, size);
 }
 
 void
@@ -158,3 +183,393 @@ MR_prof_turn_off_heap_profiling(void)
 {
     profile_heap = MR_FALSE;
 }
+
+/*---------------------------------------------------------------------------*/
+/*
+** Memory attribution profiling
+**
+** Memory attribution profiling actually bears no particular relationship to
+** regular memory profiling, but in the interests of reducing the number of
+** grades both are lumped under the `--memory-profiling' option.
+**
+** For every memory cell, we allocate an extra memory word at the front of the
+** object which points to an MR_AllocSiteInfo structure, indicating the
+** procedure which allocated the cell, its type, and its true (desired) size.
+** We call that the "attribution".
+**
+** When `benchmarking.report_memory_attribution' is called we force a GC.
+** Using hooks inserted into Boehm GC, a function is called for every live
+** memory object on the heap.  During that callback we increment the counters
+** in a hash table (for attributed objects), or counters in a binary tree (for
+** unattributed objects, or Mercury runtime objects that we don't care to
+** distinguish).
+*/
+
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+
+typedef struct MR_AttribCount_Struct    MR_AttribCount;
+typedef struct MR_VarSizeCount_Struct   MR_VarSizeCount;
+
+/*
+** Counts for attributed objects.
+*/
+struct MR_AttribCount_Struct {
+    unsigned                MR_atc_id;
+    MR_AllocSiteInfo const  *MR_atc_alloc_site;
+    size_t                  MR_atc_num_cells;
+    size_t                  MR_atc_num_words;
+};
+
+/*
+** Objects which are unattributed, or explicitly attributed as runtime
+** structures, may come in many different sizes.  We store the counters for
+** each different size as a separate node in a binary search tree.
+*/
+struct MR_VarSizeCount_Struct {
+    size_t              MR_vsc_size;
+    size_t              MR_vsc_count;
+    MR_VarSizeCount     *MR_vsc_left;
+    MR_VarSizeCount     *MR_vsc_right;
+};
+
+#define SNAPSHOTS_FILENAME              "Prof.Snapshots"
+#define KNOWN_COUNT_TABLE_INITIAL_SIZE  (1 << 8)    /* power of two */
+
+static MR_AttribCount   *attrib_count_table;
+static size_t           attrib_count_table_size;
+static size_t           attrib_count_table_used;
+static MR_VarSizeCount  *runtime_count_tree;
+static MR_VarSizeCount  *unknown_count_tree;
+static int              snapshot_counter;
+static const char       *snapshot_label;
+static FILE             *snapshot_file;
+
+static void     add_attrib_count_entry(MR_AttribCount *table,
+                    size_t table_size, size_t *table_used, unsigned id,
+                    const MR_AllocSiteInfo *alloc_site);
+static void     rehash_attrib_count_table(void);
+static unsigned hash_addr(MR_Word key);
+static void     reachable_object_callback(GC_word *p, size_t words);
+static MR_bool  increment_attrib_count(MR_Word addr, size_t num_words);
+static void     increment_var_size_count(MR_VarSizeCount **node, size_t words);
+static void     stop_collect_callback(void);
+static void     write_attrib_counts(FILE *fp, MR_AttribCount *table,
+                    size_t table_size);
+static void     write_var_size_counts(FILE *fp, const char *prefix,
+                    MR_VarSizeCount *node);
+static const char *maybe_filename(const char *s);
+
+#define MR_NUM_BUILTIN_ALLOC_SITES  6
+
+MR_AllocSiteInfo MR_builtin_alloc_sites[MR_NUM_BUILTIN_ALLOC_SITES] = {
+    /* These must match the macros in mercury_memory.h. */
+    { NULL, "runtime", 0, "<runtime structs>",  0 },
+    { NULL, "unknown", 0, "float.float/0",      MR_FLOAT_WORDS },
+    { NULL, "unknown", 0, "string.string/0",    0 },
+    { NULL, "unknown", 0, "<foreign>",          0 },
+    { NULL, "unknown", 0, "<tabling structs>",  0 },
+    { NULL, "unknown", 0, "type_info/0",        0 }
+};
+
+void
+MR_register_alloc_sites(const MR_AllocSiteInfo *alloc_sites, int size)
+{
+    size_t      bytes;
+    unsigned    id;
+    int         i;
+
+    if (attrib_count_table == NULL) {
+        /* We must not use GC allocation here. */
+        attrib_count_table_size = KNOWN_COUNT_TABLE_INITIAL_SIZE;
+        bytes = attrib_count_table_size * sizeof(MR_AttribCount);
+        attrib_count_table = MR_malloc(bytes);
+        memset(attrib_count_table, 0, bytes);
+
+        MR_register_alloc_sites(MR_builtin_alloc_sites,
+            MR_NUM_BUILTIN_ALLOC_SITES);
+    }
+
+    for (i = 0; i < size; i++) {
+        /* Enlarge the hash table if necessary. */
+        if (attrib_count_table_size > 0 &&
+            2 * attrib_count_table_used >= attrib_count_table_size)
+        {
+            rehash_attrib_count_table();
+        }
+
+        id = attrib_count_table_used + 1;
+        add_attrib_count_entry(attrib_count_table, attrib_count_table_size,
+            &attrib_count_table_used, id, &alloc_sites[i]);
+    }
+}
+
+static void
+add_attrib_count_entry(MR_AttribCount *table, size_t table_size,
+    size_t *table_used, unsigned id, const MR_AllocSiteInfo *alloc_site)
+{
+    MR_AttribCount   *entry;
+    unsigned        i;
+
+    i = hash_addr((MR_Word) alloc_site) & (table_size - 1);
+    for (;;) {
+        assert(i < table_size);
+        entry = &table[i];
+        if (entry->MR_atc_alloc_site == alloc_site) {
+            break;
+        }
+        if (entry->MR_atc_alloc_site == NULL) {
+            entry->MR_atc_id = id;
+            entry->MR_atc_alloc_site = alloc_site;
+            (*table_used)++;
+            break;
+        }
+        i = (i + 1) & (table_size - 1);
+    }
+}
+
+static void
+rehash_attrib_count_table(void)
+{
+    MR_AttribCount    *new_table;
+    size_t     new_size;
+    size_t     new_used;
+    size_t     i;
+
+    new_size = attrib_count_table_size * 2;
+    new_table = MR_malloc(new_size * sizeof(MR_AttribCount));
+    memset(new_table, 0, new_size * sizeof(MR_AttribCount));
+
+    new_used = 0;
+    for (i = 0; i < attrib_count_table_size; i++) {
+        if (attrib_count_table[i].MR_atc_alloc_site != NULL) {
+            add_attrib_count_entry(new_table, new_size, &new_used,
+                attrib_count_table[i].MR_atc_id,
+                attrib_count_table[i].MR_atc_alloc_site);
+        }
+    }
+
+    MR_free(attrib_count_table);
+
+    attrib_count_table_size = new_size;
+    attrib_count_table = new_table;
+    assert(attrib_count_table_used == new_used);
+}
+
+/* http://www.concentric.net/~ttwang/tech/inthash.htm */
+static unsigned
+hash_addr(MR_Word key)
+{
+    if (sizeof(MR_Word) == 4) {
+        unsigned c2 = 0x27d4eb2d; /* a prime or an odd constant */
+        key = (key ^ 61) ^ (key >> 16);
+        key = key + (key << 3);
+        key = key ^ (key >> 4);
+        key = key * c2;
+        key = key ^ (key >> 15);
+        return key;
+    } else {
+        key = (~key) + (key << 18);
+        key = key ^ (key >> 31);
+        key = key * 21;
+        key = key ^ (key >> 11);
+        key = key + (key << 6);
+        key = key ^ (key >> 22);
+        return (unsigned) key;
+    }
+}
+
+void
+MR_report_memory_attribution(const char *label)
+{
+#ifdef MR_BOEHM_GC
+    void *old_reachable_cb = GC_mercury_callback_reachable_object;
+    void *old_stop_cb = GC_mercury_callback_stop_collect;
+
+    GC_mercury_callback_reachable_object = reachable_object_callback;
+    GC_mercury_callback_stop_collect = stop_collect_callback;
+    snapshot_label = label;
+
+  #ifndef MR_HIGHLEVEL_CODE
+	/* clear out the stacks and registers before garbage collecting */
+	MR_clear_zone_for_GC(MR_CONTEXT(MR_ctxt_detstack_zone), MR_sp + 1);
+	MR_clear_zone_for_GC(MR_CONTEXT(MR_ctxt_nondetstack_zone),
+		MR_maxfr + 1);
+	MR_clear_regs_for_GC();
+  #endif
+
+    GC_gcollect();
+
+    GC_mercury_callback_reachable_object = old_reachable_cb;
+    GC_mercury_callback_stop_collect = old_stop_cb;
+    snapshot_label = NULL;
+#endif
+}
+
+static void
+reachable_object_callback(GC_word *p, size_t words)
+{
+    MR_Word addr;
+
+    addr = (MR_Word) p[0];
+
+    if ((void *) addr == MR_ALLOC_SITE_RUNTIME) {
+        increment_var_size_count(&runtime_count_tree, words);
+        return;
+    }
+
+    if (addr == (MR_Word) NULL ||
+        !increment_attrib_count(addr, words))
+    {
+        increment_var_size_count(&unknown_count_tree, words);
+    }
+}
+
+static MR_bool
+increment_attrib_count(MR_Word addr, size_t num_words)
+{
+    MR_AttribCount   *entry;
+    MR_Unsigned     orig;
+    MR_Unsigned     i;
+
+    orig = i = hash_addr(addr) & (attrib_count_table_size - 1);
+    do {
+        assert(i < attrib_count_table_size);
+        entry = &attrib_count_table[i];
+        if ((MR_Word) entry->MR_atc_alloc_site == addr) {
+            entry->MR_atc_num_cells++;
+            entry->MR_atc_num_words += num_words;
+            return MR_TRUE;
+        }
+        if (entry->MR_atc_alloc_site == NULL) {
+            return MR_FALSE;
+        }
+        i = (i + 1) & (attrib_count_table_size - 1);
+    } while (i != orig);
+
+    return MR_FALSE;
+}
+
+static void
+increment_var_size_count(MR_VarSizeCount **node, size_t words)
+{
+    while (*node != NULL) {
+        if ((*node)->MR_vsc_size == words) {
+            (*node)->MR_vsc_count++;
+            return;
+        } else if (words < (*node)->MR_vsc_size) {
+            node = &(*node)->MR_vsc_left;
+        } else {
+            node = &(*node)->MR_vsc_right;
+        }
+    }
+
+    /* We must not use GC allocation here. */
+    *node = MR_NEW(MR_VarSizeCount);
+    (*node)->MR_vsc_size = words;
+    (*node)->MR_vsc_count = 1;
+    (*node)->MR_vsc_left = NULL;
+    (*node)->MR_vsc_right = NULL;
+}
+
+static void
+stop_collect_callback(void)
+{
+    if (snapshot_file == NULL) {
+        snapshot_file = MR_checked_fopen(SNAPSHOTS_FILENAME, "create", "w");
+    }
+
+    snapshot_counter++;
+    fprintf(snapshot_file, "start [%d] %s\n",
+        snapshot_counter, snapshot_label);
+
+    write_attrib_counts(snapshot_file, attrib_count_table,
+        attrib_count_table_size);
+    write_var_size_counts(snapshot_file, "runtime", runtime_count_tree);
+    write_var_size_counts(snapshot_file, "unknown", unknown_count_tree);
+
+    fprintf(snapshot_file, "end [%d] %s\n",
+        snapshot_counter, snapshot_label);
+}
+
+static void
+write_attrib_counts(FILE *fp, MR_AttribCount *table, size_t table_size)
+{
+    size_t i;
+
+    for (i = 0; i < table_size; i++) {
+        if (table[i].MR_atc_alloc_site != NULL &&
+            table[i].MR_atc_num_cells != 0)
+        {
+            fprintf(fp, "%d %lu %lu\n",
+                table[i].MR_atc_id,
+                table[i].MR_atc_num_cells,
+                table[i].MR_atc_num_words);
+
+            table[i].MR_atc_num_cells = 0;
+            table[i].MR_atc_num_words = 0;
+        }
+    }
+}
+
+static void
+write_var_size_counts(FILE *fp, const char *prefix, MR_VarSizeCount *node)
+{
+    while (node != NULL) {
+        write_var_size_counts(fp, prefix, node->MR_vsc_left);
+
+        if (node->MR_vsc_count != 0) {
+            fprintf(fp, "%s %ld %ld\n",
+                prefix,
+                node->MR_vsc_count,
+                node->MR_vsc_size);
+            node->MR_vsc_count = 0;
+        }
+
+        node = node->MR_vsc_right;
+    }
+}
+
+void
+MR_finish_prof_snapshots_file(void)
+{
+    FILE                    *fp;
+    const MR_AllocSiteInfo  *site;
+    size_t                  i;
+
+    if (!(fp = snapshot_file)) {
+        return;
+    }
+
+    fprintf(fp, "size_map");
+    GC_mercury_write_size_map(fp);
+    fprintf(fp, "\n");
+
+    for (i = 0; i < attrib_count_table_size; i++) {
+        site = attrib_count_table[i].MR_atc_alloc_site;
+        if (site != NULL) {
+            fprintf(fp, "%d\t", attrib_count_table[i].MR_atc_id);
+            fprintf(fp, "%s\t", MR_lookup_entry_or_internal(site->MR_asi_proc));
+            fprintf(fp, "%s\t", maybe_filename(site->MR_asi_file_name));
+            fprintf(fp, "%d\t", site->MR_asi_line_number);
+            fprintf(fp, "%s\t", site->MR_asi_type);
+            fprintf(fp, "%d\n", site->MR_asi_words);
+        }
+    }
+
+    MR_checked_fclose(snapshot_file, SNAPSHOTS_FILENAME);
+    snapshot_file = NULL;
+}
+
+static const char *
+maybe_filename(const char *s)
+{
+    if (s == NULL || s[0] == '\0') {
+        return "(unknown)";
+    } else {
+        return s;
+    }
+}
+
+#endif  /* MR_MPROF_PROFILE_MEMORY_ATTRIBUTION */
+
+/*---------------------------------------------------------------------------*/
diff --git a/runtime/mercury_heap_profile.h b/runtime/mercury_heap_profile.h
index df8752e..e31f366 100644
--- a/runtime/mercury_heap_profile.h
+++ b/runtime/mercury_heap_profile.h
@@ -62,8 +62,8 @@ typedef	struct MR_memprof_counter
 /* type representing a binary tree node */
 typedef	struct MR_memprof_record
 {
-	const char			*name; /* of the type or procedure */
-	MR_Code				*addr; /* for procedures only */
+	const char			*type_name; /* type name or NULL */
+	MR_STATIC_CODE_CONST MR_Code	*proc; /* proc address or NULL */
 	MR_memprof_counter		counter;
 	struct MR_memprof_record	*left;	/* left sub-tree */
 	struct MR_memprof_record	*right;	/* right sub-tree */
@@ -89,16 +89,15 @@ extern	MR_memprof_table	MR_memprof_types;
 /* function declarations */
 
 /*
-** MR_record_allocation(size, proc_addr, proc_name, type):
+** MR_record_allocation(size, alloc_id, type):
 **
 ** Record heap profiling information for an allocation of one cell of `size'
-** words by procedure `proc_name' with address `proc_addr' for an object of
-** type `type'. The heap profiling information is recorded in the three global
+** words.  The heap profiling information is recorded in the three global
 ** variables above.
 */
 
-extern void	MR_record_allocation(int size, MR_Code *proc_addr,
-			const char *proc_name, const char *type);
+extern void	MR_record_allocation(int size,
+		    const MR_AllocSiteInfoPtr alloc_id, const char *type_name);
 
 /*
 ** MR_prof_output_mem_tables():
@@ -119,6 +118,16 @@ extern void MR_prof_turn_off_heap_profiling(void);
 
 /*---------------------------------------------------------------------------*/
 
+extern void MR_register_alloc_sites(const MR_AllocSiteInfo *alloc_sites,
+			int size);
+
+extern void MR_report_memory_attribution(MR_ConstString label);
+
+extern void MR_finish_prof_snapshots_file(void);
+
+/*---------------------------------------------------------------------------*/
+
 #endif /* MERCURY_HEAP_PROFILE_H */
 
 /*---------------------------------------------------------------------------*/
+/* vim: set ts=4 sts=4 sw=4 noet: */
diff --git a/runtime/mercury_ho_call.c b/runtime/mercury_ho_call.c
index 9d90ea4..3cb8e63 100644
--- a/runtime/mercury_ho_call.c
+++ b/runtime/mercury_ho_call.c
@@ -944,7 +944,8 @@ MR_make_closure(MR_Code *proc_addr)
     /*
     ** Construct the MR_ClosureId.
     */
-    MR_incr_hp_type(closure_id, MR_ClosureId);
+    MR_incr_hp_type_msg(closure_id, MR_ClosureId,
+        MR_ALLOC_SITE_RUNTIME, NULL);
     closure_id->MR_closure_proc_id.MR_proc_user.MR_user_pred_or_func =
         MR_PREDICATE;
     closure_id->MR_closure_proc_id.MR_proc_user.MR_user_decl_module =
@@ -956,12 +957,14 @@ MR_make_closure(MR_Code *proc_addr)
     closure_id->MR_closure_module_name = "dl";
     closure_id->MR_closure_file_name = __FILE__;
     closure_id->MR_closure_line_number = __LINE__;
-    MR_make_aligned_string_copy(closure_id->MR_closure_goal_path, buf);
+    MR_make_aligned_string_copy_msg(closure_id->MR_closure_goal_path, buf,
+        MR_ALLOC_SITE_STRING);
 
     /*
     ** Construct the MR_Closure_Layout.
     */
-    MR_incr_hp_type(closure_layout, MR_Closure_Dyn_Link_Layout);
+    MR_incr_hp_type_msg(closure_layout, MR_Closure_Dyn_Link_Layout,
+        MR_ALLOC_SITE_RUNTIME, NULL);
     closure_layout->MR_closure_dl_id = closure_id;
     closure_layout->MR_closure_dl_type_params = NULL;
     closure_layout->MR_closure_dl_num_all_args = 0;
@@ -974,7 +977,8 @@ MR_make_closure(MR_Code *proc_addr)
 #else
     num_hidden_args = 0;
 #endif
-    MR_offset_incr_hp(closure_word, 0, 3 + num_hidden_args);
+    MR_offset_incr_hp_msg(closure_word, 0, 3 + num_hidden_args,
+        MR_ALLOC_SITE_RUNTIME, NULL);
     closure = (MR_Closure *) closure_word;
 
     closure->MR_closure_layout = (MR_Closure_Layout *) closure_layout;
diff --git a/runtime/mercury_label.c b/runtime/mercury_label.c
index 0fa1b48..4b9b4a7 100644
--- a/runtime/mercury_label.c
+++ b/runtime/mercury_label.c
@@ -248,7 +248,7 @@ MR_insert_internal_label(const char *name, MR_Code *addr,
 
     MR_do_init_label_tables();
 
-    internal = MR_GC_NEW(MR_Internal);
+    internal = MR_GC_NEW_ATTRIB(MR_Internal, MR_ALLOC_SITE_RUNTIME);
     internal->MR_internal_addr = addr;
     internal->MR_internal_layout = label_layout;
     internal->MR_internal_name = name;
diff --git a/runtime/mercury_memory.c b/runtime/mercury_memory.c
index 6387c88..aab0eca 100644
--- a/runtime/mercury_memory.c
+++ b/runtime/mercury_memory.c
@@ -387,3 +387,87 @@ MR_GC_realloc(void *old_ptr, size_t num_bytes)
 }
 
 /*---------------------------------------------------------------------------*/
+
+void *
+MR_GC_malloc_attrib(size_t num_bytes, void *attrib)
+{
+    MR_Word     *ptr;
+
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    ptr = MR_GC_malloc(sizeof(MR_Word) + num_bytes);
+    *ptr = (MR_Word) attrib;
+    ptr++;
+#else
+    ptr = MR_GC_malloc(num_bytes);
+#endif
+
+    return ptr;
+}
+
+void *
+MR_GC_malloc_uncollectable_attrib(size_t num_bytes, void *attrib)
+{
+    MR_Word     *ptr;
+
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    ptr = MR_GC_malloc_uncollectable(num_bytes + sizeof(MR_Word));
+    *ptr = (MR_Word) attrib;
+    ptr++;
+#else
+    ptr = MR_GC_malloc_uncollectable(num_bytes);
+#endif
+
+    return ptr;
+}
+
+void *
+MR_GC_realloc_attrib(void *ptr, size_t num_bytes)
+{
+    MR_Word     *wptr = ptr;
+
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    wptr = MR_GC_realloc(wptr - 1, num_bytes + sizeof(MR_Word));
+    wptr = wptr + 1;
+#else
+    wptr = MR_GC_realloc(wptr, num_bytes);
+#endif
+
+    return wptr;
+}
+
+void
+MR_GC_free_attrib(void *ptr)
+{
+#ifdef  MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    ptr = (char *) ptr - sizeof(MR_Word);
+#endif
+    GC_free(ptr);
+}
+
+/*---------------------------------------------------------------------------*/
+
+void *
+MR_new_object_func(size_t num_bytes, MR_AllocSiteInfoPtr alloc_id,
+    const char *name)
+{
+    size_t  num_words;
+    MR_Word dest;
+
+    num_words = MR_bytes_to_words(num_bytes);
+    MR_incr_hp_msg(dest, num_words, alloc_id, name);
+    return (void *) dest;
+}
+
+void *
+MR_new_object_atomic_func(size_t num_bytes, MR_AllocSiteInfoPtr alloc_id,
+    const char *name)
+{
+    size_t  num_words;
+    MR_Word dest;
+
+    num_words = MR_bytes_to_words(num_bytes);
+    MR_incr_hp_atomic_msg(dest, num_words, alloc_id, name);
+    return (void *) dest;
+}
+
+/*---------------------------------------------------------------------------*/
diff --git a/runtime/mercury_memory.h b/runtime/mercury_memory.h
index f624a3c..5f6083b 100644
--- a/runtime/mercury_memory.h
+++ b/runtime/mercury_memory.h
@@ -75,8 +75,8 @@ extern	void	MR_init_heap(void);
 **
 ** Structures allocated with MR_malloc() and MR_realloc() must NOT contain
 ** pointers into GC'ed memory, because those pointers will never be traced
-** by the conservative GC. ** Use MR_GC_malloc() or
-** MR_GC_malloc_uncollectable() for that.
+** by the conservative GC. Use MR_GC_malloc() or MR_GC_malloc_uncollectable()
+** for that.
 **
 ** MR_NEW(type):
 **	Allocates space for an object of the specified type.
@@ -159,6 +159,9 @@ extern	void 	MR_ensure_big_enough_buffer(char **buffer_ptr,
 **	The memory will not be garbage collected, and so
 **	it should be explicitly deallocated using MR_GC_free().
 **
+** MR_GC_realloc(ptr, bytes):
+**	Reallocates the memory block pointed to by ptr.
+**
 ** MR_GC_free(ptr):
 **	Deallocates the memory.
 **
@@ -170,6 +173,8 @@ extern	void 	MR_ensure_big_enough_buffer(char **buffer_ptr,
 **          grades, it is a no-op in non .gc grades.
 **
 **      XXX this interface is subject to change.
+**
+** Note: consider using the _attrib variants below.
 */
 
 extern	void	*MR_GC_malloc(size_t num_bytes);
@@ -203,6 +208,82 @@ typedef void 	(*MR_GC_finalizer)(void *ptr, void *data);
   #define MR_GC_register_finalizer(ptr, finalizer, data)
 #endif
 
+/*
+** MR_GC_NEW_ATTRIB(type, attrib):
+** MR_GC_NEW_UNCOLLECTABLE_ATTRIB(type, attrib):
+** MR_GC_NEW_ARRAY_ATTRIB(type, attrib):
+** MR_GC_malloc_attrib(bytes, attrib):
+** MR_GC_malloc_uncollectable_attrib(bytes, attrib):
+** MR_GC_realloc_attrib(ptr, num_bytes):
+**	In grades with memory attribution support, these variants will allocate
+**	an extra word before the object.  The value stored `attrib' is stored
+**	in that extra word.
+**
+** MR_GC_RESIZE_ARRAY_ATTRIB(ptr, type, num):
+** MR_GC_free_attrib(ptr):
+**	These variants take into account the extra word before ptr.
+**	You must NOT pass pointers which were returned by non-"attrib"
+**	functions/macros to these "attrib" variants, and vice versa.
+*/
+
+#define MR_GC_NEW_ATTRIB(type, attrib) \
+	((type *) MR_GC_malloc_attrib(sizeof(type), (attrib)))
+
+#define MR_GC_NEW_UNCOLLECTABLE_ATTRIB(type, attrib) \
+	((type *) MR_GC_malloc_uncollectable_attrib(sizeof(type), (attrib)))
+
+#define MR_GC_NEW_ARRAY_ATTRIB(type, num, attrib) \
+	((type *) MR_GC_malloc_attrib((num) * sizeof(type), (attrib)))
+
+#define MR_GC_RESIZE_ARRAY_ATTRIB(ptr, type, num) \
+	((type *) MR_GC_realloc_attrib((ptr), (num) * sizeof(type)))
+
+extern	void	*MR_GC_malloc_attrib(size_t num_bytes, void *attrib);
+extern	void	*MR_GC_malloc_uncollectable_attrib(size_t num_bytes,
+		    void *attrib);
+extern	void	*MR_GC_realloc_attrib(void *ptr, size_t num_bytes);
+extern	void	MR_GC_free_attrib(void *ptr);
+
+struct MR_AllocSiteInfo_Struct {
+    MR_Code	*MR_asi_proc;
+    const char  *MR_asi_file_name;
+    const int   MR_asi_line_number;
+    const char  *MR_asi_type;
+    const int   MR_asi_words;
+};
+
+/*
+** Built-in allocation site ids for use in the runtime and other
+** hand-written code when context-specific ids are unavailable.
+** MR_ALLOC_SITE_RUNTIME is a catch-all for internal runtime structures;
+** these are hidden by default in `mprof -s' output.
+*/
+
+#define MR_ALLOC_SITE_NONE		((void *) 0)
+#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    /* These must match the entries in mercury_heap_profile.c. */
+    extern MR_AllocSiteInfo		MR_builtin_alloc_sites[6];
+    #define MR_ALLOC_SITE_RUNTIME	((void *) &MR_builtin_alloc_sites[0])
+    #define MR_ALLOC_SITE_FLOAT    	((void *) &MR_builtin_alloc_sites[1])
+    #define MR_ALLOC_SITE_FOREIGN	((void *) &MR_builtin_alloc_sites[2])
+    #define MR_ALLOC_SITE_STRING     	((void *) &MR_builtin_alloc_sites[3])
+    #define MR_ALLOC_SITE_TABLING	((void *) &MR_builtin_alloc_sites[4])
+    #define MR_ALLOC_SITE_TYPE_INFO	((void *) &MR_builtin_alloc_sites[5])
+#else
+    #define MR_ALLOC_ID			MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_RUNTIME	MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_FLOAT    	MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_FOREIGN	MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_STRING     	MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_TABLING	MR_ALLOC_SITE_NONE
+    #define MR_ALLOC_SITE_TYPE_INFO	MR_ALLOC_SITE_NONE
+#endif
+
+extern	void	*MR_new_object_func(size_t num_bytes,
+		    MR_AllocSiteInfoPtr alloc_id, const char *name);
+extern	void	*MR_new_object_atomic_func(size_t num_bytes,
+		    MR_AllocSiteInfoPtr alloc_id, const char *name);
+
 /*---------------------------------------------------------------------------*/
 
 /*
@@ -210,7 +291,7 @@ typedef void 	(*MR_GC_finalizer)(void *ptr, void *data);
 ** using memory allocated with MR_malloc().
 */
 
-char	*MR_copy_string(const char *s);
+extern char	*MR_copy_string(const char *s);
 
 /*---------------------------------------------------------------------------*/
 
diff --git a/runtime/mercury_memory_zones.c b/runtime/mercury_memory_zones.c
index 315f228..f3c08ba 100644
--- a/runtime/mercury_memory_zones.c
+++ b/runtime/mercury_memory_zones.c
@@ -190,7 +190,18 @@ MR_dealloc_zone_memory(void *base, size_t size)
 static void *
 MR_alloc_zone_memory(size_t size)
 {
-    return GC_MALLOC_UNCOLLECTABLE(size);
+    MR_Word *ptr;
+
+    ptr = GC_MALLOC_UNCOLLECTABLE(size);
+    /*
+    ** Mark the memory zone as being allocated by the Mercury runtime.
+    ** We do not use a helper function like MR_GC_malloc_uncollectable_attrib
+    ** as that returns an offset pointer which can interfere with memory page
+    ** protection.  The first word will be within the redzone so it will not be
+    ** clobbered later.
+    */
+    *ptr = (MR_Word) MR_ALLOC_SITE_RUNTIME;
+    return ptr;
 }
 
 static void *
@@ -438,7 +449,8 @@ MR_init_offsets()
 
     offset_counter = 0;
 
-    offset_vector = MR_GC_NEW_ARRAY(size_t, CACHE_SLICES - 1);
+    offset_vector = MR_GC_NEW_ARRAY_ATTRIB(size_t, CACHE_SLICES - 1,
+        MR_ALLOC_SITE_RUNTIME);
 
     fake_reg_offset = (MR_Unsigned) MR_fake_reg % MR_pcache_size;
 
diff --git a/runtime/mercury_misc.h b/runtime/mercury_misc.h
index 3ee44a3..dffca61 100644
--- a/runtime/mercury_misc.h
+++ b/runtime/mercury_misc.h
@@ -156,6 +156,9 @@ MR_perform_registered_exception_cleanups(void);
 #define MR_threadscope_strings(m)                   \
     MR_PASTE2(mercury_data__threadscope_string_table_array__, m)
 
+#define MR_alloc_sites(m)                           \
+    MR_PASTE2(mercury_data__alloc_sites_array__, m)
+
 /*---------------------------------------------------------------------------*/
 
 #define MR_no_var_label_layout_refs1(m, s1)             \
diff --git a/runtime/mercury_ml_expand_body.h b/runtime/mercury_ml_expand_body.h
index 6dc7d5b..6b1ec89 100644
--- a/runtime/mercury_ml_expand_body.h
+++ b/runtime/mercury_ml_expand_body.h
@@ -748,7 +748,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 sprintf(buf, "%ld", (long) data_word);
-                MR_make_aligned_string_copy_saved_hp(str, buf);
+                MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -766,7 +766,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 sprintf(buf, "\'%c\'", (char) data_word);
-                MR_make_aligned_string_copy_saved_hp(str, buf);
+                MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -785,7 +785,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
                 data_word = *data_word_ptr;
                 f = MR_word_to_float(data_word);
                 MR_sprintf_float(buf, f);
-                MR_make_aligned_string_copy_saved_hp(str, buf);
+                MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -802,7 +802,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 MR_make_aligned_string_copy_saved_hp_quote(str,
-                        (MR_String) data_word);
+                        (MR_String) data_word, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -818,7 +818,8 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 str = MR_bitmap_to_quoted_string_saved_hp(
-                            (MR_ConstBitmapPtr) data_word);
+                            (MR_ConstBitmapPtr) data_word,
+                            NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -1014,7 +1015,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 sprintf(buf, "c_pointer(0x%lX)", (long) data_word);
-                MR_make_aligned_string_copy_saved_hp(str, buf);
+                MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -1031,7 +1032,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
 
                 data_word = *data_word_ptr;
                 sprintf(buf, "stable_c_pointer(0x%lX)", (long) data_word);
-                MR_make_aligned_string_copy_saved_hp(str, buf);
+                MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                 expand_info->EXPAND_FUNCTOR_FIELD = str;
             }
 #endif  /* EXPAND_FUNCTOR_FIELD */
@@ -1162,7 +1163,7 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
                         char    *str;
 
                         sprintf(buf, "tvar%ld", (long) data_pseudo_type_info);
-                        MR_make_aligned_string_copy_saved_hp(str, buf);
+                        MR_make_aligned_string_copy_saved_hp(str, buf, NULL);
                         expand_info->EXPAND_FUNCTOR_FIELD = str;
                     }
 #endif  /* EXPAND_FUNCTOR_FIELD */
diff --git a/runtime/mercury_prof.c b/runtime/mercury_prof.c
index d64d94b..8218fd7 100644
--- a/runtime/mercury_prof.c
+++ b/runtime/mercury_prof.c
@@ -401,8 +401,8 @@ print_memory_node(FILE *words_fptr, FILE *cells_fptr, MR_memprof_record *node)
         MR_convert_dword_to_double(words, words_double);
         MR_convert_dword_to_double(cells, cells_double);
 
-        fprintf(words_fptr, "%ld %.0f\n", (long) node->addr, words_double);
-        fprintf(cells_fptr, "%ld %.0f\n", (long) node->addr, cells_double);
+        fprintf(words_fptr, "%ld %.0f\n", (long) node->proc, words_double);
+        fprintf(cells_fptr, "%ld %.0f\n", (long) node->proc, cells_double);
 
         print_memory_node(words_fptr, cells_fptr, node->left);
         print_memory_node(words_fptr, cells_fptr, node->right);
@@ -466,6 +466,10 @@ MR_prof_finish(void)
 #ifdef MR_MPROF_PROFILE_MEMORY
     prof_output_mem_tables();
 #endif
+
+#ifdef MR_MPROF_PROFILE_MEMORY_ATTRIBUTION
+    MR_finish_prof_snapshots_file();
+#endif
 }
 
 void MR_close_prof_decl_file(void)
diff --git a/runtime/mercury_prof_mem.c b/runtime/mercury_prof_mem.c
index 7e69b5e..9f6a768 100644
--- a/runtime/mercury_prof_mem.c
+++ b/runtime/mercury_prof_mem.c
@@ -47,7 +47,7 @@
 ** Private Global Variables
 */
 static size_t   mem_left = 0;       /* Number of bytes left     */
-static void     *next    = NULL;    /* Pointer to next data block   */
+static char     *next    = NULL;    /* Pointer to next data block   */
                                     /* we can give away     */
 
 /*----------------------------------------------------------------------------*/
@@ -73,19 +73,15 @@ MR_prof_malloc(size_t size)
 
     /* Here we waste a bit of space but hopefully not to much */
     if (mem_left < size) {
-        /*
-        ** XXX For the conservative GC, it would be better to allocate
-        ** this memory with GC_malloc_atomic_uncollectable(),
-        ** so that the collector doesn't scan it.
-        */
-        next = MR_GC_malloc(MEMORY_BLOCK * size);
+        next = MR_GC_malloc_uncollectable_attrib(MEMORY_BLOCK * size,
+            MR_ALLOC_SITE_RUNTIME);
         mem_left = MEMORY_BLOCK * size;
     }
 
-    p = next;
+    p = (void *) next;
 
-    next = (void *) ((char *) next + size);
-    mem_left = mem_left - size;
+    next += size;
+    mem_left -= size;
 
     return p;
 }
diff --git a/runtime/mercury_stacks.c b/runtime/mercury_stacks.c
index ccfd9e5..fc19903 100644
--- a/runtime/mercury_stacks.c
+++ b/runtime/mercury_stacks.c
@@ -217,7 +217,8 @@ MR_Word *MR_new_detstack_segment(MR_Word *sp, int n)
     new_zone = MR_create_or_reuse_zone("detstack_segment", MR_detstack_size, 0,
         0, MR_default_handler);
 
-    list = MR_GC_malloc_uncollectable(sizeof(MR_MemoryZones));
+    list = MR_GC_malloc_uncollectable_attrib(sizeof(MR_MemoryZones),
+        MR_ALLOC_SITE_RUNTIME);
 
 #ifdef  MR_DEBUG_STACK_SEGMENTS
     MR_debug_log_message(
@@ -282,7 +283,8 @@ MR_nondetstack_segment_extend_slow_path(MR_Word *old_maxfr, int incr)
             MR_nondetstack_size, 0, 0, MR_default_handler);
     }
 
-    list = MR_GC_malloc_uncollectable(sizeof(MR_MemoryZones));
+    list = MR_GC_malloc_uncollectable_attrib(sizeof(MR_MemoryZones),
+        MR_ALLOC_SITE_RUNTIME);
 
 #ifdef  MR_DEBUG_STACK_SEGMENTS
     printf("create new nondet segment: old zone: %p, old maxfr %p\n",
@@ -339,7 +341,7 @@ MR_rewind_nondetstack_segments(MR_Word *maxfr)
         assert(list != NULL);
         MR_CONTEXT(MR_ctxt_nondetstack_zone) = list->MR_zones_head;
         MR_CONTEXT(MR_ctxt_prev_nondetstack_zones) = list->MR_zones_tail;
-        MR_GC_free(list);
+        MR_GC_free_attrib(list);
     }
 
     return reusable_zone;
@@ -375,7 +377,7 @@ MR_define_entry(MR_pop_detstack_segment);
     MR_CONTEXT(MR_ctxt_detstack_zone) = list->MR_zones_head;
     MR_CONTEXT(MR_ctxt_prev_detstack_zones) = list->MR_zones_tail;
     MR_CONTEXT(MR_ctxt_sp) = orig_sp;
-    MR_GC_free(list);
+    MR_GC_free_attrib(list);
 
 #ifdef  MR_DEBUG_STACK_SEGMENTS
     MR_debug_log_message(
@@ -630,7 +632,8 @@ MR_register_generator_ptr(MR_SubgoalPtr subgoal)
         return;
     }
 
-    node = MR_GC_NEW(struct MR_CutGeneratorListNode);
+    node = MR_GC_NEW_ATTRIB(struct MR_CutGeneratorListNode,
+        MR_ALLOC_SITE_RUNTIME);
     node->MR_cut_generator_ptr = subgoal;
     node->MR_cut_next_generator =
         MR_cut_stack[MR_cut_next - 1].MR_cut_generators;
diff --git a/runtime/mercury_stm.c b/runtime/mercury_stm.c
index 3ce945e..0be6cf7 100644
--- a/runtime/mercury_stm.c
+++ b/runtime/mercury_stm.c
@@ -24,7 +24,8 @@ MR_STM_record_transaction(MR_STM_TransLog *tlog, MR_STM_Var *var,
 {
     MR_STM_TransRecord  *new_record;
 
-    new_record = MR_GC_NEW(MR_STM_TransRecord);
+    new_record = MR_GC_NEW_ATTRIB(MR_STM_TransRecord,
+        MR_ALLOC_SITE_RUNTIME);
     new_record->MR_STM_tr_var = var;
     new_record->MR_STM_tr_old_value = old_value;
     new_record->MR_STM_tr_new_value = new_value;
@@ -38,7 +39,7 @@ MR_STM_attach_waiter(MR_STM_Var *var, MR_ThreadId tid,
 {
     MR_STM_Waiter   *new_waiter;
 
-    new_waiter = MR_GC_NEW(MR_STM_Waiter);
+    new_waiter = MR_GC_NEW_ATTRIB(MR_STM_Waiter, MR_ALLOC_SITE_RUNTIME);
     new_waiter->MR_STM_cond_var = cvar;
 
     if (var->MR_STM_var_waiters == NULL) {
@@ -361,7 +362,8 @@ MR_STM_block_thread(MR_STM_TransLog *tlog)
 #if defined(MR_THREAD_SAFE)
         MR_STM_ConditionVar     *thread_condvar;
 
-        thread_condvar = MR_GC_NEW(MR_STM_ConditionVar);
+        thread_condvar = MR_GC_NEW_ATTRIB(MR_STM_ConditionVar,
+            MR_ALLOC_SITE_RUNTIME);
         MR_STM_condvar_init(thread_condvar);
 
         MR_STM_wait(tlog, thread_condvar);
@@ -378,7 +380,7 @@ MR_STM_block_thread(MR_STM_TransLog *tlog)
 
         MR_UNLOCK(&MR_STM_lock, "MR_STM_block_thread");
 
-        MR_GC_free(thread_condvar);
+        MR_GC_free_nonattrib(thread_condvar);
 #else
     MR_fatal_error("Blocking thread in non-parallel grade");
 #endif
diff --git a/runtime/mercury_string.c b/runtime/mercury_string.c
index 1cadc42..25fdf79 100644
--- a/runtime/mercury_string.c
+++ b/runtime/mercury_string.c
@@ -23,7 +23,7 @@
 #define BUFFER_SIZE 4096
 
 MR_String
-MR_make_string(MR_Code *proclabel, const char *fmt, ...)
+MR_make_string(MR_AllocSiteInfoPtr alloc_id, const char *fmt, ...)
 {
     va_list     ap;
     MR_String   result;
@@ -83,7 +83,7 @@ MR_make_string(MR_Code *proclabel, const char *fmt, ...)
     p = fixed;
 #endif
     MR_restore_transient_hp();
-    MR_allocate_aligned_string_msg(result, strlen(p), proclabel);
+    MR_allocate_aligned_string_msg(result, strlen(p), alloc_id);
     MR_save_transient_hp();
     strcpy(result, p);
 
diff --git a/runtime/mercury_string.h b/runtime/mercury_string.h
index 29df629..1036156 100644
--- a/runtime/mercury_string.h
+++ b/runtime/mercury_string.h
@@ -89,17 +89,21 @@
 ** MR_{save/restore}_transient_hp().
 */
 
-#define MR_make_aligned_string_copy(ptr, string) 			\
-	do {								\
-		MR_Word make_aligned_string_tmp;			\
-		char	*make_aligned_string_ptr;			\
-									\
-	  	MR_offset_incr_hp_atomic(make_aligned_string_tmp, 0,	\
-			(strlen(string) + sizeof(MR_Word)) / sizeof(MR_Word)); \
-	    	make_aligned_string_ptr =				\
-			(char *) make_aligned_string_tmp;		\
-	    	strcpy(make_aligned_string_ptr, (string));		\
-	    	(ptr) = make_aligned_string_ptr;			\
+#define MR_make_aligned_string_copy(ptr, string)                              \
+	MR_make_aligned_string_copy_msg((ptr), (string), NULL)
+
+#define MR_make_aligned_string_copy_msg(ptr, string, alloc_id)                \
+	do {                                                                  \
+		MR_Word make_aligned_string_tmp;                              \
+		char	*make_aligned_string_ptr;                             \
+                                                                              \
+		MR_offset_incr_hp_atomic_msg(make_aligned_string_tmp, 0,      \
+			(strlen(string) + sizeof(MR_Word)) / sizeof(MR_Word), \
+			(alloc_id), "string.string/0");                       \
+		make_aligned_string_ptr =                                     \
+			(char *) make_aligned_string_tmp;                     \
+		strcpy(make_aligned_string_ptr, (string));                    \
+		(ptr) = make_aligned_string_ptr;                              \
 	} while(0)
 
 /*
@@ -109,17 +113,18 @@
 ** MR_offset_incr_saved_hp_atomic instead of MR_offset_incr_hp_atomic.
 */
 
-#define MR_make_aligned_string_copy_saved_hp(ptr, string) 		\
-	do {								\
-		MR_Word	make_aligned_string_tmp;			\
-		char	*make_aligned_string_ptr;			\
-									\
-	  	MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp,	0, \
-	    	    (strlen(string) + sizeof(MR_Word)) / sizeof(MR_Word)); \
-	    	make_aligned_string_ptr =				\
-		    (char *) make_aligned_string_tmp;			\
-	    	strcpy(make_aligned_string_ptr, (string));		\
-	    	(ptr) = make_aligned_string_ptr;			\
+#define MR_make_aligned_string_copy_saved_hp(ptr, string, alloc_id)           \
+	do {                                                                  \
+		MR_Word	make_aligned_string_tmp;                              \
+		char	*make_aligned_string_ptr;                             \
+                                                                              \
+		MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp,	0,    \
+		    (strlen(string) + sizeof(MR_Word)) / sizeof(MR_Word),     \
+		    (alloc_id), "string.string/0");                           \
+		make_aligned_string_ptr =                                     \
+		    (char *) make_aligned_string_tmp;                         \
+		strcpy(make_aligned_string_ptr, (string));                    \
+		(ptr) = make_aligned_string_ptr;                              \
 	} while(0)
 
 /*
@@ -129,17 +134,18 @@
 ** it puts double quote marks at the start and end of the string.
 */
 
-#define MR_make_aligned_string_copy_saved_hp_quote(ptr, string)		\
-	do {								\
-		MR_Word	make_aligned_string_tmp;			\
-		char	*make_aligned_string_ptr;			\
-									\
-	  	MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp,	0, \
-	    	    (strlen(string) + 2 + sizeof(MR_Word)) / sizeof(MR_Word)); \
-	    	make_aligned_string_ptr =				\
-		    (char *) make_aligned_string_tmp;			\
-                sprintf(make_aligned_string_ptr, "%c%s%c", '"', string, '"'); \
-	    	(ptr) = make_aligned_string_ptr;			\
+#define MR_make_aligned_string_copy_saved_hp_quote(ptr, string, alloc_id)     \
+	do {                                                                  \
+		MR_Word	make_aligned_string_tmp;                              \
+		char	*make_aligned_string_ptr;                             \
+                                                                              \
+		MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp,	0,    \
+		    (strlen(string) + 2 + sizeof(MR_Word)) / sizeof(MR_Word), \
+		    (alloc_id), "string.string/0");                           \
+		make_aligned_string_ptr =                                     \
+		    (char *) make_aligned_string_tmp;                         \
+		sprintf(make_aligned_string_ptr, "%c%s%c", '"', string, '"'); \
+		(ptr) = make_aligned_string_ptr;                              \
 	} while(0)
 
 /*
@@ -155,29 +161,30 @@
 ** MR_{save/restore}_transient_hp().
 */
 
-#define MR_allocate_aligned_string_msg(ptr, len, proclabel)		\
-	do {								\
-		MR_Word	make_aligned_string_tmp;			\
-		char	*make_aligned_string_ptr;			\
-									\
-	  	MR_offset_incr_hp_atomic_msg(make_aligned_string_tmp, 0,\
-	    	    ((len) + sizeof(MR_Word)) / sizeof(MR_Word),	\
-		    proclabel, "string:string/0");			\
-	    	make_aligned_string_ptr =				\
-		    (char *) make_aligned_string_tmp;			\
-	    	(ptr) = make_aligned_string_ptr;			\
+#define MR_allocate_aligned_string_msg(ptr, len, alloc_id)                    \
+	do {                                                                  \
+		MR_Word	make_aligned_string_tmp;                              \
+		char	*make_aligned_string_ptr;                             \
+                                                                              \
+		MR_offset_incr_hp_atomic_msg(make_aligned_string_tmp, 0,      \
+		    ((len) + sizeof(MR_Word)) / sizeof(MR_Word),              \
+		    (alloc_id), "string.string/0");                           \
+		make_aligned_string_ptr =                                     \
+		    (char *) make_aligned_string_tmp;                         \
+		(ptr) = make_aligned_string_ptr;                              \
 	} while(0)
 
-#define MR_allocate_aligned_string_saved_hp(ptr, len)			\
-	do {								\
-		MR_Word	make_aligned_string_tmp;			\
-		char	*make_aligned_string_ptr;			\
-									\
-	  	MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp, 0,\
-	    	    ((len) + sizeof(MR_Word)) / sizeof(MR_Word));	\
-	    	make_aligned_string_ptr =				\
-		    (char *) make_aligned_string_tmp;			\
-	    	(ptr) = make_aligned_string_ptr;			\
+#define MR_allocate_aligned_string_saved_hp(ptr, len, alloc_id)               \
+	do {                                                                  \
+		MR_Word	make_aligned_string_tmp;                              \
+		char	*make_aligned_string_ptr;                             \
+                                                                              \
+		MR_offset_incr_saved_hp_atomic(make_aligned_string_tmp, 0,    \
+		    ((len) + sizeof(MR_Word)) / sizeof(MR_Word),              \
+		    (alloc_id), "string.string/0");                           \
+		make_aligned_string_ptr =                                     \
+		    (char *) make_aligned_string_tmp;                         \
+		(ptr) = make_aligned_string_ptr;                              \
 	} while(0)
 
 /*
@@ -309,7 +316,7 @@ MR_Integer	MR_hash_string3(MR_ConstString);
 ** before/after calling this function.
 */
 
-MR_String MR_make_string(MR_Code *proclabel, const char *fmt, ...);
+MR_String MR_make_string(MR_AllocSiteInfoPtr alloc_id, const char *fmt, ...);
 
 /*
 ** True if c is an ASCII code point, i.e. U+0000..U+007f.
diff --git a/runtime/mercury_tabling.c b/runtime/mercury_tabling.c
index 30041d4..dea0617 100644
--- a/runtime/mercury_tabling.c
+++ b/runtime/mercury_tabling.c
@@ -747,7 +747,8 @@ MR_cmp_bitmaps(const void *p1, const void *p2)
                 slot = table->hash_table[bucket].table_field;           \
                 while (slot != NULL) {                                  \
                     MR_GC_ensure_room_for_next(value, type_name,        \
-                        MR_INIT_HASH_CONTENTS_ARRAY_SIZE);              \
+                        MR_INIT_HASH_CONTENTS_ARRAY_SIZE,               \
+                        MR_ALLOC_SITE_TABLING);                         \
                     values[value_next] = slot->key;                     \
                     value_next++;                                       \
                     slot = slot->next;                                  \
diff --git a/runtime/mercury_tabling.h b/runtime/mercury_tabling.h
index 228f4b8..f37a85b 100644
--- a/runtime/mercury_tabling.h
+++ b/runtime/mercury_tabling.h
@@ -558,33 +558,28 @@ extern	void		MR_print_answerblock(FILE *fp,
 #ifndef MR_NATIVE_GC
 
   #define MR_TABLE_NEW(type)						\
-	MR_GC_NEW(type)
+	MR_GC_NEW_ATTRIB(type, MR_ALLOC_SITE_TABLING)
 
   #define MR_TABLE_NEW_ARRAY(type, count)				\
-	MR_GC_NEW_ARRAY(type, (count))
+	MR_GC_NEW_ARRAY_ATTRIB(type, (count), MR_ALLOC_SITE_TABLING)
 
   #define MR_TABLE_RESIZE_ARRAY(ptr, type, count)			\
-	MR_GC_RESIZE_ARRAY((ptr), type, (count))
+	MR_GC_RESIZE_ARRAY_ATTRIB((ptr), type, (count))
 
   #define MR_table_allocate_words(size)					\
-	((MR_Word *) MR_GC_malloc(sizeof(MR_Word) * (size)))
-
-  #define MR_table_reallocate_words(pointer, size)			\
-	(MR_CHECK_EXPR_TYPE((pointer), MR_Word *),			\
-	(MR_Word *) MR_GC_realloc((pointer), sizeof(MR_Word) * (size)))
+	((MR_Word *) MR_GC_malloc_attrib(sizeof(MR_Word) * (size),      \
+            MR_ALLOC_SITE_TABLING))
 
   #define MR_table_allocate_struct(type)				\
-	((type *) MR_GC_malloc(sizeof(type)))
+	((type *) MR_GC_malloc_attrib(sizeof(type),                     \
+            MR_ALLOC_SITE_TABLING))
 
   #define MR_table_allocate_structs(num, type)				\
-	((type *) MR_GC_malloc(sizeof(type) * (num)))
-
-  #define MR_table_reallocate_structs(pointer, num, type)		\
-	(MR_CHECK_EXPR_TYPE((pointer), type *),				\
-	(type *) MR_GC_realloc((pointer), sizeof(type) * (num)))
+	((type *) MR_GC_malloc_attrib(sizeof(type) * (num),             \
+            MR_ALLOC_SITE_TABLING))
 
   #define MR_table_free(pointer)					\
-	MR_GC_free((pointer))
+	MR_GC_free_attrib((pointer))
 
 #else /* MR_NATIVE_GC */
 
@@ -604,33 +599,24 @@ extern	void		MR_print_answerblock(FILE *fp,
   #define MR_table_allocate_bytes(size)					\
 	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG),			\
 	(void *) NULL)
-  #define MR_table_reallocate_bytes(pointer, size)			\
-	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG),			\
-	(void *) NULL)
 #endif
   #define MR_table_allocate_words(size)					\
 	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG), 			\
 	(void *) NULL)
-  #define MR_table_reallocate_words(pointer, size)			\
-	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG), 			\
-	(void *) NULL)
   #define MR_table_allocate_struct(type)				\
 	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG), 			\
 	(void *) NULL)
   #define MR_table_allocate_structs(num, type)				\
 	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG), 			\
 	(void *) NULL)
-  #define MR_table_reallocate_structs(pointer, num, type)		\
-	(MR_fatal_error(MR_TABLE_NATIVE_GC_MSG), 			\
-	(void *) NULL)
   #define MR_table_free(pointer)					\
 	MR_fatal_error(MR_TABLE_NATIVE_GC_MSG)
 
 #endif /* MR_NATIVE_GC */
 
-#define MR_table_copy_bytes(dest, source, size)				\
-	MR_memcpy((dest), (source), (size))
-
+/*
+** XXX The extra memory attribution word is not yet copied.
+*/
 #define MR_table_copy_words(dest, source, size)				\
 	(MR_CHECK_EXPR_TYPE((dest), MR_Word *),				\
 	(MR_CHECK_EXPR_TYPE((source), MR_Word *),			\
diff --git a/runtime/mercury_tags.h b/runtime/mercury_tags.h
index 94907a9..b473037 100644
--- a/runtime/mercury_tags.h
+++ b/runtime/mercury_tags.h
@@ -159,11 +159,11 @@
   #define MR_typed_list_cons(ti_head, head, ti_tail, tail)		\
 	((MR_Word) MR_mkword(MR_TAG_CONS,				\
 		MR_create2((ti_head), (head), (ti_tail), (tail))))
-  #define MR_list_empty_msg(proclabel)	MR_list_empty()
-  #define MR_typed_list_cons_msg(ti_head, head, ti_tail, tail, proclabel) \
+  #define MR_list_empty_msg(alloc_id)	MR_list_empty()
+  #define MR_typed_list_cons_msg(ti_head, head, ti_tail, tail, alloc_id)\
 	((MR_Word) MR_mkword(MR_TAG_CONS,				\
 		MR_create2_msg((ti_head), (head), (ti_tail), (tail),	\
-			 proclabel, "list.list/1")))
+			 alloc_id, "list.list/1")))
 #else
   /*
   ** MR_TAGBITS == 0 && 
@@ -191,17 +191,17 @@
 			&MR_TYPE_CTOR_INFO_NAME(builtin, void, 0),	\
 			MR_RAW_TAG_CONS, (ti_head), (head), (ti_tail), (tail))))
 
-  #define MR_list_empty_msg(proclabel) 					\
+  #define MR_list_empty_msg(alloc_id) 					\
 	((MR_Word) MR_mkword(MR_mktag(0),				\
 		MR_create1_msg((MR_TypeInfo)				\
-			&MR_TYPE_CTOR_INFO_NAME(builtin, void, 0),\
-			MR_RAW_TAG_NIL, proclabel, "list:list/1")))
-  #define MR_typed_list_cons_msg(ti_head, head, ti_tail, tail, proclabel) \
+			&MR_TYPE_CTOR_INFO_NAME(builtin, void, 0),	\
+			MR_RAW_TAG_NIL, alloc_id, "list.list/1")))
+  #define MR_typed_list_cons_msg(ti_head, head, ti_tail, tail, alloc_id) \
 	((MR_Word) MR_mkword(MR_mktag(0),				\
 		MR_create3((MR_TypeInfo)				\
 			&MR_TYPE_CTOR_INFO_NAME(builtin, void, 0),	\
 			MR_RAW_TAG_CONS, (ti_head), (head), (ti_tail), (tail), \
-			proclabel, "list:list/1")))
+			alloc_id, "list.list/1")))
 #endif
 
 /*
@@ -219,68 +219,68 @@
 		(head),							 \
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, void, 0), \
 		(tail))
-  #define MR_list_cons_msg(head, tail, proclabel)			 \
+  #define MR_list_cons_msg(head, tail, alloc_id)			 \
 	MR_typed_list_cons_msg(						 \
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, void, 0), \
 		(head),							 \
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, void, 0), \
-		(tail), proclabel)
+		(tail), alloc_id)
 #endif
 
 #define MR_univ_list_cons(head, tail)					\
 	MR_typed_list_cons((MR_TypeInfo) MR_type_ctor_info_for_univ, (head), \
 		MR_type_info_for_list_of_univ, (tail))
 
-#define MR_univ_list_cons_msg(head, tail, proclabel)			\
+#define MR_univ_list_cons_msg(head, tail, alloc_id)			\
 	MR_typed_list_cons_msg(						\
 		(MR_TypeInfo) MR_type_ctor_info_for_univ, (head), 	\
-		MR_type_info_for_list_of_univ, (tail), proclabel)
+		MR_type_info_for_list_of_univ, (tail), alloc_id)
 
 #define MR_int_list_cons(head, tail)					\
 	MR_typed_list_cons(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, int, 0), \
 		(head), MR_type_info_for_list_of_int, (tail))
 
-#define MR_int_list_cons_msg(head, tail, proclabel)			\
+#define MR_int_list_cons_msg(head, tail, alloc_id)			\
 	MR_typed_list_cons_msg(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, int, 0),	\
-		(head), MR_type_info_for_list_of_int, (tail), proclabel)
+		(head), MR_type_info_for_list_of_int, (tail), alloc_id)
 
 #define MR_char_list_cons(head, tail)					\
 	MR_typed_list_cons(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, character, 0), \
 		(head), MR_type_info_for_list_of_char, (tail))
 
-#define MR_char_list_cons_msg(head, tail, proclabel)			\
+#define MR_char_list_cons_msg(head, tail, alloc_id)			\
 	MR_typed_list_cons_msg(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, character, 0), \
-		(head), MR_type_info_for_list_of_char, (tail), proclabel)
+		(head), MR_type_info_for_list_of_char, (tail), alloc_id)
 
 #define MR_string_list_cons(head, tail)					\
 	MR_typed_list_cons(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, string, 0), \
 		(head), MR_type_info_for_list_of_string, (tail))
 
-#define MR_string_list_cons_msg(head, tail, proclabel)			\
+#define MR_string_list_cons_msg(head, tail, alloc_id)			\
 	MR_typed_list_cons_msg(						\
 		(MR_TypeInfo) &MR_TYPE_CTOR_INFO_NAME(builtin, string, 0), \
-		(head), MR_type_info_for_list_of_string, (tail), proclabel)
+		(head), MR_type_info_for_list_of_string, (tail), alloc_id)
 
 #define MR_type_info_list_cons(head, tail)				\
 	MR_typed_list_cons(MR_type_info_for_type_info, (head),		\
 		MR_type_info_for_list_of_type_info, (tail))
 
-#define MR_type_info_list_cons_msg(head, tail, proclabel)		\
-	MR_typed_list_cons_msg(MR_type_info_for_type_info, (head),	\
-		MR_type_info_for_list_of_type_info, (tail), proclabel)
+#define MR_type_info_list_cons_msg(head, tail, alloc_id)                \
+	MR_typed_list_cons_msg(MR_type_info_for_type_info, (head),      \
+		MR_type_info_for_list_of_type_info, (tail), alloc_id)
 
 #define MR_pseudo_type_info_list_cons(head, tail)			\
 	MR_typed_list_cons(MR_type_info_for_pseudo_type_info, (head),	\
 		MR_type_info_for_list_of_pseudo_type_info, (tail))
 
-#define MR_pseudo_type_info_list_cons_msg(head, tail, proclabel)	\
-	MR_typed_list_cons_msg(MR_type_info_for_pseudo_type_info, (head), \
-		MR_type_info_for_list_of_pseudo_type_info, (tail), proclabel)
+#define MR_pseudo_type_info_list_cons_msg(head, tail, alloc_id)		    \
+	MR_typed_list_cons_msg(MR_type_info_for_pseudo_type_info, (head),   \
+		MR_type_info_for_list_of_pseudo_type_info, (tail), alloc_id)
 
 /*
 ** Convert an enumeration declaration into one which assigns the same
diff --git a/runtime/mercury_thread.c b/runtime/mercury_thread.c
index 41e1cc9..72ce53b 100644
--- a/runtime/mercury_thread.c
+++ b/runtime/mercury_thread.c
@@ -73,7 +73,7 @@ MR_create_thread(MR_ThreadGoal *goal)
     ** automatically freed when threads terminate (we don't call
     ** pthread_join() anywhere).
     */
-    thread = MR_GC_NEW(MercuryThread);
+    thread = MR_GC_NEW_ATTRIB(MercuryThread, MR_ALLOC_SITE_RUNTIME);
     pthread_attr_init(&attrs);
     pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);
     err = pthread_create(thread, &attrs, MR_create_thread_2, (void *) goal);
@@ -416,11 +416,13 @@ MR_create_thread_local_mutables(MR_Unsigned numslots)
 {
     MR_ThreadLocalMuts  *muts;
 
-    muts = MR_GC_NEW(MR_ThreadLocalMuts);
+    muts = MR_GC_NEW_ATTRIB(MR_ThreadLocalMuts,
+        MR_ALLOC_SITE_RUNTIME);
 #ifdef MR_THREAD_SAFE
     pthread_mutex_init(&muts->MR_tlm_lock, MR_MUTEX_ATTR);
 #endif
-    muts->MR_tlm_values = MR_GC_NEW_ARRAY(MR_Word, numslots);
+    muts->MR_tlm_values = MR_GC_NEW_ARRAY_ATTRIB(MR_Word, numslots,
+        MR_ALLOC_SITE_RUNTIME);
 
     return muts;
 }
diff --git a/runtime/mercury_trace_base.c b/runtime/mercury_trace_base.c
index b5f29c2..2d37461 100644
--- a/runtime/mercury_trace_base.c
+++ b/runtime/mercury_trace_base.c
@@ -243,7 +243,7 @@ MR_insert_module_info_into_module_table(const MR_ModuleLayout *module)
     int     slot;
 
     MR_GC_ensure_room_for_next(MR_module_info, const MR_ModuleLayout *,
-        INIT_MODULE_TABLE_SIZE);
+        INIT_MODULE_TABLE_SIZE, MR_ALLOC_SITE_NONE);
     MR_prepare_insert_into_sorted(MR_module_infos, MR_module_info_next, slot,
         strcmp(MR_module_infos[slot]->MR_ml_name, module->MR_ml_name));
 
diff --git a/runtime/mercury_trail.c b/runtime/mercury_trail.c
index 4a3a0a8..51a6d66 100644
--- a/runtime/mercury_trail.c
+++ b/runtime/mercury_trail.c
@@ -228,7 +228,8 @@ MR_new_trail_segment(void)
     new_zone = MR_create_or_reuse_zone("trail_segment", MR_trail_size, 0,
         0, MR_default_handler);
 
-    list = MR_GC_malloc_uncollectable(sizeof(MR_MemoryZones));
+    list = MR_GC_malloc_uncollectable_attrib(sizeof(MR_MemoryZones),
+        MR_ALLOC_SITE_RUNTIME);
 
 #if defined(MR_DEBUG_TRAIL_SEGMENTS)
     printf("create new trail segment: old zone: %p, old trail_ptr %p\n",
diff --git a/runtime/mercury_type_desc.c b/runtime/mercury_type_desc.c
index 56e1f3e..74fd9a5 100644
--- a/runtime/mercury_type_desc.c
+++ b/runtime/mercury_type_desc.c
@@ -167,7 +167,8 @@ MR_make_type(int arity, MR_TypeCtorDesc type_ctor_desc, MR_Word arg_types_list)
 
         MR_restore_transient_registers();
         MR_offset_incr_hp_msg(new_type_info_arena_word, 0,
-            MR_var_arity_type_info_size(arity), "MR_make_type", "type_info");
+            MR_var_arity_type_info_size(arity),
+            MR_ALLOC_SITE_TYPE_INFO, NULL);
         new_type_info_arena = (MR_Word *) new_type_info_arena_word;
         MR_save_transient_registers();
         MR_fill_in_var_arity_type_info(new_type_info_arena, type_ctor_info,
@@ -182,7 +183,8 @@ MR_make_type(int arity, MR_TypeCtorDesc type_ctor_desc, MR_Word arg_types_list)
 
         MR_restore_transient_registers();
         MR_offset_incr_hp_msg(new_type_info_arena_word, 0,
-            MR_fixed_arity_type_info_size(arity), "MR_make_type", "type_info");
+            MR_fixed_arity_type_info_size(arity),
+            MR_ALLOC_SITE_TYPE_INFO, NULL);
         new_type_info_arena = (MR_Word *) new_type_info_arena_word;
         MR_save_transient_registers();
         MR_fill_in_fixed_arity_type_info(new_type_info_arena, type_ctor_info,
diff --git a/runtime/mercury_type_info.c b/runtime/mercury_type_info.c
index 230870f..27217de 100644
--- a/runtime/mercury_type_info.c
+++ b/runtime/mercury_type_info.c
@@ -55,9 +55,11 @@ static MR_TypeInfo          MR_get_arg_type_info(
         MR_Word         *target_word_ptr;                           \
         MR_MemoryList   node;                                       \
                                                                     \
-        target_word_ptr = MR_GC_NEW_ARRAY(MR_Word, (size));         \
+        target_word_ptr = MR_GC_NEW_ARRAY_ATTRIB(MR_Word, (size),   \
+            MR_ALLOC_SITE_TYPE_INFO);                               \
         (target) = (MR_Word) target_word_ptr;                       \
-        node = MR_GC_malloc(sizeof(*node));                         \
+        node = MR_GC_malloc_attrib(sizeof(*node),                   \
+            MR_ALLOC_SITE_TYPE_INFO);                               \
         node->data = target_word_ptr;                               \
         node->next = *allocated;                                    \
         *allocated = node;                                          \
@@ -87,13 +89,13 @@ static MR_TypeInfo          MR_get_arg_type_info(
     do {                                                            \
         /* reserve one extra word for GC forwarding pointer */      \
         /* (see comments in compiler/mlds_to_c.m for details) */    \
-        MR_offset_incr_saved_hp((target), 0, 1);                    \
-        MR_offset_incr_saved_hp((target), 0, (size));               \
+        MR_offset_incr_saved_hp((target), 0, 1, NULL, NULL);        \
+        MR_offset_incr_saved_hp((target), 0, (size), NULL, NULL);   \
     } while (0)
 #else /* !MR_NATIVE_GC */
   #define ALLOCATE_WORDS(target, size)                              \
     do {                                                            \
-        MR_offset_incr_saved_hp((target), 0, (size));               \
+        MR_offset_incr_saved_hp((target), 0, (size), NULL, NULL);   \
     } while (0)
 #endif /* !MR_NATIVE_GC */
 
@@ -121,13 +123,13 @@ static MR_TypeInfo          MR_get_arg_type_info(
     do {                                                            \
         /* reserve one extra word for GC forwarding pointer */      \
         /* (see comments in compiler/mlds_to_c.m for details) */    \
-        MR_offset_incr_saved_hp((target), 0, 1);                    \
-        MR_offset_incr_saved_hp((target), 0, (size));               \
+        MR_offset_incr_saved_hp((target), 0, 1, NULL, NULL);        \
+        MR_offset_incr_saved_hp((target), 0, (size), NULL, NULL);   \
     } while (0)
 #else /* !MR_NATIVE_GC */
   #define ALLOCATE_WORDS(target, size)                              \
     do {                                                            \
-        MR_offset_incr_saved_hp((target), 0, (size));               \
+        MR_offset_incr_saved_hp((target), 0, (size), NULL, NULL);   \
     } while (0)
 #endif /* !MR_NATIVE_GC */
 
@@ -762,8 +764,12 @@ MR_deallocate(MR_MemoryList allocated)
 {
     while (allocated != NULL) {
         MR_MemoryList next = allocated->next;
-        MR_GC_free(allocated->data);
-        MR_GC_free(allocated);
+        /*
+        ** These were allocated with MR_GC_NEW_ARRAY_ATTRIB so we must free
+        ** using MR_GC_free_attrib.
+        **/
+        MR_GC_free_attrib(allocated->data);
+        MR_GC_free_attrib(allocated);
         allocated = next;
     }
 }
diff --git a/runtime/mercury_types.h b/runtime/mercury_types.h
index 318b80c..7c6621c 100644
--- a/runtime/mercury_types.h
+++ b/runtime/mercury_types.h
@@ -261,6 +261,9 @@ typedef struct MR_SynthAttr_Struct              MR_SynthAttr;
 typedef struct MR_UserEvent_Struct              MR_UserEvent;
 typedef struct MR_UserEventSpec_Struct          MR_UserEventSpec;
 
+typedef struct MR_AllocSiteInfo_Struct          MR_AllocSiteInfo;
+typedef MR_AllocSiteInfo                        *MR_AllocSiteInfoPtr;
+
 typedef union MR_TableNode_Union                MR_TableNode;
 typedef MR_TableNode                            *MR_TrieNode;
 typedef MR_TrieNode                             *MR_TrieNodePtr;
diff --git a/runtime/mercury_wrapper.c b/runtime/mercury_wrapper.c
index 3be930c..e27583f 100644
--- a/runtime/mercury_wrapper.c
+++ b/runtime/mercury_wrapper.c
@@ -881,7 +881,8 @@ MR_init_conservative_GC(void)
 
         limit = (1 << MR_LOW_TAG_BITS);
 
-    #ifdef MR_RECORD_TERM_SIZES
+    #if defined(MR_RECORD_TERM_SIZES) || \
+        defined(MR_MPROF_PROFILE_MEMORY_ATTRIBUTION)
         limit += sizeof(MR_Word);
     #endif
 
diff --git a/runtime/mercury_wsdeque.c b/runtime/mercury_wsdeque.c
index 8732d70..aba6f60 100644
--- a/runtime/mercury_wsdeque.c
+++ b/runtime/mercury_wsdeque.c
@@ -109,9 +109,11 @@ MR_wsdeque_take_top(MR_SparkDeque *dq, MR_Spark *ret_spark)
 static MR_SparkArray *
 MR_alloc_spark_array(MR_Integer size)
 {
-    MR_SparkArray *arr;
+    MR_SparkArray   *arr;
+    size_t          num_bytes;
 
-    arr = MR_GC_malloc(sizeof(MR_SparkArray) + (size - 1) * sizeof(MR_Spark));
+    num_bytes = sizeof(MR_SparkArray) + (size - 1) * sizeof(MR_Spark);
+    arr = MR_GC_malloc_attrib(num_bytes, MR_ALLOC_SITE_RUNTIME);
     arr->MR_sa_max = size - 1;
     return arr;
 }
diff --git a/trace/mercury_trace.c b/trace/mercury_trace.c
index c136b4c..d1ca013 100644
--- a/trace/mercury_trace.c
+++ b/trace/mercury_trace.c
@@ -1691,7 +1691,7 @@ MR_maybe_record_call_table(const MR_ProcLayout *level_layout,
 
         if (call_table != NULL) {
             MR_GC_ensure_room_for_next(MR_call_table_ptr, MR_TrieNode,
-                INIT_CALL_TABLE_ARRAY_SIZE);
+                INIT_CALL_TABLE_ARRAY_SIZE, NULL);
 
             MR_call_table_ptrs[MR_call_table_ptr_next] = call_table;
             MR_call_table_ptr_next++;
@@ -1758,7 +1758,7 @@ static void
 MR_abandon_call_table_array(void)
 {
     if (MR_call_table_ptrs != NULL) {
-        MR_GC_free(MR_call_table_ptrs);
+        MR_GC_free_attrib(MR_call_table_ptrs);
         MR_call_table_ptrs = NULL;
     }
 }
diff --git a/trace/mercury_trace_tables.c b/trace/mercury_trace_tables.c
index 3868ea0..7f1c404 100644
--- a/trace/mercury_trace_tables.c
+++ b/trace/mercury_trace_tables.c
@@ -324,7 +324,7 @@ MR_insert_module_info(const MR_ModuleLayout *module)
                     module);
         } else {
             MR_GC_ensure_room_for_next(MR_module_nick, MR_Module_Nick,
-                INIT_MODULE_TABLE_SIZE);
+                INIT_MODULE_TABLE_SIZE, NULL);
             MR_prepare_insert_into_sorted(MR_module_nicks, MR_module_nick_next,
                 slot, strcmp(MR_module_nicks[slot].MR_nick_name, nickname));
             MR_module_nicks[slot].MR_nick_name = nickname;
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list