[m-rev.] for review: version array reimplementation

Peter Wang novalazy at gmail.com
Mon Jul 27 12:31:23 AEST 2009


Branches: main

Make the version array implementation do what it says on the tin.  It claims
O(1) access for the latest version of the array, but that's true only if the
array hasn't had any updates, or if you're accessing the last element that was
updated.  The existing data structure is essentially this:

        :- type version_array(T)
            --->    update(
                        index   :: int,
                        value   :: T,
                        next    :: version_array(T)
                    )
            ;       base(array(T)).

Note that even getting the size of the array is O(k), where k is the number of
updates on the array.

This patch changes the data structure to an array of update lists (see
comments in the code).  Accessing the latest version of an element is O(1),
and accessing older versions is O(k) where k is the number of updates to that
particular element.  Updating an element requires two allocations instead of
one.

library/version_array.m:
        As above.

compiler/make.dependencies.m:
        Use size doubling when resizing version arrays.

tests/hard_coded/version_array_test.exp:
        Update test case.


diff --git a/compiler/make.dependencies.m b/compiler/make.dependencies.m
index 6191138..d9f97d9 100644
--- a/compiler/make.dependencies.m
+++ b/compiler/make.dependencies.m
@@ -210,12 +210,22 @@ module_name_to_index(ModuleName, Index, !Info) :-
         Map0 = module_index_map(Forward0, Reverse0, Size0),
         Index = module_index(Size0),
         Size = Size0 + 1,
-        Forward = version_hash_table.det_insert(Forward0, ModuleName, Index),
-        version_array.resize(Size, ModuleName, Reverse0, Reverse),
+        version_hash_table.det_insert(ModuleName, Index, Forward0, Forward),
+        TrueSize = version_array.size(Reverse0),
+        ( Size > TrueSize ->
+            NewSize = increase_array_size(TrueSize),
+            version_array.resize(NewSize, ModuleName, Reverse0, Reverse)
+        ;
+            version_array.set(Size0, ModuleName, Reverse0, Reverse)
+        ),
         Map = module_index_map(Forward, Reverse, Size),
         !Info ^ module_index_map := Map
     ).
 
+:- func increase_array_size(int) = int.
+
+increase_array_size(N) = (if N = 0 then 1 else N * 2).
+
 :- pred module_index_to_name(make_info::in, module_index::in, module_name::out)
     is det.
 
@@ -259,7 +269,13 @@ dependency_file_to_index(DepFile, Index, !Info) :-
         Index = dependency_file_index(Size0),
         Size = Size0 + 1,
         version_hash_table.det_insert(DepFile, Index, Forward0, Forward),
-        version_array.resize(Size, DepFile, Reverse0, Reverse),
+        TrueSize = version_array.size(Reverse0),
+        ( Size > TrueSize ->
+            NewSize = increase_array_size(TrueSize),
+            version_array.resize(NewSize, DepFile, Reverse0, Reverse)
+        ;
+            version_array.set(Size0, DepFile, Reverse0, Reverse)
+        ),
         Map = dependency_file_index_map(Forward, Reverse, Size),
         !Info ^ dep_file_index_map := Map
     ).
diff --git a/library/version_array.m b/library/version_array.m
index e2fc02f..3b516d4 100644
--- a/library/version_array.m
+++ b/library/version_array.m
@@ -1,14 +1,14 @@
 %-----------------------------------------------------------------------------%
 % vim: ts=4 sw=4 et tw=0 wm=0 ft=mercury
 %-----------------------------------------------------------------------------%
-% Copyright (C) 2004-2008 The University of Melbourne.
+% Copyright (C) 2004-2009 The University of Melbourne.
 % This file may only be copied under the terms of the GNU Library General
 % Public License - see the file COPYING.LIB in the Mercury distribution.
 % vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
 %-----------------------------------------------------------------------------%
 %
 % File: version_array.m.
-% Author: Ralph Becket <rafe at cs.mu.oz.au>.
+% Authors: Ralph Becket <rafe at cs.mu.oz.au>, Peter Wang.
 % Stability: low.
 %
 % (See the header comments in version_types.m for an explanation of version
@@ -17,7 +17,7 @@
 % This module implements version arrays.  A version array provides O(1)
 % access and update for the "latest" version of the array.  "Older"
 % versions of the array incur an O(k) penalty on accesses where k is
-% the number of updates that have been made since.
+% the number of updates that have been made since to the given index.
 %
 % The advantage of version arrays is that in the common, singly threaded,
 % case, they are almost as fast as unique arrays, but can be treated as
@@ -25,6 +25,12 @@
 %
 % Version arrays are zero based.
 %
+% This implementation is limited in the number of updates that can be performed
+% on a version array which is determined by the machine word size, e.g. 2^32 on
+% a 32-bit machine.  You will get an exception if this limit is reached.
+% Copying or resizing the array resets the version counter, which will allow
+% further updates.
+%
 % XXX This implementation is not yet guaranteed to work with the agc (accurate
 % garbage collection) grades.  Specifically, MR_deep_copy and MR_agc_deep_copy
 % currently do not recognise version arrays.
@@ -83,7 +89,7 @@
     %
 :- func size(version_array(T)) = int.
 
-    % max(Z) = size(A) - 1.
+    % max(A) = size(A) - 1.
     %
 :- func max(version_array(T)) = int.
 
@@ -125,10 +131,11 @@
 :- func copy(version_array(T)) = version_array(T).
 
     % unsafe_rewind(A) produces a version of A for which all accesses are
-    % O(1).  Invoking this predicate renders A and all later versions undefined
-    % that were derived by performing individual updates.  Only use this when
-    % you are absolutely certain there are no live references to A or later
-    % versions of A.  (A predicate version is also provided.)
+    % O(1), i.e. as if A was the latest version of the array.  This renders A
+    % and all later versions undefined that were derived by performing
+    % individual updates.  Only use this when you are absolutely certain there
+    % are no live references to A or later versions of A.  (A predicate version
+    % is also provided.)
     %
 :- func unsafe_rewind(version_array(T)) = version_array(T).
 :- pred unsafe_rewind(version_array(T)::in, version_array(T)::out) is det.
@@ -136,12 +143,6 @@
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
 
-% The first implementation of version arrays used nb_references.
-% This incurred three memory allocations for every update. This version
-% works at a lower level, but only performs one allocation per update.
-
-%-----------------------------------------------------------------------------%
-
 :- implementation.
 
 :- import_module int.
@@ -181,7 +182,8 @@ lookup(VA, I) = VA ^ elem(I).
 (VA0 ^ elem(I) := X) =
     ( if   set_if_in_range(VA0, I, X, VA)
       then VA
-      else func_error("version_array.'elem :=': index out of range")
+      else func_error(
+            "version_array.'elem :=': index out of range or too many updates")
     ).
 
 set(I, X, VA, VA ^ elem(I) := X).
@@ -254,7 +256,7 @@ unsafe_rewind(VA, unsafe_rewind(VA)).
 % Note: this code is not thread safe, hence the absence of `thread_safe'
 % attributes!
 
-:- pragma foreign_type("C", version_array(T), "struct ML_va *")
+:- pragma foreign_type("C", version_array(T), "ML_VerArray *")
     where
         equality   is eq_version_array,
         comparison is cmp_version_array.
@@ -326,12 +328,7 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    VA = MR_GC_NEW(struct ML_va);
-
-    VA->index            = -1;
-    VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, 1);
-    VA->rest.array->size = 0;
+    VA = ML_va_new(0, ML_va_alloc_base(0, 0, (MR_Word) NULL));
 ").
 
 :- pragma foreign_proc("C",
@@ -339,17 +336,13 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness, may_not_duplicate],
 "
-    MR_Integer  i;
+    ML_VerArrayBase *base;
+    MR_Integer      i;
 
-    VA = MR_GC_NEW(struct ML_va);
-    VA->index            = -1;
-    VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
-    VA->rest.array->size = N;
+    base = ML_va_alloc_base(0, N, X);
+    MR_memset(base->ML_vab_slots, 0, N * sizeof(ML_VerArraySlot));
 
-    for (i = 0; i < N; i++) {
-        VA->rest.array->elements[i] = X;
-    }
+    VA = ML_va_new(0, base);
 ").
 
 :- pragma foreign_proc("C",
@@ -357,26 +350,7 @@ cmp_version_array_2(I, Size, VAa, VAb, R) :-
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness, may_not_duplicate],
 "
-    MR_Integer  i;
-    MR_Integer  size_VA0;
-    MR_Integer  min;
-
-    size_VA0 = ML_va_size(VA0);
-    min      = (N <= size_VA0 ? N : size_VA0);
-    VA       = MR_GC_NEW(struct ML_va);
-
-    VA->index            = -1;
-    VA->value            = (MR_Word) NULL;
-    VA->rest.array       = (MR_ArrayPtr) MR_GC_NEW_ARRAY(MR_Word, N + 1);
-    VA->rest.array->size = N;
-
-    for (i = 0; i < min; i++) {
-        (void) ML_va_get(VA0, i, &VA->rest.array->elements[i]);
-    }
-
-    for (i = min; i < N; i++) {
-        VA->rest.array->elements[i] = X;
-    }
+    VA = ML_va_copy_resize(VA0, N, X);
 ").
 
 resize(N, X, VA, resize(VA, N, X)).
@@ -415,139 +389,295 @@ resize(N, X, VA, resize(VA, N, X)).
     [will_not_call_mercury, promise_pure, will_not_modify_trail,
         does_not_affect_liveness],
 "
-    VA = ML_va_rewind(VA0);
+    ML_va_rewind(VA0);
+    VA = VA0;
 ").
 
 :- pragma foreign_decl("C", "
+
+/*
+** An ML_VerArray structure identifies a particular version of a version array.
+** It also points to a 'base', shared by multiple versions of the array.
+**
+** ML_VerArrayBase contains a slot for each element of the version array, each
+** pointing to a linked list of updates at that index.  Each update, an
+** ML_VerArraySlot structure, has a version number.  Later versions always
+** appear earlier in the list.  Searching for an array element's value consists
+** of finding the first update in the list with a version number less than or
+** equal to the version we want.
+**
+** An ML_VerArraySlot list is terminated by a null pointer, representing the
+** earliest version of that element.  The array initialiser is stored in the
+** ML_VerArrayBase.  This is an optimisation.
+**
+** ML_VerArrayBase also records the latest version for which it was updated.
+** This is for detecting a situation where we are trying to update an older
+** version of an array.  Then we must make a copy of the base array with later
+** updates since the older version not present.
+*/
+
+typedef struct ML_VerArray      ML_VerArray;
+typedef struct ML_VerArrayBase  ML_VerArrayBase;
+typedef struct ML_VerArraySlot  ML_VerArraySlot;
+
+struct ML_VerArray {
+    MR_Unsigned         ML_va_version;
+    ML_VerArrayBase     *ML_va_base;
+};
+
+struct ML_VerArrayBase {
+    MR_Unsigned         ML_vab_latest_version;
+    MR_Integer          ML_vab_size;
+    MR_Word             ML_vab_init;
+    ML_VerArraySlot     *ML_vab_slots[MR_VARIABLE_SIZED];
+};
+
+struct ML_VerArraySlot {
+    MR_Unsigned         ML_vas_version;
+    MR_Word             ML_vas_value;
+    ML_VerArraySlot     *ML_vas_next;
+};
+
     /*
-    ** If index is -1 then value is undefined and rest is the latest
-    ** array value.
-    **
-    ** Otherwise value is the overwritten value at index and rest is
-    ** a pointer to the next version in the chain.
+    ** Allocate a new ML_VerArray structure.
     */
+extern ML_VerArray      *ML_va_new(MR_Unsigned ver, ML_VerArrayBase *base);
 
-typedef struct ML_va    *ML_va_ptr;
+    /*
+    ** Allocate a new ML_VerArrayBase structure.
+    ** The slots are not initialised.
+    */
+extern ML_VerArrayBase  *ML_va_alloc_base(MR_Unsigned ver, MR_Integer N,
+                            MR_Word init);
 
-struct ML_va {
-    MR_Integer          index;  /* -1 for latest, >= 0 for older */
-    MR_Word             value;  /* Valid if index >= 0           */
-    union {
-        MR_ArrayPtr     array;  /* Valid if index == -1          */
-        ML_va_ptr       next;   /* Valid if index >= 0           */
-    } rest;
-};
+    /*
+    ** Allocate a new ML_VerArraySlot structure.
+    */
+static ML_VerArraySlot  *ML_va_new_slot(MR_Unsigned ver, MR_Word value,
+                            ML_VerArraySlot *next);
+
+    /*
+    ** Return the first slot in the chain with a version less than or equal to
+    ** the given version.
+    */
+static ML_VerArraySlot  *ML_va_chase_version(ML_VerArraySlot *slot,
+                            MR_Unsigned version);
 
     /*
     ** Returns the number of items in a version array.
     */
-extern MR_Integer   ML_va_size(ML_va_ptr);
+extern MR_Integer   ML_va_size(const ML_VerArray *);
 
     /*
     ** If I is in range then ML_va_get(VA, I, &X) sets X to the Ith item
     ** in VA (counting from zero) and returns MR_TRUE.  Otherwise it
     ** returns MR_FALSE.
     */
-extern int          ML_va_get(ML_va_ptr, MR_Integer, MR_Word *);
+extern MR_bool      ML_va_get(const ML_VerArray *, MR_Integer, MR_Word *);
 
     /*
     ** If I is in range then ML_va_set(VA0, I, X, VA) sets VA to be VA0
     ** updated with the Ith item as X (counting from zero) and
     ** returns MR_TRUE.  Otherwise it returns MR_FALSE.
     */
-extern int          ML_va_set(ML_va_ptr, MR_Integer, MR_Word, ML_va_ptr *);
+extern MR_bool      ML_va_set(const ML_VerArray *, MR_Integer, MR_Word,
+                        ML_VerArray **);
 
     /*
-    ** `Rewinds' a version array, invalidating all extant successors
-    ** including the argument.
+    ** Return a copy of VA0, but with N number of elements.  Elements which
+    ** exist in the new array but not in the old array are initialised to X.
     */
-extern ML_va_ptr    ML_va_rewind(ML_va_ptr);
+extern ML_VerArray  *ML_va_copy_resize(const ML_VerArray *VA0, MR_Integer N,
+                        MR_Word X);
+
+    /*
+    ** Update the base array of VA such that VA becomes the latest version of
+    ** the version array again.  All extant successors become invalid.
+    */
+extern void          ML_va_rewind(ML_VerArray *VA);
 
 ").
 
 :- pragma foreign_code("C", "
 
-#define ML_va_latest_version(VA)   ((VA)->index == -1)
+ML_VerArray *
+ML_va_new(MR_Unsigned ver, ML_VerArrayBase *base)
+{
+    ML_VerArray *VA;
 
-MR_Integer
-ML_va_size(ML_va_ptr VA)
+    VA = MR_GC_NEW(ML_VerArray);
+    VA->ML_va_version = ver;
+    VA->ML_va_base = base;
+    return VA;
+}
+
+ML_VerArrayBase *
+ML_va_alloc_base(MR_Unsigned ver, MR_Integer N, MR_Word init)
+{
+    size_t          size;
+    ML_VerArrayBase *new_base;
+
+    size = sizeof(ML_VerArrayBase) + N * sizeof(ML_VerArraySlot);
+    new_base = MR_GC_malloc(size);
+    new_base->ML_vab_latest_version = ver;
+    new_base->ML_vab_size = N;
+    new_base->ML_vab_init = init;
+    return new_base;
+}
+
+static ML_VerArraySlot *
+ML_va_new_slot(MR_Unsigned ver, MR_Word value, ML_VerArraySlot *next)
+{
+    ML_VerArraySlot *slot;
+
+    slot = MR_GC_NEW(ML_VerArraySlot);
+    slot->ML_vas_version = ver;
+    slot->ML_vas_value = value;
+    slot->ML_vas_next = next;
+    return slot;
+}
+
+static ML_VerArraySlot *
+ML_va_chase_version(ML_VerArraySlot *slot, MR_Unsigned version)
 {
-    while (!ML_va_latest_version(VA)) {
-        VA = VA->rest.next;
+    while (slot != NULL && slot->ML_vas_version > version) {
+        slot = slot->ML_vas_next;
     }
 
-    return VA->rest.array->size;
+    return slot;
+}
+
+MR_Integer
+ML_va_size(const ML_VerArray *VA)
+{
+    return VA->ML_va_base->ML_vab_size;
 }
 
 int
-ML_va_get(ML_va_ptr VA, MR_Integer I, MR_Word *Xptr)
+ML_va_get(const ML_VerArray *VA, MR_Integer I, MR_Word *Xptr)
 {
-    while (!ML_va_latest_version(VA)) {
-        if (I == VA->index) {
-            *Xptr = VA->value;
-            return MR_TRUE;
-        }
+    ML_VerArrayBase *base;
+    ML_VerArraySlot *slot;
 
-        VA = VA->rest.next;
+    if (I < 0 || I >= ML_va_size(VA)) {
+        return MR_FALSE;
     }
 
-    if (0 <= I && I < VA->rest.array->size) {
-        *Xptr = VA->rest.array->elements[I];
-        return MR_TRUE;
+    base = VA->ML_va_base;
+    slot = ML_va_chase_version(base->ML_vab_slots[I], VA->ML_va_version);
+
+    if (slot != NULL) {
+        *Xptr = slot->ML_vas_value;
     } else {
-        return MR_FALSE;
+        *Xptr = base->ML_vab_init;
     }
+
+    return MR_TRUE;
 }
 
-int
-ML_va_set(ML_va_ptr VA0, MR_Integer I, MR_Word X, ML_va_ptr *VAptr)
+MR_bool
+ML_va_set(const ML_VerArray *VA0, MR_Integer I, MR_Word X, ML_VerArray **VAptr)
 {
-    ML_va_ptr VA1 = MR_GC_NEW(struct ML_va);
+    MR_Unsigned     old_ver;
+    MR_Unsigned     new_ver;
+    ML_VerArrayBase *base;
+    ML_VerArray     *copy;
 
-    if (ML_va_latest_version(VA0)) {
-        if (I < 0 || I >= VA0->rest.array->size) {
-            return MR_FALSE;
-        }
+    /* Check index in range. */
+    if (I < 0 || I >= ML_va_size(VA0)) {
+        return MR_FALSE;
+    }
+
+    old_ver = VA0->ML_va_version;
+    new_ver = old_ver + 1;
+
+    /* Check for version counter overflow. */
+    if (new_ver < old_ver) {
+        return MR_FALSE;
+    }
 
-        VA1->index      = -1;
-        VA1->value      = (MR_Word) NULL;
-        VA1->rest.array = VA0->rest.array;
+    /*
+    ** If VA0 is not the latest version of the array then we can't
+    ** destructively update the base array, so we make a copy of the base array
+    ** and update that instead.
+    */
+    base = VA0->ML_va_base;
+    if (old_ver != base->ML_vab_latest_version) {
+        copy = ML_va_copy_resize(VA0, base->ML_vab_size, base->ML_vab_init);
+        return ML_va_set(copy, I, X, VAptr);
+    }
 
-        VA0->index     = I;
-        VA0->value     = VA0->rest.array->elements[I];
-        VA0->rest.next = VA1;
+    base->ML_vab_latest_version = new_ver;
+    base->ML_vab_slots[I] = ML_va_new_slot(new_ver, X, base->ML_vab_slots[I]);
+
+    *VAptr = ML_va_new(new_ver, base);
+    return MR_TRUE;
+}
 
-        VA1->rest.array->elements[I] = X;
+ML_VerArray *
+ML_va_copy_resize(const ML_VerArray *VA0, MR_Integer N, MR_Word X)
+{
+    MR_Unsigned             ver;
+    const MR_Unsigned       reset_ver = 1;
+    const ML_VerArrayBase   *base;
+    ML_VerArrayBase         *new_base;
+    ML_VerArraySlot         *slot;
+    MR_Integer              size_VA0;
+    MR_Integer              min;
+    MR_Integer              i;
+
+    ver      = VA0->ML_va_version;
+    base     = VA0->ML_va_base;
+    size_VA0 = ML_va_size(VA0);
+    min      = (N <= size_VA0 ? N : size_VA0);
+
+    if (size_VA0 == 0) {
+        new_base = ML_va_alloc_base(ver, N, X);
     } else {
-        if (I < 0 || I >= ML_va_size(VA0)) {
-            return MR_FALSE;
+        new_base = ML_va_alloc_base(ver, N, base->ML_vab_init);
+    }
+
+    for (i = 0; i < min; i++) {
+        slot = ML_va_chase_version(base->ML_vab_slots[i], ver);
+        if (slot != NULL && slot->ML_vas_next != NULL) {
+            /*
+            ** Create a new singleton list instead of pointing to an existing
+            ** list to allow collection of old update lists.
+            */
+            slot = ML_va_new_slot(reset_ver, slot->ML_vas_value, NULL);
         }
+        new_base->ML_vab_slots[i] = slot;
+    }
 
-        VA1->index      = I;
-        VA1->value      = X;
-        VA1->rest.next  = VA0;
+    if (X == new_base->ML_vab_init) {
+        for (i = min; i < N; i++) {
+            new_base->ML_vab_slots[i] = NULL;
+        }
+    } else {
+        for (i = min; i < N; i++) {
+            new_base->ML_vab_slots[i] = ML_va_new_slot(reset_ver, X, NULL);
+        }
     }
 
-    *VAptr = VA1;
-    return MR_TRUE;
+    return ML_va_new(reset_ver, new_base);
 }
 
-ML_va_ptr
-ML_va_rewind(ML_va_ptr VA)
+void
+ML_va_rewind(ML_VerArray *VA)
 {
-    MR_Integer I;
-    MR_Word    X;
+    MR_Unsigned     ver;
+    ML_VerArrayBase *base;
+    MR_Integer      i;
 
-    if (ML_va_latest_version(VA)) {
-        return VA;
-    }
+    ver = VA->ML_va_version;
+    base = VA->ML_va_base;
 
-    I  = VA->index;
-    X  = VA->value;
-    VA = ML_va_rewind(VA->rest.next);
-    VA->rest.array->elements[I] = X;
+    for (i = 0; i < base->ML_vab_size; i++) {
+        base->ML_vab_slots[i] =
+            ML_va_chase_version(base->ML_vab_slots[i], ver);
+    }
 
-    return VA;
+    base->ML_vab_latest_version = ver;
 }
 
 ").
diff --git a/tests/hard_coded/version_array_test.exp b/tests/hard_coded/version_array_test.exp
index 21fb575..dc7f68a 100644
--- a/tests/hard_coded/version_array_test.exp
+++ b/tests/hard_coded/version_array_test.exp
@@ -13,7 +13,7 @@ ordering(A2, A1) = '>'
  (sum 73)
 7, 7, 7, 7 (size 4)
 Found exception as expected: software_error("version_array.elem: index out of range")
-Found exception as expected: software_error("version_array.\'elem :=\': index out of range")
-Found exception as expected: software_error("version_array.\'elem :=\': index out of range")
+Found exception as expected: software_error("version_array.\'elem :=\': index out of range or too many updates")
+Found exception as expected: software_error("version_array.\'elem :=\': index out of range or too many updates")
 Found exception as expected: software_error("version_array.elem: index out of range")
 Found exception as expected: software_error("version_array.elem: index out of range")
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list