[m-dev.] Re: sparse bitsets
Zoltan Somogyi
zs at csse.unimelb.edu.au
Thu Dec 14 16:52:59 AEDT 2006
On 12-Dec-2006, Zoltan Somogyi <zs at csse.unimelb.edu.au> wrote:
> That wouldn't achieve my objective, which is: if you union two sets that don't
> overlap, the complexity of the union operation shouldn't depend on the size
> of the sets. We get the problem we do because constructing the union may
> (and in the training cars example, does) involve copying the list of
> bitset_elems in one set, just to append the other set at the end. With an
> explicitly hierarchical data structure design, you can avoid that traversal.
> With a data structure in which the hierarchical structure is hidden behind
> an abstraction barrier and in which the hierarchies of two sets aren't
> guaranteed to be aligned anyway, I don't think the traversal can be avoided.
I have designed a data structure that I expect will provide the algorithmic
complexity we need. I attach the description for your comments. I am starting
work on the algorithms.
I am also seeking feedback on the name.
Zoltan.
% We describe a set using a tree. Leaf nodes are level 0 in the tree.
% Interior nodes have their level listed in the node; this level must be
% greater than zero.
%
% A leaf node contains information about the set membership of the items
% in the range leaf_offset (inclusive) to leaf_offset + bits_per_int
% (exclusive). This is done using a bitmap; an item is in the set if
% and only if the bit corresponding to the item in the bits field is 1.
%
% An interior node of level N contains information about the set membership
% of the items in the range init_offset (inclusive) to final_offset
% (exclusive), where
%
% final_offset = init_offset +
% bits_per_int * 2 ^ (N * bits_per_level)
%
% The components field of an interior node of level N should consist of
% a list of nodes of level N-1 (leaf nodes if N-1 = 0, interior nodes
% otherwise) whose init_offset/leaf_offset fields are init_offset +
% k * bits_per_int * 2 ^ (N-1 * bits_per_level), for values of k between
% 0 (inclusive) and 2 ^ bits_per_level (exclusive)
%
% Invariants:
%
% - In every list of nodes, all the nodes in the list have the same level.
%
% - In every list of nodes, the list elements are sorted on offset, and
% no two elements have the same offset.
%
% - A list of nodes of level N must have the ranges of all its nodes
% contained within the range of a single level N+1 node.
%
% - If a node's range contains no items, the node must be deleted.
%
% - The top level list should not be a singleton, unless it consists
% of a single leaf node.
%
% These invariants ensure that every set of items has a unique
% representation.
%
% Leaf node cells should only be constructed using make_leaf_node/2.
:- type big_bitset(T) % <= enum(T)
---> big_bitset(list(bitset_node)).
:- type bitset_node
---> leaf_node(
leaf_offset :: int,
% multiple of bits_per_int
leaf_bits :: int
% bits offset .. offset + bits_per_int - 1
)
; interior_node(
init_offset :: int,
% multiple of
% bits_per_int * 2 ^ (level * bits_per_level)
limit_offset :: int,
% limit_offset = init_offset +
% bits_per_int * 2 ^ (level * bits_per_level)
level :: int,
% Redundant; could be computed from init_offset
% and limit_offset.
components :: list(bitset_node)
).
:- func bits_per_level = int.
bits_per_level = 5.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at csse.unimelb.edu.au
Administrative Queries: owner-mercury-developers at csse.unimelb.edu.au
Subscriptions: mercury-developers-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the developers
mailing list