[m-dev.] Re: sparse bitsets

Zoltan Somogyi zs at csse.unimelb.edu.au
Thu Dec 14 16:52:59 AEDT 2006

On 12-Dec-2006, Zoltan Somogyi <zs at csse.unimelb.edu.au> wrote:
> That wouldn't achieve my objective, which is: if you union two sets that don't
> overlap, the complexity of the union operation shouldn't depend on the size
> of the sets. We get the problem we do because constructing the union may
> (and in the training cars example, does) involve copying the list of
> bitset_elems in one set, just to append the other set at the end. With an
> explicitly hierarchical data structure design, you can avoid that traversal.
> With a data structure in which the hierarchical structure is hidden behind
> an abstraction barrier and in which the hierarchies of two sets aren't
> guaranteed to be aligned anyway, I don't think the traversal can be avoided.

I have designed a data structure that I expect will provide the algorithmic
complexity we need. I attach the description for your comments. I am starting
work on the algorithms.

I am also seeking feedback on the name.


    % We describe a set using a tree. Leaf nodes are level 0 in the tree.
    % Interior nodes have their level listed in the node; this level must be
    % greater than zero.
    % A leaf node contains information about the set membership of the items
    % in the range leaf_offset (inclusive) to leaf_offset + bits_per_int
    % (exclusive). This is done using a bitmap; an item is in the set if
    % and only if the bit corresponding to the item in the bits field is 1.
    % An interior node of level N contains information about the set membership
    % of the items in the range init_offset (inclusive) to final_offset
    % (exclusive), where
    %   final_offset = init_offset +
    %       bits_per_int * 2 ^ (N * bits_per_level)
    % The components field of an interior node of level N should consist of 
    % a list of nodes of level N-1 (leaf nodes if N-1 = 0, interior nodes
    % otherwise) whose init_offset/leaf_offset fields are init_offset +
    % k * bits_per_int * 2 ^ (N-1 * bits_per_level), for values of k between
    % 0 (inclusive) and 2 ^ bits_per_level (exclusive)
    % Invariants:
    % - In every list of nodes, all the nodes in the list have the same level.
    % - In every list of nodes, the list elements are sorted on offset, and
    %   no two elements have the same offset.
    % - A list of nodes of level N must have the ranges of all its nodes
    %   contained within the range of a single level N+1 node.
    % - If a node's range contains no items, the node must be deleted.
    % - The top level list should not be a singleton, unless it consists
    %   of a single leaf node.
    % These invariants ensure that every set of items has a unique
    % representation.
    % Leaf node cells should only be constructed using make_leaf_node/2.

:- type big_bitset(T)    % <= enum(T)
    ---> big_bitset(list(bitset_node)).

:- type bitset_node
    --->    leaf_node(
                leaf_offset     :: int,
                                % multiple of bits_per_int

                leaf_bits       :: int
                                % bits offset .. offset + bits_per_int - 1
    ;       interior_node(
                init_offset     :: int,
                                % multiple of
                                % bits_per_int * 2 ^ (level * bits_per_level)
                limit_offset    :: int,
                                % limit_offset = init_offset +
                                %   bits_per_int * 2 ^ (level * bits_per_level)
                level           :: int,
                                % Redundant; could be computed from init_offset
                                % and limit_offset.
                components      :: list(bitset_node)

:- func bits_per_level = int.

bits_per_level = 5.
mercury-developers mailing list
Post messages to:       mercury-developers at csse.unimelb.edu.au
Administrative Queries: owner-mercury-developers at csse.unimelb.edu.au
Subscriptions:          mercury-developers-request at csse.unimelb.edu.au

More information about the developers mailing list