[m-rev.] for review: tree_bitset.m

Ralph Becket rafe at csse.unimelb.edu.au
Tue Dec 19 15:20:52 AEDT 2006


Zoltan Somogyi, Tuesday, 19 December 2006:
> For review by anyone. The new test isn't in this diff (it is in
> a different workspace), but it is boring anyway.

> +:- module tree_bitset.
> +:- interface.
> +
> +:- import_module enum.
> +:- import_module list.
> +:- import_module term.
> +
> +:- use_module set.
> +
> +%-----------------------------------------------------------------------------%
> +
> +:- type tree_bitset(T). % <= enum(T).
> +
> +    % Return an empty set.
> +    %
> +:- func init = tree_bitset(T).
> +:- pred init(tree_bitset(T)::out) is det.
> +
> +:- pred empty(tree_bitset(T)).
> +:- mode empty(in) is semidet.
> +:- mode empty(out) is det.
> +
> +    % `equal(SetA, SetB)' is true iff `SetA' and `SetB' contain the same
> +    % elements. Takes O(min(card(SetA), card(SetB))) time.
> +    %
> +:- pred equal(tree_bitset(T)::in, tree_bitset(T)::in) is semidet <= enum(T).
> +
> +    % `list_to_set(List)' returns a set containing only the members of `List'.
> +    % Takes O(length(List)) time and space.
> +    %
> +:- func list_to_set(list(T)) = tree_bitset(T) <= enum(T).
> +:- pred list_to_set(list(T)::in, tree_bitset(T)::out) is det <= enum(T).

I assume you're only providing `(in, ..., in, out) is det' pred versions
for backwards compatibility?  At some point we should consider
deprecating such things.

> +
> +    % The bit pattern will often look like a pointer, so allocate the pairs
> +    % using GC_malloc_atomic() to avoid unnecessary memory retention. Doing
> +    % this slows down the compiler by about 1%, but in a library module it is
> +    % better to be safe.
> +:- pragma foreign_proc("C",
> +    make_leaf_node(A::in, B::in) = (Node::out),
> +    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
> +"{
> +#define ML_BITSET_TAG MR_FIRST_UNRESERVED_RAW_TAG
> +
> +    MR_tag_offset_incr_hp_atomic_msg(Node, MR_mktag(ML_BITSET_TAG),
> +        MR_SIZE_SLOT_SIZE, MR_SIZE_SLOT_SIZE + 2,
> +        MR_PROC_LABEL, ""tree_bitset:leaf_node/0"");

I'd like to see MR_GC_NEW_ATOMIC added to runtime/mercury_memory.h for
this.  But that's another change.

> +    MR_define_size_slot(MR_mktag(ML_BITSET_TAG), Node, 1);
> +    MR_field(MR_mktag(ML_BITSET_TAG), Node, 0) = A;
> +    MR_field(MR_mktag(ML_BITSET_TAG), Node, 1) = B;
> +}").

> +%-----------------------------------------------------------------------------%
> +
> +to_sorted_list(Set) = foldr(func(Elem, Acc0) = [Elem | Acc0], Set, []).

to_sorted_list(Set) = foldr(list.cons, Set, []).

> +to_set(Set) = set.sorted_list_to_set(to_sorted_list(Set)).
> +
> +from_set(Set) = sorted_list_to_set(set.to_sorted_list(Set)).
> +
> +%-----------------------------------------------------------------------------%
> +
> +% XXX We should make this more efficient. At least, we could filter the bits in
> +% the leaf nodes, yielding a new list of leaf nodes, and we could put the
> +% interior nodes on top, just as we do in sorted_list_to_set.
> +
> +filter(P, S) = S ^ to_sorted_list ^ list.filter(P) ^ sorted_list_to_set.

The last time I tried using ^ as "postfix" function application there
were squawks from the gallery.  Personally I prefer to see ^ as
syntactic sugar and I'm happy with this style, but strong opinions were
voiced that ^ should be reserved specifically for field access.
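
If you want to sidestep that argument, the plain function application
version would just be

filter(P, S) = sorted_list_to_set(list.filter(P, to_sorted_list(S))).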

> +make_singleton_set(A) = insert(init, A).

It's a good idea to module qualify that init, or we might hit
ambiguity problems with intermodule optimization.
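
i.e. something like

make_singleton_set(A) = insert(tree_bitset.init, A).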

> +find_least_bit(Bits0) = BitNum :-
> +    Size = bits_per_int,
> +    BitNum0 = 0,
> +    BitNum = find_least_bit_2(Bits0, Size, BitNum0).
> +
> +:- func find_least_bit_2(int, int, int) = int.
> +
> +find_least_bit_2(Bits0, Size, BitNum0) = BitNum :-
> +    ( Size = 1 ->
> +        % We can't get here unless the bit is a 1 bit.
> +        BitNum = BitNum0
> +    ;
> +        HalfSize = unchecked_right_shift(Size, 1),
> +        Mask = mask(HalfSize),
> +
> +        LowBits = Bits0 /\ Mask,
> +        ( LowBits \= 0 ->
> +            BitNum = find_least_bit_2(LowBits, HalfSize, BitNum0)
> +        ;
> +            HighBits = Mask /\ unchecked_right_shift(Bits0, HalfSize),
> +            BitNum = find_least_bit_2(HighBits, HalfSize, BitNum0 + HalfSize)
> +        )
> +    ).

You can find the least set bit of a two's complement number Bits0 with
(Bits0 /\ -Bits0).  You can convert that to a BitNum with a simpler
loop.  I appreciate that your algorithm is O(log n), but I suspect
the constant factors are larger.
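
Something like this is what I had in mind (an untested sketch;
least_bit_to_num is just an illustrative name):

    % Assumes Bits0 is non-zero, as the original does.
find_least_bit(Bits0) = least_bit_to_num(Bits0 /\ (-Bits0), 0).

:- func least_bit_to_num(int, int) = int.

    % least_bit_to_num(Bit, BitNum0): Bit has exactly one bit set;
    % shift it down, counting, until it reaches bit position zero.
least_bit_to_num(Bit, BitNum0) = BitNum :-
    ( Bit /\ 1 = 1 ->
        BitNum = BitNum0
    ;
        BitNum = least_bit_to_num(unchecked_right_shift(Bit, 1), BitNum0 + 1)
    ).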

Of course, the efficiency of this function is probably neither here nor
there.


> +
> +%-----------------------------------------------------------------------------%
> +
> +list_to_set(List) = Set :-
> +    list.sort(List, SortedList),
> +    Set = sorted_list_to_set(SortedList).

list_to_set(List) = sorted_list_to_set(list.sort(List)).

> +init(init).

Ditto re: module qualification.

The rest looks fine.

-- Ralph