[m-dev.] for review: use bitsets in quantification

Fergus Henderson fjh at cs.mu.OZ.AU
Fri Nov 3 19:42:45 AEDT 2000


That change looks great, Simon.
I do have a few comments, though.

On 03-Nov-2000, Simon Taylor <stayl at cs.mu.OZ.AU> wrote:
> 
> +% File: enum.m.
> +% Author: stayl.
> +% Stability: low.
> +%
> +% This module provides the typeclass `enum', which describes
> +% types which can be converted to and from integers without loss
> +% of information.
> +%
> +%-----------------------------------------------------------------------------%
> +%-----------------------------------------------------------------------------%
> +:- module enum.
> +
> +:- interface.
> +
> +:- typeclass enum(T) where [
> +	func to_int(T) = int,
> +	func from_int(int) = T
> +].

The documentation and interface here suggests that only types which
are isomorphic to `int' are allowed, and that `from_int' can't fail.
But I think the intent is that this typeclass should be allowed
for types which are isomorphic to a subrange of `int'.
The `from_int' function should be semidet, and you should
document when it should fail.

> +:- instance enum(int).

It would also be useful to have instances for `char' and `bool'.

> @@ -410,10 +437,13 @@
>  
>  :- pragma c_header_code("
>  	#include <limits.h>
> +
> +	#define ML_BITS_PER_INT		(sizeof(MR_Integer) * CHAR_BIT)
>  ").

Technically that is making an assumption which standard C does not guarantee,
namely that integers have no p, since integer types

> +% File: sparse_bitset.m.
> +% Author: stayl
> +% Stability: medium.
> +%
> +% This module provides an ADT for storing sets of integers.
> +% If the integers are closely grouped, this representation will be
> +% much more compact than that provided by set.m.

The comment here doesn't explain why it is called "sparse" bitset.

> Union, intersection
> and difference are much faster. Converting to and from lists is slower.

I think it would be worth outlining roughly how sparse_bitsets are
represented, to give programmers a better idea of how it will perform.
It would also be nice to document asymptotic complexities for the
different operations.

> +:- module sparse_bitset.
...
> +	% `set__equal(SetA, SetB' is true iff `SetA' and `SetB'
> +	% contain the same elements.
> +:- pred equal(sparse_bitset(T), sparse_bitset(T)).
> +:- mode equal(in, in) is semidet.

s/set__//

> +	% `set_ordlist__list_to_set(List)' returns a set

s/set_ordlist__//

Likewise in other parts of this module.

> +	% containing only the members of `List'.
> +	% In the worst case this will take O(N^2) time
> +	% and space. If the elements of the list are closely
> +	% grouped, it will be closer to O(N).
> +:- func list_to_set(list(T)) = sparse_bitset(T) <= enum(T).
> +:- pragma type_spec(list_to_set/1, T = var(_)).
> +:- pragma type_spec(list_to_set/1, T = int).

Do these pragmas need to go in the interface? If so, would it be
better to put them all in a group at the end, rather than putting them
with the declaration of each operation?

> +        % `member(X, Set)' is true iff `X' is a member of `Set'.
> +:- pred member(T, sparse_bitset(T)) <= enum(T).
> +:- mode member(in, in) is semidet.

Probably you should use `contains(sparse_bitset(T), T)' rather than
`member'; for consistency with lists, `member' should be reserved for
procedures that nondeterministically produce all the possible members.

> +	% `insert_list(Set0, X)' returns the union of `Set0' and the set
> +	% containing only the members of `X'.
> +:- func delete_list(sparse_bitset(T), list(T)) = sparse_bitset(T) <= enum(T).

The documentation there is wrong (cut-and-paste error, obviously).

> +	% `fold(Func, Set, Start)' calls Func with each element
> +	% of `Set' and an accumulator (with the initial value of
> +	% `Start'), and returns the final value.
> +	% Takes O(NlogN) time.
> +:- func fold(func(T, U) = U, sparse_bitset(T), U) = U <= enum(T).

You should document the order in which elements are processed.

> +	% There's a lot of code duplication between this
> +	% and fold/3 below. The main difference is that
> +	% fold traverses the set in sorted order, whereas
> +	% to_sorted_list traverses the set in reverse order
> +	% to avoid having to reverse the resulting list.
> +to_sorted_list(sparse_bitset(A)) =
> +		sparse_bitset__to_sorted_list_2(A).

It may be worth defining a generic `foldr' that traverses the elements
in reverse order, and using that.

> +bits_for_index(Index, Offset, Bits) :-
> +	% Need to use `div' and `mod' rather than `//' and `rem'
> +	% to handle negative values correctly.
> +	Offset = int__floor_to_multiple_of_bits_per_int(Index),

It's a bit hard to see how the comment there relates to the code.

> +	% The bit pattern will often look like a pointer,
> +	% so allocate the pairs using GC_malloc_atomic()
> +	% to avoid unnecessary memory retention.
> +	% Doing this slows down the compiler by about 1%,

Do you know why?

> +	% but in a library module it's better to be safe.
> +	% On the other hand, the bit patterns probably cause
> +	% no more memory retention than unboxed floats.

I don't think the comment about unboxed floats is true.
We only use unboxed floats on 64 bit architectures,
and on such architectures, the chance of a random value
looking like a pointer is much less likely than on 32 bit
architectures.  So unboxed floats probably don't cause
much memory retention.  On 32-bit architectures we use
boxed floats, and in that case we allocate the floats using
GC_malloc_atomic().

> +:- func check(string, bitset_tester(T), bitset_tester(T)) =
> +		bitset_tester(T) <= enum(T).
> +
> +check(Op, Tester1, Tester) = Tester :-
> +
> +	Tester1 = BitSet1 - Set1,
> +	BitSetSet1 =
> +		sparse_bitset__sorted_list_to_set(set__to_sorted_list(Set1)),
> +	Tester = BitSet - Set,
> +	BitSetSet = sparse_bitset__sorted_list_to_set(set__to_sorted_list(Set)),
> +	( BitSetSet1 = BitSet1, BitSet = BitSetSet ->
> +		true
> +	;
> +		unsafe_perform_io(io__write_string("Error in ")),
> +		unsafe_perform_io(io__write_string(Op)),
> +		unsafe_perform_io(io__write_string(":\n")),
> +		unsafe_perform_io(io__write_string("Set1 : ")),
> +		unsafe_perform_io(io__write(Set1)),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__write_string("BitSet1 : ")),
> +		unsafe_perform_io(io__write(BitSetSet1)),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__write_string("Result Set: ")),
> +		unsafe_perform_io(io__write(Set)),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__write_string("Result BitSet: ")),
> +		unsafe_perform_io(io__write(BitSetSet)),
> +		unsafe_perform_io(io__nl),
> +		unsafe_perform_io(io__nl),
> +		error("bitset failed")
> +	).

Better make that `impure unsafe_perform_io', otherwise the compiler
will do duplicate call elimination to eliminate your newlines.

A much better approach, that avoids the need for unsafe_perform_io,
would be to just package the data that you want displayed into a
struct and throw that as an exception.

Could you please post a relative diff when you've addressed those issues?

Cheers,
	Fergus.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list