[m-dev.] [reuse] fixes of some naughty bugs

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Mar 13 02:06:12 AEDT 2001


On 12-Mar-2001, Peter Schachte <schachte at cs.mu.OZ.AU> wrote:
> On Mon, Mar 12, 2001 at 04:41:08PM +1100, Fergus Henderson wrote:
> > > I'm a bit concerned about doing this kind of thing on a type with a
> > > user-defined equivalence predicate.  Suppose you have a set type:
> > > 
> > > 	type set(T) ---> set(sorted :: bool, elements :: list(T))
> > > 		where equality is set_equal.
> > 
> > The fact that the compiler allows such declarations is an outstanding bug.
> > Allowing named fields in types with user-defined equality, and
> > treating those fields as functions, breaks referential transparency.
> 
> I can't see how the blame lies with the accessor functions.

The accessor functions violate referential transparency.  In
particular, given some set S, it is quite possible for the expression
`sorted(S)' to have different values at different times, depending on what
representation S happens to have at those times.  You can't rely on
`sorted(S) = sorted(S)' succeeding.  Likewise for `elements(S)'.

> Without them, you can still write
> 
>     Set = set(_,Elts), Set = Set2, Set = set(Flag,_) ...
> 
> and have the same problem.

No, that code doesn't have the same problem;
that code doesn't violate referential transparency.

For any given value of `Set' and `Set2', there might
be more than one value of `Elts' and `Flag' which is
a solution to that goal, but that's just the usual
case of a nondeterministic (in this case cc_nondet) goal.
Nondeterministic goals do not violate referential transparency.

The only problem with that code is that some people (especially
long-time Prolog programmers, who may assume that lower-case set/2 is a
constructor for an immutable data structure) may be surprised by
the behaviour of that code.  But it doesn't violate referential
transparency.

> > That may come as a surprise.  But the same could happen in pure Mercury code,
> > without any destructive update, if set/2 was a function rather than a
> > constructor.  For example:
> > 
> > 	:- type myset(T) ---> mkset(list(T)).
> > 
> > 	:- func set(bool, list(T)) = myset(T).
> > 	:- mode set(out, out) = in is nondet.
> > 	set(IsSorted, List) = mkset(SortedList) :-
> > 		sorted(SortedList),
> > 		( IsSorted = yes, List = SortedList
> > 		; IsSorted = no, list__perm(SortedList, List)
> > 		).
> > 
> > 	sorted(L) :- list__sort(L, L).
> 
> This isn't quite the same, because reordering does not change the behavior.

Well, reordering of the disjunction in set/2 will change the behaviour.
If the compiler happens to inline both of the calls to set/2,
and happens to reorder the disjunction only for the second call,
then you can get the same situation where the first solution
returned has Flag = yes but Elts not sorted.

> You're just pointing out that nondet functions don't behave like functions,
> right?  Though in this case it's a reverse mode, which is even less
> surprising.

What I'm pointing out here is that constructors for types with user-defined
equality behave like functions whose reverse mode is (cc_)nondet,
and that both of these are different to constructors for types
with the standard equality.

> Ie, who is shocked that,
> 
> 	X + _ = 4, _ + Y = 4, X + Y = Z
> 
> could wind up with Z != 4?

Exactly.  So why be shocked by the behaviour of the original example
with destructive update?

> > I think your example
> > is a good example of exactly the kind of thing I'd like to use
> > user-defined equality types for, so I would certainly like to permit
> > those kinds of value-preserving but representation-changing destructive
> > updates.
> 
> Agreed.  But maybe there needs to be some concept like impurity (only
> without the stigma and without the ubiquitous declarations of impurity) to
> indicate that it can happen.  This would inhibit reordering of accesses, and
> perhaps inhibit some optimizations.

I don't think reordering of accesses should be inhibited merely because
the representation might change.  The compiler and the programmer just
need to both be aware that if `foo' is a function or a constructor for a
type with a user-defined equality, then

	X = foo(A, B)

is NOT the same as

	X = foo(A, _), X = foo(_, B)

In the case where `foo' is `+', this is fairly obvious.
In the case where `foo' is `set', it is still true, albeit not so obvious.

> It would be a shame if the compiler
> could never count on terms remaining fixed just so once in a great while we
> could use some clever coding trick.

Possibly.  But I think in most cases the compiler will be satisfied
with the lesser guarantee that the value remains fixed, rather than
the stronger guarantee that the representation remains fixed.
Can you give examples of optimizations that would be inhibited?

> > I agree that there is some potential for confusion.  One thing that we
> > could do to reduce that is to prohibit types with user-defined equality
> > from having more than one functor or more than one argument, i.e. require
> > them all to be no-tag types.
> 
> Better still, require non-canonical types (ie, types with user defined
> equality) to be abstract.  Users have no business poking around inside,
> anyway.  Then the implementor is the only one who can mess up, and hopefully
> she will know better.

Both of these are style issues.  I'm inclined to give the programmer the
benefit of the doubt.  I could live with optional compiler warnings
for them, but I'd rather not prohibit them outright.

For example, concrete types with user-defined equality might be
useful in modelling interfaces to modules written in foreign languages.
E.g. if I have a C++ package that uses a class

	class Set {
	public:
		mutable bool sorted;
		List<T> elements;
		friend bool operator == (const Set &, const Set &);
		...
	};

then perhaps the best way to model that is by using a concrete
type with a user-defined equality and more than one field.

That's just one example off the top of my head. 
There may be more convincing examples that we haven't
thought of yet but which someone else will want to use
in the future.  I don't think we should prevent them from
doing so simply because we can't see now how it might be useful.

> > > Am I right that there is no way to write a foreign predicate that has both
> > > unique and non-unique modes without writing two copies of the code?
> > 
> > Well, you need two copies of the `pragma import' or `pragma foreign_proc'
> > declaration.  But the code itself can be defined once in a subroutine
> > (defined either externally or using `pragma foreign_code').
> > 
> > It would not be difficult to change the syntax to allow a single
> > `pragma import' or `pragma foreign_proc' declaration to define
> > multiple modes of a given procedure.
> 
> I think it would be better to make it equally easy to declare a foreign proc
> input that does or doesn't need to be unique, and that does or doesn't have
> uniqueness preserved.

Agreed.  However, that is not as easy; it would require
a non-trivial extension to the mode system.

> I think it would also be worth thinking about making users specify whether
> each foriegn pred or goal is pure, semipure or impure.

I think that is a good idea.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list