[mercury-users] Mode problem

Fergus Henderson fjh at cs.mu.OZ.AU
Fri Nov 29 18:55:01 AEDT 2002


On 28-Nov-2002, Lauri Alanko <la at iki.fi> wrote:
> I just started learning Mercury. For starters, I tried to implement
> simple type inference for simply-typed lambda calculus with ints and
> bools. In Prolog it would go like this:
> 
> lookup([(K,V)|_],K,V) :- !.
> lookup([_|Rest],K,V) :- lookup(Rest,K,V).
> 
> tc(t,bool,_).
> tc(f,bool,_).
> tc(C,int,_) :- integer(C).
> tc(V,T,Env) :- atom(V), lookup(Env,V,T).
> tc(if(Test,Then,Else),T,Env) :-
> 	tc(Test,bool,Env),
> 	tc(Then,T,Env),
> 	tc(Else,T,Env).
> tc(app(F,X),T,Env) :-
> 	tc(F,fun(S,T),Env),
> 	tc(X,S,Env).
> tc(lambda(V,X),fun(S,T),Env) :-
> 	atom(V),
> 	tc(X,T,[(V,S)|Env]).

That Prolog code doesn't work properly.
For example, it infers a type for the following expression,
even though this expression should not be well-typed:

	?- tc(lambda(x,if(x,1,t)),T,_).
	 
	T = fun(bool, int)

The bug seems to caused by a poor choice of "defaulty" representation:
"t" could be a boolean constant, but it could also be an identifier
which could have type int.

Fortunately it is very difficult to make this kind of mistake in Mercury,
due to Mercury's strong typing.  You do have to be a bit more verbose -- 
you have to explicitly define the types you wish to work with --
but this is useful documentation anyway and it seems more than worth it
for avoiding subtle bugs like this.

So let's chalk that one up as a win for Mercury...
(Scoreline so far: Mercury 1, Prolog 0 ;-)

Anyway, we can patch that bug in the Prolog code by changing both
occurrences of `atom(V)' with `ident(V)' and adding

	ident(V) :- atom(V), V \= t, V \= f.

But there are still more problems with the Prolog version.  One of them is
that it doesn't handle variable scoping in lambda expressions properly.
For example, the following should not be well-typed, but the Prolog code
infers a type for it:

	?- tc(lambda(x,if(x,lambda(y,t),lambda(x,app(lambda(z,x),if(t,1,x))))),
		T, E).

	T = fun(bool, fun(bool, bool))
	E = [ (x, int)|_G726]

The reason that the Prolog code was able to infer a type for this example
was that it treated the final occurrence of `x' as refering to some variable
with type `int' in the environment, rather than binding it to `x' parameter
of the enclosing lambda expression, which has type `bool'.

The problem comes from the definition of lookup/3:

	lookup([(K,V)|_],K,V) :- !.
	lookup([_|Rest],K,V) :- lookup(Rest,K,V).

The cut here is placed too late -- I think it ought to be

	lookup([(K,V0)|_],K,V) :- !, V = V0.
	lookup([_|Rest],K,V) :- lookup(Rest,K,V).

instead.

In Mercury, this kind of bug is prevented by Mercury's mode system.
If we convert that to the analagous syntax using if-then-else rather
than cut, which is one of the things you'd need to do to rewrite this
in Mercury, then the original (buggy) definition would correspond to

	lookup([First|Rest],K,V) :-
		( First = (K,V) -> true ; lookup(Rest, K, V) ).

and the fixed definition would correspond to

	lookup([First|Rest],K,V) :-
		( First = (K,V0) -> V = V0 ; lookup(Rest, K, V) ).

In Mercury, calling the buggy version of lookup/3 with the third argument
unbound would result in a mode error at compile time: the compiler would
report an error message such as "scope error: attempt to bind non-local
variable `V' in negated context".

(I think that makes it Mercury 2, Prolog 0 ;-)

The main drawback with writing this in Mercury is that Mercury doesn't
have any direct support for Herbrand constraints, so you have to
write the unification code yourself.  See below for an example.
(Mercury 2, Prolog 1).

However, relying on Prolog's built-in unification is not going to work
properly, because Prolog's built-in unification doesn't do the occurs
check.   Indeed, a query such as

	?- tc(lambda(x,app(x, x)),T,_).

has undefined behaviour according to the Prolog standard.
So that's another bug in the original Prolog code.
To make this work in Prolog, you need to very carefully replace
the implicit unifications with calls to unify_with_occurs_check.
In Mercury, the unifications need to be explicit anyway, so
it's harder to make this mistake.  Furthermore, even if you forget
to code up the occurs check in your hand-written unification code, the
resulting behaviour is at least going to be consistent between different
Mercury implementations (or different back-ends of the current Mercury
implementation), rather than varying between different implementations
like it does in Prolog.  So you won't have the problem of it working
fine on one implementation but then breaking when you try to port it
to another.  (Mercury 3, Prolog 1).

Furthermore, the original Prolog code also has yet another bug -- this
time a performance bug -- it leaves behind unnecessary choice points.
Leaving behind unnecessary choice points like this can result in space leaks,
because the choice points can prevent tail recursion optimization and
garbage collection.  In Mercury, if you declare the top-level procedure's
determinism, then Mercury's determinism system will prevent such bugs,
by reporting a determinism error.  (Mercury 4, Prolog 1).

To me, this example really sums up many of the differences between
Mercury and Prolog.  In Prolog, things seem simple at first, until you
scratch the surface a little.  But once you do, it turns out that there
are lots of complicated things that you need to worry about, such as
steadfastness, correct placement of cuts, avoiding defaulty data types,
first-argument indexing, occurs check safety, problems with the non-ground
representation, and so forth (Richard O'Keefe's book "The Craft of Prolog"
has a very good description of these sorts of things).  As a result,
it can be quite hard to understand the behaviour of even very simple
Prolog code, and it's very easy to get things wrong and write code which
has subtle bugs.

In Mercury, there is a bit more up-front complexity in the language's
type, mode, and determinism systems; but it is up-front, rather than
hidden, and a benefit of it is that the code that you write in Mercury
is usually simpler and easier to understand.  Furthermore, bugs tend
to manifest themselves early, as compile errors, rather than resulting
in subtle run-time problems or performance problems that don't show
up until much later.

I enclose below a Mercury implementation of simple type inference for
simply-typed lambda calculus with ints and bools.  It's a fair bit
longer than the original Prolog program, but a lot of this is useful
documentation in the form of type, mode, and determinism declarations.
The other main contributor to the length is the explicit unification code.
I haven't really tested it, so it's possible that I made a mistake. 
(Writing in Mercury doesn't eliminate the need for testing! ;-)
But this should at least give you the basic idea.

:- module tc.
:- interface.
:- import_module term, bool, list.
:- type ident == string.

	% type variables
:- type tvar == term__var.
:- type tvar_supply == term__var_supply(generic).

:- type typ ---> 
	int;
	bool;
	fun(typ,typ);
	tvar(tvar).

:- type value --->
	int(int);
	bool(bool).

:- type expr --->
	variable(ident);
	lambda(ident,expr);
	app(expr,expr);
	const(value);
	if(expr,expr,expr).

:- type env ---> env(
		tvars :: tvar_supply,
		ident_types :: list({ident, typ}),
		tvar_types :: list({tvar, typ})
	).

:- func empty_env = env.

:- pred tc(expr,typ,env,env).
:- mode tc(in, in, in, out) is semidet.
:- mode tc(in, out, in, out) is semidet.

:- implementation.
:- import_module require.

empty_env = env(init_var_supply, [], []).

tc(const(int(_)),int) --> [].
tc(const(bool(_)),bool) --> [].
tc(variable(X),T) -->
	has_type(X,T).
tc(app(F,E),T) -->
	tc(E,ET),
	tc(F,FT),
	unify_func_type(FT,ArgT,T),
	unify_type(ET,ArgT).
tc(lambda(X,E),fun(S,T)) -->
	push_scope(X,S),
	tc(E,T),
	pop_scope(X).
tc(if(C,X,Y),XT) -->
	tc(C,CT),
	unify_type(CT,bool),
	tc(X,XT),
	tc(Y,YT),
	unify_type(XT,YT).

:- pred unify_type(typ, typ, env, env).
:- mode unify_type(in, in, in, out) is semidet.
unify_type(int, int) --> [].
unify_type(bool, bool) --> [].
unify_type(fun(T1,T2), fun(S1,S2)) -->
	unify_type(T1,S1),
	unify_type(T2,S2).
unify_type(tvar(TV), T) -->
	unify_type_var(TV, T).
unify_type(int, tvar(TV)) -->
	unify_type_var(TV, int).
unify_type(bool, tvar(TV)) -->
	unify_type_var(TV, bool).
unify_type(fun(T1,T2), tvar(TV)) -->
	unify_type_var(TV, fun(T1, T2)).

:- pred unify_func_type(typ, typ, typ, env, env).
:- mode unify_func_type(in, out, out, in, out) is semidet.
unify_func_type(fun(T1,T2), T1, T2) --> [].
unify_func_type(tvar(TV), T1, T2, Env0, Env) :-
	% check if TV already occurs in the environment;
	% if so, unify with the pre-existing binding for TV
	( lookup(Env0 ^ tvar_types, TV, T0) ->
		unify_func_type(T0, T1, T2, Env0, Env)
	;
		% otherwise make T1 and T2 fresh type variables
		% and bind TV to fun(T1, T2),
		fresh_tvar(TV1, Env0, Env1),
		fresh_tvar(TV2, Env1, Env2),
		T1 = tvar(TV1),
		T2 = tvar(TV2),
		Env = (Env2 ^ tvar_types :=
			[{TV,fun(tvar(TV1),tvar(TV2))} | Env2 ^ tvar_types])
	).


:- pred unify_type_var(tvar, typ, env, env).
:- mode unify_type_var(in, in, in, out) is semidet.
unify_type_var(TV, T, Env0, Env) :-
	% check if TV already occurs in the environment;
	% if so, unify with the pre-existing binding for TV
	( lookup(Env0 ^ tvar_types, TV, T0) ->
		unify_type(T0, T, Env0, Env)
	;
		% otherwise just bind TV to T
		(not occurs(TV, T, Env)),
		Env = (Env0 ^ tvar_types :=
				[{TV,T} | Env0 ^ tvar_types])
	).

:- pred occurs(tvar, typ, env).
:- mode occurs(in, in, in) is semidet.
occurs(V, fun(T1, T2), Env) :- occurs(V, T1, Env) ; occurs(V, T2, Env).
occurs(V, tvar(V), _).
occurs(V, tvar(V1), Env) :-
	lookup(Env^tvar_types, V1, T),
	occurs(V, T, Env).

:- pred has_type(ident, typ, env, env).
:- mode has_type(in, out, in, out) is det.
has_type(V,T,Env0,Env) :-
	% check if V already occurs in the environment;
	% if so, return the pre-existing type for V
	( lookup(Env0 ^ ident_types, V, T0) ->
		T = T0,
		Env = Env0
	;
		% create a fresh type variable TV, and insert
		% the mapping {V,tvar(TV)} into the environment
		fresh_tvar(TV, Env0, Env1),
		Env = (Env1 ^ ident_types :=
				[{V,tvar(TV)} | Env1 ^ ident_types]),
		T = tvar(TV)
	).

:- pred push_scope(ident, typ, env, env).
:- mode push_scope(in, out, in, out) is det.
push_scope(V, tvar(TV), Env0, Env) :-
	% create a fresh type variable TV, and insert
	% the mapping {V,tvar(TV)} into the environment
	fresh_tvar(TV, Env0, Env1),
	Env = (Env1 ^ ident_types :=
			[{V,tvar(TV)} | Env1 ^ ident_types]).

:- pred pop_scope(ident, env, env).
:- mode pop_scope(in, in, out) is det.
pop_scope(V, Env0, Env) :-
	( IdentTypes = delete_first(V, Env0^ident_types) ->
		Env = (Env0^ident_types := IdentTypes)
	;
		error("pop_scope: variable not in scope")
	).

:- pred lookup(list({K,V}),K,V).
:- mode lookup(in,in,out) is semidet.
lookup([First|Rest],K,V) :-
	( First = {K,V0} -> V = V0 ; lookup(Rest,K,V) ).

:- func delete_first(K, list({K,V})) = list({K,V}) is semidet.
delete_first(X, [First | Rest]) =
	( First = {X,_} -> Rest ; [First | delete_first(X, Rest)] ).

:- pred fresh_tvar(tvar, env, env).
:- mode fresh_tvar(out, in, out) is det.
fresh_tvar(TV, Env0, Env) :-
	create_var(Env0 ^ tvars, TV, TVars),
	Env = (Env0 ^ tvars := TVars).

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-users-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the users mailing list