[mercury-users] Pred defns
Richard A. O'Keefe
ok at atlas.otago.ac.nz
Wed Apr 8 14:26:45 AEST 1998
This whole thread on argument threading and updating has been most
interesting. Ralph Becket and Fergus Henderson have said that
RB ... this facility ... should look simple and *elegant*!
RH [it must] *be* simple and elegant.
Frankly, all of the concrete proposals I've seen so far have been
rather appalling messes of punctuation that don't satisfy these criteria.
I'm co-teaching a 4th year functional programming course this year.
Making sure I do all the student exercises and assignments, in both
SML/NJ 110 and Clean 1.2.4 is proving quite instructive. The latest
thing I've done is a little genetic algorithm kit, and of course
while it does no I/O it _does_ have a random number generator state
popping up all over the place. Instead of
breed :: Population -> Population
you have
breed :: Population -> RNG_State -> (Population, RNG_State)
How do functional languages handle this kind of thing?
1. Lexical scope.
make_random_list f x n s
= loop [] n s
where loop w 0 s = (w, s)
loop w m s = let (c,s) = f x s in loop [c:w] (m-1) s
The inner function has only three arguments, all of them
'changing'; it can refer to f and x without having to pass
them around (unlike 'flat' languages like Prolog and Erlang).
Nested predicate definitions would in no way detract from the
'logic'-based nature of Mercury. The source-to-source
transform is, if anything, simpler than the DCG transform.
There's another application of lexical scope there too.
let (c,s) = f x s in ...
the s on the right hand side refers to the old state,
the s on the left hand side is a new variable that refers to the
new state. To be honest, my actual code uses s, s`, s``, s```,
while my ML code uses s, s', s'', s''' (and I'm _always_ using
the wrong kind of prime in each language, _sigh_).
Again, there's nothing wrong with lexically scoped variables;
indeed, they are _standard_ in ordinary logical notation (to
the limited extent that anything is standard in mathematics).
We only really need name decoration schemes when we _want_ to
refer to multiple states of the 'same' thing at once, and of
course _that_ interferes with the uniqueness of reference that
implementation using destructive update depends on.
2. Combinators.
For example, suppose I have two ordinary functions
f :: b -> c // a, b, c are type variables
g :: a -> b // in Clean, use 'a 'b 'c in ML.
I can compose them using
(o) infixr 9 :: (b -> c) (a -> b) a -> c
(o) f g x = f (g x)
Similarly, if I have two 'functions' that mutate a random number
generator
f :: b RNG_State -> (c, RNG_State)
g :: a RNG_State -> (b, RNG_State)
I can compose them using
(compose) infixr 9 ::
(b RNG_State -> (c, RNG_State))
(a RNG_State -> (b, RNG_State)) ->
(a RNG_State -> (c, RNG_State))
(compose) f g x s0 = let (y,s1) = g x s0 in f y s1
Now I can chain together as many functions as I want:
f compose g compose h
isn't any harder to write than
f o g o h
This stuff leads on to 'parsing combinators' and 'monads' and
all that kind of stuff, without any _syntactic_ extensions at all.
3. What-you-see-is-not-what-the-compiler-has-to-do.
There's a certain amount of _syntactic_ clumsiness in passing
back pairs and triples, and there's this subliminal feeling
that all of these tuples will actually be _constructed_ and
that doing so will waste space and time. It's enough to make
anyone long for Prolog/Mercury syntax, where passing back
multiple results is so clean, so simple.
But functional language compilers do strange and wonderful
things. If a compiler _knows_ that function f always returns
an N-tuple, it can instead compile it so that the elements of
the tuple are left in N 'result registers'. Calls that don't
extract the elements can have a boxing step inserted, or the
compiler can generate two copies of the function. I suppose
we could call this technique 'untupling'.
The Mercury compiler knows just as much about the programs it
is compiling as the average functional language compiler. And
if it doesn't already do some kinds of untupling, it will
eventually have to. Just at the moment, I've got functional
syntax paged in, so I'll stick with that. Consider two types:
:: Date = MkDate {yr::Int, mo::Int, dy::Int}
:: Person = MkPerson {name::String, birth::Date}
In typical Prolog implementations,
a person(Name,date(Yr,Mo,Dy)) term would consist of two separate
linked records, requiring 7 words. Accessing someone's birth
year, X = person(_,date(Y,_,_)) requires traversing an extra
pointer. I'm sure Mercury can turn this into the equivalent of
C's "Y = X->name->yr" without breathing hard. With the type
information to hand, a functional language compiler can store
the date record directly inside the person record (which also
requires that the garbage collector support embedded objects,
but that's certainly doable), getting the equivalent of C's
"Y = X->name.yr", a single memory reference. I don't know if
Mercury does this yet, but it has the information it needs; it
_could_ use this method, and if the Mercury project continues
long enough I'm sure it _will_ do this. This is another
instance of 'untupling'.
Indeed, before assuming that packaging several state variables
into one record is going to be inefficient, it might be an idea
to try it out and see if it really is. My expectation is that
very occasionally it _will_ be a performance problem, but in
the presence of destructive-update-based-on-uniqueness-info,
it will be as efficient as the traditional implementation of
access to variables in outer blocks in Pascal or Algol, because
it will _be_ that implementation, and even without destructive
update, the time may well be going somewhere else.
By the way, there was a paper not so very many decades ago, which
showed how the very carefully designed imperative language 'Euclid'
could indeed be seen quite directly as a pure functional language.
(The crucial point was that a Euclid compiler is required to enforce
a ban on aliasing.) One of the Xerox blue-and-white reports, as I recall.
More information about the users
mailing list