# [m-dev.] Mercury book diff

Ralph Becket rafe at cs.mu.OZ.AU
Mon Jan 24 12:58:00 AEDT 2005

This didn't get through, so I'm resending it minus the .ps file
attachment.  Ask me if you'd like a copy!

----- Forwarded message from Ralph Becket <rafe at cs.mu.oz.au> -----

Subject: Mercury book diff
From: Ralph Becket <rafe at cs.mu.oz.au>
Date: Fri, 21 Jan 2005 18:40:31 +1100
To: Mercury Developers <mercury-developers at cs.mu.oz.au>
Reply-To: Ralph Becket <rafe at cs.mu.oz.au>
Organization: Mercury group, Melbourne University

I've done some more work on the book.  book.ps.gz is attached if anyone
wants hardcopy.

-- Ralph

This is the diff between Oct 18 and now, Oct 18 being the last time I
posted a diff.

--- ../diffs-2004-10-18/01-mercury-by-example.polytex	Mon Oct 18 16:48:19 2004
+++ 01-mercury-by-example.polytex	Fri Jan 21 18:23:36 2005
@@ -11,7 +11,8 @@
The approach taken here is to start by presenting the obvious'' solution
to a problem and then introduce features of Mercury that allow for more
elegant or efficient programs --- although we are not necessarily trying to
-write the most efficient programs possible at this point!
+write the most efficient programs possible at this point! \XXX{Is this last
+point even necessary?}

@@ -19,12 +20,12 @@

It is slightly unfortunate that the Hello, World!'' program introduces
-dictates that any tutorial text start with Hello, World!'' we'll just
+dictates that tutorial texts start with Hello, World!'' we'll just
have to jump straight in with the knowledge that things will get easier
thereafter.

We'll start by presenting the complete program which we'll assume we've
-typed into in a file called #hello.m#:
+typed into a file called #hello.m#:
%startcode
:- module hello.
:- interface.
@@ -90,14 +91,18 @@
prompt):
%startcode
$mmc --make hello -Making hello.int3 -Making hello.c -Making hello.o +Making Mercury/int3s/hello.int3 +Making Mercury/cs/hello.c +Making Mercury/os/hello.o Making hello$ ./hello
Hello, World!
%endcode
-Et voila!
+Et voila!  (By default, #mmc --make# will construct a local #Mercury#
+directory, if necessary, to hold intermediate files generated during
+compilation.)
+\XXX{Check for consistency throughout whether we're using a #Mercury#
+subdirectory or not when compiling.}

At this point the reader is probably wondering about the meaning of the #io#
type arguments #IOState_in# and #IOState_out#, and the strange #di# and #uo#
@@ -245,8 +250,8 @@
).
%endcode
The body of the #fib# definition uses an #if-then-else# goal to decide what
-to do (the #else# part is not optional and the whole thing appears in
-parentheses).  The condition #N =< 2# succeeds if #N# is less than or equal
+to do; the #else# part is not optional; and the whole thing appears in
+parentheses.  The condition #N =< 2# succeeds if #N# is less than or equal
like #=<# in later examples).  If #N =< 2# then the
\emph{unification} #X = 1# is executed.  Otherwise #fib# is called twice to
@@ -264,7 +269,7 @@
which calls #fib(17, X)#, unifying #X# with the result of computing the 17th
Fibonacci number, then writes out the answer.

-Now, just as #N - 1# computes #N# minus 1 and and #A + B# computes
+Now, just as #N - 1# computes #N# minus 1 and #A + B# computes
the sum of #A# and #B#, it is possible to define #fib# so
that #fib(N)# computes the #N#th Fibonacci number:
%startcode
@@ -348,7 +353,7 @@
then
io.format("fib(%d) = %d\n", [i(N), i(fib(N))], !IO)
else
-         io.format("I didn't expect that...\n", [], !IO)
+         io.format("That's not a number...\n", [], !IO)
).
%endcode
The #list# and #string# standard library modules are imported because we
@@ -462,12 +467,12 @@
io.format("I didn't expect that...\n", [], !IO)
).
%endcode
-Note the deconstruction test of #Result# with #eof# --- the #eof# data
+Observe the deconstruction test of #Result# with #eof# --- the #eof# data
constructor has no argument list and indeed it is a syntax error to write
#eof()# as one might in other languages.

Before we leave our #fib# example, let us introduce Mercury's \emph{switch}
-goals.  A switch goal is rather like C's #switch# statement and consists of a
+goals.  A switch goal is rather like C's #switch# statement and consists of
a set of alternatives testing a given variable against different possible
values it might have.  Here is #main# rewritten to use a switch goal:
%startcode
@@ -510,7 +515,7 @@
immediately follows the #else# part of another #if-then-else#.
\item A disjunction is a sequence of goals separated by semicolons.
\item A switch is a disjunction where each disjunct tests a particular
-variable against a different possibilty.  Where applicable, switches are
+variable against a different possibility.  Where applicable, switches are
generally preferable to #if-then-else#s.
\end{itemize}

@@ -657,13 +662,12 @@
error.)

Note that if we felt so inclined, we could make #rot13b# deterministic by
-including the translation of every possible ASCII character!
-\XXX{Are chars defined to be ASCII?}
-Of course, there are many ways of coding #rot13# and while our
-implementation may not be the most concise, it is quite efficient and very
-easy to understand.  Either way, our aim here was to look more closely at
-the concept of semideterminism and introduce the technique of making code
-more readable by using multiple clauses to define a predicate or function.
+including the translation of every possible character!  Of course, there are
+many ways of coding #rot13# and while our implementation may not be the most
+concise, it is quite efficient and very easy to understand.  Either way, our
+aim here was to look more closely at the concept of semideterminism and
+introduce the technique of making code more readable by using multiple
+clauses to define a predicate or function.

\subsection*{Points to remember}

@@ -759,12 +763,12 @@
each of #D#, #O#, #G#, #A#, #N#, #T# and #C# are different digits.

The digits for each letter are selected nondeterministically using the
-#pick# predicate.  The goal #pick(G, Ds0, Ds1)#, for instance, picks a digit
+#pick# predicate.  The goal #pick(Ds0, G, Ds1)#, for instance, picks a digit
for #G# from #Ds0# and leaves the remainder in #Ds1#.  Since #Ds0# contains
ten members, there are ten possible solutions for #G#.

-#Cs1# is the carry left over from the units column, #Cs2# is the carry left
-over from the tens column, and we introduce #Cs0# as a carry in'' of
+#C1# is the carry left over from the units column, #C2# is the carry left
+over from the tens column, and we introduce #C0# as a carry in'' of
zero just to keep the structure of the program regular.

We verify that the units column works out with the goal
--- ../diffs-2004-10-18/02-the-mercury-type-system.polytex	Mon Oct 18 16:48:20 2004
+++ 02-the-mercury-type-system.polytex	Fri Jan 21 18:23:36 2005
@@ -41,11 +41,11 @@

\section{The primitive types: #int#, #float#, #string#, and #char#}

-[\textbf{Note} that the underlying representation of the primitive types
+\Note The underlying representation of the primitive types
depends upon the compiler target (C, Java, .Net etc.)  For the C targets
#int# corresponds to #int#, #float# to #double#, #char# to #char#
and #string# to #char *#; number ranges, representations, arithmetic
-overflow handling and so forth are dictated by the target machine.]
+overflow handling and so forth are dictated by the target machine.

\subsection*{#int#}

@@ -58,6 +58,10 @@
Examples: decimal #-123#, #0#, #42#; hexadecimal #-0x7B#, #0x0#, #0x2A#;
octal #-0o173#, #0o0#, #0o52#; binary #-0b1111011#, #0b0#, #0b101010#.

+The sequence #0'x# denotes the character code for the character #x#.  For
+example, on an ASCII system #0'a#, #0'b#, and #0'c# denote 97, 98, and 99
+respectively.
+
The #int# standard library module must be imported to use any of the
primitive #int# operations.

@@ -105,8 +109,8 @@
octal number.  the ASCII character #A#, for example, can also be written
as #\x41\# or #\101\#.

-[\textbf{Note} that it is a Very Bad Idea to include the #NUL# character,
-#\x00\#, in strings.]
+\Note For arcane reasons, it is a Very Bad Idea to include the #NUL#
+character, #\x00\#, in strings.

A backslash at the end of a line is ignored in string constants.  Thus
\\#abc\#\\#def"#\\ is equivalent to just #"abcdef"#.  Otherwise, literal
@@ -147,10 +151,10 @@
#{"a", {"little", "contrived"}}# is a value of type
#{string, {string, string}}#.

-[\textbf{Note} that, unlike lists (described below), tuples are constructed
+\Note Unlike lists (described below), tuples are constructed
and deconstructed as complete entities.  There is no such thing as the head
or tail of a tuple.  \XXX{Should we include this note at all?  Or move the
-whole tuples section after lists?}]
+whole tuples section after lists?}

Tuples are occasionally useful for aggregating small numbers of different
types.  More often than not it is better style to use a discriminated union
@@ -311,7 +315,7 @@

Parentheses can change the meaning of an update expression:\\
-the #contact# field of the #employee# data constructor (\ie the type of this
+#contact# field of the #employee# data constructor (\ie the type of this
expression is #contact_details#, not #employee#.)

One final remark: it is also possible to explicitly define field access
@@ -412,13 +416,14 @@
Readability is often improved by giving simple names to complex types or
by using more meaningful names for a specific uses of general types:
%startcode
-:- type height   == float.  % In metres.
-:- type radius   == float.  % In metres.
-:- type density  == float.  % In cubic metres.
+:- type height  == float.  % In metres.
+:- type radius  == float.  % In metres.
+:- type volume  == float.  % In cubic metres.

:- func volume_of_cylinder(height, radius)  = volume.
%endcode
(The #%# sign introduces a comment, which extends to the end of the line.)
Here we define the types #height#, #radius# and #volume# to be equivalent to
(\ie interchangeable with) type #float#.  We could have just declared
@@ -517,8 +522,8 @@
This next example illustrates the syntax for predicate types (this predicate
is also defined in the #list# standard library module):
%startcode
-:- pred  filter(pred(T),              list(T),  list(T),  list(T)  ).
-:- mode  filter(pred(in) is semidet,  in,       out,      out      ) is det.
+:- pred  filter(pred(T),                  list(T),  list(T),  list(T)  ).
+:- mode  filter(in(pred(in) is semidet),  in,       out,      out      ) is det.

filter(_, [], [], []).
filter(P, [X | Xs], Ys, Zs) :-
@@ -527,7 +532,7 @@
else  Ys =       Ys0,   Zs = [X |  Zs0]
).
%endcode
-the goal #split(P, As, Bs, Cs)# unifies #Bs# with the list of members of
+the goal #filter(P, As, Bs, Cs)# unifies #Bs# with the list of members of
#As# that satisfy #P# and unifies #Cs# with the list of members of #As#
that don't.  The first clause says that filtering the empty list yeilds two
empty lists.  The second clause says that filtering #[X | Xs]# through the
--- ../diffs-2004-10-18/03-the-mercury-mode-system.polytex	Mon Oct 18 16:48:20 2004
+++ 03-the-mercury-mode-system.polytex	Fri Jan 21 18:23:36 2005
@@ -15,8 +15,6 @@

This chapter explains the mode system and how it is used.

-
-
\section{Predicates and procedures}

Every predicate must have a declaration specifying which arguments can be
@@ -42,13 +40,12 @@
corresponding \emph{determinism category} (#semidet#).

It is natural to want to call some predicates in more than one way --- if we
-want to use #phone# to perform reverse look-ups'', for instance.
-In such
+want to use #phone# to perform reverse look-ups'', for instance.  In such
cases more than one #mode# declaration is necessary (#pred#-#mode# shorthand
-can't be used for such predicates).
+can only be used for singly-moded predicates).

To allow reverse look-ups with #phone# all that is necessary is to (a) use
-separate #pred# and #mode# declarations and (b) add the extra #mode#
+separate #pred# and #mode# declarations and (b) add an extra #mode#
declaration:
%startcode
:- pred  phone(string,  int).
@@ -75,29 +72,32 @@
result is nondeterministic: it can fail because the goal
#phone(Person, 12345)# will fail, but the goal #phone(Person, 66532)#
has no less than three possible solutions --- #Person = "Ian"#,
-#Person = "Julien"#, and #Person = "Ralph"# --- each of which will be tried
-on backtracking.
+#Person = "Julien"#, and #Person = "Ralph"# --- each of which will be
+computed on backtracking.

-(What if we call #phone# with both arguments as inputs?  This is
-called an \emph{implied mode} and is explained in the next section.)
-
-Each #mode# declaration specifies a \emph{procedure} to be derived from the
-predicate definition.  The compiler generates code separately for each
-procedure of a predicate, reordering goals to ensure that every variable is
-instantiated (\eg by some earlier unification or call) before it is used in
-an input in a unification or call.  Mode information is also used to decide
-whether a unification with a data constructor is a construction or
+\Note Each #mode# declaration specifies a \emph{procedure} to be derived
+from the predicate definition.  The compiler generates code separately for
+each procedure of a predicate, reordering goals to ensure that every
+variable is instantiated (\eg by some earlier unification or call) before it
+is used in an input in a unification or call.  Mode information is also used
+to decide whether a unification with a data constructor is a construction or
deconstruction.

-The compiler then verifies that the determinism category for a procedure
-actually does describe the behaviour of the procedure.  The compiler will
+The compiler verifies that the determinism category for a procedure
+properly describes the behaviour of the procedure.  The compiler will
issue an error if a procedure can fail or have multiple solutions when its
declared determinism category says otherwise.  Moreover, the compiler will
report an error if a deterministic switch is incomplete, telling you
which cases have been missed (Prolog programmers dream of having error
detection like this\ldots)

+\subsection*{Implied modes}

+An \emph{implied mode} is one where an output argument is supplied as
+an input in a procedure call.  Consider the goal #phone("Ralph", 66540)#.
+The compiler gets around this situation by placing a new, temporary variable
+in the output position and then adding a unification goal, giving
+#phone("Ralph", Tmp), Tmp = 66540#.

\section{The determinism categories}

@@ -114,10 +114,11 @@
#failure#               & $0$ \\
\end{tabular}

-There are three other determinsm categories: #erroneous# which is used for
-predicates that only terminate by throwing an exception (exceptions are
-described in Chapter \XXX{}), and #cc_multi# and #cc_nondet#, described
-later in Section \XXX{} on committed-choice nondeterminism.
+There are three other determinsm categories that are only occasionally
+needed: #erroneous#, which is used for predicates that only terminate by
+throwing an exception (exceptions are described in Chapter \XXX{}), and
+#cc_multi# and #cc_nondet# which are used for committed-choice
+nondeterminism (see Chapter \XXX{}).

\subsection*{Some examples}

@@ -125,7 +126,8 @@
:- pred square(int::in, int::out) is det.
square(X, X * X).
%endcode
-#square# is #det#: it cannot fail and every input leads to a single solution.
+#square# is #det#: it cannot fail and every input has a single solution for
+the output.

%startcode
:- pred absolute_square_root(float::in, float::out) is semidet
@@ -133,8 +135,8 @@
X >= 0.0,
AbsSqrtX = math.sqrt(X).
%endcode
-#absolute_square_root# is #semidet#: a negative input leads to failure; a
-non-negative input has a single solution.
+#absolute_square_root# is #semidet#: it fails for negative inputs while
+non-negative inputs each have a single solution.

%startcode
:- pred small_prime(int::out) is multi.
@@ -159,66 +161,74 @@
built-in predicate #true# which has no arguments and always succeeds (and is
therefore #det#).

-\textbf{Note} that a goal with no output arguments is either #det# or
-#semidet#.  For instance,
+\Note The determinism category of a goal with no output arguments is either
+#det#, #semidet#, or #failure#.  Consider the following:
%startcode
-:- pred has_small_prime_factor(X::in) is semidet.
+:- pred has_small_prime_factor(int::in) is semidet.
has_small_prime_factor(X) :-
small_prime(P),
X mod P = 0.
%endcode
-Mercury ensures that a goal #has_small_prime_factor(15)#, say, will only
-succeed once, even though #small_prime# has two solutions that are factors
-of #15#.
-
-
+Because there are no outputs, Mercury ensures that the goal
+\\#has_small_prime_factor(15)#, say, will not succeed more than once, even
+though #small_prime(P)# has two solutions, #P = 3# and #P = 5#, satisfying
+#15 mod P = 0#.

\section{Determinism}

These rules specify how determinism categories for compound goals are
derived (with a little experience this quickly becomes second nature).
+The determinism category of a goal is derived from the instantiation state
+of its arguments at the time the goal is executed.
+
+\Note Remember that the compiler reorders the goals in a predicate
+separately for each #mode# declaration for the predicate.  A running program
+does not make decisions about which procedures should be executed when
+calling predicates; this is decided in advance by the Mercury compiler.

\subsection*{Unifications}

Whether a unification is a construction, deconstruction, assignment or
equality test depends upon which variables are instantiated and which are
-not.
+not at the time the unification is executed.

-A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{construction} if #Y1#, #Y2#
-and #Y3# are instantiated and #X# is not.  Constructions are always #det#.
+A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{construction} if #Y1#,
+#Y2#, and #Y3# are initially instantiated and #X# is not.  Constructions are
+always #det#.  Afterwards, #X# will be instantiated.

A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{deconstruction} if #X#
-is instantiated.
-Deconstructions are almost always #semidet#, alghough in some circumstances
-a deconstruction may be guaranteed to succeed to fail.
-
-A unification #X = Y# where #X# is instantiated and #Y# is not, or vice
-versa, is an \emph{assignment}.  Assignments are always #det#.
+is initially instantiated.  Afterwards, #Y1#, #Y2#, and #Y3# will be
+instantiated.  Deconstructions are almost always #semidet# (in certain
+circumstances a deconstruction may have determinism category #det# if it is
+guaranteed to succeed or #failure# if it is guaranteed to fail).
+
+A unification #X = Y# is an \emph{assignment} if precisely one of #X# or #Y#
+is initially instantiated.  Afterwards, both variables will be instantiated.
+Assignments are always #det#.

-A unification #X = Y# where both #X# and #Y# are instantiated is an
-\emph{equality test}.  Equality tests are always #semidet#.
+A unification #X = Y# is an \emph{equality test} if both #X# and #Y# are
+initially instantiated.  Equality tests are always #semidet#.

\subsection*{Procedure calls}

-Which procedure is called by a goal #p(X1, X2, X3)# depends upon which
-#mode# declaration for predicate #p# best matches the instantiation states
-of #X1#, #X2# and #X3#.  The determinism category of the goal is that of
-the called procedure, adjusted for any \emph{implied modes}.
-
-An implied mode arises when an already instantiated argument appears in an
-output argument position of a procedure call.  In this case a temporary
-variable is introduced to receive the output value, which is then unified
-with the original argument.
-
-The goal #small_prime_factor(9)# uses an implied mode because
-\\#small_prime_factor# has only one mode, #pred(out) is multi#.  The goal is
-converted into #small_prime_factor(Tmp), Tmp = 9# by the compiler, allowing
-us to use the rules for conjunction and deconstruction to work out the
-determinism category.
+For a predicate call #p(X1, X2, X3)#, which procedure of #p# is executed
+depends upon which #mode# declaration for predicate #p# best matches the
+instantiation states of #X1#, #X2#, and #X3# at the time the call is
+executed.  The determinism category of the goal is that of the called
+because some output arguments of the procedure are already instantiated at
+the time of the call).
+
+For example, the goal #phone("Zoltan", ZoltansNumber)# is compiled as a call
+to the #(in, out) is det# procedure of #phone#.  The goal
+#phone(Person, 66540)# is compiled as a call to the #(out, in) is nondet#
+procedure of #phone#.  The goal #phone("Ralph", 66532)# requires an implied
+mode and may be compiled either as #phone("Ralph", Tmp), Tmp = 66532# or
+#phone(Tmp, 66532), Tmp = "Ralph"#, both of which are #semidet#.

\subsection*{Conjunction}

-A sequence of goals separated by commas, #(G1, G2, G3)#, is called a
+A sequence of goals separated by commas, #G1, G2, G3, ...#, is called a
\emph{conjunction}.  The commas are pronounced and'' and each subgoal is
called a \emph{conjunct}.

@@ -229,42 +239,68 @@
A conjunction can have multiple solutions if it can succeed
and one or more conjuncts have multiple solutions.

+\Note These rules are a conservative (\ie safe) approximation.  For example,
+the compiler will conclude that conjunction #small_prime(X), X = 4# is
+#semidet#, even though we can see that this goal has to fail.
+
\subsection*{Disjunction}

-A sequence of goals separated by commas, #(G1 ; G2 ; G3)#, is called a
+A sequence of goals separated by commas, #(G1 ; G2 ; G3 ; ...)#, is called a
\emph{disjunction}.  The commas are pronounced or'' and each subgoal is
called a \emph{disjunct}.

A disjunction can succeed if any disjunct can succeed.

A disjunction can have multiple solutions if more than one disjunct can
-succeed.
+succeed or one or more disjuncts can have multiple solutions.

-\textbf{Note} the special case of switches.  A switch is a disjunction that
+\Note Switches are a special case.  A switch is a disjunction that
deconstructs a particular variable against a different data constructor in
each disjunct.  If, apart from the deconstructions, every disjunct is #det#,
then the switch is #det# if the set of deconstructions is exhaustive and
#semidet# if not.

-\textbf{Note} that disjunction binds less tightly than conjunction, hence
-\\#(G11, G12 ; G21, G22 ; G31, G32)#
+For example, even though both #p# and #q# (below) define switches on #X#,
+#p# is #det# because its switch is exhaustive, whereas #q# is #semidet#
+because its switch is not:
+%startcode
+:- type ott ---> one ; two ; three.
+
+:- pred p(ott::in, int::out) is det.
+p(X, Y) :-  ( X = one, Y = 1 ; X = two,    Y = 2 ; X = three, Y = 3 ).
+
+:- pred q(ott::in, int::out) is semidet.
+q(X, Y) :-  ( X = one, Y = 1 ; X = three,  Y = 3 ).
+%endcode
+
+\Note Disjunction binds less tightly than conjunction:
+\\#( G11, G12 , G13 ; G21 ; G31, G32 )#
\\is equivalent to
-\\#((G11, G12) ; (G21, G22) ; (G31, G32))#.
+\\#( (G11, G12, G13) ; G21 ; (G31, G32) )#.

-\textbf{Note} that a definition spanning multiple clauses is equivalent to a
+\Note A definition spanning multiple clauses is equivalent to a
definition using a single clause containing a disjunction.  That is
%startcode
-p_or_q_or_r({A, _, _},  X)  :- p(A, X).
-p_or_q_or_r({_, B, _},  Y)  :- q(B, Y).
-p_or_q_or_r({_, _, C},  Z)  :- r(C, Z).
-%endcode
-is the same as
-%startcode
-p_or_q_or_r({A, B, C}, W) :-
-    (  p(A, X),  W = X
-    ;  q(B, Y),  W = Y
-    ;  r(C, Z),  W = Z
-    ).
+p(one,    1).
+p(two,    2).
+p(three,  3).
+%endcode
+is semantically and operationally identical to
+%startcode
+p(X, Y) :-  ( X = one, Y = 1 ; X = two, Y = 2 ; X = three, Y = 3 ).
+%endcode
+
+\Note If any disjunct instantiates a variable that is used outside the
+disjunction, then every disjunct in the disjunction must also instantiate
+that variable.  That is, the Mercury compiler will report a mode error if a
+program contains a disjunction that instantiates #X# in some disjuncts, but
+not others, and #X# is also needed outside the disjunction.  For instance,
+the following is illegal because #Y#, which appears outside the disjunction,
+is instantiated in the first and second disjuncts, but not the third:
+
+%startcode
+:- pred p(number::in, int::out) is det.
+p(X, Y) :-  ( X = one, Y = 1 ; X = two, Y = 2 ; X = three ).
%endcode

\subsection*{Negation}
@@ -273,30 +309,322 @@
if #G# succeeds, and vice versa.  The negation succeeds if #G# fails and
fails if #G# succeeds.

-#G# is said to occur inside a \emph{negated context} and is
+\Note #G# is said to occur inside a \emph{negated context} and is
not allowed to instantiate variables that also occur outside the negation.

-\textbf{Note} that negation binds more tightly than conjunction, hence
-\\#(not G1, G2)# is equivalent to #((not G1), G2)#.
+\Note Negation binds more tightly than conjunction, hence
+#not G1, G2, ...# is equivalent to #(not G1), G2, ...#.  To negate a
+conjunction, put the conjunction in parentheses: #not (G1, G2, ...)#
+
+\Note #X \= Y# is syntactic sugar for #not (X = Y)#.

\subsection*{If-then-else goals}

-A goal #( if G1 then G2 else G3 )# means the same thing as
-\\#(G1, G2 ; not G1, G3)#.  We can work out the determinism
-category of this from the rules above.
-
-(\textbf{Note} that this describes the \emph{declarative semantics} for
-#if-then-else# goals.  The \emph{operational semantics} are more efficient:
-if #G1# produces any solution at all then the #if-then-else# goal is
-equivalent to #(G1, G2)#; if #G1# produces no solutions then the
-#if-then-else# goal is equivalent to just #G3#.)
+The declarative semantics for a goal #( if Gc then Gt else Ge )# are
+identical to those of #( Gc, Gt ; (not Gc), Ge )#.  The operational
+semantics are more efficient, though: if there are no solutions to #Gc#,
+the program immediately executes #Ge#.

-\textbf{Note} that #X \= Y# is just shorthand for #not X = Y#.
+If any of #Gc#, #Gt#, or #Ge# can fail then the #if#-#then#-#else# can fail.

+If any of #Gc#, #Gt#, or #Ge# can have multiple solutions then the
+#if#-#then#-#else# can have multiple solutions.

+\Note #Gc# is not allowed to instantiate variables that are used outside the
+#if#-#then#-#else#.  This is because, semantically, #Gc# appears in a
+negated context.  It is all right, however, for #Gc# to instantiate
+variables that are used by #Gt#.
+
+\Note Execution can backtrack into #Gc#.  For example,
+
+#( if small_prime(X), X > 2 then Y = X * X else Y = -1 )#
+
+has solutions #Y = 9#, #Y = 25#, and #Y = 49#.
+
+\Note #( Gc -> Gt ; Ge )# is an alternative, albeit old-fashioned, syntax for
+#( if Gc then Gt else Ge )#.

\section{Procedures and code reordering}

+The aim of this section is to give the reader some understanding of code
+reordering.  This knowledge is not required to write Mercury programs, but
+it can help the programmer understand mode-related error messages from the
+compiler.
+
+We will illustrate using the #append# predicate defined in the #list#
+standard library module.  The declarative semantics of
+#append(Xs, Ys, Zs)# is that the list #Zs# is the concatentation of lists
+#Xs# and #Ys#.  So #append([1], [2, 3], [1, 2, 3])# is true, but
+#append([2, 3], [1], [1, 2, 3])# is not.
+
+%startcode
+:- pred  append(list(T),  list(T),  list(T)).
+:- mode  append(in,       in,       out)  is det.
+:- mode  append(out,      out,      in)   is multi.
+
+append(Xs, Ys, Zs) :-
+    (
+        Xs = [], Zs = Ys
+    ;
+        Xs = [X | Xs0], append(Xs0, Ys, Zs0), Zs = [X | Zs0]
+    ).
+%endcode
+
+This code needs no reordering for the #(in, in, out) is det# mode, where
+#Xs# and #Ys# start off instantiated and #Zs# starts off uninstantiated.
+The first disjunct works like this:
+
+\begin{tabular}{lp{0.3\linewidth}l}
+1 & #Xs = []# & Deconstruct #Xs#\\
+2 & #Zs = Ys# & Assign #Zs#\\
+\end{tabular}
+
+and the second disjunct works like this:
+
+\begin{tabular}{lp{0.3\linewidth}p{0.6\linewidth}}
+1 & #Xs = [X | Xs0]# & Deconstruct #Xs#, instantiating #X# and #Xs0#\\
+2 & #append(Xs0, Ys, Zs0)# & Call the #(in, in, out)# procedure, instantiating
+#Zs0#\\
+3 & #Zs = [X | Zs0]# & Construct #Zs#\\
+\end{tabular}
+
+Because each disjunct deconstructs #Xs# in a different way, this disjunction
+is a switch.  Because the switch is exhaustive, and the other goals in each
+disjunct are #det#, the switch as a whole is #det#.
+
+The #(out, out, in) is multi# mode, where only #Zs# is initially
+instantiated, does require some reordering in order to ensure that every
+variable is instantiated before it is needed.  The first disjunct becomes
+
+\begin{tabular}{lp{0.3\linewidth}l}
+1 & #Xs = []# & Construct #Xs#\\
+2 & #Zs = Ys# & Assign #Ys#\\
+\end{tabular}
+
+and the second disjunct becomes
+
+\begin{tabular}{lp{0.3\linewidth}p{0.6\linewidth}}
+1 & #Zs = [X | Zs0]# & Deconstruct #Zs#, instantiating #X# and #Zs0#\\
+2 & #append(Xs0, Ys, Zs0)# & Call the #(out, out, in)# procedure, instantiating
+#Xs0# and #Ys#\\
+3 & #Xs = [X | Xs0]# & Construct #Xs#\\
+\end{tabular}
+
+Since this disjunction is not a switch and the first disjunct always leads
+to a solution, the disjunction as a whole is #multi# in this case.
+
+\Note The Mercury compiler reorders code as little as possible.  However,
+programmers should not write code that depends upon any particular order of
+evaluation --- code can also be reordered by various optimizations!  In
+particular, it is a bad idea to write something like
+#( if X \= 0, Z = Y / X then ... else ... )#, assuming that the test for #X#
+being non-zero will guarantee that this code cannot lead to a
+division-by-zero error at run-time.  It is certainly possible that the test
+and the division may be compiled the other way around.
+
+\section{Insts and subtypes}
+
+So far we have only talked about variables going from being uninstantiated
+to being instantiated.  It turns out to be useful to also keep track of the
+possible values a variable can have when it is instantiated.  Mercury uses
+#inst#s for this purpose.  An #inst# represents the possible
+instantiantiation states of a variable at a particular point in a program.
+
+The most basic #inst#s are #free#, meaning a variable is uninstantiated, and
+#ground#, meaning a variable is instantiated with some unknown value of the
+appropriate type.
+
+The built-in #mode#s #in# and #out# are defined using the following syntax:
+
+%startcode
+:- mode in   == (ground  >> ground).
+:- mode out  == (free    >> ground).
+%endcode
+
+That is, an #in# mode argument of a goal must be #ground# (\ie be
+instantiated with some value) before the goal is executed and will also be
+#ground# afterwards, while an #out# mode argument must be #free# (\ie not
+instantiated) before the goal is executed, but will be #ground# afterwards.
+
+\Note If a goal fails or backtracks then the #inst#s of its arguments stay
+the same as they were before the goal was tried.
+
+\subsection*{Specialised insts}
+
+It is occasionally useful to define new #inst#s matching only subsets of
+possible values that a variable might have.  Consider the following:
+
+%startcode
+:- inst non_empty_list == bound([ground | ground]).
+
+:- mode  head(in,                  out) is semidet.
+:- mode  head(in(non_empty_list),  out) is det.
+
+head(Xs, X) :- Xs = [X | _].
+%endcode
+
+The #inst# declaration defines #non_empty_list# to mean bound to the list
+data constructor #[|]# whose first argument has inst #ground# and whose
+second argument has inst #ground#''.
+
+The first mode for #head# tells us that if all we know about the first
+argument is that it is #ground# (\ie it could be bound to any value of type
+#list(T)#, including #[]#) then a call to #head# is semidet.
+
+The second mode for #head# says that if we know the first argument is a
+non-empty list (\ie whatever value it has, it's top-most data constructor
+must be #[|]# with two #ground# arguments) then a call to #head# is
+guaranteed to succeed.
+
+The second #mode# declaration uses the built-in parameterised form of the
+#in# argument mode, which is defined like this:
+
+%startcode
+:- mode in(I) == (I >> I).
+%endcode
+
+where #I# is an #inst# parameter.  #in(non_empty_list)# is therefore
+equivalent to writing #(non_empty_list >> non_empty_list)#.  There is also a
+built-in parameterised #out# argument mode, defined thus:
+
+%startcode
+:- mode out(I) == (free >> I).
+%endcode
+
+When compiling the procedure for the second mode of #head#, the Mercury
+compiler uses the information about the inst of the first argument, #Xs#, to
+infer that the goal #Xs = [X | _]# must (a) be a deconstruction and (b) must
+succeed because whatever value #Xs# has matches the pattern #[_ | _]#.
+
+\Note A value with a #bound(...)# #inst# can always be used in a context
+where a #ground# value is expected, but not the other way around.
+
+\subsection*{Recursively defined insts}
+
+It is possible to describe quite complicated instantiation states.  The
+following #inst#s, for instance, describe lists of even and odd lengths
+respectively:
+
+%startcode
+:- inst even_length_list  == bound([] ; [ground | odd_length_list]).
+:- inst odd_length_list   == bound([ground | even_length_list]).
+%endcode
+
+The first #inst# declaration defines #even_length_list# to mean bound
+\emph{either} to #[]# \emph{or} to #[|]# with two arguments, the first
+having #inst# #ground# and the second having #inst# #odd_length_list#
+(multiple possibilities in a #bound# expression are separated by
+semicolons).
+
+The second #inst# declaration defines #odd_length_list# to mean bound to
+#[|]# with two arguments, the first having #inst# #ground# and the second
+having inst #even_length_list#.''
+
+\subsection*{Partial instantiation}
+
+A partially instantiated value is one whose #inst# is #bound(...)# where
+the #...# part contains #free# sub-insts, either directly or indirectly.
+
+Partial instantiation is not currently supported for several reasons,
+including the difficulty of analysing such code, the difficulty of
+maintaining such code, and the difficulty of compiling such code
+efficiently.
+
+\section{Uniqueness}
+
+A #bound(...)# inst is said to be \emph{shared} --- that is, it corresponds
+to a value that may be referred to, directly or indirectly, by more than one
+variable at a given point in the program.
+
+Mercury has a special inst, #unique#, which is like #ground#, but it means
+that there is precisely one reference to the #unique# data at this point in
+the program.  The counterpart to #unique#, is #clobbered#.  A variable with
+#inst# #clobbered# may never be used again (\eg because the value it refers
+to is now out-of-date or has been overwritten with something else).
+
+The most common use of uniqueness is for managing IO.  All the IO operations
+defined in the #io# standard library module include two arguments of type
+#io#, with modes #di# and #uo# respectively.  #di# stands for destructive
+input'' and #uo# stands for unique output''.  These #mode#s are built-in
+and defined thus:
+
+%startcode
+:- mode di  == (unique  >> clobbered).
+:- mode uo  == (free    >> unique).
+%endcode
+
+To illustrate, consider these #pred# declarations taken from the #io#
+module:
+
+%startcode
+:- pred io.write_string(string::in, io::di, io::uo) is det.
+:- pred io.write_int(int::in, io::di, io::uo) is det.
+:- pred io.nl(io::di, io::uo) is det.
+%endcode
+
+and the following code snippet:
+
+%startcode
+    io.write_string("The meaning of life is ", IO0, IO1),
+    io.write_int(42, IO1, IO2),
+    io.nl(IO2, IO3)
+%endcode
+
+The #io# type arguments denote states of the world''.  These #io# states
+are updated when IO actions are performed.  One can never go back to an
+earlier state (you can't unplay a piece of music, say, or unprint a piece of
+paper), so each IO action clobbers the #io# state passed to it and produces
+and new #io# state as its result.  Similarly, because one cannot copy the
+state of the world, #io# states have to be unique.  These constraints ensure
+that the above code snippet executes in the expected order --- that is,
+first the string \texttt{The meaning of life is }'' will be printed
+(clobbering #IO0# and producing #IO1#), then the number \texttt{42}''
+(clobbering #IO1# and producing #IO2#), and finally a newline (clobbering
+#IO2# and producing #IO3#).
+
+Say we were to accidentally reuse #IO0# in the second goal:
+
+%startcode
+    io.write_string("The meaning of life is ", IO0, IO1),
+    io.write_int(42, IO0, IO2),
+    io.nl(IO2, IO3)
+%endcode
+
+The Mercury compiler will report the following error
+(line 27 in file #foo.m# is the call to #io.write_string#):
+
+%startcode
+foo.m:027: In  clause for main(di, uo)':
+foo.m:027:     in argument 2 of call to predicate io.write_string/3':
+foo.m:027:     unique-mode error: the called procedure would clobber
+foo.m:027:     its argument, but variable IO0' is still live.
+%endcode
+
+\Note Procedures that can clobber arguments must have determinism category
+#det# or #cc_multi#.  They must always succeed and produce a single result.
+The reason for this is that once an argument is clobbered, which could
+happen at any point during the execution of the procedure, there is no way
+of un-clobbering it on failure or backtracking.  Consequently it is also an
+error for code to backtrack into such procedures.  For the rare cases where
+one needs to do such things, the reader is referred to the section on
+backtrackable destructive update in the Mercury reference manual which
+discusses mostly uniqueness''.
+
+\section{Committed-choice nondeterminism}
+
+\XXX{Fill this out.}
+
+The #cc_nondet# and #cc_multi# modes.  There may be multiple solutions to a
+cc predicate, but you will only get one of them.

+The compiler will report an error if a program can backtrack into a
+committed-choice goal: all goals following a committed-choice goal must be
+\emph{guaranteed} to succeed.  Programming under this restriction is quite
+burdensome.

+If all solutions to a committed-choice predicate are equivalent, in the
+sense that, no matter which solution you get, the observable behaviour of
+your program will be the same, then you can use the built-in function
+#promise_only_solution# to escape from the committed-choice context.

----- End forwarded message -----
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
`