[m-dev.] Mercury book diff

Ralph Becket rafe at cs.mu.OZ.AU
Mon Jan 24 12:58:00 AEDT 2005
Previous message: [m-dev.] cc_multi bug
Next message: [m-dev.] mercury meeting this morning
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
This didn't get through, so I'm resending it minus the .ps file
attachment.  Ask me if you'd like a copy!

----- Forwarded message from Ralph Becket <rafe at cs.mu.oz.au> -----

Subject: Mercury book diff
From: Ralph Becket <rafe at cs.mu.oz.au>
Date: Fri, 21 Jan 2005 18:40:31 +1100
To: Mercury Developers <mercury-developers at cs.mu.oz.au>
Reply-To: Ralph Becket <rafe at cs.mu.oz.au>
Organization: Mercury group, Melbourne University

I've done some more work on the book.  book.ps.gz is attached if anyone
wants hardcopy.

-- Ralph

This is the diff between Oct 18 and now, Oct 18 being the last time I
posted a diff.


--- ../diffs-2004-10-18/01-mercury-by-example.polytex	Mon Oct 18 16:48:19 2004
+++ 01-mercury-by-example.polytex	Fri Jan 21 18:23:36 2005
@@ -11,7 +11,8 @@
 The approach taken here is to start by presenting the ``obvious'' solution
 to a problem and then introduce features of Mercury that allow for more
 elegant or efficient programs --- although we are not necessarily trying to
-write the most efficient programs possible at this point!
+write the most efficient programs possible at this point! \XXX{Is this last
+point even necessary?}
 
 
 
@@ -19,12 +20,12 @@
 
 It is slightly unfortunate that the ``Hello, World!'' program introduces
 no less than three advanced Mercury concepts, but since tradition
-dictates that any tutorial text start with ``Hello, World!'' we'll just
+dictates that tutorial texts start with ``Hello, World!'' we'll just
 have to jump straight in with the knowledge that things will get easier
 thereafter.
 
 We'll start by presenting the complete program which we'll assume we've
-typed into in a file called #hello.m#:
+typed into a file called #hello.m#:
 %startcode
 :- module hello.
 :- interface.
@@ -90,14 +91,18 @@
 prompt):
 %startcode
 $ mmc --make hello
-Making hello.int3
-Making hello.c
-Making hello.o
+Making Mercury/int3s/hello.int3
+Making Mercury/cs/hello.c
+Making Mercury/os/hello.o
 Making hello
 $ ./hello
 Hello, World!
 %endcode
-Et voila!  
+Et voila!  (By default, #mmc --make# will construct a local #Mercury#
+directory, if necessary, to hold intermediate files generated during
+compilation.)
+\XXX{Check for consistency throughout whether we're using a #Mercury#
+subdirectory or not when compiling.}
 
 At this point the reader is probably wondering about the meaning of the #io#
 type arguments #IOState_in# and #IOState_out#, and the strange #di# and #uo#
@@ -245,8 +250,8 @@
     ).
 %endcode
 The body of the #fib# definition uses an #if-then-else# goal to decide what
-to do (the #else# part is not optional and the whole thing appears in
-parentheses).  The condition #N =< 2# succeeds if #N# is less than or equal
+to do; the #else# part is not optional; and the whole thing appears in
+parentheses.  The condition #N =< 2# succeeds if #N# is less than or equal
 to 2 and \emph{fails} otherwise (we'll learn more about #semidet# predicates
 like #=<# in later examples).  If #N =< 2# then the
 \emph{unification} #X = 1# is executed.  Otherwise #fib# is called twice to
@@ -264,7 +269,7 @@
 which calls #fib(17, X)#, unifying #X# with the result of computing the 17th
 Fibonacci number, then writes out the answer.
 
-Now, just as #N - 1# computes #N# minus 1 and and #A + B# computes
+Now, just as #N - 1# computes #N# minus 1 and #A + B# computes
 the sum of #A# and #B#, it is possible to define #fib# so
 that #fib(N)# computes the #N#th Fibonacci number:
 %startcode
@@ -348,7 +353,7 @@
        then
          io.format("fib(%d) = %d\n", [i(N), i(fib(N))], !IO)
        else
-         io.format("I didn't expect that...\n", [], !IO)
+         io.format("That's not a number...\n", [], !IO)
     ).
 %endcode
 The #list# and #string# standard library modules are imported because we
@@ -462,12 +467,12 @@
          io.format("I didn't expect that...\n", [], !IO)
     ).
 %endcode
-Note the deconstruction test of #Result# with #eof# --- the #eof# data
+Observe the deconstruction test of #Result# with #eof# --- the #eof# data
 constructor has no argument list and indeed it is a syntax error to write
 #eof()# as one might in other languages.
 
 Before we leave our #fib# example, let us introduce Mercury's \emph{switch}
-goals.  A switch goal is rather like C's #switch# statement and consists of a
+goals.  A switch goal is rather like C's #switch# statement and consists of
 a set of alternatives testing a given variable against different possible
 values it might have.  Here is #main# rewritten to use a switch goal:
 %startcode
@@ -510,7 +515,7 @@
 immediately follows the #else# part of another #if-then-else#.
 \item A disjunction is a sequence of goals separated by semicolons.
 \item A switch is a disjunction where each disjunct tests a particular
-variable against a different possibilty.  Where applicable, switches are
+variable against a different possibility.  Where applicable, switches are
 generally preferable to #if-then-else#s.
 \end{itemize}
 
@@ -657,13 +662,12 @@
 error.)
 
 Note that if we felt so inclined, we could make #rot13b# deterministic by
-including the translation of every possible ASCII character!
-\XXX{Are chars defined to be ASCII?}
-Of course, there are many ways of coding #rot13# and while our
-implementation may not be the most concise, it is quite efficient and very
-easy to understand.  Either way, our aim here was to look more closely at
-the concept of semideterminism and introduce the technique of making code
-more readable by using multiple clauses to define a predicate or function.
+including the translation of every possible character!  Of course, there are
+many ways of coding #rot13# and while our implementation may not be the most
+concise, it is quite efficient and very easy to understand.  Either way, our
+aim here was to look more closely at the concept of semideterminism and
+introduce the technique of making code more readable by using multiple
+clauses to define a predicate or function.
 
 \subsection*{Points to remember}
 
@@ -759,12 +763,12 @@
 each of #D#, #O#, #G#, #A#, #N#, #T# and #C# are different digits.
 
 The digits for each letter are selected nondeterministically using the
-#pick# predicate.  The goal #pick(G, Ds0, Ds1)#, for instance, picks a digit
+#pick# predicate.  The goal #pick(Ds0, G, Ds1)#, for instance, picks a digit
 for #G# from #Ds0# and leaves the remainder in #Ds1#.  Since #Ds0# contains
 ten members, there are ten possible solutions for #G#.
 
-#Cs1# is the carry left over from the units column, #Cs2# is the carry left
-over from the tens column, and we introduce #Cs0# as a ``carry in'' of
+#C1# is the carry left over from the units column, #C2# is the carry left
+over from the tens column, and we introduce #C0# as a ``carry in'' of
 zero just to keep the structure of the program regular.
 
 We verify that the units column works out with the goal
--- ../diffs-2004-10-18/02-the-mercury-type-system.polytex	Mon Oct 18 16:48:20 2004
+++ 02-the-mercury-type-system.polytex	Fri Jan 21 18:23:36 2005
@@ -41,11 +41,11 @@
 
 \section{The primitive types: #int#, #float#, #string#, and #char#}
 
-[\textbf{Note} that the underlying representation of the primitive types
+\Note The underlying representation of the primitive types
 depends upon the compiler target (C, Java, .Net etc.)  For the C targets
 #int# corresponds to #int#, #float# to #double#, #char# to #char#
 and #string# to #char *#; number ranges, representations, arithmetic
-overflow handling and so forth are dictated by the target machine.]
+overflow handling and so forth are dictated by the target machine.
 
 \subsection*{#int#}
 
@@ -58,6 +58,10 @@
 Examples: decimal #-123#, #0#, #42#; hexadecimal #-0x7B#, #0x0#, #0x2A#;
 octal #-0o173#, #0o0#, #0o52#; binary #-0b1111011#, #0b0#, #0b101010#.
 
+The sequence #0'x# denotes the character code for the character #x#.  For
+example, on an ASCII system #0'a#, #0'b#, and #0'c# denote 97, 98, and 99
+respectively.
+
 The #int# standard library module must be imported to use any of the
 primitive #int# operations.
 
@@ -105,8 +109,8 @@
 octal number.  the ASCII character #A#, for example, can also be written
 as #\x41\# or #\101\#.
 
-[\textbf{Note} that it is a Very Bad Idea to include the #NUL# character,
-#\x00\#, in strings.]
+\Note For arcane reasons, it is a Very Bad Idea to include the #NUL#
+character, #\x00\#, in strings.
 
 A backslash at the end of a line is ignored in string constants.  Thus
 \\#``abc\#\\#def"#\\ is equivalent to just #"abcdef"#.  Otherwise, literal
@@ -147,10 +151,10 @@
 #{"a", {"little", "contrived"}}# is a value of type
 #{string, {string, string}}#.
 
-[\textbf{Note} that, unlike lists (described below), tuples are constructed
+\Note Unlike lists (described below), tuples are constructed
 and deconstructed as complete entities.  There is no such thing as the head
 or tail of a tuple.  \XXX{Should we include this note at all?  Or move the
-whole tuples section after lists?}]
+whole tuples section after lists?}
 
 Tuples are occasionally useful for aggregating small numbers of different
 types.  More often than not it is better style to use a discriminated union
@@ -311,7 +315,7 @@
 
 Parentheses can change the meaning of an update expression:\\
 #(Employee^contact)^address := NewAddr# denotes an updated copy of the
-the #contact# field of the #employee# data constructor (\ie the type of this
+#contact# field of the #employee# data constructor (\ie the type of this
 expression is #contact_details#, not #employee#.)
 
 One final remark: it is also possible to explicitly define field access
@@ -412,13 +416,14 @@
 Readability is often improved by giving simple names to complex types or
 by using more meaningful names for a specific uses of general types:
 %startcode
-:- type height   == float.  % In metres.
-:- type radius   == float.  % In metres.
-:- type density  == float.  % In cubic metres.
+:- type height  == float.  % In metres.
+:- type radius  == float.  % In metres.
+:- type volume  == float.  % In cubic metres.
 
 :- func volume_of_cylinder(height, radius)  = volume.
 :- func volume_of_sphere(radius)            = volume.
 %endcode
+\XXX{Move this stuff about comments to chapter 1.}
 (The #%# sign introduces a comment, which extends to the end of the line.)
 Here we define the types #height#, #radius# and #volume# to be equivalent to
 (\ie interchangeable with) type #float#.  We could have just declared
@@ -517,8 +522,8 @@
 This next example illustrates the syntax for predicate types (this predicate
 is also defined in the #list# standard library module):
 %startcode
-:- pred  filter(pred(T),              list(T),  list(T),  list(T)  ).
-:- mode  filter(pred(in) is semidet,  in,       out,      out      ) is det.
+:- pred  filter(pred(T),                  list(T),  list(T),  list(T)  ).
+:- mode  filter(in(pred(in) is semidet),  in,       out,      out      ) is det.
 
 filter(_, [], [], []).
 filter(P, [X | Xs], Ys, Zs) :-
@@ -527,7 +532,7 @@
                    else  Ys =       Ys0,   Zs = [X |  Zs0]
     ).
 %endcode
-the goal #split(P, As, Bs, Cs)# unifies #Bs# with the list of members of
+the goal #filter(P, As, Bs, Cs)# unifies #Bs# with the list of members of
 #As# that satisfy #P# and unifies #Cs# with the list of members of #As#
 that don't.  The first clause says that filtering the empty list yeilds two
 empty lists.  The second clause says that filtering #[X | Xs]# through the
--- ../diffs-2004-10-18/03-the-mercury-mode-system.polytex	Mon Oct 18 16:48:20 2004
+++ 03-the-mercury-mode-system.polytex	Fri Jan 21 18:23:36 2005
@@ -15,8 +15,6 @@
 
 This chapter explains the mode system and how it is used.
 
-
-
 \section{Predicates and procedures}
 
 Every predicate must have a declaration specifying which arguments can be
@@ -42,13 +40,12 @@
 corresponding \emph{determinism category} (#semidet#).
 
 It is natural to want to call some predicates in more than one way --- if we
-want to use #phone# to perform ``reverse look-ups'', for instance.
-In such
+want to use #phone# to perform ``reverse look-ups'', for instance.  In such
 cases more than one #mode# declaration is necessary (#pred#-#mode# shorthand
-can't be used for such predicates).
+can only be used for singly-moded predicates).
 
 To allow reverse look-ups with #phone# all that is necessary is to (a) use
-separate #pred# and #mode# declarations and (b) add the extra #mode#
+separate #pred# and #mode# declarations and (b) add an extra #mode#
 declaration:
 %startcode
 :- pred  phone(string,  int).
@@ -75,29 +72,32 @@
 result is nondeterministic: it can fail because the goal
 #phone(Person, 12345)# will fail, but the goal #phone(Person, 66532)#
 has no less than three possible solutions --- #Person = "Ian"#,
-#Person = "Julien"#, and #Person = "Ralph"# --- each of which will be tried
-on backtracking.
+#Person = "Julien"#, and #Person = "Ralph"# --- each of which will be
+computed on backtracking.
 
-(What if we call #phone# with both arguments as inputs?  This is
-called an \emph{implied mode} and is explained in the next section.)
-
-Each #mode# declaration specifies a \emph{procedure} to be derived from the
-predicate definition.  The compiler generates code separately for each
-procedure of a predicate, reordering goals to ensure that every variable is
-instantiated (\eg by some earlier unification or call) before it is used in
-an input in a unification or call.  Mode information is also used to decide
-whether a unification with a data constructor is a construction or
+\Note Each #mode# declaration specifies a \emph{procedure} to be derived
+from the predicate definition.  The compiler generates code separately for
+each procedure of a predicate, reordering goals to ensure that every
+variable is instantiated (\eg by some earlier unification or call) before it
+is used in an input in a unification or call.  Mode information is also used
+to decide whether a unification with a data constructor is a construction or
 deconstruction.
 
-The compiler then verifies that the determinism category for a procedure
-actually does describe the behaviour of the procedure.  The compiler will
+The compiler verifies that the determinism category for a procedure
+properly describes the behaviour of the procedure.  The compiler will
 issue an error if a procedure can fail or have multiple solutions when its
 declared determinism category says otherwise.  Moreover, the compiler will
 report an error if a deterministic switch is incomplete, telling you
 which cases have been missed (Prolog programmers dream of having error
 detection like this\ldots)
 
+\subsection*{Implied modes}
 
+An \emph{implied mode} is one where an output argument is supplied as
+an input in a procedure call.  Consider the goal #phone("Ralph", 66540)#.
+The compiler gets around this situation by placing a new, temporary variable
+in the output position and then adding a unification goal, giving
+#phone("Ralph", Tmp), Tmp = 66540#.
 
 \section{The determinism categories}
 
@@ -114,10 +114,11 @@
 #failure#               & $0$ \\
 \end{tabular}
 
-There are three other determinsm categories: #erroneous# which is used for
-predicates that only terminate by throwing an exception (exceptions are
-described in Chapter \XXX{}), and #cc_multi# and #cc_nondet#, described
-later in Section \XXX{} on committed-choice nondeterminism.
+There are three other determinsm categories that are only occasionally
+needed: #erroneous#, which is used for predicates that only terminate by
+throwing an exception (exceptions are described in Chapter \XXX{}), and
+#cc_multi# and #cc_nondet# which are used for committed-choice
+nondeterminism (see Chapter \XXX{}).
 
 \subsection*{Some examples}
 
@@ -125,7 +126,8 @@
 :- pred square(int::in, int::out) is det.
 square(X, X * X).
 %endcode
-#square# is #det#: it cannot fail and every input leads to a single solution.
+#square# is #det#: it cannot fail and every input has a single solution for
+the output.
 
 %startcode
 :- pred absolute_square_root(float::in, float::out) is semidet
@@ -133,8 +135,8 @@
     X >= 0.0,
     AbsSqrtX = math.sqrt(X).
 %endcode
-#absolute_square_root# is #semidet#: a negative input leads to failure; a
-non-negative input has a single solution.
+#absolute_square_root# is #semidet#: it fails for negative inputs while
+non-negative inputs each have a single solution.
 
 %startcode
 :- pred small_prime(int::out) is multi.
@@ -159,66 +161,74 @@
 built-in predicate #true# which has no arguments and always succeeds (and is
 therefore #det#).
 
-\textbf{Note} that a goal with no output arguments is either #det# or
-#semidet#.  For instance,
+\Note The determinism category of a goal with no output arguments is either
+#det#, #semidet#, or #failure#.  Consider the following:
 %startcode
-:- pred has_small_prime_factor(X::in) is semidet.
+:- pred has_small_prime_factor(int::in) is semidet.
 has_small_prime_factor(X) :-
     small_prime(P),
     X mod P = 0.
 %endcode
-Mercury ensures that a goal #has_small_prime_factor(15)#, say, will only
-succeed once, even though #small_prime# has two solutions that are factors
-of #15#.
-
-
+Because there are no outputs, Mercury ensures that the goal
+\\#has_small_prime_factor(15)#, say, will not succeed more than once, even
+though #small_prime(P)# has two solutions, #P = 3# and #P = 5#, satisfying 
+#15 mod P = 0#.
 
 \section{Determinism}
 
 These rules specify how determinism categories for compound goals are
 derived (with a little experience this quickly becomes second nature).
+The determinism category of a goal is derived from the instantiation state
+of its arguments at the time the goal is executed.
+
+\Note Remember that the compiler reorders the goals in a predicate
+separately for each #mode# declaration for the predicate.  A running program
+does not make decisions about which procedures should be executed when
+calling predicates; this is decided in advance by the Mercury compiler.
 
 \subsection*{Unifications}
 
 Whether a unification is a construction, deconstruction, assignment or
 equality test depends upon which variables are instantiated and which are
-not.
+not at the time the unification is executed.
 
-A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{construction} if #Y1#, #Y2#
-and #Y3# are instantiated and #X# is not.  Constructions are always #det#.
+A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{construction} if #Y1#,
+#Y2#, and #Y3# are initially instantiated and #X# is not.  Constructions are
+always #det#.  Afterwards, #X# will be instantiated.
 
 A unification #X = data_ctor(Y1, Y2, Y3)# is a \emph{deconstruction} if #X#
-is instantiated.
-Deconstructions are almost always #semidet#, alghough in some circumstances
-a deconstruction may be guaranteed to succeed to fail.
-
-A unification #X = Y# where #X# is instantiated and #Y# is not, or vice
-versa, is an \emph{assignment}.  Assignments are always #det#.
+is initially instantiated.  Afterwards, #Y1#, #Y2#, and #Y3# will be
+instantiated.  Deconstructions are almost always #semidet# (in certain
+circumstances a deconstruction may have determinism category #det# if it is
+guaranteed to succeed or #failure# if it is guaranteed to fail).
+
+A unification #X = Y# is an \emph{assignment} if precisely one of #X# or #Y#
+is initially instantiated.  Afterwards, both variables will be instantiated.
+Assignments are always #det#.
 
-A unification #X = Y# where both #X# and #Y# are instantiated is an
-\emph{equality test}.  Equality tests are always #semidet#.
+A unification #X = Y# is an \emph{equality test} if both #X# and #Y# are
+initially instantiated.  Equality tests are always #semidet#.
 
 \subsection*{Procedure calls}
 
-Which procedure is called by a goal #p(X1, X2, X3)# depends upon which
-#mode# declaration for predicate #p# best matches the instantiation states
-of #X1#, #X2# and #X3#.  The determinism category of the goal is that of
-the called procedure, adjusted for any \emph{implied modes}.
-
-An implied mode arises when an already instantiated argument appears in an
-output argument position of a procedure call.  In this case a temporary
-variable is introduced to receive the output value, which is then unified
-with the original argument.
-
-The goal #small_prime_factor(9)# uses an implied mode because
-\\#small_prime_factor# has only one mode, #pred(out) is multi#.  The goal is
-converted into #small_prime_factor(Tmp), Tmp = 9# by the compiler, allowing
-us to use the rules for conjunction and deconstruction to work out the
-determinism category.
+For a predicate call #p(X1, X2, X3)#, which procedure of #p# is executed
+depends upon which #mode# declaration for predicate #p# best matches the
+instantiation states of #X1#, #X2#, and #X3# at the time the call is
+executed.  The determinism category of the goal is that of the called
+procedure, adjusted for any implied modes (\ie extra unifications added
+because some output arguments of the procedure are already instantiated at
+the time of the call).
+
+For example, the goal #phone("Zoltan", ZoltansNumber)# is compiled as a call
+to the #(in, out) is det# procedure of #phone#.  The goal
+#phone(Person, 66540)# is compiled as a call to the #(out, in) is nondet#
+procedure of #phone#.  The goal #phone("Ralph", 66532)# requires an implied
+mode and may be compiled either as #phone("Ralph", Tmp), Tmp = 66532# or
+#phone(Tmp, 66532), Tmp = "Ralph"#, both of which are #semidet#.
 
 \subsection*{Conjunction}
 
-A sequence of goals separated by commas, #(G1, G2, G3)#, is called a
+A sequence of goals separated by commas, #G1, G2, G3, ...#, is called a
 \emph{conjunction}.  The commas are pronounced ``and'' and each subgoal is
 called a \emph{conjunct}.
 
@@ -229,42 +239,68 @@
 A conjunction can have multiple solutions if it can succeed
 and one or more conjuncts have multiple solutions.
 
+\Note These rules are a conservative (\ie safe) approximation.  For example,
+the compiler will conclude that conjunction #small_prime(X), X = 4# is
+#semidet#, even though we can see that this goal has to fail.
+
 \subsection*{Disjunction}
 
-A sequence of goals separated by commas, #(G1 ; G2 ; G3)#, is called a
+A sequence of goals separated by commas, #(G1 ; G2 ; G3 ; ...)#, is called a
 \emph{disjunction}.  The commas are pronounced ``or'' and each subgoal is
 called a \emph{disjunct}.
 
 A disjunction can succeed if any disjunct can succeed.
 
 A disjunction can have multiple solutions if more than one disjunct can
-succeed.
+succeed or one or more disjuncts can have multiple solutions.
 
-\textbf{Note} the special case of switches.  A switch is a disjunction that
+\Note Switches are a special case.  A switch is a disjunction that
 deconstructs a particular variable against a different data constructor in
 each disjunct.  If, apart from the deconstructions, every disjunct is #det#,
 then the switch is #det# if the set of deconstructions is exhaustive and
 #semidet# if not.
 
-\textbf{Note} that disjunction binds less tightly than conjunction, hence
-\\#(G11, G12 ; G21, G22 ; G31, G32)#
+For example, even though both #p# and #q# (below) define switches on #X#,
+#p# is #det# because its switch is exhaustive, whereas #q# is #semidet#
+because its switch is not:
+%startcode
+:- type ott ---> one ; two ; three.
+
+:- pred p(ott::in, int::out) is det.
+p(X, Y) :-  ( X = one, Y = 1 ; X = two,    Y = 2 ; X = three, Y = 3 ).
+
+:- pred q(ott::in, int::out) is semidet.
+q(X, Y) :-  ( X = one, Y = 1 ; X = three,  Y = 3 ).
+%endcode
+
+\Note Disjunction binds less tightly than conjunction:
+\\#( G11, G12 , G13 ; G21 ; G31, G32 )#
 \\is equivalent to
-\\#((G11, G12) ; (G21, G22) ; (G31, G32))#.
+\\#( (G11, G12, G13) ; G21 ; (G31, G32) )#.
 
-\textbf{Note} that a definition spanning multiple clauses is equivalent to a
+\Note A definition spanning multiple clauses is equivalent to a
 definition using a single clause containing a disjunction.  That is
 %startcode
-p_or_q_or_r({A, _, _},  X)  :- p(A, X).
-p_or_q_or_r({_, B, _},  Y)  :- q(B, Y).
-p_or_q_or_r({_, _, C},  Z)  :- r(C, Z).
-%endcode
-is the same as
-%startcode
-p_or_q_or_r({A, B, C}, W) :-
-    (  p(A, X),  W = X
-    ;  q(B, Y),  W = Y
-    ;  r(C, Z),  W = Z
-    ).
+p(one,    1).
+p(two,    2).
+p(three,  3).
+%endcode
+is semantically and operationally identical to
+%startcode
+p(X, Y) :-  ( X = one, Y = 1 ; X = two, Y = 2 ; X = three, Y = 3 ).
+%endcode
+
+\Note If any disjunct instantiates a variable that is used outside the
+disjunction, then every disjunct in the disjunction must also instantiate
+that variable.  That is, the Mercury compiler will report a mode error if a
+program contains a disjunction that instantiates #X# in some disjuncts, but
+not others, and #X# is also needed outside the disjunction.  For instance,
+the following is illegal because #Y#, which appears outside the disjunction,
+is instantiated in the first and second disjuncts, but not the third:
+
+%startcode
+:- pred p(number::in, int::out) is det.
+p(X, Y) :-  ( X = one, Y = 1 ; X = two, Y = 2 ; X = three ).
 %endcode
 
 \subsection*{Negation}
@@ -273,30 +309,322 @@
 if #G# succeeds, and vice versa.  The negation succeeds if #G# fails and
 fails if #G# succeeds.
 
-#G# is said to occur inside a \emph{negated context} and is
+\Note #G# is said to occur inside a \emph{negated context} and is
 not allowed to instantiate variables that also occur outside the negation.
 
-\textbf{Note} that negation binds more tightly than conjunction, hence
-\\#(not G1, G2)# is equivalent to #((not G1), G2)#.
+\Note Negation binds more tightly than conjunction, hence
+#not G1, G2, ...# is equivalent to #(not G1), G2, ...#.  To negate a
+conjunction, put the conjunction in parentheses: #not (G1, G2, ...)#
+
+\Note #X \= Y# is syntactic sugar for #not (X = Y)#.
 
 \subsection*{If-then-else goals}
 
-A goal #( if G1 then G2 else G3 )# means the same thing as
-\\#(G1, G2 ; not G1, G3)#.  We can work out the determinism
-category of this from the rules above.
-
-(\textbf{Note} that this describes the \emph{declarative semantics} for
-#if-then-else# goals.  The \emph{operational semantics} are more efficient:
-if #G1# produces any solution at all then the #if-then-else# goal is
-equivalent to #(G1, G2)#; if #G1# produces no solutions then the
-#if-then-else# goal is equivalent to just #G3#.)
+The declarative semantics for a goal #( if Gc then Gt else Ge )# are
+identical to those of #( Gc, Gt ; (not Gc), Ge )#.  The operational
+semantics are more efficient, though: if there are no solutions to #Gc#,
+the program immediately executes #Ge#.
 
-\textbf{Note} that #X \= Y# is just shorthand for #not X = Y#.
+If any of #Gc#, #Gt#, or #Ge# can fail then the #if#-#then#-#else# can fail.
 
+If any of #Gc#, #Gt#, or #Ge# can have multiple solutions then the
+#if#-#then#-#else# can have multiple solutions.
 
+\Note #Gc# is not allowed to instantiate variables that are used outside the
+#if#-#then#-#else#.  This is because, semantically, #Gc# appears in a
+negated context.  It is all right, however, for #Gc# to instantiate
+variables that are used by #Gt#.
+
+\Note Execution can backtrack into #Gc#.  For example,
+
+#( if small_prime(X), X > 2 then Y = X * X else Y = -1 )#
+
+has solutions #Y = 9#, #Y = 25#, and #Y = 49#.
+
+\Note #( Gc -> Gt ; Ge )# is an alternative, albeit old-fashioned, syntax for
+#( if Gc then Gt else Ge )#.
 
 \section{Procedures and code reordering}
 
+The aim of this section is to give the reader some understanding of code
+reordering.  This knowledge is not required to write Mercury programs, but
+it can help the programmer understand mode-related error messages from the
+compiler.
+
+We will illustrate using the #append# predicate defined in the #list#
+standard library module.  The declarative semantics of
+#append(Xs, Ys, Zs)# is that the list #Zs# is the concatentation of lists
+#Xs# and #Ys#.  So #append([1], [2, 3], [1, 2, 3])# is true, but
+#append([2, 3], [1], [1, 2, 3])# is not.
+
+%startcode
+:- pred  append(list(T),  list(T),  list(T)).
+:- mode  append(in,       in,       out)  is det.
+:- mode  append(out,      out,      in)   is multi.
+
+append(Xs, Ys, Zs) :-
+    (
+        Xs = [], Zs = Ys
+    ;
+        Xs = [X | Xs0], append(Xs0, Ys, Zs0), Zs = [X | Zs0]
+    ).
+%endcode
+
+This code needs no reordering for the #(in, in, out) is det# mode, where
+#Xs# and #Ys# start off instantiated and #Zs# starts off uninstantiated.
+The first disjunct works like this:
+
+\begin{tabular}{lp{0.3\linewidth}l}
+1 & #Xs = []# & Deconstruct #Xs#\\
+2 & #Zs = Ys# & Assign #Zs#\\
+\end{tabular}
+
+and the second disjunct works like this:
+
+\begin{tabular}{lp{0.3\linewidth}p{0.6\linewidth}}
+1 & #Xs = [X | Xs0]# & Deconstruct #Xs#, instantiating #X# and #Xs0#\\
+2 & #append(Xs0, Ys, Zs0)# & Call the #(in, in, out)# procedure, instantiating
+#Zs0#\\
+3 & #Zs = [X | Zs0]# & Construct #Zs#\\
+\end{tabular}
+
+Because each disjunct deconstructs #Xs# in a different way, this disjunction
+is a switch.  Because the switch is exhaustive, and the other goals in each
+disjunct are #det#, the switch as a whole is #det#.
+
+The #(out, out, in) is multi# mode, where only #Zs# is initially
+instantiated, does require some reordering in order to ensure that every
+variable is instantiated before it is needed.  The first disjunct becomes
+
+\begin{tabular}{lp{0.3\linewidth}l}
+1 & #Xs = []# & Construct #Xs#\\
+2 & #Zs = Ys# & Assign #Ys#\\
+\end{tabular}
+
+and the second disjunct becomes
+
+\begin{tabular}{lp{0.3\linewidth}p{0.6\linewidth}}
+1 & #Zs = [X | Zs0]# & Deconstruct #Zs#, instantiating #X# and #Zs0#\\
+2 & #append(Xs0, Ys, Zs0)# & Call the #(out, out, in)# procedure, instantiating
+#Xs0# and #Ys#\\
+3 & #Xs = [X | Xs0]# & Construct #Xs#\\
+\end{tabular}
+
+Since this disjunction is not a switch and the first disjunct always leads
+to a solution, the disjunction as a whole is #multi# in this case.
+
+\Note The Mercury compiler reorders code as little as possible.  However,
+programmers should not write code that depends upon any particular order of
+evaluation --- code can also be reordered by various optimizations!  In
+particular, it is a bad idea to write something like
+#( if X \= 0, Z = Y / X then ... else ... )#, assuming that the test for #X#
+being non-zero will guarantee that this code cannot lead to a
+division-by-zero error at run-time.  It is certainly possible that the test
+and the division may be compiled the other way around.
+
+\section{Insts and subtypes}
+
+So far we have only talked about variables going from being uninstantiated
+to being instantiated.  It turns out to be useful to also keep track of the
+possible values a variable can have when it is instantiated.  Mercury uses
+#inst#s for this purpose.  An #inst# represents the possible
+instantiantiation states of a variable at a particular point in a program.
+
+The most basic #inst#s are #free#, meaning a variable is uninstantiated, and
+#ground#, meaning a variable is instantiated with some unknown value of the
+appropriate type.
+
+The built-in #mode#s #in# and #out# are defined using the following syntax:
+
+%startcode
+:- mode in   == (ground  >> ground).
+:- mode out  == (free    >> ground).
+%endcode
+
+That is, an #in# mode argument of a goal must be #ground# (\ie be
+instantiated with some value) before the goal is executed and will also be
+#ground# afterwards, while an #out# mode argument must be #free# (\ie not
+instantiated) before the goal is executed, but will be #ground# afterwards.
+
+\Note If a goal fails or backtracks then the #inst#s of its arguments stay
+the same as they were before the goal was tried.
+
+\subsection*{Specialised insts}
+
+It is occasionally useful to define new #inst#s matching only subsets of
+possible values that a variable might have.  Consider the following:
+
+%startcode
+:- inst non_empty_list == bound([ground | ground]).
+
+:- pred  head(list(T), T).
+:- mode  head(in,                  out) is semidet.
+:- mode  head(in(non_empty_list),  out) is det.
+
+head(Xs, X) :- Xs = [X | _].
+%endcode
+
+The #inst# declaration defines #non_empty_list# to mean ``bound to the list
+data constructor #[|]# whose first argument has inst #ground# and whose
+second argument has inst #ground#''.
+
+The first mode for #head# tells us that if all we know about the first
+argument is that it is #ground# (\ie it could be bound to any value of type
+#list(T)#, including #[]#) then a call to #head# is semidet.
+
+The second mode for #head# says that if we know the first argument is a
+non-empty list (\ie whatever value it has, it's top-most data constructor
+must be #[|]# with two #ground# arguments) then a call to #head# is
+guaranteed to succeed.
+
+The second #mode# declaration uses the built-in parameterised form of the
+#in# argument mode, which is defined like this:
+
+%startcode
+:- mode in(I) == (I >> I).
+%endcode
+
+where #I# is an #inst# parameter.  #in(non_empty_list)# is therefore
+equivalent to writing #(non_empty_list >> non_empty_list)#.  There is also a
+built-in parameterised #out# argument mode, defined thus:
+
+%startcode
+:- mode out(I) == (free >> I).
+%endcode
+
+When compiling the procedure for the second mode of #head#, the Mercury
+compiler uses the information about the inst of the first argument, #Xs#, to
+infer that the goal #Xs = [X | _]# must (a) be a deconstruction and (b) must
+succeed because whatever value #Xs# has matches the pattern #[_ | _]#.
+
+\Note A value with a #bound(...)# #inst# can always be used in a context
+where a #ground# value is expected, but not the other way around.
+
+\subsection*{Recursively defined insts}
+
+It is possible to describe quite complicated instantiation states.  The
+following #inst#s, for instance, describe lists of even and odd lengths
+respectively:
+
+%startcode
+:- inst even_length_list  == bound([] ; [ground | odd_length_list]).
+:- inst odd_length_list   == bound([ground | even_length_list]).
+%endcode
+
+The first #inst# declaration defines #even_length_list# to mean ``bound
+\emph{either} to #[]# \emph{or} to #[|]# with two arguments, the first
+having #inst# #ground# and the second having #inst# #odd_length_list#
+(multiple possibilities in a #bound# expression are separated by
+semicolons).
+
+The second #inst# declaration defines #odd_length_list# to mean ``bound to
+#[|]# with two arguments, the first having #inst# #ground# and the second
+having inst #even_length_list#.''
+
+\subsection*{Partial instantiation}
+
+A partially instantiated value is one whose #inst# is #bound(...)# where
+the #...# part contains #free# sub-insts, either directly or indirectly.
+
+Partial instantiation is not currently supported for several reasons,
+including the difficulty of analysing such code, the difficulty of
+maintaining such code, and the difficulty of compiling such code
+efficiently.
+
+\section{Uniqueness}
+
+A #bound(...)# inst is said to be \emph{shared} --- that is, it corresponds
+to a value that may be referred to, directly or indirectly, by more than one
+variable at a given point in the program.
+
+Mercury has a special inst, #unique#, which is like #ground#, but it means
+that there is precisely one reference to the #unique# data at this point in
+the program.  The counterpart to #unique#, is #clobbered#.  A variable with
+#inst# #clobbered# may never be used again (\eg because the value it refers
+to is now out-of-date or has been overwritten with something else).
+
+The most common use of uniqueness is for managing IO.  All the IO operations
+defined in the #io# standard library module include two arguments of type
+#io#, with modes #di# and #uo# respectively.  #di# stands for ``destructive
+input'' and #uo# stands for ``unique output''.  These #mode#s are built-in
+and defined thus:
+
+%startcode
+:- mode di  == (unique  >> clobbered).
+:- mode uo  == (free    >> unique).
+%endcode
+
+To illustrate, consider these #pred# declarations taken from the #io#
+module:
+
+%startcode
+:- pred io.write_string(string::in, io::di, io::uo) is det.
+:- pred io.write_int(int::in, io::di, io::uo) is det.
+:- pred io.nl(io::di, io::uo) is det.
+%endcode
+
+and the following code snippet:
+
+%startcode
+    io.write_string("The meaning of life is ", IO0, IO1),
+    io.write_int(42, IO1, IO2),
+    io.nl(IO2, IO3)
+%endcode
+
+The #io# type arguments denote ``states of the world''.  These #io# states
+are updated when IO actions are performed.  One can never go back to an
+earlier state (you can't unplay a piece of music, say, or unprint a piece of
+paper), so each IO action clobbers the #io# state passed to it and produces
+and new #io# state as its result.  Similarly, because one cannot copy the
+state of the world, #io# states have to be unique.  These constraints ensure
+that the above code snippet executes in the expected order --- that is,
+first the string ``\texttt{The meaning of life is }'' will be printed
+(clobbering #IO0# and producing #IO1#), then the number ``\texttt{42}''
+(clobbering #IO1# and producing #IO2#), and finally a newline (clobbering
+#IO2# and producing #IO3#).
+
+Say we were to accidentally reuse #IO0# in the second goal:
+
+%startcode
+    io.write_string("The meaning of life is ", IO0, IO1),
+    io.write_int(42, IO0, IO2),
+    io.nl(IO2, IO3)
+%endcode
+
+The Mercury compiler will report the following error
+(line 27 in file #foo.m# is the call to #io.write_string#):
+
+%startcode
+foo.m:027: In  clause for `main(di, uo)':
+foo.m:027:     in argument 2 of call to predicate `io.write_string/3':
+foo.m:027:     unique-mode error: the called procedure would clobber
+foo.m:027:     its argument, but variable `IO0' is still live.
+%endcode
+
+\Note Procedures that can clobber arguments must have determinism category
+#det# or #cc_multi#.  They must always succeed and produce a single result.
+The reason for this is that once an argument is clobbered, which could
+happen at any point during the execution of the procedure, there is no way
+of un-clobbering it on failure or backtracking.  Consequently it is also an
+error for code to backtrack into such procedures.  For the rare cases where
+one needs to do such things, the reader is referred to the section on
+backtrackable destructive update in the Mercury reference manual which
+discusses ``mostly uniqueness''.
+
+\section{Committed-choice nondeterminism}
+
+\XXX{Fill this out.}
+
+The #cc_nondet# and #cc_multi# modes.  There may be multiple solutions to a
+cc predicate, but you will only get one of them.
 
+The compiler will report an error if a program can backtrack into a
+committed-choice goal: all goals following a committed-choice goal must be
+\emph{guaranteed} to succeed.  Programming under this restriction is quite
+burdensome.
 
+If all solutions to a committed-choice predicate are equivalent, in the
+sense that, no matter which solution you get, the observable behaviour of
+your program will be the same, then you can use the built-in function
+#promise_only_solution# to escape from the committed-choice context.
 



----- End forwarded message -----
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
Previous message: [m-dev.] cc_multi bug
Next message: [m-dev.] mercury meeting this morning
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the developers mailing list