[mercury-users] RFC: resources

Peter Schachte schachte at cs.mu.OZ.AU
Tue Feb 22 19:03:04 AEDT 2005
Previous message: [mercury-users] Debug messages
Next message: [mercury-users] Beyond "Hello World"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Mercurians,

Please find enclosed a proposal for resources in Mercury.  This is a
bit like EDCGs, but hopefully better, and is meant to complement state
variables.  The proposal is rather long, but the first couple of pages
covers most of it; the rest is details, bells and whistles, and some
examples of code written using it.

I'd appreciate any comments.  Thanks.

-- 
Peter Schachte              Not our location is important, but the direction
schachte at cs.mu.OZ.AU        in which we move.
www.cs.mu.oz.au/~schachte/      -- Lev Tolstoy 
Phone: +61 3 8344 1338      
-------------- next part --------------
Motivation

Mercury's state variable (!) notation makes threading state through a program
considerably easier.  Their use can make a program much more succinct and
readable by removing as many as half of the apparent arguments in a goal or
clause head, and by removing the need for many "tie the knot" unifications.
The price paid for this is that in clauses written with state variables,
conjunction is not commutative (or, if you prefer, comma is not conjunction).
This seems a fair price to pay for more readable code.

However, state variables only partially solve some of these problems, and
create a new one.  These include:

     o	It is still necessary to change every call and every clause head of
        every predicate involved when a new value needs to be threaded.  It
        is often the case that predicates that thread one value through will
        need to thread others later.  It would be better if only the :- pred
        declarations needed to be changed to thread more or fewer values.

     o There is no abstraction of what is being threaded:  the type and modes
        must be repeated everywhere the value flows.  State variables are a
        useful ad hoc tool for localised dataflow, but for more pervasive
        threading a formal, checked declaration of the dataflow would be
        useful documentation.  Specifying types and modes of threaded values
        in only one place also prevents (rather than just detecting) type or
        mode errors, and makes maintenance easier.

     o	With state variable syntax, the apparent arity of a predicate call
        differs from that of the :- pred declaration.

     o	State variables cannot hide a dataflow in cases where the user does
        not need to know about it, such as a constraint store.  This leaves
        implementors of constraint solvers to use impurity to hide that
        flow, which seems rather heavy-weight for this purpose.

     o	State variables cannot help the programmer find where values are
        threaded but not actually needed.

This proposal seeks to address those problems.


Declaring Resources

We propose adding a declaration:

	:- resource Name : Type :: Inst.

eg,

	:- resource var_supply : int :: ground.

Since the specified Inst will usually be ground, that should be the default
if the ":: Inst" part is omitted.  Note that resource names are atoms; this
syntactically distinguishes them from state variables.

In some cases, there will be a particular initial value for a resource that
will most often be used when the resource's scope is entered.  In this case,
one may also specify an initial value by ending the resource declaration with
"= Expression."  For example, resource holding a set of students that passed
a subject might be specified as:

	:- resource passing_students : set(student) = empty.


Declaring Resource Usage

To use a resource, we name it in some pred or func declarations:

	:- pred Name(Types...) using Resourcespec.

where a Resourcespec is either a single resource name or a list of
Resourcespecs.  The compiler transforms this into ordinary Mercury code by
appending to the argument list, for each listed resource Name : Type :: Inst,
two arguments of type Type.  Then for each mode, an argument of mode
Inst>>Inst and one of mode free>>Inst is added.  If Inst is unique, then
unique>>dead and free>>unique are added instead.  Note that the programmer
need add nothing to the mode declarations, and nothing but the resource name
to the predicate declaration.  For example,

	:- pred compute_marks(list(studinfo), list(int))
		using passing_students.
	:- mode compute_marks(in, out) is det.

would be transformed to

	:- pred compute_marks(list(studinfo), list(int), 
			      set(student), set(student)).
	:- mode compute_marks(in, out, in, out) is det.

Sometimes one only wishes to pass a resource into a predicate; ie, it is
read-only for that predicate.  And occasionally a predicate treats a resource
as write-only.  For these cases, we may precede a resource name in a using
clause with !. or !: for input only and output only resources, respectively.


Using Resources

In the code, calls to predicates that use any resources must be preceded with
a ! symbol.  This serves to remind the reader that more arguments are passed
to this predicate than appear textually, and that reordering such calls with
respect to one another changes the meaning of the predicate.  It also tells
the compiler to add extra arguments to the goal for each resource used by the
called predicate.

By default, naming the resources used by a call is optional.  The reader can
look at a pred or func declaration to see which resources are used by a call,
just as they would look to see what arguments of what types a predicate or
function takes.  Since resources are passed in the order listed in the pred
declaration, and the names and types are standardised, resources should be
easier for a programmer to deal with than ordinary arguments.

Resources are used within a clause exactly as state variables:  !.Resource
denotes the current value of a resource, !:Resource denotes the next value,
and !Resource, in a call, denotes the two separate arguments !.Resource,
!:Resource.  However, the !.Resource and !:Resource terms must not be used in
contexts in which the named resource is not in scope.  Similarly, a predicate
declared to use a resource may not be called where that resource is not in
scope.  It is also an error to use the !.Resource term before (to the left
of) the first goal using the !:Resource term.  The compiler will report all
such errors.

The programmer may create a dynamic scope for resources by attaching a
'using' clause to a goal.  This is written:

	Goal using Usingclause.

Where a Usingclause is either a !Resource term or a list of Usingclauses.
All resources mentioned in the Usingclause are considered to be in scope in
Goal and all goals it invokes, directly or indirectly.  Furthermore, the
values of these resources before Goal using Usingclause, if any, are
undefined at the beginning of Goal, except that Resources whose declaration
specified an initial value are given that initial value at the beginning of
Goal.  Whatever values the named resources have at the end of Goal are
discarded following Goal using Usingclause, and are replaced with the values
in effect before Goal using Usingclause.  Naturally, any resources not in
scope before Goal using Usingclause are also not in scope after it.

As a syntactic convenience, we generalise Usingclause to also allow a
Resourcepart=Value term, where Resourcepart is either !.Resource or
!:Resource, Resource is a resource name, and Value is any expression (term).
Goal using !.Resource=Expr is equivalent to (!:Resource=Expr,Goal) using
!Resource, and Goal using !:Resource=Expr is equivalent to
(Goal,Expr=!.Resource) using !Resource, except that the Expr part is always
evaluated *outside* the context of the using clause allowing it to refer to
values from the outer scope.  As a handy shortcut, Goal using [!.R=Vi,!:R=Vf]
can be abbreviated Goal using !R=(Vi,Vf), and Goal using !R=(!.V,!:V) can be
abbreviated Goal using !R = !V (this applies whether !V is a resource or a
state variable).


Modules

If a :- resource declaration appears in an implementation section, it is not
visible outside that module.  In particular, exported predicates cannot use
such resources.  If it appears in the interface section, however, it becomes
publicly visible, and may be used by exported predicates.  In such cases, the
calling predicate must supply those resources.


Algebraic Properties

The resource system allows three useful algebraic properties of a predicate's
use of a resource to be promised by the programmer.  This is done by
generalising the definition of Resourcespec to also include terms of the form
Resourcespec promise Promisespec, where a Promisespec is either a single
algebraic property name, or a list of Promisespecs.

The important properties are commutativity, idempotence, and identity.  Note
that these properties refer only to the usage of the specified resource(s).

If a number of predicates declare that their use of a particular resource is
commutative, that means that they may be executed in any order without
changing the semantics of the program.  Consider two predicates op1/m and
op2/n declared to use resource s commutatively.  After expansion by adding 2
arguments each, we can say that

	    op1(X1, ... Xm, S0, S1), op2(Y1, ... Yn, S1, S2)

is equivalent to

	    op2(Y1, ... Yn, S0, S1), op1(X1, ... Xm, S1, S2)

The Mercury compiler may use this property to reorder calls to op1 and op2 if
necessary to satisfy mode constraints on other arguments.

Another important property is idempotence.  If op/n is declared to use
resource idempotently, then

	    op(X1, ... Xn, S0, S1), op(X1, ... Xn, S1, S2)

is equivalent to just

	    op(X1, ... Xn, S0, S1)

This may be used by the Mercury compiler to remove redundant goals.

Finally, a third property, identity, may be useful.  If a predicate op/n is an
identity for resource s, then following

	    op(X1, ... Xn, S0, S1)

it is known that S0 = S1.  (Note that this equality is with respect to a
user-supplied equality predicate, if one is specified for the type.)  This
implies that op/n is idempotent, and commutative with all other identity
predicates, but not with predicates declared commutative.  All input-only
uses of a resource are implicitly identities, and output-only uses are
implicitly idempotent, but not commutative.


Resource equality declarations

Resource equality declarations are a useful shortcut.  Programmers may
declare

	:- resource Name = Resourcespec.

This allows bundles of resources that are often used together to be given a
name.  The declared Name may also be parametric, allowing resource names or
properties to be abstracted.  For example, it would be convenient to
constraint programmers to have as standard definitions:

	:- resource tell(S) = S promise [idempotent,commutative].
	:- resource ask(S)  = S promise identity.

This would allow a declaration like:

	:- func +(cint, cint) = cint using tell(cintstore).


Hidden Resources

If all uses of all resources within a predicate body are promised commutative
and idempotent, or if all are identities, then the body is perfectly
declarative without considering the resources.  The compiler is free to
reorder goals and remove redundant ones without considering resources.  For
example, a body may use a constraint store resource, adding constraints in
several goals in the body.  If all these constraint operations are promised
commutative and idempotent, as they often will be, then the meaning of the
body is not sensitive to the order, or repetition, of goals in the body.  In
such cases it is not required to precede the goals using that resource with
!, since no confusion results from their omission.  The predicate itself,
however, still must declare that it uses that resource (in a commutative and
idempotent way).  The same holds when all uses are identities, though
not when identities and commutative-idempotent uses are mixed.

In some cases, this may be taken further.  If all predicates exported by a
module use a given resource only in a commutative and idempotent way (or all
uses are identities), then *all* use of that resource outside that module
will be commutative and idempotent, so there is no need for that module to
export the resource name or the fact that any of its exported predicates use
that resource.  The resource is an implementation detail.  This is truly a
hidden resource:  visible inside the module, but not outside.  Note that a
hidden resource must also have an initialisation specified in its resource
declaration to provide the value the resource is to be initialised to at the
start of program execution.


Higher Order Usage

Higher order calls will sometimes need to have extra argument pairs added for
resources.  I see three different approaches to handling this.  For the
moment, I propose the first, and simplest, of these.  In the longer term, the
others may be preferable.

Approach 1:  Require a special using clause on all higher-order calls just
listing which resources are to be passed, without needing to specify initial
or final values.

Approach 2:  Attach a list of resources to higher order types.  A predicate
that expects a higher order argument would then accept any higher order term
of the right type, provided the all the resources demanded by the higher
order term are available to the predicate.  This would allow the compiler to
generate the appropriate using clause itself.  However, it would require that
all higher order terms passed to a predicate use exactly the same resources.

Approach 3:  Approach 2 could be generalised to accept any higher order terms
as long as the used resources are a subset of those available, by having the
compiler automatically generate higher-order argument-dropping combinator
predicates/functions, and using them to wrap the higher order argument
actually passed.  For example, if I pass a closure that expects a resource
foo to a predicate that also uses resources bar and zip, the closure C gets
wrapped in a closure like drop_1_3_of_3(C).  When this is actually invoked,
it will be called as

	 drop_1_3_of_3(C, Bar0, Bar, Foo0, Foo, Zip0, Zip)

where drop_1_3 is defined as:

	drop_1_3_of_3(Closure, A, B, C, D, E, F) :-
		Closure(C, D).

This would impose some overhead to higher order calls, but perhaps this could
be mitigated with program specialisation.


The Compiler

The compiler can infer which predicates use which resources based on uses of
!.Resource and !:Resource, and should issue error messages for missing
resources in using clauses.  Furthermore, it can issue warnings where
unneeded resources are listed, or where they should have been listed as
!.input-only or !:output-only.  Where pred declarations are inferred, the
compiler should emit them suitably adorned with using clauses as needed.

The actual implementation could work by adding extra arguments to predicates
and calls, but an alternative approach would be to compile the code to pass
resources in memory, rather than as arguments.  This would probably be more
efficient, as it would avoid shuffling resources among registers (and
pseudo-registers actually passed in memory) as they are threaded from one
predicate to another.  It would probably also make implementing higher order
resources easier and more efficient.  But in either case, the semantics
should be the same:  viewed as a transformation adding extra arguments to
predicates.


Backward Compatibility

This proposal is largely backward compatible.  In particular, the state
variable syntax remains unchanged.  The library should be augmented with
resource declarations as appropriate; at least, the io module should define
an io resource, which would be used by all predicates that expect an
io__state argument pair.  This, too, is backward compatible, as existing
calls to io predicates will not have the ! prefix, so will not have arguments
added.  Since the argument types and arities will be correct without adding
resource arguments, the compiler should not complain.

Of course, introducing two new operators ('resource' and 'using') may cause
some compatibility problems.  Hopefully these will be few.


Examples

Here are a few examples written using resources:

Firstly, assume the interface section of the io module contains:

	:- resource io : io__state :: unique.
	...
	:- pred io__print(T:in) using io is det.
	...

Then hello.m might look like:

	:- module hello.
	:- interface.
	:- import_module io.
	:- pred main using io is det.

	:- implementation.

	main :-
		! print("hello "),
		! print("world!"),
		! nl.

This would be translated to:

	:- module hello.
	:- interface.
	:- import_module io.
	:- pred main(io__state::di, io__state::uo) is det.

	:- implementation.

	main(IO0, IO) :-
		print("hello ", IO0, IO1),
		print("world!", IO1, IO2),
		nl(IO2, IO).

More on I/O:

Resources neatly handle issues with having "current" input and output
streams, and separate versions of all I/O operations with and without an
explicit stream argument.  I'm assuming here that exceptions are used to
handle I/O errors, as it simplifies the code considerably.  If we declare

	:- resource io     : io__state :: unique.
	:- resource input  : input_stream = stdin.
	:- resource output : output_stream = stdout.

	:- pred open_input(string::in) using [io, !:input] is det.
	:- pred read_char(char::out) using [io, !.input] is det.

etc, then when we just need one input file, we can let the input resource
carry the "current" input stream:

	! open_input(Filename),
	! read_char(Firstchar),
	...

These would be translated to

	open_input(Filename, Input, IO0, IO),
	read_char(Firstchar, Input, IO0, IO),
	...

But when we want to use two input files, we can do

	! open_input(Filename1) using !:input=Stream1,
	! open_input(Filename2) using !:input=Stream2,
	! read_char(Firstchar1) using !.input=Stream1,
	! read_char(Firstchar2) using !.input=Stream2,
	...

Or if we use one stream much more than any other, we can also do

	! open_input(Filename1),
	! open_input(Filename2) using !:input=Stream2,
	! read_char(Firstchar1),
	! read_char(Firstchar2) using !.input=Stream2,
	...

Both of these would be translated to

	open_input(Filename1, Stream1, IO0, IO1),
	open_input(Filename2, Stream2, IO1, IO2),
	read_char(Firstchar1, Stream1, IO2, IO3),
	read_char(Firstchar2, Stream2, IO3, IO4),
	...

The difference between the two is that the second one propagates the input
resource beyond the given code.


A bigger example:

	:- resource predtable :: map(predspec, predinfo) = map__empty.
	:- resource vartable :: map(varid, predinfo) = map__empty.
	:- resource varsupply :: varid = initial_varsupply.


	:- pred process_file(string::in) using io.

	process_file(Name) :-
	    (	compute_output_name(Name, Outname),
		! open_input(Name),
		! open_output(Outname),
		% This is OK because predtable has a default initial value
		! compile_stream using !predtable,
		! close_output,
		! close_input
	    ) using [input,output].


	:- pred compile_stream using [predtable, input, output, io] is det.

	compile_stream :-
		! read_source_code,
		! generate_code.

	:- pred read_source_code using [predtable, input, io] is det.

	read_source_code :-
		! read_term(Term),
		(   at_end_of_file(Term) ->
			true
		;   is_directive(Term) ->
			! handle_directive(Term),
			! read_source_code
		;   % Use the default initial varsupply:
		    ! handle_clause(Term) using !varsupply,
		    ! read_source_code
		).

	:- pred handle_directive(term::in) using predtable.
	:- pred handle_clause(term::in) using [predtable, varsupply].

And so on.  This would translate to:

	:- pred process_file(string::in, io::di, io::uo).

	process_file(Name, IO0, IO) :-
		compute_output_name(Name, Outname),
		open_input(Name, Instream, IO0, IO1),
		open_output(Outname, IO1, IO2, Outstream),
		compile_stream(map__empty, _, Instream, Outstream, IO2, IO3),
		close_output(Outstream, IO3, IO4),
		close_input(Instream, IO4, IO).


	:- pred compile_stream(map(predspec,predinfo)::in,
			       map(predspec,predinfo)::out,
			       input_stream::in, output_stream::in,
			       io::di, io::uo) is det.

	compile_stream(Predtable0, Predtable, Instream, Outstream, IO0, IO) :-
		read_source_code(Predtable0, Predtable1, Instream, IO0, IO1),
		generate_code(Predtable1, Predtable, Instream, Outstream,
			      IO1, IO).

	:- pred read_source_code(map(predspec,predinfo)::in,
			       map(predspec,predinfo)::out,
			       input_stream::in, io::di, io::uo) is det.

	read_source_code(Predtable0, Predtable, Instream, IO0, IO) :-
		read_term(Term, Instream, IO0, IO1),
		(   at_end_of_file(Term) ->
			Predtable = Predtable1,
			IO = IO1
		;   is_directive(Term) ->
			handle_directive(Term, Predtable1, Predtable2),
			read_source_code(Predtable1, Predtable, Instream,
					 IO1, IO)
		;   % Use the default initial varsupply:
		    handle_clause(Term, Predtable1, Predtable2),
		    read_source_code(Predtable1, Predtable, Instream,
					 IO1, IO)
		).
Previous message: [mercury-users] Debug messages
Next message: [mercury-users] Beyond "Hello World"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the users mailing list