EDCGs

Fergus Henderson fjh at cs.mu.oz.au
Tue May 13 00:00:55 AEST 1997
Previous message: [m-dev.] For review: code generator improvements
Next message: diff: test cases for .agc bugs.
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all,

Here is the `descr.txt' file from Peter Szeredi and Martin Kosa's
extended definite clause grammars extension for Mercury.
They're seeking feedback.  Any comments?

--
Fergus Henderson <fjh at cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3         |     -- the last words of T. S. Garp.


		       An EDCG extension for Mercury

			Peter Szeredi, Marton Kosa
			   IQSOFT Ltd, Budapest
				April 1997

An introductory example
=======================

Let's start with a brief excerpt from an simple calculator program (the
full text is given later):

    :- pred expr(list(char)::in)
	+hidden(passed(base), produced(accumulator)) is semidet.
    expr(Cs) :- 
	    dcg <. Cs, intgr(I), accumulator <. I, expr_tail, dcg >. [].

    :- pred expr_tail
        +hidden(changed(dcg),changed(accumulator),passed(base)) is semidet.
    expr_tail :-
	    (
		op(OP) ->  intgr(X), perform_op(OP, X), expr_tail
	    ;
		true
	    ).

    :- pred op(op::out)+hidden(changed(dcg)) is semidet.
    op(OP) :-
	    dcg<+[OPC], is_op(OPC, OP).

    :- pred perform_op(op::in, int::in)+hidden(changed(accumulator)) is det.
    perform_op(plus,  I) :-  accumulator   <+ [I].
    perform_op(minus, I) :-  subtractor    <+ [I].

    :- pred intgr(int::out)+hidden(changed(dcg),passed(base)) is semidet.

Predicate `expr' does parsing and evaluation of simple arithmetic
expressions. It has one visible input argument, the list of characters to
be parsed; and it has two hidden arguments: an input argument holding the
number base of integers, and an output argument, the value of the
expression. The first hidden is said to be `passed', because it is
inherited from goals textually preceding the call of this predicate (and
will be further passed on to goals to the right); the second one is said to
be `produced' by the predicate, because it is not present in the left
context but it is passed to the goals to the right.

Predicate `expr_tail' has no visible arguments and five hidden ones. Each
argument classified as `changed' corresponds to an accumulating operation
and is represented by two hidden arguments: the old value of the
accumulator inherited from the left context, and the new value passed onto
the right context. There are two such changed hiddens: `dcg', which holds
the list of characters yet unparsed and `accumulator' which holds the
expression value accumulated so far. `expr_tail' parses (from the `dcg'
hidden) a sequence of numbers each preceded by an additive operator,
converts the numbers using the `base' passed, and adds/subtracts their
value to/from the `accumulator'.

Two further predicates called from within `expr_tail', `is_op' and
`perform_op', are also shown in full. For predicate `intgr' only the pred
declaration is given.

In the bodies of the example predicates three special primitive operations
for hiddens are used. The construct `dcg <. Cs' in `expr' "produces" the
hidden `dcg', i.e. sets it up with the initial value `Cs'. The construct
`dcg >. []' is an inverse operation: it unifies the value of the `dcg'
hidden with `[]' and "consumes" the hidden, i.e. removes it from the right
context. Predicates may also have hiddens of kind `consumed'. After a
hidden has been consumed, it cannot be used, unless it is re-introduced
with a subsequent "production" (i.e. a `<.' primitive, or a predicate
having it as a `produced' hidden).

The construct `dcg<+[OPC]' performs an accumulating operation, "changing"
the value associated with the hidden `dcg'. This construct is equivalent to
the sequence `dcg >. Old, Old = [OPC|New], dcg <. New', i.e. consuming the
hidden; performing the accumulating operation resulting in a new value for
the hidden; and finally producing the hidden with the new value. The
accumulating construct behaves in a way analogous to a call of a predicate
which has the given hidden as a `changed' argument.

The same hidden may have multiple accumulating operations associated with
it, as exemplified by the `perform_op' predicate. The `accumulator <+ [I]'
and the `subtractor <+ [I]' constructs affect the same hidden,
`accumulator', but with different operations: the first one is equivalent
to `accumulator >. Old, New is Old+I, accumulator <. New', while the second
one to `accumulator >. Old, New is Old-I, accumulator <. New'.

The fourth primitive operation, not shown in the example, is `hidden >: V',
which unifies the value of `hidden' with `V' like the '>.' construct, but
does not consume the hidden. It behaves like a predicate having the given
hidden as a `passed' argument.

The EDCG preprocessor, in order to be able to expand these constructs,
requires several declarations to be supplied by the user:

:- htype(dcg, list(char)).
:- hmode(dcg, in, out).

:- htype(accumulator, int).
:- hmode(accumulator, in, out).

:- htype(base, int).
:- hmode(base, in).

:- acc(dcg, lambda([Tok::out, Toks0::in, Toks::out] is semidet, 
	Toks0 = [Tok|Toks])).

:- acc(accumulator, lambda([X::in, Y::in, Z::out] is det, Z is Y+X)).

:- acc(subtractor, lambda([X::in, Y::in, Z::out] is det, Z is Y-X), 
	accumulator).

The `htype' declaration defines the hidden and associates a Mercury type
with it. The `hmode' declaration supplies the Mercury modes for the left
and right context (the latter one is optional). The `acc' declaration
defines the accumulator, supplies its accumulating operation and optionally
associates a hidden with the accumulator (if no hidden is given it is
assumed to have the same name as the accumulator).


Description of the EDCG extension
=================================

We now describe the extension in detail.

Predicates can have hidden arguments. This has to be specified in the
(newly introduced) so called hidden part of their pred declaration. Each
hidden must have its Mercury type and mode defined; this is to be done
through the new htype and hmode declarations.  Some hiddens may have
accumulation operations associated with them; this is done in the new acc
declarations.

The htype and hmode declarations are of the following form:

:- htype(Hidden, Type).
:- hmode(Hidden, LeftMode[, RightMode[, Det]]).

where

    Type is a Mercury type, the type associated with Hidden.
    LeftMode is a Mercury mode, the mode to be used when Hidden is 
	inherited from the calls textually before
    RightMode is a Mercury mode, the mode to be used when Hidden is
	produced for use in the calls textually after
    Det is a Mercury determinism.

A htype declaration is required for each hidden argument occurring in the
program.  Zero or more hmode declarations can be supplied for hidden
arguments.  If there are multiple hmode declarations, then mode
declarations in the predicates using the given hidden argument will be
expanded to multiple declarations, one for each alternative hmode.

The Det part in the hmode declaration is used for multi-hmode hiddens
when the accumulation operation has different determinisms for the
modes. In such cases a predicate involving an accumulation of the hidden
has to specify the highest (most determinate) determinism in the lattice of
determinisms. In the process of expansion these determinisms will be glb-ed
with the determinisms specified for the hiddens.

The last one or two arguments of a hmode declaration can be omitted.

An alternative way to specify the modes of hidden arguments
is the extended form of mode declaration:

:- mode PredName(VisibleModes)+hidden(HiddenModes).

This is to be used if we want to fully control the expansion of modes of
predicates with hidden arguments, as opposed to the hmode mechanism, which
produces all possible mode combinations. These two possibilities for
specifying modes exclude each other and can not be used together for the
same predicate.

The acc declaration is of the following form:

:- acc(AccName[/Arity], AccPred[, Hidden]).

where

    AccName is an accumulator name (accumulator names form a name
	space different from hidden names; in most cases, however, an
	accumulator corresponds to a hidden with the same name).
    Arity is a non-negative integer, defaults to zero if omitted.
    AccPred is the name of a predicate of arity Arity+3, 
	or a lambda expression with such an arity.
    Hidden is an atom,  the name of the hidden argument. If omitted it
	defaults to AccName.

An acc declaration associates an accumulator functor (AccName/Arity)
with the hidden argument to be involved in the accumulation (Hidden) and
the predicate performing the accumulation (AccPred). There may be multiple
accumulators associated with the same hidden.

An accumulator name can have 0 or more arguments, as specified by Arity. In
the expansion of the accumulation operation, the arguments of the
accumulator will passed on to the accumulation predicate followed by three
additional arguments (the term to be accumulated, the left and the right
values of the hidden).


We extend predicate declarations with an optional hidden part:

:- pred PredName(VisibleArgs)[+hidden(HiddenArgs)].

where `+' and `hidden' are required syntactic elements and HiddenArgs is a
comma-separated enumeration of HiddenArg's. A HiddenArg can be of one of
four forms: 

	changed(Hidden)
	passed(Hidden)
	produced(Hidden)
	consumed(Hidden)

where Hidden is a name of a hidden argument previously declared in a htype
declaration. The same hidden argument can be used in different forms in
different predicates.

For each HiddenArg the predicate declaration is extended with two (changed)
or one (passed, produced or consumed) additional argument, in the order the
hidden arguments are listed. The additional argument(s) have the
Mercury type, as specified in the corresponding htype declaration for
Hidden. The two new arguments for a changed(Hidden) will have LeftMode and
RightMode as their modes, as specified in the hmode declaration for
Hidden. The new argument generated for passed(Hidden) and consumed(Hidden) 
has LeftMode as its mode and the argument for produced(Hidden) has the
declared RightMode as its mode.

Calls to predicates with hidden arguments are extended with extra arguments
likewise. The additional arguments in the head and the body contain
variables.  To describe how these variables are linked to each other we
introduce the notion of "state". A state is a map from hidden names to
(logical) variable names.  The head, the body and each subgoal of the body
gets assigned a pre-state and a post-state.

When a clause is processed, first its head is extended with the additional
arguments, each being a fresh new logical variable. The pre-state
associated with the head contains exactly those variables which have a
LeftMode, i.e. the pre-state maps a "changed" Hidden to the first variable
associated with it and a "passed" or "consumed" Hidden to the (only)
variable associated with it (and "produced" hiddens are excluded from its
domain).  The post-state maps a "passed" Hidden to the same variable as the
pre-state, maps a "changed" Hidden to its second variable, and a "produced"
Hidden to the (only) variable associated with it (and "consumed" hiddens
are excluded from its domain).

The pre-state of a clause body is equal to the pre-state of the clause
head. Similarly, the pre-state of the first conjunct in a conjunction is
the same as the pre-state of the latter. The pre-state of the second
conjunct is the same as the post-state of the first conjunct.  The
pre-state of all disjuncts in a disjunction is the same as that of the
disjunction.

When extending a predicate call in the body, for each argument in the
hidden part of its declaration, the following is done, depending on the
form of the hidden argument: 

    changed(Hidden): The pre-state should contain the Hidden in its domain
	(otherwise error). The first added argument will be the variable to
	which Hidden maps in the pre-state. The second added argument will be
	a fresh new variable. The post-state will map Hidden to this new
	variable.

    passed(Hidden): The pre-state should contain the Hidden in its domain
	(otherwise error). The  added argument will be the variable to
	which Hidden maps in the pre-state. The post-state will map Hidden to
	the same variable.

    consumed(Hidden): The pre-state should contain the Hidden in its domain
	(otherwise error). The  added argument will be the variable to
	which Hidden maps in the pre-state. Hidden will be deleted from the
	domain of the post-state.

    produced(Hidden): The pre-state should *not* contain the Hidden in its
	domain (otherwise error). The added argument will be a fresh new
	variable. The post-state will map Hidden to this new variable.

The post-state of a predicate call will be unchanged for all other elements
of its domain (i.e. those not mentioned in the hidden part of the called
predicate declaration).

The post-state of a conjunction is the post-state of its second conjunct.

The post-state of a disjunction is formed as follows. The post-states of
both disjuncts must have the same domain (otherwise error). It may make sense
to allow an exception from this rule when the disjunct has the determinism
fail or error, but this is not handled presently.  The post-state of the
disjunction will have this common domain as its domain, and will map those
hiddens for which the post-states of disjuncts agree to the shared value;
for all other elements it will map to a fresh new variable. The disjuncts
will be extended with equalites to unify the disagreeing variables of the
disjunct and the disjunction.

When the whole of a clause body is processed its post-state is compared
with the post-state of the head. It is an error if the domains of these
differ. It is also an error if a hidden of form `passed' has different
variables associated with it in the two states (i.e. passed arguments can
only occur in calls in passed positions).

The clause body is extended with equalites, in a way similar to
disjunctions, to unify the variables of the body post-state and head
post-state, where these differ. 

There are four primitive operations to be used in clause bodies, with
special syntax:

  accumulation:	`Accumulator <+ ListToBeAccumulated'	changed(Hidden)
	(and with :- acc(Accumulator[/Arity], AccPred[, Hidden]).)
  producing:	`Hidden <. InitValue'			produced(Hidden)
  consuming:	`Hidden >. FinalValue'			consumed(Hidden)
  accessing:	`Hidden >: IntermediateValue'		passed(Hidden)

These built-ins can be viewed as single argument predicate calls, with a
hidden argument of form listed in the last column. Their effect is as
follows: 

  accumulation:	 `Accumulator <+ []' is a no-op, `Accumulator <+ [A|As]' is
	equivalent to `Accumulator <+ [A], Accumulator <+ As'.  Accumulator
	has to have an acc declaration (which associates Accumulator and
	Hidden whether they are equal or distinct).  `Accumulator <+ [A]'
	is expanded to a call to the accumulation predicate (referenced in
	the acc declaration) with Arity+three arguments (..., A, Old, New),
	where Old is the variable associated with Hidden in the pre-state
	and New is a fresh new variable.  New will be associated with
	Hidden in the post-state.  Accumulator can be an atom (i.e., term
	of arity zero), or a term with positive arity; in the latter case,
	the arguments of the accumulator are passed to accumulating
	operation. The distinction between Accumulator and Hidden makes
	possible to have different accumulation operations for the same
	hidden.
	
  producing: `Hidden <. InitValue', checks that Hidden is not in the
	pre-state, introduces a new variable New to which Hidden  will be
	mapped in the post-state and replaces the goal with `New =
	InitValue' 

  consuming: `Hidden >. FinalValue' is replaced by `Old = FinalValue',
	where Old is the variable associated with Hidden in the pre-state.
	Hidden is deleted from the post-state.

  accessing: `Hidden >: IntermediateValue' is replaced by `Old =
	IntermediateValue', where Old is the variable associated with
	Hidden in the pre-state. Post-state remains the same as the
	pre-state. 


A simple example
================

:- pred rev(list(int), list(int)).
:- mode rev(in, out) is det.

rev(L, R) :-
	list_acc <. [],
	revapp(L),
	list_acc >. R.

:- htype(list_acc, list(int)).
:- hmode(list_acc, in, out, det).

:- acc(list_acc, 
	lambda([In::in, Acc::in, Out::out] is det, Out = [Acc|In])).

/* alternative formulation:
:- acc(list_acc, acc_pred).

:- pred acc_pred(int, list(int), list(int)).
:- mode acc_pred(in, in, out) is det.

acc_pred(Acc, In, [Acc|In]).
*/

:- pred revapp(list(int)) +hidden(changed(list_acc)).
:- mode revapp(in).

revapp([]).
revapp([X|L]):-
	list_acc <+ [X],
	revapp(L).


The calculator example
======================

This example implements a very simple calculator which accepts expressions
formed from numbers and operators  + and -. Numbers are interpreted in a
number base supplied separately.

The program consists of two modules, module `calc' defines the user
interface, while the module `expr' defines the expression parser/evaluator.


:- module calc.

:- interface.

:- import_module io.

:- pred main+hidden(changed(universe)) is det.

:- implementation.

:- import_module expr, string, int.

:- htype(universe, io__state).
:- hmode(universe, di, uo).

:- acc(wr, io__write_string, universe).
:- acc(arguments, io__command_line_arguments, universe).

main:-
	arguments <+ [Args],
	(
	    Args = [BS|Es], string__to_int(BS, B), B>1, B=<10
	->
	    wr <+ ["Base `", BS, "' used.\n"], 
	    base <. B, eval_exprs(Es)
	;
	    wr <+ ["Usage: calc <base 2..10> <expr1,...>\n"]
	).


:- pred eval_exprs(list(string)::in)
	   +hidden(consumed(base), changed(universe)) is det.
eval_exprs([]) :- 
	base >. _.
eval_exprs([S|Ss]) :-
	string__to_char_list(S, Cs),
	(
	    expr(Cs)
	-> 
	    accumulator >. V, string__int_to_string(V, VS),
	    wr <+ ["The value of `",S,"' is ", VS," decimal.\n"]
	;
	    wr <+ ["Expression `",S,"' not correct.\n"]
	),
	eval_exprs(Ss).
	
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- module expr.

:- interface.

:- import_module int, list, char.

:- htype(dcg, list(char)).
:- hmode(dcg, in, out).

:- htype(accumulator, int).
:- hmode(accumulator, in, out).

:- htype(base, int).
:- hmode(base, in).

:- pred expr(list(char)::in)
    +hidden(passed(base), produced(accumulator)) is semidet.

:- implementation.

:- htype(decimal, int).
:- hmode(decimal, in, out).

:- acc(dcg, lambda([Tok::out, Toks0::in, Toks::out] is semidet, 
	Toks0 = [Tok|Toks])).

:- acc(accumulator, lambda([X::in, Y::in, Z::out] is det, Z is Y+X)).

:- acc(subtractor, lambda([X::in, Y::in, Z::out] is det, Z is Y-X), 
	accumulator).

:- acc(decimal/1, lambda([B::in, D::in, I0::in, I::out] is det, I is I0*B+D)).

% The grammar of expressions:
%  expr --> intgr, expr_tail.
%  
%  expr_tail --> op, intgr, expr_tail
%  	;     [].
%  
%  op -> [+] ; [-].
%  
%  intgr --> digit, intgr_tail
%  
%  intgr_tail -> digit, intgr_tail
%  	;      [].

:- type op ---> plus ; minus.

expr(Cs) :- 
	dcg <. Cs, intgr(I), accumulator <. I, expr_tail, dcg >. [].

:- pred expr_tail
        +hidden(changed(dcg),changed(accumulator),passed(base)) is semidet.
expr_tail :-
	(
	    op(OP) ->  intgr(X), perform_op(OP, X), expr_tail
        ;
	    true
	).

:- pred op(op::out)+hidden(changed(dcg)) is semidet.
op(OP) :-
	dcg<+[OPC], is_op(OPC, OP).

:- pred is_op(char::in, op::out) is semidet.
is_op(+, plus). is_op(-, minus). 

:- pred intgr(int::out)+hidden(changed(dcg),passed(base)) is semidet.
intgr(Val) :-
	decimal <. 0, digit, intgr_tail, decimal >. Val.

:- pred intgr_tail+hidden(changed(dcg), changed(decimal),passed(base)) is det.
intgr_tail :-
	(
	    digit -> intgr_tail
	;
	    true
	).

:- pred digit+hidden(changed(dcg),changed(decimal),passed(base)) is semidet.
digit :-
	dcg<+[D], is_digit(D, V), base >: B, V < B, decimal(B) <+ [V].

:- pred is_digit(char::in, int::out) is semidet.
is_digit(D, V) :-
	char__is_digit(D), char__to_int(D, DV),
	char__to_int('0', ZV),	V is DV-ZV.

:- pred perform_op(op::in, int::in)+hidden(changed(accumulator)) is det.
perform_op(plus,  I) :-  accumulator   <+ [I].
perform_op(minus, I) :-  subtractor    <+ [I].




An example with multi-moded EDCG predicates
===========================================

This is a superficial example to illustrate the use of multi-moded
predicates with hidden arguments. 

:- module mu_process.

:- interface.

:- import_module string.

:- pred process(string::in, string::out) is nondet.

:- implementation.

:- import_module char, list.

:- type pint ---> z ; s(pint).

:- htype(charlist, list(char)).

:- acc(charlist, builder).

:- pred builder(char, list(char), list(char)).
:- mode builder(in, in, out) is semidet.
:- mode builder(in, out, in) is det.

builder(CH, [CH|OCL], OCL).

process(S0, S1) :-
	string__to_char_list(S0, Mu),
	mu1(N, Mu),
	mu1(s(N), Nu),
	string__from_char_list(Nu, S1).

:- pred mu1(pint, list(char)).
:- mode mu1(in, out) is det.
:- mode mu1(out, in) is nondet.

mu1(N, U):-
	charlist <. U,
	mu(N),
	charlist >. [].

:- pred mu(pint) +hidden(changed(charlist)).
:- mode mu(in) +hidden(out, in) is det.
:- mode mu(out) +hidden(in, out) is nondet.
mu(z):-
	charlist <+ [.].
mu(s(N)):-
	charlist <+ [m],
	u(s(N)),
	mu(N).

:- pred u(pint) +hidden(changed(charlist)).
:- mode u(out) +hidden(in, out) is nondet.
:- mode u(in) +hidden(out, in) is det.
u(z):-
	charlist <+ [;].
u(s(N)):-
	charlist <+ [u],
	u(N).
Previous message: [m-dev.] For review: code generator improvements
Next message: diff: test cases for .agc bugs.
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the developers mailing list