[m-dev.] for review: Add "moose" to extras.

Tyson Dowd trd at cs.mu.OZ.AU
Sat May 20 13:51:37 AEST 2000

On 19-May-2000, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> That looks OK.  But it would be much much better if there was
> some general documentation that at the very least explained what
> Moose is.  Also if you're going to add Moose to the extras,
> you should mention it in the extras/README file.

I have, I just forgot to include that change in the diff.

RCS file: /home/mercury1/repository/mercury/extras/README,v
retrieving revision 1.7
diff -u -r1.7 README
--- README      2000/01/31 03:12:20     1.7
+++ README      2000/05/19 00:15:40
@@ -33,6 +33,13 @@
                of its use, including a module `lazy_list' that defines
                a lazy list data type.
+moose          A parser generator for Mercury.  Moose works much like
+               yacc or bison, it takes a grammar and generates a table
+               driven LR parser for it.  You can add code to the
+               grammar to handle synthesized or inherited attributes.
+               Currently you need to write your own lexer to interface
+               to moose.
 odbc           A Mercury interface to ODBC (Open Database Connectivity),
                for interfacing to standard relational database packages.
> I had a look at the source code and examples and managed to figure out
> a bit about what this program does.  So here's a start at some
> user documentation for it.  It would be good if you could include
> this when you commit it.  It would be even better if the XXXs
> in it got fixed ;-)
> BTW, what category of grammars does Moose handle?
> Is it LR(0), LR(1), LR(k), ...?

Err... best ask Tom.  I'm pretty sure it just does LALR.

> ----------------------------------------------------------------------
> Moose is a parser generator for Mercury.
> It does the same sort of thing for Mercury that Yacc and Bison do for C.
> Moose input files should be given a `.moo' suffix.
> Moose input files contain Mercury code plus some additional
> kinds of declarations and clauses that specify a grammar.
> The `moose' program takes a Moose input file and converts it into
> ordinary Mercury code.
> Each Moose input file should contain:
> - One Moose parser declaration, of the form
> 	:- parse(<StartSymbol>, <EndToken>, <TokenType>, <Prefix>).
>   Here <StartSymbol> is the name of the starting symbol for the grammar,
>   <EndToken> is the token that signifies end-of-file,
>   <TokenType> is the name of the Mercury type for tokens in this grammar,
>   and <Prefix> is unused.  (XXX we should change the syntax to
>   delete the <Prefix> argument here, since it is not used.)

Actually, we should change the code to use the prefix.  That way you can
generate multiple parsers (say, one that parses expressions, and one
that parses whole programs) from the same grammar.

But I think just an XXX noting that it is currently unused is fine.

> - One or more Moose rule declarations, of the form
> 	:- rule <Name>(<ArgumentTypes>).
>   A `:- rule' declaration declares a non-terminal symbol in the grammar.
>   Here <Name> is the name of the non-terminal symbol, and
>   <ArgumentTypes> gives the types of the arguments (i.e. attributes)
>   of that non-terminal.
> - One or more Moose clauses.
>   The Moose clauses specify the productions for this grammar.
>   Each must be of the form
> 	<NonTerminal> ---> <ProductionBody>.
>   Here <NonTerminal> is of the form <Name> or <Name>(<Arguments>),
>   where <Name> is the name of the non-terminal symbol, and
>   <Arguments> specify the arguments (i.e. attributes) for
>   that non-terminal.
>   <ProductionBody> must be of one of the following forms:
> 	[<TerminalList>]
> 	<NonTerminal>
> 	<ProductionBody> , <ProductionBody>
> 	<ProductionBody> ; <ProductionBody>
> 	{ <Action> }
>   [<TerminalList>] denotes a list of terminal symbols.
>   Each of the terminal symbols must be an element of the token type
>   specified in the `:- parse' declaration.  The list can be empty.
>   <NonTerminal> denotes a non-terminal.  Each non-terminal must be
>   declared with a `:- rule' declaration.
>   <ProductionBody> , <ProductionBody> denotes sequence.
>   <ProductionBody> ; <ProductionBody> denotes alternatives.
>   { <Action> } denotes a grammar action.  Here <Action> is an arbitrary
>   Mercury goal.  Grammar actions can impose semantic conditions,
>   or can be used to compute attributes.

Everything above here is fine.
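
As a rough illustration of how the pieces described above might fit
together, here is a sketch of a tiny `.moo' file for sums of integers.
It is untested and makes several assumptions: the module name `calc',
the token type, the non-terminals `expr' and `term', and the prefix `xx'
are all made up for this example, and the `:- parse' arguments simply
follow the order given in the description above (whether the start
symbol should be written as a bare name or as name/arity is not settled
by that description).

```mercury
% calc.moo: a hypothetical Moose input file (all names are illustrative).
:- module calc.
:- interface.

:- type token
    --->    num(int)
    ;       plus
    ;       eof.

:- implementation.
:- import_module int.

    % Arguments in the order described above: start symbol,
    % end-of-file token, token type, and the (currently unused) prefix.
:- parse(expr, eof, token, xx).

    % Each non-terminal carries one int attribute: its value.
:- rule expr(int).
expr(N) ---> term(N).
expr(N) ---> expr(A), [plus], term(B), { N = A + B }.

:- rule term(int).
term(N) ---> [num(N)].
```

The grammar action `{ N = A + B }' computes a synthesized attribute from
the attributes of the sub-productions; the lexer that produces the
`token' values would still have to be written by hand.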

> - One or more Moose action declarations, of the form

Zero or more.

> 	:- action(<Name>/<Arity>, <PredicateName>).
>   I'm not sure what these declarations mean.
>   (XXX we should document what those declarations do.)

Each `:- action' declaration adds a method called <PredicateName>
to the type class parser_state/1.  The method has the same argument
types as the rule given by <Name>/<Arity>, plus a threaded in/out pair
of parser_state arguments.

For example
	:- rule foo(int).
	:- action(foo/1, process_foo).
will generate
	:- typeclass parser_state(T) where [
		... get_token and any other action methods ...
		pred process_foo(int, T, T),
		mode process_foo(in, in, out) is det
	].

Whenever the parser reduces using a rule, it will invoke the associated
action method for that rule.  Since the parser_state is threaded through
all action methods, it can be used to implement inherited attributes.
Actions can also modify the token stream in the parser_state.
> (XXX we should also document the parser_state type class
> and the get_token method.)

The parser_state type class is the set of all operations that must be
implemented in order to use the parser.  parser_state will contain at
least one method, called get_token.  

	:- typeclass parser_state(T) where [
		pred get_token(token, T, T),
		mode get_token(out, in, out) is semidet
	].

get_token returns the next token in the token stream.  The parser state
typically contains a list of tokens, from which the next token is
retrieved (although other implementations are possible).  get_token may
fail if there are no more tokens available.
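
A minimal sketch of the list-of-tokens case, assuming nothing beyond the
get_token method shown above.  The module name, the token type and the
`token_list' wrapper are invented for the example, and the type class is
written out by hand here only so that the module stands on its own; in
real use it would come from the code Moose generates.

```mercury
:- module token_list_state.
:- interface.
:- import_module list.

    % Hypothetical token type, for illustration only.
:- type token
    --->    num(int)
    ;       plus
    ;       eof.

    % Hand-written stand-in for the generated type class.
:- typeclass parser_state(T) where [
    pred get_token(token, T, T),
    mode get_token(out, in, out) is semidet
].

    % The parser state is just the list of tokens not yet consumed.
:- type token_list
    --->    token_list(list(token)).

:- instance parser_state(token_list).

:- implementation.

:- instance parser_state(token_list) where [
    pred(get_token/3) is next_token
].

    % Succeeds with the head of the list and leaves the tail as the
    % new state; fails when no tokens remain.
:- pred next_token(token::out, token_list::in, token_list::out) is semidet.

next_token(Token, token_list([Token | Tokens]), token_list(Tokens)).
```

With this instance a lexer only needs to produce a list(token) and wrap
it in token_list/1 before handing it to the parser.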

The other methods in parser_state will be dictated by the Moose action
declarations.

As well as storing inherited attributes, the parser_state can be used to
track any other information that may be useful (for example, the current
file and line number for generating error messages).
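
To make that concrete, here is a sketch of a parser state that carries a
line counter and a list of collected values alongside the remaining
tokens.  It reuses the process_foo method from the earlier example; the
module name, the `newline' token and the `state' type are assumptions
made up for this sketch, and the type class is again written by hand as
a stand-in for what Moose would generate.

```mercury
:- module foo_state.
:- interface.
:- import_module list.

    % Hypothetical token type; `newline' exists only so that the
    % state has something to count.
:- type token
    --->    num(int)
    ;       newline
    ;       eof.

    % Stand-in for the type class generated from
    % `:- rule foo(int).' and `:- action(foo/1, process_foo).'
:- typeclass parser_state(T) where [
    pred get_token(token, T, T),
    mode get_token(out, in, out) is semidet,
    pred process_foo(int, T, T),
    mode process_foo(in, in, out) is det
].

    % Remaining tokens, current line number, and the foo values
    % recorded so far (most recent first).
:- type state
    --->    state(list(token), int, list(int)).

:- instance parser_state(state).

:- implementation.
:- import_module int.

:- instance parser_state(state) where [
    pred(get_token/3) is next_token,
    pred(process_foo/3) is record_foo
].

    % Returns the next token and bumps the line number on `newline'.
:- pred next_token(token::out, state::in, state::out) is semidet.

next_token(Token, state([Token | Tokens], Line0, Foos),
        state(Tokens, Line, Foos)) :-
    ( Token = newline ->
        Line = Line0 + 1
    ;
        Line = Line0
    ).

    % Action method: remembers the attribute of each reduced `foo'.
:- pred record_foo(int::in, state::in, state::out) is det.

record_foo(Foo, state(Tokens, Line, Foos), state(Tokens, Line, [Foo | Foos])).
```

Because every method takes the state in and hands a new one back, the
line number is available to any action that wants to report an error.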

I intend to add a big example of using Moose, a parser for C.  It uses
actions to handle type_defs, which are added to a map(identifier,
declaration), and this map is looked up by get_token whenever it sees an
identifier (to make sure the identifier isn't a type_def, which is
treated as a different token).

