[mercury-users] lexing and parsing in Mercury

Fergus Henderson fjh at cs.mu.OZ.AU
Mon Apr 23 08:24:32 AEST 2001


On 22-Apr-2001, Terrence Brannon <princepawn at earthlink.net> wrote:
> However, none of the available Mercury docs has any examples of how to
> write a parser.

Have a look at samples/calculator.m.
That uses definite clause grammar (DCG)
rules to parse a simple expression language.

DCG parsers in Mercury are pretty similar to those in Prolog.
Richard O'Keefe's book "The Craft of Prolog" has a good description, IIRC,
and a quick search with Google will probably net you some other references.
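In case you haven't seen DCGs before: a DCG rule is just syntactic
sugar for an ordinary predicate with two extra arguments that thread
a state (for parsing, the list of remaining tokens) from call to call.
Roughly (sketching from memory):

	% The DCG rule
	digits([D | Ds]) --> digit(D), digits(Ds).
	% is shorthand for the plain clause
	digits([D | Ds], S0, S) :-
		digit(D, S0, S1),
		digits(Ds, S1, S).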

> How could you write a parser to do the following:
> 
> Take sentences of the form VERB OBJECT for a limited number of
> combinations of verb and object and in all cases but the VERB
> add_verb_object simply say "VERB accomplished".
> 
> Ie, 
> 
> read book => "read accomplished"
> open door => "open accomplished"
> open jar  => "open accomplished"
> open sesame => "open does not accept the sesame object"
> play piano => "play does not accept the piano object... in fact play
> 	      does not exist"
> add_verb_object('play','piano').
> play piano => "play accomplished"

Let's see... we're probably going to need to tokenize the input into
words.  You may be able to reuse the Mercury lexer for that.
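For example (I'm writing the token names from memory here, so
double-check them against the lexer module), a command such as

	read book.

should tokenize, once flattened into a plain list, as something like

	[name("read"), name("book"), end]

where `end' is the token for the terminating full stop.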

We're going to need to thread a database of the current state through
each command, and the `add_verb_object' command will update that
database.  So it's going to look something like this:

`main' will call a recursive predicate `run',
passing it the initial value of the database.
For now we'll treat the database as an ADT,
so we'll just invent a function `initial_db'
that returns the initial state of the database:

	main -->
		run(initial_db).

`run' will then call the Mercury lexer (`lexer__get_token_list'),
will call a subroutine `parse' to parse the words into some kind of
semantic representation of the command (e.g. a parse tree), and will
then process the command, giving us a new database state; finally `run'
will call itself recursively with the new database state, to process the
remaining commands.  Oh, and we'd better account for the fact that `parse'
might fail:

	run(DB0) -->
		lexer__get_token_list(Tokens),
		(if { parse(Tokens, Command) } then
			process(Command, DB0, DB),
			run(DB)
		else
			io__write_string("syntax error\n")
		).

Now, let's think about what the parse tree is going to be like.
We really only need to distinguish two kinds of commands:
`add_verb_object', and everything else:

	:- type parse_tree
		--->	add_verb_object(verb, object)
		;	other_sentence(verb, object).
	
	:- type verb == string.
	:- type object == string.

Unfortunately the lexer module returns a rather odd list type,
which happens to be good for efficiency but which is not great
for our current pedagogical purpose.  So we need a function
to convert from that list type to the standard list type:
	token_list_to_list(token_nil) = [].
	token_list_to_list(token_cons(H, _, T)) = [H | token_list_to_list(T)].

and then we'll change the call to `parse' above to call it:

	(if { parse(token_list_to_list(Tokens), Command) } then

Now, the actual parser itself is trivial; in fact it's so trivial that there's
no need to use a DCG for this, we can just use pattern matching on the list of
tokens:

	:- mode parse(in, out) is semidet.

	parse([name("add_verb_object"), open_ct, name(Verb), comma,
			name(Object), close, end],
		add_verb_object(Verb, Object)).
	parse([name(Verb), name(Object), end],
		other_sentence(Verb, Object)).

Once we've parsed the command, we need to process it.
Processing the `add_verb_object' command is easy enough: we
just add the verb and object to the database,

	process(add_verb_object(Verb, Object), DB0, DB) -->
		{ DB = add_to_db(DB0, Verb, Object) }.

For anything else, the database will remain unchanged, but
we need to look up the verb and object in the database
to see what to do:

	process(other_sentence(Verb, Object), DB, DB) -->
		(if { search_db(DB, Verb, Object) } then
			io__print(Verb), io__print(" accomplished"), io__nl
		else
			io__write_strings([Verb, " does not accept the ",
				Object, " object"]),
			(if { search_db(DB, Verb, _AnythingElse) } then
				io__nl
			else
				io__write_strings(["... in fact ", Verb,
					" does not exist"]), io__nl
			)
		).

The only thing that remains is to implement our database ADT.
For a simple implementation, we could just use a list.

	:- type database == list(pair(verb, object)).

	initial_db = ["read" - "book", "open" - "door", "open" - "jar"].

	add_to_db(DB0, Verb, Object) = [Verb - Object | DB0].

	search_db(DB0, Verb, Object) :-
		list__member(Verb - Object, DB0).
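
So with that representation, for example:

	search_db(initial_db, "open", "door")	% succeeds
	search_db(initial_db, "open", "sesame")	% fails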

For a more efficient solution we might want to use the standard library
`multi_map' ADT, e.g.

	:- type database == multi_map(verb, object).

	initial_db = DB :-
		multi_map__from_assoc_list(["read" - ["book"],
			"open" - ["door", "jar"]], DB).

	add_to_db(DB0, Verb, Object) = DB :-
		multi_map__set(DB0, Verb, Object, DB).

	search_db(DB0, Verb, Object) :-
		multi_map__nondet_search(DB0, Verb, Object).

Anyway, let's put it all together, adding the module declarations.
Since this is a toy program I didn't bother with most of the predicate
declarations, so compile with `mmc --infer-all'.

	:- module example.
	:- interface.
	:- import_module io.

	:- pred main(io__state::di, io__state::uo) is det.

	:- implementation.
	:- import_module lexer, std_util, list, multi_map.

	main -->
		run(initial_db).

	:- mode run(in, di, uo) is det.
	run(DB0) -->
		lexer__get_token_list(Tokens),
		(if { parse(token_list_to_list(Tokens), Command) } then
			process(Command, DB0, DB),
			run(DB)
		else
			io__write_string("syntax error\n")
		).

	:- type parse_tree
		--->	add_verb_object(verb, object)
		;	other_sentence(verb, object).
	
	:- type verb == string.
	:- type object == string.

	token_list_to_list(token_nil) = [].
	token_list_to_list(token_cons(H, _, T)) = [H | token_list_to_list(T)].

	:- mode parse(in, out) is semidet.

	parse([name("add_verb_object"), open_ct, name(Verb), comma,
		name(Object), close, end], add_verb_object(Verb, Object)).
	parse([name(Verb), name(Object), end], other_sentence(Verb, Object)).

	process(add_verb_object(Verb, Object), DB0, DB) -->
		{ DB = add_to_db(DB0, Verb, Object) }.
	process(other_sentence(Verb, Object), DB, DB) -->
		(if { search_db(DB, Verb, Object) } then
			io__print(Verb), io__print(" accomplished"), io__nl
		else
			io__write_strings([Verb, " does not accept the ",
				Object, " object"]),
			(if { search_db(DB, Verb, _AnythingElse) } then
				io__nl
			else
				io__write_strings(["... in fact ", Verb,
					" does not exist"]), io__nl
			)
		).

	:- type database == multi_map(verb, object).

	initial_db = DB :-
		multi_map__from_assoc_list(["read" - ["book"],
			"open" - ["door", "jar"]], DB).

	add_to_db(DB0, Verb, Object) = DB :-
		multi_map__set(DB0, Verb, Object, DB).

	search_db(DB0, Verb, Object) :-
		multi_map__nondet_search(DB0, Verb, Object).

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.