[m-rev.] lex and moose changed

Holger Krug hkrug at rationalizer.com
Thu Aug 2 05:45:35 AEST 2001


> I agree that the parser should work from the token level up.  My wrapper
> solution gives you just what you want: the parser would never see a raw
> string!

Yes, but then you have to write a lexer, a wrapper and a parser:
three things instead of two.

I think it's better to have the wrapper inside the lexer, and you say
it's better to have it outside. There are, it seems, good reasons for
both. It would be good to hear a third voice on this issue.

My thinking is as follows: the results of a regexp match are values
of an implicit sub-type of `string'. The sub-type is determined by
the regexp. Obviously neither the Mercury compiler nor any other
compiler can handle such sub-types, so they have to be handled
manually by the programmer. To simplify this, I would put the
definition of the sub-type (the regular expression) and the
processing of the sub-type in one and the same place, namely the
lexeme definition. That is why I put the function that processes the
matched string inside the lexeme. Doing so ensures that most of the
important information in the lexer's interface can be managed by the
Mercury type system. There is no need to think about external
constraints that are not enforced by the type system.
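
A minimal sketch of this idea, written in Python rather than Mercury
purely for illustration. The names `Lexeme', `LEXEMES' and
`match_token' are hypothetical and are not part of lex; the
longest-match rule is also an assumption of the sketch. Each lexeme
carries both its regexp and the function that processes the matched
string, so the caller only ever sees converted values:

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Lexeme:
    """Hypothetical lexeme: a regexp plus the function for its matches."""
    pattern: str
    convert: Callable[[str], object]


# The regexp (the implicit "sub-type") and its processing sit side by side.
LEXEMES = [
    Lexeme(r"[0-9]+", lambda s: ("integer", int(s))),
    Lexeme(r"[a-z][a-z0-9_]*", lambda s: ("identifier", s)),
]


def match_token(text: str, pos: int = 0) -> Optional[Tuple[object, int]]:
    """Return (converted token, end offset) for the longest match at pos,
    or None if no lexeme matches there."""
    best = None
    for lexeme in LEXEMES:
        m = re.compile(lexeme.pattern).match(text, pos)
        if m is not None and (best is None or m.end() > best[1]):
            # The raw matched string is converted here, inside the lexer.
            best = (lexeme.convert(m.group()), m.end())
    return best
```

With this arrangement, `int(s)' is only ever applied to strings that
matched `[0-9]+', so the constraint lives next to the code that
relies on it.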

The second reason is: it needs less coding.

> > Yes, I thought about this issue. The solution depends on how often
> > moose will be used without lex. I myself will not use moose without
> > lex.  (Maybe lex without moose, and that's the reason why I put
> > `lexer_result' inside lex.) But maybe that's not correct for others?
> 
> I'd be reluctant to tie the two together unnecessarily, partly on
> aesthetic grounds, partly on grounds of maintenance, and partly out
> of fear that one day it'll come back and bite us.  And not tying them
> together is a point in favour of my lex__read//1 wrapper approach!

I can agree with you on this point, because I also feel the
deficiencies of my solution. So we have to decide on the type of the
tokens that are input to moose. Currently moose allows any token type
whatsoever, and the eof token has to be named inside the `:- parse'
definition. There is no way to forward lexer errors to moose. I have
changed the input type to a discriminated union combining real
tokens, error messages and eof. Are there any objections to this
decision?
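
To make the shape of such an input type concrete, here is a rough
sketch, again in Python rather than Mercury. The names `Ok', `Error',
`Eof' and `consume' are hypothetical and do not reflect the actual
constructors used in lex or moose. It only shows how a consumer can
tell real tokens, lexer errors and end of input apart:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Ok:
    """A real token produced by the lexer."""
    token: object


@dataclass(frozen=True)
class Error:
    """A lexer error forwarded to the parser."""
    message: str


@dataclass(frozen=True)
class Eof:
    """End of input."""


# Hypothetical stand-in for the discriminated union described above.
LexerResult = Union[Ok, Error, Eof]


def consume(results: Iterable[LexerResult]) -> Tuple[List[object], Optional[str]]:
    """Collect tokens until eof or the first error.

    Returns (tokens read so far, error message or None)."""
    tokens: List[object] = []
    for result in results:
        if isinstance(result, Ok):
            tokens.append(result.token)
        elif isinstance(result, Error):
            return tokens, result.message
        else:  # Eof
            return tokens, None
    # The stream stopped without an explicit eof value.
    return tokens, "input ended without eof"
```

Because eof and errors are ordinary cases of the input type, the
consumer does not need to be told separately which token means eof.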

-- 
Holger Krug
hkrug at rationalizer.com