[m-rev.] lex and moose changed

Holger Krug hkrug at rationalizer.com
Thu Aug 2 02:44:04 AEST 2001


On Wed, Aug 01, 2001 at 07:31:51AM -0700, Ralph Becket wrote:
> > Changes concerning lex:
> This change is mainly concerned with changing the lexer result type
> from
> 
> :- type lexer_result(Token)
> 	--->	ok(Token)           % For noval tokens.
> 	;	ok(Token, string)
> 	;	eof
> 	;	error(int).
> 
> to just
> 
> :- type lexer_result(Token)
> 	--->	ok(Token)           % For noval tokens.
> 	;	eof
> 	;	error(int).
> 
> and requiring that each lexeme be associated with a function
> constructing a token of the required type from the matched
> string.

Exactly.

> I really don't think this is a good change to lex.  It complicates
> the interface and the implementation without gaining anything much
> in terms of utility.

It complicates the interface for the implementor of the lexer, but
simplifies it for the token consumer. Because a lexer implementation
is usually very simple (simpler than a parser implementation, for
example), the implementor of the lexer can reasonably be asked to do
a little extra work, I think.
 
> The effect of this change can be obtained by using a thin
> wrapper around lex__read//1 on a per application basis, e.g.

Yes, but the point is that inside the lexer you know exactly what form
each matched string has, so you can use conversion functions which
presuppose that form, e.g. `string__det_to_int'. Outside the lexer, in
a wrapper as you propose, you would have to use `string__to_int'
instead and implement additional error checks, rather than relying on
the correctness of the lexer. Therefore it seems simpler to do the
conversion inside the lexer.
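
For illustration, here is roughly what I mean. The token type and the
function name are made up for this example; the point is only that the
constructor function attached to the integer lexeme can use the
deterministic conversion, because the lexer's regexp already guarantees
that the matched string consists of digits:

:- type token
	--->	int_lit(int)
	;	ident(string).

	% Hypothetical constructor function for the integer lexeme.
	% `string__det_to_int' is safe here: the regexp has already
	% checked the form of String.
:- func make_int_token(string) = token.

make_int_token(String) = int_lit(string__det_to_int(String)).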
 
> my_read(T) -->
> 	lex__read(Result),
> 	{	Result = error(_), ...
> 	;	Result = eof, ...
> 	;	Result = ok(Token),         T = convert(Token, "")
> 	;	Result = ok(Token, String), T = convert(Token, String)
> 	}.
> 
> This doesn't add any real cost in terms of computation or 
> complexity to the application.  

You are speaking of efficiency. But the code does become more complex,
because of the additional error checks that are necessary inside the
`convert' function.
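
Concretely, outside the lexer `convert' has to look something like the
following sketch (using the made-up `token' type from the sketch above;
`string__to_int' is semidet, and `error' is from the `require' module):

:- type raw_token
	--->	int_tok
	;	ident_tok.

:- func convert(raw_token, string) = token.

convert(int_tok, String) = Token :-
	( string__to_int(String, Int) ->
		Token = int_lit(Int)
	;
		% Unreachable if the lexer's regexp is correct, but the
		% wrapper cannot rely on that, hence the extra check.
		error("convert: malformed integer literal")
	).
convert(ident_tok, String) = ident(String).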

> My own opinion is that converting
> strings into other representations is properly done by the parser.

That's exactly the question. My opinion is that it should be done
inside the lexer, because the parser should work on a structural level,
mostly independent of the textual representation of individual tokens.
The lexer works on the input and token levels and therefore has to
handle textual representations in any case. The parser works on the
token and structural levels; there is no need to work on a textual
level inside the parser. So why not free the parser from this burden?
I think it gives a clearer design.

Why do you think it should be done inside the parser?

> I'd like to hear what other people think, though.

So would I!

> > Changes concerning moose:
> > 
> >
> %-----------------------------------------------------------------------
> -----%
> > %
> > % 07/24/01 hkrug at rationalizer.com:
> > %    * added option --unique-state/-u
> 
> Another option would be to pass in appropriate modes for the lexer
> state, e.g. {in, in, out} vs {ui, di, uo}.  But that's a minor change
> that isn't urgent.

I do not understand!

> discriminated
OK.
> discriminated
OK.

> > Attention: moose now depends on lex, because the following type forms
> > the interface of moose with its lexer:
> > 
> > :- type lex__lexer_result(Token)
> >     --->    ok(Token)                   % Token matched.
> >     ;       eof                         % End of input.
> >     ;       error(int).                 % No matches for string at this offset.
> 
> Again, I don't think lex and moose need to by tied together.

Actually the only tie is the common type `lex__lexer_result'.
 
> You could just define a type moose__lexer_result/1 as above and require
> the application's lex__read//1 wrapper to do the conversion.  This
> wouldn't cost anything since that's what the wrapper is doing anyway.

Yes, I thought about this issue. The right solution depends on how often
moose will be used without lex. I myself will not use moose without
lex.  (Maybe lex without moose, which is why I put `lexer_result'
inside lex.) But maybe that's not true for others?
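
For concreteness, this is how I understand the suggestion (names only
illustrative): moose would declare its own result type,

:- type moose__lexer_result(Token)
	--->	ok(Token)
	;	eof
	;	error(int).

and the application's wrapper around lex__read//1 would simply map
lex's result onto it, e.g. via a function like

:- func to_moose_result(lex__lexer_result(Token)) =
		moose__lexer_result(Token).

to_moose_result(ok(Token))     = ok(Token).
to_moose_result(eof)           = eof.
to_moose_result(error(Offset)) = error(Offset).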
 
> I'll go over the moose changes in detail and get back with a review
> ASAP.

Thanks !

-- 
Holger Krug
hkrug at rationalizer.com