[m-rev.] lex and moose changed

Ralph Becket rbeck at microsoft.com
Thu Aug 2 03:22:04 AEST 2001


> From: Holger Krug [mailto:hkrug at rationalizer.com] 
> Sent: 01 August 2001 17:44
> 
> On Wed, Aug 01, 2001 at 07:31:51AM -0700, Ralph Becket wrote:
> 
> > I really don't think this is a good change to lex.  It complicates
> > the interface and the implementation without gaining anything much
> > in terms of utility.
> 
> It complicates the interface for the implementor of the lexer, but
> simplifies it for the token consumer. Because a lexer implementation
> is usually very simple, simpler e.g. than a parser implementation,
> the implementor of the lexer could really do a little bit of extra
> work, I think.
>  
> > The effect of this change can be obtained by using a thin
> > wrapper around lex__read//1 on a per application basis, e.g.
> 
> Yes, but the problem is that inside the lexer you know exactly the
> form of each string and can use conversion functions that presuppose
> special string forms, e.g. `string__det_to_int'. Outside the lexer,
> using a wrapper as you propose, you would have to use `string__to_int'
> instead, implementing additional error checks rather than relying on
> the correctness of the lexer. Therefore it seems simpler to do the
> conversion inside the lexer.

I don't think this point is valid.

The person who wrote the regular expressions is the person who should
write the wrapper function.  That programmer will therefore know that
a particular string will satisfy string__to_int and can use
string__det_to_int instead (after, perhaps, checking for overflow etc.).

The question is where the conversion should happen.  Your change
places it inside the lexer and makes both the interface to the
lexer and its implementation more complicated.  My suggestion avoids
both problems.
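
To make the point concrete, here is a minimal sketch of such a wrapper
conversion function (the token type and the raw_* constructor names are
hypothetical, not part of lex's actual interface): because the regular
expression for integers guarantees the lexeme consists only of digits,
the wrapper author can safely call string__det_to_int.

```mercury
% Hypothetical application-side token type.
:- type token
    --->    int(int)
    ;       ident(string).

% Convert a raw lexer token plus its lexeme into a typed token.
% Safe to use string__det_to_int here: the regex that matched
% raw_int admits only digit strings.
:- func convert(raw_token, string) = token.

convert(raw_int,   Lexeme) = int(string__det_to_int(Lexeme)).
convert(raw_ident, Lexeme) = ident(Lexeme).
```

The parser then only ever sees values of type token, never raw strings.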

> > my_read(T) -->
> > 	lex__read(Result),
> > 	{	Result = error(_), ...
> > 	;	Result = eof, ...
> > 	;	Result = ok(Token),         T = convert(Token, "")
> > 	;	Result = ok(Token, String), T = convert(Token, String)
> > 	}.
> > 
> > This doesn't add any real cost in terms of computation or 
> > complexity to the application.  
> 
> You are concerned with efficiency. But the code becomes more complex
> because of the additional error checks necessary inside the `convert'
> predicate.

I don't see this at all.  You have to put the checks somewhere.

> > My own opinion is that converting
> > strings into other representations is properly done by the parser.
> 
> That's exactly the question. My opinion is that it should be done
> inside the lexer, because the parser should work on a structural level,
> mostly independent of the textual representations of single
> tokens. The lexer works on the input and token levels and therefore
> always has to handle textual representations. The parser works
> on the token and structural levels; it is not necessary to work on a
> textual level inside the parser. So why not free the parser from
> this burden? I think it gives a clearer design.

I agree that the parser should work from the token level up.  My wrapper
solution gives you just what you want: the parser would never see a raw
string!

> > Another option would be to pass in appropriate modes for the lexer
> > state,
> > e.g. {in, in, out} vs {ui, di, uo}.  But that's a minor change that
> > isn't
> > urgent.
> 
> I do not understand!

Your change allows the user to tell moose whether it should treat the
lexer state as unique or ground.  However, it is conceivable that
more complicated insts might be useful.  At the moment lex uses
unsafe_promise_unique/2 and the fact that array_uo == ground to
get around problems with the current mode analyzer.  In future we
hope working with more complicated insts will be easier and require
less sleight of hand :)

It's a small point for now, though.
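
For illustration, the two mode alternatives might look something like
the following sketch (the predicate name, argument order, and types are
illustrative only, not lex's actual declarations): one mode threads the
lexer state as ordinary ground data, the other as a unique value.

```mercury
% Hypothetical read predicate over a threaded lexer state.
:- pred read(token_result(Tok), lexer_state(Tok, Src),
             lexer_state(Tok, Src)).

% Ground version: state passed as plain in/out.
:- mode read(out, in, out) is det.

% Unique version: state destructively updated via di/uo.
:- mode read(out, di, uo) is det.
```

Which mode moose should assume is exactly the kind of thing the user
might want to choose, hence the remark about more complicated insts.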

> > You could just define a type moose__lexer_result/1 as above and
> > require the application's lex__read//1 wrapper to do the conversion.
> > This wouldn't cost anything since that's what the wrapper is doing
> > anyway.
> 
> Yes, I thought about this issue. The solution depends on how often
> moose will be used without lex. I myself will not use moose without
> lex.  (Maybe lex without moose, and that's the reason why I put
> `lexer_result' inside lex.) But maybe that's not correct for others?

I'd be reluctant to tie the two together unnecessarily, partly on
aesthetic grounds, partly on grounds of maintenance, and partly out
of fear that one day it'll come back and bite us.  And not tying them
together is a point in favour of my lex__read//1 wrapper approach!

- Ralph
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------