[m-rev.] lex and moose changed
Holger Krug
hkrug at rationalizer.com
Thu Aug 2 17:07:01 AEST 2001
On Thu, Aug 02, 2001 at 10:58:17AM +1000, Peter Schachte wrote:
> Ralph and Holger:
>
> Maybe the best way to resolve this is for each of you to post a simpler
> lexer using your preferred scheme. Hopefully on seeing both, the preferable
> approach will become clear.
>
> As a suggested application, here's a token type I'd like to have produced by
> the lexer:
>
> :- type token --->
> plus ; minus ; times ; divide ; int(int) ; ident(string).
>
> Hopefully this is simple enough to be easy to code, and complex enough to
> illustrate the differences between the approaches.
Great proposal. Thanks! I'll try to answer for both sides and allow Ralph
to kill me (but only within 3 hours after having read this email, and only
in person, so he would have to come to Berlin within 3 hours) if I get
something wrong about his approach.
Ralph's solution (code prefixed with line numbers):
---------------------------------------------------
% The token type does not contain the values, only the token names.
01 :- type token --->
02         plus ; minus ; times ; divide ; int ; ident.

03 :- func lexemes = list(annotated_lexeme(token)).

04 lexemes = [    % lexeme(annotated token, regular expression)
05     lexeme(noval(plus)  , atom('+')  ),
06     lexeme(noval(minus) , atom('-')  ),
07     lexeme(noval(times) , atom('*')  ),
08     lexeme(noval(divide), atom('/')  ),
09     lexeme(value(int)   , signed_int ),
10     lexeme(value(ident) , identifier )
11 ].
% Using these lexemes we now create a lexer, to which a character source may
% be attached and from which tokens may be read. Because there is no
% difference between the two approaches here, I won't show that code. The
% difference lies only in the result, which is of type `lexer_result(token)'
% and is obtained by `lex__read':
:- type lex__lexer_result(Token)
---> ok(Token) % Noval token matched.
; ok(Token, string) % Value token matched.
; eof % End of input.
; error(int). % No matches for string at this offset.
:- pred lex__read(lexer_result(Tok),
lexer_state(Tok, Src), lexer_state(Tok, Src)).
:- mode lex__read(out, di, uo) is det.
% `lexer_result(token)' gives the token and, in the case of a value token,
% the string matched by the regular expression associated with the token
% value in the lexeme. To avoid having the parser deal with strings when
% they really represent integers or something else, Ralph's approach is to
% wrap `lex__read' in `my_read', which does the conversion:
12 :- pred my_read(parser_input(PTok),
13 lexer_state(Tok, Src), lexer_state(Tok, Src)).
14 :- mode my_read(out, di, uo) is det.
% Here the output type `parser_input(PTok)' is added by me. `PTok' is the
% kind of token the parser expects. `PTok' has to be defined separately,
% because it is not the same as `token'. E.g.:
% to be defined by the user:
15 :- type parser_token --->
16 plus ; minus ; times ; divide ; int(int) ; ident(string).
% to be defined by the parser framework, e.g. moose:
:- type parser_input(PTok) ---> ok(PTok)
; eof
; error(int).
17 :- func convert(token, string) = parser_input(parser_token).
18 my_read(T) -->
19 lex__read(Result),
20 { Result = error(_), ...
21 ; Result = eof, ...
22 ; Result = ok(Token), T = convert(Token, "")
23 ; Result = ok(Token, String), T = convert(Token, String)
24 }.
% `convert' does the conversion from lexer output to parser input, e.g.
% from strings to ints. The parser may now be fed with `my_read' as its
% token source.
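% To make the comparison concrete, here is a rough sketch of what `convert'
% might look like. This is my guess, not Ralph's actual code; in particular
% the handling of bad integers (returning `error(0)' because the offset is
% not known here) is an assumption. It uses the semidet `string__to_int'
% from the standard `string' module.

convert(plus,   _) = ok(plus).
convert(minus,  _) = ok(minus).
convert(times,  _) = ok(times).
convert(divide, _) = ok(divide).
convert(ident,  String) = ok(ident(String)).
convert(int,    String) = Result :-
    ( string__to_int(String, Int) ->
        Result = ok(int(Int))
    ;
        % Malformed or out-of-range integer; real code would report the
        % offset of the offending token here.
        Result = error(0)
    ).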
My solution:
------------
% The token type is like you propose:
01 :- type token --->
02 plus ; minus ; times ; divide ; int(int) ; ident(string).
% The lexemes are more complicated than in Ralph's approach. This may change
% later on, when we have `lex' not only as a library but as a real code
% generator, as Ralph suggests in his to-do list.
03 lexemes = [  % lexeme( noval(token), regular expression )
               % lexeme( t(func(string) = token), regular expression )
04     lexeme(noval(plus)  , atom('+')  ),
05     lexeme(noval(minus) , atom('-')  ),
06     lexeme(noval(times) , atom('*')  ),
07     lexeme(noval(divide), atom('/')  ),
08     lexeme(t(func(Match) =
09         int(convert_signed_digits_to_string(Match))), signed_int ),
10     lexeme(t(func(Match) = ident(Match)),             identifier )
11 ].
12 :- func convert_signed_digits_to_string(string) = int.
% As you see, the conversion is part of the lexeme definition, based on
% which the lexer is created. `convert_signed_digits_to_string' is
% `string__det_to_int' with overflow check.
% A problem here, I admit, is that in the current implementation there is
% no way to signal an error at this point, e.g. in the case of an integer
% overflow. This has to be added to the implementation. (An exception may
% be thrown inside the converter. If the converter throws an exception,
% `lex__read' catches it and returns the appropriate error message. All of
% this is transparent to the user; I only forgot to implement it.)
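% For illustration, a rough sketch of the converter (an assumption on my
% part, not code that exists yet): it uses `string__to_int' and `throw'
% from the standard `string' and `exception' modules, and the exception
% type is a placeholder.

:- type conversion_error
    --->    integer_conversion_error(string).

convert_signed_digits_to_string(String) = Int :-
    ( string__to_int(String, Int0) ->
        Int = Int0
    ;
        % Assuming string__to_int fails on out-of-range input; the thrown
        % exception would be caught by `lex__read' once that is implemented,
        % as described above.
        throw(integer_conversion_error(String))
    ).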
% The library function `lex__read' now does the same as `my_read' does in
% Ralph's case. Ralph has to declare two token types: one for the lexer
% (tokens without values) and another one, containing the necessary values
% after conversion, for the parser. In my case the token types for the
% lexer and the parser may be the same. That is, we are now ready to feed
% the parser with `lex__read' as its input source. The type for parser
% input is the same as the type for lexer output, viz. `lexer_result(token)'.
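% For illustration, here is a rough sketch of a driver in my scheme that
% drains the lexer into a token list. `read_all_tokens' and the
% `lexer_error' type are made up for this example, and I assume the result
% type in my scheme has only the `ok(Token)', `eof' and `error(Offset)'
% alternatives (`throw' comes from the `exception' module).

:- type lexer_error ---> lexer_error(int).

:- pred read_all_tokens(list(token),
        lexer_state(token, Src), lexer_state(token, Src)).
:- mode read_all_tokens(out, di, uo) is det.

read_all_tokens(Tokens) -->
    lex__read(Result),
    (
        { Result = ok(Token) },
        read_all_tokens(Tokens0),
        { Tokens = [Token | Tokens0] }
    ;
        { Result = eof },
        { Tokens = [] }
    ;
        { Result = error(Offset) },
        { throw(lexer_error(Offset)) }
    ).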
Result
------
24 lines of code to be written for Ralph's approach (+ implementation of
`convert' and the dots `...' in `my_read')
12 lines of code to be written for my approach (+ implementation of
`convert_signed_digits_to_string')
--
Holger Krug
hkrug at rationalizer.com