[mercury-users] Bug in lexer.m
Paul Bone
pbone at csse.unimelb.edu.au
Thu Jul 17 12:57:35 AEST 2008
On Tue, Jul 15, 2008 at 09:40:30PM +1000, Nicholas Nethercote wrote:
> On Tue, 15 Jul 2008, Nicholas Nethercote wrote:
>
> >:- pred string_ungetchar(string::in, posn::in, posn::out) is det.
> >
> >string_ungetchar(String, Posn0, Posn) :-
> >    Posn0 = posn(LineNum0, LineOffset0, Offset0),
> >    Offset = Offset0 - 1,
> >    string.unsafe_index(String, Offset, Char),
> >    ( Char = '\n' ->
> >        LineNum = LineNum0 - 1,
> >        Posn = posn(LineNum, Offset, Offset)
> >    ;
> >        Posn = posn(LineNum0, LineOffset0, Offset)
> >    ).
> >
> >
> >In a 'posn', the first arg is the current line number, the 3rd arg is the
> >current offset into the string being parsed, and the 2nd arg is the offset
> >of the start of the current line.
> >
> >When the above code goes back over a newline, the 2nd argument is
> >incorrectly set -- it gets set to 'Offset', which is the offset of the
> >newline character that ended the line. It should be set to the offset of
> >the beginning of that line (which will be a number of characters earlier
> >than 'Offset'). But by this point the offset for the start of that line
> >has been lost and cannot be easily recreated.
> >
> >I suspect this has been wrong for a very long time, but nobody has noticed
> >because the lexer doesn't ever use the 2nd field anywhere (it could be
> >used to report column numbers for lexing errors, but it isn't). So
> >perhaps the 3-argument 'posn' type could be replaced with a 2-argument
> >type, and some memory would be saved.
>
> And some more savings could be made if that 2-argument type was split into
> two parts which were passed around separately. It results in extra
> arguments to lots of lexing predicates, but avoids the
> deconstruct/construct pair which currently occurs on every character.
>
> I just made a change like this to the Zinc lexer, which is structured
> similarly, and got a nice speedup (I think it was about 20--30%, although I
> didn't measure it all that carefully).
>
Aren't there one or two compiler optimizations that are supposed to help
here, by packing and unpacking tuples?
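
For concreteness, the split-argument variant described in the quoted text
could look roughly like the sketch below. It is only an illustration: the
module name, the argument order, and the choice to drop the line-offset
field entirely are assumptions, not what lexer.m or the Zinc lexer
actually does.

```mercury
% Hypothetical sketch, not the code in lexer.m.
:- module unget_sketch.
:- interface.

    % The line number and the offset are threaded as two separate
    % state pairs instead of being wrapped in a posn/3 term.
:- pred string_ungetchar(string::in, int::in, int::out, int::in, int::out)
    is det.

:- implementation.
:- import_module char.
:- import_module int.
:- import_module string.

string_ungetchar(String, LineNum0, LineNum, Offset0, Offset) :-
    % Step back one character. As in the original predicate, the
    % caller must guarantee that Offset0 > 0.
    Offset = Offset0 - 1,
    string.unsafe_index(String, Offset, Char),
    ( Char = '\n' ->
        % We moved back over a newline, so we are on the previous line.
        LineNum = LineNum0 - 1
    ;
        LineNum = LineNum0
    ).
```

With this shape no posn term is deconstructed or constructed per
character, and since no start-of-line offset is tracked, the incorrect
second field no longer exists.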