[m-rev.] add character ranges to extras/lex
Sebastian Godelet
sebastian.godelet+github at gmail.com
Mon Feb 24 07:03:13 AEDT 2014
Greetings,
I had some more time to make what I think are more substantial improvements
to the extras/lex library.
It was easier to include all previous patches and the new code in a single
pull request (based on master, but any recent branch should be fine).
I'm sorry that I didn't write the patch in one chunk; I'm still learning
how to program in Mercury, and I guess I caused too much traffic on the
mailing list :(
https://github.com/Mercury-Language/mercury/pull/15
branches: 14.01,master
estimated review time: 30m - 1h
The lexer in extras/lex now supports:
- Unicode characters
- character ranges, a specialization of character sets, which are also
supported
- a speed improvement from using sparse_bitset directly for NFA edge
storage, which is less complex than "or"ing thousands of characters (as
required for some Unicode scripts)
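
To illustrate why a set-labelled edge helps, here is a minimal sketch in Python rather than Mercury. It is not the extras/lex code: char_range, edges_with_alternation and edges_with_charset are hypothetical names, and a frozenset of code points stands in for sparse_bitset(char). It only compares how many NFA edges are needed between two states when a range is expressed as an alternation of single characters versus as one character set.

```python
def char_range(lo: str, hi: str) -> frozenset:
    """All code points from lo to hi inclusive (hypothetical helper)."""
    return frozenset(range(ord(lo), ord(hi) + 1))


def edges_with_alternation(chars: frozenset) -> list:
    # "or"ing single characters: one edge (from_state, code_point, to_state)
    # per character in the set.
    return [(0, c, 1) for c in sorted(chars)]


def edges_with_charset(chars: frozenset) -> list:
    # One edge labelled with the whole set, however large the set is.
    return [(0, chars, 1)]


# Hiragana letters U+3041..U+3096, used here only as a sample Unicode range.
hiragana = char_range("\u3041", "\u3096")

print(len(edges_with_alternation(hiragana)))  # 86
print(len(edges_with_charset(hiragana)))      # 1
```

With larger scripts the first count grows with the size of the range, while the second stays at one edge per transition.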
extras/lex/lex.automata.m:
extras/lex/lex.convert_NFA_to_DFA.m:
extras/lex/lex.lexeme.m:
extras/lex/lex.regexp.m:
- changed the basic transition char to charset
- transition --> atom(Char) is now made a singleton charset
- compiled regexp now uses a record for state_no and char instead of a
packed int; otherwise the code point gets truncated
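
The truncation problem can be sketched as follows, again in Python for illustration only. The bit layout is an assumption: the change log says only that a packed int truncated the code point, so the 8-bit character field, pack, unpack and Transition below are hypothetical and not taken from lex.lexeme.m.

```python
from typing import NamedTuple


def pack(state_no: int, code_point: int) -> int:
    # Assumed layout: the low 8 bits hold the character, the rest the state.
    return (state_no << 8) | (code_point & 0xFF)


def unpack(packed: int) -> tuple:
    return packed >> 8, packed & 0xFF


class Transition(NamedTuple):
    # Record form: the character is stored whole, whatever its code point.
    state_no: int
    char: str


print(unpack(pack(3, ord("a"))))       # (3, 97): ASCII survives
print(unpack(pack(3, ord("\u3042"))))  # (3, 66): U+3042 is cut down to 0x42
print(Transition(3, "\u3042").char == "\u3042")  # True
```

Under this assumed layout any code point above 255 loses its high bits, which is the failure a separate record field avoids.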
extras/lex/samples/lex_demo.m:
- added use cases for the liblex changes
- including Unicode support
extras/lex/lex.m:
- dot and nl recognize Windows newlines as well
- introduced range/2 for character ranges
- added a charset type alias == sparse_bitset(char)
- added a regexp(charset) instance
- and a type tag for charset in type regexp
- exporting charset helper functions, e.g. charset and charset_from_lists
Cheers,
Sebastian