<div dir="ltr"><div>Greetings,<br><br>I had some more time to make what I think are more substantial improvements to the extras/lex library.<br><br>It was easier to combine all previous patches and the new code into a single pull request (based on master, but any recent branch should work).<br>
I'm sorry that I didn't write the patch in one chunk; I'm still learning how to program in Mercury. I guess I caused too much traffic on the mailing list :(<br><br><a href="https://github.com/Mercury-Language/mercury/pull/15">https://github.com/Mercury-Language/mercury/pull/15</a></div>
<div><br>branches: 14.01, master<br>estimated review time: 30m - 1h<br><br>The lexer in extras/lex now supports:<br><br>- Unicode characters<br>- character ranges, a specialization of character sets, which are also supported<br><br>Speed is improved by using sparse_bitset directly for NFA edge storage, which is less complex than "or"ing thousands of characters (as required for some Unicode scripts).<br>
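<br>To illustrate the point about edge storage, here is a minimal sketch in Python rather than Mercury. It is not the code from the pull request: a Python frozenset stands in for sparse_bitset(char), and the function names edges_per_char and edge_with_charset are made up for this example. It compares one NFA edge per character with a single edge labelled by a character set, using the Hiragana range U+3041 to U+3096.<br>
<pre>
```python
def edges_per_char(src, dst, first, last):
    """Naive encoding: one (src, char, dst) edge for every character in the range."""
    return [(src, chr(cp), dst) for cp in range(ord(first), ord(last) + 1)]


def edge_with_charset(src, dst, first, last):
    """Set-based encoding: a single edge whose label is the whole character set."""
    charset = frozenset(chr(cp) for cp in range(ord(first), ord(last) + 1))
    return [(src, charset, dst)]


# Hiragana letters, U+3041 through U+3096 inclusive.
per_char = edges_per_char(0, 1, "\u3041", "\u3096")
with_set = edge_with_charset(0, 1, "\u3041", "\u3096")

print(len(per_char))  # number of single-character edges
print(len(with_set))  # number of charset-labelled edges
print("\u3042" in with_set[0][1])  # membership test on the edge label
```
</pre>
With the set-based encoding, the number of edges no longer grows with the size of the script; only the set on the edge label does.<br>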
<br>extras/lex/lex.automata.m:<br>extras/lex/lex.convert_NFA_to_DFA.m:<br>extras/lex/lex.lexeme.m:<br>extras/lex/lex.regexp.m:<br>- changed the basic transition char to charset<br>- transition --> atom(Char) is now made a singleton charset<br>
- compiled regexp now uses a record for state_no and char instead of a packed int;<br>otherwise the code point gets truncated<br><br>extras/lex/samples/lex_demo.m:<br>- added use cases for the liblex changes<br>- including Unicode support<br>
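<br>The note above about replacing the packed int with a record can be sketched as follows. This is Python, not the Mercury code from the patch, and it assumes the packed encoding reserves 8 bits for the character; the actual field width in lex is not stated here. The names pack, unpack and as_record are hypothetical.<br>
<pre>
```python
CHAR_FIELD = 256  # assumption: the packed int keeps only 8 bits for the character


def pack(state_no, char):
    """Pack a state number and a character into one int (character field is 8 bits)."""
    return state_no * CHAR_FIELD + ord(char) % CHAR_FIELD


def unpack(packed):
    """Recover (state_no, char) from a packed int."""
    return packed // CHAR_FIELD, chr(packed % CHAR_FIELD)


def as_record(state_no, char):
    """Record-style alternative: keep both fields separately, nothing is cut off."""
    return (state_no, char)


ascii_round_trip = unpack(pack(3, "a"))    # fits in 8 bits, survives unchanged
truncated = unpack(pack(3, "\u03bb"))      # U+03BB needs more than 8 bits
record = as_record(3, "\u03bb")            # the full code point is preserved

print(ascii_round_trip, truncated, record)
```
</pre>
Under this assumption, U+03BB (Greek small lambda) comes back as U+00BB after the round trip, while the record keeps the original character.<br>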
<br>extras/lex/lex.m:<br>- dot and nl recognize Windows newlines as well<br>- introduced range/2 for character ranges<br>- added a charset type alias == sparse_bitset(char)<br>- added a regexp(charset) instance<br>- and a type tag for charset in type regexp<br>
- exporting charset helper functions, e.g. charset and charset_from_lists<br><br>Cheers,<br><br>Sebastian</div></div>