<div dir="ltr"><div>Greetings,<br>
<br>
I had some more time to make what I think are more substantial improvements to the extras/lex library. It was easier to bundle all previous patches and the new code into one pull request (based on master, but any recent branch should be fine).<br>
<br>
I'm sorry that I didn't write the patch in one chunk; I'm still learning how to program in Mercury. I guess I caused too much traffic on the mailing list :(<br>
<br>
<a href="https://github.com/Mercury-Language/mercury/pull/15">https://github.com/Mercury-Language/mercury/pull/15</a></div>
<div>branches: 14.01, master<br>
estimated review time: 30m - 1h<br>
<br>
The lexer in extras/lex now supports:<br>
- Unicode characters<br>
- character ranges, a specialization of the character sets that are also supported<br>
<br>
Speed improvement: NFA edges are now stored directly as a sparse_bitset, which is cheaper than "or"ing thousands of characters (as required for some Unicode scripts).<br>
<br>
extras/lex/lex.automata.m:<br>
extras/lex/lex.convert_NFA_to_DFA.m:<br>
extras/lex/lex.lexeme.m:<br>
extras/lex/lex.regexp.m:<br>
- changed the basic transition label from char to charset<br>
- the transition --> atom(Char) is now a singleton charset<br>
- the compiled regexp now uses a record for state_no and char instead of a packed int; otherwise the code point gets truncated<br>
<br>
extras/lex/samples/lex_demo.m:<br>
- added use cases for the liblex changes, including Unicode support<br>
<br>
extras/lex/lex.m:<br>
- dot and nl recognize Windows newlines as well<br>
- introduced range/2 for character ranges<br>
- added a charset type alias == sparse_bitset(char)<br>
- added a regexp(charset) instance<br>
- added a type tag for charset in the regexp type<br>
- exported charset helper functions, e.g. charset and charset_from_lists<br>
<br>
Cheers,<br>
Sebastian</div></div>
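<div>To illustrate the truncation mentioned for the compiled regexp, here is a minimal sketch in Python rather than Mercury. It is not the code from extras/lex: the names pack, unpack and Transition are hypothetical, and the 8-bit character field is an assumption made only to show the effect. The point is that a character field narrower than a Unicode code point silently drops the high bits, while a record keeps both values intact.</div>

```python
# Hypothetical illustration only; none of these names come from extras/lex.
from typing import NamedTuple, Tuple

CHAR_BITS = 8  # ASSUMED width of the character field in the packed form
CHAR_MASK = (1 << CHAR_BITS) - 1


def pack(state_no: int, code_point: int) -> int:
    """Pack a target state and a character into a single int.

    Anything above CHAR_MASK is cut off, which is the truncation problem.
    """
    return (state_no << CHAR_BITS) | (code_point & CHAR_MASK)


def unpack(packed: int) -> Tuple[int, int]:
    """Recover (state_no, code_point) from a packed transition."""
    return packed >> CHAR_BITS, packed & CHAR_MASK


class Transition(NamedTuple):
    """Record form: the state and the full code point are stored separately."""
    state_no: int
    code_point: int


if __name__ == "__main__":
    lam = ord("\u03bb")  # GREEK SMALL LETTER LAMDA, U+03BB
    print(unpack(pack(3, ord("a"))))      # ASCII survives the round trip
    print(unpack(pack(3, lam)))           # high bits of the code point are lost
    print(Transition(3, lam).code_point)  # record keeps the full code point
```

<div>With a record, the state number and the character no longer compete for bits in one word, so the width of the character field stops being a concern.</div>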