<div dir="ltr"><div>For review by anyone.<br><br>To make lexemes easier to define, this patch adds a new function range/2 that works like a simple character class in Perl regular expressions (e.g. [a-f]).</div><div>Both endpoints are inclusive. For example: range('a', 'f') = any("abcdef").</div>
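<div><br></div><div>The standalone Mercury sketch below is not part of the patch. It only illustrates the intended semantics: range(From, To) should cover every character whose code lies between char.to_int(From) and char.to_int(To), with both endpoints included. The module name range_demo and the helper range_chars/2 are hypothetical. Under that assumption, for a non-empty range, range(From, To) should match the same single characters as any(range_chars(From, To)).</div>
<pre>
:- module range_demo.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module char.
:- import_module list.
:- import_module string.

    % range_chars(From, To) returns, in order, every character whose code
    % lies in the inclusive interval [to_int(From), to_int(To)].
    % It returns "" when From comes after To.
    % (Hypothetical helper for illustration; it is not part of lex.)
    %
:- func range_chars(char, char) = string.

range_chars(From, To) = Str :-
    % list.(..) includes both bounds and yields [] when From > To.
    Codes = char.to_int(From) .. char.to_int(To),
    Chars = list.map((func(Code) = char.det_from_int(Code)), Codes),
    Str = string.from_char_list(Chars).

main(!IO) :-
    io.write_string(range_chars('a', 'f'), !IO),    % prints abcdef
    io.nl(!IO),
    io.write_string(range_chars('0', '9'), !IO),    % prints 0123456789
    io.nl(!IO).
</pre>
<div>Because list.(..) includes both bounds, the last character of the range (here 'f') cannot be lost through an off-by-one error in hand-written recursion.</div>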
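<div><br></div><div>extras/lex/lex.m:</div><div> Add func range(char, char) = regexp.</div><div> Redefine nl so that it matches both "\n" and "\r\n", and move it to the compound regexps section.</div><div><br></div><div>extras/lex/samples/lex_demo.m:</div><div> Add a word recognizer just before the "junk" lexeme.<br>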
</div><div><br></div><div>I hope you find this useful.</div><div>This is my first contribution. If the changes are approved in some form,</div><div>I'd be happy to extend the basic lexer to make it more expressive and powerful.</div>
<div>One idea is range/3: range(From, To, Exclude), where Exclude has type set(char).</div><div><br></div><div>Sebastian Godelet.</div><div><br>diff --git a/extras/lex/lex.m b/extras/lex/lex.m<br> index c6c7930..7601f5a 100644<br>--- a/extras/lex/lex.m<br>
+++ b/extras/lex/lex.m<br>@@ -107,6 +107,7 @@<br> :- func anybut(string) = regexp. % anybut("abc") is complement of any("abc")<br> :- func ?(T) = regexp <= regexp(T). % ?(R) = R or null<br>
:- func +(T) = regexp <= regexp(T). % +(R) = R ++ *(R)<br>+:- func range(char, char) = regexp. % range('a', 'z') = any("ab...xyz")<br> <br> % Some useful single-char regexps.<br>
%<br> @@ -746,6 +747,30 @@ str_foldr(Fn, S, X, I) =<br> <br> +(R) = (R ++ *(R)).<br> <br>+range(S, E) = R :-<br>+ char.to_int(S, Si),<br>+ char.to_int(E, Ei),<br>+ ( if Si < Ei then<br>+ R = build_range(Si + 1, Ei, re(S))<br>
+ else if Si = Ei then<br>+ R = re(S)<br>+ else<br>+ R = null<br>+ ).<br>+<br>+:- func build_range(int, int, regexp) = regexp.<br>+<br>+build_range(S, E, R0) = R :-<br>+ ( if S =< E then<br>
+ char.det_from_int(S, C),<br>+ R1 = (R0 or re(C)),<br>+ R = build_range(S + 1, E, R1)<br>+ else if S = E + 1 then<br>+ R = R0<br>+ else<br>+ throw(exception.software_error("invalid range!"))<br>
+ ).<br>+<br> %-----------------------------------------------------------------------------%<br> % Some useful single-char regexps.<br> <br>@@ -768,13 +793,13 @@ alpha = (lower or upper).<br> alphanum = (alpha or digit).<br>
identstart = (alpha or ('_')).<br> ident = (alphanum or ('_')).<br>-nl = re('\n').<br> tab = re('\t').<br> spc = re(' ').<br> <br> %-----------------------------------------------------------------------------%<br>
% Some useful compound regexps.<br> <br>+nl = (?('\r') ++ '\n'). % matches both POSIX and Windows newlines.<br> nat = +(digit).<br> signed_int = ?("+" or "-") ++ nat.<br>
real = signed_int ++ (<br>diff --git a/extras/lex/samples/lex_demo.m b/extras/lex/samples/lex_demo.m<br>index 6d30ac2..68aef0d 100644<br>--- a/extras/lex/samples/lex_demo.m<br>+++ b/extras/lex/samples/lex_demo.m</div>
</div>