[m-rev.] add character ranges to extras/lex
Sebastian Godelet
sebastian.godelet+github at gmail.com
Sun Feb 16 06:30:32 AEDT 2014
For review by anyone.
To make lexemes easier to define, add a new function range/2 that works as a
simple character class, like [a-f] in Perl regular expressions.
For example: range('a', 'f') = any("abcdef").
extras/lex/lex.m:
    Add func range(char, char) = regexp.
    Make nl a compound regexp that also matches Windows newlines ("\r\n").
extras/lex/samples/lex_demo.m:
    Add a word recognizer just before the "junk" lexeme.
I hope you find this useful.
If my changes are approved in some form (this is my first contribution),
I'd happily extend the basic lexer to make it more expressive and powerful.
I was thinking of a range/3: range(From, To, Exclude), where Exclude has type set(char).
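To make the range/3 idea concrete, here is a minimal stand-alone sketch of the character set it would describe. It is not part of the patch: the module name range3_sketch and the helpers chars_in_range/3 and collect/3 are hypothetical, and the sketch assumes that both ends of the range are included and that a reversed range is simply empty.

```mercury
:- module range3_sketch.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module char.
:- import_module int.
:- import_module list.
:- import_module set.
:- import_module string.

    % chars_in_range(From, To, Exclude) returns the characters From..To
    % (both ends included) that are not members of Exclude.
    % The result is empty if From comes after To.
    % (Hypothetical helper, for illustration only.)
    %
:- func chars_in_range(char, char, set(char)) = list(char).

chars_in_range(From, To, Exclude) =
    collect(char.to_int(From), char.to_int(To), Exclude).

:- func collect(int, int, set(char)) = list(char).

collect(Code, Last, Exclude) = Chars :-
    ( if Code > Last then
        Chars = []
    else
        Rest = collect(Code + 1, Last, Exclude),
        ( if
            % Skip codes that are not valid characters or are excluded.
            char.from_int(Code, Char),
            not set.member(Char, Exclude)
        then
            Chars = [Char | Rest]
        else
            Chars = Rest
        )
    ).

main(!IO) :-
    Exclude = set.from_list(['c', 'e']),
    Chars = chars_in_range('a', 'f', Exclude),
    io.write_string(string.from_char_list(Chars), !IO),
    io.nl(!IO).
```

With 'c' and 'e' excluded from 'a'..'f', the program should print abdf. Inside lex.m, one possible definition would be range(From, To, Exclude) = any(string.from_char_list(chars_in_range(From, To, Exclude))), reusing the existing any/1; passing an empty set would then describe the same characters as range/2.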
Sebastian Godelet.
diff --git a/extras/lex/lex.m b/extras/lex/lex.m
index c6c7930..7601f5a 100644
--- a/extras/lex/lex.m
+++ b/extras/lex/lex.m
@@ -107,6 +107,7 @@
:- func anybut(string) = regexp. % anybut("abc") is complement of any("abc")
:- func ?(T) = regexp <= regexp(T). % ?(R) = R or null
:- func +(T) = regexp <= regexp(T). % +(R) = R ++ *(R)
+:- func range(char, char) = regexp. % range('a', 'z') = any("ab...xyz")
% Some useful single-char regexps.
%
@@ -746,6 +747,30 @@ str_foldr(Fn, S, X, I) =
+(R) = (R ++ *(R)).
+range(S, E) = R :-
+    char.to_int(S, Si),
+    char.to_int(E, Ei),
+    ( if Si < Ei then
+        R = build_range(Si + 1, Ei, re(S))
+    else if Si = Ei then
+        R = re(S)
+    else
+        R = null
+    ).
+
+:- func build_range(int, int, regexp) = regexp.
+
+build_range(S, E, R0) = R :-
+    ( if S < E then
+        char.det_from_int(S, C),
+        R1 = (R0 or re(C)),
+        R = build_range(S + 1, E, R1)
+    else if S = E then
+        R = (R0 or re(char.det_from_int(S)))  % include the end character
+    else
+        throw(exception.software_error("invalid range!"))
+    ).
+
+
%-----------------------------------------------------------------------------%
% Some useful single-char regexps.
@@ -768,13 +793,13 @@ alpha = (lower or upper).
alphanum = (alpha or digit).
identstart = (alpha or ('_')).
ident = (alphanum or ('_')).
-nl = re('\n').
tab = re('\t').
spc = re(' ').
%-----------------------------------------------------------------------------%
% Some useful compound regexps.
+nl = (?('\r') ++ '\n'). % Matches both POSIX and Windows newlines.
nat = +(digit).
signed_int = ?("+" or "-") ++ nat.
real = signed_int ++ (
diff --git a/extras/lex/samples/lex_demo.m b/extras/lex/samples/lex_demo.m
index 6d30ac2..68aef0d 100644
--- a/extras/lex/samples/lex_demo.m
+++ b/extras/lex/samples/lex_demo.m