[m-rev.] for review: Add times operator for liblex
Sebastian Godelet
sebastian.godelet+github at gmail.com
Mon May 5 00:39:23 AEST 2014
For review by anyone.
Branch: master.
---
Added a times operator for regular expressions,
so that a fixed repetition such as /[a-z]{10}/
can be written as: letter * 10.
extras/lex/lex.m:
Removed unused and unsafe str_foldr function,
added (T * int) = regexp function.
extras/lex/samples/lex_demo.m:
Removed whitespace in comments,
added an input prompt,
added a lexeme for '//' using the new '*' operator.
---
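Note for reviewers: the intended reading of R * N can be sketched without
liblex. The module below is a minimal, self-contained illustration under
stated assumptions, not the code in lex.m: the rx type, times/2 and show/1
are hypothetical stand-ins invented for this note (rx for regexp, times for
the new '*' operator). It follows the same case analysis as the patch:
a negative N is an error, 0 gives the empty regexp, and otherwise R is
concatenated N times.

```mercury
:- module times_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, require, string.

    % Hypothetical stand-in for the liblex regexp type.
:- type rx
    --->    eps             % matches the empty string
    ;       lit(string)     % matches a literal string
    ;       conc(rx, rx).   % concatenation

    % times(R, N) = R ++ R ++ .. ++ R (N copies); hypothetical name.
:- func times(rx, int) = rx.

times(R, N) = Result :-
    ( if N < 0 then
        error("times: N must be a non-negative number")
    else if N = 0 then
        Result = eps
    else if N = 1 then
        Result = R
    else
        Result = conc(R, times(R, N - 1))
    ).

    % Render the only string this simplified regexp can match.
:- func show(rx) = string.

show(eps) = "".
show(lit(S)) = S.
show(conc(A, B)) = show(A) ++ show(B).

main(!IO) :-
    % Two copies of "/" give the C++-style comment prefix "//".
    io.write_string(show(times(lit("/"), 2)), !IO),
    io.nl(!IO).
```

In the patch itself the same idea appears as '/' * 2 ++ junk in
lex_demo.m, where '*' binds tighter than '++'.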
extras/lex/lex.m | 15 ++++++++-------
extras/lex/samples/lex_demo.m | 11 +++++++----
2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/extras/lex/lex.m b/extras/lex/lex.m
index 7570e32..2c47d91 100644
--- a/extras/lex/lex.m
+++ b/extras/lex/lex.m
@@ -117,6 +117,7 @@
:- func ?(T) = regexp <= regexp(T). % ?(R) = R or null
:- func +(T) = regexp <= regexp(T). % +(R) = R ++ *(R)
:- func range(char, char) = regexp. % range('a', 'z') = any("ab...xyz")
+:- func (T * int) = regexp <= regexp(T). % R * N = R ++ R ++ .. ++ R
% Some useful single-char regexps.
%
@@ -837,19 +838,19 @@ anybut(S) = R :-
ExcludedChars = sparse_bitset.list_to_set(string.to_char_list(S)),
R = re(sparse_bitset.difference(valid_unicode_chars,
ExcludedChars)).
-:- func str_foldr(func(char, T) = T, string, T, int) = T.
-
-str_foldr(Fn, S, X, I) =
- ( if I < 0 then X
- else str_foldr(Fn, S, Fn(string.det_index(S, I), X), I - 1)
- ).
-
?(R) = (R or null).
+(R) = (R ++ *(R)).
range(Start, End) = re(charset(char.to_int(Start), char.to_int(End))).
+R * N = Result :-
+ ( N < 0 -> unexpected($file, $pred, "N must be a non-negative number")
+ ; N = 0 -> Result = null
+ ; N = 1 -> Result = re(R)
+ ; Result = conc(re(R), (R * (N - 1)))
+ ).
+
%-----------------------------------------------------------------------------%
% Some useful single-char regexps.
diff --git a/extras/lex/samples/lex_demo.m b/extras/lex/samples/lex_demo.m
index 80a50d0..5cc9a26 100644
--- a/extras/lex/samples/lex_demo.m
+++ b/extras/lex/samples/lex_demo.m
@@ -6,7 +6,7 @@
%
% Copyright (C) 2001-2002 The University of Melbourne
% Copyright (C) 2001 The Rationalizer Intelligent Software AG
-% The changes made by Rationalizer are contributed under the terms
+% The changes made by Rationalizer are contributed under the terms
% of the GNU General Public License - see the file COPYING in the
% Mercury Distribution.
%
@@ -46,8 +46,8 @@ I recognise the following words:
""and"", ""then"", ""the"", ""it"", ""them"", ""to"", ""on"".
I also recognise Unicode characters:
""我"", ""会"", ""说"", ""中文""
-I also recognise Mercury-style comments, integers and floating point
-numbers, and a variety of punctuation symbols.
+I also recognise Mercury-style and C++-style comments, integers
+and floating point numbers, and a variety of punctuation symbols.
Try me...
@@ -64,6 +64,7 @@ Try me...
is det.
tokenise_stdin(!LS) :-
+ lex.manipulate_source(io.print("> "), !LS),
lex.read(Result, !LS),
lex.manipulate_source(io.print(Result), !LS),
lex.manipulate_source(io.nl, !LS),
@@ -107,6 +108,8 @@ lexemes = [
( "rat" -> (func(Match) = noun(Match)) ),
( "mat" -> (func(Match) = noun(Match)) ),
+ ( '/' * 2 ++ junk -> (func(Match) = comment(Match)) ),
+
% Here we use `or', rather than multiple lexemes.
%
( "sat" or
@@ -117,7 +120,7 @@ lexemes = [
"then" -> (func(Match) = conj(Match)) ),
% `\/' is a synonym for `or'. Tell us which you prefer...
- %
+ %
( "the" \/
"it" \/
"them" \/
--
1.9.0