[m-rev.] New regex module in extras/lex

Ralph Becket rafe at cs.mu.OZ.AU
Thu Nov 28 14:10:40 AEDT 2002


Fergus Henderson, Friday, 22 November 2002:
> 
> The new module should be documented in the README file
> (and removed from the "FEATURES TO ADD" list there).

Done.

> The copyright notice and the "lex version 1.0 (very alpha)" comment
> in the README file should be updated.

Done.

> There should be a comment or two at the start of the regex.m file giving
> a brief overview.

Done.

> 
> Also, the copyright notices in extras/lex/* should be changed from
> 
> %   THIS FILE IS HEREBY CONTRIBUTED TO THE MERCURY PROJECT TO
> %   BE RELEASED UNDER WHATEVER LICENCE IS DEEMED APPROPRIATE
> %   BY THE ADMINISTRATORS OF THE MERCURY PROJECT.
> 
> to an actual license.  I recommend licensing these files under the LGPL.

Done.

> > extras/lex/lex.lexeme.m:
> > 	Removed the parameter on inst compiled_lexeme.
> 
> Why?

It wasn't necessary.

> These bug fixes (and the license fix) should be committed to the release
> branch.  As for the other changes, well, since this package is in the
> extras distribution, I guess it is a low risk change.  So I would not
> object to them being included in the release branch if they are ready
> in time.  But they are certainly not release-critical, so I will not
> be holding up the release branch for them.

I'd like this to go in the release.  It has been fairly heavily tested
over the last few days.

> > extras/lex/test_regex.m:
> > 	A little test harness for regex.m
> 
> The Mmakefile should have a "check" target which runs the test
> and compares it with the expected output.

Done.
> 
> It would be nicer to put that in a "tests" subdir, IMHO.

Done.
> 
> I have not reviewed the new regex.m file and the changes to lex.lex.m
> and lex.lexeme.m in detail.

Here's the updated changelog and interdiff:

Estimated hours taken: 50
Branches: main

Added a new module, regex, as a companion to lex.  The new module provides
functions for converting conventional Unix-style regular expression strings
into regexps for use with lex, together with a number of string search and
search-and-replace functions.

The new functionality has been tested fairly thoroughly (and has led to
several bugs in lex being identified and fixed).
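
For a rough idea of how the new module is used, here's a minimal sketch
along the lines of the test harness and regex_demo.m in the interdiff
below (just an illustration -- the module and predicate names here are
made up; see the diff for the definitive usage):

:- module regex_sketch.

:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module string, list.
:- import_module lex, regex.

main(!IO) :-
        % regex/1 converts a Unix-style regular expression string into
        % a regex value for use with the search functions below.
    demo(regex("ab|ad"), "abracadabra", !IO).

:- pred demo(regex, string, io, io).
:- mode demo(in(regex), in, di, uo) is det.

demo(R, S, !IO) :-
        % All matches, as {Substring, Start, Count} triples.
    M = matches(R, S),
    io__print(M, !IO),
    io__nl(!IO),
    ( if M \= [] then
            % Replace every match with a literal string...
        io__format("%s\n", [s(replace_all(R, "<>", S))], !IO),
            % ...or transform each match with an arbitrary function.
        ChgFn = (func(Str) = append_list(["<", Str, ">"])),
        io__format("%s\n", [s(change_all(R, ChgFn, S))], !IO)
      else
        true
    ).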

NEWS:
	Reported new additions.

extras/lex/README:
	Now just points the reader to README.lex and README.regex.

extras/lex/README.lex:
extras/lex/README.regex:
	Added.  Brief introductions to the two libraries.

extras/lex/lex.automata.m:
extras/lex/lex.buf.m:
extras/lex/lex.convert_NFA_to_DFA.m:
extras/lex/lex.regexp.m:
	Trivial formatting changes.

extras/lex/lex.lexeme.m:
	Removed the parameter on inst compiled_lexeme.

extras/lex/lex.m:
	Various formatting changes.

	Added pred offset_from_start/3, which can be used to identify
	the `current' point in the input stream with respect to lexing.

	Added pred read_char/3, which can be used to read the `next'
	char from the input stream without doing any lexing.  (A short
	usage sketch of these follows at the end of this log.)

	Added a field init_winner_func to the lexer_instance type.  This
	is used to fix a bug whereby regular expressions that match the
	empty string were not being recognised at the start of the input
	stream.

	Fixed some bugs whereby an exception was incorrectly thrown in
	some circumstances when the end of the input stream was reached.

extras/lex/regex.m:
	Added.  This file defines the functions for converting Unix-style
	regular expression strings into regexps for use with lex and into
	regexes for use with the string search(-and-replace) predicates
	defined in this module.

extras/lex/Mmakefile:
	Improved the installation instructions and included a check target.

extras/lex/tests:
extras/lex/tests/Mmakefile:
extras/lex/tests/test_regex:
extras/lex/tests/test_regex.in:
extras/lex/tests/test_regex.exp:
	Added a test suite.

extras/lex/samples/demo.m:
	Moved to lex_demo.m
extras/lex/samples/lex_demo.m:
	Was demo.m; slightly changed to include a match for unexpected
	characters.

extras/lex/samples/regex_demo.m:
	Added.

extras/lex/samples/Mmakefile:
	Updated.
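
To show how the new lex predicates hang together (and the caveat on
read/3 about empty-string matches), here's a rough sketch of a
token-collecting loop in the style of matches_2 in regex.m.  Again,
this is only an illustration (the module name and all_tokens are made
up); the definitive code is in the interdiff below.

:- module lex_tokens_sketch.

:- interface.

:- import_module lex, list.

    % Collect all the tokens in the input, one read/3 call at a time.
    %
:- func all_tokens(lexer_state(Tok, Src)) = list(Tok).
:- mode all_tokens(di) = out is det.

:- implementation.

:- import_module io.

all_tokens(State0) = Toks :-
    lex__offset_from_start(Start, State0, State1),
    lex__read(Result, State1, State2),
    (
        Result = eof,
        Toks   = []
    ;
        Result = error(_Msg, _Offset),
        Toks   = all_tokens(State2)
    ;
        Result = ok(Tok),
        lex__offset_from_start(End, State2, State3),
        Toks   =
            [ Tok |
              ( if End = Start then
                    % The token matched the empty string, so we must
                    % consume one char with read_char/3 before calling
                    % read/3 again (otherwise we would get the same
                    % empty match forever).
                  ( if lex__read_char(ok(_), State3, State4) then
                      all_tokens(State4)
                    else
                      []
                  )
                else
                  all_tokens(State3)
              )
            ]
    ).

The End = Start test is how an empty match is detected; read_char/3
then forces one char of progress so the loop cannot keep returning the
same empty match.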

Index: NEWS
===================================================================
RCS file: /home/mercury1/repository/mercury/NEWS,v
retrieving revision 1.279
diff -u -r1.279 NEWS
--- NEWS	4 Nov 2002 02:14:24 -0000	1.279
+++ NEWS	28 Nov 2002 03:08:42 -0000
@@ -35,6 +35,8 @@
 Numerous minor improvements to the Mercury standard library.
 
 A new testing tool in the extras distribution.
+A new regex module for string matching and search-and-replace in the
+extras distribution.
 
 DETAILED LISTING
 ================
@@ -302,6 +304,10 @@
   distinct random numbers that can be generated.
 
 Changes to the extras distribution:
+
+* The lex subdirectory now contains a new module, regex, which provides
+  for more traditional string-based ways of defining regular expressions
+  and provides string matching and search-and-replace functionality.
 
 * There's a new testing tool called "quickcheck", which is similar to
   Haskell's "QuickCheck".  See quickcheck/tutes/index.html.

diff -u lex.m lex.m
--- lex.m	22 Nov 2002 04:59:27 -0000
+++ lex.m	28 Nov 2002 02:47:24 -0000
@@ -181,6 +181,19 @@
 :- func start(lexer(Tok, Src), Src) = lexer_state(Tok, Src).
 :- mode start(in(lexer), di) = uo is det.
 
+    % Read the next token from the input stream.
+    %
+    % CAVEAT: if the token returned happened to match the empty
+    % string then you must use read_char/3 (below) to consume
+    % the next char in the input stream before calling read/3
+    % again, since matching the empty string does not consume
+    % any chars from the input stream and will otherwise mean
+    % you simply get the same match ad infinitum.
+    %
+    % An alternative solution is to always include a "catch all"
+    % lexeme that matches any unexpected char at the end of the
+    % list of lexemes.
+    %
 :- pred read(io__read_result(Tok),
             lexer_state(Tok, Src), lexer_state(Tok, Src)).
 :- mode read(out, di, uo) is det.
@@ -312,11 +325,12 @@
 init_lexer_instance(Lexer, Instance, Buf) :-
     buf__init(Lexer ^ lex_buf_read_pred, BufState, Buf),
     Start          = BufState ^ start_offset,
-    InitLexemes    = Lexer ^ lex_compiled_lexemes,
     InitWinnerFunc = initial_winner_func(InitLexemes),
+    InitLexemes    = Lexer ^ lex_compiled_lexemes,
+    InitWinner     = InitWinnerFunc(Start),
     IgnorePred     = Lexer ^ lex_ignore_pred,
     Instance       = lexer_instance(InitLexemes, InitWinnerFunc, InitLexemes,
-                            InitWinnerFunc(Start), BufState, IgnorePred).
+                           InitWinner, BufState, IgnorePred).
 
 %-----------------------------------------------------------------------------%
 
@@ -351,13 +365,14 @@
 %-----------------------------------------------------------------------------%
 
 read(Result, State0, State) :-
+
     lexer_state_args(State0, Instance0, Buf0, Src0),
-    BufState       = Instance0 ^ buf_state,
-    Start          = BufState ^ start_offset,
-    InitWinnerFunc = Instance0 ^ init_winner_func,
-    Instance1      = ( Instance0 ^ current_winner := InitWinnerFunc(Start) ),
+    BufState0  = Instance0 ^ buf_state,
+    Start      = BufState0 ^ start_offset,
+    InitWinner = ( Instance0 ^ init_winner_func )(Start),
+    Instance1  = ( Instance0 ^ current_winner := InitWinner ),
     read_2(Result, Instance1, Instance, Buf0, Buf, Src0, Src),
-    State          = args_lexer_state(Instance, Buf, Src).
+    State      = args_lexer_state(Instance, Buf, Src).
 
 
 
@@ -373,19 +388,15 @@
     %
 read_2(Result, !Instance, !Buf, !Src) :-
 
-    some [!BufState] (
-
-        !:BufState = !.Instance ^ buf_state,
-
-        buf__read(BufReadResult, !BufState, !Buf, !Src),
-        (
-            BufReadResult = ok(Char),
-            process_char(Result, Char, !Instance, !.BufState, !Buf, !Src)
-        ;
-            BufReadResult = eof,
-            process_eof(Result, !Instance, !.BufState, !.Buf)
-        )
+    BufState0 = !.Instance ^ buf_state,
 
+    buf__read(BufReadResult, BufState0, BufState, !Buf, !Src),
+    (
+        BufReadResult = ok(Char),
+        process_char(Result, Char, !Instance, BufState, !Buf, !Src)
+    ;
+        BufReadResult = eof,
+        process_eof(Result, !Instance, BufState, !.Buf)
     ).
 
 %-----------------------------------------------------------------------------%
@@ -425,23 +436,44 @@
             in(lexer_instance), out(lexer_instance),
             in(buf_state), array_di, array_uo, di, uo) is det.
 
-process_any_winner(Result, yes(TokenCreator - Offset), !Instance,
-        BufState0, !Buf, !Src) :-
-
-    BufState   = rewind_cursor(Offset, BufState0),
-    IgnorePred = !.Instance ^ ignore_pred,
-
-    InitWinnerFunc = !.Instance ^ init_winner_func,
-    !:Instance = ((( !.Instance ^ live_lexemes   := !.Instance ^ init_lexemes )
-                                ^ current_winner := InitWinnerFunc(Offset)    )
-                                ^ buf_state      := commit(BufState)          ),
+process_any_winner(Result, yes(TokenCreator - Offset), Instance0, Instance,
+        BufState0, Buf0, Buf, Src0, Src) :-
 
-    ( if
-        get_token_from_buffer(BufState, !.Buf, TokenCreator, IgnorePred, Token)
-      then
-        Result = ok(Token)
+    BufState1  = rewind_cursor(Offset, BufState0),
+    String     = string_to_cursor(BufState1, Buf0),
+    Token      = TokenCreator(String),
+    IgnorePred = Instance0 ^ ignore_pred,
+    InitWinner = ( Instance0 ^ init_winner_func )(Offset),
+    Instance1  = ((( Instance0
+                       ^ live_lexemes       := Instance0 ^ init_lexemes )
+                       ^ current_winner     := InitWinner               )
+                       ^ buf_state          := commit(BufState1)        ),
+
+    ( if IgnorePred(Token) then
+
+            % We have to be careful to avoid an infinite loop here.
+            % If the longest match was the empty string, then the
+            % next char in the input stream cannot start a match,
+            % so it must be reported as an error.
+            %
+        ( if String = "" then
+            buf__read(BufResult, BufState1, BufState, Buf0, Buf, Src0, Src),
+            (
+                BufResult = ok(_),
+                Result    = error("input not matched by any regexp", Offset)
+            ;
+                BufResult = eof,
+                Result    = eof
+            ),
+            Instance = ( Instance1 ^ buf_state := commit(BufState) )
+          else
+            read_2(Result, Instance1, Instance, Buf0, Buf, Src0, Src)
+        )
       else
-        read_2(Result, !Instance, !Buf, !Src)
+        Result   = ok(Token),
+        Instance = Instance1,
+        Buf      = Buf0,
+        Src      = Src0
     ).
 
 process_any_winner(Result, no, !Instance,
@@ -451,9 +483,9 @@
     BufState   = rewind_cursor(Start + 1, BufState0),
     Result     = error("input not matched by any regexp", Start),
 
-    InitWinnerFunc = !.Instance ^ init_winner_func,
+    InitWinner = ( !.Instance ^ init_winner_func )(Start),
     !:Instance = ((( !.Instance ^ live_lexemes   := !.Instance ^ init_lexemes )
-                                ^ current_winner := InitWinnerFunc(Start)     )
+                                ^ current_winner := InitWinner                )
                                 ^ buf_state      := commit(BufState)          ).
 
 %-----------------------------------------------------------------------------%
@@ -464,40 +496,24 @@
 :- mode process_eof(out, in(lexer_instance), out(lexer_instance),
             in(buf_state), array_ui) is det.
 
-process_eof(Result, Instance0, Instance, BufState0, Buf) :-
-
-    ( if
-        Instance0 ^ current_winner = yes(TokenCreator - Offset0),
-
-        IgnorePred = Instance0 ^ ignore_pred,
-        BufState1  = rewind_cursor(Offset0, BufState0),
+process_eof(Result, !Instance, !.BufState, !.Buf) :-
 
-        get_token_from_buffer(BufState1, Buf, TokenCreator, IgnorePred, Token)
-      then
-        Offset     = Offset0,
-        BufState   = BufState1,
-        Result     = ok(Token)
-      else
-        Offset     = BufState0 ^ start_offset,
-        BufState   = BufState0,
-        Result     = eof
+    CurrentWinner = !.Instance ^ current_winner,
+    (
+        CurrentWinner = no,
+        Offset        = !.BufState ^ cursor_offset,
+        Result        = eof
+    ;
+        CurrentWinner = yes(TokenCreator - Offset),
+        String        = string_to_cursor(!.BufState, !.Buf),
+        Token         = TokenCreator(String),
+        IgnorePred    = !.Instance ^ ignore_pred,
+        Result        = ( if IgnorePred(Token) then eof else ok(Token) )
     ),
-
-    InitWinnerFunc = Instance0 ^ init_winner_func,
-    Instance   = ((( Instance0 ^ live_lexemes   := Instance0 ^ init_lexemes )
-                               ^ current_winner := InitWinnerFunc(Offset)   )
-                               ^ buf_state      := commit(BufState)         ).
-
-%-----------------------------------------------------------------------------%
-
-:- pred get_token_from_buffer(buf_state(Src), buf,
-            token_creator(Tok), ignore_pred(Tok), Tok).
-:- mode get_token_from_buffer(in(buf_state), array_ui,
-            in(token_creator), in(ignore_pred), out) is semidet.
-
-get_token_from_buffer(BufState, Buf, TokenCreator, IgnorePred, Token) :-
-    Token = TokenCreator(string_to_cursor(BufState, Buf)),
-    not IgnorePred(Token).
+    InitWinner = ( !.Instance ^ init_winner_func )(Offset),
+    !:Instance = ((( !.Instance ^ live_lexemes   := !.Instance ^ init_lexemes )
+                                ^ current_winner := InitWinner                )
+                                ^ buf_state      := commit(!.BufState)        ).
 
 %-----------------------------------------------------------------------------%
 
@@ -596,17 +612,13 @@
 
 read_char(Result, !State) :-
 
-    some [!Instance, !Buf, !Src, !BufState] (
+    lexer_state_args(!.State, Instance0, Buf0, Src0),
 
-        lexer_state_args(!.State, !:Instance, !:Buf, !:Src),
+    BufState0 = Instance0 ^ buf_state,
+    buf__read(Result, BufState0, BufState, Buf0, Buf, Src0, Src),
+    Instance  = ( Instance0 ^ buf_state := commit(BufState) ),
 
-        !:BufState = !.Instance ^ buf_state,
-        buf__read(Result, !BufState, !Buf, !Src),
-        !:Instance = !.Instance ^ buf_state := commit(!.BufState),
-
-        !:State = args_lexer_state(!.Instance, !.Buf, !.Src)
-
-    ).
+    !:State = args_lexer_state(Instance, Buf, Src).
 
 %-----------------------------------------------------------------------------%
 
diff -u regex.m regex.m
--- regex.m	22 Nov 2002 05:36:20 -0000
+++ regex.m	28 Nov 2002 02:52:50 -0000
@@ -1,9 +1,13 @@
 %-----------------------------------------------------------------------------%
 % regex.m
 % Ralph Becket <rafe at cs.mu.oz.au>
-% Tue Nov 19 13:01:52 EST 2002
+% Copyright (C) 2002 The University of Melbourne
 % vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
 %
+% This module provides basic string matching and search and replace
+% functionality using regular expressions defined as strings of the
+% form recognised by tools such as sed and grep.
+%
 % TODO
 % - Add <regex>{n[,m]} regexps.
 % - Add character classes (e.g. [:space:]) to sets.
@@ -363,23 +367,26 @@
 :- func rpar(string, list(re)) = list(re).
 
 rpar(S, REs) =
-    (      if REs = [alt, lpar               | REs0]
-      then    [nil                           | REs0]
+    (      if REs = [chars(RE)              | REs0]
+      then    rpar(S, [re(RE)               | REs0])
+
+      else if REs = [alt, lpar              | REs0]
+      then    [nil                          | REs0]
 
-      else if REs = [RE_A, alt, lpar         | REs0]
-      then    [alt(nil, RE_A)                | REs0]
+      else if REs = [RE_A, alt, lpar        | REs0]
+      then    [alt(nil, RE_A)               | REs0]
 
-      else if REs = [RE_A, alt, RE_B         | REs0]
-      then    rpar(S, [alt(RE_B, RE_A)       | REs0])
+      else if REs = [RE_A, alt, RE_B        | REs0]
+      then    rpar(S, [alt(RE_B, RE_A)      | REs0])
 
-      else if REs = [lpar                    | REs0]
-      then    [nil                           | REs0]
+      else if REs = [lpar                   | REs0]
+      then    [nil                          | REs0]
 
-      else if REs = [RE, lpar                | REs0]
-      then    [RE                            | REs0]
+      else if REs = [RE, lpar               | REs0]
+      then    [RE                           | REs0]
 
-      else if REs = [RE_A, RE_B              | REs0]
-      then    rpar(S, [concat(RE_B, RE_A)    | REs0])
+      else if REs = [RE_A, RE_B             | REs0]
+      then    rpar(S, [concat(RE_B, RE_A)   | REs0])
 
       else    regex_error("`)' without opening `('", S)
     ).
@@ -436,22 +443,21 @@
     % We have to keep trying successive suffixes of String until
     % we find a complete match.
     %
-right_match(Regex, String, Substring, Start, Count) :-
-    right_match_2(Regex, String, 0, length(String), Substring, Start, Count).
+right_match(Regex, String, Substring, Start, length(Substring)) :-
+    right_match_2(Regex, String, 0, length(String), Substring, Start).
 
 
-:- pred right_match_2(regex,     string, int, int, string, int, int).
-:- mode right_match_2(in(regex), in,     in,  in,  out,    out, out) is semidet.
+:- pred right_match_2(regex,     string, int, int, string, int).
+:- mode right_match_2(in(regex), in,     in,  in,  out,    out) is semidet.
 
-right_match_2(Regex, String, I, Length, Substring, Start, Count) :-
+right_match_2(Regex, String, I, Length, Substring, Start) :-
     I =< Length,
-    Substring0 = substring(String, I, Length),
+    Substring0 = substring(String, I, max_int),
     ( if exact_match(Regex, Substring0) then
         Substring = Substring0,
-        Start     = I,
-        Count     = Length
+        Start     = I
       else
-        right_match_2(Regex, String, I + 1, Length - 1, Substring, Start, Count)
+        right_match_2(Regex, String, I + 1, Length, Substring, Start)
     ).
 
 %-----------------------------------------------------------------------------%
@@ -465,7 +471,7 @@
 :- mode first_match_2(out,    out, di         ) is semidet.
 
 first_match_2(Substring, Start, !.State) :-
-    offset_from_start(Start0, !State),
+    lex__offset_from_start(Start0, !State),
     lex__read(Result,         !State),
     (
         Result = error(_, _),
@@ -485,39 +491,47 @@
 
 matches(Regex, String) = Matches :-
     State   = start(Regex, unsafe_promise_unique(String)),
-    Matches = matches_2(State).
+    Matches = matches_2(length(String), State).
 
 
-:- func matches_2(lexer_state) = list({string, int, int}).
-:- mode matches_2(di)          = out is det.
+:- func matches_2(int, lexer_state) = list({string, int, int}).
+:- mode matches_2(in,  di)          = out is det.
 
-matches_2(State0) = Matches :-
-    offset_from_start(Start0, State0, State1),
+matches_2(Length, State0) = Matches :-
+    lex__offset_from_start(Start0, State0, State1),
     lex__read(Result, State1, State2),
     (
         Result  = eof,
         Matches = []
     ;
         Result  = error(_, _),
-        Matches = matches_2(State2)
+        Matches = matches_2(Length, State2)
     ;
         Result  = ok(Substring),
+        lex__offset_from_start(End, State2, State3),
         Start   = Start0,
-        Count   = length(Substring),
+        Count   = End - Start,
 
             % If we matched the empty string then we have to advance
-            % at least one char.  Finish if we get eof.
+            % at least one char (and finish if we get eof.)
+            %
+            % If we've reached the end of the input then also finish
+            % (this avoids the situation where, say, ".*" produces
+            % two matches for "foo" - "foo" and the notional null string
+            % at the end.)
             %
         Matches =
             [ {Substring, Start, Count} |
-              ( if Count = 0 then
-                  ( if lex__read_char(ok(_), State2, State3) then
-                      matches_2(State3)
+              ( if End = Length then
+                    []
+                else if Count = 0 then
+                  ( if lex__read_char(ok(_), State3, State4) then
+                      matches_2(Length, State4)
                     else
                       []
                   )
                 else
-                  matches_2(State2)
+                  matches_2(Length, State3)
               )
             ]
         ).
reverted:
--- test_regex.m	22 Nov 2002 05:44:14 -0000
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,94 +0,0 @@
-%-----------------------------------------------------------------------------%
-% test_regex.m
-% Ralph Becket <rafe at cs.mu.oz.au>
-% Thu Nov 21 15:33:48 EST 2002
-% vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
-%
-%-----------------------------------------------------------------------------%
-
-:- module test_regex.
-
-:- interface.
-
-:- import_module io.
-
-
-
-:- pred main(io::di, io::uo) is det.
-
-%-----------------------------------------------------------------------------%
-%-----------------------------------------------------------------------------%
-
-:- implementation.
-
-:- import_module int, string, list.
-:- import_module lex, regex.
-
-%-----------------------------------------------------------------------------%
-
-main(!IO) :-
-    S = "(xy?ab+[0-9]*)|[aeiouw-z]",
-    io__format("parsing against regex(\"%s\")\n", [s(S)], !IO),
-    loop(regex(S), !IO).
-
-:- pred loop(regex, io, io).
-:- mode loop(in(regex), di, uo) is det.
-
-loop(R, !IO) :-
-    io__format("\n> ", [], !IO),
-    io__read_line_as_string(Res, !IO),
-    (
-        Res = eof
-    ;
-        Res = error(_),
-        io__format("*** error: ", [], !IO),
-        io__print(Res, !IO),
-        io__format(" ***\n", [], !IO)
-    ;
-        Res = ok(S0),
-        S   = chomp(S0),
-        ( if M = matches(R, S), M \= [] then
-            io__format("all matches             : ", [], !IO),
-            io__print(matches(R, S), !IO),
-            io__nl(!IO),
-
-            io__format("replace_first with `<>' : \"%s\"\n",
-                [s(replace_first(R, "<>", S))], !IO),
-
-            io__format("replace_all with `<>'   : \"%s\"\n",
-                [s(replace_all(R, "<>", S))], !IO),
-
-            ChgFn = (func(Str) = append_list(["<", Str, ">"])),
-
-            io__format("change_first to `<&>'   : \"%s\"\n",
-                [s(change_first(R, ChgFn, S))], !IO),
-
-            io__format("change_all to `<&>'     : \"%s\"\n",
-                [s(change_all(R, ChgFn, S))], !IO)
-
-          else true
-        ),
-        ( if left_match(R, S, LSub, LS, LC) then
-            io__format("left match              : {\"%s\", %d, %d}\n",
-                    [s(LSub), i(LS), i(LC)], !IO)
-          else true
-        ),
-        ( if right_match(R, S, RSub, RS, RC) then
-            io__format("right match             : {\"%s\", %d, %d}\n",
-                    [s(RSub), i(RS), i(RC)], !IO)
-          else true
-        ),
-        ( if first_match(R, S, FSub, FS, FC) then
-            io__format("first match             : {\"%s\", %d, %d}\n",
-                    [s(FSub), i(FS), i(FC)], !IO)
-          else true
-        ),
-        loop(R, !IO)
-    ).
-
-:- func chomp(string) = string.
-
-chomp(S) = ( if string__remove_suffix(S, "\n", T) then T else S ).
-
-%-----------------------------------------------------------------------------%
-%-----------------------------------------------------------------------------%
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/test_regex.m	28 Nov 2002 02:38:37 -0000
@@ -0,0 +1,107 @@
+%-----------------------------------------------------------------------------%
+% test_regex.m
+% Ralph Becket <rafe at cs.mu.oz.au>
+% Thu Nov 21 15:33:48 EST 2002
+% vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
+%
+%-----------------------------------------------------------------------------%
+
+:- module test_regex.
+
+:- interface.
+
+:- import_module io.
+
+
+
+:- pred main(io::di, io::uo) is det.
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+
+:- implementation.
+
+:- import_module int, string, list, exception.
+:- import_module lex, regex.
+
+:- type op
+    --->    set_regex(string)
+    ;       try_match(string).
+
+%-----------------------------------------------------------------------------%
+
+main(!IO) :-
+    loop(regex("<don't go here>"), !IO).
+
+:- pred loop(regex, io, io).
+:- mode loop(in(regex), di, uo) is det.
+
+loop(R, !IO) :-
+    io__read(Res, !IO),
+    (
+        Res = eof
+    ;
+        Res = error(_, _),
+        throw(Res)
+    ;
+        Res = ok(Op),
+        (
+            Op = set_regex(S),
+            io__format("\n\n* Matching against \"%s\"\n", [s(S)], !IO),
+            loop(regex(S), !IO)
+        ;
+            Op = try_match(S),
+            io__format("\n> \"%s\"\n", [s(S)], !IO),
+            M = matches(R, S),
+
+            io__format("all matches             : ", [], !IO),
+            io__print(matches(R, S), !IO),
+            io__nl(!IO),
+
+            ( if M \= [] then
+
+                io__format("replace_first with `<>' : \"%s\"\n",
+                    [s(replace_first(R, "<>", S))], !IO),
+
+                io__format("replace_all with `<>'   : \"%s\"\n",
+                    [s(replace_all(R, "<>", S))], !IO),
+
+                ChgFn = (func(Str) = append_list(["<", Str, ">"])),
+
+                io__format("change_first to `<&>'   : \"%s\"\n",
+                    [s(change_first(R, ChgFn, S))], !IO),
+
+                io__format("change_all to `<&>'     : \"%s\"\n",
+                    [s(change_all(R, ChgFn, S))], !IO)
+
+              else true
+            ),
+            ( if exact_match(R, S) then
+                io__format("exact match\n", [], !IO)
+              else true
+            ),
+            ( if left_match(R, S, LSub, LS, LC) then
+                io__format("left match              : {\"%s\", %d, %d}\n",
+                        [s(LSub), i(LS), i(LC)], !IO)
+              else true
+            ),
+            ( if right_match(R, S, RSub, RS, RC) then
+                io__format("right match             : {\"%s\", %d, %d}\n",
+                        [s(RSub), i(RS), i(RC)], !IO)
+              else true
+            ),
+            ( if first_match(R, S, FSub, FS, FC) then
+                io__format("first match             : {\"%s\", %d, %d}\n",
+                        [s(FSub), i(FS), i(FC)], !IO)
+              else true
+            ),
+            loop(R, !IO)
+        )
+    ).
+
+:- func chomp(string) = string.
+
+chomp(S) = ( if string__remove_suffix(S, "\n", T) then T else S ).
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/test_regex.in	28 Nov 2002 02:14:06 -0000
@@ -0,0 +1,120 @@
+set_regex("a").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("ab").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("ab|ad").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("a*").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("aa*").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("a+").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("aa+").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("a?").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("aa?").
+try_match("aardvark").
+try_match("xaardvark").
+try_match("aardvarkx").
+try_match("foo").
+
+set_regex("(ab|ad)+").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("[abcd]").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("[ab-d]").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("[]]").
+try_match("]foo[").
+try_match("[foo]").
+try_match("foo").
+
+set_regex("[[-]]").
+try_match("]foo[").
+try_match("[foo]").
+try_match("foo").
+
+set_regex("\\[").
+try_match("]foo[").
+try_match("[foo]").
+try_match("foo").
+
+set_regex("[^abcd]").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("[^ab-d]").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex("[^]]").
+try_match("]foo[").
+try_match("[foo]").
+try_match("foo").
+
+set_regex("[^[-]]").
+try_match("]foo[").
+try_match("[foo]").
+try_match("foo").
+
+set_regex(".*").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
+
+set_regex(".").
+try_match("abracadabra").
+try_match("xabracadabra").
+try_match("abracadabrax").
+try_match("foo").
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/test_regex.exp	26 Nov 2002 08:45:41 -0000
@@ -0,0 +1,736 @@
+
+
+* Matching against "a"
+
+> "abracadabra"
+all matches             : [{"a", 0, 1}, {"a", 3, 1}, {"a", 5, 1}, {"a", 7, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabra"
+replace_all with `<>'   : "<>br<>c<>d<>br<>"
+change_first to `<&>'   : "<a>bracadabra"
+change_all to `<&>'     : "<a>br<a>c<a>d<a>br<a>"
+left match              : {"a", 0, 1}
+right match             : {"a", 10, 1}
+first match             : {"a", 0, 1}
+
+> "xabracadabra"
+all matches             : [{"a", 1, 1}, {"a", 4, 1}, {"a", 6, 1}, {"a", 8, 1}, {"a", 11, 1}]
+replace_first with `<>' : "x<>bracadabra"
+replace_all with `<>'   : "x<>br<>c<>d<>br<>"
+change_first to `<&>'   : "x<a>bracadabra"
+change_all to `<&>'     : "x<a>br<a>c<a>d<a>br<a>"
+right match             : {"a", 11, 1}
+first match             : {"a", 1, 1}
+
+> "abracadabrax"
+all matches             : [{"a", 0, 1}, {"a", 3, 1}, {"a", 5, 1}, {"a", 7, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabrax"
+replace_all with `<>'   : "<>br<>c<>d<>br<>x"
+change_first to `<&>'   : "<a>bracadabrax"
+change_all to `<&>'     : "<a>br<a>c<a>d<a>br<a>x"
+left match              : {"a", 0, 1}
+first match             : {"a", 0, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "ab"
+
+> "abracadabra"
+all matches             : [{"ab", 0, 2}, {"ab", 7, 2}]
+replace_first with `<>' : "<>racadabra"
+replace_all with `<>'   : "<>racad<>ra"
+change_first to `<&>'   : "<ab>racadabra"
+change_all to `<&>'     : "<ab>racad<ab>ra"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "xabracadabra"
+all matches             : [{"ab", 1, 2}, {"ab", 8, 2}]
+replace_first with `<>' : "x<>racadabra"
+replace_all with `<>'   : "x<>racad<>ra"
+change_first to `<&>'   : "x<ab>racadabra"
+change_all to `<&>'     : "x<ab>racad<ab>ra"
+first match             : {"ab", 1, 2}
+
+> "abracadabrax"
+all matches             : [{"ab", 0, 2}, {"ab", 7, 2}]
+replace_first with `<>' : "<>racadabrax"
+replace_all with `<>'   : "<>racad<>rax"
+change_first to `<&>'   : "<ab>racadabrax"
+change_all to `<&>'     : "<ab>racad<ab>rax"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "ab|ad"
+
+> "abracadabra"
+all matches             : [{"ab", 0, 2}, {"ad", 5, 2}, {"ab", 7, 2}]
+replace_first with `<>' : "<>racadabra"
+replace_all with `<>'   : "<>rac<><>ra"
+change_first to `<&>'   : "<ab>racadabra"
+change_all to `<&>'     : "<ab>rac<ad><ab>ra"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "xabracadabra"
+all matches             : [{"ab", 1, 2}, {"ad", 6, 2}, {"ab", 8, 2}]
+replace_first with `<>' : "x<>racadabra"
+replace_all with `<>'   : "x<>rac<><>ra"
+change_first to `<&>'   : "x<ab>racadabra"
+change_all to `<&>'     : "x<ab>rac<ad><ab>ra"
+first match             : {"ab", 1, 2}
+
+> "abracadabrax"
+all matches             : [{"ab", 0, 2}, {"ad", 5, 2}, {"ab", 7, 2}]
+replace_first with `<>' : "<>racadabrax"
+replace_all with `<>'   : "<>rac<><>rax"
+change_first to `<&>'   : "<ab>racadabrax"
+change_all to `<&>'     : "<ab>rac<ad><ab>rax"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "a*"
+
+> "aardvark"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"a", 5, 1}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}]
+replace_first with `<>' : "<>rdvark"
+replace_all with `<>'   : "<><>r<>d<>v<><>r<>k<>"
+change_first to `<&>'   : "<aa>rdvark"
+change_all to `<&>'     : "<aa><>r<>d<>v<a><>r<>k<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 8, 0}
+first match             : {"aa", 0, 2}
+
+> "xaardvark"
+all matches             : [{"", 0, 0}, {"aa", 1, 2}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"a", 6, 1}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>xaardvark"
+replace_all with `<>'   : "<>x<><>r<>d<>v<><>r<>k<>"
+change_first to `<&>'   : "<>xaardvark"
+change_all to `<&>'     : "<>x<aa><>r<>d<>v<a><>r<>k<>"
+left match              : {"", 0, 0}
+right match             : {"", 9, 0}
+first match             : {"", 0, 0}
+
+> "aardvarkx"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"a", 5, 1}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>rdvarkx"
+replace_all with `<>'   : "<><>r<>d<>v<><>r<>k<>x<>"
+change_first to `<&>'   : "<aa>rdvarkx"
+change_all to `<&>'     : "<aa><>r<>d<>v<a><>r<>k<>x<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 9, 0}
+first match             : {"aa", 0, 2}
+
+> "foo"
+all matches             : [{"", 0, 0}, {"", 1, 0}, {"", 2, 0}, {"", 3, 0}]
+replace_first with `<>' : "<>foo"
+replace_all with `<>'   : "<>f<>o<>o<>"
+change_first to `<&>'   : "<>foo"
+change_all to `<&>'     : "<>f<>o<>o<>"
+left match              : {"", 0, 0}
+right match             : {"", 3, 0}
+first match             : {"", 0, 0}
+
+
+* Matching against "aa*"
+
+> "aardvark"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}]
+replace_first with `<>' : "<>rdvark"
+replace_all with `<>'   : "<><>r<>d<>v<>a<>r<>k<>"
+change_first to `<&>'   : "<aa>rdvark"
+change_all to `<&>'     : "<aa><>r<>d<>v<>a<>r<>k<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 8, 0}
+first match             : {"aa", 0, 2}
+
+> "xaardvark"
+all matches             : [{"", 0, 0}, {"aa", 1, 2}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>xaardvark"
+replace_all with `<>'   : "<>x<><>r<>d<>v<>a<>r<>k<>"
+change_first to `<&>'   : "<>xaardvark"
+change_all to `<&>'     : "<>x<aa><>r<>d<>v<>a<>r<>k<>"
+left match              : {"", 0, 0}
+right match             : {"", 9, 0}
+first match             : {"", 0, 0}
+
+> "aardvarkx"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>rdvarkx"
+replace_all with `<>'   : "<><>r<>d<>v<>a<>r<>k<>x<>"
+change_first to `<&>'   : "<aa>rdvarkx"
+change_all to `<&>'     : "<aa><>r<>d<>v<>a<>r<>k<>x<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 9, 0}
+first match             : {"aa", 0, 2}
+
+> "foo"
+all matches             : [{"", 0, 0}, {"", 1, 0}, {"", 2, 0}, {"", 3, 0}]
+replace_first with `<>' : "<>foo"
+replace_all with `<>'   : "<>f<>o<>o<>"
+change_first to `<&>'   : "<>foo"
+change_all to `<&>'     : "<>f<>o<>o<>"
+left match              : {"", 0, 0}
+right match             : {"", 3, 0}
+first match             : {"", 0, 0}
+
+
+* Matching against "a+"
+
+> "aardvark"
+all matches             : [{"aa", 0, 2}, {"a", 5, 1}]
+replace_first with `<>' : "<>rdvark"
+replace_all with `<>'   : "<>rdv<>rk"
+change_first to `<&>'   : "<aa>rdvark"
+change_all to `<&>'     : "<aa>rdv<a>rk"
+left match              : {"aa", 0, 2}
+first match             : {"aa", 0, 2}
+
+> "xaardvark"
+all matches             : [{"aa", 1, 2}, {"a", 6, 1}]
+replace_first with `<>' : "x<>rdvark"
+replace_all with `<>'   : "x<>rdv<>rk"
+change_first to `<&>'   : "x<aa>rdvark"
+change_all to `<&>'     : "x<aa>rdv<a>rk"
+first match             : {"aa", 1, 2}
+
+> "aardvarkx"
+all matches             : [{"aa", 0, 2}, {"a", 5, 1}]
+replace_first with `<>' : "<>rdvarkx"
+replace_all with `<>'   : "<>rdv<>rkx"
+change_first to `<&>'   : "<aa>rdvarkx"
+change_all to `<&>'     : "<aa>rdv<a>rkx"
+left match              : {"aa", 0, 2}
+first match             : {"aa", 0, 2}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "aa+"
+
+> "aardvark"
+all matches             : [{"aa", 0, 2}]
+replace_first with `<>' : "<>rdvark"
+replace_all with `<>'   : "<>rdvark"
+change_first to `<&>'   : "<aa>rdvark"
+change_all to `<&>'     : "<aa>rdvark"
+left match              : {"aa", 0, 2}
+first match             : {"aa", 0, 2}
+
+> "xaardvark"
+all matches             : [{"aa", 1, 2}]
+replace_first with `<>' : "x<>rdvark"
+replace_all with `<>'   : "x<>rdvark"
+change_first to `<&>'   : "x<aa>rdvark"
+change_all to `<&>'     : "x<aa>rdvark"
+first match             : {"aa", 1, 2}
+
+> "aardvarkx"
+all matches             : [{"aa", 0, 2}]
+replace_first with `<>' : "<>rdvarkx"
+replace_all with `<>'   : "<>rdvarkx"
+change_first to `<&>'   : "<aa>rdvarkx"
+change_all to `<&>'     : "<aa>rdvarkx"
+left match              : {"aa", 0, 2}
+first match             : {"aa", 0, 2}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "a?"
+
+> "aardvark"
+all matches             : [{"a", 0, 1}, {"a", 1, 1}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"a", 5, 1}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}]
+replace_first with `<>' : "<>ardvark"
+replace_all with `<>'   : "<><><>r<>d<>v<><>r<>k<>"
+change_first to `<&>'   : "<a>ardvark"
+change_all to `<&>'     : "<a><a><>r<>d<>v<a><>r<>k<>"
+left match              : {"a", 0, 1}
+right match             : {"", 8, 0}
+first match             : {"a", 0, 1}
+
+> "xaardvark"
+all matches             : [{"", 0, 0}, {"a", 1, 1}, {"a", 2, 1}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"a", 6, 1}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>xaardvark"
+replace_all with `<>'   : "<>x<><><>r<>d<>v<><>r<>k<>"
+change_first to `<&>'   : "<>xaardvark"
+change_all to `<&>'     : "<>x<a><a><>r<>d<>v<a><>r<>k<>"
+left match              : {"", 0, 0}
+right match             : {"", 9, 0}
+first match             : {"", 0, 0}
+
+> "aardvarkx"
+all matches             : [{"a", 0, 1}, {"a", 1, 1}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"a", 5, 1}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>ardvarkx"
+replace_all with `<>'   : "<><><>r<>d<>v<><>r<>k<>x<>"
+change_first to `<&>'   : "<a>ardvarkx"
+change_all to `<&>'     : "<a><a><>r<>d<>v<a><>r<>k<>x<>"
+left match              : {"a", 0, 1}
+right match             : {"", 9, 0}
+first match             : {"a", 0, 1}
+
+> "foo"
+all matches             : [{"", 0, 0}, {"", 1, 0}, {"", 2, 0}, {"", 3, 0}]
+replace_first with `<>' : "<>foo"
+replace_all with `<>'   : "<>f<>o<>o<>"
+change_first to `<&>'   : "<>foo"
+change_all to `<&>'     : "<>f<>o<>o<>"
+left match              : {"", 0, 0}
+right match             : {"", 3, 0}
+first match             : {"", 0, 0}
+
+
+* Matching against "aa?"
+
+> "aardvark"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}]
+replace_first with `<>' : "<>rdvark"
+replace_all with `<>'   : "<><>r<>d<>v<>a<>r<>k<>"
+change_first to `<&>'   : "<aa>rdvark"
+change_all to `<&>'     : "<aa><>r<>d<>v<>a<>r<>k<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 8, 0}
+first match             : {"aa", 0, 2}
+
+> "xaardvark"
+all matches             : [{"", 0, 0}, {"aa", 1, 2}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>xaardvark"
+replace_all with `<>'   : "<>x<><>r<>d<>v<>a<>r<>k<>"
+change_first to `<&>'   : "<>xaardvark"
+change_all to `<&>'     : "<>x<aa><>r<>d<>v<>a<>r<>k<>"
+left match              : {"", 0, 0}
+right match             : {"", 9, 0}
+first match             : {"", 0, 0}
+
+> "aardvarkx"
+all matches             : [{"aa", 0, 2}, {"", 2, 0}, {"", 3, 0}, {"", 4, 0}, {"", 5, 0}, {"", 6, 0}, {"", 7, 0}, {"", 8, 0}, {"", 9, 0}]
+replace_first with `<>' : "<>rdvarkx"
+replace_all with `<>'   : "<><>r<>d<>v<>a<>r<>k<>x<>"
+change_first to `<&>'   : "<aa>rdvarkx"
+change_all to `<&>'     : "<aa><>r<>d<>v<>a<>r<>k<>x<>"
+left match              : {"aa", 0, 2}
+right match             : {"", 9, 0}
+first match             : {"aa", 0, 2}
+
+> "foo"
+all matches             : [{"", 0, 0}, {"", 1, 0}, {"", 2, 0}, {"", 3, 0}]
+replace_first with `<>' : "<>foo"
+replace_all with `<>'   : "<>f<>o<>o<>"
+change_first to `<&>'   : "<>foo"
+change_all to `<&>'     : "<>f<>o<>o<>"
+left match              : {"", 0, 0}
+right match             : {"", 3, 0}
+first match             : {"", 0, 0}
+
+
+* Matching against "(ab|ad)+"
+
+> "abracadabra"
+all matches             : [{"ab", 0, 2}, {"adab", 5, 4}]
+replace_first with `<>' : "<>racadabra"
+replace_all with `<>'   : "<>rac<>ra"
+change_first to `<&>'   : "<ab>racadabra"
+change_all to `<&>'     : "<ab>rac<adab>ra"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "xabracadabra"
+all matches             : [{"ab", 1, 2}, {"adab", 6, 4}]
+replace_first with `<>' : "x<>racadabra"
+replace_all with `<>'   : "x<>rac<>ra"
+change_first to `<&>'   : "x<ab>racadabra"
+change_all to `<&>'     : "x<ab>rac<adab>ra"
+first match             : {"ab", 1, 2}
+
+> "abracadabrax"
+all matches             : [{"ab", 0, 2}, {"adab", 5, 4}]
+replace_first with `<>' : "<>racadabrax"
+replace_all with `<>'   : "<>rac<>rax"
+change_first to `<&>'   : "<ab>racadabrax"
+change_all to `<&>'     : "<ab>rac<adab>rax"
+left match              : {"ab", 0, 2}
+first match             : {"ab", 0, 2}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "[abcd]"
+
+> "abracadabra"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabra"
+replace_all with `<>'   : "<><>r<><><><><><>r<>"
+change_first to `<&>'   : "<a>bracadabra"
+change_all to `<&>'     : "<a><b>r<a><c><a><d><a><b>r<a>"
+left match              : {"a", 0, 1}
+right match             : {"a", 10, 1}
+first match             : {"a", 0, 1}
+
+> "xabracadabra"
+all matches             : [{"a", 1, 1}, {"b", 2, 1}, {"a", 4, 1}, {"c", 5, 1}, {"a", 6, 1}, {"d", 7, 1}, {"a", 8, 1}, {"b", 9, 1}, {"a", 11, 1}]
+replace_first with `<>' : "x<>bracadabra"
+replace_all with `<>'   : "x<><>r<><><><><><>r<>"
+change_first to `<&>'   : "x<a>bracadabra"
+change_all to `<&>'     : "x<a><b>r<a><c><a><d><a><b>r<a>"
+right match             : {"a", 11, 1}
+first match             : {"a", 1, 1}
+
+> "abracadabrax"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabrax"
+replace_all with `<>'   : "<><>r<><><><><><>r<>x"
+change_first to `<&>'   : "<a>bracadabrax"
+change_all to `<&>'     : "<a><b>r<a><c><a><d><a><b>r<a>x"
+left match              : {"a", 0, 1}
+first match             : {"a", 0, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "[ab-d]"
+
+> "abracadabra"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabra"
+replace_all with `<>'   : "<><>r<><><><><><>r<>"
+change_first to `<&>'   : "<a>bracadabra"
+change_all to `<&>'     : "<a><b>r<a><c><a><d><a><b>r<a>"
+left match              : {"a", 0, 1}
+right match             : {"a", 10, 1}
+first match             : {"a", 0, 1}
+
+> "xabracadabra"
+all matches             : [{"a", 1, 1}, {"b", 2, 1}, {"a", 4, 1}, {"c", 5, 1}, {"a", 6, 1}, {"d", 7, 1}, {"a", 8, 1}, {"b", 9, 1}, {"a", 11, 1}]
+replace_first with `<>' : "x<>bracadabra"
+replace_all with `<>'   : "x<><>r<><><><><><>r<>"
+change_first to `<&>'   : "x<a>bracadabra"
+change_all to `<&>'     : "x<a><b>r<a><c><a><d><a><b>r<a>"
+right match             : {"a", 11, 1}
+first match             : {"a", 1, 1}
+
+> "abracadabrax"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabrax"
+replace_all with `<>'   : "<><>r<><><><><><>r<>x"
+change_first to `<&>'   : "<a>bracadabrax"
+change_all to `<&>'     : "<a><b>r<a><c><a><d><a><b>r<a>x"
+left match              : {"a", 0, 1}
+first match             : {"a", 0, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "[]]"
+
+> "]foo["
+all matches             : [{"]", 0, 1}]
+replace_first with `<>' : "<>foo["
+replace_all with `<>'   : "<>foo["
+change_first to `<&>'   : "<]>foo["
+change_all to `<&>'     : "<]>foo["
+left match              : {"]", 0, 1}
+first match             : {"]", 0, 1}
+
+> "[foo]"
+all matches             : [{"]", 4, 1}]
+replace_first with `<>' : "[foo<>"
+replace_all with `<>'   : "[foo<>"
+change_first to `<&>'   : "[foo<]>"
+change_all to `<&>'     : "[foo<]>"
+right match             : {"]", 4, 1}
+first match             : {"]", 4, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "[[-]]"
+
+> "]foo["
+all matches             : [{"]", 0, 1}, {"[", 4, 1}]
+replace_first with `<>' : "<>foo["
+replace_all with `<>'   : "<>foo<>"
+change_first to `<&>'   : "<]>foo["
+change_all to `<&>'     : "<]>foo<[>"
+left match              : {"]", 0, 1}
+right match             : {"[", 4, 1}
+first match             : {"]", 0, 1}
+
+> "[foo]"
+all matches             : [{"[", 0, 1}, {"]", 4, 1}]
+replace_first with `<>' : "<>foo]"
+replace_all with `<>'   : "<>foo<>"
+change_first to `<&>'   : "<[>foo]"
+change_all to `<&>'     : "<[>foo<]>"
+left match              : {"[", 0, 1}
+right match             : {"]", 4, 1}
+first match             : {"[", 0, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "\["
+
+> "]foo["
+all matches             : [{"[", 4, 1}]
+replace_first with `<>' : "]foo<>"
+replace_all with `<>'   : "]foo<>"
+change_first to `<&>'   : "]foo<[>"
+change_all to `<&>'     : "]foo<[>"
+right match             : {"[", 4, 1}
+first match             : {"[", 4, 1}
+
+> "[foo]"
+all matches             : [{"[", 0, 1}]
+replace_first with `<>' : "<>foo]"
+replace_all with `<>'   : "<>foo]"
+change_first to `<&>'   : "<[>foo]"
+change_all to `<&>'     : "<[>foo]"
+left match              : {"[", 0, 1}
+first match             : {"[", 0, 1}
+
+> "foo"
+all matches             : []
+
+
+* Matching against "[^abcd]"
+
+> "abracadabra"
+all matches             : [{"r", 2, 1}, {"r", 9, 1}]
+replace_first with `<>' : "ab<>acadabra"
+replace_all with `<>'   : "ab<>acadab<>a"
+change_first to `<&>'   : "ab<r>acadabra"
+change_all to `<&>'     : "ab<r>acadab<r>a"
+first match             : {"r", 2, 1}
+
+> "xabracadabra"
+all matches             : [{"x", 0, 1}, {"r", 3, 1}, {"r", 10, 1}]
+replace_first with `<>' : "<>abracadabra"
+replace_all with `<>'   : "<>ab<>acadab<>a"
+change_first to `<&>'   : "<x>abracadabra"
+change_all to `<&>'     : "<x>ab<r>acadab<r>a"
+left match              : {"x", 0, 1}
+first match             : {"x", 0, 1}
+
+> "abracadabrax"
+all matches             : [{"r", 2, 1}, {"r", 9, 1}, {"x", 11, 1}]
+replace_first with `<>' : "ab<>acadabrax"
+replace_all with `<>'   : "ab<>acadab<>a<>"
+change_first to `<&>'   : "ab<r>acadabrax"
+change_all to `<&>'     : "ab<r>acadab<r>a<x>"
+right match             : {"x", 11, 1}
+first match             : {"r", 2, 1}
+
+> "foo"
+all matches             : [{"f", 0, 1}, {"o", 1, 1}, {"o", 2, 1}]
+replace_first with `<>' : "<>oo"
+replace_all with `<>'   : "<><><>"
+change_first to `<&>'   : "<f>oo"
+change_all to `<&>'     : "<f><o><o>"
+left match              : {"f", 0, 1}
+right match             : {"o", 2, 1}
+first match             : {"f", 0, 1}
+
+
+* Matching against "[^ab-d]"
+
+> "abracadabra"
+all matches             : [{"r", 2, 1}, {"r", 9, 1}]
+replace_first with `<>' : "ab<>acadabra"
+replace_all with `<>'   : "ab<>acadab<>a"
+change_first to `<&>'   : "ab<r>acadabra"
+change_all to `<&>'     : "ab<r>acadab<r>a"
+first match             : {"r", 2, 1}
+
+> "xabracadabra"
+all matches             : [{"x", 0, 1}, {"r", 3, 1}, {"r", 10, 1}]
+replace_first with `<>' : "<>abracadabra"
+replace_all with `<>'   : "<>ab<>acadab<>a"
+change_first to `<&>'   : "<x>abracadabra"
+change_all to `<&>'     : "<x>ab<r>acadab<r>a"
+left match              : {"x", 0, 1}
+first match             : {"x", 0, 1}
+
+> "abracadabrax"
+all matches             : [{"r", 2, 1}, {"r", 9, 1}, {"x", 11, 1}]
+replace_first with `<>' : "ab<>acadabrax"
+replace_all with `<>'   : "ab<>acadab<>a<>"
+change_first to `<&>'   : "ab<r>acadabrax"
+change_all to `<&>'     : "ab<r>acadab<r>a<x>"
+right match             : {"x", 11, 1}
+first match             : {"r", 2, 1}
+
+> "foo"
+all matches             : [{"f", 0, 1}, {"o", 1, 1}, {"o", 2, 1}]
+replace_first with `<>' : "<>oo"
+replace_all with `<>'   : "<><><>"
+change_first to `<&>'   : "<f>oo"
+change_all to `<&>'     : "<f><o><o>"
+left match              : {"f", 0, 1}
+right match             : {"o", 2, 1}
+first match             : {"f", 0, 1}
+
+
+* Matching against "[^]]"
+
+> "]foo["
+all matches             : [{"f", 1, 1}, {"o", 2, 1}, {"o", 3, 1}, {"[", 4, 1}]
+replace_first with `<>' : "]<>oo["
+replace_all with `<>'   : "]<><><><>"
+change_first to `<&>'   : "]<f>oo["
+change_all to `<&>'     : "]<f><o><o><[>"
+right match             : {"[", 4, 1}
+first match             : {"f", 1, 1}
+
+> "[foo]"
+all matches             : [{"[", 0, 1}, {"f", 1, 1}, {"o", 2, 1}, {"o", 3, 1}]
+replace_first with `<>' : "<>foo]"
+replace_all with `<>'   : "<><><><>]"
+change_first to `<&>'   : "<[>foo]"
+change_all to `<&>'     : "<[><f><o><o>]"
+left match              : {"[", 0, 1}
+first match             : {"[", 0, 1}
+
+> "foo"
+all matches             : [{"f", 0, 1}, {"o", 1, 1}, {"o", 2, 1}]
+replace_first with `<>' : "<>oo"
+replace_all with `<>'   : "<><><>"
+change_first to `<&>'   : "<f>oo"
+change_all to `<&>'     : "<f><o><o>"
+left match              : {"f", 0, 1}
+right match             : {"o", 2, 1}
+first match             : {"f", 0, 1}
+
+
+* Matching against "[^[-]]"
+
+> "]foo["
+all matches             : [{"f", 1, 1}, {"o", 2, 1}, {"o", 3, 1}]
+replace_first with `<>' : "]<>oo["
+replace_all with `<>'   : "]<><><>["
+change_first to `<&>'   : "]<f>oo["
+change_all to `<&>'     : "]<f><o><o>["
+first match             : {"f", 1, 1}
+
+> "[foo]"
+all matches             : [{"f", 1, 1}, {"o", 2, 1}, {"o", 3, 1}]
+replace_first with `<>' : "[<>oo]"
+replace_all with `<>'   : "[<><><>]"
+change_first to `<&>'   : "[<f>oo]"
+change_all to `<&>'     : "[<f><o><o>]"
+first match             : {"f", 1, 1}
+
+> "foo"
+all matches             : [{"f", 0, 1}, {"o", 1, 1}, {"o", 2, 1}]
+replace_first with `<>' : "<>oo"
+replace_all with `<>'   : "<><><>"
+change_first to `<&>'   : "<f>oo"
+change_all to `<&>'     : "<f><o><o>"
+left match              : {"f", 0, 1}
+right match             : {"o", 2, 1}
+first match             : {"f", 0, 1}
+
+
+* Matching against ".*"
+
+> "abracadabra"
+all matches             : [{"abracadabra", 0, 11}]
+replace_first with `<>' : "<>"
+replace_all with `<>'   : "<>"
+change_first to `<&>'   : "<abracadabra>"
+change_all to `<&>'     : "<abracadabra>"
+exact match
+left match              : {"abracadabra", 0, 11}
+right match             : {"abracadabra", 0, 11}
+first match             : {"abracadabra", 0, 11}
+
+> "xabracadabra"
+all matches             : [{"xabracadabra", 0, 12}]
+replace_first with `<>' : "<>"
+replace_all with `<>'   : "<>"
+change_first to `<&>'   : "<xabracadabra>"
+change_all to `<&>'     : "<xabracadabra>"
+exact match
+left match              : {"xabracadabra", 0, 12}
+right match             : {"xabracadabra", 0, 12}
+first match             : {"xabracadabra", 0, 12}
+
+> "abracadabrax"
+all matches             : [{"abracadabrax", 0, 12}]
+replace_first with `<>' : "<>"
+replace_all with `<>'   : "<>"
+change_first to `<&>'   : "<abracadabrax>"
+change_all to `<&>'     : "<abracadabrax>"
+exact match
+left match              : {"abracadabrax", 0, 12}
+right match             : {"abracadabrax", 0, 12}
+first match             : {"abracadabrax", 0, 12}
+
+> "foo"
+all matches             : [{"foo", 0, 3}]
+replace_first with `<>' : "<>"
+replace_all with `<>'   : "<>"
+change_first to `<&>'   : "<foo>"
+change_all to `<&>'     : "<foo>"
+exact match
+left match              : {"foo", 0, 3}
+right match             : {"foo", 0, 3}
+first match             : {"foo", 0, 3}
+
+
+* Matching against "."
+
+> "abracadabra"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"r", 2, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"r", 9, 1}, {"a", 10, 1}]
+replace_first with `<>' : "<>bracadabra"
+replace_all with `<>'   : "<><><><><><><><><><><>"
+change_first to `<&>'   : "<a>bracadabra"
+change_all to `<&>'     : "<a><b><r><a><c><a><d><a><b><r><a>"
+left match              : {"a", 0, 1}
+right match             : {"a", 10, 1}
+first match             : {"a", 0, 1}
+
+> "xabracadabra"
+all matches             : [{"x", 0, 1}, {"a", 1, 1}, {"b", 2, 1}, {"r", 3, 1}, {"a", 4, 1}, {"c", 5, 1}, {"a", 6, 1}, {"d", 7, 1}, {"a", 8, 1}, {"b", 9, 1}, {"r", 10, 1}, {"a", 11, 1}]
+replace_first with `<>' : "<>abracadabra"
+replace_all with `<>'   : "<><><><><><><><><><><><>"
+change_first to `<&>'   : "<x>abracadabra"
+change_all to `<&>'     : "<x><a><b><r><a><c><a><d><a><b><r><a>"
+left match              : {"x", 0, 1}
+right match             : {"a", 11, 1}
+first match             : {"x", 0, 1}
+
+> "abracadabrax"
+all matches             : [{"a", 0, 1}, {"b", 1, 1}, {"r", 2, 1}, {"a", 3, 1}, {"c", 4, 1}, {"a", 5, 1}, {"d", 6, 1}, {"a", 7, 1}, {"b", 8, 1}, {"r", 9, 1}, {"a", 10, 1}, {"x", 11, 1}]
+replace_first with `<>' : "<>bracadabrax"
+replace_all with `<>'   : "<><><><><><><><><><><><>"
+change_first to `<&>'   : "<a>bracadabrax"
+change_all to `<&>'     : "<a><b><r><a><c><a><d><a><b><r><a><x>"
+left match              : {"a", 0, 1}
+right match             : {"x", 11, 1}
+first match             : {"a", 0, 1}
+
+> "foo"
+all matches             : [{"f", 0, 1}, {"o", 1, 1}, {"o", 2, 1}]
+replace_first with `<>' : "<>oo"
+replace_all with `<>'   : "<><><>"
+change_first to `<&>'   : "<f>oo"
+change_all to `<&>'     : "<f><o><o>"
+left match              : {"f", 0, 1}
+right match             : {"o", 2, 1}
+first match             : {"f", 0, 1}
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/Mmakefile	28 Nov 2002 03:01:29 -0000
@@ -0,0 +1,28 @@
+# Copyright (C) 2002 The University of Melbourne
+# Ralph Becket <rafe at cs.mu.oz.au>
+#
+# To build do:
+#
+# $ mmake depend
+# $ mmake
+#
+# Ensure you have built and installed the lex and regex libraries.
+# Change the following line as appropriate if you installed them
+# elsewhere:
+#
+#EXTRA_LIB_DIRS := $(INSTALL_PREFIX)/extras/lib/mercury
+EXTRA_LIB_DIRS := ../lib/mercury
+
+EXTRA_LIBRARIES = lex regex
+
+MAIN_TARGET = all
+
+.PHONY: all depend
+
+all: test_regex
+
+depend: test_regex.depend
+
+check: depend test_regex
+	./test_regex < test_regex.in > test_regex.res
+	diff -u test_regex.exp test_regex.res && echo "Passed" || echo "Failed"
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ samples/regex_demo.m	28 Nov 2002 02:39:48 -0000
@@ -0,0 +1,56 @@
+%-----------------------------------------------------------------------------%
+% regex_demo.m
+% Ralph Becket <rafe at cs.mu.oz.au>
+% Sun Nov 24 11:44:45 EST 2002
+% vim: ft=mercury ts=4 sw=4 et wm=0 tw=0
+%
+%-----------------------------------------------------------------------------%
+
+:- module regex_demo.
+
+:- interface.
+
+:- import_module io.
+
+
+
+:- pred main(io::di, io::uo) is det.
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
+
+:- implementation.
+
+:- import_module string, list, exception.
+:- import_module lex, regex.
+
+%-----------------------------------------------------------------------------%
+
+main(!IO) :-
+    S = "([Ff][Oo][Oo])+",
+    M = change_all(regex(S), func(_) = "bar"),
+    io__format("Replacing multiple \"foo\"s with a single \"bar\"...",
+        [], !IO),
+    loop(M, !IO).
+
+%-----------------------------------------------------------------------------%
+
+:- pred loop(func(string) = string, io, io).
+:- mode loop(func(in) = out is det, di, uo) is det.
+
+loop(M, !IO) :-
+    io__format("\n> ", [], !IO),
+    io__read_line_as_string(Res, !IO),
+    (
+        Res = eof
+    ;
+        Res = error(_),
+        throw(Res)
+    ;
+        Res = ok(S),
+        io__format("  %s", [s(M(S))], !IO),
+        loop(M, !IO)
+    ).
+
+%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ samples/lex_demo.m	28 Nov 2002 02:39:48 -0000
@@ -0,0 +1,134 @@
+%----------------------------------------------------------------------------- %
+% lex_demo.m
+% Sun Aug 20 18:11:42 BST 2000
+%
+% vim: ts=4 sw=4 et tw=0 wm=0 ft=mercury
+%
+% Copyright (C) 2001-2002 The University of Melbourne
+% Copyright (C) 2001 The Rationalizer Intelligent Software AG
+%   The changes made by Rationalizer are contributed under the terms 
+%   of the GNU General Public License - see the file COPYING in the
+%   Mercury Distribution.
+%
+%----------------------------------------------------------------------------- %
+
+:- module lex_demo.
+
+:- interface.
+
+:- import_module io.
+
+:- pred main(io::di, io::uo) is det.
+
+%----------------------------------------------------------------------------- %
+%----------------------------------------------------------------------------- %
+
+:- implementation.
+
+:- import_module string, int, float, exception, list.
+:- import_module lex.
+
+%----------------------------------------------------------------------------- %
+
+main(IO0, IO) :-
+
+    io__print("\
+
+I recognise the following words:
+""cat"", ""dog"", ""rat"", ""mat"", ""sat"", ""caught"", ""chased"",
+""and"", ""then"", ""the"", ""it"", ""them"", ""to"", ""on"".
+I also recognise Mercury-style comments, integers and floating point
+numbers, and a variety of punctuation symbols.
+
+Try me...
+
+", IO0, IO1),
+
+    Lexer  = lex__init(lexemes, lex__read_from_stdin, ignore(space)),
+    State0 = lex__start(Lexer, IO1),
+    tokenise_stdin(State0, State),
+    IO     = lex__stop(State).
+
+%----------------------------------------------------------------------------- %
+
+:- pred tokenise_stdin(lexer_state(token, io__state),
+                lexer_state(token, io__state)).
+:- mode tokenise_stdin(di, uo) is det.
+
+tokenise_stdin -->
+    lex__read(Result),
+    lex__manipulate_source(io__print(Result)),
+    lex__manipulate_source(io__nl),
+    ( if { Result \= eof } then
+        tokenise_stdin
+      else
+        []
+    ).
+
+%----------------------------------------------------------------------------- %
+
+:- type token
+    --->    noun(string)
+    ;       comment(string)
+    ;       integer(int)
+    ;       real(float)
+    ;       verb(string)
+    ;       conj(string)
+    ;       prep(string)
+    ;       punc
+    ;       space
+    ;       unrecognised(string)
+    .
+
+:- func lexemes = list(lexeme(token)).
+
+lexemes = [
+
+    ( "%" ++ junk       -> (func(Match) = comment(Match)) ),
+    ( signed_int        -> (func(Match) = integer(string__det_to_int(Match))) ),
+    ( real              -> (func(Match) = real(det_string_to_float(Match))) ),
+
+        % Multiple regexps can match the same token constructor.
+        %
+    ( "cat"             -> (func(Match) = noun(Match)) ),
+    ( "dog"             -> (func(Match) = noun(Match)) ),
+    ( "rat"             -> (func(Match) = noun(Match)) ),
+    ( "mat"             -> (func(Match) = noun(Match)) ),
+
+        % Here we use `or', rather than multiple lexemes.
+        %
+    ( "sat" or
+      "caught" or
+      "chased"          -> (func(Match) = verb(Match)) ),
+
+    ( "and" or
+      "then"            -> (func(Match) = conj(Match)) ),
+
+        % `\/' is a synonym for `or'.  Tell us which you prefer...
+        % 
+    ( "the" \/
+      "it" \/
+      "them" \/
+      "to" \/
+      "on"              -> (func(Match) = prep(Match)) ),
+
+        % return/1 can be used when you don't care what string was matched.
+        %
+    ( any("~!@#$%^&*()_+`-={}|[]\\:"";'<>?,./")
+                        -> return(punc) ),
+    ( whitespace        -> return(space) ),
+    ( dot               -> func(Match) = unrecognised(Match) )
+].
+
+
+
+:- func det_string_to_float(string) = float.
+
+det_string_to_float(String) =
+    ( if   string__to_float(String, Float)
+      then Float
+      else throw("error in float conversion")
+    ).
+
+%----------------------------------------------------------------------------- %
+%----------------------------------------------------------------------------- %
only in patch2:
--- samples/demo.m	4 Oct 2001 07:46:04 -0000	1.2
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,136 +0,0 @@
-%----------------------------------------------------------------------------- %
-% demo.m
-% Sun Aug 20 18:11:42 BST 2000
-%
-% vim: ts=4 sw=4 et tw=0 wm=0 ff=unix ft=mercury
-%
-% Copyright (C) 2001 Ralph Becket <rbeck at microsoft.com>
-%   THIS FILE IS HEREBY CONTRIBUTED TO THE MERCURY PROJECT TO
-%   BE RELEASED UNDER WHATEVER LICENCE IS DEEMED APPROPRIATE
-%   BY THE ADMINISTRATORS OF THE MERCURY PROJECT.
-% Thu Jul 26 07:45:47 UTC 2001
-% Copyright (C) 2001 The Rationalizer Intelligent Software AG
-%   The changes made by Rationalizer are contributed under the terms 
-%   of the GNU General Public License - see the file COPYING in the
-%   Mercury Distribution.
-%
-%----------------------------------------------------------------------------- %
-
-:- module demo.
-
-:- interface.
-
-:- import_module io.
-
-:- pred main(io__state::di, io__state::uo) is det.
-
-%----------------------------------------------------------------------------- %
-%----------------------------------------------------------------------------- %
-
-:- implementation.
-
-:- import_module string, int, float, exception, list.
-:- import_module lex.
-
-%----------------------------------------------------------------------------- %
-
-main(IO0, IO) :-
-
-    io__print("\
-
-I recognise the following words:
-""cat"", ""dog"", ""rat"", ""mat"", ""sat"", ""caught"", ""chased"",
-""and"", ""then"", ""the"", ""it"", ""them"", ""to"", ""on"".
-I also recognise Mercury-style comments, integers and floating point
-numbers, and a variety of punctuation symbols.
-
-Try me...
-
-", IO0, IO1),
-
-    Lexer  = lex__init(lexemes, lex__read_from_stdin, ignore(space)),
-    State0 = lex__start(Lexer, IO1),
-    tokenise_stdin(State0, State),
-    IO     = lex__stop(State).
-
-%----------------------------------------------------------------------------- %
-
-:- pred tokenise_stdin(lexer_state(token, io__state),
-                lexer_state(token, io__state)).
-:- mode tokenise_stdin(di, uo) is det.
-
-tokenise_stdin -->
-    lex__read(Result),
-    lex__manipulate_source(io__print(Result)),
-    lex__manipulate_source(io__nl),
-    ( if { Result \= eof } then
-        tokenise_stdin
-      else
-        []
-    ).
-
-%----------------------------------------------------------------------------- %
-
-:- type token
-    --->    noun(string)
-    ;       comment(string)
-    ;       integer(int)
-    ;       real(float)
-    ;       verb(string)
-    ;       conj(string)
-    ;       prep(string)
-    ;       punc
-    ;       space
-    .
-
-:- func lexemes = list(lexeme(token)).
-
-lexemes = [
-
-    ( "%" ++ junk      -> (func(Match) = comment(Match)) ),
-    ( signed_int       -> (func(Match) = integer(string__det_to_int(Match))) ),
-    ( real             -> (func(Match) = real(det_string_to_float(Match))) ),
-
-        % Multiple regexps can match the same token constructor.
-        %
-    ( "cat"            -> (func(Match) = noun(Match)) ),
-    ( "dog"            -> (func(Match) = noun(Match)) ),
-    ( "rat"            -> (func(Match) = noun(Match)) ),
-    ( "mat"            -> (func(Match) = noun(Match)) ),
-
-        % Here we use `or', rather than multiple lexemes.
-        %
-    ( "sat" or
-      "caught" or
-      "chased"         -> (func(Match) = verb(Match)) ),
-
-    ( "and" or
-      "then"           -> (func(Match) = conj(Match)) ),
-
-        % `\/' is a synonym for `or'.  Tell us which you prefer...
-        % 
-    ( "the" \/
-      "it" \/
-      "them" \/
-      "to" \/
-      "on"             -> (func(Match) = prep(Match)) ),
-
-        % return/1 can be used when you don't care what string was matched.
-        %
-    ( any("~!@#$%^&*()_+`-={}|[]\\:"";'<>?,./")
-                       -> return(punc) ),
-    ( whitespace       -> return(space) )
-].
-
-
-
-:- func det_string_to_float(string) = float.
-
-det_string_to_float(String) =
-    ( if   string__to_float(String, Float)
-      then Float
-      else throw("error in float conversion")
-    ).
-
-%----------------------------------------------------------------------------- %
-%----------------------------------------------------------------------------- %
only in patch2:
--- samples/Mmakefile	21 Feb 2001 16:29:38 -0000	1.1
+++ samples/Mmakefile	28 Nov 2002 02:45:45 -0000
@@ -1,19 +1,26 @@
-# Copyright (C) 2001 Ralph Becket <rbeck at microsoft.com>
+# Copyright (C) 2001-2002 The University of Melbourne
+# Ralph Becket <rafe at cs.mu.oz.au>
 #
-#   THIS FILE IS HEREBY CONTRIBUTED TO THE MERCURY PROJECT TO
-#   BE RELEASED UNDER WHATEVER LICENCE IS DEEMED APPROPRIATE
-#   BY THE ADMINISTRATORS OF THE MERCURY PROJECT.
+# To build do:
+#
+# $ mmake depend
+# $ mmake
+#
+# The targets are called lex_demo and regex_demo.
+#
+# Ensure you have built and installed the lex and regex libraries.
+# Change the following line as appropriate if you installed them
+# elsewhere:
+#
+#EXTRA_LIB_DIRS := $(INSTALL_PREFIX)/extras/lib/mercury
+EXTRA_LIB_DIRS := ../lib/mercury
+
+EXTRA_LIBRARIES = lex regex
+
+.PHONY: all
 
-# Specify the location of the `mypackage' and `myotherlib' directories
-LEX_DIR = ..
+MAIN_TARGET = all
 
-# The following stuff tells Mmake to use the two libraries
-VPATH = $(LEX_DIR):$(MMAKE_VPATH)
-MCFLAGS = -I$(LEX_DIR) $(EXTRA_MCFLAGS)
-MLFLAGS = -R$(LEX_DIR) $(EXTRA_MLFLAGS) \
-          -L$(LEX_DIR)
-MLLIBS = -llex $(EXTRA_MLLIBS)
-C2INITARGS = $(LEX_DIR)/lex.init
+all: lex_demo regex_demo
 
-MAIN_TARGET = demo
-depend: $(MAIN_TARGET).depend
+depend: lex_demo.depend regex_demo.depend
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ README.regex	26 Nov 2002 01:08:56 -0000
@@ -0,0 +1,70 @@
+THE REGEX MODULE
+
+The regex/1 function converts standard string-type regular expression
+definitions into values of type regex.
+
+EXAMPLE OF REGEXES
+
+regex(".")		matches any char except `\n'
+regex("abc")		matches `abc'
+regex("abc*")		matches zero or more contiguous occurrences of `abc'
+regex("abc+")		matches one or more contiguous occurrences of `abc'
+regex("abc?")		matches zero or one occurrence of `abc'
+regex("abc|xyz")	matches `abc' or `xyz'
+regex("(abc|xyz)?")	matches zero or one occurrence of `abc' or `xyz'
+regex("abc|xyz?")	matches `abc' or zero or one occurrence of `xyz'
+regex("[pqr]")		matches `p', `q' or `r'
+regex("[p-z]")		matches `p', 'q', ..., or `z'
+regex("[abcp-r]")	matches `a', `b', `c', 'p', 'q', ..., or `z'
+regex("[]]")		matches `]'
+regex("[^...]")		matches any char not in the set ... or `\n'
+regex("\\?")		matches `?' (ditto for any other literal char)
+
+There is a corresponding function regexp/1 (note the different spelling)
+which converts standard string-type regular expression definitions into
+values of type regexp, suitable for use with the lex module.
+
+EXAMPLES OF USE
+
+The following predicates and functions all take values of type regex as
+their first argument:
+
+left_match(regex("a+"), "faat cat", Substring, Start, Count)
+	fails.
+
+left_match(regex("a+"), "a faat cat", Substring, Start, Count)
+	succeeds with Substring = "a", Start = 0, Count = 1.
+
+first_match(regex("a+"), "faat cat", Substring, Start, Count)
+	succeeds with Substring = "aa", Start = 1, Count = 2.
+
+right_match(regex("a+"), "faat cat", Substring, Start, Count)
+	fails.
+
+right_match(regex("a+"), "kowabunga", Substring, Start, Count)
+	succeeds with Substring = "a", Start = 8, Count = 1.
+
+exact_match(regex("a+"), "kowabunga", Substring, Start, Count)
+	fails.
+
+exact_match(regex("a+"), "aaaa")
+	succeeds.
+
+matches(regex("a+"), "faat cat") = [{"aa", 1, 2}, {"a", 6, 1}]
+
+replace_first(regex("a+"), "f", "faat cat") = "fft cat"
+
+replace_first(regex("a+"), "f", "xyz") = "xyz"
+
+replace_all(regex("a+"), "f", "faat cat") = "fft cft"
+
+replace_all(regex("a+"), "f", "xyz") = "xyz"
+
+change_first(regex("a+"), string__to_upper, "faat cat") = "fAAt cat"
+
+change_first(regex("a+"), string__to_upper, "xyz") = "xyz"
+
+change_all(regex("a+"), string__to_upper, "faat cat") = "fAAt cAt"
+
+change_all(regex("a+"), string__to_upper, "xyz") = "xyz"
+
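
By way of illustration, here's a minimal sketch of how the functions
documented above might be called from a program (the module name is
made up; the signatures are just those shown in the README):

    :- module regex_sketch.
    :- interface.

    :- import_module io.

    :- pred main(io::di, io::uo) is det.

    :- implementation.

    :- import_module list.
    :- import_module lex, regex.

    main(!IO) :-
        R = regex("a+"),
            % matches returns every {Substring, Start, Count} triple.
        io__print(matches(R, "faat cat"), !IO),
        io__nl(!IO),
            % replace_all substitutes a fixed string for each match.
        io__print(replace_all(R, "f", "faat cat"), !IO),
        io__nl(!IO).
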
only in patch2:
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ README.lex	26 Nov 2002 01:54:37 -0000
@@ -0,0 +1,74 @@
+THE LEX MODULE
+
+The lex module provides tools for writing lexical analyzers.  A
+lexical analyzer parses a stream of chars (e.g. from a string or the
+standard input stream) against a list of regular expressions,
+returning the first, longest match along with an indication of which
+regular expression was matched.
+
+QUICK START GUIDE
+
+A lexer is compiled from a list of lexemes and a predicate that will
+read the next char from the input stream.
+
+A lexeme is a pair consisting of a regular expression and a function
+that will convert a string matched by the regular expression into a
+token, which may be returned as a result by the lexical analyzer
+(hereafter referred to as a `lexer'.)
+
+The lex module provides a language for composing regular expressions
+including literal strings, alternation, Kleene closure, grouping and
+various other useful combinators, as well as a rich set of pre-defined
+regular expressions such as identifier, signed_int, real and so forth.
+(Also, consider the regexp/1 function defined in the regex module,
+which supports the construction of regular expressions from strings
+similar to those recognised by tools such as grep and sed.)
+
+A lexer may be created as in the following example (this lexer works
+over the standard input stream):
+
+:- type token
+	--->	id(string)
+	;	int(int)
+	;	float(float)
+	;	lpar
+	;	rpar
+	;	comment.
+
+Lexer = lex__init([
+	(	identifier	->	func(Id)    = id(Id)		),
+	(	signed_int	->	func(Int)   = int(Int)		),
+	(	real		->	func(Float) = float(Float)	),
+	(	"("		->	return(lpar)			),
+	(	")"		->	return(rpar)			),
+	(	"%" ++ junk	->	return(comment)			)
+	], read_from_stdin).
+
+The combinator return/2 is defined s.t. return(X) = ( func(_) = X ),
+that is, it simply discards the matched string and returns X.
+
+(There is also lex__init/3 which takes an extra argument, namely a
+predicate which is used to silently ignore certain tokens such as
+whitespace, say.)
+
+A lexer is activated by calling lex__start/2, which returns a (unique)
+lexer state:
+
+	!:LexerState = lex__start(Lexer, !.IO)
+
+The lex__read/3 predicate searches for the next, longest match in the
+input stream and returns the corresponding token (or an error message
+if there is no immediate match in the input stream):
+
+	lex__read(Result, !LexerState),
+	(	Result = eof,				...
+	;	Result = ok(Token),			...
+	;	Result = error(Message, Offset),	...
+	)
+
+When lexical analysis is complete, the input source may be obtained 
+by calling lex__stop/1:
+
+	!:IO = lex__stop(!.LexerState)
+
+
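
For completeness, here is a sketch of how those pieces fit together in
a simple tokenising loop (it assumes the token type and lexer from the
example above, plus an import of the list module; the predicate name
is made up):

    :- pred gather_tokens(list(token)::out,
                lexer_state(token, io__state)::di,
                lexer_state(token, io__state)::uo) is det.

    gather_tokens(Tokens, !LexerState) :-
        lex__read(Result, !LexerState),
        (
            Result = eof,
            Tokens = []
        ;
            Result = ok(Token),
            gather_tokens(Tokens0, !LexerState),
            Tokens = [Token | Tokens0]
        ;
            Result = error(_Message, _Offset),
                % A real program would report the error; this sketch
                % just stops reading.
            Tokens = []
        ).
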
only in patch2:
--- README	4 Oct 2001 07:46:03 -0000	1.2
+++ README	28 Nov 2002 02:58:48 -0000
@@ -1,116 +1,10 @@
-lex 1.0 (very alpha)
-Fri Aug 25 17:54:28  2000
-Copyright (C) 2001 Ralph Becket <rbeck at microsoft.com>
-    THIS FILE IS HEREBY CONTRIBUTED TO THE MERCURY PROJECT TO
-    BE RELEASED UNDER WHATEVER LICENCE IS DEEMED APPROPRIATE
-    BY THE ADMINISTRATORS OF THE MERCURY PROJECT.
-Sun Aug  5 16:15:27 UTC 2001
-Copyright (C) 2001 The Rationalizer Intelligent Software AG
-    The changes made by Rationalizer are contributed under the terms 
-    of the GNU Free Documentation License - see the file COPYING.DOC 
-    in this directory.
+Two library modules, lex and regex, provide facilities for the
+construction of lexical analyzers and for string matching and
+search-and-replace functionality, respectively.
 
-This package defines a lexer for Mercury.  There is plenty of scope for
-optimization, however it is reasonably efficient and does provide the
-holy grail of piecemeal lexing of stdin (and strings, and lists, and
-...)
+README.lex provides a brief introduction to the lex module.
 
-The interface is simple.
-
-1. Import module lex.
-
-    :- import_module lex.
-
-2. Set up a token type.
-
-    :- type token
-        --->    comment
-        ;       id(string)
-        ;       num(int)
-        ;       space.
-
-3. Set up a list of annotated_lexemes.
-
-    Lexemes = [
-	( "%" ++ *(dot)        ->  return(comment) ),
-	( identifier           ->  (func(Id) = id(Id)) ),
-	( signed_int           ->  (func(N)  = num(string__det_to_int(N))) ),
-	( whitespace           ->  return(space) )
-    ]
-
-
-A lexeme is a (RegExp - TokFn) pair where RegExp is a regular expression
-and TokFn is a token_creator function mapping the string matched by
-RegExp to a token value.
-
-4. Set up a lexer with an appropriate read predicate (see the buf module).
-
-    Lexer = lex__init(Lexemes, lex__read_from_stdin)
-
-or:
-
-    Lexer = lex__init(Lexemes, lex__read_from_stdin, ignore(space))
-     
-5. Obtain a live lexer state.
-
-    State0 = lex__start(Lexer, IO0)
-
-6. Use it to lex the input stream.
-
-    lex__read(Result, State0, State1),
-    (	Result = ok(Token), ...
-    ;	Result = error(Message, OffsetInInputStream), ...
-    ;	Result = eof, ...
-    )
-
-    NOTE: The result type of lex__read is io__read_result(token).
-    io__read_result is documented in the library file io.m as:
-    :- type io__read_result(T)      --->    ok(T)
-                                    ;       eof
-                                    ;       error(string, int).
-                                            % error message, line number
-    In contrast to this the `int' lex returns in the case of an error
-    does not correspond to the line number but to the character offset.
-    Hence be careful when processing lex errors.
-
-7. If you need to manipulate the source object, you can.
-
-    lex__manipulate_source(io__print("Not finished yet?"), State1, State2)
-
-8. When you're done, retrieve the source object.
-
-    IO = lex__stop(State)
-
-And that's basically it.
-
-In future I plan to add several optimizations to the implementation and
-the option to write out a compilable source file for the lexer.
-
-
-OPPORTUNITIES FOR MODULARIZATION
-
-1. Remove regexp functionality from lex.m and lex.regexp.m and put it
-   into a distinct regexp library.
-
-
-OPPORTUNITIES FOR OPTIMIZATION
-
-1. Move from chars to bytes.
-2. Implement a byte_array rather than using a wasteful array(char) for
-   the input buffer.
-3. Implement the first-byte optimization whereby the set of `live
-   lexemes' is decided by the first byte read in on a lexing pass.
-4. Implement state machine minimization (may or may not be worthwhile.)
-
-
-FEATURES TO ADD:
-
-1. Symbol table management (additional parameters for the user-defined
-   predicates containing the symbol table before and after processing
-   a lexeme)
-2. func (string) = regexp, where the function parameter contains a
-   regexp definition in a form like used in languages in Perl, awk etc.
-3. line# as part of the offset
-4. extend the lexer interface somehow to get more detailed information
-   about the token resp. error position
+README.regex provides a brief introduction to the regex module.
 
+The Mmakefile includes instructions for building and installing these
+libraries.
only in patch2:
--- Mmakefile	6 Mar 2002 10:10:30 -0000	1.3
+++ Mmakefile	26 Nov 2002 09:01:46 -0000
@@ -7,36 +7,50 @@
 # To build, do the following:
 #
 # $ mmake depend
-# $ mmake
+# $ mmake all
+# $ mmake install
 #
-# And to install...
+# If you have problems, try the following instead:
 #
-# $ mmake install
+# $ mmake depend
+# $ mmake all
+# $ mmake liblex.install
+# $ mmake libregex.install
 
 # Omit this line if you want to install in the standard location.
 # Edit this line if you want to install the library elsewhere.
 # A directory $(INSTALL_PREFIX)/lib/mercury will be created, if
 # necessary, and everything put there.
 #
-INSTALL_PREFIX := $(INSTALL_PREFIX)/extras
-#INSTALL_PREFIX = $(HOME)/mercury
+#INSTALL_PREFIX := $(INSTALL_PREFIX)/extras
+INSTALL_PREFIX = .
 
 # Omit this line if you want to install the default grades.
 # Edit this line if you want to install with different grades.
 #
-#LIBGRADES = asm_fast.gc hlc.gc
+LIBGRADES = asm_fast.gc hlc.gc asm_fast.gc.tr.debug
 
-# Any application using this library will also need the following
+# Any application using these libraries will also need the following
 # in its Mmakefile:
 #
-#EXTRA_LIBRARIES = lex
+#EXTRA_LIBRARIES = lex regex
+#
+# and the following must be uncommented if the library was not installed
+# in the standard location (the RHS must match the value of INSTALL_PREFIX
+# used to install the library.)
+#
+#EXTRA_LIB_DIRS = $(INSTALL_PREFIX)/extras
+#EXTRA_LIB_DIRS = .
+
+MAIN_TARGET = all
+
+.PHONY: all depend install check
 
--include ../Mmake.params
+all: liblex libregex
 
-MAIN_TARGET = liblex
-depend: lex.depend
-install: $(MAIN_TARGET).install
+depend: lex.depend regex.depend
 
-check: liblex
-	true
+install: liblex.install libregex.install
 
+check: install
+	(cd tests; mmake check)