[m-dev.] regular expressions

Ralph Becket rbeck at microsoft.com
Mon Dec 20 23:58:20 AEDT 1999


I've been thinking about this and it occurs to me that one wants to use
regular expressions for several tasks, including
* simply testing whether a sequence matches a regex,
* finding the prefix(es)/subsequence(s) of a sequence matching a regex,
* processing such matches, and
* performing substitutions (arguably a special case of the above).

Languages that make good use of regular expressions provide the machinery
for using the same basic regex for all of the above purposes, which suggests
to me that what we want is either a suite of regex interpreters (cf. Fergus'
post) or higher order regex compilers.

Suggestion: extend the regex type to include a yacc-like function annotation
to regexes...

:- type regex(seq(T1), T2)
	--->	empty
	;	atom(T1)
	;	sequence(regex(seq(T1), T2), regex(seq(T1), T2)))
	;	alternation(list(regex(seq(T1), T2)))
	;	apply(regex(seq(T1), T2), func(T2, seq(T1)) = T2).

So one could write

digit_regex = alternation([atom(0), atom(1), ..., atom(9)]).

nat_regex = kleene_plus(digit_regex, 0, nat_next_digit).

Where

kleene_plus(RegEx, Base, Fn) =
	sequence(
		apply(RegEx, Fn(Base)),		* -- see below
		kleene_star(RegEx, Fn)
	).

kleene_star(RegEx, Fn) =
	alternation([
		empty
	|	sequence(
			apply(RegEx, Fn),
			kleene_star(RegEx, Fn)
		)
	]).

* -- I've cheated here (which I'm allowed to do because I'm just trying
to knock in idea together) to get the starting value into the
computation.  Hmm, maybe I need to add an extra regex constructor of
the form
	apply(regex(seq(T1), T2), func(seq(T1)) = T2)

The above works on an item-at-a-time basis, but one could equally
write

nat_regex = apply(kleene_plus(digit_regex), char_list_to_int)

where

char_list_to_int(Chars) =
	string__to_int(list__condense(Chars)).

One would write non-processing parsers that would simply ignore the second
argument to apply/2 and processing parsers that would keep track of 
the accumulator state and execute the second argument to apply as 
necessary.

A few details need to be worked out, but I don't think this is entirely
mad.

Ralph
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list