[m-rev.] add character ranges to extras/lex

Paul Bone paul at bone.id.au
Fri Feb 21 12:38:09 AEDT 2014


On Sat, Feb 15, 2014 at 08:30:32PM +0100, Sebastian Godelet wrote:
> For review by anyone.
> 
> To facilitate easier lexeme definition,
> add a new range/2 function which works as a simple character class like in
> Perl regular expressions.
> For example: range('a', 'f') = any("abcdef").
> 
> extras/lex/lex.m:
>    adds func range(char, char) = regexp.
> 
> extras/lex/samples/lex_demo.m
>    adds a word recognizer just before the "junk" lexeme.

I can't find the changes to lex_demo.m in your attached patch.

> 
> I hope you find this useful.
> If my changes get approved in some form (this is my first contribution)
> I'd happily enhance the basic lexer to become more expressive and powerful.
> I was thinking of range/3: range(From, To, Exclude), where Exclude has type set(char).

Thanks Sebastian,

These changes are good in principle, although I cannot yet review
lex_demo.m.

If you want more flexibility it looks like the existing code could be made
more flexible as well.  One idea is to create a new version of the any/1
function which takes a list.

    :- func any_list(list(char)) = regexp.

Then your range example with the exclude list can be written easily:

    any_list(filter(not_in(Exclude), char_range(From, To)))

Of course now you need a predicate not_in/2 (partially applied to Exclude
above) and a function char_range/2.  But those should be simple.

Many of the functions and predicates in the list module can then be used to
describe sets of characters.
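
A rough sketch of how those pieces could fit together, for illustration only.
The names any_list, char_range and not_in come from the suggestion above; their
signatures, and the idea of building any_list on top of the existing any/1, are
assumptions rather than anything present in lex.m or in the patch.

```mercury
% Hypothetical additions to extras/lex/lex.m; not part of the patch.
% Assumes lex.m imports the char, int, list, set and string modules.

:- func any_list(list(char)) = regexp.

any_list(Chars) = any(string.from_char_list(Chars)).

    % The characters with codes From..To inclusive, or [] if From > To.
    % char.det_from_int throws if a code in between is not a valid char.
:- func char_range(char, char) = list(char).

char_range(From, To) =
    list.map((func(Code) = char.det_from_int(Code)),
        char.to_int(From) .. char.to_int(To)).

    % Succeeds iff Char is not a member of Exclude.
:- pred not_in(set(char)::in, char::in) is semidet.

not_in(Exclude, Char) :-
    not set.member(Char, Exclude).

:- func range(char, char, set(char)) = regexp.

range(From, To, Exclude) =
    any_list(list.filter(not_in(Exclude), char_range(From, To))).
```

With something like this in place, range('a', 'z', set.from_list(['e', 'x']))
would describe the lowercase letters other than 'e' and 'x', and any other
list function (append, delete_all, and so on) can be used to build the
character list before handing it to any_list.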


> +range(S, E) = R :-
> +    char.to_int(S, Si),
> +    char.to_int(E, Ei),
> +    ( if Si < Ei then
> +        R = build_range(Si + 1, Ei, re(S))
> +      else if Si = Ei then
> +        R = re(S)
> +      else
> +        R = null
> +    ).
> +
> +:- func build_range(int, int, regexp) = regexp.
> +
> +build_range(S, E, R0) = R :-
> +    ( if S < E then
> +        char.det_from_int(S, C),
> +        R1 = (R0 or re(C)),
> +        R = build_range(S + 1, E, R1)
> +      else if S = E then
> +        R = R0
> +      else
> +        throw(exception.software_error("invalid range!"))
> +    ).
> +

Try to use more meaningful variable names: rather than S, E and C, call these
Start, End and Char.  I was able to work this out by looking at your code,
but you can avoid many misunderstandings with well-written code.

We also have some useful exception throwing functions in the module require.
error($file, $pred, "invalid range") will throw a software error exception
that describes the location of the error.
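
As a sketch of both suggestions applied to the code above (longer names and an
exception from require), something like the following might do.  It assumes
that unexpected/3 is available in the require module and takes the file name,
predicate name and message; StartInt, EndInt, Next and Regexp are illustrative
names.  It also differs from the patch in one respect: in the patch the S = E
branch returns R0 without adding the last character, so range('a', 'f')
appears to stop at 'e', whereas this version uses =< so that End is included.

```mercury
% Sketch only; assumes lex.m also imports the require module.

range(Start, End) = Regexp :-
    char.to_int(Start, StartInt),
    char.to_int(End, EndInt),
    ( if StartInt =< EndInt then
        % Start itself seeds the regexp; the rest are or-ed on one by one.
        Regexp = build_range(StartInt + 1, EndInt, re(Start))
      else
        Regexp = null
    ).

:- func build_range(int, int, regexp) = regexp.

build_range(Next, EndInt, Regexp0) = Regexp :-
    ( if Next =< EndInt then
        char.det_from_int(Next, Char),
        Regexp = build_range(Next + 1, EndInt, (Regexp0 or re(Char)))
      else if Next = EndInt + 1 then
        % Every character up to and including End has been added.
        Regexp = Regexp0
      else
        % Not reachable from range/2; reports file and predicate if it is.
        unexpected($file, $pred, "invalid range")
    ).
```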

>  %-----------------------------------------------------------------------------%
>  % Some useful single-char regexps.
> 
> @@ -768,13 +793,13 @@ alpha      = (lower or upper).
>  alphanum   = (alpha or digit).
>  identstart = (alpha or ('_')).
>  ident      = (alphanum or ('_')).
> -nl         = re('\n').
>  tab        = re('\t').
>  spc        = re(' ').
> 
>  %-----------------------------------------------------------------------------%
>  % Some useful compound regexps.
> 
> +nl         = (?('\r') ++ '\n').  % matches both Posix and Windows newline.
>  nat        = +(digit).
>  signed_int = ?("+" or "-") ++ nat.
>  real       = signed_int ++ (

Good idea.

> diff --git a/extras/lex/samples/lex_demo.m b/extras/lex/samples/lex_demo.m
> index 6d30ac2..68aef0d 100644
> --- a/extras/lex/samples/lex_demo.m
> +++ b/extras/lex/samples/lex_demo.m

The changes to this file seem to be missing.


-- 
Paul Bone


