[m-rev.] for review: make parsing tail recursive

Paul Bone pbone at csse.unimelb.edu.au
Wed May 7 12:04:13 AEST 2008


On Wed, May 07, 2008 at 10:41:05AM +1000, Peter Wang wrote:
> Estimated hours taken: 0.5
> Branches: main
> 
> Avoid running out of stack space when parsing big terms.
> 
> library/Mercury.options:
> 	Enable last-call-modulo-cons optimisation to make
> 	`lexer.get_token_list_2' tail recursive.
> 
> library/parser.m:
> 	Make `check_for_bad_token' tail recursive.
> 
> Index: library/Mercury.options
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/Mercury.options,v
> retrieving revision 1.29
> diff -u -r1.29 Mercury.options
> --- library/Mercury.options	24 Jan 2008 03:50:46 -0000	1.29
> +++ library/Mercury.options	7 May 2008 00:37:32 -0000
> @@ -16,6 +16,7 @@
>  MCFLAGS-set += $(TYPE_SPEC_FLAGS)
>  MCFLAGS-set_ordlist += $(TYPE_SPEC_FLAGS)
>  
> +MCFLAGS-lexer += --optimize-constructor-last-call
>  MCFLAGS-sparse_bitset += --use-atomic-cells --optimize-constant-propagation
>  MCFLAGS-tree_bitset += --use-atomic-cells --optimize-constant-propagation
>  
> Index: library/parser.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/parser.m,v
> retrieving revision 1.58
> diff -u -r1.58 parser.m
> --- library/parser.m	3 Apr 2008 05:26:47 -0000	1.58
> +++ library/parser.m	7 May 2008 00:37:32 -0000
> @@ -251,7 +251,13 @@
>  
>  :- pred check_for_bad_token(token_list::in, string::out, int::out) is semidet.
>  
> -check_for_bad_token(token_cons(Token, LineNum, Tokens), Message, LineNum) :-
> +check_for_bad_token(Token, Message, LineNum) :-
> +    check_for_bad_token_2(Token, Message),
> +    Token = token_cons(_, LineNum, _).
> +
> +:- pred check_for_bad_token_2(token_list::in, string::out) is semidet.
> +
> +check_for_bad_token_2(token_cons(Token, _LineNum, Tokens), Message) :-
>      ( Token = io_error(IO_Error) ->
>          io.error_message(IO_Error, IO_ErrorMessage),
>          string.append("I/O error: ", IO_ErrorMessage, Message)
> @@ -264,7 +270,7 @@
>      ; Token = error(ErrorMessage) ->
>          string.append("Syntax error: ", ErrorMessage, Message)
>      ;
> -        check_for_bad_token(Tokens, Message, LineNum)
> +        check_for_bad_token_2(Tokens, Message)
>      ).
>  

This does not return the correct LineNum for the bad token, rather it
returns the list head's LineNum.

The original code seems to have the same problem (if it is a problem).

:- pred check_for_bad_token(token_list::in, string::out, int::out) is semidet.
                                                                      ^^^^^^^
                                                                      This may
                                                                      leave the
                                                                      error
                                                                      undetected.

check_for_bad_token(token_cons(Token, LineNum, Tokens), Message, LineNum) :-
                                      ^^^^^^^ This and this      ^^^^^^^ are
                                      unified in all cases, They should
                                      probably be only unified in the last case
                                      at the end of the predicate.
    ( Token = io_error(IO_Error) ->
        io.error_message(IO_Error, IO_ErrorMessage),
        string.append("I/O error: ", IO_ErrorMessage, Message)
    ; Token = junk(Char) -> 
        char.to_int(Char, Code), 
        string.int_to_base_string(Code, 10, Decimal),
        string.int_to_base_string(Code, 16, Hex),
        string.append_list(["Syntax error: Illegal character 0x", Hex,
            " (", Decimal, ") in input"], Message)
    ; Token = error(ErrorMessage) ->
        string.append("Syntax error: ", ErrorMessage, Message)
    ;
        check_for_bad_token(Tokens, Message, LineNum)
                                             ^^^^^^^ It's unified here too,
        when LineNum is unified here it probably shouldn't be unified against
        the line number in the first argument.
    ).  

Perhaps using a switch over all the function symbols that Token may be can help as well.


--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list