[m-rev.] for review: make parsing tail recursive
Paul Bone
pbone at csse.unimelb.edu.au
Wed May 7 12:04:13 AEST 2008
On Wed, May 07, 2008 at 10:41:05AM +1000, Peter Wang wrote:
> Estimated hours taken: 0.5
> Branches: main
>
> Avoid running out of stack space when parsing big terms.
>
> library/Mercury.options:
> Enable last-call-modulo-cons optimisation to make
> `lexer.get_token_list_2' tail recursive.
>
> library/parser.m:
> Make `check_for_bad_token' tail recursive.
>
> Index: library/Mercury.options
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/Mercury.options,v
> retrieving revision 1.29
> diff -u -r1.29 Mercury.options
> --- library/Mercury.options 24 Jan 2008 03:50:46 -0000 1.29
> +++ library/Mercury.options 7 May 2008 00:37:32 -0000
> @@ -16,6 +16,7 @@
> MCFLAGS-set += $(TYPE_SPEC_FLAGS)
> MCFLAGS-set_ordlist += $(TYPE_SPEC_FLAGS)
>
> +MCFLAGS-lexer += --optimize-constructor-last-call
> MCFLAGS-sparse_bitset += --use-atomic-cells --optimize-constant-propagation
> MCFLAGS-tree_bitset += --use-atomic-cells --optimize-constant-propagation
>
> Index: library/parser.m
> ===================================================================
> RCS file: /home/mercury/mercury1/repository/mercury/library/parser.m,v
> retrieving revision 1.58
> diff -u -r1.58 parser.m
> --- library/parser.m 3 Apr 2008 05:26:47 -0000 1.58
> +++ library/parser.m 7 May 2008 00:37:32 -0000
> @@ -251,7 +251,13 @@
>
> :- pred check_for_bad_token(token_list::in, string::out, int::out) is semidet.
>
> -check_for_bad_token(token_cons(Token, LineNum, Tokens), Message, LineNum) :-
> +check_for_bad_token(Token, Message, LineNum) :-
> + check_for_bad_token_2(Token, Message),
> + Token = token_cons(_, LineNum, _).
> +
> +:- pred check_for_bad_token_2(token_list::in, string::out) is semidet.
> +
> +check_for_bad_token_2(token_cons(Token, _LineNum, Tokens), Message) :-
> ( Token = io_error(IO_Error) ->
> io.error_message(IO_Error, IO_ErrorMessage),
> string.append("I/O error: ", IO_ErrorMessage, Message)
> @@ -264,7 +270,7 @@
> ; Token = error(ErrorMessage) ->
> string.append("Syntax error: ", ErrorMessage, Message)
> ;
> - check_for_bad_token(Tokens, Message, LineNum)
> + check_for_bad_token_2(Tokens, Message)
> ).
>
This does not return the correct LineNum for the bad token, rather it
returns the list head's LineNum.
The original code seems to have the same problem (if it is a problem).
:- pred check_for_bad_token(token_list::in, string::out, int::out) is semidet.
^^^^^^^
This may
leave the
error
undetected.
check_for_bad_token(token_cons(Token, LineNum, Tokens), Message, LineNum) :-
^^^^^^^ This and this ^^^^^^^ are
unified in all cases, They should
probably be only unified in the last case
at the end of the predicate.
( Token = io_error(IO_Error) ->
io.error_message(IO_Error, IO_ErrorMessage),
string.append("I/O error: ", IO_ErrorMessage, Message)
; Token = junk(Char) ->
char.to_int(Char, Code),
string.int_to_base_string(Code, 10, Decimal),
string.int_to_base_string(Code, 16, Hex),
string.append_list(["Syntax error: Illegal character 0x", Hex,
" (", Decimal, ") in input"], Message)
; Token = error(ErrorMessage) ->
string.append("Syntax error: ", ErrorMessage, Message)
;
check_for_bad_token(Tokens, Message, LineNum)
^^^^^^^ It's unified here too,
when LineNum is unified here it probably shouldn't be unified against
the line number in the first argument.
).
Perhaps using a switch over all the function symbols that Token may be can help as well.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to: mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions: mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------
More information about the reviews
mailing list