[m-rev.] For review: updates to parsing_utils.m

Paul Bone pbone at csse.unimelb.edu.au
Wed Jun 17 12:15:02 AEST 2009


On Tue, Jun 16, 2009 at 05:15:59PM +1000, Ralph Becket wrote:
> 
> Estimated hours taken: 2
> Branches: main
> 
> library/parsing_utils.m:
> 	Add support for parsers that update their own state (e.g., a symbol
> 	table).  Add support for user-defined whitespace specification (e.g.,
> 	to include comments).  Add support for converting offsets into line
> 	numbers (e.g., for error reporting).
> 		
> tests/general/test_parsing_utils.m:
> tests/general/test_parsing_utils.exp:
> 	Update the test case.
> 

Please mention that this fixes bug 96 and close the bug.

> Index: library/parsing_utils.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/parsing_utils.m,v
> retrieving revision 1.1
> diff -u -r1.1 parsing_utils.m
> --- library/parsing_utils.m	28 Jan 2009 07:19:41 -0000	1.1
> +++ library/parsing_utils.m	16 Jun 2009 04:30:17 -0000
> @@ -70,6 +89,21 @@
>  :- pred current_offset(src::in, int::out,
>          ps::in, ps::out) is det.
>  
> +    % Compute a structure from the parser source which can be used to
> +    % convert offsets into line numbers and positions in the file (this
> +    % is useful for error reporting).
> +    %
> +:- type line_numbers.
> +
> +:- func src_to_line_numbers(src) = line_numbers.
> +
> +    % Convert an offset into a line number and position within the line
> +    % (the first line is number 1; the first character in a line is
> +    % position 1).
> +    %
> +:- pred offset_to_line_number_and_position(line_numbers::in, int::in,
> +        int::out, int::out) is det.
> +
>      % input_substring(Src, StartOffset, EndOffsetPlusOne, Substring)
>      % Copy the substring from the input occupying the offsets
>      % [StartOffset, EndOffsetPlusOne).

Great!  I wanted one of those!

> @@ -91,7 +125,7 @@
>  :- pred punct(string::in, src::in, unit::out,
>          ps::in, ps::out) is semidet.
>  
> -    % keyword(Src, IdChars, Keyword, _, !PS) matches Keyword exactly (i.e., it
> +    % keyword(IdChars, Keyword, Src, _, !PS) matches Keyword exactly (i.e., it
>      % must not be followed by any character in IdChars) and any subsequent
>      % whitespace.
>      %

Please check that similar problems are fixed regarding the order of arguments
in comments (bug 96).

> +
> +:- pred offset_to_line_number_and_position_2(line_numbers::in, int::in,
> +        int::in, int::in, int::out, int::out) is det.
> +
> +    % Perform a binary search looking for the offset of the line number
> +    % of the line containing Offset.
> +    %
> +offset_to_line_number_and_position_2(LineNos, Lo, Hi, Offset, LineNo, Pos) :-
> +    ( if Lo < Hi then
> +        Mid = (Lo + Hi) / 2,
> +        MidOffset = LineNos ^ elem(Mid),
> +        ( if MidOffset < Offset then
> +            offset_to_line_number_and_position_2(LineNos, Mid + 1, Hi, Offset,
> +                LineNo, Pos)
> +          else
> +            offset_to_line_number_and_position_2(LineNos, Lo, Mid, Offset,
> +                LineNo, Pos)
> +        )
> +      else
> +        LoOffset = LineNos ^ elem(Lo),
> +        LineNo = 1 + Lo,
> +        Pos = 1 + Offset - LoOffset
> +    ).
> +
> +%-----------------------------------------------------------------------------%
> +

It might be useful to put this binary search code into array.m.  It's
feasible that other developers might want to perform a similar binary
search.

> +%-----------------------------------------------------------------------------%
> +
>  whitespace(Src, unit, !PS) :-
>      ( if
>          next_char(Src, C, !PS),
>          char.is_whitespace(C)
>        then
> -        whitespace(Src, _, !PS)
> +        skip_whitespace(Src, !PS)
>        else
> -        true
> +        semidet_true
>      ).

This does not allow a programmer to customize how whitespace begins.  If
a programmer defines their whitespace function to include
char.is_whitespace plus the question mark symbol (a poor example), a
whitespace section beginning with a question mark symbol will not be
recognized by whitespace/4 But the behavior is different when one of the
builtin parsers such as punct/5 calls skip_whitespace internally.  I
suggest either fixing this so that whitespace/4 behaves predictably or
documenting it's behavior in the interface section.

Thanks, the rest is good.  Thanks for making these changes.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20090617/57e78d7f/attachment.sig>


More information about the reviews mailing list