[m-rev.] For review: updates to parsing_utils.m
Paul Bone
pbone at csse.unimelb.edu.au
Wed Jun 17 12:15:02 AEST 2009
On Tue, Jun 16, 2009 at 05:15:59PM +1000, Ralph Becket wrote:
>
> Estimated hours taken: 2
> Branches: main
>
> library/parsing_utils.m:
> Add support for parsers that update their own state (e.g., a symbol
> table). Add support for user-defined whitespace specification (e.g.,
> to include comments). Add support for converting offsets into line
> numbers (e.g., for error reporting).
>
> tests/general/test_parsing_utils.m:
> tests/general/test_parsing_utils.exp:
> Update the test case.
>
Please mention that this fixes bug 96 and close the bug.
> Index: library/parsing_utils.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/parsing_utils.m,v
> retrieving revision 1.1
> diff -u -r1.1 parsing_utils.m
> --- library/parsing_utils.m 28 Jan 2009 07:19:41 -0000 1.1
> +++ library/parsing_utils.m 16 Jun 2009 04:30:17 -0000
> @@ -70,6 +89,21 @@
> :- pred current_offset(src::in, int::out,
> ps::in, ps::out) is det.
>
> + % Compute a structure from the parser source which can be used to
> + % convert offsets into line numbers and positions in the file (this
> + % is useful for error reporting).
> + %
> +:- type line_numbers.
> +
> +:- func src_to_line_numbers(src) = line_numbers.
> +
> + % Convert an offset into a line number and position within the line
> + % (the first line is number 1; the first character in a line is
> + % position 1).
> + %
> +:- pred offset_to_line_number_and_position(line_numbers::in, int::in,
> + int::out, int::out) is det.
> +
> % input_substring(Src, StartOffset, EndOffsetPlusOne, Substring)
> % Copy the substring from the input occupying the offsets
> % [StartOffset, EndOffsetPlusOne).
Great! I wanted one of those!
> @@ -91,7 +125,7 @@
> :- pred punct(string::in, src::in, unit::out,
> ps::in, ps::out) is semidet.
>
> - % keyword(Src, IdChars, Keyword, _, !PS) matches Keyword exactly (i.e., it
> + % keyword(IdChars, Keyword, Src, _, !PS) matches Keyword exactly (i.e., it
> % must not be followed by any character in IdChars) and any subsequent
> % whitespace.
> %
Please check that similar problems are fixed regarding the order of arguments
in comments (bug 96).
> +
> +:- pred offset_to_line_number_and_position_2(line_numbers::in, int::in,
> + int::in, int::in, int::out, int::out) is det.
> +
> + % Perform a binary search looking for the offset of the line number
> + % of the line containing Offset.
> + %
> +offset_to_line_number_and_position_2(LineNos, Lo, Hi, Offset, LineNo, Pos) :-
> + ( if Lo < Hi then
> + Mid = (Lo + Hi) / 2,
> + MidOffset = LineNos ^ elem(Mid),
> + ( if MidOffset < Offset then
> + offset_to_line_number_and_position_2(LineNos, Mid + 1, Hi, Offset,
> + LineNo, Pos)
> + else
> + offset_to_line_number_and_position_2(LineNos, Lo, Mid, Offset,
> + LineNo, Pos)
> + )
> + else
> + LoOffset = LineNos ^ elem(Lo),
> + LineNo = 1 + Lo,
> + Pos = 1 + Offset - LoOffset
> + ).
> +
> +%-----------------------------------------------------------------------------%
> +
It might be useful to put this binary search code into array.m. It's
feasible that other developers might want to perform a similar binary
search.
> +%-----------------------------------------------------------------------------%
> +
> whitespace(Src, unit, !PS) :-
> ( if
> next_char(Src, C, !PS),
> char.is_whitespace(C)
> then
> - whitespace(Src, _, !PS)
> + skip_whitespace(Src, !PS)
> else
> - true
> + semidet_true
> ).
This does not allow a programmer to customize how whitespace begins. If
a programmer defines their whitespace function to include
char.is_whitespace plus the question mark symbol (a poor example), a
whitespace section beginning with a question mark symbol will not be
recognized by whitespace/4 But the behavior is different when one of the
builtin parsers such as punct/5 calls skip_whitespace internally. I
suggest either fixing this so that whitespace/4 behaves predictably or
documenting it's behavior in the interface section.
Thanks, the rest is good. Thanks for making these changes.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 489 bytes
Desc: Digital signature
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20090617/57e78d7f/attachment.sig>
More information about the reviews
mailing list