for review: add code to read terms from strings
Fergus Henderson
fjh at cs.mu.OZ.AU
Wed May 13 05:51:43 AEST 1998
Hi,
Can someone -- how about Tom, or perhaps Mark Brown (since we
had that long discussion about this!) -- please review this one?
Estimated hours taken: 24
Add code to parse terms from strings rather than from streams.
The original intention for this change was twofold:
to increase expressiveness and to improve efficiency.
However, for the moment I have given up on the goal of
improving efficiency. See the comments in lexer.m.
library/io.m:
Add io__read_file_as_string/{4,5}, for efficiently
reading a whole file into a single string.
Add io__read_from_string/5, for reading terms of
any type from a string.
(Should that be named string__read instead?)
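For reviewers, here's a rough sketch (not part of the diff;
the predicate and file names are invented) of how the two new
predicates are intended to be used together:

	:- pred read_one_term(io__read_result(int),
			io__state, io__state).
	:- mode read_one_term(out, di, uo) is det.

	read_one_term(TermResult) -->
		io__read_file_as_string(FileResult, String),
		(
			{ FileResult = ok },
			{ string__length(String, Len) },
			{ io__read_from_string("<input>", String, Len,
				TermResult, posn(1, 0, 0), _Posn) }
		;
			{ FileResult = error(_) },
			{ TermResult = error(
				"error reading input file", 0) }
		).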
library/string.m:
Implement string__substring/4 more efficiently,
using `pragma c_code' rather than by calling
string__right(string__left(...)).
Export string__unsafe_index/3, and add new exported
predicate string__unsafe_substring/4 -- these
are versions of string__index and string__substring
that don't check for range errors. They are
needed to get reasonable efficiency when dealing
with very large strings.
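(A sketch, not in the diff, of the intended usage pattern:
the caller does one explicit range check and then uses the
unsafe version -- this is essentially what the new
lexer__string_read_char does.)

	:- pred checked_index(string, int, int, char).
	:- mode checked_index(in, in, in, out) is semidet.

		% checked_index(String, Len, Index, Char):
		% Len must be the length of String; fails
		% (rather than aborting) if Index is out of range.
	checked_index(String, Len, Index, Char) :-
		Index >= 0,
		Index < Len,
		string__unsafe_index(String, Index, Char).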
library/string.nu.nl:
Add Prolog implementations of string__substring
and string__unsafe_substring.
library/lexer.m:
Add lexer__string_get_token_list/{5,6}, for parsing
tokens from a string. This required adding
`string_' versions of most of the lexical analysis routines.
Lots of code duplication, I'm afraid. Also the
string versions are still not as efficient as they
could ideally be.
library/parser.m:
Add new predicates parser__read_term_from_string/{4,6}
and parser__parse_tokens/3.
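For example (sketch only, not in the diff), the call

	parser__read_term_from_string("example.m",
		"foo(X, 42). ", EndPos, Result)

should bind Result to term(VarSet, Term) where Term represents
foo(X, 42), and EndPos to the position just past the end token.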
compiler/bytecode_gen.m:
compiler/error_util.m:
compiler/fact_table.m:
compiler/term_errors.m:
Minor changes to use term__context as an ADT rather than
as a concrete data type.
In addition, I made the following change,
which I do NOT plan to commit (at least not yet).
compiler/prog_io.m:
Change to use io__read_file_as_string and
parser__read_term_from_string.
(Not committed because the existing method is
currently more efficient.)
cvs server: Diffing .
Index: io.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/io.m,v
retrieving revision 1.153
diff -u -u -r1.153 io.m
--- io.m 1998/03/12 19:45:38 1.153
+++ io.m 1998/05/12 19:37:10
@@ -120,6 +120,12 @@
% Reads all the characters from the current input stream until
% eof or error.
+:- pred io__read_file_as_string(io__res, string, io__state, io__state).
+:- mode io__read_file_as_string(out, out, di, uo) is det.
+% Reads all the characters from the current input stream until
+% eof or error. Returns the result as a string rather than
+% as a list of char.
+
:- pred io__putback_char(char, io__state, io__state).
:- mode io__putback_char(in, di, uo) is det.
% Un-reads a character from the current input stream.
@@ -147,6 +153,13 @@
% Reads all the characters from the given input stream until
% eof or error.
+:- pred io__read_file_as_string(io__input_stream, io__res, string,
+ io__state, io__state).
+:- mode io__read_file_as_string(in, out, out, di, uo) is det.
+% Reads all the characters from the given input stream until
+% eof or error. Returns the result as a string rather than
+% as a list of char.
+
:- pred io__putback_char(io__input_stream, char, io__state, io__state).
:- mode io__putback_char(in, in, di, uo) is det.
% Un-reads a character from specified stream.
@@ -172,6 +185,29 @@
% or if encounters an I/O error, then it returns
% `error(Message, LineNumber)'.
+% The type `posn' represents a position within a string.
+:- type posn
+ ---> posn(int, int, int).
+ % line number, offset for start of line, current offset
+ % (the first two are used only for the purposes of
+ % computing term_contexts, for use e.g. in error messages).
+
+:- pred io__read_from_string(string, string, int, io__read_result(T), posn, posn).
+:- mode io__read_from_string(in, in, in, out, in, out) is det.
+% io__read_from_string(FileName, String, MaxPos, Result, Posn0, Posn):
+% Same as io__read/4 except that it reads from
+% a string rather than from a stream.
+% FileName is the name of the source (for use in error messages).
+% String is the string to be parsed.
+% Posn0 is the position to start parsing from.
+% Posn is the position one character past the end of the term read.
+% MaxPos is the offset in the string which should be
+% considered the end-of-stream -- this is the upper bound
+% for Posn. (In the usual case, MaxPos is just the length
+% of the String.)
+% WARNING: if MaxPos > length of String then the behaviour
+% is UNDEFINED.
+
:- pred io__ignore_whitespace(io__result, io__state, io__state).
:- mode io__ignore_whitespace(out, di, uo) is det.
% Discards all the whitespace from the current stream.
@@ -957,7 +993,7 @@
:- implementation.
:- import_module map, dir, term, term_io, varset, require, benchmarking, array.
-:- import_module int, std_util.
+:- import_module int, std_util, parser.
:- type io__state ---> io__state(c_pointer).
% Values of type `io__state' are never really used:
@@ -1190,6 +1226,218 @@
io__read_file_2(Stream, [Char|Chars0], Result)
).
+%-----------------------------------------------------------------------------%
+
+io__read_file_as_string(Result, String) -->
+ io__input_stream(Stream),
+ io__read_file_as_string(Stream, Result, String).
+
+io__read_file_as_string(Stream, Result, String) -->
+ %
+ % check if the stream is a regular file;
+ % if so, allocate a buffer according to the
+ % size of the file. Otherwise, just use
+ % a default buffer size of 4k minus a bit
+ % (to give malloc some room).
+ %
+ io__stream_file_size(Stream, FileSize),
+ { FileSize >= 0 ->
+ BufferSize0 is FileSize + 1
+ ;
+ BufferSize0 = 4000
+ },
+ { io__alloc_buffer(BufferSize0, Buffer0) },
+
+ %
+ % Read the file into the buffer (resizing it as we go if necessary),
+ % convert the buffer into a string, and see if anything went wrong.
+ %
+ io__clear_err(Stream),
+ { Pos0 = 0 },
+ io__read_file_as_string_2(Stream, Buffer0, Pos0, BufferSize0,
+ Buffer, Pos, BufferSize),
+ { require(Pos < BufferSize, "io__read_file_as_string: overflow") },
+ { io__buffer_to_string(Buffer, Pos, String) },
+ io__check_err(Stream, Result).
+
+:- pred io__read_file_as_string_2(io__input_stream, buffer, int, int,
+ buffer, int, int, io__state, io__state).
+:- mode io__read_file_as_string_2(in, di, in, in,
+ uo, out, out, di, uo) is det.
+
+io__read_file_as_string_2(Stream, Buffer0, Pos0, Size0, Buffer, Pos, Size) -->
+ io__read_into_buffer(Stream, Buffer0, Pos0, Size0,
+ Buffer1, Pos1),
+ ( { Pos1 = Pos0 } ->
+ % end of file (or error)
+ { Size = Size0 },
+ { Pos = Pos1 },
+ { Buffer = Buffer1 }
+ ; { Pos1 = Size0 } ->
+ % full buffer
+ { Size1 is Size0 * 2 },
+ { io__resize_buffer(Buffer1, Size0, Size1, Buffer2) },
+ io__read_file_as_string_2(Stream, Buffer2, Pos1, Size1,
+ Buffer, Pos, Size)
+ ;
+ io__read_file_as_string_2(Stream, Buffer1, Pos1, Size0,
+ Buffer, Pos, Size)
+ ).
+
+%-----------------------------------------------------------------------------%
+
+:- pred io__clear_err(stream, io__state, io__state).
+:- mode io__clear_err(in, di, uo) is det.
+% same as ANSI C's clearerr().
+
+:- pragma c_code(io__clear_err(Stream::in, _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ clearerr(f->file);
+}").
+
+:- pred io__check_err(stream, io__res, io__state, io__state).
+:- mode io__check_err(in, out, di, uo) is det.
+
+io__check_err(Stream, Res) -->
+ io__ferror(Stream, Int, Msg),
+ { Int = 0 ->
+ Res = ok
+ ;
+ Res = error(Msg)
+ }.
+
+:- pred io__ferror(stream, int, string, io__state, io__state).
+:- mode io__ferror(in, out, out, di, uo) is det.
+% similar to ANSI C's ferror().
+
+:- pragma c_code(io__ferror(Stream::in, RetVal::out, RetStr::out,
+ _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ RetVal = ferror(f->file);
+ ML_maybe_make_err_msg(RetVal != 0, ""read failed: "", RetStr);
+}").
+
+%-----------------------------------------------------------------------------%
+
+:- pred io__stream_file_size(stream, int, io__state, io__state).
+:- mode io__stream_file_size(in, out, di, uo) is det.
+% io__stream_file_size(Stream, Size):
+% if Stream is a regular file, then Size is its size,
+% otherwise Size is -1.
+
+:- pragma c_header_code("
+ #include <unistd.h>
+ #include <sys/stat.h>
+
+ /*
+ ** in case some non-POSIX implementation doesn't have S_ISREG(),
+ ** define it to always fail
+ */
+ #ifndef S_ISREG
+ #define S_ISREG(x) FALSE
+ #endif
+").
+
+:- pragma c_code(io__stream_file_size(Stream::in, Size::out,
+ _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ struct stat s;
+ if (fstat(fileno(f->file), &s) == 0 && S_ISREG(s.st_mode)) {
+ Size = s.st_size;
+ } else {
+ Size = -1;
+ }
+}").
+
+%-----------------------------------------------------------------------------%
+
+% A `buffer' is just an array of Chars.
+% Buffer sizes are measured in Chars.
+
+:- type buffer ---> buffer(c_pointer).
+
+:- pred io__alloc_buffer(int::in, buffer::uo) is det.
+:- pragma c_code(io__alloc_buffer(Size::in, Buffer::uo),
+ will_not_call_mercury,
+"{
+ incr_hp_atomic(Buffer,
+ (Size * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+}").
+
+:- pred io__resize_buffer(buffer::di, int::in, int::in, buffer::uo) is det.
+:- pragma c_code(io__resize_buffer(Buffer0::di, OldSize::in, NewSize::in,
+ Buffer::uo),
+ will_not_call_mercury,
+"{
+ Char *buffer0 = (Char *) Buffer0;
+ Char *buffer;
+
+#ifdef CONSERVATIVE_GC
+ buffer = GC_REALLOC(buffer0, NewSize * sizeof(Char));
+#else
+ if (buffer0 + OldSize == (Char *) MR_hp) {
+ Word next;
+ incr_hp_atomic(next,
+ (NewSize * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+	    assert(buffer0 + OldSize == (Char *) next);
+ buffer = buffer0;
+ } else {
+ /* just have to alloc and copy */
+ incr_hp_atomic(Buffer,
+ (NewSize * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+	    buffer = (Char *) Buffer;
+	    if (OldSize > NewSize) {
+		memcpy(buffer, buffer0, NewSize * sizeof(Char));
+	    } else {
+		memcpy(buffer, buffer0, OldSize * sizeof(Char));
+	    }
+ }
+#endif
+
+ Buffer = (Word) buffer;
+}").
+
+:- pred io__buffer_to_string(buffer::di, int::in, string::uo) is det.
+:- pragma c_code(io__buffer_to_string(Buffer::di, Len::in, Str::uo),
+ will_not_call_mercury,
+"{
+ Str = (String) Buffer;
+ Str[Len] = '\\0';
+}").
+
+:- pred io__buffer_to_string(buffer::di, string::uo) is det.
+:- pragma c_code(io__buffer_to_string(Buffer::di, Str::uo),
+ will_not_call_mercury,
+"{
+ Str = (String) Buffer;
+}").
+
+:- pred io__read_into_buffer(stream::in, buffer::di, int::in, int::in,
+ buffer::uo, int::out, io__state::di, io__state::uo) is det.
+
+:- pragma c_code(io__read_into_buffer(Stream::in,
+ Buffer0::di, Pos0::in, Size::in,
+ Buffer::uo, Pos::out, _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ char *buffer = (Char *) Buffer0;
+ int items_read;
+
+ items_read = fread(buffer + Pos0, sizeof(Char), Size - Pos0, f->file);
+
+ Buffer = (Word) buffer;
+ Pos = Pos0 + items_read;
+}").
+
+%-----------------------------------------------------------------------------%
+
io__read_binary_file(Result) -->
io__binary_input_stream(Stream),
io__read_binary_file(Stream, Result).
@@ -1228,24 +1476,38 @@
io__read(Result) -->
term_io__read_term(ReadResult),
+ io__get_line_number(LineNumber),
+ { io__process_read_term(ReadResult, LineNumber, Result) }.
+
+io__read_from_string(FileName, String, Len, Result, Posn0, Posn) :-
+ parser__read_term_from_string(FileName, String, Len, Posn0, Posn, ReadResult),
+ Posn = posn(LineNumber, _, _),
+ io__process_read_term(ReadResult, LineNumber, Result).
+
+:- pred io__process_read_term(read_term, int, io__read_result(T)).
+:- mode io__process_read_term(in, in, out) is det.
+
+io__process_read_term(ReadResult, LineNumber, Result) :-
(
- { ReadResult = term(_VarSet, Term) },
- ( { term_to_type(Term, Type) } ->
- { Result = ok(Type) }
+ ReadResult = term(_VarSet, Term),
+ ( term_to_type(Term, Type) ->
+ Result = ok(Type)
;
- io__get_line_number(LineNumber),
- ( { \+ term__is_ground(Term) } ->
- { Result = error("io__read: the term read was not a ground term", LineNumber) }
+ ( \+ term__is_ground(Term) ->
+ Result = error("io__read: the term read was not a ground term",
+ LineNumber)
;
- { Result = error("io__read: the term read did not have the right type", LineNumber) }
+ Result = error(
+ "io__read: the term read did not have the right type",
+ LineNumber)
)
)
;
- { ReadResult = eof },
- { Result = eof }
+ ReadResult = eof,
+ Result = eof
;
- { ReadResult = error(String, Int) },
- { Result = error(String, Int) }
+ ReadResult = error(String, Int),
+ Result = error(String, Int)
).
io__read_anything(Stream, Result) -->
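An aside for reviewers: since `posn' records both the line number and
the offset of the start of the line, a column number can be recovered
cheaply.  A possible helper (not in this diff; the name is borrowed
from the speculative comment in lexer.m below):

	:- pred posn_to_line_and_column(posn, int, int).
	:- mode posn_to_line_and_column(in, out, out) is det.

	posn_to_line_and_column(posn(LineNum, LineOffset, Offset),
			LineNum, Column) :-
		Column is Offset - LineOffset + 1.
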
Index: lexer.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/lexer.m,v
retrieving revision 1.29
diff -u -u -r1.29 lexer.m
--- lexer.m 1998/03/03 17:25:59 1.29
+++ lexer.m 1998/05/12 19:37:26
@@ -55,6 +55,25 @@
% Keep reading until we encounter either an `end' token
% (i.e. a full stop followed by whitespace) or the end-of-file.
+% The type `offset' represents a (zero-based) offset into a string.
+:- type offset == int.
+
+:- pred lexer__string_get_token_list(string, offset, token_list, posn, posn).
+:- mode lexer__string_get_token_list(in, in, out, in, out) is det.
+% lexer__string_get_token_list(String, MaxOffset, Tokens, InitialPos, FinalPos):
+% Scan a list of tokens from a string,
+% starting at the current offset specified by InitialPos.
+% Keep scanning until we encounter an `end' token
+% (i.e. a full stop followed by whitespace) or until
+% we reach MaxOffset. (MaxOffset must be =< the length of the string.)
+% Return the tokens scanned in Tokens, and return the position
+% one character past the end of the last token in FinalPos.
+
+:- pred lexer__string_get_token_list(string, token_list, posn, posn).
+:- mode lexer__string_get_token_list(in, out, in, out) is det.
+% lexer__string_get_token_list(String, Tokens, InitialPos, FinalPos):
+% calls string_get_token_list/5 above with MaxOffset = length of String.
+
:- pred lexer__token_to_string(token, string).
:- mode lexer__token_to_string(in, out) is det.
% Convert a token to a human-readable string describing the token.
@@ -72,8 +91,40 @@
%-----------------------------------------------------------------------------%
:- implementation.
-:- import_module list, require, string, int.
+:- import_module list, term, require, string, int.
+%
+% Note that there are two implementations of most predicates here:
+% one which deals with strings, the other that deals with io__states.
+% We can't write the io__state version in terms of the string
+% version because we don't know how much string to slurp up
+% until after we've lexically analysed it. Some interactive
+% applications require the old Prolog behaviour of stopping
+% after an end token (i.e. `.' plus whitespace) rather than
+% reading in whole lines.
+% Conversely, we can't write the string version using the io__state
+% version, since that would require either cheating with the io__state
+% or ruining the string interface.
+%
+% An alternative would be to write both versions in terms
+% of a generic "char_stream" typeclass, with instances
+% for io__states and for strings.
+% However, for this to be acceptably efficient it would
+% require the compiler to specialize the code, which
+% currently (13 May 98) it is not capable of doing.
+%
+% In fact, the string version is still not as efficient as I would
+% like. The compiler ought to (but currently doesn't) unfold all
+% the instances of the `posn' type. We could do this type unfolding
+% by hand, but it would be very tedious and it would make the code
+% less readable. If/when there is compiler support for this, we
+% should also think about moving the `String' and `Len' arguments
+% into the posn (or making a new `lexer_state' struct which contains
+% both the posn and the String and Len arguments).
+%
+
+%-----------------------------------------------------------------------------%
+
lexer__token_to_string(name(Name), String) :-
string__append_list(["token '", Name, "'"], String).
lexer__token_to_string(variable(Var), String) :-
@@ -128,12 +179,92 @@
lexer__get_token_list(Tokens1)
).
+lexer__string_get_token_list(String, Tokens) -->
+ { string__length(String, Len) },
+ lexer__string_get_token_list(String, Len, Tokens).
+
+lexer__string_get_token_list(String, Len, Tokens) -->
+ lexer__string_get_token(String, Len, Token, Context),
+ ( { Token = eof } ->
+ { Tokens = token_nil }
+ ; { Token = end ; Token = error(_) ; Token = io_error(_) } ->
+ { Tokens = token_cons(Token, Context, token_nil) }
+ ;
+ { Tokens = token_cons(Token, Context, Tokens1) },
+ lexer__string_get_token_list(String, Len, Tokens1)
+ ).
+
+%-----------------------------------------------------------------------------%
+
+% some low-level routines
+
:- pred lexer__get_context(token_context, io__state, io__state).
:- mode lexer__get_context(out, di, uo) is det.
lexer__get_context(Context) -->
io__get_line_number(Context).
+:- type string_token_context == token_context.
+
+:- pred lexer__string_get_context(posn, string_token_context, posn, posn).
+:- mode lexer__string_get_context(in, out, in, out) is det.
+
+lexer__string_get_context(StartPosn, Context, EndPosn, EndPosn) :-
+ StartPosn = posn(StartLineNum, _, _),
+ Context = StartLineNum.
+ % In future, we might want to modify this code to read something
+ % like this:
+ % posn_to_line_and_column(StartPosn, StartLineNum, StartColumn),
+ % posn_to_line_and_column(EndPosn, EndLineNum, EndColumn),
+ % Context = detailed(StartLine, StartColumn, EndLine, EndColumn).
+
+:- pred lexer__string_read_char(string, int, char, posn, posn).
+:- mode lexer__string_read_char(in, in, out, in, out) is semidet.
+
+:- pragma inline(lexer__string_read_char/5).
+
+lexer__string_read_char(String, Len, Char, Posn0, Posn) :-
+ Posn0 = posn(LineNum0, LineOffset0, Offset0),
+ Offset0 < Len,
+ string__unsafe_index(String, Offset0, Char),
+ Offset is Offset0 + 1,
+ ( Char = '\n' ->
+ LineNum is LineNum0 + 1,
+ Posn = posn(LineNum, Offset, Offset)
+ ;
+ Posn = posn(LineNum0, LineOffset0, Offset)
+ ).
+
+:- pred lexer__string_ungetchar(string, posn, posn).
+:- mode lexer__string_ungetchar(in, in, out) is det.
+
+lexer__string_ungetchar(String, Posn0, Posn) :-
+ Posn0 = posn(LineNum0, LineOffset0, Offset0),
+ Offset is Offset0 - 1,
+ string__unsafe_index(String, Offset, Char),
+ ( Char = '\n' ->
+ LineNum is LineNum0 - 1,
+ Posn = posn(LineNum, Offset, Offset)
+ ;
+ Posn = posn(LineNum0, LineOffset0, Offset)
+ ).
+
+:- pred lexer__grab_string(string, posn, string, posn, posn).
+:- mode lexer__grab_string(in, in, out, in, out) is det.
+
+lexer__grab_string(String, Posn0, SubString, Posn, Posn) :-
+ Posn0 = posn(_, _, Offset0),
+ Posn = posn(_, _, Offset),
+ Count is Offset - Offset0,
+ string__unsafe_substring(String, Offset0, Count, SubString).
+
+:- pred lexer__string_set_line_number(int, posn, posn).
+:- mode lexer__string_set_line_number(in, in, out) is det.
+
+lexer__string_set_line_number(LineNumber, Posn0, Posn) :-
+ Posn0 = posn(_, _, Offset),
+ Posn = posn(LineNumber, Offset, Offset).
+
%-----------------------------------------------------------------------------%
:- pred lexer__get_token(token, token_context, io__state, io__state).
@@ -190,6 +321,60 @@
)
).
+:- pred lexer__string_get_token(string, int, token, token_context, posn, posn).
+:- mode lexer__string_get_token(in, in, out, out, in, out) is det.
+
+lexer__string_get_token(String, Len, Token, Context) -->
+ =(Posn0),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ' ' ; Char = '\t' ; Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { char__is_upper(Char) ; Char = '_' } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_lower(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '0' } ->
+ lexer__string_get_zero(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__special_token(Char, SpecialToken) } ->
+ lexer__string_get_context(Posn0, Context),
+ { SpecialToken = open ->
+ Token = open_ct
+ ;
+ Token = SpecialToken
+ }
+ ; { Char = ('.') } ->
+ lexer__string_get_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('%') } ->
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ ; { Char = '"' ; Char = '''' } ->
+ lexer__string_get_quoted_name(String, Len, Char, [],
+ Posn0, Token, Context)
+ ; { Char = ('/') } ->
+ lexer__string_get_slash(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('#') } ->
+ =(Posn1),
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = junk(Char) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = eof }
+ ).
+
%-----------------------------------------------------------------------------%
:- pred lexer__get_token_2(token, token_context, io__state, io__state).
@@ -246,6 +431,59 @@
)
).
+:- pred lexer__string_get_token_2(string, int, token, token_context,
+ posn, posn).
+:- mode lexer__string_get_token_2(in, in, out, out, in, out) is det.
+
+lexer__string_get_token_2(String, Len, Token, Context) -->
+ =(Posn0),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ' ' ; Char = '\t' ; Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { char__is_upper(Char) ; Char = '_' } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_lower(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '0' } ->
+ lexer__string_get_zero(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__special_token(Char, SpecialToken) } ->
+ lexer__string_get_context(Posn0, Context),
+ { Token = SpecialToken }
+ ; { Char = ('.') } ->
+ lexer__string_get_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('%') } ->
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ ; { Char = '"' ; Char = '''' } ->
+ lexer__string_get_quoted_name(String, Len, Char, [],
+ Posn0, Token, Context)
+ ; { Char = ('/') } ->
+ lexer__string_get_slash(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('#') } ->
+ =(Posn1),
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = junk(Char) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = eof }
+ ).
+
+%-----------------------------------------------------------------------------%
+
%-----------------------------------------------------------------------------%
:- pred lexer__special_token(char, token).
@@ -303,6 +541,29 @@
)
).
+:- pred lexer__string_get_dot(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_dot(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_dot(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { lexer__whitespace_after_dot(Char) } ->
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = end }
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name(".") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = end }
+ ).
+
:- pred lexer__whitespace_after_dot(char).
:- mode lexer__whitespace_after_dot(in) is semidet.
@@ -334,6 +595,23 @@
)
).
+:- pred lexer__string_skip_to_eol(string, int, token, token_context,
+ posn, posn).
+:- mode lexer__string_skip_to_eol(in, in, out, out, in, out) is det.
+
+lexer__string_skip_to_eol(String, Len, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ;
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ )
+ ;
+ =(Posn),
+ lexer__string_get_context(Posn, Context),
+ { Token = eof }
+ ).
+
:- pred lexer__get_slash(token, token_context, io__state, io__state).
:- mode lexer__get_slash(out, out, di, uo) is det.
@@ -352,12 +630,34 @@
lexer__get_context(Context),
lexer__get_graphic([Char, '/'], Token)
;
- lexer__get_context(Context),
io__putback_char(Char),
+ lexer__get_context(Context),
{ Token = name("/") }
)
).
+:- pred lexer__string_get_slash(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_slash(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_slash(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('*') } ->
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name("/") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = name("/") }
+ ).
+
:- pred lexer__get_comment(token, token_context, io__state, io__state).
:- mode lexer__get_comment(out, out, di, uo) is det.
@@ -377,6 +677,24 @@
)
).
+:- pred lexer__string_get_comment(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_comment(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_comment(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('*') } ->
+ lexer__string_get_comment_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated '/*' comment") }
+ ).
+
:- pred lexer__get_comment_2(token, token_context, io__state, io__state).
:- mode lexer__get_comment_2(out, out, di, uo) is det.
@@ -390,6 +708,7 @@
{ Token = error("unterminated '/*' comment") }
; { Result = ok(Char) },
( { Char = ('/') } ->
+ % end of /* ... */ comment, so get next token
lexer__get_token_2(Token, Context)
; { Char = ('*') } ->
lexer__get_comment_2(Token, Context)
@@ -398,6 +717,27 @@
)
).
+:- pred lexer__string_get_comment_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_comment_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_comment_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('/') } ->
+ % end of /* ... */ comment, so get next token
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { Char = ('*') } ->
+ lexer__string_get_comment_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated '/*' comment") }
+ ).
+
%-----------------------------------------------------------------------------%
% quoted names and quoted strings
@@ -422,6 +762,29 @@
)
).
+:- pred lexer__string_get_quoted_name(string, int, char, list(char), posn,
+ token, string_token_context, posn, posn).
+:- mode lexer__string_get_quoted_name(in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_quoted_name(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = QuoteChar } ->
+ lexer__string_get_quoted_name_quote(String, Len,
+ QuoteChar, Chars, Posn0, Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_get_quoted_name_escape(String, Len,
+ QuoteChar, Chars, Posn0, Token, Context)
+ ;
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char | Chars], Posn0, Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quote") }
+ ).
+
:- pred lexer__get_quoted_name_quote(char, list(char), token,
io__state, io__state).
:- mode lexer__get_quoted_name_quote(in, in, out, di, uo) is det.
@@ -441,6 +804,27 @@
)
).
+:- pred lexer__string_get_quoted_name_quote(string, int, char, list(char),
+ posn, token, string_token_context, posn, posn).
+:- mode lexer__string_get_quoted_name_quote(in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_quoted_name_quote(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = QuoteChar } ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char | Chars], Posn0, Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { lexer__finish_quoted_name(QuoteChar, Chars, Token) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { lexer__finish_quoted_name(QuoteChar, Chars, Token) }
+ ).
+
:- pred lexer__finish_quoted_name(char, list(char), token).
:- mode lexer__finish_quoted_name(in, in, out) is det.
@@ -480,6 +864,38 @@
)
).
+:- pred lexer__string_get_quoted_name_escape(string, int, char, list(char),
+ posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_quoted_name_escape(in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_quoted_name_escape(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ =(Posn1),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = '\n' } ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ Chars, Posn0, Token, Context)
+ ; { lexer__escape_char(Char, EscapedChar) } ->
+ { Chars1 = [EscapedChar | Chars] },
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ Chars1, Posn0, Token, Context)
+ ; { Char = 'x' } ->
+ lexer__string_get_hex_escape(String, Len, QuoteChar,
+ Chars, [], Posn0, Token, Context)
+ ; { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_escape(String, Len, QuoteChar,
+ Chars, [Char], Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { Token = error("invalid escape character") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quoted name") }
+ ).
+
:- pred lexer__escape_char(char, char).
:- mode lexer__escape_char(in, out) is semidet.
@@ -517,6 +933,30 @@
)
).
+:- pred lexer__string_get_hex_escape(string, int, char, list(char), list(char),
+ posn, token, string_token_context, posn, posn).
+:- mode lexer__string_get_hex_escape(in, in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_hex_escape(String, Len, QuoteChar, Chars, HexChars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_escape(String, Len, QuoteChar,
+ Chars, [Char | HexChars], Posn0,
+ Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_finish_hex_escape(String, Len, QuoteChar,
+ Chars, HexChars, Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated hex escape") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quote") }
+ ).
+
:- pred lexer__finish_hex_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__finish_hex_escape(in, in, in, out, di, uo) is det.
@@ -536,6 +976,31 @@
)
).
+:- pred lexer__string_finish_hex_escape(string, int, char, list(char),
+ list(char), posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_finish_hex_escape(in, in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_finish_hex_escape(String, Len, QuoteChar, Chars, HexChars, Posn0,
+ Token, Context) -->
+ ( { HexChars = [] } ->
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("empty hex escape") }
+ ;
+ { lexer__rev_char_list_to_string(HexChars, HexString) },
+ (
+ { string__base_string_to_int(16, HexString, Int) },
+ { char__to_int(Char, Int) }
+ ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char|Chars], Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("invalid hex escape") }
+ )
+ ).
+
:- pred lexer__get_octal_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__get_octal_escape(in, in, in, out, di, uo) is det.
@@ -566,6 +1031,40 @@
)
).
+:- pred lexer__string_get_octal_escape(string, int, char, list(char),
+ list(char), posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_octal_escape(in, in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_octal_escape(String, Len, QuoteChar, Chars, OctalChars,
+ Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_escape(String, Len,
+ QuoteChar, Chars, [Char | OctalChars], Posn0,
+ Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_finish_octal_escape(String, Len,
+ QuoteChar, Chars, OctalChars, Posn0,
+ Token, Context)
+ ;
+ /******
+ % We don't report this as an error since
+ % we need bug-for-bug compatibility with
+ % NU-Prolog
+ { Token = error("unterminated octal escape") }
+ ******/
+ lexer__string_ungetchar(String),
+ lexer__string_finish_octal_escape(String, Len,
+ QuoteChar, Chars, OctalChars, Posn0,
+ Token, Context)
+ )
+ ;
+ { Token = error("unterminated quote") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__finish_octal_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__finish_octal_escape(in, in, in, out, di, uo) is det.
@@ -585,6 +1084,31 @@
)
).
+:- pred lexer__string_finish_octal_escape(string, int, char, list(char),
+ list(char), posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_finish_octal_escape(in, in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_finish_octal_escape(String, Len, QuoteChar, Chars, OctalChars,
+ Posn0, Token, Context) -->
+ ( { OctalChars = [] } ->
+ { Token = error("empty octal escape") },
+ lexer__string_get_context(Posn0, Context)
+ ;
+ { lexer__rev_char_list_to_string(OctalChars, OctalString) },
+ (
+ { string__base_string_to_int(8, OctalString, Int) },
+ { char__to_int(Char, Int) }
+ ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char|Chars], Posn0, Token, Context)
+ ;
+ { Token = error("invalid octal escape") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ).
+
%-----------------------------------------------------------------------------%
% names and variables
@@ -609,6 +1133,27 @@
)
).
+:- pred lexer__string_get_name(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_name(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_name(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_alnum_or_underscore(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%
% A line number directive token is `#' followed by an integer
% (specifying the line number) followed by a newline.
@@ -664,6 +1209,59 @@
)
).
+ %
+ % A line number directive token is `#' followed by an integer
+ % (specifying the line number) followed by a newline.
+ % Such a token sets the source line number for the next
+ % line, but it is otherwise ignored. This means that line number
+ % directives may appear anywhere that a token may appear, including
+ % in the middle of terms.
+ % (The source file name can be set with a `:- pragma source_file'
+ % declaration.)
+ %
+
+:- pred lexer__string_get_source_line_number(string, int, posn,
+ token, token_context, posn, posn).
+:- mode lexer__string_get_source_line_number(in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_source_line_number(String, Len, Posn1, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { Char = '\n' } ->
+ lexer__grab_string(String, Posn1, LineNumString),
+ (
+ { string__base_string_to_int(10, LineNumString,
+ LineNum) },
+ { LineNum > 0 }
+ ->
+ lexer__string_set_line_number(LineNum),
+ lexer__string_get_token(String, Len, Token, Context)
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { string__append_list([
+ "invalid line number `", LineNumString,
+ "' in `#' line number directive"],
+ Message) },
+ { Token = error(Message) }
+ )
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { string__from_char_list([Char], DirectiveString) },
+ { string__append_list([
+ "invalid character `", DirectiveString,
+ "' in `#' line number directive"],
+ Message) },
+ { Token = error(Message) }
+ )
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { Token = error(
+ "unexpected end-of-file in `#' line number directive") }
+ ).
+
:- pred lexer__get_graphic(list(char), token, io__state, io__state).
:- mode lexer__get_graphic(in, out, di, uo) is det.
@@ -684,6 +1282,27 @@
)
).
+:- pred lexer__string_get_graphic(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_graphic(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_graphic(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, Name),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name(Name) }
+ ).
+
:- pred lexer__get_variable(list(char), token, io__state, io__state).
:- mode lexer__get_variable(in, out, di, uo) is det.
@@ -704,6 +1323,27 @@
)
).
+:- pred lexer__string_get_variable(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_variable(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_variable(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_alnum_or_underscore(Char) } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, VariableName),
+ { Token = variable(VariableName) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, VariableName),
+ { Token = variable(VariableName) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%-----------------------------------------------------------------------------%
% integer and float literals
@@ -738,6 +1378,43 @@
)
).
+:- pred lexer__string_get_zero(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_zero(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_zero(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '''' } ->
+ lexer__string_get_char_code(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'b' } ->
+ lexer__string_get_binary(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'o' } ->
+ lexer__string_get_octal(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'x' } ->
+ lexer__string_get_hex(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('.') } ->
+ lexer__string_get_int_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = integer(0) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = integer(0) }
+ ).
+
:- pred lexer__get_char_code(token, io__state, io__state).
:- mode lexer__get_char_code(out, di, uo) is det.
@@ -752,6 +1429,20 @@
{ Token = integer(CharCode) }
).
+:- pred lexer__string_get_char_code(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_char_code(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_char_code(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ { char__to_int(Char, CharCode) },
+ { Token = integer(CharCode) },
+ lexer__string_get_context(Posn0, Context)
+ ;
+ { Token = error("unterminated char code constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_binary(token, io__state, io__state).
:- mode lexer__get_binary(out, di, uo) is det.
@@ -770,6 +1461,25 @@
)
).
+:- pred lexer__string_get_binary(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_binary(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_binary(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_binary_digit(Char) } ->
+ lexer__string_get_binary_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated binary constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated binary constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_binary_2(list(char), token, io__state, io__state).
:- mode lexer__get_binary_2(in, out, di, uo) is det.
@@ -788,6 +1498,27 @@
)
).
+:- pred lexer__string_get_binary_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_binary_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_binary_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_binary_digit(Char) } ->
+ lexer__string_get_binary_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, BinaryString),
+ { lexer__conv_string_to_int(BinaryString, 2, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, BinaryString),
+ { lexer__conv_string_to_int(BinaryString, 2, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_octal(token, io__state, io__state).
:- mode lexer__get_octal(out, di, uo) is det.
@@ -806,6 +1537,25 @@
)
).
+:- pred lexer__string_get_octal(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_octal(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_octal(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated octal constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated octal constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_octal_2(list(char), token, io__state, io__state).
:- mode lexer__get_octal_2(in, out, di, uo) is det.
@@ -824,6 +1574,27 @@
)
).
+:- pred lexer__string_get_octal_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_octal_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_octal_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+		lexer__grab_string(String, Posn0, OctalString),
+		{ lexer__conv_string_to_int(OctalString, 8, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+	lexer__grab_string(String, Posn0, OctalString),
+	{ lexer__conv_string_to_int(OctalString, 8, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_hex(token, io__state, io__state).
:- mode lexer__get_hex(out, di, uo) is det.
@@ -842,6 +1613,26 @@
)
).
+:- pred lexer__string_get_hex(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_hex(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_hex(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated hex constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated hex constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_hex_2(list(char), token, io__state, io__state).
:- mode lexer__get_hex_2(in, out, di, uo) is det.
@@ -860,6 +1651,28 @@
)
).
+:- pred lexer__string_get_hex_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_hex_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_hex_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+		lexer__grab_string(String, Posn0, HexString),
+		{ lexer__conv_string_to_int(HexString, 16, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+	lexer__grab_string(String, Posn0, HexString),
+	{ lexer__conv_string_to_int(HexString, 16, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_number(list(char), token, io__state, io__state).
:- mode lexer__get_number(in, out, di, uo) is det.
@@ -882,6 +1695,33 @@
)
).
+:- pred lexer__string_get_number(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_number(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_number(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('.') } ->
+ lexer__string_get_int_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
% XXX the float literal syntax doesn't match ISO Prolog
:- pred lexer__get_int_dot(list(char), token, io__state, io__state).
@@ -904,6 +1744,29 @@
)
).
+:- pred lexer__string_get_int_dot(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_int_dot(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_int_dot(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_decimals(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_decimals(list(char), token, io__state, io__state).
:- mode lexer__get_float_decimals(in, out, di, uo) is det.
@@ -926,6 +1789,30 @@
)
).
+:- pred lexer__string_get_float_decimals(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_decimals(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_float_decimals(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_decimals(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent(list(char), token, io__state, io__state).
:- mode lexer__get_float_exponent(in, out, di, uo) is det.
@@ -947,6 +1834,30 @@
)
).
+:- pred lexer__string_get_float_exponent(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_float_exponent(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('+') ; Char = ('-') } ->
+ lexer__string_get_float_exponent_2(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token =
+ error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent_2(list(char), token,
io__state, io__state).
:- mode lexer__get_float_exponent_2(in, out, di, uo) is det.
@@ -971,6 +1882,31 @@
)
).
+:- pred lexer__string_get_float_exponent_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent_2(in, in, in, out, out, in, out)
+ is det.
+
+ % we've read past the E signalling the start of the exponent -
+ % make sure that there's at least one digit following,
+ % and then get the remaining digits
+
+lexer__string_get_float_exponent_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token =
+ error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent_3(list(char), token,
io__state, io__state).
:- mode lexer__get_float_exponent_3(in, out, di, uo) is det.
@@ -993,6 +1929,28 @@
)
).
+:- pred lexer__string_get_float_exponent_3(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent_3(in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_float_exponent_3(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%-----------------------------------------------------------------------------%
% Utility routines
@@ -1002,6 +1960,12 @@
lexer__rev_char_list_to_int(RevChars, Base, Token) :-
lexer__rev_char_list_to_string(RevChars, String),
+ lexer__conv_string_to_int(String, Base, Token).
+
+:- pred lexer__conv_string_to_int(string, int, token).
+:- mode lexer__conv_string_to_int(in, in, out) is det.
+
+lexer__conv_string_to_int(String, Base, Token) :-
( string__base_string_to_int(Base, String, Int) ->
Token = integer(Int)
;
@@ -1013,6 +1977,12 @@
lexer__rev_char_list_to_float(RevChars, Token) :-
lexer__rev_char_list_to_string(RevChars, String),
+ lexer__conv_to_float(String, Token).
+
+:- pred lexer__conv_to_float(string, token).
+:- mode lexer__conv_to_float(in, out) is det.
+
+lexer__conv_to_float(String, Token) :-
( string__to_float(String, Float) ->
Token = float(Float)
;
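
(Sketch for reviewers, not in the diff: driving the new string lexer
directly, starting from the initial position posn(1, 0, 0).)

	:- pred tokenize(string, token_list, posn).
	:- mode tokenize(in, out, out) is det.

	tokenize(String, Tokens, FinalPosn) :-
		lexer__string_get_token_list(String, Tokens,
			posn(1, 0, 0), FinalPosn).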
Index: parser.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/parser.m,v
retrieving revision 1.29
diff -u -u -r1.29 parser.m
--- parser.m 1998/01/23 12:33:26 1.29
+++ parser.m 1998/05/12 19:37:31
@@ -9,10 +9,16 @@
% stability: high.
%
% This file exports the predicate parser__read_term, which reads
-% a term from the current input stream. The parser and lexer are
-% intended to exactly follow ISO Prolog syntax, but there are some
-% departures from that for three reasons:
+% a term from the current input stream.
+% The parser__read_term_from_string predicates are the same as the
+% read_term predicates, except that the term is read from
+% a string rather than from the current input stream.
+% The parser__parse_tokens predicate is similar,
+% but it takes a list of tokens rather than a string.
%
+% The parser and lexer are intended to exactly follow ISO Prolog
+% syntax, but there are some departures from that for three reasons:
+%
% (1) I wrote some of the code at home when the ISO Prolog draft
% was at uni - so in some places I just guessed.
% (2) In some places the lexer reports an error when it shouldn't.
@@ -28,19 +34,60 @@
:- module parser.
:- interface.
-:- import_module io, term_io.
+:- import_module io, term_io, lexer.
+
+%-----------------------------------------------------------------------------%
+%
+% parser__read_term/{3,4}:
+% Reads in (and parses) terms from the current input stream.
:- pred parser__read_term(read_term, io__state, io__state).
:- mode parser__read_term(out, di, uo) is det.
+% parser__read_term(Result):
+% Reads a term from the current input stream.
- % The string is the filename to use for the current input stream;
- % this is used in constructing the term__contexts in the read term.
- % This interface is used to support the `:- pragma source_file'
- % directive.
:- pred parser__read_term(string, read_term, io__state, io__state).
:- mode parser__read_term(in, out, di, uo) is det.
+% parser__read_term(FileName, Result):
+% Reads a term from the current input stream.
+% The string is the filename to use for the current input stream;
+% this is used in constructing the term__contexts in the read term.
+% This interface is used to support the `:- pragma source_file'
+% directive.
+
+%-----------------------------------------------------------------------------%
+%
+% parser__read_term_from_string/{4,6}:
+% Parses terms from a string.
+
+ % The read_term_from_string predicates are the same as the
+ % read_term predicates, except that the term is read from
+ % a string rather than from the current input stream.
+ % The returned value `EndPos' is the position one character
+ % past the end of the term read.
+ % The arguments `MaxOffset' and `StartPos' in the six-argument version
+ % specify the length of the string and the position within the
+ % string at which to start parsing.
+
+:- pred parser__read_term_from_string(string, string, posn, read_term).
+:- mode parser__read_term_from_string(in, in, out, out) is det.
+% parser__read_term_from_string(FileName, String, EndPos, Term).
+
+:- pred parser__read_term_from_string(string, string, int, posn, posn, read_term).
+:- mode parser__read_term_from_string(in, in, in, in, out, out) is det.
+% parser__read_term_from_string(FileName, String, MaxOffset, StartPos,
+% EndPos, Term).
%-----------------------------------------------------------------------------%
+%
+% parser__parse_tokens/3:
+% Parses a list of tokens.
+
+:- pred parser__parse_tokens(string, token_list, read_term).
+:- mode parser__parse_tokens(in, in, out) is det.
+ % parser__parse_tokens(FileName, TokenList, Result):
+
+%-----------------------------------------------------------------------------%
:- implementation.
:- import_module string, char, int, float, bool, list, std_util, require.
@@ -59,14 +106,30 @@
parser__read_term(FileName, Result) -->
lexer__get_token_list(Tokens),
- ( { Tokens = token_nil } ->
- { Result = eof }
+ { parser__parse_tokens(FileName, Tokens, Result) }.
+
+parser__read_term_from_string(FileName, String, EndPos, Result) :-
+ string__length(String, Len),
+ StartPos = posn(1, 0, 0),
+ parser__read_term_from_string(FileName, String, Len, StartPos, EndPos,
+ Result).
+
+parser__read_term_from_string(FileName, String, Len, StartPos, EndPos,
+ Result) :-
+ lexer__string_get_token_list(String, Len, Tokens, StartPos, EndPos),
+ parser__parse_tokens(FileName, Tokens, Result).
+
+%-----------------------------------------------------------------------------%
+
+parser__parse_tokens(FileName, Tokens, Result) :-
+ ( Tokens = token_nil ->
+ Result = eof
;
- { parser__init_state(FileName, Tokens, ParserState0) },
- { parser__parse_whole_term(Term, ParserState0, ParserState) },
- { parser__final_state(ParserState, VarSet, LeftOverTokens) },
- { parser__check_for_errors(Term, VarSet,
- Tokens, LeftOverTokens, Result) }
+ parser__init_state(FileName, Tokens, ParserState0),
+ parser__parse_whole_term(Term, ParserState0, ParserState),
+ parser__final_state(ParserState, VarSet, LeftOverTokens),
+ parser__check_for_errors(Term, VarSet,
+ Tokens, LeftOverTokens, Result)
).
:- pred parser__check_for_errors(parse(term), varset, token_list, token_list,
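
An aside for reviewers, not part of the diff: the intended usage of the
new predicates is something like the following fragment, where the file
name "<example>" is made up, and the end position returned by one call
is fed back in as the start position of the next:

	String = "foo(X). bar(Y). ",
	string__length(String, MaxOffset),
	% The four-argument version always starts at posn(1, 0, 0).
	parser__read_term_from_string("<example>", String,
		EndPos1, Result1),
	% The six-argument version lets the caller continue
	% from where the previous term ended.
	parser__read_term_from_string("<example>", String,
		MaxOffset, EndPos1, _EndPos2, Result2)

Each Result is a read_term: eof, a syntax error, or a term together
with its varset.
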
Index: string.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.m,v
retrieving revision 1.102
diff -u -u -r1.102 string.m
--- string.m 1998/03/18 14:16:12 1.102
+++ string.m 1998/05/12 19:37:41
@@ -190,6 +190,16 @@
% Calls error/1 if `Index' is out of range (negative, or greater than or
% equal to the length of `String').
+:- pred string__unsafe_index(string, int, char).
+:- mode string__unsafe_index(in, in, out) is det.
+% string__unsafe_index(String, Index, Char):
+% `Char' is the (`Index' + 1)-th character of `String'.
+% WARNING: behaviour is UNDEFINED if `Index' is out of range
+% (negative, or greater than or equal to the length of `String').
+% This version is constant time, whereas string__index_det
+% may be linear in the length of the string.
+% Use with care!
+
:- pred string__foldl(pred(char, T, T), string, T, T).
:- mode string__foldl(pred(in, in, out) is det, in, in, out) is det.
:- mode string__foldl(pred(in, di, uo) is det, in, di, uo) is det.
@@ -238,6 +248,20 @@
% If `Count' is out of the range [0, length of `String' - `Start'], it is
% treated as if it were the nearest end-point of that range.)
+:- pred string__unsafe_substring(string, int, int, string).
+:- mode string__unsafe_substring(in, in, in, out) is det.
+% string__unsafe_substring(String, Start, Count, Substring):
+% `Substring' consists of the first `Count' characters of what
+% would remain of `String' after the first `Start' characters
+% were removed.
+% WARNING: if `Start' is out of the range [0, length of `String'],
+% or if `Count' is out of the range [0, length of `String' - `Start'],
+% then the behaviour is UNDEFINED.
+% Use with care!
+% This version takes time proportional to the length of the
+% substring, whereas string__substring may take time proportional
+% to the length of the whole string.
+
:- pred string__append_list(list(string), string).
:- mode string__append_list(in, out) is det.
:- mode string__append_list(out, in) is multidet.
@@ -481,10 +505,6 @@
LeftCount is Length - RightCount,
string__split(String, LeftCount, _LeftString, RightString).
-string__substring(String, Start, Count, Substring) :-
- string__split(String, Start, _Left, Right),
- string__left(Right, Count, Substring).
-
string__remove_suffix(A, B, C) :-
string__to_int_list(A, LA),
string__to_int_list(B, LB),
@@ -1639,9 +1659,6 @@
/*-----------------------------------------------------------------------*/
-:- pred string__unsafe_index(string, int, char).
-:- mode string__unsafe_index(in, in, out) is det.
-
:- pragma(c_code, string__unsafe_index(Str::in, Index::in, Ch::out), "
Ch = Str[Index];
").
@@ -1815,6 +1832,52 @@
:- pragma no_inline(string__append/3).
/*-----------------------------------------------------------------------*/
+
+/*
+:- pred string__substring(string, int, int, string).
+:- mode string__substring(in, in, in, out) is det.
+% string__substring(String, Start, Count, Substring):
+*/
+
+:- pragma c_code(string__substring(Str::in, Start::in, Count::in,
+ SubString::out),
+ will_not_call_mercury,
+"{
+ Integer len;
+ Word tmp;
+ if (Start < 0) Start = 0;
+ if (Count <= 0) {
+ make_aligned_string(LVALUE_CAST(ConstString, SubString), """");
+ } else {
+ len = strlen(Str);
+ if (Start > len) Start = len;
+ if (Count > len - Start) Count = len - Start;
+ incr_hp_atomic(tmp, (Count + sizeof(Word)) / sizeof(Word));
+ SubString = (char *) tmp;
+ memcpy(SubString, Str + Start, Count);
+ SubString[Count] = '\\0';
+ }
+}").
+
+
+/*
+:- pred string__unsafe_substring(string, int, int, string).
+:- mode string__unsafe_substring(in, in, in, out) is det.
+% string__unsafe_substring(String, Start, Count, Substring):
+*/
+
+:- pragma c_code(string__unsafe_substring(Str::in, Start::in, Count::in,
+ SubString::out),
+ will_not_call_mercury,
+"{
+ Integer len;
+ Word tmp;
+ incr_hp_atomic(tmp, (Count + sizeof(Word)) / sizeof(Word));
+ SubString = (char *) tmp;
+ memcpy(SubString, Str + Start, Count);
+ SubString[Count] = '\\0';
+}").
+
/*
:- pred string__split(string, int, string, string).
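
Another illustrative aside, not part of the diff: the clamping in the
c_code above gives string__substring the same semantics as the old
definition in terms of string__split and string__left, while the unsafe
versions do no range checks at all, so the caller must establish the
bounds itself:

	string__substring("hello", 1, 10, S1),		% S1 = "ello"
	string__substring("hello", -1, 2, S2),		% S2 = "he"
	string__substring("hello", 7, 3, S3),		% S3 = ""
	% For the unsafe versions the caller must guarantee
	% 0 =< Start and 0 =< Count =< length(Str) - Start:
	string__unsafe_index("hello", 0, C),		% C = 'h'
	string__unsafe_substring("hello", 1, 3, S4)	% S4 = "ell"
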
Index: string.nu.nl
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.nu.nl,v
retrieving revision 1.18
diff -u -u -r1.18 string.nu.nl
--- string.nu.nl 1997/07/27 15:07:21 1.18
+++ string.nu.nl 1998/05/12 19:37:42
@@ -51,6 +51,15 @@
%-----------------------------------------------------------------------------%
+string__substring(String, Start, Count, Substring) :-
+ string__split(String, Start, _Left, Right),
+ string__left(Right, Count, Substring).
+
+string__unsafe_substring(String, Start, Count, Substring) :-
+ string__substring(String, Start, Count, Substring).
+
+%-----------------------------------------------------------------------------%
+
string__split(String, Count, LeftString, RightString) :-
(
Count =< 0
Index: term.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/term.m,v
retrieving revision 1.81
diff -u -u -r1.81 term.m
--- term.m 1998/04/02 07:40:16 1.81
+++ term.m 1998/05/12 19:37:53
@@ -19,16 +19,21 @@
%-----------------------------------------------------------------------------%
-:- type term ---> term__functor(const, list(term), term__context)
- ; term__variable(var).
-:- type const ---> term__atom(string)
- ; term__integer(int)
- ; term__string(string)
- ; term__float(float).
+:- type term
+ ---> term__functor(const, list(term), term__context)
+ ; term__variable(var).
+:- type const
+ ---> term__atom(string)
+ ; term__integer(int)
+ ; term__string(string)
+ ; term__float(float).
+
+:- type term__context
+ ---> term__context(string, int)
+ % file name, line number.
+ ; term__detailed_context(string, int, int, int, int).
+ % file name, starting line & column, ending line & column
-:- type term__context ---> term__context(string, int).
- % file name, line number.
-
:- type var.
:- type var_supply.
@@ -746,10 +751,13 @@
% Given a term context, return the source line number.
term__context_line(term__context(_, LineNumber), LineNumber).
+term__context_line(term__detailed_context(_FileName, StartLine, _StartCol,
+ _EndLine, _EndCol), StartLine).
- % Given a term context, return the source file.
+ % Given a term context, return the source file name.
term__context_file(term__context(FileName, _), FileName).
+term__context_file(term__detailed_context(FileName, _, _, _, _), FileName).
% Used to initialize the term context when reading in
% (or otherwise constructing) a term.
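
An aside on the compiler changes below, not part of the diff: the point
is that code should now construct and inspect contexts through the ADT
interface rather than by unifying with the term__context/2 functor
directly, so that it also copes with the new term__detailed_context
alternative, e.g.

	term__context_init(FileName, LineNumber, Context),
	term__context_file(Context, FileName1),
	term__context_line(Context, LineNumber1)
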
cvs server: Diffing .
Index: bytecode_gen.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/bytecode_gen.m,v
retrieving revision 1.35
diff -u -u -r1.35 bytecode_gen.m
--- bytecode_gen.m 1998/02/12 01:16:56 1.35
+++ bytecode_gen.m 1998/05/12 19:38:24
@@ -147,7 +147,8 @@
bytecode_gen__goal(GoalExpr - GoalInfo, ByteInfo0, ByteInfo, Code) :-
bytecode_gen__goal_expr(GoalExpr, GoalInfo, ByteInfo0, ByteInfo,
GoalCode),
- goal_info_get_context(GoalInfo, term__context(_, Line)),
+ goal_info_get_context(GoalInfo, Context),
+ term__context_line(Context, Line),
Code = tree(node([context(Line)]), GoalCode).
:- pred bytecode_gen__goal_expr(hlds_goal_expr::in, hlds_goal_info::in,
Index: error_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/error_util.m,v
retrieving revision 1.4
diff -u -u -r1.4 error_util.m
--- error_util.m 1998/02/25 03:37:49 1.4
+++ error_util.m 1998/05/12 19:38:26
@@ -65,7 +65,8 @@
% :
% space
% indent
- Context = term__context(FileName, LineNumber),
+ term__context_file(Context, FileName),
+ term__context_line(Context, LineNumber),
string__length(FileName, FileNameLength),
string__int_to_string(LineNumber, LineNumberStr),
string__length(LineNumberStr, LineNumberStrLength0),
Index: fact_table.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/fact_table.m,v
retrieving revision 1.18
diff -u -u -r1.18 fact_table.m
--- fact_table.m 1998/03/18 08:07:34 1.18
+++ fact_table.m 1998/05/12 19:38:48
@@ -297,7 +297,8 @@
;
{ Result0 = error(Message, LineNum) },
io__input_stream_name(FileName),
- prog_out__write_context(term__context(FileName, LineNum)),
+ { term__context_init(FileName, LineNum, Context) },
+ prog_out__write_context(Context),
io__write_strings([Message, "\n"]),
io__set_exit_status(1),
{ NumFacts = NumFacts0 }
Index: prog_io.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/prog_io.m,v
retrieving revision 1.170
diff -u -u -r1.170 prog_io.m
--- prog_io.m 1998/03/03 17:35:43 1.170
+++ prog_io.m 1998/05/12 19:39:04
@@ -350,25 +350,48 @@
read_all_items(ModuleName, Messages, Items, Error) -->
%
- % read all the items (the first one is handled specially)
+ % Slurp up the source file into memory
%
io__input_stream(Stream),
io__input_stream_name(Stream, SourceFileName),
- read_first_item(ModuleName, SourceFileName,
- RevMessages, RevItems0, Error0),
+ io__read_file_as_string(Stream, Err, SourceFileContents),
+ (
+ { Err = ok },
- %
- % get the end_module declaration (if any),
- % check that it matches the initial module declaration (if any),
- % and remove both of them from the final item list.
- %
- { get_end_module(RevItems0, RevItems, EndModule) },
- { list__reverse(RevMessages, Messages0) },
- { list__reverse(RevItems, Items0) },
- check_end_module(EndModule,
- Messages0, Items0, Error0,
- Messages, Items, Error).
+ %
+ % parse all the items (the first one is handled specially)
+ %
+ read_first_item(ModuleName, SourceFileName, SourceFileContents,
+ 0, RevMessages, RevItems0, Error0),
+
+ %
+ % get the end_module declaration (if any),
+ % check that it matches the initial module declaration (if any),
+ % and remove both of them from the final item list.
+ %
+ { get_end_module(RevItems0, RevItems, EndModule) },
+ { list__reverse(RevMessages, Messages0) },
+ { list__reverse(RevItems, Items0) },
+ check_end_module(EndModule,
+ Messages0, Items0, Error0,
+ Messages, Items, Error)
+ ;
+
+ { Err = error(IOError) },
+ io__progname_base("prog_io.m", Progname),
+ {
+ io__error_message(IOError, ErrorMessage),
+ string__append_list([Progname,
+ ": error reading file `", SourceFileName, "': ",
+ ErrorMessage], Message),
+ dummy_term(Term),
+ Messages = [Message - Term],
+ Error = fatal,
+ Items = []
+ }
+ ).
+
%
% We need to jump through a few hoops when reading the first item,
% to allow the initial `:- module' declaration to be optional.
@@ -382,11 +405,12 @@
% and then if it turns out to not be a `:- module' declaration
% we reparse it in the default module scope. Blecchh.
%
-:- pred read_first_item(module_name, file_name,
+:- pred read_first_item(module_name, file_name, string, int,
message_list, item_list, module_error, io__state, io__state).
-:- mode read_first_item(in, in, out, out, out, di, uo) is det.
+:- mode read_first_item(in, in, in, in, out, out, out, di, uo) is det.
-read_first_item(DefaultModuleName, SourceFileName, Messages, Items, Error) -->
+read_first_item(DefaultModuleName, SourceFileName, SourceFileContents,
+ Posn0, Messages, Items, Error) -->
globals__io_lookup_bool_option(warn_missing_module_name, WarnMissing),
globals__io_lookup_bool_option(warn_wrong_module_name, WarnWrong),
@@ -397,7 +421,9 @@
% (so that any `:- module' declaration is taken to
% be a non-nested module unless explicitly qualified).
%
- parser__read_term(SourceFileName, MaybeFirstTerm),
+ { string__length(SourceFileContents, SourceFileLen) },
+ { parser__read_term_from_string(SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeFirstTerm) },
{ root_module_name(RootModuleName) },
{ process_read_term(RootModuleName, MaybeFirstTerm, MaybeFirstItem) },
(
@@ -410,7 +436,7 @@
{ FirstItem = pragma(source_file(NewSourceFileName)) }
->
read_first_item(DefaultModuleName, NewSourceFileName,
- Messages, Items, Error)
+ SourceFileContents, Posn1, Messages, Items, Error)
;
%
% check if the first term was a `:- module' decl
@@ -447,6 +473,7 @@
{ Items0 = [FixedFirstItem] },
{ Error0 = no },
read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Messages0, Items0, Error0,
Messages, Items, Error)
;
@@ -483,6 +510,7 @@
{ Items0 = [FixedFirstItem] },
{ Error0 = no },
read_items_loop_2(MaybeSecondItem, ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Messages0, Items0, Error0,
Messages, Items, Error)
).
@@ -520,33 +548,42 @@
% But optimizing for NU-Prolog is no longer a big priority...
:- pred read_items_loop(module_name, file_name,
+ string, int, posn,
message_list, item_list, module_error,
message_list, item_list, module_error,
io__state, io__state).
-:- mode read_items_loop(in, in, in, in, in, out, out, out, di, uo) is det.
+:- mode read_items_loop(in, in, in, in, in,
+ in, in, in, out, out, out, di, uo) is det.
-read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error) -->
- read_item(ModuleName, SourceFileName, MaybeItem),
+read_items_loop(ModuleName, SourceFileName, SourceFileContents, SourceFileLen,
+ Posn0, Msgs1, Items1, Error1, Msgs, Items, Error) -->
+ read_item(ModuleName, SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeItem),
read_items_loop_2(MaybeItem, ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Msgs1, Items1, Error1, Msgs, Items, Error).
%-----------------------------------------------------------------------------%
:- pred read_items_loop_2(maybe_item_or_eof, module_name, file_name,
+ string, int, posn,
message_list, item_list, module_error,
message_list, item_list, module_error,
io__state, io__state).
-:- mode read_items_loop_2(in, in, in, in, in, in, out, out, out, di, uo) is det.
+:- mode read_items_loop_2(in, in, in, in, in, in,
+ in, in, in, out, out, out, di, uo) is det.
% do a switch on the type of the next item
-read_items_loop_2(eof, _ModuleName, _SourceFileName, Msgs, Items, Error,
- Msgs, Items, Error) --> [].
+read_items_loop_2(eof, _ModuleName, _SourceFileName,
+ _SourceFileContents, _SourceFileLen, _Posn,
+ Msgs, Items, Error, Msgs, Items, Error) --> [].
% if the next item was end-of-file, then we're done.
read_items_loop_2(syntax_error(ErrorMsg, LineNumber), ModuleName,
- SourceFileName, Msgs0, Items0, _Error0, Msgs, Items, Error) -->
+ SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs0, Items0, _Error0, Msgs, Items, Error) -->
% if the next item was a syntax error, then insert it in
% the list of messages and continue looping
{
@@ -557,10 +594,12 @@
Items1 = Items0,
Error1 = yes
},
- read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error).
+ read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs1, Items1, Error1, Msgs, Items, Error).
read_items_loop_2(error(M, T), ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
Msgs0, Items0, _Error0, Msgs, Items, Error) -->
% if the next item was a semantic error, then insert it in
% the list of messages and continue looping
@@ -569,10 +608,12 @@
Items1 = Items0,
Error1 = yes
},
- read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error).
+ read_items_loop(ModuleName, SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0,
+ Msgs1, Items1, Error1, Msgs, Items, Error).
read_items_loop_2(ok(Item, Context), ModuleName0, SourceFileName0,
+ SourceFileContents, SourceFileLen, Posn0,
Msgs0, Items0, Error0, Msgs, Items, Error) -->
% if the next item was a valid item, check whether it was
% a declaration that affects the current parsing context --
@@ -601,7 +642,9 @@
ModuleName = ModuleName0,
Items1 = [Item - Context | Items0]
},
- read_items_loop(ModuleName, SourceFileName, Msgs0, Items1, Error0,
+ read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs0, Items1, Error0,
Msgs, Items, Error).
%-----------------------------------------------------------------------------%
@@ -614,12 +657,15 @@
; error(string, term)
; ok(item, term__context).
-:- pred read_item(module_name, file_name, maybe_item_or_eof,
+:- pred read_item(module_name, file_name, string, int, posn, posn,
+ maybe_item_or_eof,
io__state, io__state).
-:- mode read_item(in, in, out, di, uo) is det.
+:- mode read_item(in, in, in, in, in, out, out, di, uo) is det.
-read_item(ModuleName, SourceFileName, MaybeItem) -->
- parser__read_term(SourceFileName, MaybeTerm),
+read_item(ModuleName, SourceFileName, SourceFileContents, SourceFileLen,
+ Posn0, Posn1, MaybeItem) -->
+ { parser__read_term_from_string(SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeTerm) },
{ process_read_term(ModuleName, MaybeTerm, MaybeItem) }.
:- pred process_read_term(module_name, read_term, maybe_item_or_eof).
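
A last aside for reviewers, not part of the diff: stripped of the
module-name bookkeeping, the new prog_io code has roughly the following
shape. The predicate names and the read_result type are made up for
this sketch; it assumes imports of io, string, list, parser, lexer and
term_io.

	:- type read_result
		--->	ok(list(read_term))
		;	error(io__error).

	:- pred read_file_terms(read_result, io__state, io__state).
	:- mode read_file_terms(out, di, uo) is det.

	read_file_terms(Result) -->
		io__input_stream(Stream),
		io__input_stream_name(Stream, FileName),
		% Slurp the whole file into memory, then parse
		% from the string, threading the position through
		% instead of doing stream I/O per character.
		io__read_file_as_string(Stream, Err, Contents),
		(
			{ Err = ok },
			{ string__length(Contents, Len) },
			{ term_loop(FileName, Contents, Len,
				posn(1, 0, 0), [], RevTerms) },
			{ list__reverse(RevTerms, Terms) },
			{ Result = ok(Terms) }
		;
			{ Err = error(IOError) },
			{ Result = error(IOError) }
		).

	:- pred term_loop(string, string, int, posn,
			list(read_term), list(read_term)).
	:- mode term_loop(in, in, in, in, in, out) is det.

	term_loop(FileName, Contents, Len, Posn0, RevTerms0, RevTerms) :-
		parser__read_term_from_string(FileName, Contents, Len,
			Posn0, Posn1, ReadTerm),
		( ReadTerm = term(_VarSet, _Term) ->
			term_loop(FileName, Contents, Len, Posn1,
				[ReadTerm | RevTerms0], RevTerms)
		;
			% eof or a syntax error: stop here.
			RevTerms = RevTerms0
		).
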
Index: term_errors.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/term_errors.m,v
retrieving revision 1.11
diff -u -u -r1.11 term_errors.m
--- term_errors.m 1998/03/03 17:36:10 1.11
+++ term_errors.m 1998/05/12 19:39:12
@@ -544,7 +544,8 @@
term_errors__describe_one_call_site(PPId - Context, Module, Piece) :-
term_errors__describe_one_proc_name(PPId, Module, ProcName),
- Context = term__context(FileName, LineNumber),
+ term__context_file(Context, FileName),
+ term__context_line(Context, LineNumber),
string__int_to_string(LineNumber, LineNumberPart),
string__append_list([
ProcName,
cvs server: Diffing notes
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh> | of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3 | -- the last words of T. S. Garp.