for review: add code to read terms from strings
Fergus Henderson
fjh at cs.mu.OZ.AU
Wed May 13 05:51:43 AEST 1998
Hi,
Can someone -- how about Tom, or perhaps Mark Brown (since we
had that long discussion about this!) -- please review this one?
Estimated hours taken: 24
Add code to parse terms from strings rather than from streams.
The original intention for this change was twofold:
to increase expressiveness and to improve efficiency.
However, for the moment I have given up on the goal of
improving efficiency. See the comments in lexer.m.
library/io.m:
Add io__read_file_as_string/{4,5}, for efficiently
reading a whole file into a single string.
Add io__read_from_string/5, for reading terms of
any type from a string.
(Should that be named string__read instead?)
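For reviewers, here's a rough sketch (not part of the diff;
the predicate and file names are invented) of how the two new
predicates are intended to be used together:

	:- pred read_one_term(io__read_result(int),
			io__state, io__state).
	:- mode read_one_term(out, di, uo) is det.

	read_one_term(TermResult) -->
		io__read_file_as_string(FileResult, String),
		(
			{ FileResult = ok },
			{ string__length(String, Len) },
			{ io__read_from_string("<input>", String, Len,
				TermResult, posn(1, 0, 0), _Posn) }
		;
			{ FileResult = error(_) },
			{ TermResult = error(
				"error reading input file", 0) }
		).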
library/string.m:
Implement string__substring/4 more efficiently,
using `pragma c_code' rather than by calling
string__right(string__left(...)).
Export string__unsafe_index/3, and add new exported
predicate string__unsafe_substring/4 -- these
are versions of string__index and string__substring
that don't check for range errors. They are
needed to get reasonable efficiency when dealing
with very large strings.
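(A sketch, not in the diff, of the intended usage pattern:
the caller does one explicit range check and then uses the
unsafe version -- this is essentially what the new
lexer__string_read_char does.)

	:- pred checked_index(string, int, int, char).
	:- mode checked_index(in, in, in, out) is semidet.

		% checked_index(String, Len, Index, Char):
		% Len must be the length of String; fails
		% (rather than aborting) if Index is out of range.
	checked_index(String, Len, Index, Char) :-
		Index >= 0,
		Index < Len,
		string__unsafe_index(String, Index, Char).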
library/string.nu.nl:
Add Prolog implementations of string__substring
and string__unsafe_substring.
library/lexer.m:
Add lexer__string_get_token_list/{5,6}, for parsing
tokens from a string. This required adding
`string_' versions of most of the lexical analysis routines.
Lots of code duplication, I'm afraid. Also the
string versions are still not as efficient as they
could ideally be.
library/parser.m:
Add new predicates parser__read_term_from_string/{4,6}
and parser__parse_tokens/3.
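For example (sketch only, not in the diff), the call

	parser__read_term_from_string("example.m",
		"foo(X, 42). ", EndPos, Result)

should bind Result to term(VarSet, Term) where Term represents
foo(X, 42), and EndPos to the position just past the end token.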
compiler/bytecode_gen.m:
compiler/error_util.m:
compiler/fact_table.m:
compiler/term_errors.m:
Minor changes to use term__context as an ADT rather than
as a concrete data type.
In addition, I made the following change,
which I do NOT plan to commit (at least not yet).
compiler/prog_io.m:
Change to use io__read_file_as_string and
parser__read_term_from_string.
(Not committed because the existing method is
currently more efficient.)
cvs server: Diffing .
Index: io.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/io.m,v
retrieving revision 1.153
diff -u -u -r1.153 io.m
--- io.m 1998/03/12 19:45:38 1.153
+++ io.m 1998/05/12 19:37:10
@@ -120,6 +120,12 @@
% Reads all the characters from the current input stream until
% eof or error.
+:- pred io__read_file_as_string(io__res, string, io__state, io__state).
+:- mode io__read_file_as_string(out, out, di, uo) is det.
+% Reads all the characters from the current input stream until
+% eof or error. Returns the result as a string rather than
+% as a list of char.
+
:- pred io__putback_char(char, io__state, io__state).
:- mode io__putback_char(in, di, uo) is det.
% Un-reads a character from the current input stream.
@@ -147,6 +153,13 @@
% Reads all the characters from the given input stream until
% eof or error.
+:- pred io__read_file_as_string(io__input_stream, io__res, string,
+ io__state, io__state).
+:- mode io__read_file_as_string(in, out, out, di, uo) is det.
+% Reads all the characters from the given input stream until
+% eof or error. Returns the result as a string rather than
+% as a list of char.
+
:- pred io__putback_char(io__input_stream, char, io__state, io__state).
:- mode io__putback_char(in, in, di, uo) is det.
% Un-reads a character from specified stream.
@@ -172,6 +185,29 @@
% or if encounters an I/O error, then it returns
% `error(Message, LineNumber)'.
+% The type `posn' represents a position within a string.
+:- type posn
+ ---> posn(int, int, int).
+ % line number, offset for start of line, current offset
+ % (the first two are used only for the purposes of
+ % computing term_contexts, for use e.g. in error messages).
+
+:- pred io__read_from_string(string, string, int, io__read_result(T), posn, posn).
+:- mode io__read_from_string(in, in, in, out, in, out) is det.
+% io__read_from_string(FileName, String, MaxPos, Result, Posn0, Posn):
+% Same as io__read/4 except that it reads from
+% a string rather than from a stream.
+% FileName is the name of the source (for use in error messages).
+% String is the string to be parsed.
+% Posn0 is the position to start parsing from.
+% Posn is the position one character past the end of the term read.
+% MaxPos is the offset in the string which should be
+% considered the end-of-stream -- this is the upper bound
+% for Posn. (In the usual case, MaxPos is just the length
+% of the String.)
+% WARNING: if MaxPos > length of String then the behaviour
+% is UNDEFINED.
+
:- pred io__ignore_whitespace(io__result, io__state, io__state).
:- mode io__ignore_whitespace(out, di, uo) is det.
% Discards all the whitespace from the current stream.
@@ -957,7 +993,7 @@
:- implementation.
:- import_module map, dir, term, term_io, varset, require, benchmarking, array.
-:- import_module int, std_util.
+:- import_module int, std_util, parser.
:- type io__state ---> io__state(c_pointer).
% Values of type `io__state' are never really used:
@@ -1190,6 +1226,218 @@
io__read_file_2(Stream, [Char|Chars0], Result)
).
+%-----------------------------------------------------------------------------%
+
+io__read_file_as_string(Result, String) -->
+ io__input_stream(Stream),
+ io__read_file_as_string(Stream, Result, String).
+
+io__read_file_as_string(Stream, Result, String) -->
+ %
+ % check if the stream is a regular file;
+ % if so, allocate a buffer according to the
+ % size of the file. Otherwise, just use
+ % a default buffer size of 4k minus a bit
+ % (to give malloc some room).
+ %
+ io__stream_file_size(Stream, FileSize),
+ { FileSize >= 0 ->
+ BufferSize0 is FileSize + 1
+ ;
+ BufferSize0 = 4000
+ },
+ { io__alloc_buffer(BufferSize0, Buffer0) },
+
+ %
+ % Read the file into the buffer (resizing it as we go if necessary),
+ % convert the buffer into a string, and see if anything went wrong.
+ %
+ io__clear_err(Stream),
+ { Pos0 = 0 },
+ io__read_file_as_string_2(Stream, Buffer0, Pos0, BufferSize0,
+ Buffer, Pos, BufferSize),
+ { require(Pos < BufferSize, "io__read_file_as_string: overflow") },
+ { io__buffer_to_string(Buffer, Pos, String) },
+ io__check_err(Stream, Result).
+
+:- pred io__read_file_as_string_2(io__input_stream, buffer, int, int,
+ buffer, int, int, io__state, io__state).
+:- mode io__read_file_as_string_2(in, di, in, in,
+ uo, out, out, di, uo) is det.
+
+io__read_file_as_string_2(Stream, Buffer0, Pos0, Size0, Buffer, Pos, Size) -->
+ io__read_into_buffer(Stream, Buffer0, Pos0, Size0,
+ Buffer1, Pos1),
+ ( { Pos1 = Pos0 } ->
+ % end of file (or error)
+ { Size = Size0 },
+ { Pos = Pos1 },
+ { Buffer = Buffer1 }
+ ; { Pos1 = Size0 } ->
+ % full buffer
+ { Size1 is Size0 * 2 },
+ { io__resize_buffer(Buffer1, Size0, Size1, Buffer2) },
+ io__read_file_as_string_2(Stream, Buffer2, Pos1, Size1,
+ Buffer, Pos, Size)
+ ;
+ io__read_file_as_string_2(Stream, Buffer1, Pos1, Size0,
+ Buffer, Pos, Size)
+ ).
+
+%-----------------------------------------------------------------------------%
+
+:- pred io__clear_err(stream, io__state, io__state).
+:- mode io__clear_err(in, di, uo) is det.
+% same as ANSI C's clearerr().
+
+:- pragma c_code(io__clear_err(Stream::in, _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ clearerr(f->file);
+}").
+
+:- pred io__check_err(stream, io__res, io__state, io__state).
+:- mode io__check_err(in, out, di, uo) is det.
+
+io__check_err(Stream, Res) -->
+ io__ferror(Stream, Int, Msg),
+ { Int = 0 ->
+ Res = ok
+ ;
+ Res = error(Msg)
+ }.
+
+:- pred io__ferror(stream, int, string, io__state, io__state).
+:- mode io__ferror(in, out, out, di, uo) is det.
+% similar to ANSI C's ferror().
+
+:- pragma c_code(io__ferror(Stream::in, RetVal::out, RetStr::out,
+ _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ RetVal = ferror(f->file);
+ ML_maybe_make_err_msg(RetVal != 0, ""read failed: "", RetStr);
+}").
+
+%-----------------------------------------------------------------------------%
+
+:- pred io__stream_file_size(stream, int, io__state, io__state).
+:- mode io__stream_file_size(in, out, di, uo) is det.
+% io__stream_file_size(Stream, Size):
+% if Stream is a regular file, then Size is its size,
+% otherwise Size is -1.
+
+:- pragma c_header_code("
+ #include <unistd.h>
+ #include <sys/stat.h>
+
+ /*
+ ** in case some non-POSIX implementation doesn't have S_ISREG(),
+ ** define it to always fail
+ */
+ #ifndef S_ISREG
+ #define S_ISREG(x) FALSE
+ #endif
+").
+
+:- pragma c_code(io__stream_file_size(Stream::in, Size::out,
+ _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ struct stat s;
+ if (fstat(fileno(f->file), &s) == 0 && S_ISREG(s.st_mode)) {
+ Size = s.st_size;
+ } else {
+ Size = -1;
+ }
+}").
+
+%-----------------------------------------------------------------------------%
+
+% A `buffer' is just an array of Chars.
+% Buffer sizes are measured in Chars.
+
+:- type buffer ---> buffer(c_pointer).
+
+:- pred io__alloc_buffer(int::in, buffer::uo) is det.
+:- pragma c_code(io__alloc_buffer(Size::in, Buffer::uo),
+ will_not_call_mercury,
+"{
+ incr_hp_atomic(Buffer,
+ (Size * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+}").
+
+:- pred io__resize_buffer(buffer::di, int::in, int::in, buffer::uo) is det.
+:- pragma c_code(io__resize_buffer(Buffer0::di, OldSize::in, NewSize::in,
+ Buffer::uo),
+ will_not_call_mercury,
+"{
+ Char *buffer0 = (Char *) Buffer0;
+ Char *buffer;
+
+#ifdef CONSERVATIVE_GC
+ buffer = GC_REALLOC(buffer0, NewSize * sizeof(Char));
+#else
+ if (buffer0 + OldSize == (Char *) MR_hp) {
+ Word next;
+ incr_hp_atomic(next,
+ (NewSize * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+	    assert(buffer0 + OldSize == (Char *) next);
+ buffer = buffer0;
+ } else {
+ /* just have to alloc and copy */
+ incr_hp_atomic(Buffer,
+ (NewSize * sizeof(Char) + sizeof(Word) - 1) / sizeof(Word));
+	    buffer = (Char *) Buffer;
+	    if (OldSize > NewSize) {
+		memcpy(buffer, buffer0, NewSize * sizeof(Char));
+	    } else {
+		memcpy(buffer, buffer0, OldSize * sizeof(Char));
+	    }
+ }
+#endif
+
+ Buffer = (Word) buffer;
+}").
+
+:- pred io__buffer_to_string(buffer::di, int::in, string::uo) is det.
+:- pragma c_code(io__buffer_to_string(Buffer::di, Len::in, Str::uo),
+ will_not_call_mercury,
+"{
+ Str = (String) Buffer;
+ Str[Len] = '\\0';
+}").
+
+:- pred io__buffer_to_string(buffer::di, string::uo) is det.
+:- pragma c_code(io__buffer_to_string(Buffer::di, Str::uo),
+ will_not_call_mercury,
+"{
+ Str = (String) Buffer;
+}").
+
+:- pred io__read_into_buffer(stream::in, buffer::di, int::in, int::in,
+ buffer::uo, int::out, io__state::di, io__state::uo) is det.
+
+:- pragma c_code(io__read_into_buffer(Stream::in,
+ Buffer0::di, Pos0::in, Size::in,
+ Buffer::uo, Pos::out, _IO0::di, _IO::uo),
+ will_not_call_mercury,
+"{
+ MercuryFile *f = (MercuryFile *) Stream;
+ char *buffer = (Char *) Buffer0;
+ int items_read;
+
+ items_read = fread(buffer + Pos0, sizeof(Char), Size - Pos0, f->file);
+
+ Buffer = (Word) buffer;
+ Pos = Pos0 + items_read;
+}").
+
+%-----------------------------------------------------------------------------%
+
io__read_binary_file(Result) -->
io__binary_input_stream(Stream),
io__read_binary_file(Stream, Result).
@@ -1228,24 +1476,38 @@
io__read(Result) -->
term_io__read_term(ReadResult),
+ io__get_line_number(LineNumber),
+ { io__process_read_term(ReadResult, LineNumber, Result) }.
+
+io__read_from_string(FileName, String, Len, Result, Posn0, Posn) :-
+ parser__read_term_from_string(FileName, String, Len, Posn0, Posn, ReadResult),
+ Posn = posn(LineNumber, _, _),
+ io__process_read_term(ReadResult, LineNumber, Result).
+
+:- pred io__process_read_term(read_term, int, io__read_result(T)).
+:- mode io__process_read_term(in, in, out) is det.
+
+io__process_read_term(ReadResult, LineNumber, Result) :-
(
- { ReadResult = term(_VarSet, Term) },
- ( { term_to_type(Term, Type) } ->
- { Result = ok(Type) }
+ ReadResult = term(_VarSet, Term),
+ ( term_to_type(Term, Type) ->
+ Result = ok(Type)
;
- io__get_line_number(LineNumber),
- ( { \+ term__is_ground(Term) } ->
- { Result = error("io__read: the term read was not a ground term", LineNumber) }
+ ( \+ term__is_ground(Term) ->
+ Result = error("io__read: the term read was not a ground term",
+ LineNumber)
;
- { Result = error("io__read: the term read did not have the right type", LineNumber) }
+ Result = error(
+ "io__read: the term read did not have the right type",
+ LineNumber)
)
)
;
- { ReadResult = eof },
- { Result = eof }
+ ReadResult = eof,
+ Result = eof
;
- { ReadResult = error(String, Int) },
- { Result = error(String, Int) }
+ ReadResult = error(String, Int),
+ Result = error(String, Int)
).
io__read_anything(Stream, Result) -->
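An aside for reviewers: since `posn' records both the line number and
the offset of the start of the line, a column number can be recovered
cheaply.  A possible helper (not in this diff; the name is borrowed
from the speculative comment in lexer.m below):

	:- pred posn_to_line_and_column(posn, int, int).
	:- mode posn_to_line_and_column(in, out, out) is det.

	posn_to_line_and_column(posn(LineNum, LineOffset, Offset),
			LineNum, Column) :-
		Column is Offset - LineOffset + 1.
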
Index: lexer.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/lexer.m,v
retrieving revision 1.29
diff -u -u -r1.29 lexer.m
--- lexer.m 1998/03/03 17:25:59 1.29
+++ lexer.m 1998/05/12 19:37:26
@@ -55,6 +55,25 @@
% Keep reading until we encounter either an `end' token
% (i.e. a full stop followed by whitespace) or the end-of-file.
+% The type `offset' represents a (zero-based) offset into a string.
+:- type offset == int.
+
+:- pred lexer__string_get_token_list(string, offset, token_list, posn, posn).
+:- mode lexer__string_get_token_list(in, in, out, in, out) is det.
+% lexer__string_get_token_list(String, MaxOffset, Tokens, InitialPos, FinalPos):
+% Scan a list of tokens from a string,
+% starting at the current offset specified by InitialPos.
+% Keep scanning until we encounter an `end' token
+% (i.e. a full stop followed by whitespace) or until
+% we reach MaxOffset. (MaxOffset must be =< the length of the string.)
+% Return the tokens scanned in Tokens, and return the position
+% one character past the end of the last token in FinalPos.
+
+:- pred lexer__string_get_token_list(string, token_list, posn, posn).
+:- mode lexer__string_get_token_list(in, out, in, out) is det.
+% lexer__string_get_token_list(String, Tokens, InitialPos, FinalPos):
+% calls string_get_token_list/5 above with MaxOffset = length of String.
+
:- pred lexer__token_to_string(token, string).
:- mode lexer__token_to_string(in, out) is det.
% Convert a token to a human-readable string describing the token.
@@ -72,8 +91,40 @@
%-----------------------------------------------------------------------------%
:- implementation.
-:- import_module list, require, string, int.
+:- import_module list, term, require, string, int.
+%
+% Note that there are two implementations of most predicates here:
+% one which deals with strings, the other that deals with io__states.
+% We can't write the io__state version in terms of the string
+% version because we don't know how much string to slurp up
+% until after we've lexically analysed it. Some interactive
+% applications require the old Prolog behaviour of stopping
+% after an end token (i.e. `.' plus whitespace) rather than
+% reading in whole lines.
+% Conversely, we can't write the string version using the io__state
+% version, since that would require either cheating with the io__state
+% or ruining the string interface.
+%
+% An alternative would be to write both versions in terms
+% of a generic "char_stream" typeclass, with instances
+% for io__states and for strings.
+% However, for this to be acceptably efficient it would
+% require the compiler to specialize the code, which
+% currently (13 May 98) it is not capable of doing.
+%
+% In fact, the string version is still not as efficient as I would
+% like. The compiler ought to (but currently doesn't) unfold all
+% the instances of the `posn' type. We could do this type unfolding
+% by hand, but it would be very tedious and it would make the code
+% less readable. If/when there is compiler support for this, we
+% should also think about moving the `String' and `Len' arguments
+% into the posn (or making a new `lexer_state' struct which contains
+% both the posn and the String and Len arguments).
+%
+
+%-----------------------------------------------------------------------------%
+
lexer__token_to_string(name(Name), String) :-
string__append_list(["token '", Name, "'"], String).
lexer__token_to_string(variable(Var), String) :-
@@ -128,12 +179,92 @@
lexer__get_token_list(Tokens1)
).
+lexer__string_get_token_list(String, Tokens) -->
+ { string__length(String, Len) },
+ lexer__string_get_token_list(String, Len, Tokens).
+
+lexer__string_get_token_list(String, Len, Tokens) -->
+ lexer__string_get_token(String, Len, Token, Context),
+ ( { Token = eof } ->
+ { Tokens = token_nil }
+ ; { Token = end ; Token = error(_) ; Token = io_error(_) } ->
+ { Tokens = token_cons(Token, Context, token_nil) }
+ ;
+ { Tokens = token_cons(Token, Context, Tokens1) },
+ lexer__string_get_token_list(String, Len, Tokens1)
+ ).
+
+%-----------------------------------------------------------------------------%
+
+% some low-level routines
+
:- pred lexer__get_context(token_context, io__state, io__state).
:- mode lexer__get_context(out, di, uo) is det.
lexer__get_context(Context) -->
io__get_line_number(Context).
+:- type string_token_context == token_context.
+
+:- pred lexer__string_get_context(posn, string_token_context, posn, posn).
+:- mode lexer__string_get_context(in, out, in, out) is det.
+
+lexer__string_get_context(StartPosn, Context, EndPosn, EndPosn) :-
+ StartPosn = posn(StartLineNum, _, _),
+ Context = StartLineNum.
+ % In future, we might want to modify this code to read something
+ % like this:
+ % posn_to_line_and_column(StartPosn, StartLineNum, StartColumn),
+ % posn_to_line_and_column(EndPosn, EndLineNum, EndColumn),
+ % Context = detailed(StartLine, StartColumn, EndLine, EndColumn).
+
+:- pred lexer__string_read_char(string, int, char, posn, posn).
+:- mode lexer__string_read_char(in, in, out, in, out) is semidet.
+
+:- pragma inline(lexer__string_read_char/5).
+
+lexer__string_read_char(String, Len, Char, Posn0, Posn) :-
+ Posn0 = posn(LineNum0, LineOffset0, Offset0),
+ Offset0 < Len,
+ string__unsafe_index(String, Offset0, Char),
+ Offset is Offset0 + 1,
+ ( Char = '\n' ->
+ LineNum is LineNum0 + 1,
+ Posn = posn(LineNum, Offset, Offset)
+ ;
+ Posn = posn(LineNum0, LineOffset0, Offset)
+ ).
+
+:- pred lexer__string_ungetchar(string, posn, posn).
+:- mode lexer__string_ungetchar(in, in, out) is det.
+
+lexer__string_ungetchar(String, Posn0, Posn) :-
+ Posn0 = posn(LineNum0, LineOffset0, Offset0),
+ Offset is Offset0 - 1,
+ string__unsafe_index(String, Offset, Char),
+ ( Char = '\n' ->
+ LineNum is LineNum0 - 1,
+ Posn = posn(LineNum, Offset, Offset)
+ ;
+ Posn = posn(LineNum0, LineOffset0, Offset)
+ ).
+
+:- pred lexer__grab_string(string, posn, string, posn, posn).
+:- mode lexer__grab_string(in, in, out, in, out) is det.
+
+lexer__grab_string(String, Posn0, SubString, Posn, Posn) :-
+ Posn0 = posn(_, _, Offset0),
+ Posn = posn(_, _, Offset),
+ Count is Offset - Offset0,
+ string__unsafe_substring(String, Offset0, Count, SubString).
+
+:- pred lexer__string_set_line_number(int, posn, posn).
+:- mode lexer__string_set_line_number(in, in, out) is det.
+
+lexer__string_set_line_number(LineNumber, Posn0, Posn) :-
+ Posn0 = posn(_, _, Offset),
+ Posn = posn(LineNumber, Offset, Offset).
+
%-----------------------------------------------------------------------------%
:- pred lexer__get_token(token, token_context, io__state, io__state).
@@ -190,6 +321,60 @@
)
).
+:- pred lexer__string_get_token(string, int, token, token_context, posn, posn).
+:- mode lexer__string_get_token(in, in, out, out, in, out) is det.
+
+lexer__string_get_token(String, Len, Token, Context) -->
+ =(Posn0),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ' ' ; Char = '\t' ; Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { char__is_upper(Char) ; Char = '_' } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_lower(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '0' } ->
+ lexer__string_get_zero(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__special_token(Char, SpecialToken) } ->
+ lexer__string_get_context(Posn0, Context),
+ { SpecialToken = open ->
+ Token = open_ct
+ ;
+ Token = SpecialToken
+ }
+ ; { Char = ('.') } ->
+ lexer__string_get_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('%') } ->
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ ; { Char = '"' ; Char = '''' } ->
+ lexer__string_get_quoted_name(String, Len, Char, [],
+ Posn0, Token, Context)
+ ; { Char = ('/') } ->
+ lexer__string_get_slash(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('#') } ->
+ =(Posn1),
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = junk(Char) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = eof }
+ ).
+
%-----------------------------------------------------------------------------%
:- pred lexer__get_token_2(token, token_context, io__state, io__state).
@@ -246,6 +431,59 @@
)
).
+:- pred lexer__string_get_token_2(string, int, token, token_context,
+ posn, posn).
+:- mode lexer__string_get_token_2(in, in, out, out, in, out) is det.
+
+lexer__string_get_token_2(String, Len, Token, Context) -->
+ =(Posn0),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ' ' ; Char = '\t' ; Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { char__is_upper(Char) ; Char = '_' } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_lower(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '0' } ->
+ lexer__string_get_zero(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__special_token(Char, SpecialToken) } ->
+ lexer__string_get_context(Posn0, Context),
+ { Token = SpecialToken }
+ ; { Char = ('.') } ->
+ lexer__string_get_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('%') } ->
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ ; { Char = '"' ; Char = '''' } ->
+ lexer__string_get_quoted_name(String, Len, Char, [],
+ Posn0, Token, Context)
+ ; { Char = ('/') } ->
+ lexer__string_get_slash(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('#') } ->
+ =(Posn1),
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = junk(Char) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = eof }
+ ).
+
+%-----------------------------------------------------------------------------%
+
%-----------------------------------------------------------------------------%
:- pred lexer__special_token(char, token).
@@ -303,6 +541,29 @@
)
).
+:- pred lexer__string_get_dot(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_dot(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_dot(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { lexer__whitespace_after_dot(Char) } ->
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = end }
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name(".") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = end }
+ ).
+
:- pred lexer__whitespace_after_dot(char).
:- mode lexer__whitespace_after_dot(in) is semidet.
@@ -334,6 +595,23 @@
)
).
+:- pred lexer__string_skip_to_eol(string, int, token, token_context,
+ posn, posn).
+:- mode lexer__string_skip_to_eol(in, in, out, out, in, out) is det.
+
+lexer__string_skip_to_eol(String, Len, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = '\n' } ->
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ;
+ lexer__string_skip_to_eol(String, Len, Token, Context)
+ )
+ ;
+ =(Posn),
+ lexer__string_get_context(Posn, Context),
+ { Token = eof }
+ ).
+
:- pred lexer__get_slash(token, token_context, io__state, io__state).
:- mode lexer__get_slash(out, out, di, uo) is det.
@@ -352,12 +630,34 @@
lexer__get_context(Context),
lexer__get_graphic([Char, '/'], Token)
;
- lexer__get_context(Context),
io__putback_char(Char),
+ lexer__get_context(Context),
{ Token = name("/") }
)
).
+:- pred lexer__string_get_slash(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_slash(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_slash(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('*') } ->
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ ; { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name("/") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = name("/") }
+ ).
+
:- pred lexer__get_comment(token, token_context, io__state, io__state).
:- mode lexer__get_comment(out, out, di, uo) is det.
@@ -377,6 +677,24 @@
)
).
+:- pred lexer__string_get_comment(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_comment(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_comment(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('*') } ->
+ lexer__string_get_comment_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated '/*' comment") }
+ ).
+
:- pred lexer__get_comment_2(token, token_context, io__state, io__state).
:- mode lexer__get_comment_2(out, out, di, uo) is det.
@@ -390,6 +708,7 @@
{ Token = error("unterminated '/*' comment") }
; { Result = ok(Char) },
( { Char = ('/') } ->
+ % end of /* ... */ comment, so get next token
lexer__get_token_2(Token, Context)
; { Char = ('*') } ->
lexer__get_comment_2(Token, Context)
@@ -398,6 +717,27 @@
)
).
+:- pred lexer__string_get_comment_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_comment_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_comment_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('/') } ->
+ % end of /* ... */ comment, so get next token
+ lexer__string_get_token_2(String, Len, Token, Context)
+ ; { Char = ('*') } ->
+ lexer__string_get_comment_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_get_comment(String, Len, Posn0,
+ Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated '/*' comment") }
+ ).
+
%-----------------------------------------------------------------------------%
% quoted names and quoted strings
@@ -422,6 +762,29 @@
)
).
+:- pred lexer__string_get_quoted_name(string, int, char, list(char), posn,
+ token, string_token_context, posn, posn).
+:- mode lexer__string_get_quoted_name(in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_quoted_name(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = QuoteChar } ->
+ lexer__string_get_quoted_name_quote(String, Len,
+ QuoteChar, Chars, Posn0, Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_get_quoted_name_escape(String, Len,
+ QuoteChar, Chars, Posn0, Token, Context)
+ ;
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char | Chars], Posn0, Token, Context)
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quote") }
+ ).
+
:- pred lexer__get_quoted_name_quote(char, list(char), token,
io__state, io__state).
:- mode lexer__get_quoted_name_quote(in, in, out, di, uo) is det.
@@ -441,6 +804,27 @@
)
).
+:- pred lexer__string_get_quoted_name_quote(string, int, char, list(char),
+ posn, token, string_token_context, posn, posn).
+:- mode lexer__string_get_quoted_name_quote(in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_quoted_name_quote(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = QuoteChar } ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char | Chars], Posn0, Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { lexer__finish_quoted_name(QuoteChar, Chars, Token) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { lexer__finish_quoted_name(QuoteChar, Chars, Token) }
+ ).
+
:- pred lexer__finish_quoted_name(char, list(char), token).
:- mode lexer__finish_quoted_name(in, in, out) is det.
@@ -480,6 +864,38 @@
)
).
+:- pred lexer__string_get_quoted_name_escape(string, int, char, list(char),
+ posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_quoted_name_escape(in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_quoted_name_escape(String, Len, QuoteChar, Chars, Posn0,
+ Token, Context) -->
+ =(Posn1),
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = '\n' } ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ Chars, Posn0, Token, Context)
+ ; { lexer__escape_char(Char, EscapedChar) } ->
+ { Chars1 = [EscapedChar | Chars] },
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ Chars1, Posn0, Token, Context)
+ ; { Char = 'x' } ->
+ lexer__string_get_hex_escape(String, Len, QuoteChar,
+ Chars, [], Posn0, Token, Context)
+ ; { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_escape(String, Len, QuoteChar,
+ Chars, [Char], Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { Token = error("invalid escape character") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quoted name") }
+ ).
+
:- pred lexer__escape_char(char, char).
:- mode lexer__escape_char(in, out) is semidet.
@@ -517,6 +933,30 @@
)
).
+:- pred lexer__string_get_hex_escape(string, int, char, list(char), list(char),
+ posn, token, string_token_context, posn, posn).
+:- mode lexer__string_get_hex_escape(in, in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_hex_escape(String, Len, QuoteChar, Chars, HexChars, Posn0,
+ Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_escape(String, Len, QuoteChar,
+ Chars, [Char | HexChars], Posn0,
+ Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_finish_hex_escape(String, Len, QuoteChar,
+ Chars, HexChars, Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated hex escape") }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("unterminated quote") }
+ ).
+
:- pred lexer__finish_hex_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__finish_hex_escape(in, in, in, out, di, uo) is det.
@@ -536,6 +976,31 @@
)
).
+:- pred lexer__string_finish_hex_escape(string, int, char, list(char),
+ list(char), posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_finish_hex_escape(in, in, in, in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_finish_hex_escape(String, Len, QuoteChar, Chars, HexChars, Posn0,
+ Token, Context) -->
+ ( { HexChars = [] } ->
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("empty hex escape") }
+ ;
+ { lexer__rev_char_list_to_string(HexChars, HexString) },
+ (
+ { string__base_string_to_int(16, HexString, Int) },
+ { char__to_int(Char, Int) }
+ ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char|Chars], Posn0, Token, Context)
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = error("invalid hex escape") }
+ )
+ ).
+
:- pred lexer__get_octal_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__get_octal_escape(in, in, in, out, di, uo) is det.
@@ -566,6 +1031,40 @@
)
).
+:- pred lexer__string_get_octal_escape(string, int, char, list(char),
+ list(char), posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_octal_escape(in, in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_get_octal_escape(String, Len, QuoteChar, Chars, OctalChars,
+ Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_escape(String, Len,
+ QuoteChar, Chars, [Char | OctalChars], Posn0,
+ Token, Context)
+ ; { Char = ('\\') } ->
+ lexer__string_finish_octal_escape(String, Len,
+ QuoteChar, Chars, OctalChars, Posn0,
+ Token, Context)
+ ;
+ /******
+ % We don't report this as an error since
+ % we need bug-for-bug compatibility with
+ % NU-Prolog
+ { Token = error("unterminated octal escape") }
+ ******/
+ lexer__string_ungetchar(String),
+ lexer__string_finish_octal_escape(String, Len,
+ QuoteChar, Chars, OctalChars, Posn0,
+ Token, Context)
+ )
+ ;
+ { Token = error("unterminated quote") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__finish_octal_escape(char, list(char), list(char),
token, io__state, io__state).
:- mode lexer__finish_octal_escape(in, in, in, out, di, uo) is det.
@@ -585,6 +1084,31 @@
)
).
+:- pred lexer__string_finish_octal_escape(string, int, char, list(char),
+ list(char), posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_finish_octal_escape(in, in, in, in, in, in, out, out,
+ in, out) is det.
+
+lexer__string_finish_octal_escape(String, Len, QuoteChar, Chars, OctalChars,
+ Posn0, Token, Context) -->
+ ( { OctalChars = [] } ->
+ { Token = error("empty octal escape") },
+ lexer__string_get_context(Posn0, Context)
+ ;
+ { lexer__rev_char_list_to_string(OctalChars, OctalString) },
+ (
+ { string__base_string_to_int(8, OctalString, Int) },
+ { char__to_int(Char, Int) }
+ ->
+ lexer__string_get_quoted_name(String, Len, QuoteChar,
+ [Char|Chars], Posn0, Token, Context)
+ ;
+ { Token = error("invalid octal escape") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ).
+
%-----------------------------------------------------------------------------%
% names and variables
@@ -609,6 +1133,27 @@
)
).
+:- pred lexer__string_get_name(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_name(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_name(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_alnum_or_underscore(Char) } ->
+ lexer__string_get_name(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%
% A line number directive token is `#' followed by an integer
% (specifying the line number) followed by a newline.
@@ -664,6 +1209,59 @@
)
).
+ %
+ % A line number directive token is `#' followed by an integer
+ % (specifying the line number) followed by a newline.
+ % Such a token sets the source line number for the next
+ % line, but it is otherwise ignored. This means that line number
+ % directives may appear anywhere that a token may appear, including
+ % in the middle of terms.
+ % (The source file name can be set with a `:- pragma source_file'
+ % declaration.)
+ %
+
+:- pred lexer__string_get_source_line_number(string, int, posn,
+ token, token_context, posn, posn).
+:- mode lexer__string_get_source_line_number(in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_source_line_number(String, Len, Posn1, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_source_line_number(String, Len,
+ Posn1, Token, Context)
+ ; { Char = '\n' } ->
+ lexer__grab_string(String, Posn1, LineNumString),
+ (
+ { string__base_string_to_int(10, LineNumString,
+ LineNum) },
+ { LineNum > 0 }
+ ->
+ lexer__string_set_line_number(LineNum),
+ lexer__string_get_token(String, Len, Token, Context)
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { string__append_list([
+ "invalid line number `", LineNumString,
+ "' in `#' line number directive"],
+ Message) },
+ { Token = error(Message) }
+ )
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { string__from_char_list([Char], DirectiveString) },
+ { string__append_list([
+ "invalid character `", DirectiveString,
+ "' in `#' line number directive"],
+ Message) },
+ { Token = error(Message) }
+ )
+ ;
+ lexer__string_get_context(Posn1, Context),
+ { Token = error(
+ "unexpected end-of-file in `#' line number directive") }
+ ).
+
:- pred lexer__get_graphic(list(char), token, io__state, io__state).
:- mode lexer__get_graphic(in, out, di, uo) is det.
@@ -684,6 +1282,27 @@
)
).
+:- pred lexer__string_get_graphic(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_graphic(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_graphic(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { lexer__graphic_token_char(Char) } ->
+ lexer__string_get_graphic(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, Name),
+ { Token = name(Name) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, Name),
+ lexer__string_get_context(Posn0, Context),
+ { Token = name(Name) }
+ ).
+
:- pred lexer__get_variable(list(char), token, io__state, io__state).
:- mode lexer__get_variable(in, out, di, uo) is det.
@@ -704,6 +1323,27 @@
)
).
+:- pred lexer__string_get_variable(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_variable(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_variable(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_alnum_or_underscore(Char) } ->
+ lexer__string_get_variable(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, VariableName),
+ { Token = variable(VariableName) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, VariableName),
+ { Token = variable(VariableName) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%-----------------------------------------------------------------------------%
% integer and float literals
@@ -738,6 +1378,43 @@
)
).
+:- pred lexer__string_get_zero(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_zero(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_zero(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { Char = '''' } ->
+ lexer__string_get_char_code(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'b' } ->
+ lexer__string_get_binary(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'o' } ->
+ lexer__string_get_octal(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'x' } ->
+ lexer__string_get_hex(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('.') } ->
+ lexer__string_get_int_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_get_context(Posn0, Context),
+ { Token = integer(0) }
+ )
+ ;
+ lexer__string_get_context(Posn0, Context),
+ { Token = integer(0) }
+ ).
+
:- pred lexer__get_char_code(token, io__state, io__state).
:- mode lexer__get_char_code(out, di, uo) is det.
@@ -752,6 +1429,20 @@
{ Token = integer(CharCode) }
).
+:- pred lexer__string_get_char_code(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_char_code(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_char_code(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ { char__to_int(Char, CharCode) },
+ { Token = integer(CharCode) },
+ lexer__string_get_context(Posn0, Context)
+ ;
+ { Token = error("unterminated char code constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_binary(token, io__state, io__state).
:- mode lexer__get_binary(out, di, uo) is det.
@@ -770,6 +1461,25 @@
)
).
+:- pred lexer__string_get_binary(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_binary(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_binary(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_binary_digit(Char) } ->
+ lexer__string_get_binary_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated binary constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated binary constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_binary_2(list(char), token, io__state, io__state).
:- mode lexer__get_binary_2(in, out, di, uo) is det.
@@ -788,6 +1498,27 @@
)
).
+:- pred lexer__string_get_binary_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_binary_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_binary_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_binary_digit(Char) } ->
+ lexer__string_get_binary_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, BinaryString),
+ { lexer__conv_string_to_int(BinaryString, 2, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, BinaryString),
+ { lexer__conv_string_to_int(BinaryString, 2, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_octal(token, io__state, io__state).
:- mode lexer__get_octal(out, di, uo) is det.
@@ -806,6 +1537,25 @@
)
).
+:- pred lexer__string_get_octal(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_octal(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_octal(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated octal constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated octal constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_octal_2(list(char), token, io__state, io__state).
:- mode lexer__get_octal_2(in, out, di, uo) is det.
@@ -824,6 +1574,27 @@
)
).
+:- pred lexer__string_get_octal_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_octal_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_octal_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_octal_digit(Char) } ->
+ lexer__string_get_octal_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+		lexer__grab_string(String, Posn0, OctalString),
+		{ lexer__conv_string_to_int(OctalString, 8, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+	lexer__grab_string(String, Posn0, OctalString),
+	{ lexer__conv_string_to_int(OctalString, 8, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_hex(token, io__state, io__state).
:- mode lexer__get_hex(out, di, uo) is det.
@@ -842,6 +1613,26 @@
)
).
+:- pred lexer__string_get_hex(string, int, posn, token, string_token_context,
+ posn, posn).
+:- mode lexer__string_get_hex(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_hex(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token = error("unterminated hex constant") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated hex constant") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_hex_2(list(char), token, io__state, io__state).
:- mode lexer__get_hex_2(in, out, di, uo) is det.
@@ -860,6 +1651,28 @@
)
).
+:- pred lexer__string_get_hex_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_hex_2(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_hex_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_hex_digit(Char) } ->
+ lexer__string_get_hex_2(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+		lexer__grab_string(String, Posn0, HexString),
+		{ lexer__conv_string_to_int(HexString, 16, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+	lexer__grab_string(String, Posn0, HexString),
+	{ lexer__conv_string_to_int(HexString, 16, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_number(list(char), token, io__state, io__state).
:- mode lexer__get_number(in, out, di, uo) is det.
@@ -882,6 +1695,33 @@
)
).
+:- pred lexer__string_get_number(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_number(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_number(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_number(String, Len, Posn0,
+ Token, Context)
+ ; { Char = ('.') } ->
+ lexer__string_get_int_dot(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
% XXX the float literal syntax doesn't match ISO Prolog
:- pred lexer__get_int_dot(list(char), token, io__state, io__state).
@@ -904,6 +1744,29 @@
)
).
+:- pred lexer__string_get_int_dot(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_int_dot(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_int_dot(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_decimals(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, NumberString),
+ { lexer__conv_string_to_int(NumberString, 10, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_decimals(list(char), token, io__state, io__state).
:- mode lexer__get_float_decimals(in, out, di, uo) is det.
@@ -926,6 +1789,30 @@
)
).
+:- pred lexer__string_get_float_decimals(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_decimals(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_float_decimals(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_decimals(String, Len, Posn0,
+ Token, Context)
+ ; { Char = 'e' ; Char = 'E' } ->
+ lexer__string_get_float_exponent(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent(list(char), token, io__state, io__state).
:- mode lexer__get_float_exponent(in, out, di, uo) is det.
@@ -947,6 +1834,30 @@
)
).
+:- pred lexer__string_get_float_exponent(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent(in, in, in, out, out, in, out) is det.
+
+lexer__string_get_float_exponent(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { Char = ('+') ; Char = ('-') } ->
+ lexer__string_get_float_exponent_2(String, Len, Posn0,
+ Token, Context)
+ ; { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token =
+ error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent_2(list(char), token,
io__state, io__state).
:- mode lexer__get_float_exponent_2(in, out, di, uo) is det.
@@ -971,6 +1882,31 @@
)
).
+:- pred lexer__string_get_float_exponent_2(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent_2(in, in, in, out, out, in, out)
+ is det.
+
+ % we've read past the E signalling the start of the exponent -
+ % make sure that there's at least one digit following,
+ % and then get the remaining digits
+
+lexer__string_get_float_exponent_2(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ { Token =
+ error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ { Token = error("unterminated exponent in float token") },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
:- pred lexer__get_float_exponent_3(list(char), token,
io__state, io__state).
:- mode lexer__get_float_exponent_3(in, out, di, uo) is det.
@@ -993,6 +1929,28 @@
)
).
+:- pred lexer__string_get_float_exponent_3(string, int, posn, token,
+ string_token_context, posn, posn).
+:- mode lexer__string_get_float_exponent_3(in, in, in, out, out, in, out)
+ is det.
+
+lexer__string_get_float_exponent_3(String, Len, Posn0, Token, Context) -->
+ ( lexer__string_read_char(String, Len, Char) ->
+ ( { char__is_digit(Char) } ->
+ lexer__string_get_float_exponent_3(String, Len, Posn0,
+ Token, Context)
+ ;
+ lexer__string_ungetchar(String),
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ )
+ ;
+ lexer__grab_string(String, Posn0, FloatString),
+ { lexer__conv_to_float(FloatString, Token) },
+ lexer__string_get_context(Posn0, Context)
+ ).
+
%-----------------------------------------------------------------------------%
% Utility routines
@@ -1002,6 +1960,12 @@
lexer__rev_char_list_to_int(RevChars, Base, Token) :-
lexer__rev_char_list_to_string(RevChars, String),
+ lexer__conv_string_to_int(String, Base, Token).
+
+:- pred lexer__conv_string_to_int(string, int, token).
+:- mode lexer__conv_string_to_int(in, in, out) is det.
+
+lexer__conv_string_to_int(String, Base, Token) :-
( string__base_string_to_int(Base, String, Int) ->
Token = integer(Int)
;
@@ -1013,6 +1977,12 @@
lexer__rev_char_list_to_float(RevChars, Token) :-
lexer__rev_char_list_to_string(RevChars, String),
+ lexer__conv_to_float(String, Token).
+
+:- pred lexer__conv_to_float(string, token).
+:- mode lexer__conv_to_float(in, out) is det.
+
+lexer__conv_to_float(String, Token) :-
( string__to_float(String, Float) ->
Token = float(Float)
;
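
(Sketch for reviewers, not in the diff: driving the new string lexer
directly, starting from the initial position posn(1, 0, 0).)

	:- pred tokenize(string, token_list, posn).
	:- mode tokenize(in, out, out) is det.

	tokenize(String, Tokens, FinalPosn) :-
		lexer__string_get_token_list(String, Tokens,
			posn(1, 0, 0), FinalPosn).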
Index: parser.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/parser.m,v
retrieving revision 1.29
diff -u -u -r1.29 parser.m
--- parser.m 1998/01/23 12:33:26 1.29
+++ parser.m 1998/05/12 19:37:31
@@ -9,10 +9,16 @@
% stability: high.
%
% This file exports the predicate parser__read_term, which reads
-% a term from the current input stream. The parser and lexer are
-% intended to exactly follow ISO Prolog syntax, but there are some
-% departures from that for three reasons:
+% a term from the current input stream.
+% The parser__read_term_from_string predicates are the same as the
+% read_term predicates, except that the term is read from
+% a string rather than from the current input stream.
+% The parser__parse_tokens predicate is similar,
+% but it takes a list of tokens rather than a string.
%
+% The parser and lexer are intended to exactly follow ISO Prolog
+% syntax, but there are some departures from that for three reasons:
+%
% (1) I wrote some of the code at home when the ISO Prolog draft
% was at uni - so in some places I just guessed.
% (2) In some places the lexer reports an error when it shouldn't.
@@ -28,19 +34,60 @@
:- module parser.
:- interface.
-:- import_module io, term_io.
+:- import_module io, term_io, lexer.
+
+%-----------------------------------------------------------------------------%
+%
+% parser__read_term/{3,4}:
+% Reads in (and parses) terms from the current input stream.
:- pred parser__read_term(read_term, io__state, io__state).
:- mode parser__read_term(out, di, uo) is det.
+% parser__read_term(Result):
+% Reads a term from the current input stream.
- % The string is the filename to use for the current input stream;
- % this is used in constructing the term__contexts in the read term.
- % This interface is used to support the `:- pragma source_file'
- % directive.
:- pred parser__read_term(string, read_term, io__state, io__state).
:- mode parser__read_term(in, out, di, uo) is det.
+% parser__read_term(FileName, Result):
+% Reads a term from the current input stream.
+% The string is the filename to use for the current input stream;
+% this is used in constructing the term__contexts in the read term.
+% This interface is used to support the `:- pragma source_file'
+% directive.
+
+%-----------------------------------------------------------------------------%
+%
+% parser__read_term_from_string/{4,6}:
+% Parses terms from a string.
+
+ % The read_term_from_string predicates are the same as the
+ % read_term predicates, except that the term is read from
+ % a string rather than from the current input stream.
+ % The returned value `EndPos' is the position one character
+ % past the end of the term read.
+ % The arguments `MaxOffset' and `StartPos' in the six-argument version
+ % specify the length of the string and the position within the
+ % string at which to start parsing.
+
+:- pred parser__read_term_from_string(string, string, posn, read_term).
+:- mode parser__read_term_from_string(in, in, out, out) is det.
+% parser__read_term_from_string(FileName, String, EndPos, Term).
+
+:- pred parser__read_term_from_string(string, string, int, posn, posn, read_term).
+:- mode parser__read_term_from_string(in, in, in, in, out, out) is det.
+% parser__read_term_from_string(FileName, String, MaxOffset, StartPos,
+% EndPos, Term).
%-----------------------------------------------------------------------------%
+%
+% parser__parse_tokens/3:
+% Parses a list of tokens.
+
+:- pred parser__parse_tokens(string, token_list, read_term).
+:- mode parser__parse_tokens(in, in, out) is det.
+ % parser__parse_tokens(FileName, TokenList, Result):
+
+%-----------------------------------------------------------------------------%
:- implementation.
:- import_module string, char, int, float, bool, list, std_util, require.
@@ -59,14 +106,30 @@
parser__read_term(FileName, Result) -->
lexer__get_token_list(Tokens),
- ( { Tokens = token_nil } ->
- { Result = eof }
+ { parser__parse_tokens(FileName, Tokens, Result) }.
+
+parser__read_term_from_string(FileName, String, EndPos, Result) :-
+ string__length(String, Len),
+ StartPos = posn(1, 0, 0),
+ parser__read_term_from_string(FileName, String, Len, StartPos, EndPos,
+ Result).
+
+parser__read_term_from_string(FileName, String, Len, StartPos, EndPos,
+ Result) :-
+ lexer__string_get_token_list(String, Len, Tokens, StartPos, EndPos),
+ parser__parse_tokens(FileName, Tokens, Result).
+
+%-----------------------------------------------------------------------------%
+
+parser__parse_tokens(FileName, Tokens, Result) :-
+ ( Tokens = token_nil ->
+ Result = eof
;
- { parser__init_state(FileName, Tokens, ParserState0) },
- { parser__parse_whole_term(Term, ParserState0, ParserState) },
- { parser__final_state(ParserState, VarSet, LeftOverTokens) },
- { parser__check_for_errors(Term, VarSet,
- Tokens, LeftOverTokens, Result) }
+ parser__init_state(FileName, Tokens, ParserState0),
+ parser__parse_whole_term(Term, ParserState0, ParserState),
+ parser__final_state(ParserState, VarSet, LeftOverTokens),
+ parser__check_for_errors(Term, VarSet,
+ Tokens, LeftOverTokens, Result)
).
:- pred parser__check_for_errors(parse(term), varset, token_list, token_list,
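
An aside for reviewers, not part of the diff: the intended usage of the
new predicates is something like the following fragment, where the file
name "<example>" is made up, and the end position returned by one call
is fed back in as the start position of the next:

	String = "foo(X). bar(Y). ",
	string__length(String, MaxOffset),
	% The four-argument version always starts at posn(1, 0, 0).
	parser__read_term_from_string("<example>", String,
		EndPos1, Result1),
	% The six-argument version lets the caller continue
	% from where the previous term ended.
	parser__read_term_from_string("<example>", String,
		MaxOffset, EndPos1, _EndPos2, Result2)

Each Result is a read_term: eof, a syntax error, or a term together
with its varset.
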
Index: string.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.m,v
retrieving revision 1.102
diff -u -u -r1.102 string.m
--- string.m 1998/03/18 14:16:12 1.102
+++ string.m 1998/05/12 19:37:41
@@ -190,6 +190,16 @@
% Calls error/1 if `Index' is out of range (negative, or greater than or
% equal to the length of `String').
+:- pred string__unsafe_index(string, int, char).
+:- mode string__unsafe_index(in, in, out) is det.
+% string__unsafe_index(String, Index, Char):
+% `Char' is the (`Index' + 1)-th character of `String'.
+% WARNING: behaviour is UNDEFINED if `Index' is out of range
+% (negative, or greater than or equal to the length of `String').
+% This version is constant time, whereas string__index_det
+% may be linear in the length of the string.
+% Use with care!
+
:- pred string__foldl(pred(char, T, T), string, T, T).
:- mode string__foldl(pred(in, in, out) is det, in, in, out) is det.
:- mode string__foldl(pred(in, di, uo) is det, in, di, uo) is det.
@@ -238,6 +248,20 @@
% If `Count' is out of the range [0, length of `String' - `Start'], it is
% treated as if it were the nearest end-point of that range.)
+:- pred string__unsafe_substring(string, int, int, string).
+:- mode string__unsafe_substring(in, in, in, out) is det.
+% string__unsafe_substring(String, Start, Count, Substring):
+% `Substring' consists of the first `Count' characters of what
+% would remain of `String' after the first `Start' characters
+% were removed.
+% WARNING: if `Start' is out of the range [0, length of `String'],
+% or if `Count' is out of the range [0, length of `String' - `Start'],
+% then the behaviour is UNDEFINED.
+% Use with care!
+% This version takes time proportional to the length of the
+% substring, whereas string__substring may take time proportional
+% to the length of the whole string.
+
:- pred string__append_list(list(string), string).
:- mode string__append_list(in, out) is det.
:- mode string__append_list(out, in) is multidet.
@@ -481,10 +505,6 @@
LeftCount is Length - RightCount,
string__split(String, LeftCount, _LeftString, RightString).
-string__substring(String, Start, Count, Substring) :-
- string__split(String, Start, _Left, Right),
- string__left(Right, Count, Substring).
-
string__remove_suffix(A, B, C) :-
string__to_int_list(A, LA),
string__to_int_list(B, LB),
@@ -1639,9 +1659,6 @@
/*-----------------------------------------------------------------------*/
-:- pred string__unsafe_index(string, int, char).
-:- mode string__unsafe_index(in, in, out) is det.
-
:- pragma(c_code, string__unsafe_index(Str::in, Index::in, Ch::out), "
Ch = Str[Index];
").
@@ -1815,6 +1832,52 @@
:- pragma no_inline(string__append/3).
/*-----------------------------------------------------------------------*/
+
+/*
+:- pred string__substring(string, int, int, string).
+:- mode string__substring(in, in, in, out) is det.
+% string__substring(String, Start, Count, Substring):
+*/
+
+:- pragma c_code(string__substring(Str::in, Start::in, Count::in,
+ SubString::out),
+ will_not_call_mercury,
+"{
+ Integer len;
+ Word tmp;
+ if (Start < 0) Start = 0;
+ if (Count <= 0) {
+ make_aligned_string(LVALUE_CAST(ConstString, SubString), """");
+ } else {
+ len = strlen(Str);
+ if (Start > len) Start = len;
+ if (Count > len - Start) Count = len - Start;
+ incr_hp_atomic(tmp, (Count + sizeof(Word)) / sizeof(Word));
+ SubString = (char *) tmp;
+ memcpy(SubString, Str + Start, Count);
+ SubString[Count] = '\\0';
+ }
+}").
+
+
+/*
+:- pred string__unsafe_substring(string, int, int, string).
+:- mode string__unsafe_substring(in, in, in, out) is det.
+% string__unsafe_substring(String, Start, Count, Substring):
+*/
+
+:- pragma c_code(string__unsafe_substring(Str::in, Start::in, Count::in,
+ SubString::out),
+ will_not_call_mercury,
+"{
+ Integer len;
+ Word tmp;
+ incr_hp_atomic(tmp, (Count + sizeof(Word)) / sizeof(Word));
+ SubString = (char *) tmp;
+ memcpy(SubString, Str + Start, Count);
+ SubString[Count] = '\\0';
+}").
+
/*
:- pred string__split(string, int, string, string).
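
Another illustrative aside, not part of the diff: the clamping in the
c_code above gives string__substring the same semantics as the old
definition in terms of string__split and string__left, while the unsafe
versions do no range checks at all, so the caller must establish the
bounds itself:

	string__substring("hello", 1, 10, S1),		% S1 = "ello"
	string__substring("hello", -1, 2, S2),		% S2 = "he"
	string__substring("hello", 7, 3, S3),		% S3 = ""
	% For the unsafe versions the caller must guarantee
	% 0 =< Start and 0 =< Count =< length(Str) - Start:
	string__unsafe_index("hello", 0, C),		% C = 'h'
	string__unsafe_substring("hello", 1, 3, S4)	% S4 = "ell"
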
Index: string.nu.nl
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.nu.nl,v
retrieving revision 1.18
diff -u -u -r1.18 string.nu.nl
--- string.nu.nl 1997/07/27 15:07:21 1.18
+++ string.nu.nl 1998/05/12 19:37:42
@@ -51,6 +51,15 @@
%-----------------------------------------------------------------------------%
+string__substring(String, Start, Count, Substring) :-
+ string__split(String, Start, _Left, Right),
+ string__left(Right, Count, Substring).
+
+string__unsafe_substring(String, Start, Count, Substring) :-
+ string__substring(String, Start, Count, Substring).
+
+%-----------------------------------------------------------------------------%
+
string__split(String, Count, LeftString, RightString) :-
(
Count =< 0
Index: term.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/term.m,v
retrieving revision 1.81
diff -u -u -r1.81 term.m
--- term.m 1998/04/02 07:40:16 1.81
+++ term.m 1998/05/12 19:37:53
@@ -19,16 +19,21 @@
%-----------------------------------------------------------------------------%
-:- type term ---> term__functor(const, list(term), term__context)
- ; term__variable(var).
-:- type const ---> term__atom(string)
- ; term__integer(int)
- ; term__string(string)
- ; term__float(float).
+:- type term
+ ---> term__functor(const, list(term), term__context)
+ ; term__variable(var).
+:- type const
+ ---> term__atom(string)
+ ; term__integer(int)
+ ; term__string(string)
+ ; term__float(float).
+
+:- type term__context
+ ---> term__context(string, int)
+ % file name, line number.
+ ; term__detailed_context(string, int, int, int, int).
+ % file name, starting line & column, ending line & column
-:- type term__context ---> term__context(string, int).
- % file name, line number.
-
:- type var.
:- type var_supply.
@@ -746,10 +751,13 @@
% Given a term context, return the source line number.
term__context_line(term__context(_, LineNumber), LineNumber).
+term__context_line(term__detailed_context(_FileName, StartLine, _StartCol,
+ _EndLine, _EndCol), StartLine).
- % Given a term context, return the source file.
+ % Given a term context, return the source file name.
term__context_file(term__context(FileName, _), FileName).
+term__context_file(term__detailed_context(FileName, _, _, _, _), FileName).
% Used to initialize the term context when reading in
% (or otherwise constructing) a term.
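
An aside on the compiler changes below, not part of the diff: the point
is that code should now construct and inspect contexts through the ADT
interface rather than by unifying with the term__context/2 functor
directly, so that it also copes with the new term__detailed_context
alternative, e.g.

	term__context_init(FileName, LineNumber, Context),
	term__context_file(Context, FileName1),
	term__context_line(Context, LineNumber1)
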
cvs server: Diffing .
Index: bytecode_gen.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/bytecode_gen.m,v
retrieving revision 1.35
diff -u -u -r1.35 bytecode_gen.m
--- bytecode_gen.m 1998/02/12 01:16:56 1.35
+++ bytecode_gen.m 1998/05/12 19:38:24
@@ -147,7 +147,8 @@
bytecode_gen__goal(GoalExpr - GoalInfo, ByteInfo0, ByteInfo, Code) :-
bytecode_gen__goal_expr(GoalExpr, GoalInfo, ByteInfo0, ByteInfo,
GoalCode),
- goal_info_get_context(GoalInfo, term__context(_, Line)),
+ goal_info_get_context(GoalInfo, Context),
+ term__context_line(Context, Line),
Code = tree(node([context(Line)]), GoalCode).
:- pred bytecode_gen__goal_expr(hlds_goal_expr::in, hlds_goal_info::in,
Index: error_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/error_util.m,v
retrieving revision 1.4
diff -u -u -r1.4 error_util.m
--- error_util.m 1998/02/25 03:37:49 1.4
+++ error_util.m 1998/05/12 19:38:26
@@ -65,7 +65,8 @@
% :
% space
% indent
- Context = term__context(FileName, LineNumber),
+ term__context_file(Context, FileName),
+ term__context_line(Context, LineNumber),
string__length(FileName, FileNameLength),
string__int_to_string(LineNumber, LineNumberStr),
string__length(LineNumberStr, LineNumberStrLength0),
Index: fact_table.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/fact_table.m,v
retrieving revision 1.18
diff -u -u -r1.18 fact_table.m
--- fact_table.m 1998/03/18 08:07:34 1.18
+++ fact_table.m 1998/05/12 19:38:48
@@ -297,7 +297,8 @@
;
{ Result0 = error(Message, LineNum) },
io__input_stream_name(FileName),
- prog_out__write_context(term__context(FileName, LineNum)),
+ { term__context_init(FileName, LineNum, Context) },
+ prog_out__write_context(Context),
io__write_strings([Message, "\n"]),
io__set_exit_status(1),
{ NumFacts = NumFacts0 }
Index: prog_io.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/prog_io.m,v
retrieving revision 1.170
diff -u -u -r1.170 prog_io.m
--- prog_io.m 1998/03/03 17:35:43 1.170
+++ prog_io.m 1998/05/12 19:39:04
@@ -350,25 +350,48 @@
read_all_items(ModuleName, Messages, Items, Error) -->
%
- % read all the items (the first one is handled specially)
+ % Slurp up the source file into memory
%
io__input_stream(Stream),
io__input_stream_name(Stream, SourceFileName),
- read_first_item(ModuleName, SourceFileName,
- RevMessages, RevItems0, Error0),
+ io__read_file_as_string(Stream, Err, SourceFileContents),
+ (
+ { Err = ok },
- %
- % get the end_module declaration (if any),
- % check that it matches the initial module declaration (if any),
- % and remove both of them from the final item list.
- %
- { get_end_module(RevItems0, RevItems, EndModule) },
- { list__reverse(RevMessages, Messages0) },
- { list__reverse(RevItems, Items0) },
- check_end_module(EndModule,
- Messages0, Items0, Error0,
- Messages, Items, Error).
+ %
+ % parse all the items (the first one is handled specially)
+ %
+ read_first_item(ModuleName, SourceFileName, SourceFileContents,
+ 0, RevMessages, RevItems0, Error0),
+
+ %
+ % get the end_module declaration (if any),
+ % check that it matches the initial module declaration (if any),
+ % and remove both of them from the final item list.
+ %
+ { get_end_module(RevItems0, RevItems, EndModule) },
+ { list__reverse(RevMessages, Messages0) },
+ { list__reverse(RevItems, Items0) },
+ check_end_module(EndModule,
+ Messages0, Items0, Error0,
+ Messages, Items, Error)
+ ;
+
+ { Err = error(IOError) },
+ io__progname_base("prog_io.m", Progname),
+ {
+ io__error_message(IOError, ErrorMessage),
+ string__append_list([Progname,
+ ": error reading file `", SourceFileName, "': ",
+ ErrorMessage], Message),
+ dummy_term(Term),
+ Messages = [Message - Term],
+ Error = fatal,
+ Items = []
+ }
+ ).
+
%
% We need to jump through a few hoops when reading the first item,
% to allow the initial `:- module' declaration to be optional.
@@ -382,11 +405,12 @@
% and then if it turns out to not be a `:- module' declaration
% we reparse it in the default module scope. Blecchh.
%
-:- pred read_first_item(module_name, file_name,
+:- pred read_first_item(module_name, file_name, string, int,
message_list, item_list, module_error, io__state, io__state).
-:- mode read_first_item(in, in, out, out, out, di, uo) is det.
+:- mode read_first_item(in, in, in, in, out, out, out, di, uo) is det.
-read_first_item(DefaultModuleName, SourceFileName, Messages, Items, Error) -->
+read_first_item(DefaultModuleName, SourceFileName, SourceFileContents,
+ Posn0, Messages, Items, Error) -->
globals__io_lookup_bool_option(warn_missing_module_name, WarnMissing),
globals__io_lookup_bool_option(warn_wrong_module_name, WarnWrong),
@@ -397,7 +421,9 @@
% (so that any `:- module' declaration is taken to
% be a non-nested module unless explicitly qualified).
%
- parser__read_term(SourceFileName, MaybeFirstTerm),
+ { string__length(SourceFileContents, SourceFileLen) },
+ { parser__read_term_from_string(SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeFirstTerm) },
{ root_module_name(RootModuleName) },
{ process_read_term(RootModuleName, MaybeFirstTerm, MaybeFirstItem) },
(
@@ -410,7 +436,7 @@
{ FirstItem = pragma(source_file(NewSourceFileName)) }
->
read_first_item(DefaultModuleName, NewSourceFileName,
- Messages, Items, Error)
+ SourceFileContents, Posn1, Messages, Items, Error)
;
%
% check if the first term was a `:- module' decl
@@ -447,6 +473,7 @@
{ Items0 = [FixedFirstItem] },
{ Error0 = no },
read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Messages0, Items0, Error0,
Messages, Items, Error)
;
@@ -483,6 +510,7 @@
{ Items0 = [FixedFirstItem] },
{ Error0 = no },
read_items_loop_2(MaybeSecondItem, ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Messages0, Items0, Error0,
Messages, Items, Error)
).
@@ -520,33 +548,42 @@
% But optimizing for NU-Prolog is no longer a big priority...
:- pred read_items_loop(module_name, file_name,
+ string, int, posn,
message_list, item_list, module_error,
message_list, item_list, module_error,
io__state, io__state).
-:- mode read_items_loop(in, in, in, in, in, out, out, out, di, uo) is det.
+:- mode read_items_loop(in, in, in, in, in,
+ in, in, in, out, out, out, di, uo) is det.
-read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error) -->
- read_item(ModuleName, SourceFileName, MaybeItem),
+read_items_loop(ModuleName, SourceFileName, SourceFileContents, SourceFileLen,
+ Posn0, Msgs1, Items1, Error1, Msgs, Items, Error) -->
+ read_item(ModuleName, SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeItem),
read_items_loop_2(MaybeItem, ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn1,
Msgs1, Items1, Error1, Msgs, Items, Error).
%-----------------------------------------------------------------------------%
:- pred read_items_loop_2(maybe_item_or_eof, module_name, file_name,
+ string, int, posn,
message_list, item_list, module_error,
message_list, item_list, module_error,
io__state, io__state).
-:- mode read_items_loop_2(in, in, in, in, in, in, out, out, out, di, uo) is det.
+:- mode read_items_loop_2(in, in, in, in, in, in,
+ in, in, in, out, out, out, di, uo) is det.
% do a switch on the type of the next item
-read_items_loop_2(eof, _ModuleName, _SourceFileName, Msgs, Items, Error,
- Msgs, Items, Error) --> [].
+read_items_loop_2(eof, _ModuleName, _SourceFileName,
+ _SourceFileContents, _SourceFileLen, _Posn,
+ Msgs, Items, Error, Msgs, Items, Error) --> [].
% if the next item was end-of-file, then we're done.
read_items_loop_2(syntax_error(ErrorMsg, LineNumber), ModuleName,
- SourceFileName, Msgs0, Items0, _Error0, Msgs, Items, Error) -->
+ SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs0, Items0, _Error0, Msgs, Items, Error) -->
% if the next item was a syntax error, then insert it in
% the list of messages and continue looping
{
@@ -557,10 +594,12 @@
Items1 = Items0,
Error1 = yes
},
- read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error).
+ read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs1, Items1, Error1, Msgs, Items, Error).
read_items_loop_2(error(M, T), ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
Msgs0, Items0, _Error0, Msgs, Items, Error) -->
% if the next item was a semantic error, then insert it in
% the list of messages and continue looping
@@ -569,10 +608,12 @@
Items1 = Items0,
Error1 = yes
},
- read_items_loop(ModuleName, SourceFileName, Msgs1, Items1, Error1,
- Msgs, Items, Error).
+ read_items_loop(ModuleName, SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0,
+ Msgs1, Items1, Error1, Msgs, Items, Error).
read_items_loop_2(ok(Item, Context), ModuleName0, SourceFileName0,
+ SourceFileContents, SourceFileLen, Posn0,
Msgs0, Items0, Error0, Msgs, Items, Error) -->
% if the next item was a valid item, check whether it was
% a declaration that affects the current parsing context --
@@ -601,7 +642,9 @@
ModuleName = ModuleName0,
Items1 = [Item - Context | Items0]
},
- read_items_loop(ModuleName, SourceFileName, Msgs0, Items1, Error0,
+ read_items_loop(ModuleName, SourceFileName,
+ SourceFileContents, SourceFileLen, Posn0,
+ Msgs0, Items1, Error0,
Msgs, Items, Error).
%-----------------------------------------------------------------------------%
@@ -614,12 +657,15 @@
; error(string, term)
; ok(item, term__context).
-:- pred read_item(module_name, file_name, maybe_item_or_eof,
+:- pred read_item(module_name, file_name, string, int, posn, posn,
+ maybe_item_or_eof,
io__state, io__state).
-:- mode read_item(in, in, out, di, uo) is det.
+:- mode read_item(in, in, in, in, in, out, out, di, uo) is det.
-read_item(ModuleName, SourceFileName, MaybeItem) -->
- parser__read_term(SourceFileName, MaybeTerm),
+read_item(ModuleName, SourceFileName, SourceFileContents, SourceFileLen,
+ Posn0, Posn1, MaybeItem) -->
+ { parser__read_term_from_string(SourceFileName, SourceFileContents,
+ SourceFileLen, Posn0, Posn1, MaybeTerm) },
{ process_read_term(ModuleName, MaybeTerm, MaybeItem) }.
:- pred process_read_term(module_name, read_term, maybe_item_or_eof).
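
A last aside for reviewers, not part of the diff: stripped of the
module-name bookkeeping, the new prog_io code has roughly the following
shape. The predicate names and the read_result type are made up for
this sketch; it assumes imports of io, string, list, parser, lexer and
term_io.

	:- type read_result
		--->	ok(list(read_term))
		;	error(io__error).

	:- pred read_file_terms(read_result, io__state, io__state).
	:- mode read_file_terms(out, di, uo) is det.

	read_file_terms(Result) -->
		io__input_stream(Stream),
		io__input_stream_name(Stream, FileName),
		% Slurp the whole file into memory, then parse
		% from the string, threading the position through
		% instead of doing stream I/O per character.
		io__read_file_as_string(Stream, Err, Contents),
		(
			{ Err = ok },
			{ string__length(Contents, Len) },
			{ term_loop(FileName, Contents, Len,
				posn(1, 0, 0), [], RevTerms) },
			{ list__reverse(RevTerms, Terms) },
			{ Result = ok(Terms) }
		;
			{ Err = error(IOError) },
			{ Result = error(IOError) }
		).

	:- pred term_loop(string, string, int, posn,
			list(read_term), list(read_term)).
	:- mode term_loop(in, in, in, in, in, out) is det.

	term_loop(FileName, Contents, Len, Posn0, RevTerms0, RevTerms) :-
		parser__read_term_from_string(FileName, Contents, Len,
			Posn0, Posn1, ReadTerm),
		( ReadTerm = term(_VarSet, _Term) ->
			term_loop(FileName, Contents, Len, Posn1,
				[ReadTerm | RevTerms0], RevTerms)
		;
			% eof or a syntax error: stop here.
			RevTerms = RevTerms0
		).
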
Index: term_errors.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/term_errors.m,v
retrieving revision 1.11
diff -u -u -r1.11 term_errors.m
--- term_errors.m 1998/03/03 17:36:10 1.11
+++ term_errors.m 1998/05/12 19:39:12
@@ -544,7 +544,8 @@
term_errors__describe_one_call_site(PPId - Context, Module, Piece) :-
term_errors__describe_one_proc_name(PPId, Module, ProcName),
- Context = term__context(FileName, LineNumber),
+ term__context_file(Context, FileName),
+ term__context_line(Context, LineNumber),
string__int_to_string(LineNumber, LineNumberPart),
string__append_list([
ProcName,
cvs server: Diffing notes
--
Fergus Henderson <fjh at cs.mu.oz.au> | "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh> | of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3 | -- the last words of T. S. Garp.