[m-rev.] for review: avoid "poison null" security vulnerabilities

Simon Taylor staylr at gmail.com
Mon Mar 12 20:59:07 AEDT 2007


Estimated hours taken: 15
Branches: main

Make all functions which create strings from characters throw an exception
or fail if the list of characters contains a null character.

This removes a potential source of security vulnerabilities where one
part of the program performs checks against the whole of a string passed
in by an attacker (processing the string as a list of characters or using 
`unsafe_index' to look past the null character), but then passes the string
to another part of the program or an operating system call that only sees
up to the first null character.  Even if Mercury stored the length with
the string, allowing the creation of strings containing nulls would be a
bad idea because it would be too easy to pass a string to foreign code
without checking (as in the first example link).

For examples see:
<http://insecure.org/news/P55-07.txt>
<http://www.securiteam.com/securitynews/5WP0B1FKKQ.html>
<http://www.securityfocus.com/archive/1/445788>
<http://www.securityfocus.com/archive/82/368750>
<http://secunia.com/advisories/16420/>

NEWS:
	Document the change.

library/string.m:
	Throw an exception if null characters are found in
	string.from_char_list and string.from_rev_char_list.

	Add string.from_char_list_semidet and string.from_rev_char_list_semidet
	which fail rather throwing an exception.  This doesn't match the
	normal naming convention, but string.from_{,rev_}char_list are widely
	used, so changing their determinism would be a bit too disruptive.

	Don't allocate an unnecessary extra word for each string created by
	from_char_list and from_rev_char_list.
	
	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

library/lexer.m:
	Check for invalid characters when reading Mercury strings and
	quoted names.

	Improve error messages by skipping to the end of any string
	or quoted name containing an error.  Previously we just stopped
	processing at the error leaving an unmatched quote.
	
library/io.m:
	Make io.read_line_as_string and io.read_file_as_string fail
	if the input file contains a null character.

	Fix an XXX: '\0\' is not recognised as a character constant,
	but char.det_from_int can be used to make a null character.

library/char.m:
	Explain the workaround for '\0\' not being accepted as a char
	constant.

	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

compiler/layout.m:
compiler/layout_out.m:
compiler/c_util.m:
compiler/stack_layout.m:
compiler/llds.m:
compiler/mlds.m:
compiler/ll_backend.*.m:
compiler/ml_backend.*.m:
	Don't pass around strings containing null characters (the string
	tables for the debugger).  This doesn't cause any problems now,
	but won't work with the accurate garbage collector.  Use lists
	of strings instead, and add the null characters when writing the
	strings out.

tests/hard_coded/null_char.{m,exp}:
	Change an existing test case to test that creation of a string
	containing a null throws an exception.

tests/hard_coded/null_char.exp2:
	Deleted because alternative output is no longer needed.

tests/invalid/Mmakefile:
tests/invalid/null_char.m:
tests/invalid/null_char.err_exp:
	Test error messages for construction of strings containing null
	characters by the lexer.

tests/invalid/unicode{1,2}.err_exp:
	Update the expected output after the change to the handling of
	invalid quoted names and strings.


Index: NEWS
===================================================================
RCS file: /home/mercury1/repository/mercury/NEWS,v
retrieving revision 1.450
diff -u -u -r1.450 NEWS
--- NEWS	6 Mar 2007 04:22:37 -0000	1.450
+++ NEWS	9 Mar 2007 02:33:06 -0000
@@ -17,6 +17,14 @@
 
 Changes to the Mercury standard library:
 
+* Predicates and functions which create strings from lists of characters
+  now throw an exception if a null character is found.  Unexpected null
+  characters in strings are a potential source of security vulnerabilities.
+
+  Predicates string.from_char_list_semidet/2 and
+  string.from_rev_char_list_semidet/2 have been added.  These fail rather
+  than throwing an exception if a null character is found.
+
 * string.float_to_string now trims redundant trailing zeroes (although
   at least one fractional digit is always present).  This change affects the
   output from the debugger and io.print etc.
Index: compiler/c_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/c_util.m,v
retrieving revision 1.34
diff -u -u -r1.34 c_util.m
--- compiler/c_util.m	27 Sep 2006 06:16:48 -0000	1.34
+++ compiler/c_util.m	9 Mar 2007 08:31:12 -0000
@@ -22,9 +22,9 @@
 
 :- import_module backend_libs.builtin_ops.
 
-
 :- import_module char.
 :- import_module io.
+:- import_module list.
 %-----------------------------------------------------------------------------%
 %
 % Line numbering.
@@ -53,13 +53,12 @@
     %
 :- pred output_quoted_string(string::in, io::di, io::uo) is det.
 
-    % Output_quoted_multi_string is like output_quoted_string except that
-    % the string may contain embedded NUL characters (i.e. '\0').
-    % The int specifies the length of the string.
+    % output_quoted_multi_string is like list.foldl(output_quoted_string)
+    % except that a null character will be written between each string
+    % in the list.
     %
-:- type multi_string == string.
-:- pred output_quoted_multi_string(int::in, multi_string::in,
-    io::di, io::uo) is det.
+:- type multi_string == list(string).
+:- pred output_quoted_multi_string(multi_string::in, io::di, io::uo) is det.
 
     % Print out a char suitably escaped for use as a C char literal.
     % This doesn't actually print out the enclosing single quotes --
@@ -207,16 +206,19 @@
 %
 % String and character handling.
 
-output_quoted_string(S0, !IO) :-
-    output_quoted_multi_string(string.length(S0), S0, !IO).
+output_quoted_string(S, !IO) :-
+    output_quoted_string(0, length(S), S, !IO).
 
-output_quoted_multi_string(Len, S, !IO) :-
-    output_quoted_multi_string_2(0, Len, S, !IO).
+output_quoted_multi_string([], !IO).
+output_quoted_multi_string([S | Ss], !IO) :-
+    output_quoted_string(S, !IO),
+    output_quoted_char(char.det_from_int(0), !IO),
+    output_quoted_multi_string(Ss, !IO).
 
-:- pred output_quoted_multi_string_2(int::in, int::in, string::in,
+:- pred output_quoted_string(int::in, int::in, string::in,
     io::di, io::uo) is det.
 
-output_quoted_multi_string_2(Cur, Len, S, !IO) :-
+output_quoted_string(Cur, Len, S, !IO) :-
     ( Cur < Len ->
         % Avoid a limitation in the MSVC compiler where string literals
         % can be no longer then 2048 chars. However if you output the string
@@ -231,8 +233,6 @@
             true
         ),
 
-        % We must use unsafe index, because we want to be able to access chars
-        % beyond the first NUL.
         string.unsafe_index(S, Cur, Char),
         output_quoted_char(Char, !IO),
         
@@ -256,7 +256,7 @@
             true
         ),
 
-        output_quoted_multi_string_2(Cur + 1, Len, S, !IO)
+        output_quoted_string(Cur + 1, Len, S, !IO)
     ;
         true
     ).
Index: compiler/dupproc.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/dupproc.m,v
retrieving revision 1.17
diff -u -u -r1.17 dupproc.m
--- compiler/dupproc.m	6 Jan 2007 09:23:30 -0000	1.17
+++ compiler/dupproc.m	8 Mar 2007 08:54:15 -0000
@@ -448,7 +448,7 @@
         Const = llconst_string(_),
         StdConst = Const
     ;
-        Const = llconst_multi_string(_, _),
+        Const = llconst_multi_string(_),
         StdConst = Const
     ;
         Const = llconst_code_addr(CodeAddr),
Index: compiler/exprn_aux.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/exprn_aux.m,v
retrieving revision 1.80
diff -u -u -r1.80 exprn_aux.m
--- compiler/exprn_aux.m	6 Jan 2007 09:23:31 -0000	1.80
+++ compiler/exprn_aux.m	8 Mar 2007 08:54:30 -0000
@@ -162,7 +162,7 @@
         IsConst = StaticGroundTerms
     ).
 const_is_constant(llconst_string(_), _, yes).
-const_is_constant(llconst_multi_string(_, _), _, yes).
+const_is_constant(llconst_multi_string(_), _, yes).
 const_is_constant(llconst_code_addr(CodeAddr), ExprnOpts, IsConst) :-
     addr_is_constant(CodeAddr, ExprnOpts, IsConst).
 const_is_constant(llconst_data_addr(_, _), _, yes).
Index: compiler/global_data.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/global_data.m,v
retrieving revision 1.30
diff -u -u -r1.30 global_data.m
--- compiler/global_data.m	6 Jan 2007 09:23:32 -0000	1.30
+++ compiler/global_data.m	8 Mar 2007 08:54:49 -0000
@@ -1076,7 +1076,7 @@
         ; Const0 = llconst_int(_)
         ; Const0 = llconst_float(_)
         ; Const0 = llconst_string(_)
-        ; Const0 = llconst_multi_string(_, _)
+        ; Const0 = llconst_multi_string(_)
         ; Const0 = llconst_code_addr(_)
         ),
         Const = Const0
Index: compiler/jumpopt.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/jumpopt.m,v
retrieving revision 1.102
diff -u -u -r1.102 jumpopt.m
--- compiler/jumpopt.m	15 Jan 2007 02:50:54 -0000	1.102
+++ compiler/jumpopt.m	8 Mar 2007 08:55:10 -0000
@@ -1024,8 +1024,8 @@
 jumpopt.short_labels_const(_, llconst_int(I), llconst_int(I)).
 jumpopt.short_labels_const(_, llconst_float(F), llconst_float(F)).
 jumpopt.short_labels_const(_, llconst_string(S), llconst_string(S)).
-jumpopt.short_labels_const(_, llconst_multi_string(L, S),
-        llconst_multi_string(L, S)).
+jumpopt.short_labels_const(_, llconst_multi_string(S),
+        llconst_multi_string(S)).
 jumpopt.short_labels_const(Instrmap, llconst_code_addr(CodeAddr0),
         llconst_code_addr(CodeAddr)) :-
     ( CodeAddr0 = code_label(Label0) ->
Index: compiler/layout.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/layout.m,v
retrieving revision 1.34
diff -u -u -r1.34 layout.m
--- compiler/layout.m	22 Dec 2006 05:37:49 -0000	1.34
+++ compiler/layout.m	8 Mar 2007 08:33:04 -0000
@@ -50,8 +50,10 @@
 %-----------------------------------------------------------------------------%
 
     % This type is for strings which may contain embedded null characters.
+    % When a string_with_0s is written, a null character will be written
+    % in between each string in the list.
 :- type string_with_0s
-    --->    string_with_0s(string).
+    --->    string_with_0s(list(string)).
 
 :- type event_set_layout_data
     --->    event_set_layout_data(
Index: compiler/layout_out.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/layout_out.m,v
retrieving revision 1.88
diff -u -u -r1.88 layout_out.m
--- compiler/layout_out.m	19 Jan 2007 07:04:17 -0000	1.88
+++ compiler/layout_out.m	9 Mar 2007 03:23:52 -0000
@@ -116,6 +116,7 @@
 :- import_module parse_tree.prog_foreign.
 
 :- import_module assoc_list.
+:- import_module char.
 :- import_module int.
 :- import_module list.
 :- import_module map.
@@ -1728,71 +1729,62 @@
     io.write_string("\n", !IO),
     output_layout_name_storage_type_name(LayoutName, yes, !IO),
     io.write_string(" = {", !IO),
-    string.length(EventSetDesc, EventSetDescSize),
-    output_module_string_table_chars_driver(0, EventSetDescSize,
-        string_with_0s(EventSetDesc), !IO),
+    output_module_string_table_strings(EventSetDesc, [], !IO),
     io.write_string("};\n", !IO),
     decl_set_insert(decl_data_addr(layout_addr(LayoutName)), !DeclSet).
 
-    % The string table cannot be zero size; it must contain at least an
-    % empty string.
-    %
 :- pred output_module_string_table(module_name::in,
     int::in, string_with_0s::in, decl_set::in, decl_set::out,
     io::di, io::uo) is det.
 
-output_module_string_table(ModuleName, StringTableSize, StringTable,
-        !DeclSet, !IO) :-
+output_module_string_table(ModuleName, _StringTableSize,
+        string_with_0s(StringTable0), !DeclSet, !IO) :-
     TableName = module_layout_string_table(ModuleName),
     io.write_string("\n", !IO),
     output_layout_name_storage_type_name(TableName, yes, !IO),
     io.write_string(" = {", !IO),
-    output_module_string_table_chars_driver(0, StringTableSize - 1,
-        StringTable, !IO),
+
+    %
+    % The string table cannot be zero size; it must contain at least an
+    % empty string.
+    %
+    ( StringTable0 = [], FirstString = "", Rest = []
+    ; StringTable0 = [FirstString | Rest]
+    ),
+    output_module_string_table_strings(FirstString, Rest, !IO),
     io.write_string("};\n", !IO),
     decl_set_insert(decl_data_addr(layout_addr(TableName)), !DeclSet).
 
-    % The job of this predicate is to minimize stack space consumption in
-    % grades that do not allow output_module_string_table_chars to be tail
-    % recursive. The maximum observed size of the module string so far has
-    % been just short of 64 kilobytes; writing that out in 256 batches of
-    % 256 characters minimizes maximum total stack requirements.
-    %
-:- pred output_module_string_table_chars_driver(int::in, int::in,
-    string_with_0s::in, io::di, io::uo) is det.
+:- pred output_module_string_table_strings(string::in, list(string)::in,
+    io::di, io::uo) is det.
 
-output_module_string_table_chars_driver(CurIndex, MaxIndex, StringWithNulls,
-        !IO) :-
-    ( CurIndex < MaxIndex ->
-        SubMaxIndex = int.min(MaxIndex, CurIndex + 255),
-        output_module_string_table_chars(CurIndex, SubMaxIndex,
-            StringWithNulls, !IO),
-        output_module_string_table_chars_driver(SubMaxIndex + 1, MaxIndex,
-            StringWithNulls, !IO)
-    ;
-        true
-    ).
+output_module_string_table_strings(String, [], !IO) :-
+    output_module_string_table_chars(0, length(String) - 1, String, !IO).
+output_module_string_table_strings(String, [Next | Rest], !IO) :-
+    output_module_string_table_chars(0, length(String) - 1, String, !IO),
+    io.write_string(",\n", !IO),
+    output_module_string_table_strings(Next, Rest, !IO).
 
-:- pred output_module_string_table_chars(int::in, int::in, string_with_0s::in,
+:- pred output_module_string_table_chars(int::in, int::in, string::in,
     io::di, io::uo) is det.
 
-output_module_string_table_chars(CurIndex, MaxIndex, StringWithNulls, !IO) :-
-    ( CurIndex mod 10 = 0 ->
-        io.write_string("\n", !IO)
-    ;
-        true
-    ),
-    StringWithNulls = string_with_0s(String),
-    string.unsafe_index(String, CurIndex, Char),
-    io.write_char('''', !IO),
-    c_util.output_quoted_char(Char, !IO),
-    io.write_char('''', !IO),
-    io.write_string(", ", !IO),
-    ( CurIndex < MaxIndex ->
-        output_module_string_table_chars(CurIndex + 1, MaxIndex,
-            StringWithNulls, !IO)
+output_module_string_table_chars(CurIndex, MaxIndex, String, !IO) :-
+    ( CurIndex =< MaxIndex ->
+        ( CurIndex mod 10 = 0 ->
+            io.nl(!IO)
+        ;
+            true
+        ),
+        string.unsafe_index(String, CurIndex, Char),
+        io.write_char('''', !IO),
+        c_util.output_quoted_char(Char, !IO),
+        io.write_char('''', !IO),
+        io.write_string(", ", !IO),
+        output_module_string_table_chars(CurIndex + 1, MaxIndex, String, !IO)
     ;
-        true
+        io.write_char('''', !IO),
+        c_util.output_quoted_char(char.det_from_int(0), !IO),
+        io.write_char('''', !IO)
     ).
 
 %-----------------------------------------------------------------------------%
Index: compiler/llds.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds.m,v
retrieving revision 1.346
diff -u -u -r1.346 llds.m
--- compiler/llds.m	15 Jan 2007 02:23:47 -0000	1.346
+++ compiler/llds.m	8 Mar 2007 09:00:08 -0000
@@ -911,9 +911,10 @@
     ;       llconst_int(int)
     ;       llconst_float(float)
     ;       llconst_string(string)
-    ;       llconst_multi_string(int, string)
-            % A string containing embedded NULLs, whose real length is given
-            % by the integer, and not the location of the first NULL.
+    ;       llconst_multi_string(list(string))
+            % A string containing an embedded NULL between each substring
+            % in the list.
+
     ;       llconst_code_addr(code_addr)
     ;       llconst_data_addr(data_addr, maybe(int)).
             % If the second arg is yes(Offset), then increment the address
@@ -1218,7 +1219,7 @@
 const_type(llconst_int(_), integer).
 const_type(llconst_float(_), float).
 const_type(llconst_string(_), string).
-const_type(llconst_multi_string(_, _), string).
+const_type(llconst_multi_string(_), string).
 const_type(llconst_code_addr(_), code_ptr).
 const_type(llconst_data_addr(_, _), data_ptr).
 
Index: compiler/llds_out.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds_out.m,v
retrieving revision 1.304
diff -u -u -r1.304 llds_out.m
--- compiler/llds_out.m	19 Jan 2007 07:04:17 -0000	1.304
+++ compiler/llds_out.m	9 Mar 2007 02:26:46 -0000
@@ -4955,10 +4955,13 @@
     io.write_string(""", ", !IO),
     io.write_int(StringLength, !IO),
     io.write_string(")", !IO).
-output_rval_const(llconst_multi_string(Length, String), !IO) :-
+output_rval_const(llconst_multi_string(String), !IO) :-
     io.write_string("MR_string_const(""", !IO),
-    c_util.output_quoted_multi_string(Length, String, !IO),
+    c_util.output_quoted_multi_string(String, !IO),
     io.write_string(""", ", !IO),
+
+    % The "+1" is for the NULL character.
+    Length = list.foldl((func(S, L0) = L0 + length(S) + 1), String, 0),
     io.write_int(Length, !IO),
     io.write_string(")", !IO).
 output_rval_const(llconst_code_addr(CodeAddress), !IO) :-
Index: compiler/llds_to_x86_64.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/llds_to_x86_64.m,v
retrieving revision 1.2
diff -u -u -r1.2 llds_to_x86_64.m
--- compiler/llds_to_x86_64.m	27 Feb 2007 20:36:28 -0000	1.2
+++ compiler/llds_to_x86_64.m	8 Mar 2007 09:01:11 -0000
@@ -643,7 +643,7 @@
     Op = yes(operand_label("<<llconst_float>>")).
 transform_rval(RegMap, const(llconst_string(String)), RegMap, no, yes(Op)) :-
     Op = [x86_64_directive(string([String]))].
-transform_rval(RegMap, const(llconst_multi_string(_, _)), RegMap, Op, no) :-
+transform_rval(RegMap, const(llconst_multi_string(_)), RegMap, Op, no) :-
     Op = yes(operand_label("<<llconst_multi_string>>")).
 transform_rval(RegMap, const(llconst_code_addr(CodeAddr)), RegMap, Op, no) :-
     code_addr_type(CodeAddr, CodeAddrType),
Index: compiler/mlds.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds.m,v
retrieving revision 1.147
diff -u -u -r1.147 mlds.m
--- compiler/mlds.m	4 Mar 2007 23:38:02 -0000	1.147
+++ compiler/mlds.m	8 Mar 2007 08:37:05 -0000
@@ -1572,10 +1572,9 @@
     ;       mlconst_int(int)
     ;       mlconst_float(float)
     ;       mlconst_string(string)
-    ;       mlconst_multi_string(int, string)
-            % A multi_string_const is a string containing embedded NULs,
-            % whose real length is given by the integer, and not the location
-            % of the first null character.
+    ;       mlconst_multi_string(list(string))
+            % A multi_string_const is a string containing embedded NULs
+            % between each substring in the list.
 
     ;       mlconst_code_addr(mlds_code_addr)
     ;       mlconst_data_addr(mlds_data_addr)
Index: compiler/mlds_to_c.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_c.m,v
retrieving revision 1.212
diff -u -u -r1.212 mlds_to_c.m
--- compiler/mlds_to_c.m	4 Mar 2007 23:38:03 -0000	1.212
+++ compiler/mlds_to_c.m	8 Mar 2007 09:05:00 -0000
@@ -3747,9 +3747,9 @@
     io.write_string("""", !IO),
     c_util.output_quoted_string(String, !IO),
     io.write_string("""", !IO).
-mlds_output_rval_const(mlconst_multi_string(Length, String), !IO) :-
+mlds_output_rval_const(mlconst_multi_string(String), !IO) :-
     io.write_string("""", !IO),
-    c_util.output_quoted_multi_string(Length, String, !IO),
+    c_util.output_quoted_multi_string(String, !IO),
     io.write_string("""", !IO).
 mlds_output_rval_const(mlconst_code_addr(CodeAddr), !IO) :-
     mlds_output_code_addr(CodeAddr, !IO).
Index: compiler/mlds_to_gcc.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_gcc.m,v
retrieving revision 1.129
diff -u -u -r1.129 mlds_to_gcc.m
--- compiler/mlds_to_gcc.m	13 Feb 2007 01:58:51 -0000	1.129
+++ compiler/mlds_to_gcc.m	9 Mar 2007 02:29:04 -0000
@@ -3664,8 +3664,10 @@
 	gcc__build_float(FloatVal, Expr).
 build_rval_const(mlconst_string(String), _, Expr) -->
 	gcc__build_string(String, Expr).
-build_rval_const(mlconst_multi_string(Length, String), _, Expr) -->
-	gcc__build_string(Length, String, Expr).
+build_rval_const(mlconst_multi_string(_Strings), _, _Expr) -->
+	% multi-strings are only used for the debugger.
+	{ sorry(this_file,
+		"debugging not yet supported with `--target asm'") }.
 build_rval_const(mlconst_code_addr(CodeAddr), GlobalInfo, Expr) -->
 	build_code_addr(CodeAddr, GlobalInfo, Expr).
 build_rval_const(mlconst_data_addr(DataAddr), _, Expr) -->
Index: compiler/mlds_to_il.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_il.m,v
retrieving revision 1.178
diff -u -u -r1.178 mlds_to_il.m
--- compiler/mlds_to_il.m	19 Jan 2007 07:04:21 -0000	1.178
+++ compiler/mlds_to_il.m	8 Mar 2007 09:04:29 -0000
@@ -556,7 +556,7 @@
 rename_const(mlconst_int(I)) = mlconst_int(I).
 rename_const(mlconst_float(F)) = mlconst_float(F).
 rename_const(mlconst_string(S)) = mlconst_string(S).
-rename_const(mlconst_multi_string(I, S)) = mlconst_multi_string(I, S).
+rename_const(mlconst_multi_string(S)) = mlconst_multi_string(S).
 rename_const(mlconst_code_addr(C)) = mlconst_code_addr(rename_code_addr(C)).
 rename_const(mlconst_data_addr(A)) = mlconst_data_addr(rename_data_addr(A)).
 rename_const(mlconst_null(T)) = mlconst_null(T).
@@ -2362,7 +2362,7 @@
         Const = mlconst_float(Float),
         Instrs = instr_node(ldc(float64, f(Float)))
     ;
-        Const = mlconst_multi_string(_Length, _MultiString),
+        Const = mlconst_multi_string(_MultiString),
         Instrs = throw_unimplemented("load multi_string_const")
     ;
         Const = mlconst_code_addr(CodeAddr),
@@ -3651,7 +3651,7 @@
 rval_const_to_type(mlconst_string(_))
         = mercury_type(StrType, type_cat_string, non_foreign_type(StrType)) :-
     StrType = builtin_type(builtin_type_string).
-rval_const_to_type(mlconst_multi_string(_, _))
+rval_const_to_type(mlconst_multi_string(_))
         = mercury_type(StrType, type_cat_string, non_foreign_type(StrType)) :-
     StrType = builtin_type(builtin_type_string).
 rval_const_to_type(mlconst_null(MldsType)) = MldsType.
Index: compiler/mlds_to_java.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_java.m,v
retrieving revision 1.90
diff -u -u -r1.90 mlds_to_java.m
--- compiler/mlds_to_java.m	19 Jan 2007 07:04:22 -0000	1.90
+++ compiler/mlds_to_java.m	8 Mar 2007 08:58:02 -0000
@@ -3285,9 +3285,9 @@
     c_util.output_quoted_string(String, !IO),
     io.write_string("""", !IO).
 
-output_rval_const(mlconst_multi_string(Length, String), !IO) :-
+output_rval_const(mlconst_multi_string(String), !IO) :-
     io.write_string("""", !IO),
-    c_util.output_quoted_multi_string(Length, String, !IO),
+    c_util.output_quoted_multi_string(String, !IO),
     io.write_string("""", !IO).
 
 output_rval_const(mlconst_code_addr(CodeAddr), !IO) :-
Index: compiler/mlds_to_managed.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/mlds_to_managed.m,v
retrieving revision 1.39
diff -u -u -r1.39 mlds_to_managed.m
--- compiler/mlds_to_managed.m	23 Dec 2006 12:49:25 -0000	1.39
+++ compiler/mlds_to_managed.m	8 Mar 2007 09:01:34 -0000
@@ -618,9 +618,9 @@
     io.write_string("""", !IO),
     c_util.output_quoted_string(S, !IO),
     io.write_string("""", !IO).
-write_rval_const(_Lang, mlconst_multi_string(L, S), !IO) :-
+write_rval_const(_Lang, mlconst_multi_string(S), !IO) :-
     io.write_string("""", !IO),
-    c_util.output_quoted_multi_string(L, S, !IO),
+    c_util.output_quoted_multi_string(S, !IO),
     io.write_string("""", !IO).
 write_rval_const(Lang, mlconst_code_addr(CodeAddrConst), !IO) :-
     (
Index: compiler/opt_debug.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/opt_debug.m,v
retrieving revision 1.191
diff -u -u -r1.191 opt_debug.m
--- compiler/opt_debug.m	15 Jan 2007 02:23:47 -0000	1.191
+++ compiler/opt_debug.m	8 Mar 2007 08:59:12 -0000
@@ -311,8 +311,8 @@
     float_to_string(F).
 dump_const(_, llconst_string(S)) =
     """" ++ S ++ """".
-dump_const(_, llconst_multi_string(L, _S)) =
-    "multi_string(" ++ int_to_string(L) ++ ")".
+dump_const(_, llconst_multi_string(_S)) =
+    "multi_string(...)".
 dump_const(MaybeProcLabel, llconst_code_addr(CodeAddr)) =
     "code_addr_const(" ++ dump_code_addr(MaybeProcLabel, CodeAddr) ++ ")".
 dump_const(_, llconst_data_addr(DataAddr, MaybeOffset)) = Str :-
Index: compiler/opt_util.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/opt_util.m,v
retrieving revision 1.162
diff -u -u -r1.162 opt_util.m
--- compiler/opt_util.m	23 Jan 2007 07:00:37 -0000	1.162
+++ compiler/opt_util.m	8 Mar 2007 08:57:00 -0000
@@ -2205,8 +2205,7 @@
 replace_labels_rval_const(llconst_int(N), _, llconst_int(N)).
 replace_labels_rval_const(llconst_float(N), _, llconst_float(N)).
 replace_labels_rval_const(llconst_string(S), _, llconst_string(S)).
-replace_labels_rval_const(llconst_multi_string(L, S), _,
-        llconst_multi_string(L, S)).
+replace_labels_rval_const(llconst_multi_string(S), _, llconst_multi_string(S)).
 replace_labels_rval_const(llconst_code_addr(Addr0), ReplMap,
         llconst_code_addr(Addr)) :-
     replace_labels_code_addr(Addr0, ReplMap, Addr).
Index: compiler/stack_layout.m
===================================================================
RCS file: /home/mercury1/repository/mercury/compiler/stack_layout.m,v
retrieving revision 1.133
diff -u -u -r1.133 stack_layout.m
--- compiler/stack_layout.m	8 Jan 2007 09:15:19 -0000	1.133
+++ compiler/stack_layout.m	8 Mar 2007 09:29:31 -0000
@@ -151,7 +151,7 @@
     StaticCellInfo1 = LayoutInfo ^ static_cell_info,
     StringTable = string_table(_, RevStringList, StringOffset),
     list.reverse(RevStringList, StringList),
-    concat_string_list(StringList, StringOffset, ConcatStrings),
+    ConcatStrings = string_with_0s(StringList),
 
     list.condense([TableIoDecls, ProcLayouts, InternalLayouts], Layouts0),
     (
@@ -235,76 +235,6 @@
 
 %---------------------------------------------------------------------------%
 
-    % concat_string_list appends a list of strings together,
-    % appending a null character after each string.
-    % The resulting string will contain embedded null characters,
-    %
-:- pred concat_string_list(list(string)::in, int::in,
-    string_with_0s::out) is det.
-
-concat_string_list(Strings, Len, string_with_0s(Result)) :-
-    concat_string_list_2(Strings, Len, Result).
-
-:- pred concat_string_list_2(list(string)::in, int::in, string::out) is det.
-
-:- pragma foreign_decl("C", "
-    #include ""mercury_tags.h"" /* for MR_list_*() */
-    #include ""mercury_heap.h"" /* for MR_offset_incr_hp_atomic*() */
-    #include ""mercury_misc.h"" /* for MR_fatal_error() */
-").
-
-:- pragma foreign_proc("C",
-    concat_string_list_2(StringList::in, ArenaSize::in, Arena::out),
-    [will_not_call_mercury, promise_pure, thread_safe],
-"{
-    MR_Word     cur_node;
-    MR_Integer  cur_offset;
-    MR_Word     tmp;
-
-    MR_offset_incr_hp_atomic(tmp, 0,
-        (ArenaSize + sizeof(MR_Word)) / sizeof(MR_Word));
-    Arena = (char *) tmp;
-
-    cur_offset = 0;
-    cur_node = StringList;
-
-    while (! MR_list_is_empty(cur_node)) {
-        (void) strcpy(&Arena[cur_offset], (char *) MR_list_head(cur_node));
-        cur_offset += strlen((char *) MR_list_head(cur_node)) + 1;
-        cur_node = MR_list_tail(cur_node);
-    }
-
-    if (cur_offset != ArenaSize) {
-        char    msg[256];
-
-        sprintf(msg, ""internal error in creating string table;\\n""
-            ""cur_offset = %ld, ArenaSize = %ld\\n"",
-            (long) cur_offset, (long) ArenaSize);
-        MR_fatal_error(msg);
-    }
-}").
-
-% This version is only used if there is no matching foreign_proc version.
-% Note that this version only works if the Mercury implementation's
-% string representation allows strings to contain embedded null characters.
-% So we check that.
-concat_string_list_2(StringsList, _Len, StringWithNulls) :-
-    (
-        char.to_int(NullChar, 0),
-        NullCharString = string.char_to_string(NullChar),
-        string.length(NullCharString, 1)
-    ->
-        StringsWithNullsList = list.map(func(S) = S ++ NullCharString,
-            StringsList),
-        StringWithNulls = string.append_list(StringsWithNullsList)
-    ;
-        % the Mercury implementation's string representation
-        % doesn't support strings containing null characters
-        private_builtin.sorry("stack_layout.concat_string_list")
-    ).
-
-%---------------------------------------------------------------------------%
-
 :- pred format_label_tables(map(string, label_table)::in,
     list(file_layout_data)::out) is det.
 
Index: library/char.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/char.m,v
retrieving revision 1.56
diff -u -u -r1.56 char.m
--- library/char.m	13 Feb 2007 01:58:52 -0000	1.56
+++ library/char.m	9 Mar 2007 08:36:00 -0000
@@ -43,6 +43,11 @@
     % represent characters using Unicode, but store files in an 8-bit national
     % character set.
     %
+    % Note that '\0' is not accepted as a Mercury character constant.
+    % Instead, a null character can be created using `char.det_from_int(0)'.
+    % Null characters aren't very useful in Mercury because they aren't
+    % allowed in strings.
+    %
 :- func char.to_int(char) = int.
 :- pred char.to_int(char, int).
 :- mode char.to_int(in, out) is det.
@@ -71,18 +76,21 @@
 :- pred char.min_char_value(int::out) is det.
 
     % Convert a character to uppercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func char.to_upper(char) = char.
 :- pred char.to_upper(char::in, char::out) is det.
 
     % Convert a character to lowercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func char.to_lower(char) = char.
 :- pred char.to_lower(char::in, char::out) is det.
 
     % char.lower_upper(Lower, Upper) is true iff
     % Lower is a lower-case letter and Upper is the corresponding
-    % upper-case letter.
+    % upper-case letter, and both Lower and Upper are unaccented
+    % Latin letters.
     %
 :- pred char.lower_upper(char, char).
 :- mode char.lower_upper(in, out) is semidet.
Index: library/io.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/io.m,v
retrieving revision 1.375
diff -u -u -r1.375 io.m
--- library/io.m	3 Mar 2007 03:43:33 -0000	1.375
+++ library/io.m	11 Mar 2007 23:03:07 -0000
@@ -162,6 +162,9 @@
     % eof or error. Returns the result as a string rather than
     % as a list of char.
     %
+    % Returns an error if the file contains a null character, because
+    % null characters are not allowed in Mercury strings.
+    %
 :- pred io.read_file_as_string(io.maybe_partial_res(string)::out,
     io::di, io::uo) is det.
 
@@ -248,6 +251,9 @@
     % Reads all the characters from the given input stream until eof or error.
     % Returns the result as a string rather than as a list of char.
     %
+    % Returns an error if the file contains a null character, because
+    % null characters are not allowed in Mercury strings.
+    %
 :- pred io.read_file_as_string(io.input_stream::in,
     io.maybe_partial_res(string)::out, io::di, io::uo) is det.
 
@@ -2020,6 +2026,8 @@
     ( Res < 0 ->
         ( Res = -1 ->
             Result = eof
+        ; Res = -2 ->
+            Result = error(io_error("null character in input"))
         ;
             io.make_err_msg("read failed: ", Msg, !IO),
             Result = error(io_error(Msg))
@@ -2056,8 +2064,12 @@
             }
             break;
         }
+        if (char_code == 0) {
+            Res = -2; 
+            break;
+        }
         if (char_code != (MR_UnsignedChar) char_code) {
-            Res = -2;
+            Res = -3;
             break;
         }
         read_buffer[i++] = char_code;
@@ -2105,6 +2117,9 @@
         ( Char = '\n' ->
             Res = 0,
             String = "\n"
+        ; char.to_int(Char, 0) ->
+            Res = -2,
+            String = ""
         ;
             io.read_line_as_string_2(Stream, no, Res, String0, !IO),
             string.first_char(String, Char, String0)
@@ -2123,7 +2138,7 @@
     ;
         Result = error(_),
         String = "",
-        Res = -2
+        Res = -3
     ).
 
 io.read_file(Result, !IO) :-
@@ -2174,14 +2189,17 @@
     io.read_file_as_string_2(Stream, Buffer0, Buffer, Pos0, Pos,
         BufferSize0, BufferSize, !IO),
     require(Pos < BufferSize, "io.read_file_as_string: overflow"),
-    io.buffer_to_string(Buffer, Pos, String),
-    io.input_check_err(Stream, Result0, !IO),
-    (
-        Result0 = ok,
-        Result = ok(String)
-    ;
-        Result0 = error(Error),
-        Result = error(String, Error)
+    ( io.buffer_to_string(Buffer, Pos, String) ->
+        io.input_check_err(Stream, Result0, !IO),
+        (
+            Result0 = ok,
+            Result = ok(String)
+        ;
+            Result0 = error(Error),
+            Result = error(String, Error)
+        )
+    ;   
+        Result = error("", io_error("null character in input"))
     ).
 
 :- pred io.read_file_as_string_2(io.input_stream::in, buffer::buffer_di,
@@ -3471,8 +3489,8 @@
 }").
 
 io.alloc_buffer(Size, buffer(Array)) :-
-    % XXX '0' is used as Mercury doesn't recognise '\0' as a char constant.
-    array.init(Size, '0', Array).
+    char.det_from_int(0, NullChar),
+    array.init(Size, NullChar, Array).
 
 :- pred io.resize_buffer(int::in, int::in,
     buffer::buffer_di, buffer::buffer_uo) is det.
@@ -3515,21 +3533,30 @@
 }").
 
 io.resize_buffer(_OldSize, NewSize, buffer(Array0), buffer(Array)) :-
-    % XXX '0' is used as Mercury doesn't recognise '\0' as a char constant.
-    array.resize(Array0, NewSize, '0', Array).
+    char.det_from_int(0, Char),
+    array.resize(Array0, NewSize, Char, Array).
 
-:- pred io.buffer_to_string(buffer::buffer_di, int::in, string::uo) is det.
+:- pred io.buffer_to_string(buffer::buffer_di, int::in, string::uo) is semidet.
 
 :- pragma foreign_proc("C",
     io.buffer_to_string(Buffer::buffer_di, Len::in, Str::uo),
     [will_not_call_mercury, promise_pure, tabled_for_io, thread_safe,
         does_not_affect_liveness],
 "{
+    int i;
+
     Str = Buffer;
     Str[Len] = '\\0';
+
+    /* Check that the string doesn't contain null characters. */
+    if (strlen(Str) != Len) {
+        SUCCESS_INDICATOR= MR_FALSE;
+    } else {
+        SUCCESS_INDICATOR = MR_TRUE;
+    }
 }").
 
-io.buffer_to_string(buffer(Array), Len, from_char_list(List)) :-
+io.buffer_to_string(buffer(Array), Len, from_char_list_semidet(List)) :-
     array.fetch_items(Array, min(Array), min(Array) + Len - 1, List).
 
 :- pred io.read_into_buffer(stream::in, buffer::buffer_di, buffer::buffer_uo,
Index: library/lexer.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/lexer.m,v
retrieving revision 1.52
diff -u -u -r1.52 lexer.m
--- library/lexer.m	15 Jan 2007 02:24:00 -0000	1.52
+++ library/lexer.m	11 Mar 2007 22:30:05 -0000
@@ -348,7 +348,7 @@
             skip_to_eol(Token, Context, !IO)
         ; ( Char = '"' ; Char = '''' ) ->
             get_context(Context, !IO),
-            get_quoted_name(Char, [], Token, !IO)
+            start_quoted_name(Char, [], Token, !IO)
         ; Char = ('/') ->
             get_slash(Token, Context, !IO)
         ; Char = ('#') ->
@@ -393,7 +393,7 @@
         ; Char = ('%') ->
             string_skip_to_eol(String, Len, Token, Context, !Posn)
         ; ( Char = '"' ; Char = '''' ) ->
-            string_get_quoted_name(String, Len, Char, [], Posn0, Token,
+            string_start_quoted_name(String, Len, Char, [], Posn0, Token,
                 Context, !Posn)
         ; Char = ('/') ->
             string_get_slash(String, Len, Posn0, Token, Context, !Posn)
@@ -458,7 +458,7 @@
             skip_to_eol(Token, Context, !IO)
         ; ( Char = '"' ; Char = '''' ) ->
             get_context(Context, !IO),
-            get_quoted_name(Char, [], Token, !IO)
+            start_quoted_name(Char, [], Token, !IO)
         ; Char = ('/') ->
             get_slash(Token, Context, !IO)
         ; Char = ('#') ->
@@ -499,7 +499,7 @@
         ; Char = ('%') ->
             string_skip_to_eol(String, Len, Token, Context, !Posn)
         ; ( Char = '"' ; Char = '''' ) ->
-            string_get_quoted_name(String, Len, Char, [], Posn0, Token,
+            string_start_quoted_name(String, Len, Char, [], Posn0, Token,
                 Context, !Posn)
         ; Char = ('/') ->
             string_get_slash(String, Len, Posn0, Token, Context, !Posn)
@@ -775,6 +775,40 @@
 %
 % Quoted names and quoted strings.
 
+:- pred start_quoted_name(char::in, list(char)::in, token::out,
+    io::di, io::uo) is det.
+
+start_quoted_name(QuoteChar, Chars, Token, !IO) :-
+    get_quoted_name(QuoteChar, Chars, Token0, !IO),
+    ( Token0 = error(_) ->
+        % Skip to the end of the string or name.
+        start_quoted_name(QuoteChar, Chars, _, !IO),
+        Token = Token0
+    ; Token0 = eof ->
+        Token = error("unterminated quote")
+    ;
+        Token = Token0
+    ).
+
+:- pred string_start_quoted_name(string::in, int::in, char::in,
+    list(char)::in, posn::in, token::out, string_token_context::out,
+    posn::in, posn::out) is det.
+
+string_start_quoted_name(String, Len, QuoteChar, Chars, Posn0,
+        Token, Context, !Posn) :-
+    string_get_quoted_name(String, Len, QuoteChar, Chars, Posn0,
+        Token0, Context, !Posn),
+    ( Token0 = error(_) ->
+        % Skip to the end of the string or name.
+        string_start_quoted_name(String, Len, QuoteChar, Chars,
+            Posn0, _, _, !Posn),
+        Token = Token0
+    ; Token0 = eof ->
+        Token = error("unterminated quote")
+    ;
+        Token = Token0
+    ).
+
 :- pred get_quoted_name(char::in, list(char)::in, token::out,
     io::di, io::uo) is det.
 
@@ -785,7 +819,7 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        Token = error("unterminated quote")
+        Token = eof
     ;
         Result = ok(Char),
         ( Char = QuoteChar ->
@@ -816,7 +850,7 @@
         )
     ;
         string_get_context(Posn0, Context, !Posn),
-        Token = error("unterminated quote")
+        Token = eof
     ).
 
 :- pred get_quoted_name_quote(char::in, list(char)::in, token::out,
@@ -863,13 +897,16 @@
 :- pred finish_quoted_name(char::in, list(char)::in, token::out) is det.
 
 finish_quoted_name(QuoteChar, Chars, Token) :-
-    rev_char_list_to_string(Chars, String),
-    ( QuoteChar = '''' ->
-        Token = name(String)
-    ; QuoteChar = '"' ->
-        Token = string(String)
+    ( rev_char_list_to_string(Chars, String) ->
+        ( QuoteChar = '''' ->
+            Token = name(String)
+        ; QuoteChar = '"' ->
+            Token = string(String)
+        ;
+            error("lexer.m: unknown quote character")
+        )
     ;
-        error("lexer.m: unknown quote character")
+        Token = error("invalid character in quoted name")
     ).
 
 :- pred get_quoted_name_escape(char::in, list(char)::in, token::out,
@@ -882,7 +919,7 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        Token = error("unterminated quoted name")
+        Token = eof
     ;
         Result = ok(Char),
         ( Char = '\n' ->
@@ -942,7 +979,7 @@
         )
     ;
         string_get_context(Posn0, Context, !Posn),
-        Token = error("unterminated quoted name")
+        Token = eof
     ).
 
 :- pred escape_char(char::in, char::out) is semidet.
@@ -959,40 +996,22 @@
 escape_char('"', '"').
 escape_char('`', '`').
 
-:- pred get_hex_escape(char::in, list(char)::in, list(char)::in,
-    token::out, io::di, io::uo) is det.
-
-get_hex_escape(QuoteChar, Chars, HexChars, Token, !IO) :-
-    io.read_char(Result, !IO),
-    (
-        Result = error(Error),
-        Token = io_error(Error)
-    ;
-        Result = eof,
-        Token = error("unterminated quote")
-    ;
-        Result = ok(Char),
-        ( char.is_hex_digit(Char) ->
-            get_hex_escape(QuoteChar, Chars, [Char | HexChars], Token, !IO)
-        ; Char = ('\\') ->
-            finish_hex_escape(QuoteChar, Chars, HexChars, Token, !IO)
-        ;
-            Token = error("unterminated hex escape")
-        )
-    ).
-
 :- pred get_unicode_escape(int::in, char::in, list(char)::in, list(char)::in,
     token::out, io::di, io::uo) is det.
 
 get_unicode_escape(NumHexChars, QuoteChar, Chars, HexChars, Token, !IO) :-
     ( if NumHexChars = list.length(HexChars) then
-        rev_char_list_to_string(HexChars, HexString),
         ( if
+            rev_char_list_to_string(HexChars, HexString),
             string.base_string_to_int(16, HexString, UnicodeCharCode),
             convert_unicode_char_to_target_chars(UnicodeCharCode, UTFChars)
         then
-            get_quoted_name(QuoteChar, list.reverse(UTFChars) ++ Chars,
-                Token, !IO)
+            ( if UnicodeCharCode = 0 then
+                Token = null_character_error
+            else
+                get_quoted_name(QuoteChar, list.reverse(UTFChars) ++ Chars,
+                    Token, !IO)
+            )
         else
             Token = error("invalid Unicode character code")
         )
@@ -1003,7 +1022,7 @@
             Token = io_error(Error)
         ;
             Result = eof,
-            Token = error("unterminated quote")
+            Token = eof
         ;
             Result = ok(Char),
             ( if char.is_hex_digit(Char) then
@@ -1022,14 +1041,19 @@
 string_get_unicode_escape(NumHexChars, String, Len, QuoteChar, Chars,
         HexChars, Posn0, Token, Context, !Posn) :-
     ( if NumHexChars = list.length(HexChars) then
-        rev_char_list_to_string(HexChars, HexString),
         ( if
+            rev_char_list_to_string(HexChars, HexString),
             string.base_string_to_int(16, HexString, UnicodeCharCode),
             convert_unicode_char_to_target_chars(UnicodeCharCode, UTFChars)
         then
             RevCharsWithUnicode = list.reverse(UTFChars) ++ Chars,
-            string_get_quoted_name(String, Len, QuoteChar, RevCharsWithUnicode,
-                Posn0, Token, Context, !Posn)
+            ( if UnicodeCharCode = 0 then
+                string_get_context(Posn0, Context, !Posn),
+                Token = null_character_error
+            else
+                string_get_quoted_name(String, Len, QuoteChar,
+                    RevCharsWithUnicode, Posn0, Token, Context, !Posn)
+            )
         else
             string_get_context(Posn0, Context, !Posn),
             Token = error("invalid Unicode character code")
@@ -1045,7 +1069,7 @@
             )
         else
             string_get_context(Posn0, Context, !Posn),
-            Token = error("unterminated quote")
+            Token = eof
         )
     ).
 
@@ -1192,6 +1216,28 @@
     EncodingInt = 1;
 ").
 
+:- pred get_hex_escape(char::in, list(char)::in, list(char)::in,
+    token::out, io::di, io::uo) is det.
+
+get_hex_escape(QuoteChar, Chars, HexChars, Token, !IO) :-
+    io.read_char(Result, !IO),
+    (
+        Result = error(Error),
+        Token = io_error(Error)
+    ;
+        Result = eof,
+        Token = eof
+    ;
+        Result = ok(Char),
+        ( char.is_hex_digit(Char) ->
+            get_hex_escape(QuoteChar, Chars, [Char | HexChars], Token, !IO)
+        ; Char = ('\\') ->
+            finish_hex_escape(QuoteChar, Chars, HexChars, Token, !IO)
+        ;
+            Token = error("unterminated hex escape")
+        )
+    ).
+
 :- pred string_get_hex_escape(string::in, int::in, char::in,
     list(char)::in, list(char)::in, posn::in, token::out,
     string_token_context::out, posn::in, posn::out) is det.
@@ -1211,7 +1257,7 @@
         )
     ;
         string_get_context(Posn0, Context, !Posn),
-        Token = error("unterminated quote")
+        Token = eof
     ).
 
 :- pred finish_hex_escape(char::in, list(char)::in, list(char)::in,
@@ -1223,12 +1269,16 @@
         Token = error("empty hex escape")
     ;
         HexChars = [_ | _],
-        rev_char_list_to_string(HexChars, HexString),
         (
+            rev_char_list_to_string(HexChars, HexString),
             string.base_string_to_int(16, HexString, Int),
             char.to_int(Char, Int)
         ->
-            get_quoted_name(QuoteChar, [Char|Chars], Token, !IO)
+            ( Int = 0 ->
+                Token = null_character_error
+            ;
+                get_quoted_name(QuoteChar, [Char|Chars], Token, !IO)
+            )
         ;
             Token = error("invalid hex escape")
         )
@@ -1246,13 +1296,18 @@
         Token = error("empty hex escape")
     ;
         HexChars = [_ | _],
-        rev_char_list_to_string(HexChars, HexString),
         (
+            rev_char_list_to_string(HexChars, HexString),
             string.base_string_to_int(16, HexString, Int),
             char.to_int(Char, Int)
         ->
-            string_get_quoted_name(String, Len, QuoteChar, [Char | Chars],
-                Posn0, Token, Context, !Posn)
+            ( Int = 0 ->
+                Token = null_character_error,
+                string_get_context(Posn0, Context, !Posn)
+            ;
+                string_get_quoted_name(String, Len, QuoteChar, [Char | Chars],
+                    Posn0, Token, Context, !Posn)
+            )
         ;
             string_get_context(Posn0, Context, !Posn),
             Token = error("invalid hex escape")
@@ -1269,7 +1324,7 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        Token = error("unterminated quote")
+        Token = eof
     ;
         Result = ok(Char),
         ( char.is_octal_digit(Char) ->
@@ -1308,7 +1363,7 @@
                 OctalChars, Posn0, Token, Context, !Posn)
         )
     ;
-        Token = error("unterminated quote"),
+        Token = eof,
         string_get_context(Posn0, Context, !Posn)
     ).
 
@@ -1321,12 +1376,16 @@
         Token = error("empty octal escape")
     ;
         OctalChars = [_ | _],
-        rev_char_list_to_string(OctalChars, OctalString),
         (
+            rev_char_list_to_string(OctalChars, OctalString),
             string.base_string_to_int(8, OctalString, Int),
             char.to_int(Char, Int)
         ->
-            get_quoted_name(QuoteChar, [Char | Chars], Token, !IO)
+            ( Int = 0 ->
+                Token = null_character_error
+            ;
+                get_quoted_name(QuoteChar, [Char | Chars], Token, !IO)
+            )
         ;
             Token = error("invalid octal escape")
         )
@@ -1344,13 +1403,18 @@
         string_get_context(Posn0, Context, !Posn)
     ;
         OctalChars = [_ | _],
-        rev_char_list_to_string(OctalChars, OctalString),
         (
+            rev_char_list_to_string(OctalChars, OctalString),
             string.base_string_to_int(8, OctalString, Int),
             char.to_int(Char, Int)
         ->
-            string_get_quoted_name(String, Len, QuoteChar, [Char | Chars],
-                Posn0, Token, Context, !Posn)
+            ( Int = 0 ->
+                Token = null_character_error,
+                string_get_context(Posn0, Context, !Posn)
+            ;
+                string_get_quoted_name(String, Len, QuoteChar, [Char | Chars],
+                    Posn0, Token, Context, !Posn)
+            )
         ;
             Token = error("invalid octal escape"),
             string_get_context(Posn0, Context, !Posn)
@@ -1370,16 +1434,22 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        rev_char_list_to_string(Chars, Name),
-        Token = name(Name)
+        ( rev_char_list_to_string(Chars, Name) ->
+            Token = name(Name)
+        ;
+            Token = error("invalid character in name")
+        )
     ;
         Result = ok(Char),
         ( char.is_alnum_or_underscore(Char) ->
             get_name([Char | Chars], Token, !IO)
         ;
             io.putback_char(Char, !IO),
-            rev_char_list_to_string(Chars, Name),
-            Token = name(Name)
+            ( rev_char_list_to_string(Chars, Name) ->
+                Token = name(Name)
+            ;
+                Token = error("invalid character in name")
+            ) 
         )
     ).
 
@@ -1429,22 +1499,32 @@
         ( char.is_digit(Char) ->
             get_source_line_number([Char | Chars], Token, Context, !IO)
         ; Char = '\n' ->
-            rev_char_list_to_string(Chars, String),
             (
-                string.base_string_to_int(10, String, Int),
-                Int > 0
+                rev_char_list_to_string(Chars, String)
             ->
-                io.set_line_number(Int, !IO),
-                get_token(Token, Context, !IO)
+                (
+                    string.base_string_to_int(10, String, Int),
+                    Int > 0
+                ->
+                    io.set_line_number(Int, !IO),
+                    get_token(Token, Context, !IO)
+                ;
+                    get_context(Context, !IO),
+                    string.append_list(["invalid line number `", String,
+                        "' in `#' line number directive"], Message),
+                    Token = error(Message)
+                )
             ;
                 get_context(Context, !IO),
-                string.append_list(["invalid line number `", String,
-                    "' in `#' line number directive"], Message),
-                Token = error(Message)
+                Token = error("invalid character in `#' line number directive")
             )
         ;
             get_context(Context, !IO),
-            string.from_char_list([Char], String),
+            ( char.to_int(Char, 0) ->
+                String = "NUL"
+            ;
+                string.from_char_list([Char], String)
+            ),
             string.append_list(["invalid character `", String,
                 "' in `#' line number directive"], Message),
             Token = error(Message)
@@ -1475,7 +1555,11 @@
             )
         ;
             string_get_context(Posn1, Context, !Posn),
-            string.from_char_list([Char], DirectiveString),
+            ( char.to_int(Char, 0) ->
+                DirectiveString = "NUL"
+            ;
+                string.from_char_list([Char], DirectiveString)
+            ),
             string.append_list(["invalid character `", DirectiveString,
                 "' in `#' line number directive"], Message),
             Token = error(Message)
@@ -1494,16 +1578,22 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        rev_char_list_to_string(Chars, Name),
-        Token = name(Name)
+        ( rev_char_list_to_string(Chars, Name) ->
+            Token = name(Name)
+        ;
+            Token = error("invalid character in graphic token")
+        )
     ;
         Result = ok(Char),
         ( graphic_token_char(Char) ->
             get_graphic([Char | Chars], Token, !IO)
         ;
             io.putback_char(Char, !IO),
-            rev_char_list_to_string(Chars, Name),
-            Token = name(Name)
+            ( rev_char_list_to_string(Chars, Name) ->
+                Token = name(Name)
+            ;
+                Token = error("invalid character in graphic token")
+            )
         )
     ).
 
@@ -1535,16 +1625,22 @@
         Token = io_error(Error)
     ;
         Result = eof,
-        rev_char_list_to_string(Chars, VariableName),
-        Token = variable(VariableName)
+        ( rev_char_list_to_string(Chars, VariableName) ->
+            Token = variable(VariableName)
+        ;
+            Token = error("invalid character in variable")
+        )
     ;
         Result = ok(Char),
         ( char.is_alnum_or_underscore(Char) ->
             get_variable([Char | Chars], Token, !IO)
         ;
             io.putback_char(Char, !IO),
-            rev_char_list_to_string(Chars, VariableName),
-            Token = variable(VariableName)
+            ( rev_char_list_to_string(Chars, VariableName) ->
+                Token = variable(VariableName)
+            ;
+                Token = error("invalid character in variable")
+            )
         )
     ).
 
@@ -2184,8 +2280,11 @@
 :- pred rev_char_list_to_int(list(char)::in, int::in, token::out) is det.
 
 rev_char_list_to_int(RevChars, Base, Token) :-
-    rev_char_list_to_string(RevChars, String),
-    conv_string_to_int(String, Base, Token).
+    ( rev_char_list_to_string(RevChars, String) ->
+        conv_string_to_int(String, Base, Token)
+    ;
+        Token = error("invalid character in int")
+    ).
 
 :- pred conv_string_to_int(string::in, int::in, token::out) is det.
 
@@ -2199,8 +2298,11 @@
 :- pred rev_char_list_to_float(list(char)::in, token::out) is det.
 
 rev_char_list_to_float(RevChars, Token) :-
-    rev_char_list_to_string(RevChars, String),
-    conv_to_float(String, Token).
+    ( rev_char_list_to_string(RevChars, String) ->
+        conv_to_float(String, Token)
+    ;
+        Token = error("invalid character in int")
+    ).
 
 :- pred conv_to_float(string::in, token::out) is det.
 
@@ -2211,9 +2313,14 @@
         Token = error("invalid float token")
     ).
 
-:- pred rev_char_list_to_string(list(char)::in, string::out) is det.
+:- pred rev_char_list_to_string(list(char)::in, string::out) is semidet.
 
 rev_char_list_to_string(RevChars, String) :-
-   string.from_rev_char_list(RevChars, String).
+   string.from_rev_char_list_semidet(RevChars, String).
+
+:- func null_character_error = token.
+
+null_character_error =
+    error("null character is illegal in strings and names").
 
 %-----------------------------------------------------------------------------%
Index: library/string.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/string.m,v
retrieving revision 1.256
diff -u -u -r1.256 string.m
--- library/string.m	15 Feb 2007 00:41:48 -0000	1.256
+++ library/string.m	9 Mar 2007 09:09:19 -0000
@@ -12,10 +12,18 @@
 %
 % This modules provides basic string handling facilities.
 %
-% Note that in the current implementation, strings are represented as in C,
-% using a null character as the string terminator. Future implementations,
-% however, might allow null characters in strings. Programmers should
-% avoid creating strings that might contain null characters.
+% Unexpected null characters embedded in the middle of strings can be a source
+% of security vulnerabilities, so the Mercury library predicates and functions
+% which create strings from (lists of) characters throw an exception if a null
+% character is detected.  Programmers must not create strings that might
+% contain null characters using the foreign language interface.
+%
+% The representation of strings is implementation dependent and subject to
+% change. In the current implementation, when Mercury is compiled to C, strings
+% are represented as in C, using a null character as the string terminator.
+% When Mercury is compiled to Java, strings are represented as Java `String's.
+% When Mercury is compiled to .NET IL code, strings are represented as .NET
+% `System.String's.
 %
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
@@ -123,7 +131,7 @@
 :- mode string.char_to_string(in, uo) is det.
 :- mode string.char_to_string(out, in) is semidet.
 
-    % A synonym for string.int_to_char/1.
+    % A synonym for string.char_to_string/1.
     %
 :- func string.from_char(char::in) = (string::uo) is det.
 
@@ -217,6 +225,7 @@
     is det.
 
     % Converts a string to lowercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func string.to_lower(string::in) = (string::uo) is det.
 :- pred string.to_lower(string, string).
@@ -224,6 +233,7 @@
 :- mode string.to_lower(in, in) is semidet.        % implied
 
     % Converts a string to uppercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func string.to_upper(string::in) = (string::uo) is det.
 :- pred string.to_upper(string, string).
@@ -231,16 +241,19 @@
 :- mode string.to_upper(in, in) is semidet.        % implied
 
     % Convert the first character (if any) of a string to uppercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func string.capitalize_first(string) = string.
 :- pred string.capitalize_first(string::in, string::out) is det.
 
     % Convert the first character (if any) of a string to lowercase.
+    % Note that this only converts unaccented Latin letters.
     %
 :- func string.uncapitalize_first(string) = string.
 :- pred string.uncapitalize_first(string::in, string::out) is det.
 
     % Convert the string to a list of characters.
+    % Throws an exception if the list of characters contains a null character.
     %
 :- func string.to_char_list(string) = list(char).
 :- pred string.to_char_list(string, list(char)).
@@ -248,18 +261,29 @@
 :- mode string.to_char_list(uo, in) is det.
 
     % Convert a list of characters to a string.
+    % Throws an exception if the list of characters contains a null character.
     %
 :- func string.from_char_list(list(char)::in) = (string::uo) is det.
 :- pred string.from_char_list(list(char), string).
 :- mode string.from_char_list(in, uo) is det.
 :- mode string.from_char_list(out, in) is det.
 
+    % As above, but fail instead of throwing an exception if the
+    % list contains a null character.
+:- pred string.from_char_list_semidet(list(char)::in, string::uo) is semidet.
+
     % Same as string.from_char_list, except that it reverses the order
     % of the characters.
+    % Throws an exception if the list of characters contains a null character.
     %
 :- func string.from_rev_char_list(list(char)::in) = (string::uo) is det.
 :- pred string.from_rev_char_list(list(char)::in, string::uo) is det.
 
+    % As above, but fail instead of throwing an exception if the
+    % list contains a null character.
+:- pred string.from_rev_char_list_semidet(list(char)::in, string::uo)
+    is semidet.
+
     % Converts a signed base 10 string to an int; throws an exception
     % if the string argument does not match the regexp [+-]?[0-9]+
     %
@@ -1134,9 +1158,6 @@
     private_builtin.unsafe_type_cast(C_Pointer, Int),
     Str = "c_pointer(0x" ++ string.int_to_base_string(Int, 16) ++ ")".
 
-string.from_char_list(CharList, Str) :-
-    string.to_char_list(Str, CharList).
-
 string.int_to_string_thousands(N) =
     string.int_to_base_string_group(N, 10, 3, ",").
 
@@ -1208,9 +1229,18 @@
 % :- pred string.to_char_list(string, list(char)).
 % :- mode string.to_char_list(in, uo) is det.
 % :- mode string.to_char_list(uo, in) is det.
+:- pragma promise_pure(string.to_char_list/2).
+
+string.to_char_list(Str::in, CharList::out) :-
+    string.to_char_list_2(Str, CharList).
+string.to_char_list(Str::uo, CharList::in) :-
+    string.from_char_list(CharList, Str).
+
+:- pred string.to_char_list_2(string, list(char)).
+:- mode string.to_char_list_2(in, out) is det.
 
 :- pragma foreign_proc("C",
-    string.to_char_list(Str::in, CharList::out),
+    string.to_char_list_2(Str::in, CharList::out),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness],
 "{
@@ -1223,8 +1253,34 @@
     }
 }").
 
+string.to_char_list_2(Str, CharList) :-
+    string.to_char_list_3(Str, 0, CharList).
+
+:- pred string.to_char_list_3(string::in, int::in, list(char)::uo) is det.
+
+string.to_char_list_3(Str, Index, CharList) :-
+    ( string.index(Str, Index, Char) ->
+        string.to_char_list_3(Str, Index + 1, CharList0),
+        CharList = [Char | CharList0]
+    ;
+        CharList = []
+    ).
+
+%-----------------------------------------------------------------------------%
+
+:- pragma promise_pure(string.from_char_list/2).
+
+string.from_char_list(Chars::out, Str::in) :-
+    string.to_char_list(Str, Chars).
+string.from_char_list(Chars::in, Str::uo) :-
+    ( string.from_char_list_semidet(Chars, Str0) ->
+        Str = Str0
+    ;
+        error("string.from_char_list: null character in list")
+    ).
+
 :- pragma foreign_proc("C",
-    string.to_char_list(Str::uo, CharList::in),
+    string.from_char_list_semidet(CharList::in, Str::uo),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness],
 "{
@@ -1236,7 +1292,7 @@
     ** Loop to calculate list length + sizeof(MR_Word) in `size'
     ** using list in `char_list_ptr'.
     */
-    size = sizeof(MR_Word);
+    size = 0;
     char_list_ptr = CharList;
     while (! MR_list_is_empty(char_list_ptr)) {
         size++;
@@ -1244,78 +1300,80 @@
     }
 
     /*
-    ** Allocate (length + 1) bytes of heap space for string
-    ** i.e. (length + 1 + sizeof(MR_Word) - 1) / sizeof(MR_Word) words.
+    ** Allocate heap space for string
     */
     MR_allocate_aligned_string_msg(Str, size, MR_PROC_LABEL);
 
     /*
     ** Loop to copy the characters from the char_list to the string.
     */
+    SUCCESS_INDICATOR = MR_TRUE;
     size = 0;
     char_list_ptr = CharList;
     while (! MR_list_is_empty(char_list_ptr)) {
-        Str[size++] = MR_list_head(char_list_ptr);
+        MR_Char c;
+        c = (MR_Char) MR_list_head(char_list_ptr);
+        /*
+        ** It is an error to put a null character in a string
+        ** (see the comments at the top of this file).
+        */
+        if (c == '\\0') {
+            SUCCESS_INDICATOR = MR_FALSE;
+            break;
+        }
+        Str[size++] = c;
         char_list_ptr = MR_list_tail(char_list_ptr);
     }
 
     Str[size] = '\\0';
 }").
 
-:- pragma promise_equivalent_clauses(string.to_char_list/2).
+:- pragma promise_equivalent_clauses(string.from_char_list_semidet/2).
 
-string.to_char_list(Str::in, CharList::out) :-
-    string.to_char_list_2(Str, 0, CharList).
-string.to_char_list(Str::uo, CharList::in) :-
+string.from_char_list_semidet(CharList::in, Str::uo) :-
     (
         CharList = [],
         Str = ""
     ;
         CharList = [C | Cs],
-        string.to_char_list(Str0, Cs),
+        \+ char_to_int(C, 0),
+        string.from_char_list_semidet(Cs, Str0),
         string.first_char(Str, C, Str0)
     ).
 
-:- pred string.to_char_list_2(string::in, int::in, list(char)::uo) is det.
-
-string.to_char_list_2(Str, Index, CharList) :-
-    ( string.index(Str, Index, Char) ->
-        string.to_char_list_2(Str, Index + 1, CharList0),
-        CharList = [Char | CharList0]
-    ;
-        CharList = []
-    ).
-
 %---------------------------------------------------------------------------%
 
 % We could implement from_rev_char_list using list.reverse and from_char_list,
 % but the optimized implementation in C below is there for efficiency since
 % it improves the overall speed of parsing by about 7%.
 
+string.from_rev_char_list(Chars, Str) :-
+    ( string.from_rev_char_list_semidet(Chars, Str0) ->
+        Str = Str0
+    ;
+        error("string.from_rev_char_list: null character in list")
+    ).
+
 :- pragma foreign_proc("C",
-    string.from_rev_char_list(Chars::in, Str::uo),
+    string.from_rev_char_list_semidet(Chars::in, Str::uo),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness],
 "{
     MR_Word list_ptr;
-    MR_Word size, len;
+    MR_Word size;
 
     /*
-    ** Loop to calculate list length + sizeof(MR_Word) in `size'
-    ** using list in `list_ptr' and separately count the length of the string.
+    ** Loop to calculate list length in `size' using list in `list_ptr'
     */
-    size = sizeof(MR_Word);
-    len = 1;
+    size = 0;
     list_ptr = Chars;
     while (!MR_list_is_empty(list_ptr)) {
         size++;
-        len++;
         list_ptr = MR_list_tail(list_ptr);
     }
 
     /*
-    ** Allocate (length + 1) bytes of heap space for string
-    ** i.e. (length + 1 + sizeof(MR_Word) - 1) / sizeof(MR_Word) words.
+    ** Allocate heap space for string
     */
     MR_allocate_aligned_string_msg(Str, size, MR_PROC_LABEL);
 
@@ -1323,21 +1381,30 @@
     ** Set size to be the offset of the end of the string
     ** (ie the \\0) and null terminate the string.
     */
-    Str[--len] = '\\0';
+    Str[size] = '\\0';
 
     /*
     ** Loop to copy the characters from the list_ptr to the string
     ** in reverse order.
     */
     list_ptr = Chars;
+    SUCCESS_INDICATOR = MR_TRUE;
     while (!MR_list_is_empty(list_ptr)) {
-        Str[--len] = (MR_Char) MR_list_head(list_ptr);
+        MR_Char c;
+        c = (MR_Char) MR_list_head(list_ptr);
+        if (c == '\\0') {
+            SUCCESS_INDICATOR = MR_FALSE;
+            break;
+        }
+        Str[--size] = c;
         list_ptr = MR_list_tail(list_ptr);
     }
 }").
 
-string.from_rev_char_list(Chars::in, Str::uo) :-
-    Str = string.from_char_list(list.reverse(Chars)).
+string.from_rev_char_list_semidet(Chars::in, Str::uo) :-
+    string.from_char_list_semidet(list.reverse(Chars), Str).
+
+%---------------------------------------------------------------------------%
 
 string.to_upper(StrIn, StrOut) :-
     string.to_char_list(StrIn, List),
@@ -3411,8 +3478,21 @@
 #endif
 ").
 
+string.set_char(Char, Index, !Str) :-
+    ( char.to_int(Char, 0) ->
+        error("string.set_char: null character")
+    ;
+        string.set_char_2(Char, Index, !Str)
+    ).
+
+:- pred string.set_char_2(char, int, string, string).
+:- mode string.set_char_2(in, in, in, out) is semidet.
+% XXX This mode is disabled because the compiler puts constant
+% strings into static data even when they might be updated.
+%:- mode string.set_char_2(in, in, di, uo) is semidet.
+
 :- pragma foreign_proc("C",
-    string.set_char(Ch::in, Index::in, Str0::in, Str::out),
+    string.set_char_2(Ch::in, Index::in, Str0::in, Str::out),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness],
 "
@@ -3427,7 +3507,7 @@
     }
 ").
 :- pragma foreign_proc("C#",
-    string.set_char(Ch::in, Index::in, Str0::in, Str::out),
+    string.set_char_2(Ch::in, Index::in, Str0::in, Str::out),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
     if (Index >= Str0.Length) {
@@ -3439,13 +3519,13 @@
         SUCCESS_INDICATOR = true;
     }
 ").
-string.set_char(Ch, Index, Str0, Str) :-
+string.set_char_2(Ch, Index, Str0, Str) :-
     string.to_char_list(Str0, List0),
     list.replace_nth(List0, Index + 1, Ch, List),
     string.to_char_list(Str, List).
 
 % :- pragma foreign_proc("C",
-%   string.set_char(Ch::in, Index::in, Str0::di, Str::uo),
+%   string.set_char_2(Ch::in, Index::in, Str0::di, Str::uo),
 %   [will_not_call_mercury, promise_pure, thread_safe, does_not_affect_liveness],
 % "
 %   if ((MR_Unsigned) Index >= strlen(Str0)) {
@@ -3458,7 +3538,7 @@
 % ").
 %
 % :- pragma foreign_proc("C#",
-%   string.set_char(Ch::in, Index::in, Str0::di, Str::uo),
+%   string.set_char_2(Ch::in, Index::in, Str0::di, Str::uo),
 %   [will_not_call_mercury, promise_pure, thread_safe],
 % "
 %   if (Index >= Str0.Length) {
@@ -3473,8 +3553,21 @@
 
 /*-----------------------------------------------------------------------*/
 
+string.unsafe_set_char(Char, Index, !Str) :-
+    ( char.to_int(Char, 0) ->
+        error("string.unsafe_set_char: null character")
+    ;
+        string.unsafe_set_char_2(Char, Index, !Str)
+    ).
+
+:- pred string.unsafe_set_char_2(char, int, string, string).
+:- mode string.unsafe_set_char_2(in, in, in, out) is det.
+% XXX This mode is disabled because the compiler puts constant
+% strings into static data even when they might be updated.
+%:- mode string.unsafe_set_char_2(in, in, di, uo) is det.
+
 :- pragma foreign_proc("C",
-    string.unsafe_set_char(Ch::in, Index::in, Str0::in, Str::out),
+    string.unsafe_set_char_2(Ch::in, Index::in, Str0::in, Str::out),
     [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
         does_not_affect_liveness],
 "
@@ -3484,7 +3577,7 @@
     MR_set_char(Str, Index, Ch);
 ").
 :- pragma foreign_proc("C#",
-    string.unsafe_set_char(Ch::in, Index::in, Str0::in, Str::out),
+    string.unsafe_set_char_2(Ch::in, Index::in, Str0::in, Str::out),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
     Str = System.String.Concat(Str0.Substring(0, Index),
@@ -3492,21 +3585,21 @@
         Str0.Substring(Index + 1));
 ").
 :- pragma foreign_proc("Java",
-    string.unsafe_set_char(Ch::in, Index::in, Str0::in, Str::out),
+    string.unsafe_set_char_2(Ch::in, Index::in, Str0::in, Str::out),
     [will_not_call_mercury, promise_pure, thread_safe],
 "
     Str = Str0.substring(0, Index) + Ch + Str0.substring(Index + 1);
 ").
 
 % :- pragma foreign_proc("C",
-%   string.unsafe_set_char(Ch::in, Index::in, Str0::di, Str::uo),
+%   string.unsafe_set_char_2(Ch::in, Index::in, Str0::di, Str::uo),
 %   [will_not_call_mercury, promise_pure, thread_safe, does_not_affect_liveness],
 % "
 %   Str = Str0;
 %   MR_set_char(Str, Index, Ch);
 % ").
 % :- pragma foreign_proc("C#",
-%   string.unsafe_set_char(Ch::in, Index::in, Str0::di, Str::uo),
+%   string.unsafe_set_char_2(Ch::in, Index::in, Str0::di, Str::uo),
 %   [will_not_call_mercury, promise_pure, thread_safe],
 % "
 %   Str = System.String.Concat(Str0.Substring(0, Index),
@@ -3514,7 +3607,7 @@
 %       Str0.Substring(Index + 1));
 % ").
 % :- pragma foreign_proc("Java",
-%   string.unsafe_set_char(Ch::in, Index::in, Str0::di, Str::uo),
+%   string.unsafe_set_char_2(Ch::in, Index::in, Str0::di, Str::uo),
 %   [will_not_call_mercury, promise_pure, thread_safe],
 % "
 %   Str = Str0.substring(0, Index) + Ch + Str0.substring(Index + 1);
Index: runtime/mercury_string.h
===================================================================
RCS file: /home/mercury1/repository/mercury/runtime/mercury_string.h,v
retrieving revision 1.33
diff -u -u -r1.33 mercury_string.h
--- runtime/mercury_string.h	13 Feb 2007 01:58:57 -0000	1.33
+++ runtime/mercury_string.h	7 Mar 2007 09:16:01 -0000
@@ -19,6 +19,8 @@
 ** But BEWARE: when stored in an MR_Integer, the value must be
 ** first cast to `MR_UnsignedChar'.
 ** Mercury strings are stored as pointers to '\0'-terminated arrays of MR_Char.
+** Mercury strings must not contain null characters. Unexpected null characters
+** are a source of security vulnerabilities.
 **
 ** We may eventually move to using wchar_t for Mercury characters and strings,
 ** so it is important to use these typedefs.
Index: tests/hard_coded/null_char.exp
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/null_char.exp,v
retrieving revision 1.1
diff -u -u -r1.1 null_char.exp
All tests should result in exceptions being thrown.
exception(univ_cons(software_error("string.from_char_list: null character in lis
t")))
exception(univ_cons(software_error("string.from_rev_char_list: null character in
 list")))
exception(univ_cons(software_error("string.set_char: null character")))
exception(univ_cons(software_error("string.unsafe_set_char: null character")))
error("", io_error("null character in input"))
error(io_error("null character in input"))

Index: tests/hard_coded/null_char.exp2
===================================================================
RCS file: tests/hard_coded/null_char.exp2
diff -N tests/hard_coded/null_char.exp2
--- tests/hard_coded/null_char.exp2	6 Aug 2002 00:30:52 -0000	1.1
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1 +0,0 @@
-text before null before some more
Index: tests/hard_coded/null_char.m
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/null_char.m,v
retrieving revision 1.1
diff -u -u -r1.1 null_char.m
--- tests/hard_coded/null_char.m	6 Aug 2002 00:30:52 -0000	1.1
+++ tests/hard_coded/null_char.m	11 Mar 2007 23:16:09 -0000
@@ -1,22 +1,67 @@
-% Test output of strings containing null characters
-
-% XXX Note that currently we don't handle this correctly;
-% we ignore everything after the first null character.
-% So the ".exp2" file for this test case allows that output.
-% If/when this is fixed, the ".exp2" file for this
-% test case should be removed.
-
+% Test that creation of a string containing a null character
+% results in an exception being thrown.
 :- module null_char.
 :- interface.
 :- import_module io.
 
-:- pred main(io__state::di, io__state::uo) is det.
+:- pred main(io::di, io::uo) is cc_multi.
 
 :- implementation.
-:- import_module string.
+:- import_module char, exception, list, string.
+
+main(!IO) :-
+	io.write_string(
+		"All tests should result in exceptions being thrown.\n", !IO),
+	run_test(test1, !IO),
+	run_test(test2, !IO),
+	run_test(test3, !IO),
+	run_test(test4, !IO),
+	test_file(io.read_file_as_string, !IO),
+	test_file(io.read_line_as_string, !IO).
+
+:- pred test_file(pred(io.input_stream, T, io, io), io, io).
+:- mode test_file((pred(in, out, di, uo) is det), di, uo) is det.
+
+test_file(P, !IO) :-
+	io.open_input("null_char.input_file", OpenRes, !IO),
+	(
+		OpenRes = ok(Stream),
+		P(Stream, Res, !IO),
+		io.close_input(Stream, !IO),
+		io.write(Res, !IO),
+		io.nl(!IO)
+	;
+		OpenRes = error(Err),
+		io.write_string(io.error_message(Err), !IO),
+		io.nl(!IO)
+	).
+
+:- pred run_test(pred(T)::in(pred(out) is det), io::di, io::uo) is cc_multi.
+
+run_test(P, !IO) :-
+	try(P, Res),
+	io.write(Res, !IO),
+	io.nl(!IO).
+
+:- pred test1(string::out) is det.
+
+test1(String) :-
+	String = string.from_char_list(['1', 'a', nul, 'g']).
+
+:- pred test2(string::out) is det.
+
+test2(String) :-
+	String = string.from_rev_char_list(['1', 'a', nul, 'g']).
+
+:- pred test3(string::out) is det.
+
+test3(String) :-
+	String = string.set_char_det(nul, 2, "1234").
+
+:- pred test4(string::out) is det.
+
+test4(String) :-
+	String = string.unsafe_set_char(nul, 2, "1234").
 
-main -->
-	print("text before null \0 text after null character\n"),
-	{ Foo = "before\0&after", Bar = " some more\0&more" },
-	print(Foo ++ Bar),
-	print("\n").
+:- func nul = char.
+nul = char.det_from_int(0).
Index: tests/hard_coded/string_hash.m
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/string_hash.m,v
retrieving revision 1.2
diff -u -u -r1.2 string_hash.m
--- tests/hard_coded/string_hash.m	19 Mar 2004 10:13:14 -0000	1.2
+++ tests/hard_coded/string_hash.m	8 Mar 2007 08:14:06 -0000
@@ -56,8 +56,8 @@
 	( Length = 0 ->
 		true
 	;
-		random__random(char__min_char_value,
-			(char__max_char_value - char__min_char_value + 1),
+		random__random(char__min_char_value + 1,
+			(char__max_char_value - char__min_char_value),
 			Int, !RS),
 		( char__to_int(Char, Int) ->
 			!:List = [Char | !.List]
Index: tests/invalid/Mmakefile
===================================================================
RCS file: /home/mercury1/repository/tests/invalid/Mmakefile,v
retrieving revision 1.209
diff -u -u -r1.209 Mmakefile
--- tests/invalid/Mmakefile	22 Feb 2007 03:50:38 -0000	1.209
+++ tests/invalid/Mmakefile	9 Mar 2007 04:08:55 -0000
@@ -140,6 +140,7 @@
 	no_exports \
 	not_a_switch \
 	not_in_interface \
+	null_char \
 	nullary_ho_func_error \
 	occurs \
 	one_member \
Index: tests/invalid/null_char.err_exp
===================================================================
RCS file: tests/invalid/null_char.err_exp
diff -N tests/invalid/null_char.err_exp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/invalid/null_char.err_exp	11 Mar 2007 22:38:28 -0000
@@ -0,0 +1,21 @@
+null_char.m:012: Syntax error: null character is illegal in strings and names.
+null_char.m:014: Syntax error: null character is illegal in strings and names.
+null_char.m:014: Error: atom expected in clause head: 1.
+null_char.m:018: Syntax error: null character is illegal in strings and names.
+null_char.m:018: Syntax error at token `)': unexpected token at start of (sub)term.
+null_char.m:022: Syntax error: null character is illegal in strings and names.
+null_char.m:022: Syntax error at token `. ': unexpected token at start of (sub)term.
+null_char.m:026: Syntax error: null character is illegal in strings and names.
+null_char.m:026: Syntax error at token `)': unexpected token at start of (sub)term.
+null_char.m:028: Syntax error: null character is illegal in strings and names.
+null_char.m:028: Syntax error at token `. ': unexpected token at start of (sub)term.
+null_char.m:030: Syntax error: null character is illegal in strings and names.
+null_char.m:030: Syntax error at token `. ': unexpected token at start of (sub)term.
+null_char.m:032: Syntax error: invalid character in quoted name.
+null_char.m:034: Syntax error: Illegal character 0x0 (0) in input.
+null_char.m:012: Error: clause for predicate `null_char.int'/0
+null_char.m:012:   without preceding `pred' declaration.
+null_char.m:012: Inferred :- pred int.
+null_char.m:016: Error: no clauses for predicate `wrong2'/1.
+null_char.m:020: Error: no clauses for predicate `wrong3'/1.
+null_char.m:024: Error: no clauses for predicate `wrong4'/1.
Index: tests/invalid/null_char.m
===================================================================
RCS file: tests/invalid/null_char.m
diff -N tests/invalid/null_char.m
% Test handling of null characters (any null character is an error).
:- module null_char.

:- interface.

:- type foo == int.

:- implementation.

:- import_module char.

:- pred 'wr\0\ng1'(int).

'wr\0\ng'(1).

:- pred wrong2(string).

wrong2("wr\0\ng").

:- pred wrong3(string).

wrong3(_String) :-'wr\0\ng'.

:- pred wrong4(char).

wrong4('\0\').

:- pred 'wr\0\ng5'.

'wr\0\ng5'.

:- pred 'wr^@ng6'(string).

'wr^@ng6'("wrong").

:- pred wrong7(string).

wrong7("wr\x0\ng").

:- pred wrong8(string).

wrong8("wr\u0000ng").

:- pred wrong9(string).

wrong9("wr^@ng").

Index: tests/invalid/unicode1.err_exp
===================================================================
RCS file: /home/mercury1/repository/tests/invalid/unicode1.err_exp,v
retrieving revision 1.1
diff -u -u -r1.1 unicode1.err_exp
--- tests/invalid/unicode1.err_exp	3 Aug 2006 06:28:08 -0000	1.1
+++ tests/invalid/unicode1.err_exp	12 Mar 2007 01:45:05 -0000
@@ -1,5 +1,5 @@
 unicode1.m:005: Syntax error: invalid hex character in Unicode escape.
-unicode1.m:005: Syntax error: unterminated quote.
+unicode1.m:005: Syntax error at token 'int': operator or `.' expected.
 unicode1.m:001: Warning: interface for module `unicode1' does not export
 unicode1.m:001:   anything.
 For more information, recompile with `-E'.
Index: tests/invalid/unicode2.err_exp
===================================================================
RCS file: /home/mercury1/repository/tests/invalid/unicode2.err_exp,v
retrieving revision 1.1
diff -u -u -r1.1 unicode2.err_exp
--- tests/invalid/unicode2.err_exp	3 Aug 2006 06:28:08 -0000	1.1
+++ tests/invalid/unicode2.err_exp	12 Mar 2007 01:45:05 -0000
@@ -1,5 +1,5 @@
 unicode2.m:005: Syntax error: invalid Unicode character code.
-unicode2.m:005: Syntax error: unterminated quote.
+unicode2.m:005: Syntax error at token 'int': operator or `.' expected.
 unicode2.m:001: Warning: interface for module `unicode2' does not export
 unicode2.m:001:   anything.
 For more information, recompile with `-E'.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the reviews mailing list