[m-rev.] for review: escape all control characters in io.write, deconstruct.functor
Julien Fischer
jfischer at opturion.com
Fri Jun 15 18:02:56 AEST 2018
For review by anyone.
I'll update the NEWS file separately.
----------------------------------------------------
Escape all control characters in io.write, deconstruct.functor etc.
The above predicates currently escape all of the C0 control characters (plus
Delete). This change modifies them to escape all of the characters in the
Unicode category `Other, control' (Cc), using backslash escapes when they
exist and octal escapes otherwise.
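As a rough illustration of the rule described above, here is a minimal C sketch. All of the names in it (is_control, quote_char) are hypothetical and do not exist in the Mercury runtime; the escape table only covers the escapes that appear in this patch, and code points from 0xa0 upwards are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not Mercury runtime code. */

/* True for the code points in Unicode category Cc: C0, Delete and C1. */
int is_control(unsigned long c)
{
    return c <= 0x1f || (0x7f <= c && c <= 0x9f);
}

/*
** Quote a code point as a Mercury character literal.
** Returns a pointer to a static buffer (so it is not reentrant), or NULL
** for code points from 0xa0 upwards, which this sketch does not handle.
*/
const char *quote_char(unsigned long c)
{
    /* Characters with a dedicated backslash escape, and their letters. */
    static const char specials[] = "\a\b\f\n\r\t\v\\'";
    static const char letters[]  = "abfnrtv\\'";
    static char buf[16];
    const char *p;

    assert(c <= 0x10ffff);
    /* Exclude NUL: strchr would otherwise match the string terminator. */
    p = (c != 0 && c < 0x80) ? strchr(specials, (int) c) : NULL;
    if (p != NULL) {
        /* Backslash escape exists: use it. */
        snprintf(buf, sizeof buf, "'\\%c'", letters[p - specials]);
    } else if (is_control(c)) {
        /* Remaining control characters: three-digit octal escape. */
        snprintf(buf, sizeof buf, "'\\%03lo\\'", c);
    } else if (c < 0x7f) {
        /* Printable ASCII is written as is. */
        snprintf(buf, sizeof buf, "'%c'", (char) c);
    } else {
        return NULL;
    }
    return buf;
}
```

With these definitions, quote_char(0x9f) yields the text '\237\' and quote_char('\n') yields '\n', which matches the forms in the expected test output.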
library/term_io.m:
Do not treat C1 control characters as Mercury source characters.
Re-order the list of Mercury punctuation characters by codepoint
order; it is difficult to check for completeness otherwise.
Put the list of special character escapes in order.
runtime/mercury_ml_expand_body.h:
library/rtti_implementation.m:
Update the implementations of functor/4 to escape all control
characters when returning the functor of a character.
library/deconstruct.m:
Specify that functor/4 should escape all control characters in
the value returned for characters and strings. (XXX TODO: it
currently doesn't implement the new behaviour for strings; I'll
add that separately.)
library/io.m:
library/stream.string_writer.m:
Similar to the above but for io.write etc.
tests/hard_coded/write.{m,exp}:
tests/hard_coded/deconstruct_arg.{m,exp,exp2}:
Extend these tests to cover the block of C1 control characters
and the boundaries around it.
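For reviewers who want to sanity-check the boundaries, the updated source character test in term_io.m can be sketched in C roughly as follows. The function names are hypothetical, and the sketch assumes that char.is_alnum/1 accepts only ASCII letters and digits; the punctuation ranges are the ones given in the new comments in the patch.

```c
#include <assert.h>

/* Hypothetical C rendering of is_mercury_source_char/1 after this change. */

/* Assumption: char.is_alnum/1 is true only for ASCII letters and digits. */
static int is_ascii_alnum(unsigned long c)
{
    return ('0' <= c && c <= '9')
        || ('A' <= c && c <= 'Z')
        || ('a' <= c && c <= 'z');
}

/* The four codepoint ranges listed for is_mercury_punctuation_char/1. */
static int is_punctuation(unsigned long c)
{
    return (0x20 <= c && c <= 0x2f)
        || (0x3a <= c && c <= 0x40)
        || (0x5b <= c && c <= 0x60)
        || (0x7b <= c && c <= 0x7e);
}

int is_source_char(unsigned long c)
{
    /* 0x7f - 0x9f are control characters, so the cut-off is 0xa0. */
    return is_ascii_alnum(c) || is_punctuation(c) || c >= 0xa0;
}

/* Boundary cases corresponding to the ones the extended tests exercise. */
void check_boundaries(void)
{
    assert(is_source_char('~'));     /* last printable ASCII character */
    assert(!is_source_char(0x7f));   /* Delete */
    assert(!is_source_char(0x80));   /* first C1 control */
    assert(!is_source_char(0x9f));   /* last C1 control */
    assert(is_source_char(0xa0));    /* no-break space */
}
```

Calling check_boundaries() runs the five boundary assertions; under the stated assumption the ASCII part of the test reduces to the single range 0x20 to 0x7e.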
Julien.
diff --git a/library/deconstruct.m b/library/deconstruct.m
index b957cf6..0bfb506 100644
--- a/library/deconstruct.m
+++ b/library/deconstruct.m
@@ -74,11 +74,15 @@
% handled as if it had standard equality.
% - for integers, the string is a base 10 number;
% positive integers have no sign.
- % - for finite floats, the string is a floating point, base 10 number;
- % positive floating point numbers have no sign.
- % - for infinite floats, the string "infinity" or "-infinity";
- % - for strings, the string, inside double quotation marks
- % - for characters, the character inside single quotation marks
+ % - for finite floats, the string is a base 10 floating point number;
+ % positive floating point numbers have no sign;
+ % for infinite floats, the string "infinity" or "-infinity".
+ % - for strings, the string, inside double quotation marks using
+ % backslash escapes if necessary and backslash or octal escapes for
+ % all characters for which char.is_control/1 is true.
+ % - for characters, the character inside single quotation marks using
+ % a backslash escape if necessary and a backslash or octal escape
+ % for all characters for which char.is_control/1 is true.
% - for predicates, the string <<predicate>>, and for functions,
% the string <<function>>, except with include_details_cc,
% in which case it will be the predicate or function name.
diff --git a/library/io.m b/library/io.m
index 40baad0..5c3e6a7 100644
--- a/library/io.m
+++ b/library/io.m
@@ -430,14 +430,16 @@
% be valid Mercury syntax whenever possible.
%
% Strings and characters are always printed out in quotes, using backslash
- % escapes if necessary. For higher-order types, or for types defined using
- % the foreign language interface (pragma foreign_type), the text output
- % will only describe the type that is being printed, not the value, and the
- % result may not be parsable by `read'. For the types containing
- % existential quantifiers, the type `type_desc' and closure types, the
- % result may not be parsable by `read', either. But in all other cases
- % the format used is standard Mercury syntax, and if you append a period
- % and newline (".\n"), then the results can be read in again using `read'.
+ % escapes if necessary and backslash or octal escapes for all characters
+ % for which char.is_control/1 is true. For higher-order types, or for types
+ % defined using the foreign language interface (pragma foreign_type), the
+ % text output will only describe the type that is being printed, not the
+ % value, and the result may not be parsable by `read'. For the types
+ % containing existential quantifiers, the type `type_desc' and closure
+ % types, the result may not be parsable by `read', either. But in all other
+ % cases the format used is standard Mercury syntax, and if you append a
+ % period and newline (".\n"), then the results can be read in again using
+ % `read'.
%
% write/5 is the same as write/4 except that it allows the caller
% to specify how non-canonical types should be handled. write_cc/3
diff --git a/library/rtti_implementation.m b/library/rtti_implementation.m
index 9ed3b82..45967aa 100644
--- a/library/rtti_implementation.m
+++ b/library/rtti_implementation.m
@@ -2819,11 +2819,9 @@ deconstruct_2(Term, TypeInfo, TypeCtorInfo, TypeCtorRep, NonCanon,
( if quote_special_escape_char(Char, EscapedChar) then
Functor = EscapedChar
else if
- Int = char.to_int(Char),
- ( 0x0 =< Int, Int =< 0x1f
- ; Int = 0x7f
- )
+ char.is_control(Char)
then
+ char.to_int(Char, Int),
string.int_to_base_string(Int, 8, OctalString0),
string.pad_left(OctalString0, '0', 3, OctalString),
Functor = "'\\" ++ OctalString ++ "\\'"
diff --git a/library/stream.string_writer.m b/library/stream.string_writer.m
index c276831..d2df2c7 100644
--- a/library/stream.string_writer.m
+++ b/library/stream.string_writer.m
@@ -124,14 +124,16 @@
% valid Mercury syntax whenever possible.
%
% Strings and characters are always printed out in quotes, using backslash
- % escapes if necessary. For higher-order types, or for types defined using
- % the foreign language interface (pragma foreign_type), the text output
- % will only describe the type that is being printed, not the value, and the
- % result may not be parsable by `read'. For the types containing
- % existential quantifiers, the type `type_desc' and closure types, the
- % result may not be parsable by `read', either. But in all other cases the
- % format used is standard Mercury syntax, and if you append a period and
- % newline (".\n"), then the results can be read in again using `read'.
+ % escapes if necessary and backslash or octal escapes for all characters
+ % for which char.is_control/1 is true. For higher-order types, or for types
+ % defined using the foreign language interface (pragma foreign_type), the
+ % text output will only describe the type that is being printed, not the
+ % value, and the result may not be parsable by `read'. For the types
+ % containing existential quantifiers, the type `type_desc' and closure
+ % types, the result may not be parsable by `read', either. But in all
+ % other cases the format used is standard Mercury syntax, and if you append
+ % a period and newline (".\n"), then the results can be read in again using
+ % `read'.
%
% write/5 is the same as write/4 except that it allows the caller to
% specify how non-canonical types should be handled. write_cc/4 is the
diff --git a/library/term_io.m b/library/term_io.m
index eeaef88..2f4e627 100644
--- a/library/term_io.m
+++ b/library/term_io.m
@@ -785,7 +785,7 @@ string_is_escaped_char(Char::out, String::in) :-
is_mercury_source_char(Char) :-
( char.is_alnum(Char)
; is_mercury_punctuation_char(Char)
- ; char.to_int(Char) >= 0x80
+ ; char.to_int(Char) >= 0xA0 % 0x7f - 0x9f are control characters.
).
%---------------------------------------------------------------------------%
@@ -942,39 +942,43 @@ mercury_escape_char(Char) = EscapeCode :-
% Note: the code here is similar to code in runtime/mercury_trace_base.c;
% any changes here may require similar changes there.
+% Codepoints: 0x20 -> 0x2f.
is_mercury_punctuation_char(' ').
is_mercury_punctuation_char('!').
-is_mercury_punctuation_char('@').
+is_mercury_punctuation_char('"').
is_mercury_punctuation_char('#').
is_mercury_punctuation_char('$').
is_mercury_punctuation_char('%').
-is_mercury_punctuation_char('^').
is_mercury_punctuation_char('&').
-is_mercury_punctuation_char('*').
+is_mercury_punctuation_char('''').
is_mercury_punctuation_char('(').
is_mercury_punctuation_char(')').
-is_mercury_punctuation_char('-').
-is_mercury_punctuation_char('_').
+is_mercury_punctuation_char('*').
is_mercury_punctuation_char('+').
-is_mercury_punctuation_char('=').
-is_mercury_punctuation_char('`').
-is_mercury_punctuation_char('~').
-is_mercury_punctuation_char('{').
-is_mercury_punctuation_char('}').
-is_mercury_punctuation_char('[').
-is_mercury_punctuation_char(']').
-is_mercury_punctuation_char(';').
+is_mercury_punctuation_char(',').
+is_mercury_punctuation_char('-').
+is_mercury_punctuation_char('.').
+is_mercury_punctuation_char('/').
+% Codepoints: 0x3a -> 0x40.
is_mercury_punctuation_char(':').
-is_mercury_punctuation_char('''').
-is_mercury_punctuation_char('"').
+is_mercury_punctuation_char(';').
is_mercury_punctuation_char('<').
+is_mercury_punctuation_char('=').
is_mercury_punctuation_char('>').
-is_mercury_punctuation_char('.').
-is_mercury_punctuation_char(',').
-is_mercury_punctuation_char('/').
is_mercury_punctuation_char('?').
+is_mercury_punctuation_char('@').
+% Codepoints: 0x5b -> 0x60.
+is_mercury_punctuation_char('[').
is_mercury_punctuation_char('\\').
+is_mercury_punctuation_char(']').
+is_mercury_punctuation_char('^').
+is_mercury_punctuation_char('_').
+is_mercury_punctuation_char('`').
+% Codepoints: 0x7b -> 0x7e.
+is_mercury_punctuation_char('{').
is_mercury_punctuation_char('|').
+is_mercury_punctuation_char('}').
+is_mercury_punctuation_char('~').
%---------------------------------------------------------------------------%
@@ -1012,10 +1016,10 @@ encode_escaped_char(Char::out, Str::in) :-
mercury_escape_special_char('\a', 'a').
mercury_escape_special_char('\b', 'b').
-mercury_escape_special_char('\r', 'r').
mercury_escape_special_char('\f', 'f').
-mercury_escape_special_char('\t', 't').
mercury_escape_special_char('\n', 'n').
+mercury_escape_special_char('\r', 'r').
+mercury_escape_special_char('\t', 't').
mercury_escape_special_char('\v', 'v').
mercury_escape_special_char('\\', '\\').
mercury_escape_special_char('''', '''').
diff --git a/runtime/mercury_ml_expand_body.h b/runtime/mercury_ml_expand_body.h
index 0db8252..8ec6089 100644
--- a/runtime/mercury_ml_expand_body.h
+++ b/runtime/mercury_ml_expand_body.h
@@ -893,10 +893,15 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
case '\n': str_ptr = "'\\n'"; break;
case '\v': str_ptr = "'\\v'"; break;
default:
- // Print C0 control characters and Delete in
- // octal.
- if (data_word <= 0x1f || data_word == 0x7f) {
- sprintf(buf, "\'\\%03o\\\'", data_word);
+ // Print remaining control characters using octal
+ // escapes.
+ if (
+ (0x00 <= data_word && data_word <= 0x1f) ||
+ (0x7f <= data_word && data_word <= 0x9f)
+ ) {
+ sprintf(buf,
+ "\'\\%03" MR_INTEGER_LENGTH_MODIFIER "o\\\'",
+ data_word);
} else if (MR_is_ascii(data_word)) {
sprintf(buf, "\'%c\'", (char) data_word);
} else if (MR_is_surrogate(data_word)) {
diff --git a/tests/hard_coded/deconstruct_arg.exp b/tests/hard_coded/deconstruct_arg.exp
index 49f7a28..7f4ef4f 100644
--- a/tests/hard_coded/deconstruct_arg.exp
+++ b/tests/hard_coded/deconstruct_arg.exp
@@ -264,6 +264,20 @@ deconstruct deconstruct: functor '\'' arity 0
deconstruct limited deconstruct 3 of '\''
functor '\'' arity 0 []
+deconstruct functor: '~'/0
+deconstruct argument 0 of '~' doesn't exist
+deconstruct argument 1 of '~' doesn't exist
+deconstruct argument 2 of '~' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '~' arity 0
+[]
+deconstruct limited deconstruct 3 of '~'
+functor '~' arity 0 []
+
deconstruct functor: '\001\'/0
deconstruct argument 0 of '\001\' doesn't exist
deconstruct argument 1 of '\001\' doesn't exist
@@ -306,6 +320,48 @@ deconstruct deconstruct: functor '\177\' arity 0
deconstruct limited deconstruct 3 of '\177\'
functor '\177\' arity 0 []
+deconstruct functor: '\200\'/0
+deconstruct argument 0 of '\200\' doesn't exist
+deconstruct argument 1 of '\200\' doesn't exist
+deconstruct argument 2 of '\200\' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '\200\' arity 0
+[]
+deconstruct limited deconstruct 3 of '\200\'
+functor '\200\' arity 0 []
+
+deconstruct functor: '\237\'/0
+deconstruct argument 0 of '\237\' doesn't exist
+deconstruct argument 1 of '\237\' doesn't exist
+deconstruct argument 2 of '\237\' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '\237\' arity 0
+[]
+deconstruct limited deconstruct 3 of '\237\'
+functor '\237\' arity 0 []
+
+deconstruct functor: ' '/0
+deconstruct argument 0 of ' ' doesn't exist
+deconstruct argument 1 of ' ' doesn't exist
+deconstruct argument 2 of ' ' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor ' ' arity 0
+[]
+deconstruct limited deconstruct 3 of ' '
+functor ' ' arity 0 []
+
deconstruct functor: 'Ω'/0
deconstruct argument 0 of 'Ω' doesn't exist
deconstruct argument 1 of 'Ω' doesn't exist
@@ -544,7 +600,7 @@ deconstruct deconstruct: functor newline arity 0
deconstruct limited deconstruct 3 of '<<predicate>>'
functor newline arity 0 []
-deconstruct functor: lambda_deconstruct_arg_m_176/1
+deconstruct functor: lambda_deconstruct_arg_m_182/1
deconstruct argument 0 of '<<predicate>>' is [1, 2]
deconstruct argument 1 of '<<predicate>>' doesn't exist
deconstruct argument 2 of '<<predicate>>' doesn't exist
@@ -553,10 +609,10 @@ deconstruct argument 'mooo!' doesn't exist
deconstruct argument 'packed1' doesn't exist
deconstruct argument 'packed2' doesn't exist
deconstruct argument 'packed3' doesn't exist
-deconstruct deconstruct: functor lambda_deconstruct_arg_m_176 arity 1
+deconstruct deconstruct: functor lambda_deconstruct_arg_m_182 arity 1
[[1, 2]]
deconstruct limited deconstruct 3 of '<<predicate>>'
-functor lambda_deconstruct_arg_m_176 arity 1 [[1, 2]]
+functor lambda_deconstruct_arg_m_182 arity 1 [[1, 2]]
deconstruct functor: p/3
deconstruct argument 0 of '<<predicate>>' is 1
diff --git a/tests/hard_coded/deconstruct_arg.exp2 b/tests/hard_coded/deconstruct_arg.exp2
index 349ed1c..bc508fa 100644
--- a/tests/hard_coded/deconstruct_arg.exp2
+++ b/tests/hard_coded/deconstruct_arg.exp2
@@ -264,6 +264,20 @@ deconstruct deconstruct: functor '\'' arity 0
deconstruct limited deconstruct 3 of '\''
functor '\'' arity 0 []
+deconstruct functor: '~'/0
+deconstruct argument 0 of '~' doesn't exist
+deconstruct argument 1 of '~' doesn't exist
+deconstruct argument 2 of '~' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '~' arity 0
+[]
+deconstruct limited deconstruct 3 of '~'
+functor '~' arity 0 []
+
deconstruct functor: '\001\'/0
deconstruct argument 0 of '\001\' doesn't exist
deconstruct argument 1 of '\001\' doesn't exist
@@ -306,6 +320,48 @@ deconstruct deconstruct: functor '\177\' arity 0
deconstruct limited deconstruct 3 of '\177\'
functor '\177\' arity 0 []
+deconstruct functor: '\200\'/0
+deconstruct argument 0 of '\200\' doesn't exist
+deconstruct argument 1 of '\200\' doesn't exist
+deconstruct argument 2 of '\200\' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '\200\' arity 0
+[]
+deconstruct limited deconstruct 3 of '\200\'
+functor '\200\' arity 0 []
+
+deconstruct functor: '\237\'/0
+deconstruct argument 0 of '\237\' doesn't exist
+deconstruct argument 1 of '\237\' doesn't exist
+deconstruct argument 2 of '\237\' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor '\237\' arity 0
+[]
+deconstruct limited deconstruct 3 of '\237\'
+functor '\237\' arity 0 []
+
+deconstruct functor: ' '/0
+deconstruct argument 0 of ' ' doesn't exist
+deconstruct argument 1 of ' ' doesn't exist
+deconstruct argument 2 of ' ' doesn't exist
+deconstruct argument 'moo' doesn't exist
+deconstruct argument 'mooo!' doesn't exist
+deconstruct argument 'packed1' doesn't exist
+deconstruct argument 'packed2' doesn't exist
+deconstruct argument 'packed3' doesn't exist
+deconstruct deconstruct: functor ' ' arity 0
+[]
+deconstruct limited deconstruct 3 of ' '
+functor ' ' arity 0 []
+
deconstruct functor: 'Ω'/0
deconstruct argument 0 of 'Ω' doesn't exist
deconstruct argument 1 of 'Ω' doesn't exist
diff --git a/tests/hard_coded/deconstruct_arg.m b/tests/hard_coded/deconstruct_arg.m
index 88dddd4..4e7f89f 100644
--- a/tests/hard_coded/deconstruct_arg.m
+++ b/tests/hard_coded/deconstruct_arg.m
@@ -130,11 +130,17 @@ main(!IO) :-
test_all('\v', !IO),
test_all('\\', !IO),
test_all('\'', !IO),
+ test_all('~', !IO),
% test C0 control characters
- test_all('\1\', !IO),
- test_all('\37\', !IO),
+ test_all('\001\', !IO),
+ test_all('\037\', !IO),
test_all('\177\', !IO),
+ % test C1 control characters
+ test_all('\200\', !IO),
+ test_all('\237\', !IO),
+ % No-break space (next codepoint after C1 control characters)
+ test_all('\240\', !IO),
% test a character that requires more than one byte in its
% UTF-8 encoding.
diff --git a/tests/hard_coded/write.exp b/tests/hard_coded/write.exp
index f9e16ef..91faf21 100644
--- a/tests/hard_coded/write.exp
+++ b/tests/hard_coded/write.exp
@@ -29,8 +29,11 @@ TESTING BUILTINS
"Foo%sFoo"
"\""
"\a\b\f\t\n\r\v\"\\"
+"\001\\037\\177\\200\\237\ "
'a'
+'A'
'&'
+'\001\'
'\a'
'\b'
'\f'
@@ -38,9 +41,17 @@ TESTING BUILTINS
'\n'
'\r'
'\v'
+'\037\'
+' '
'\''
'\\'
'\"'
+'~'
+'\177\'
+'\200\'
+'\237\'
+' '
+0.0
3.14159
1.128324983e-21
2.23954899e+23
diff --git a/tests/hard_coded/write.m b/tests/hard_coded/write.m
index 700f5ee..168e50c 100644
--- a/tests/hard_coded/write.m
+++ b/tests/hard_coded/write.m
@@ -12,6 +12,7 @@
:- implementation.
:- import_module array.
+:- import_module char.
:- import_module float.
:- import_module int.
:- import_module list.
@@ -127,10 +128,14 @@ test_builtins(!IO) :-
io.write_line("Foo%sFoo", !IO),
io.write_line("""", !IO), % interesting - prints """ of course
io.write_line("\a\b\f\t\n\r\v\"\\", !IO),
+ io.write_line("\001\\037\\177\\200\\237\\240\", !IO),
% Test characters.
io.write_line('a', !IO),
+ io.write_line('A', !IO),
io.write_line('&', !IO),
+
+ io.write_line('\001\', !IO), % Second C0 control.
io.write_line('\a', !IO),
io.write_line('\b', !IO),
io.write_line('\f', !IO),
@@ -138,11 +143,21 @@ test_builtins(!IO) :-
io.write_line('\n', !IO),
io.write_line('\r', !IO),
io.write_line('\v', !IO),
+ io.write_line('\037\', !IO), % Last C0 control.
+ io.write_line(' ', !IO),
+
io.write_line('\'', !IO),
io.write_line(('\\') : character, !IO),
io.write_line('\"', !IO),
+ io.write_line('~', !IO),
+ io.write_line('\177\', !IO), % Delete.
+ io.write_line('\200\', !IO), % First C1 control.
+ io.write_line('\237\', !IO), % Last C1 control.
+ io.write_line('\240\', !IO), % No-break space.
+
% Test floats.
+ io.write_line(0.0, !IO),
io.write_line(3.14159, !IO),
io.write_line(11.28324983E-22, !IO),
io.write_line(22.3954899E22, !IO),