[m-rev.] for review: escape all control characters in io.write, deconstruct.functor
Julien Fischer
jfischer at opturion.com
Tue Jun 19 13:56:18 AEST 2018
Hi,
Is anybody intending to review this one?
Julien.
On Fri, 15 Jun 2018, Julien Fischer wrote:
>
> For review by anyone.
>
> I'll update the NEWS file separately.
>
> ----------------------------------------------------
>
> Escape all control characters in io.write, deconstruct.functor etc.
>
> The above predicates currently escape all of the C0 control characters (+
> Delete). This change modifies them to escape all of the characters in the
> Unicode category `Other,control' using backslash escapes when they exist and
> octal escapes otherwise.
>
> library/term_io.m:
> Do not treat C1 control characters as Mercury source characters.
>
> Re-order the list of Mercury punctuation characters by codepoint
> order; it is difficult to check for completeness otherwise.
>
> Put the list of special character escapes in order.
>
> runtime/mercury_ml_expand_body.h:
> library/rtti_implementation.m:
> Update the implementations of functor/4 to escape all control
> characters when returning the functor of a character.
>
> library/deconstruct.m:
> Specify that functor/4 should escape all control characters in
> the value returned for characters and strings. (XXX TODO: it
> currently doesn't implement the new behaviour for strings; I'll
> add that separately.)
>
> library/io.m:
> library/stream.string_writer.m:
> Similar to the above but for io.write etc.
>
> tests/hard_coded/write.{m,exp}:
> tests/hard_coded/deconstruct_arg.{m,exp,exp2}:
> Extend these tests to cover the block of C1 control characters
> and the boundaries around it.
>
> Julien.
>
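The escaping rule in the log message above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in the patch: `is_control_codepoint` and `escape_control_char` are hypothetical names, and the set of backslash escapes is taken from the control-character entries of `mercury_escape_special_char` in the diff below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true for the C0 controls, Delete and the C1
 * controls, i.e. the codepoints in the Unicode category Cc. */
static int is_control_codepoint(unsigned long cp)
{
    return cp <= 0x1f || (0x7f <= cp && cp <= 0x9f);
}

/* Hypothetical helper: writes the quoted, escaped form of a control
 * character into buf and returns 1. For any other codepoint it leaves
 * buf empty and returns 0, so the caller can print the character as is. */
static int escape_control_char(unsigned long cp, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* Prefer a backslash escape when one exists. */
    switch (cp) {
        case '\a': snprintf(buf, size, "'\\a'"); return 1;
        case '\b': snprintf(buf, size, "'\\b'"); return 1;
        case '\f': snprintf(buf, size, "'\\f'"); return 1;
        case '\n': snprintf(buf, size, "'\\n'"); return 1;
        case '\r': snprintf(buf, size, "'\\r'"); return 1;
        case '\t': snprintf(buf, size, "'\\t'"); return 1;
        case '\v': snprintf(buf, size, "'\\v'"); return 1;
        default: break;
    }

    /* Otherwise use a Mercury octal escape: three octal digits
     * followed by a closing backslash. */
    if (is_control_codepoint(cp)) {
        snprintf(buf, size, "'\\%03lo\\'", cp);
        return 1;
    }

    buf[0] = '\0';
    return 0;
}
```

Under these assumptions, 0x9f comes out as `'\237\'`, which matches the expected output added to the tests, while 0xa0 (no-break space) is not escaped.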
> diff --git a/library/deconstruct.m b/library/deconstruct.m
> index b957cf6..0bfb506 100644
> --- a/library/deconstruct.m
> +++ b/library/deconstruct.m
> @@ -74,11 +74,15 @@
> % handled as if it had standard equality.
> % - for integers, the string is a base 10 number;
> % positive integers have no sign.
> - % - for finite floats, the string is a floating point, base 10 number;
> - % positive floating point numbers have no sign.
> - % - for infinite floats, the string "infinity" or "-infinity";
> - % - for strings, the string, inside double quotation marks
> - % - for characters, the character inside single quotation marks
> + % - for finite floats, the string is a base 10 floating point number;
> + % positive floating point numbers have no sign;
> + % for infinite floats, the string "infinity" or "-infinity".
> + % - for strings, the string, inside double quotation marks using
> + % backslash escapes if necessary and backslash or octal escapes for
> + % all characters for which char.is_control/1 is true.
> + % - for characters, the character inside single quotation marks using
> + % a backslash escape if necessary and a backslash or octal escape for
> + % all characters for which char.is_control/1 is true.
> % - for predicates, the string <<predicate>>, and for functions,
> % the string <<function>>, except with include_details_cc,
> % in which case it will be the predicate or function name.
> diff --git a/library/io.m b/library/io.m
> index 40baad0..5c3e6a7 100644
> --- a/library/io.m
> +++ b/library/io.m
> @@ -430,14 +430,16 @@
> % be valid Mercury syntax whenever possible.
> %
> % Strings and characters are always printed out in quotes, using backslash
> - % escapes if necessary. For higher-order types, or for types defined using
> - % the foreign language interface (pragma foreign_type), the text output
> - % will only describe the type that is being printed, not the value, and the
> - % result may not be parsable by `read'. For the types containing
> - % existential quantifiers, the type `type_desc' and closure types, the
> - % result may not be parsable by `read', either. But in all other cases
> - % the format used is standard Mercury syntax, and if you append a period
> - % and newline (".\n"), then the results can be read in again using `read'.
> + % escapes if necessary and backslash or octal escapes for all characters
> + % for which char.is_control/1 is true. For higher-order types, or for types
> + % defined using the foreign language interface (pragma foreign_type), the
> + % text output will only describe the type that is being printed, not the
> + % value, and the result may not be parsable by `read'. For the types
> + % containing existential quantifiers, the type `type_desc' and closure
> + % types, the result may not be parsable by `read', either. But in all other
> + % cases the format used is standard Mercury syntax, and if you append a
> + % period and newline (".\n"), then the results can be read in again using
> + % `read'.
> %
> % write/5 is the same as write/4 except that it allows the caller
> % to specify how non-canonical types should be handled. write_cc/3
> diff --git a/library/rtti_implementation.m b/library/rtti_implementation.m
> index 9ed3b82..45967aa 100644
> --- a/library/rtti_implementation.m
> +++ b/library/rtti_implementation.m
> @@ -2819,11 +2819,9 @@ deconstruct_2(Term, TypeInfo, TypeCtorInfo, TypeCtorRep, NonCanon,
> ( if quote_special_escape_char(Char, EscapedChar) then
> Functor = EscapedChar
> else if
> - Int = char.to_int(Char),
> - ( 0x0 =< Int, Int =< 0x1f
> - ; Int = 0x7f
> - )
> + char.is_control(Char)
> then
> + char.to_int(Char, Int),
> string.int_to_base_string(Int, 8, OctalString0),
> string.pad_left(OctalString0, '0', 3, OctalString),
> Functor = "'\\" ++ OctalString ++ "\\'"
> diff --git a/library/stream.string_writer.m b/library/stream.string_writer.m
> index c276831..d2df2c7 100644
> --- a/library/stream.string_writer.m
> +++ b/library/stream.string_writer.m
> @@ -124,14 +124,16 @@
> % valid Mercury syntax whenever possible.
> %
> % Strings and characters are always printed out in quotes, using backslash
> - % escapes if necessary. For higher-order types, or for types defined using
> - % the foreign language interface (pragma foreign_type), the text output
> - % will only describe the type that is being printed, not the value, and the
> - % result may not be parsable by `read'. For the types containing
> - % existential quantifiers, the type `type_desc' and closure types, the
> - % result may not be parsable by `read', either. But in all other cases the
> - % format used is standard Mercury syntax, and if you append a period and
> - % newline (".\n"), then the results can be read in again using `read'.
> + % escapes if necessary and backslash or octal escapes for all characters
> + % for which char.is_control/1 is true. For higher-order types, or for types
> + % defined using the foreign language interface (pragma foreign_type), the
> + % text output will only describe the type that is being printed, not the
> + % value, and the result may not be parsable by `read'. For the types
> + % containing existential quantifiers, the type `type_desc' and closure
> + % types, the result may not be parsable by `read', either. But in all
> + % other cases the format used is standard Mercury syntax, and if you append
> + % a period and newline (".\n"), then the results can be read in again using
> + % `read'.
> %
> % write/5 is the same as write/4 except that it allows the caller to
> % specify how non-canonical types should be handled. write_cc/4 is the
> diff --git a/library/term_io.m b/library/term_io.m
> index eeaef88..2f4e627 100644
> --- a/library/term_io.m
> +++ b/library/term_io.m
> @@ -785,7 +785,7 @@ string_is_escaped_char(Char::out, String::in) :-
> is_mercury_source_char(Char) :-
> ( char.is_alnum(Char)
> ; is_mercury_punctuation_char(Char)
> - ; char.to_int(Char) >= 0x80
> + ; char.to_int(Char) >= 0xA0 % 0x7f - 0x9f are control characters.
> ) .
>
> % ---------------------------------------------------------------------------%
> @@ -942,39 +942,43 @@ mercury_escape_char(Char) = EscapeCode :-
> % Note: the code here is similar to code in
> % runtime/mercury_trace_base.c;
> % any changes here may require similar changes there.
>
> +% Codepoints: 0x20 -> 0x2f.
> is_mercury_punctuation_char(' ').
> is_mercury_punctuation_char('!').
> -is_mercury_punctuation_char('@').
> +is_mercury_punctuation_char('"').
> is_mercury_punctuation_char('#').
> is_mercury_punctuation_char('$').
> is_mercury_punctuation_char('%').
> -is_mercury_punctuation_char('^').
> is_mercury_punctuation_char('&').
> -is_mercury_punctuation_char('*').
> +is_mercury_punctuation_char('''').
> is_mercury_punctuation_char('(').
> is_mercury_punctuation_char(')').
> -is_mercury_punctuation_char('-').
> -is_mercury_punctuation_char('_').
> +is_mercury_punctuation_char('*').
> is_mercury_punctuation_char('+').
> -is_mercury_punctuation_char('=').
> -is_mercury_punctuation_char('`').
> -is_mercury_punctuation_char('~').
> -is_mercury_punctuation_char('{').
> -is_mercury_punctuation_char('}').
> -is_mercury_punctuation_char('[').
> -is_mercury_punctuation_char(']').
> -is_mercury_punctuation_char(';').
> +is_mercury_punctuation_char(',').
> +is_mercury_punctuation_char('-').
> +is_mercury_punctuation_char('.').
> +is_mercury_punctuation_char('/').
> +% Codepoints: 0x3a -> 0x40.
> is_mercury_punctuation_char(':').
> -is_mercury_punctuation_char('''').
> -is_mercury_punctuation_char('"').
> +is_mercury_punctuation_char(';').
> is_mercury_punctuation_char('<').
> +is_mercury_punctuation_char('=').
> is_mercury_punctuation_char('>').
> -is_mercury_punctuation_char('.').
> -is_mercury_punctuation_char(',').
> -is_mercury_punctuation_char('/').
> is_mercury_punctuation_char('?').
> +is_mercury_punctuation_char('@').
> +% Codepoints: 0x5b -> 0x60.
> +is_mercury_punctuation_char('[').
> is_mercury_punctuation_char('\\').
> +is_mercury_punctuation_char(']').
> +is_mercury_punctuation_char('^').
> +is_mercury_punctuation_char('_').
> +is_mercury_punctuation_char('`').
> +% Codepoints: 0x7b -> 0x7e.
> +is_mercury_punctuation_char('{').
> is_mercury_punctuation_char('|').
> +is_mercury_punctuation_char('}').
> +is_mercury_punctuation_char('~').
>
> % ---------------------------------------------------------------------------%
>
> @@ -1012,10 +1016,10 @@ encode_escaped_char(Char::out, Str::in) :-
>
> mercury_escape_special_char('\a', 'a').
> mercury_escape_special_char('\b', 'b').
> -mercury_escape_special_char('\r', 'r').
> mercury_escape_special_char('\f', 'f').
> -mercury_escape_special_char('\t', 't').
> mercury_escape_special_char('\n', 'n').
> +mercury_escape_special_char('\r', 'r').
> +mercury_escape_special_char('\t', 't').
> mercury_escape_special_char('\v', 'v').
> mercury_escape_special_char('\\', '\\').
> mercury_escape_special_char('''', '''').
> diff --git a/runtime/mercury_ml_expand_body.h b/runtime/mercury_ml_expand_body.h
> index 0db8252..8ec6089 100644
> --- a/runtime/mercury_ml_expand_body.h
> +++ b/runtime/mercury_ml_expand_body.h
> @@ -893,10 +893,15 @@ EXPAND_FUNCTION_NAME(MR_TypeInfo type_info, MR_Word *data_word_ptr,
> case '\n': str_ptr = "'\\n'"; break;
> case '\v': str_ptr = "'\\v'"; break;
> default:
> - // Print C0 control characters and Delete in
> - // octal.
> - if (data_word <= 0x1f || data_word == 0x7f) {
> - sprintf(buf, "\'\\%03o\\\'", data_word);
> + // Print remaining control characters using octal
> + // escapes.
> + if (
> + (0x00 <= data_word && data_word <= 0x1f) ||
> + (0x7f <= data_word && data_word <= 0x9f)
> + ) {
> + sprintf(buf,
> + "\'\\%03" MR_INTEGER_LENGTH_MODIFIER "o\\\'",
> + data_word);
> } else if (MR_is_ascii(data_word)) {
> sprintf(buf, "\'%c\'", (char) data_word);
> } else if (MR_is_surrogate(data_word)) {
> diff --git a/tests/hard_coded/deconstruct_arg.exp b/tests/hard_coded/deconstruct_arg.exp
> index 49f7a28..7f4ef4f 100644
> --- a/tests/hard_coded/deconstruct_arg.exp
> +++ b/tests/hard_coded/deconstruct_arg.exp
> @@ -264,6 +264,20 @@ deconstruct deconstruct: functor '\'' arity 0
> deconstruct limited deconstruct 3 of '\''
> functor '\'' arity 0 []
>
> +deconstruct functor: '~'/0
> +deconstruct argument 0 of '~' doesn't exist
> +deconstruct argument 1 of '~' doesn't exist
> +deconstruct argument 2 of '~' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '~' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '~'
> +functor '~' arity 0 []
> +
> deconstruct functor: '\001\'/0
> deconstruct argument 0 of '\001\' doesn't exist
> deconstruct argument 1 of '\001\' doesn't exist
> @@ -306,6 +320,48 @@ deconstruct deconstruct: functor '\177\' arity 0
> deconstruct limited deconstruct 3 of '\177\'
> functor '\177\' arity 0 []
>
> +deconstruct functor: '\200\'/0
> +deconstruct argument 0 of '\200\' doesn't exist
> +deconstruct argument 1 of '\200\' doesn't exist
> +deconstruct argument 2 of '\200\' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '\200\' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '\200\'
> +functor '\200\' arity 0 []
> +
> +deconstruct functor: '\237\'/0
> +deconstruct argument 0 of '\237\' doesn't exist
> +deconstruct argument 1 of '\237\' doesn't exist
> +deconstruct argument 2 of '\237\' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '\237\' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '\237\'
> +functor '\237\' arity 0 []
> +
> +deconstruct functor: ' '/0
> +deconstruct argument 0 of ' ' doesn't exist
> +deconstruct argument 1 of ' ' doesn't exist
> +deconstruct argument 2 of ' ' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor ' ' arity 0
> +[]
> +deconstruct limited deconstruct 3 of ' '
> +functor ' ' arity 0 []
> +
> deconstruct functor: 'Ω'/0
> deconstruct argument 0 of 'Ω' doesn't exist
> deconstruct argument 1 of 'Ω' doesn't exist
> @@ -544,7 +600,7 @@ deconstruct deconstruct: functor newline arity 0
> deconstruct limited deconstruct 3 of '<<predicate>>'
> functor newline arity 0 []
>
> -deconstruct functor: lambda_deconstruct_arg_m_176/1
> +deconstruct functor: lambda_deconstruct_arg_m_182/1
> deconstruct argument 0 of '<<predicate>>' is [1, 2]
> deconstruct argument 1 of '<<predicate>>' doesn't exist
> deconstruct argument 2 of '<<predicate>>' doesn't exist
> @@ -553,10 +609,10 @@ deconstruct argument 'mooo!' doesn't exist
> deconstruct argument 'packed1' doesn't exist
> deconstruct argument 'packed2' doesn't exist
> deconstruct argument 'packed3' doesn't exist
> -deconstruct deconstruct: functor lambda_deconstruct_arg_m_176 arity 1
> +deconstruct deconstruct: functor lambda_deconstruct_arg_m_182 arity 1
> [[1, 2]]
> deconstruct limited deconstruct 3 of '<<predicate>>'
> -functor lambda_deconstruct_arg_m_176 arity 1 [[1, 2]]
> +functor lambda_deconstruct_arg_m_182 arity 1 [[1, 2]]
>
> deconstruct functor: p/3
> deconstruct argument 0 of '<<predicate>>' is 1
> diff --git a/tests/hard_coded/deconstruct_arg.exp2 b/tests/hard_coded/deconstruct_arg.exp2
> index 349ed1c..bc508fa 100644
> --- a/tests/hard_coded/deconstruct_arg.exp2
> +++ b/tests/hard_coded/deconstruct_arg.exp2
> @@ -264,6 +264,20 @@ deconstruct deconstruct: functor '\'' arity 0
> deconstruct limited deconstruct 3 of '\''
> functor '\'' arity 0 []
>
> +deconstruct functor: '~'/0
> +deconstruct argument 0 of '~' doesn't exist
> +deconstruct argument 1 of '~' doesn't exist
> +deconstruct argument 2 of '~' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '~' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '~'
> +functor '~' arity 0 []
> +
> deconstruct functor: '\001\'/0
> deconstruct argument 0 of '\001\' doesn't exist
> deconstruct argument 1 of '\001\' doesn't exist
> @@ -306,6 +320,48 @@ deconstruct deconstruct: functor '\177\' arity 0
> deconstruct limited deconstruct 3 of '\177\'
> functor '\177\' arity 0 []
>
> +deconstruct functor: '\200\'/0
> +deconstruct argument 0 of '\200\' doesn't exist
> +deconstruct argument 1 of '\200\' doesn't exist
> +deconstruct argument 2 of '\200\' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '\200\' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '\200\'
> +functor '\200\' arity 0 []
> +
> +deconstruct functor: '\237\'/0
> +deconstruct argument 0 of '\237\' doesn't exist
> +deconstruct argument 1 of '\237\' doesn't exist
> +deconstruct argument 2 of '\237\' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor '\237\' arity 0
> +[]
> +deconstruct limited deconstruct 3 of '\237\'
> +functor '\237\' arity 0 []
> +
> +deconstruct functor: ' '/0
> +deconstruct argument 0 of ' ' doesn't exist
> +deconstruct argument 1 of ' ' doesn't exist
> +deconstruct argument 2 of ' ' doesn't exist
> +deconstruct argument 'moo' doesn't exist
> +deconstruct argument 'mooo!' doesn't exist
> +deconstruct argument 'packed1' doesn't exist
> +deconstruct argument 'packed2' doesn't exist
> +deconstruct argument 'packed3' doesn't exist
> +deconstruct deconstruct: functor ' ' arity 0
> +[]
> +deconstruct limited deconstruct 3 of ' '
> +functor ' ' arity 0 []
> +
> deconstruct functor: 'Ω'/0
> deconstruct argument 0 of 'Ω' doesn't exist
> deconstruct argument 1 of 'Ω' doesn't exist
> diff --git a/tests/hard_coded/deconstruct_arg.m b/tests/hard_coded/deconstruct_arg.m
> index 88dddd4..4e7f89f 100644
> --- a/tests/hard_coded/deconstruct_arg.m
> +++ b/tests/hard_coded/deconstruct_arg.m
> @@ -130,11 +130,17 @@ main(!IO) :-
> test_all('\v', !IO),
> test_all('\\', !IO),
> test_all('\'', !IO),
> + test_all('~', !IO),
>
> % test C0 control characters
> - test_all('\1\', !IO),
> - test_all('\37\', !IO),
> + test_all('\001\', !IO),
> + test_all('\037\', !IO),
> test_all('\177\', !IO),
> + % test C1 control characters
> + test_all('\200\', !IO),
> + test_all('\237\', !IO),
> + % No-break space (next codepoint after C1 control characters)
> + test_all('\240\', !IO),
>
> % test a character that requires more than one byte in its
> % UTF-8 encoding.
> diff --git a/tests/hard_coded/write.exp b/tests/hard_coded/write.exp
> index f9e16ef..91faf21 100644
> --- a/tests/hard_coded/write.exp
> +++ b/tests/hard_coded/write.exp
> @@ -29,8 +29,11 @@ TESTING BUILTINS
> "Foo%sFoo"
> "\""
> "\a\b\f\t\n\r\v\"\\"
> +"\001\\037\\177\\200\\237\ "
> 'a'
> +'A'
> '&'
> +'\001\'
> '\a'
> '\b'
> '\f'
> @@ -38,9 +41,17 @@ TESTING BUILTINS
> '\n'
> '\r'
> '\v'
> +'\037\'
> +' '
> '\''
> '\\'
> '\"'
> +'~'
> +'\177\'
> +'\200\'
> +'\237\'
> +' '
> +0.0
> 3.14159
> 1.128324983e-21
> 2.23954899e+23
> diff --git a/tests/hard_coded/write.m b/tests/hard_coded/write.m
> index 700f5ee..168e50c 100644
> --- a/tests/hard_coded/write.m
> +++ b/tests/hard_coded/write.m
> @@ -12,6 +12,7 @@
> :- implementation.
>
> :- import_module array.
> +:- import_module char.
> :- import_module float.
> :- import_module int.
> :- import_module list.
> @@ -127,10 +128,14 @@ test_builtins(!IO) :-
> io.write_line("Foo%sFoo", !IO),
> io.write_line("""", !IO), % interesting - prints """ of course
> io.write_line("\a\b\f\t\n\r\v\"\\", !IO),
> + io.write_line("\001\\037\\177\\200\\237\\240\", !IO),
>
> % Test characters.
> io.write_line('a', !IO),
> + io.write_line('A', !IO),
> io.write_line('&', !IO),
> +
> + io.write_line('\001\', !IO), % Second C0 control.
> io.write_line('\a', !IO),
> io.write_line('\b', !IO),
> io.write_line('\f', !IO),
> @@ -138,11 +143,21 @@ test_builtins(!IO) :-
> io.write_line('\n', !IO),
> io.write_line('\r', !IO),
> io.write_line('\v', !IO),
> + io.write_line('\037\', !IO), % Last C0 control.
> + io.write_line(' ', !IO),
> +
> io.write_line('\'', !IO),
> io.write_line(('\\') : character, !IO),
> io.write_line('\"', !IO),
>
> + io.write_line('~', !IO),
> + io.write_line('\177\', !IO), % Delete.
> + io.write_line('\200\', !IO), % First C1 control.
> + io.write_line('\237\', !IO), % Last C1 control.
> + io.write_line('\240\', !IO), % No-break space.
> +
> % Test floats.
> + io.write_line(0.0, !IO),
> io.write_line(3.14159, !IO),
> io.write_line(11.28324983E-22, !IO),
> io.write_line(22.3954899E22, !IO),
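
The log message says the punctuation list in term_io.m was reordered by codepoint so that it can be checked for completeness. One way to do that check mechanically is sketched below in C, assuming the table is meant to cover every printable, non-alphanumeric ASCII character (an assumption, though the four ranges named in the new comments are consistent with it). The function names are hypothetical and the code is not part of the patch.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical mirror of the reordered is_mercury_punctuation_char/1
 * table, written as the four codepoint ranges named in its comments. */
static int in_punctuation_ranges(int c)
{
    return (0x20 <= c && c <= 0x2f)
        || (0x3a <= c && c <= 0x40)
        || (0x5b <= c && c <= 0x60)
        || (0x7b <= c && c <= 0x7e);
}

/* Counts the ASCII codepoints on which the range table disagrees with
 * "printable and not alphanumeric" in the default C locale. */
static int count_range_mismatches(void)
{
    int mismatches = 0;
    for (int c = 0; c < 0x80; c++) {
        int expected = isprint(c) && !isalnum(c);
        if (in_punctuation_ranges(c) != expected) {
            mismatches++;
        }
    }
    return mismatches;
}

/* Aborts if the ranges miss or wrongly include an ASCII character. */
static void check_punctuation_ranges(void)
{
    assert(count_range_mismatches() == 0);
}
```

With the table in codepoint order, each range can be compared against the source list line by line, which is the property the reordering is meant to provide.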
>