[m-rev.] for review: Deprecate modes of string predicates that imply round-trippability.
Mark Brown
mark at mercurylang.org
Wed Oct 23 15:45:58 AEDT 2019
Hi Peter,
On Wed, Oct 23, 2019 at 12:54 PM Peter Wang <novalazy at gmail.com> wrote:
>
> Mark pointed out that to_char_list/2 having multiple modes implies the
> ability to round trip convert between a string and list of chars,
> which is not true if to_char_list replaces code units in ill-formed
> sequences with U+FFFD; converting the list of chars back to a string
> may produce a different string from the original input.
>
> library/string.m:
> Deprecate reverse modes of to_char_list/2, to_rev_char_list/2,
> from_char_list/2 and char_to_string/2.
Not the last one, but I see you already noticed that.
> Add commented out
> `obsolete_proc' pragmas to be enabled at a later date.
>
> Add comment about a future change to char_to_string.
>
> Implement char_to_string/2 without using the multiple moded
> to_char_list/2.
>
> Delete the unused Mercury implementation of string.append/3
> that depends on multi-moded to_char_list/2. The implementation is
> incorrect anyway in the presence of ill-formed code unit sequences.
>
> compiler/old_type_constraints.m:
> compiler/typecheck.m:
> Replace use of deprecated mode of char_to_string/2.
Likewise not needed.
>
> NEWS:
> Announce changes.
> ---
> NEWS | 7 +++++
> compiler/old_type_constraints.m | 3 +-
> compiler/typecheck.m | 3 +-
> library/string.m | 55 ++++++++++++++++++---------------
> 4 files changed, 41 insertions(+), 27 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index a5e1c887a..ba69b8d16 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -428,6 +428,13 @@ Changes to the Mercury standard library:
> - compare_ignore_case_ascii/3
> - to_rev_char_list/2
>
> + The following procedures in the string module have been deprecated:
> +
> + - to_char_list(uo, in)
> + - to_rev_char_list(uo, in)
> + - from_char_list(out, in)
> + - char_to_string(out, in)
And here.
> @@ -5726,8 +5726,13 @@ det_to_float(FloatString) = Float :-
>  char_to_string(C) = S1 :-
>      char_to_string(C, S1).
>
> -char_to_string(Char, String) :-
> -    to_char_list(String, [Char]).
> +:- pragma promise_equivalent_clauses(char_to_string/2).
> +
> +char_to_string(Char::in, String::uo) :-
> +    from_char_list([Char], String).
> +char_to_string(Char::out, String::in) :-
> +    string.index_next(String, 0, NextIndex, Char),
> +    string.length(String, NextIndex).
>
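Does this work as intended? If the string contains an ill-formed
sequence of one code unit, won't we get NextIndex = 1 and Char =
U+FFFD? The call would then succeed, even though the (in, uo) mode
would produce a different string from that same Char, so the two
clauses would not be equivalent as the promise_equivalent_clauses
pragma claims. One of the new predicates you have proposed ought to
be useful here.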
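One way to close that hole without the new predicates would be to
require that re-encoding the decoded character reproduces the input
exactly. A minimal sketch as a standalone module follows; the names
`strict_char_to_string`, `string_to_single_char` and `check` are
hypothetical, and it assumes string.index_next/4 yields U+FFFD and
advances one code unit on an ill-formed code unit, as described above.
The body of `string_to_single_char` could serve as the body of the
(out, in) clause.

```mercury
:- module strict_char_to_string.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module char.
:- import_module list.
:- import_module maybe.
:- import_module require.
:- import_module string.

    % Hypothetical stand-in for the (out, in) mode of char_to_string/2:
    % succeeds only if String is exactly the encoding of one code point.
    %
:- pred string_to_single_char(string::in, char::out) is semidet.

string_to_single_char(String, Char) :-
    % Decode the first code point. On an ill-formed code unit,
    % index_next/4 is assumed to yield U+FFFD with NextIndex = 1.
    string.index_next(String, 0, NextIndex, Char0),
    % The decoded code point must cover the whole string.
    string.length(String, NextIndex),
    % Re-encoding must reproduce the input. A U+FFFD that replaced an
    % ill-formed code unit encodes differently from that code unit, so
    % this rejects it, while a genuine "\uFFFD" string still succeeds.
    string.from_char_list([Char0]) = String,
    Char = Char0.

    % Run string_to_single_char on String and abort via unexpected/2
    % if the outcome differs from Expected.
    %
:- pred check(string::in, maybe(char)::in, io::di, io::uo) is det.

check(String, Expected, !IO) :-
    ( if string_to_single_char(String, Char) then
        Result = yes(Char)
    else
        Result = no
    ),
    ( if Result = Expected then
        io.format("ok: \"%s\"\n", [s(String)], !IO)
    else
        unexpected($pred, "wrong result for \"" ++ String ++ "\"")
    ).

main(!IO) :-
    % Only well-formed inputs are checked here; an ill-formed string
    % literal is not written, since that is backend-dependent.
    check("a", yes('a'), !IO),
    check("ab", no, !IO),
    check("", no, !IO),
    check("\u00E9", yes(char.det_from_int(0xE9)), !IO),
    check("\uFFFD", yes(char.det_from_int(0xFFFD)), !IO).
```

The extra from_char_list call allocates a string on every call; a
decoding predicate that reports whether it substituted U+FFFD would
avoid that cost.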
Mark