[m-rev.] for review: Deprecate modes of string predicates that imply round-trippability.

Mark Brown mark at mercurylang.org
Wed Oct 23 15:45:58 AEDT 2019


Hi Peter,

On Wed, Oct 23, 2019 at 12:54 PM Peter Wang <novalazy at gmail.com> wrote:
>
> Mark pointed out that to_char_list/2 having multiple modes implies the
> ability to round trip convert between a string and list of chars,
> which is not true if to_char_list replaces code units in ill-formed
> sequences with U+FFFD; converting the list of chars back to a string
> may produce a different string from the original input.
>
> library/string.m:
>     Deprecate reverse modes of to_char_list/2, to_rev_char_list/2,
>     from_char_list/2 and char_to_string/2.

Not the last one (char_to_string/2), but I see you already noticed that.
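
The round-trip problem described in the quoted commit message can be modelled outside Mercury. The sketch below is an illustration in Python, not the library code: it assumes strings are UTF-8 code unit sequences, and the two helpers only borrow the names to_char_list and from_char_list. Python's "replace" error handler stands in for the U+FFFD substitution; for the single stray code unit used here it yields one U+FFFD, though its handling of longer ill-formed sequences may differ from Mercury's.

```python
# Illustrative model only: these helpers mimic the behaviour discussed in
# the thread, they are not the Mercury library predicates.

def to_char_list(code_units: bytes) -> list:
    """Decode UTF-8, substituting U+FFFD for ill-formed input."""
    return list(code_units.decode("utf-8", errors="replace"))


def from_char_list(chars: list) -> bytes:
    """Encode a list of characters back to UTF-8 code units."""
    return "".join(chars).encode("utf-8")


# 0xFF can never appear in well-formed UTF-8.
original = b"a\xffb"
chars = to_char_list(original)
round_tripped = from_char_list(chars)

print(chars)
print(round_tripped == original)
```

The stray code unit becomes U+FFFD, which encodes back as three code units, so the result differs from the input. That is why a reverse mode promising the original string is misleading.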

> Add commented out
>     `obsolete_proc' pragmas to be enabled at a later date.
>
>     Add comment about a future change to char_to_string.
>
>     Implement char_to_string/2 without using the multiple moded
>     to_char_list/2.
>
>     Delete the unused Mercury implementation of string.append/3
>     that depends on multi-moded to_char_list/2. The implementation is
>     incorrect anyway in the presence of ill-formed code unit sequences.
>
> compiler/old_type_constraints.m:
> compiler/typecheck.m:
>     Replace use of deprecated mode of char_to_string/2.

Likewise not needed.

>
> NEWS:
>     Announce changes.
> ---
>  NEWS                            |  7 +++++
>  compiler/old_type_constraints.m |  3 +-
>  compiler/typecheck.m            |  3 +-
>  library/string.m                | 55 ++++++++++++++++++---------------
>  4 files changed, 41 insertions(+), 27 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index a5e1c887a..ba69b8d16 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -428,6 +428,13 @@ Changes to the Mercury standard library:
>     - compare_ignore_case_ascii/3
>     - to_rev_char_list/2
>
> +  The following procedures in the string module have been deprecated:
> +
> +   - to_char_list(uo, in)
> +   - to_rev_char_list(uo, in)
> +   - from_char_list(out, in)
> +   - char_to_string(out, in)

And here.


> @@ -5726,8 +5726,13 @@ det_to_float(FloatString) = Float :-
>  char_to_string(C) = S1 :-
>      char_to_string(C, S1).
>
> -char_to_string(Char, String) :-
> -    to_char_list(String, [Char]).
> +:- pragma promise_equivalent_clauses(char_to_string/2).
> +
> +char_to_string(Char::in, String::uo) :-
> +    from_char_list([Char], String).
> +char_to_string(Char::out, String::in) :-
> +    string.index_next(String, 0, NextIndex, Char),
> +    string.length(String, NextIndex).
>

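Does this work as intended? If the string consists of a single
ill-formed code unit, won't we get NextIndex = 1 and Char = U+FFFD?
The length check would then pass and the (out, in) mode would succeed
for a string that does not actually encode U+FFFD. One of the new
predicates you have proposed ought to be useful here.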

Mark

