[m-rev.] for review: Define behaviour of string.to_char_list (and rev) on ill-formed sequences.

Mark Brown mark at mercurylang.org
Thu Oct 17 04:24:38 AEDT 2019


On Thu, Oct 17, 2019 at 3:52 AM Mark Brown <mark at mercurylang.org> wrote:
>
> Hmm, the reverse mode is not det, as U+FFFD relates to every possible
> ill-formed sequence (as well as to the correctly formed replacement
> char). The existing implementation is similarly incorrect. I would
> suggest leaving these as is and defining new functions to convert
> to/from char lists (in addition to the ones you proposed to convert to
> char_or_code_unit).

In fact, since the signature of this predicate implies the ability to
round-trip strings, maybe it should implement your other proposal after
all and inject UTF-8 code units into the surrogate range. While I'm
not convinced that doing so is a good idea in general, it's probably the
least bad option for existing code that uses to_char_list/2.
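
To make the round-trip point concrete, here is a minimal sketch in Python
rather than Mercury. It relies on Python's built-in "replace" and
"surrogateescape" codec error handlers, which behave analogously to the two
options discussed here; it is not the Mercury string module, and the byte
values are invented for the example.

```python
# Analogy only: Python's codec error handlers, not Mercury's string module.
# The sample bytes are hypothetical.

ill_formed = b"ab\xffcd"                    # 0xFF never occurs in well-formed UTF-8
well_formed = "ab\ufffdcd".encode("utf-8")  # input containing a genuine U+FFFD

# Option 1: replace ill-formed code units with U+FFFD.
# Two different inputs yield the same characters, so there is no unique inverse.
replaced_a = ill_formed.decode("utf-8", errors="replace")
replaced_b = well_formed.decode("utf-8", errors="replace")
same_after_replace = replaced_a == replaced_b

# Option 2: inject each ill-formed byte 0xNN into the surrogate range as U+DCNN.
# The mapping is one-to-one, so the original code units can be recovered.
escaped = ill_formed.decode("utf-8", errors="surrogateescape")
chars = list(escaped)
round_tripped = "".join(chars).encode("utf-8", errors="surrogateescape")
```

With replacement, both inputs collapse to the same char list, which is why
the reverse mode cannot be det. With surrogate injection, the char list
still determines the original string.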

Mark

