[m-rev.] for review/discussion: ill-formed sequences in Unicode strings
Peter Wang
novalazy at gmail.com
Wed Sep 11 18:00:17 AEST 2019
On Wed, 11 Sep 2019 17:40:39 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
>
> Ok, I'm fine with that. What I do want is an identifiable subset of
> the string module (and other related modules) that will allow me to (a)
> (optionally) validate that a string is well-formed and (b) preserve that
> well-formedness. I don't mind if predicates in the string module also
> handle ill-formed subsequences (unless there is a significant overhead
> in doing so).
Great.
We should definitely add a string.verify_encoding predicate soon.
How is that for a name?
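
For illustration only, here is a minimal sketch of the kind of check such a predicate might perform. It is written in Python rather than Mercury, and the function names are hypothetical; it assumes the string is UTF-8 encoded and relies on Python's strict UTF-8 decoder, which rejects overlong forms and encoded surrogates.

```python
from typing import Optional


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first ill-formed UTF-8 sequence,
    or None if the whole input is well-formed."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start  # offset of the first offending byte
    return None


def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical analogue of the proposed check: succeed only if
    the input contains no ill-formed sequences."""
    return first_ill_formed_offset(data) is None
```

Reporting the offset, and not just a yes/no answer, lets a caller decide what to do with the well-formed prefix.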
>
> What actually happens now if you have non-UTF-8 characters in a comment?
As of the lexer changes this year, it acts as if the file is truncated
at the first ill-formed sequence.
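
A rough sketch of that "treat as truncated" behaviour, again in Python and not taken from the Mercury lexer (the function name is hypothetical, and UTF-8 input is assumed):

```python
def truncate_at_ill_formed(data: bytes) -> str:
    """Decode UTF-8 input, keeping only the text that precedes the
    first ill-formed sequence (everything after it is dropped)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first offending byte, so the
        # prefix before it is guaranteed to be well-formed.
        return data[:err.start].decode("utf-8")
```

With this approach, a stray byte inside a comment would hide the rest of the file from whatever consumes the result.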
Peter