[m-rev.] for review: Make string.append(out, out, in) work with ill-formed sequences.
Peter Wang
novalazy at gmail.com
Wed Oct 23 17:05:49 AEDT 2019
On Wed, 23 Oct 2019 16:30:51 +1100, Mark Brown <mark at mercurylang.org> wrote:
> Hi Peter,
>
> On Wed, Oct 23, 2019 at 3:02 PM Peter Wang <novalazy at gmail.com> wrote:
>
> > library/string.m:
> > Simplify string.append(out, out, in) and make it work sensibly in
> > the presence of ill-formed code unit sequences, breaking the input
> > string after each code point or code unit in an ill-formed sequence.
> >
>
> This doesn't match the forwards mode, which can join together two
> ill-formed sequences to make a valid code point :-(
Yes, I see.
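
To make the mismatch concrete, here is a minimal sketch in Python rather than Mercury. It assumes UTF-8 encoded strings and models them as raw bytes. The helper names (is_well_formed, split_points) are hypothetical and are not part of library/string.m. The breaking rule follows the wording of the log message only, so the real implementation may differ.

```python
# Hypothetical illustration using Python bytes; this is NOT Mercury's string.append.

def is_well_formed(b: bytes) -> bool:
    """True if b is a well-formed UTF-8 code unit sequence."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def split_points(b: bytes) -> list:
    """Offsets at which a backwards-mode append would break b, assuming it
    breaks after each code point, or after each code unit of an ill-formed
    sequence."""
    points = [0]
    i = 0
    while i < len(b):
        step = 1  # default: ill-formed, so advance by a single code unit
        for n in range(1, 5):
            chunk = b[i:i + n]
            if len(chunk) < n:
                break  # ran off the end of the input
            if is_well_formed(chunk):
                step = n  # first success is exactly one code point
                break
        i += step
        points.append(i)
    return points

euro = "\u20ac".encode("utf-8")      # the three code units E2 82 AC
head, tail = euro[:1], euro[1:]      # both halves are ill-formed on their own

forwards_ok = (not is_well_formed(head)
               and not is_well_formed(tail)
               and is_well_formed(head + tail))
backwards_splits = [(euro[:p], euro[p:]) for p in split_points(euro)]
```

Here forwards_ok is true, because the two ill-formed halves join into a valid code point, yet (head, tail) never appears in backwards_splits, because the only break points for euro are offsets 0 and 3.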
> I can think of two changes to the declarative semantics that could resolve
> this:
>
> 1. Disallow the case where we make a valid code point by appending (some
> part of) an ill-formed sequence at the end of the first argument with one
> at the start of the second argument.
>
> 2. Disallow _any_ ill-formed sequence at the start of the second argument.
>
> The latter would affect more programs, but is probably better as the test
> would be more efficient in the commonly-used forwards mode.
>
> Whatever the case, the documentation should clarify the semantics.
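
As a rough sketch of how the two proposed restrictions differ, again in Python over UTF-8 bytes rather than in Mercury: the function names below are hypothetical and only model the checks as described above. They are not proposed library code.

```python
# Hypothetical models of the two proposed restrictions, over UTF-8 bytes.

def decodes_as_one_code_point(b: bytes) -> bool:
    """True if b is exactly one well-formed UTF-8 code point."""
    try:
        return len(b.decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False

def allowed_by_option_1(first: bytes, second: bytes) -> bool:
    """Option 1: reject only when a valid code point would straddle the join."""
    for k in range(1, min(3, len(first)) + 1):           # units taken from the end of first
        for m in range(1, min(4 - k, len(second)) + 1):  # units taken from the start of second
            if decodes_as_one_code_point(first[-k:] + second[:m]):
                return False
    return True

def allowed_by_option_2(first: bytes, second: bytes) -> bool:
    """Option 2: reject any ill-formed sequence at the start of second.
    Only the first few code units of second are inspected; first is ignored."""
    if not second:
        return True
    return any(decodes_as_one_code_point(second[:n])
               for n in range(1, min(4, len(second)) + 1))
```

With these definitions, appending b"\xe2" and b"\x82\xac" is rejected under both options, while appending b"A" and b"\x82" is accepted under option 1 but rejected under option 2. That is the sense in which option 2 would affect more programs, and the option 2 check is cheaper because it never looks at the first argument.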
I can accept requiring a separate predicate in case someone actually
needs to join two ill-formed sequences to form a valid code point,
which would surely be very rare.

How about deprecating and removing the nondet mode of string.append?
It can be supported as a separate predicate.

Peter