[m-rev.] for review: Make string.append(out, out, in) work with ill-formed sequences.

Peter Wang novalazy at gmail.com
Wed Oct 23 17:05:49 AEDT 2019


On Wed, 23 Oct 2019 16:30:51 +1100, Mark Brown <mark at mercurylang.org> wrote:
> Hi Peter,
> 
> On Wed, Oct 23, 2019 at 3:02 PM Peter Wang <novalazy at gmail.com> wrote:
> 
> > library/string.m:
> >     Simplify string.append(out, out, in) and make it work sensibly in
> >     the presence of ill-formed code unit sequences, breaking the input
> >     string after each code point or code unit in an ill-formed sequence.
> >
> 
> This doesn't match the forwards mode, which can join together two
> ill-formed sequences to make a valid code point :-(

Yes, I see.
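
To make the mismatch concrete, here is a small illustration in Python
(not Mercury), assuming a UTF-8 string representation. The byte values
and the helper name is_well_formed are made up for the example:

```python
# Illustration only: Python bytes stand in for Mercury strings, and
# UTF-8 is assumed as the encoding.

def is_well_formed(b: bytes) -> bool:
    """Return True if b is a well-formed UTF-8 code unit sequence."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

first = b"\xc3"    # lead byte of a two-byte sequence; ill-formed alone
second = b"\xa9"   # continuation byte; ill-formed alone

# Forwards mode: appending the two halves produces a valid code point.
joined = first + second
```

Each half is ill-formed on its own, but joined is the well-formed
encoding of U+00E9. A backwards mode that only breaks the string after
whole code points would never produce this pair.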

> I can think of two changes to the declarative semantics that could resolve
> this:
> 
> 1. Disallow the case where we make a valid code point by appending (some
> part of) an ill-formed sequence at the end of the first argument with one
> at the start of the second argument.
> 
> 2. Disallow _any_ ill-formed sequence at the start of the second argument.
> 
> The latter would affect more programs, but is probably better as the test
> would be more efficient in the commonly-used forwards mode.
> 
> Whatever the case, the documentation should clarify the semantics.

I can accept requiring a separate predicate for the case where someone
actually needs to join two ill-formed sequences to form a valid code
point, which would surely be very rare.
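
As a rough idea of the test implied by option 2, the following Python
sketch (again not Mercury, and assuming UTF-8) looks at no more than the
first four code units of the second argument. The name
starts_well_formed is hypothetical:

```python
def starts_well_formed(b: bytes) -> bool:
    """True if b is empty or begins with a complete, well-formed
    UTF-8 code point. Assumption: strings are UTF-8 encoded."""
    if not b:
        return True
    # A UTF-8 code point occupies between 1 and 4 code units.
    for n in range(1, min(4, len(b)) + 1):
        try:
            b[:n].decode("utf-8")
            return True
        except UnicodeDecodeError:
            pass
    return False
```

The cost is independent of the length of either string, which is what
makes this option cheap in the forwards mode.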

How about deprecating and removing the nondet mode of string.append?
It can be supported as a separate predicate.
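
A minimal sketch of what such a separate predicate might enumerate,
written in Python rather than Mercury and assuming UTF-8. It breaks the
input after each code point, or after each code unit in an ill-formed
sequence, as described in the change log entry. The names unit_len and
splits are hypothetical, not proposed library names:

```python
from typing import Iterator, Tuple

def unit_len(s: bytes, i: int) -> int:
    """Length of the well-formed code point starting at offset i,
    or 1 if the code unit at i begins an ill-formed sequence."""
    for n in range(1, 5):
        chunk = s[i:i + n]
        if len(chunk) < n:
            break
        try:
            chunk.decode("utf-8")
            return n
        except UnicodeDecodeError:
            continue
    return 1

def splits(s: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield every (prefix, suffix) pair, one per allowed break."""
    i = 0
    yield s[:0], s
    while i < len(s):
        i += unit_len(s, i)
        yield s[:i], s[i:]
```

For the input b"a\xc3\xa9" this yields three pairs and never breaks
between 0xC3 and 0xA9, so it would agree with a forwards mode that
refuses to join ill-formed sequences into a code point.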

Peter

