[m-rev.] for review: Reduce memory allocation in string.to_upper, string.to_lower.

Peter Wang novalazy at gmail.com
Thu Jun 23 14:23:52 AEST 2016


On Thu, 23 Jun 2016 12:06:06 +1000 (AEST), Julien Fischer <jfischer at opturion.com> wrote:
> 
> Hi Peter,
> 
> On Mon, 20 Jun 2016, Peter Wang wrote:
> 
> > library/string.m:
> > 	Implement to_upper(in, uo) and to_lower(in, uo) with foreign
> > 	code, not creating intermediate character lists.
> >
> > 	Implement to_upper(in, in) and to_lower(in, in) modes without
> > 	allocating memory.
> >
> > 	Be more specific in documentation about which characters are
> > 	affected by some functions/predicates.
> 
> ...
> 
> > diff --git a/library/string.m b/library/string.m
> > index 55ef8b3..c13f7d4 100644
> > --- a/library/string.m
> > +++ b/library/string.m
> > @@ -768,19 +768,19 @@
> > %
> >
> >     % Convert the first character (if any) of a string to uppercase.
> > -    % Note that this only converts unaccented Latin letters.
> > +    % Note that this only converts letters (a-z) in the ASCII range.
> >     %
> 
> It may be worth extending that comment to say that base letters that lie
> in the ASCII range in strings containing combining characters will also
> be converted, for example:
> 
>      io.write_string("a\u0301\n", !IO)           ==> á
>      io.write_string(to_upper("a\u0301\n"), !IO) ==> Á

Here's an attempt at the wording:

to_upper

    Converts a string to uppercase.
    Only letters (a-z) in the ASCII range are converted.

    This function transforms each code point individually.
    An ASCII letter that occurs within a combining sequence is converted,
    whereas the precomposed character equivalent to that combining
    sequence is not. For example:

	to_upper("a\u0301") ==> "A\u0301"   % á decomposed
	to_upper("\u00e1")  ==> "\u00e1"    % á precomposed
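
To make the per-code-point behaviour concrete, here is a minimal sketch in C.
It is not the foreign code from the patch. It assumes UTF-8 encoded strings,
and both function names are hypothetical. Every byte of a multi-byte UTF-8
sequence is >= 0x80, so converting ASCII bytes one at a time gives the same
result as converting code points one at a time. The second function shows how
an (in, in) style check could be done without allocating.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: uppercase one byte if it is an ASCII letter a-z.
   Bytes of multi-byte UTF-8 sequences are never in that range. */
static char ascii_upper_byte(char ch)
{
    return (ch >= 'a' && ch <= 'z') ? (char) (ch - 'a' + 'A') : ch;
}

/* Hypothetical analogue of to_upper(in, uo): returns a newly allocated
   copy of src with ASCII letters uppercased, or NULL if malloc fails.
   The caller frees the result. */
char *ascii_to_upper(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *dst = malloc(len + 1);
    if (dst == NULL) {
        return NULL;
    }
    /* i <= len so that the terminating NUL is copied as well. */
    for (size_t i = 0; i <= len; i++) {
        dst[i] = ascii_upper_byte(src[i]);
    }
    return dst;
}

/* Hypothetical analogue of to_upper(in, in): returns 1 if upper is the
   ASCII-uppercased form of src, else 0. No memory is allocated. */
int ascii_upper_equals(const char *src, const char *upper)
{
    for (size_t i = 0; ; i++) {
        if (ascii_upper_byte(src[i]) != upper[i]) {
            return 0;
        }
        if (src[i] == '\0') {
            return 1;
        }
    }
}
```

With the decomposed input "a\xCC\x81" (a followed by U+0301) the base letter
is uppercased and the combining mark is left alone, while the precomposed
"\xC3\xA1" (U+00E1) comes back unchanged, matching the two examples above.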

Peter

