[m-users.] Confused by action of string.prefix_length

Sean Charles (emacstheviking) objitsu at gmail.com
Wed Jun 8 17:32:23 AEST 2022


OK, well, the explanation makes sense. I guess in my head I had not fully grasped the difference between a code point and a code unit, not being that accustomed to the terminology myself.

On a first (naive) reading I ASSUMED it would just return the visible character count, i.e. the, dare I say it, intuitive value of 3, rather than the surgically precise value of 6 code units!
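
To make the two counts concrete, here is a small sketch in Python (chosen purely for illustration; it does not use Mercury's string module) showing that three U+02DC SMALL TILDE characters are 3 code points but 6 UTF-8 code units:

```python
# Three U+02DC SMALL TILDE characters, like the "tildes" in the original email.
s = "\u02dc\u02dc\u02dc"

code_points = len(s)                 # a Python str is indexed by code point
code_units = len(s.encode("utf-8"))  # UTF-8 code units, i.e. bytes

print(code_points, code_units)
```

U+02DC lies in the range U+0080 to U+07FF, so UTF-8 encodes each one in two bytes, which is where the 6 comes from.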

I don’t think my code will be wrong, as I am using the other string predicates to slice and dice and, as Peter said, they too deal in code units, so everything should just work out as expected.

Thanks again, that was a most interesting moment, to have one's concrete expectation blow up in one's face like that!
A true Zen-like learning moment!

Mercury continues to delight!

:)

Thanks again everybody,
Sean.


> On 8 Jun 2022, at 03:09, Peter Wang <novalazy at gmail.com> wrote:
> 
> On Wed, 08 Jun 2022 11:25:46 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
>> 
>> 2022-06-08 11:13 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
>>> The tildes in your email are U+02DC SMALL TILDE (˜).
>>> Each U+02DC takes two UTF-8 code units (i.e. bytes) to encode, and
>>> string.prefix_length returns the length of the prefix it finds
>>> in terms of code units.
>> 
>> I suppose the obvious next question is: *why* does it return the length
>> in terms of code units, or rather, *only* in terms of code units?
>> 
>> The documentation of prefix_length is:
>> 
>> % The length (in code units) of the maximal prefix of String consisting
>> % entirely of code points satisfying Pred.
>> 
>> which gives no justification for this choice.
> 
> The string module works in terms of code units for the most part.
> Apart from that, I expected (then and now) that the most common use
> for the result of prefix_length would be to skip past the prefix.
> 
>> Obviously, stepping past a given prefix is easier if you know its length
>> in code units, but sometimes, you may want to know how many code
>> points Pred has succeeded for. We could add a version of prefix_length
>> (and suffix_length) that either computes the length just in code points,
>> or in both code units and points. The latter would have to be a predicate,
>> with a name such as prefix_lengths (note the plural). Since I rarely work
>> with Unicode, I don't know which would be more useful. Opinions?
> 
> I think counting code points is rarely useful but if you wanted to add
> something, a predicate that returns both would make sense
> as the implementation would be keeping track of both anyway.
> 
> Peter
> _______________________________________________
> users mailing list
> users at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/users
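
As a rough illustration of the prefix_lengths idea Zoltan floats in the thread, here is a Python sketch that returns both counts in a single pass. It is hypothetical throughout: prefix_lengths is not part of Mercury's string module, and the name, the (code units, code points) return order, and the choice of UTF-8 bytes as the code unit are assumptions made for this example.

```python
from typing import Callable, Tuple


def prefix_lengths(pred: Callable[[str], bool], s: str) -> Tuple[int, int]:
    """Hypothetical analogue of the proposed prefix_lengths.

    Returns (code_units, code_points) for the maximal prefix of s whose
    code points all satisfy pred. Code units are assumed to be UTF-8 bytes.
    """
    units = 0
    points = 0
    for ch in s:  # iterating a Python str yields one code point at a time
        if not pred(ch):
            break
        units += len(ch.encode("utf-8"))  # bytes needed for this code point
        points += 1
    return units, points


text = "\u02dc\u02dc\u02dcabc"
units, points = prefix_lengths(lambda c: c == "\u02dc", text)
print(units, points)

# Skipping past the prefix (the use Peter expects to be most common)
# needs the code-unit count when working on the UTF-8 encoding.
rest = text.encode("utf-8")[units:].decode("utf-8")
print(rest)
```

Since the loop already visits every code point in the prefix, maintaining the second counter costs almost nothing, which is consistent with Peter's remark that an implementation would be tracking both anyway.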