[m-rev.] for review: utf-8 improvements

Julien Fischer juliensf at csse.unimelb.edu.au
Mon Mar 26 14:58:26 AEDT 2012


On Mon, 26 Mar 2012, Peter Wang wrote:

> If necessary I'll submit just the bug fixes separately for 11.07.

I don't see any reason why the whole diff shouldn't go onto the 11.07
branch.

> ---
>
> Branches: main, 11.07
>
> Optimise some UTF-8 routines in C grades and fix a few bugs.
>
> library/string.m:
> 	Avoid function calls in unsafe_index, unsafe_index_next, and
> 	unsafe_prev_index in the ASCII case.
>
> 	Handle illegal code unit at start of string in first_char(in, uo, in)
> 	and first_char(in, uo, uo) modes.
>
> runtime/mercury_string.c:
> runtime/mercury_string.h:
> 	Fix a bug where MR_utf8_next would not advance from pos 0.  Fortunately
> 	MR_utf8_next is only rarely called, to skip past illegal code units.
>
> 	Delete redundant initial test in MR_utf8_prev.
>
> 	Add MR_utf8_get_mb to extract multibyte code points only.
> 	Unroll a loop.
>
> 	Add MR_utf8_get_next_mb to extract multibyte code points only.
>
> 	Make MR_utf8_prev_get avoid an extra function call in the ASCII case.
>
> 	Use MR_Integer consistently for string offsets instead of int.

That looks fine.
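
For readers of the archive, the "avoid function calls in the ASCII case"
idea from the log message can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code in
runtime/mercury_string.c: the names sketch_get_next and sketch_get_next_mb
are hypothetical, ptrdiff_t stands in for MR_Integer, and the string is
assumed to be NUL-terminated with the caller never decoding at the
terminator.

```c
#include <stddef.h>
#include <stdint.h>

/*
** Hypothetical slow path: decode one multibyte (non-ASCII) UTF-8 sequence
** starting at s[*pos]. On success, return the code point and advance *pos
** past the sequence. On an illegal sequence, return -1 and leave *pos
** unchanged.
*/
static int32_t
sketch_get_next_mb(const unsigned char *s, ptrdiff_t *pos)
{
    unsigned char   c = s[*pos];
    int32_t         cp;
    int             extra;      /* number of continuation bytes expected */
    int             i;

    if (c >= 0xC2 && c <= 0xDF) {
        cp = c & 0x1F;
        extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
        cp = c & 0x0F;
        extra = 2;
    } else if (c >= 0xF0 && c <= 0xF4) {
        cp = c & 0x07;
        extra = 3;
    } else {
        /* Stray continuation byte or invalid lead byte. */
        return -1;
    }

    for (i = 1; i <= extra; i++) {
        unsigned char d = s[*pos + i];
        /* A NUL terminator fails this test, so we never read past it. */
        if ((d & 0xC0) != 0x80) {
            return -1;
        }
        cp = (cp << 6) | (d & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if ((extra == 2 && cp < 0x800) ||
        (extra == 3 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF)
    {
        return -1;
    }

    *pos += extra + 1;
    return cp;
}

/*
** Hypothetical fast path: an ASCII code unit is handled inline with a
** single comparison; only non-ASCII input pays for the function call.
*/
static inline int32_t
sketch_get_next(const unsigned char *s, ptrdiff_t *pos)
{
    unsigned char c = s[*pos];

    if (c < 0x80) {
        (*pos)++;
        return c;
    }
    return sketch_get_next_mb(s, pos);
}
```

The point of the split is that the common case costs one comparison and
one increment at the call site, while the multibyte decoder stays out of
line. Whether that pays off in practice depends on how ASCII-heavy the
input is.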

Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
Post messages to:       mercury-reviews at csse.unimelb.edu.au
Administrative Queries: owner-mercury-reviews at csse.unimelb.edu.au
Subscriptions:          mercury-reviews-request at csse.unimelb.edu.au
--------------------------------------------------------------------------


