[m-rev.] for review: string switches using tries
Zoltan Somogyi
zoltan.somogyi at runbox.com
Tue Feb 24 16:00:07 AEDT 2015
On Tue, 24 Feb 2015 12:32:48 +1100, Peter Wang <novalazy at gmail.com> wrote:
> The implementation will not work in general if the host compiler and the
> target differ in the string encoding, e.g. the compiler uses UTF-8 but
> the target uses UTF-16.
>
> The fix should only require that we replace string.{to,from}_code_unit_list
> with functions that deal in the code units of the TARGET string encoding,
> and build tries from that. The standard library does not yet have
> string.{to,from}_{utf8,utf16}_code_unit_list so, for now, the safe option
> is to disable the trie implementation when the string encodings differ.
Agreed. I have have added a line that disables the use of tries
if --cross-compiling is set. However, I believe the existing code
in string.m that deals with Unicode, which I think you wrote,
assumes utf8.
> > --- a/compiler/c_util.m
> > +++ b/compiler/c_util.m
> > @@ -146,14 +146,17 @@
> >
> > :- type binop_category
> > ---> array_index_binop
> > + ; string_index_binop
> > ; pointer_compare_binop
> > ; compound_compare_binop
> > + ; offset_string_compare_binop(int)
> > + ; general_string_compare_binop
> > ; string_compare_binop
>
> Are both general_string_compare_binop and string_compare_binop used?
Yes. The former has just one op, strcmp, which returns -ve/0/+ve,
while the other has six ops, eq/ne/lt/le/gt/ge, which return bools.
This is one reason why I think bunding ops into categories and then
treating every op in each category the same isn't a good idea.
> > + NullCodeUnit = 0, % Match the terminating NULL character.
>
> s/NULL/NUL or s/NULL/null
>
> This won't work for backends which do not use a NUL terminator.
No, it won't, but those systems are practically extinct, and I don't think
we support Mercury on any of them. Our existing code for many operations
on strings, e.g. string.from_char_list won't work on them, since they
assume null terminated strings.
If you want to add a test to configure.ac to reject systems that
use other string representations, go ahead.
I followed your other suggestions. Thanks for the review.
Zoltan.
More information about the reviews
mailing list