[m-rev.] for review: string switches using tries

Zoltan Somogyi zoltan.somogyi at runbox.com
Tue Feb 24 16:00:07 AEDT 2015



On Tue, 24 Feb 2015 12:32:48 +1100, Peter Wang <novalazy at gmail.com> wrote:
> The implementation will not work in general if the host compiler and the
> target differ in the string encoding, e.g. the compiler uses UTF-8 but
> the target uses UTF-16.
> 
> The fix should only require that we replace string.{to,from}_code_unit_list
> with functions that deal in the code units of the TARGET string encoding,
> and build tries from that.  The standard library does not yet have
> string.{to,from}_{utf8,utf16}_code_unit_list so, for now, the safe option
> is to disable the trie implementation when the string encodings differ.

Agreed. I have have added a line that disables the use of tries
if --cross-compiling is set. However, I believe the existing code
in string.m that deals with Unicode, which I think you wrote,
assumes utf8.

> > --- a/compiler/c_util.m
> > +++ b/compiler/c_util.m
> > @@ -146,14 +146,17 @@
> >  
> >  :- type binop_category
> >      --->    array_index_binop
> > +    ;       string_index_binop
> >      ;       pointer_compare_binop
> >      ;       compound_compare_binop
> > +    ;       offset_string_compare_binop(int)
> > +    ;       general_string_compare_binop
> >      ;       string_compare_binop
> 
> Are both general_string_compare_binop and string_compare_binop used?

Yes. The former has just one op, strcmp, which returns -ve/0/+ve,
while the other has six ops, eq/ne/lt/le/gt/ge, which return bools.
This is one reason why I think bunding ops into categories and then
treating every op in each category the same isn't a good idea.
 
> > +                NullCodeUnit = 0,    % Match the terminating NULL character.
> 
> s/NULL/NUL  or  s/NULL/null
> 
> This won't work for backends which do not use a NUL terminator.

No, it won't, but those systems are practically extinct, and I don't think
we support Mercury on any of them. Our existing code for many operations
on strings, e.g. string.from_char_list won't work on them, since they
assume null terminated strings.

If you want to add a test to configure.ac to reject systems that
use other string representations, go ahead.

I followed your other suggestions. Thanks for the review.

Zoltan.





More information about the reviews mailing list