[m-rev.] for review: string switches using tries

Paul Bone paul at bone.id.au
Tue Feb 24 18:04:51 AEDT 2015


On Tue, Feb 24, 2015 at 04:43:57PM +1100, Peter Wang wrote:
> On Tue, 24 Feb 2015 16:00:07 +1100 (EST), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> > 
> > 
> > On Tue, 24 Feb 2015 12:32:48 +1100, Peter Wang <novalazy at gmail.com> wrote:
> > > The implementation will not work in general if the host compiler and the
> > > target differ in the string encoding, e.g. the compiler uses UTF-8 but
> > > the target uses UTF-16.
> > > 
> > > The fix should only require that we replace string.{to,from}_code_unit_list
> > > with functions that deal in the code units of the TARGET string encoding,
> > > and build tries from that.  The standard library does not yet have
> > > string.{to,from}_{utf8,utf16}_code_unit_list so, for now, the safe option
> > > is to disable the trie implementation when the string encodings differ.
> > 
> > Agreed. I have added a line that disables the use of tries
> > if --cross-compiling is set. However, I believe the existing code
> > in string.m that deals with Unicode, which I think you wrote,
> > assumes UTF-8.
> 
> string.m presents an encoding agnostic interface.  The implementation
> only allows UTF-8 for the C/Erlang backends, and UTF-16 for the Java/C#
> backends.

Is it possible to use the target's encoding method when creating the trie?
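
To make the question concrete, here is a rough sketch of a trie keyed on the
target's code units rather than the host's.  It is in Python rather than
Mercury purely for brevity, and the names code_units, build_trie and lookup
are made up for illustration; none of them exist in the compiler or in
string.m.  It also assumes the only target encodings are UTF-8 and UTF-16.

```python
def code_units(s, encoding):
    """Return the code units of s in the (assumed) target encoding."""
    if encoding == "utf-8":
        # One code unit per byte.
        return list(s.encode("utf-8"))
    if encoding == "utf-16":
        # One code unit per 16-bit word; surrogate pairs yield two units.
        data = s.encode("utf-16-le")
        return [int.from_bytes(data[i:i + 2], "little")
                for i in range(0, len(data), 2)]
    raise ValueError("unsupported target encoding: " + encoding)


def build_trie(cases, encoding):
    """Build a nested-dict trie; the None key marks the end of a case string."""
    root = {}
    for case_id, s in enumerate(cases):
        node = root
        for unit in code_units(s, encoding):
            node = node.setdefault(unit, {})
        node[None] = case_id
    return root


def lookup(trie, s, encoding):
    """Return the index of the matching case, or None if there is no match."""
    node = trie
    for unit in code_units(s, encoding):
        node = node.get(unit)
        if node is None:
            return None
    return node.get(None)
```

The same case strings give differently shaped tries for the two encodings, so
the trie would have to be built from whichever encoding the generated code
will see at run time.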

I've been considering rewriting the streams library so that Mercury always
uses one specific encoding, say UTF-8.  Other encodings, including the host
system's default encoding, would then be added as wrappers that take one
stream type and return a new stream type.
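
The wrapper idea might look roughly like the following.  Again this is Python
rather than Mercury, only a sketch of the shape, and EncodedWriter is a
hypothetical name, not an existing or planned library type.  It assumes the
inner program always produces UTF-8 and that the wrapper re-encodes on the way
out.

```python
import codecs
import io


class EncodedWriter:
    """Accepts UTF-8 bytes and forwards them to a sink in another encoding."""

    def __init__(self, sink, encoding):
        self._sink = sink
        self._encoding = encoding
        # An incremental decoder copes with a multi-byte sequence that is
        # split across two write calls.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def write(self, utf8_bytes):
        text = self._decoder.decode(utf8_bytes)
        self._sink.write(text.encode(self._encoding))


sink = io.BytesIO()
out = EncodedWriter(sink, "utf-16-le")
out.write(b"\xc3")   # first half of U+00E9 in UTF-8
out.write(b"\xa9")   # second half
```

Code written against the UTF-8 stream would not need to know which encoding
the wrapped sink uses.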


-- 
Paul Bone


