[m-rev.] for review: string switches using tries

Zoltan Somogyi zoltan.somogyi at runbox.com
Tue Feb 24 18:18:51 AEDT 2015



On Tue, 24 Feb 2015 18:04:51 +1100, Paul Bone <paul at bone.id.au> wrote:
> > string.m presents an encoding-agnostic interface.  The implementation
> > only allows UTF-8 for the C/Erlang backends, and UTF-16 for the Java/C#
> > backends.
> 
> Is it possible to use the target's encoding method when creating the trie?

Yes, provided it is known. With --cross-compiling and --target=c, you
wouldn't know whether the target uses ASCII/UTF-8 or EBCDIC.
(As we just discussed, --cross-compiling does not specify anything useful
about the actual target platform.)
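
To make that concrete: a trie switch dispatches on code units, so the
compiler has to fix the encoding when it builds the trie. The sketch below
is Python, not Mercury compiler code, and `code_units` and `build_trie` are
hypothetical names. It builds the trie for the same switch arms under UTF-8,
UTF-16 and EBCDIC (Python's `cp500` codec), and even the root-level branches
differ.

```python
from typing import Dict, List

def code_units(s: str, encoding: str) -> List[int]:
    """Return the code units of s in the given encoding."""
    if encoding == "utf-16":
        data = s.encode("utf-16-le")  # little-endian, no BOM
        # Pair up bytes into 16-bit code units.
        return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]
    return list(s.encode(encoding))  # one byte per code unit

def build_trie(cases: List[str], encoding: str) -> Dict:
    """Nested dicts keyed by code unit; the None key holds the arm number."""
    root: Dict = {}
    for arm, s in enumerate(cases):
        node = root
        for unit in code_units(s, encoding):
            node = node.setdefault(unit, {})
        node[None] = arm  # reaching here with no input left selects this arm
    return root

cases = ["abc", "abd", "é"]
for enc in ["utf-8", "utf-16", "cp500"]:
    # Root branches: utf-8 [97, 195], utf-16 [97, 233], cp500 [81, 129]
    print(enc, sorted(build_trie(cases, enc)))
```

The same trie code is therefore only correct for one encoding, which is why
it can be generated only when the target's encoding is actually known.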

> I've been considering rewriting the streams library so that Mercury always
> uses one specific encoding, say UTF-8, and then adding other encodings,
> including the host system's default encoding, as wrappers that take one
> stream type and return a new stream type.

That is a separate use case, one which can afford the cost of representation
conversions because it is already paying the cost of I/O. Switches don't have
that luxury. Tries have at most a minor performance advantage over hash switches.
If they can be made to work in a set of circumstances only by methods that
add enough overhead to make tries uncompetitive, then there is no point
in using those methods, even if they work.
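
A minimal sketch of where that overhead would come from, again in Python
rather than generated code, with hypothetical function names: the trie is
keyed on UTF-8 bytes, and a target whose native strings use some other
encoding (Python's `str` stands in for a UTF-16 string here) has to
transcode the whole string on every dispatch before the trie walk starts.
A hash switch also reads the whole string to hash it, but needs no
conversion, so the trie's at-most-minor advantage is all the extra pass
has to fit inside.

```python
from typing import Dict, List, Optional

def build_byte_trie(cases: List[str]) -> Dict:
    """Trie over UTF-8 bytes; the None key holds the matching arm number."""
    root: Dict = {}
    for arm, s in enumerate(cases):
        node = root
        for b in s.encode("utf-8"):
            node = node.setdefault(b, {})
        node[None] = arm
    return root

def trie_switch(trie: Dict, data: bytes) -> Optional[int]:
    """Walk the trie one code unit at a time; None means the default arm."""
    node = trie
    for b in data:
        node = node.get(b)
        if node is None:
            return None  # no arm has this prefix
    return node.get(None)  # None if the input is only a prefix of an arm

def switch_on_native_string(trie: Dict, s: str) -> Optional[int]:
    # The extra cost: a full transcoding pass that reads every character
    # and allocates a new buffer, paid on every execution of the switch.
    return trie_switch(trie, s.encode("utf-8"))

trie = build_byte_trie(["abc", "abd", "é"])
print(switch_on_native_string(trie, "abd"))  # 1
print(switch_on_native_string(trie, "ab"))   # None
```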

Zoltan.
