[m-rev.] for review: string switches using tries

Paul Bone paul at bone.id.au
Tue Feb 24 20:09:54 AEDT 2015


On Tue, Feb 24, 2015 at 06:18:51PM +1100, Zoltan Somogyi wrote:
> 
> 
> On Tue, 24 Feb 2015 18:04:51 +1100, Paul Bone <paul at bone.id.au> wrote:
> > > string.m presents an encoding agnostic interface.  The implementation
> > > only allows UTF-8 for the C/Erlang backends, and UTF-16 for the Java/C#
> > > backends.
> > 
> > Is it possible to use the target's encoding method when creating the trie?
> 
> Yes, provided it is known. With --cross-compiling and --target=c, you
> wouldn't know whether the target uses ASCII/utf8 or EBCDIC.
> (As we just discussed, --cross-compiling does not specify anything useful
> about the actual target platform.)

Good point.


> > I've been considering re-writing the streams library so that Mercury always
> > uses one specific encoding, say UTF-8, and then adding other encodings,
> > including the host system's default encoding, as wrappers that take one
> > stream type and return a new stream type.
> 
> That is a separate use case, one which can afford the cost of representation
> conversions because it is already paying the cost of I/O. Switches don't have
> that luxury. Tries have at most a minor performance advantage over hash switches.
> If they can be made to work in a set of circumstances only by methods that
> add enough overhead to make tries uncompetitive, then there is no point
> in using those methods, even if they work.
> 

I wouldn't suggest using anything like this at runtime, but perhaps it'd be
useful at compile time, when constructing the trie.

If it is known that Mercury always uses utf8 for the string type, then both
parameters and string constants in Mercury would be utf8.  The trie can be
constructed in utf8 and encoding is not an issue at runtime.
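
As a rough sketch of what I mean (written in Python rather than Mercury, and
with made-up names such as build_trie and lookup; this is not how the
compiler does it), the trie is keyed on the utf8 code units of each string
constant when it is built, so the lookup only ever compares bytes:

```python
def build_trie(cases):
    """Build a trie keyed on UTF-8 code units.

    `cases` maps each string constant to the index of its switch arm.
    All encoding work happens here, i.e. at "compile time".
    """
    root = {}
    for text, arm in cases.items():
        node = root
        for unit in text.encode("utf-8"):
            node = node.setdefault(unit, {})
        # None marks end-of-string; code units are ints, so it cannot clash.
        node[None] = arm
    return root


def lookup(trie, data):
    """Return the arm for `data` (bytes, assumed to be UTF-8 already),
    or None if no case matches. No encoding conversion is done here."""
    node = trie
    for unit in data:
        node = node.get(unit)
        if node is None:
            return None
    return node.get(None)


# Hypothetical switch with four string cases.
TRIE = build_trie({"if": 0, "in": 1, "int": 2, "naïve": 3})
```

This only works if the string being switched on really is utf8 at runtime,
which is the assumption stated above.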

However, as Peter said, Mercury uses utf16 for the string type on the Java/C#
backends, so the point is currently moot.
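
To illustrate why the backend's encoding matters (again a Python sketch;
code_units is a hypothetical helper, not anything in Mercury), the same
string constant gives different code unit sequences in utf8 and utf16, so a
trie built over one cannot be walked with the other:

```python
def code_units(text, encoding):
    """Return the code units of `text` as a list of ints.

    `encoding` is expected to be "utf-8" (8-bit units) or
    "utf-16-le" (16-bit units); other values are not handled.
    """
    data = text.encode(encoding)
    if encoding == "utf-8":
        return list(data)
    # utf-16-le: combine each little-endian byte pair into one 16-bit unit.
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


print(code_units("é", "utf-8"))      # [195, 169]
print(code_units("é", "utf-16-le"))  # [233]
```

For pure ASCII constants the two sequences happen to hold the same values,
but the unit width still differs.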


-- 
Paul Bone


