[m-rev.] for review: string switches using tries

Peter Wang novalazy at gmail.com
Wed Feb 25 10:53:54 AEDT 2015


On Tue, 24 Feb 2015 20:09:54 +1100, Paul Bone <paul at bone.id.au> wrote:
> On Tue, Feb 24, 2015 at 06:18:51PM +1100, Zoltan Somogyi wrote:
> > 
> > 
> > On Tue, 24 Feb 2015 18:04:51 +1100, Paul Bone <paul at bone.id.au> wrote:
> > > > string.m presents an encoding agnostic interface.  The implementation
> > > > only allows UTF-8 for the C/Erlang backends, and UTF-16 for the Java/C#
> > > > backends.
> > > 
> > > Is it possible to use the target's encoding method when creating the trie?
> > 
> > Yes, provided it is known. With --cross-compiling and --target=c, you
> > wouldn't know whether the target uses ASCII/utf8 or EBCDIC.
> > (As we just discussed, --cross-compiling does not specify anything useful
> > about the actual target platform.)
> 
> Good point.

For the foreseeable future, each target language has one possible string
encoding.  If that ever changes (preferably never) the compiler would
just need to be told explicitly what to target.
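
To illustrate why the trie has to be built in the target's encoding, here
is a rough sketch in Python (chosen only for brevity). It is not how the
Mercury compiler builds tries; code_units, build_trie and lookup are
hypothetical names made up for this example. The trie is keyed on code
units, so a trie built from UTF-8 units will not match a string presented
as UTF-16 units.

```python
# Hypothetical sketch: a trie over the code units of one chosen encoding.

def code_units(s, encoding):
    """Return the code units of s as a list of ints for the given encoding."""
    if encoding == "utf-8":
        return list(s.encode("utf-8"))
    if encoding == "utf-16":
        data = s.encode("utf-16-le")  # little-endian, no byte order mark
        return [int.from_bytes(data[i:i + 2], "little")
                for i in range(0, len(data), 2)]
    raise ValueError("unsupported encoding: " + encoding)

def build_trie(cases, encoding):
    """Build a nested-dict trie; the key None marks the end of a case string."""
    root = {}
    for case_id, s in enumerate(cases):
        node = root
        for unit in code_units(s, encoding):
            node = node.setdefault(unit, {})
        node[None] = case_id
    return root

def lookup(trie, units):
    """Return the case id matching the code unit sequence, or None."""
    node = trie
    for unit in units:
        node = node.get(unit)
        if node is None:
            return None
    return node.get(None)

cases = ["\u00e9", "e"]  # e-acute and plain e
utf8_trie = build_trie(cases, "utf-8")
utf16_trie = build_trie(cases, "utf-16")
```

Looking up the UTF-8 units of a case string in utf8_trie finds the case,
while looking up the UTF-16 units of the same string in utf8_trie fails,
which is the mismatch the compiler avoids by knowing the target encoding
when it constructs the trie.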

> > > I've been considering re-writing the streams library, making Mercury always
> > > use one specific encoding, say UTF-8.  Then adding other encodings, including
> > > the host system's default encoding, as wrappers that take one stream type
> > > and return a new stream type.
> > 
> > That is a separate use case, one which can afford the cost of representation
> > conversions because it is already paying the cost of I/O. Switches don't have
> > that luxury. Tries have at most a minor performance advantage over hash switches.
> > If they can be made to work in a set of circumstances only by methods that
> > add enough overhead to make tries uncompetitive, then there is no point
> > in using those methods, even if they work.
> > 
> 
> I wouldn't suggest using anything like this at runtime, but perhaps it'd be
> useful at compile time, when constructing the trie.
> 
> If it is known that Mercury always uses utf8 for the string type, then both
> parameters and string constants in Mercury would be utf8.  The trie can be
> constructed in utf8 and encoding is not an issue at runtime.
> 
> However as Peter said Mercury sometimes uses utf16 for the string type, so
> the point is currently moot.

There is little or no benefit to using UTF-8 rather than UTF-16 internally
across all backends.  On the other hand there is a major benefit to aligning
the Mercury string representation with the target's "native" string type,
if it exists.

Peter


