[m-dev.] MSVC and --no-smart-indexing

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Feb 9 02:36:19 AEDT 2015



On Sun, 8 Feb 2015 23:34:00 +1100, Paul Bone <paul at bone.id.au> wrote:
> We disabled smart indexing for the source distribution after a problem was
> discovered the with the computed goto code on low level backends:
> 
>     http://www.mercurylang.org/list-archives/users/2010-June/005155.html
>     https://www.mercurylang.org/bugs/view.php?id=115
> 
> At the time we suspected that some code generation doesn't understand that
> the target only has 32 bits per word while the host has 64 bits per word.
> We could either fix this directly or re-enable smart indexing but cause
> --cross-compiling to disable computed gotos.

The former seems preferable.

I don't have any convenient way to test cross-compilation between
32 and 64 bit systems, but if someone does, I could generate diffs
for them to test.

> After searching ml_switch_gen.m and other files it doesn't appear if
> computed gotos are used at all in the high level backends.

Well, whether they are used or not depends on whether you consider
a C switch to use computed gotos. There is no computed goto emitted
by the Mercury compiler, but a C compiler may well use one to implement
the switch.

The problem may be that the hashing function may hash the same string
to different hash values on platforms with different word sizes. That would
mean that a .c file containing hash table constructed on one platform,
if compiler on another platform, have the right entries in the wrong slots.

This should be fixable by having the hash functions mask out bits
above e.g. 2^30 on all platforms.

> Can anyone think of any other problems with re-enabling smart
> indexing for the source distribution?

The above is one.

> Slightly off-topic but there is perhaps a better way to generate code for
> large string switches.  A string switch could be decomposed into a nested
> series of switches on each character (like a radix-search).

Those are called tries. Basically, when you are doing a switch on e.g.

     aaaa -> arm1
     abbb -> arm2
     bbb   -> arm3

you want a trie like this:

switch (str[0]) {
    case 'a':
        switch (str[1]) {
            case 'a':
                if (strcmp(str+2, "aa") == 0)
                    arm1
                else
                    fail;
                break;
            case 'b':
                if (strcmp(str+2, "bb") == 0)
                    arm2
                else
                    fail;
                break;
            default:
                fail;
        }
        break;
    case 'b':
        if (strcmp(str+1, "bb") == 0)
            arm3
        else
            fail;
        break;
    default:
        fail;
}

(In real life, you would need explicit bounds checks on the
lookups of str[N], and you would need engineering to reduce
the cost of those checks to the minimum.)

The interesting question is: how do you represent code such as
strcmp(str+N, "...") in the MLDS? You have to represent it somehow,
because doing a switch for every character EVEN AFTER you know
that there is only one possible match would be pretty slow.

If you materialize the first operand, then the memory allocation
required for that will destroy whatever efficiency you gained from
using a trie, and then some. However, if you don't, then
you need to add a new primitive operator representing
string-comparison-with-offset-for-first-arg. For just one
use-case, that seems a bit inelegant, and I don't know
whether that operation can be implemented EFFICIENTLY
in C#, Java, or Erlang. If you do know, please chime in.

> Of course this
> could also generate very large amounts of code which might cause different
> issues for C compilers.  Also has gperf been considered, either directly or
> by re-implementing it's algorithms in Mercury?

Considered and rejected, as too much work relative to the gain.

For us, gperf would be SO much more useful if its output was a raw table,
instead of C or C++ code. When I considered the question, I figured
I would wait until its developers provided that as an option, since that
would MUCH easier than the scraping the info we want out of C code text,
or reimplementing the algorithm. However, I just checked, and they
still don't have that capability, although it SHOULD be trivial for them
to provide it.

Of course, whatever Mercury does, the MSVC compiler should be
fixed to handle the code we now generate. Has anyone filed a bug
report about this? Or is this one of those things where MSVC's
behavior is broken-as-designed? It is ridiculous to have a compiler
maintained by a few people (mmc) being more robust in its handling
of large programs than a compiler maintained by a huge corporation.

Zoltan.





More information about the developers mailing list