[m-dev.] MSVC and --no-smart-indexing

Paul Bone paul at bone.id.au
Mon Feb 9 10:32:12 AEDT 2015


On Mon, Feb 09, 2015 at 02:36:19AM +1100, Zoltan Somogyi wrote:
> 
> 
> On Sun, 8 Feb 2015 23:34:00 +1100, Paul Bone <paul at bone.id.au> wrote:
> > We disabled smart indexing for the source distribution after a problem was
> > discovered with the computed-goto code on the low-level backends:
> > 
> >     http://www.mercurylang.org/list-archives/users/2010-June/005155.html
> >     https://www.mercurylang.org/bugs/view.php?id=115
> > 
> > At the time we suspected that some code generation doesn't understand that
> > the target only has 32 bits per word while the host has 64 bits per word.
> > We could either fix this directly or re-enable smart indexing but cause
> > --cross-compiling to disable computed gotos.
> 
> The former seems preferable.
> 
> I don't have any convenient way to test cross-compilation between
> 32 and 64 bit systems, but if someone does, I could generate diffs
> for them to test.

I can test this.

> > After searching ml_switch_gen.m and other files it doesn't appear that
> > computed gotos are used at all in the high-level backends.
> 
> Well, whether they are used or not depends on whether you consider
> a C switch to use computed gotos. There is no computed goto emitted
> by the Mercury compiler, but a C compiler may well use one to implement
> the switch.

For the purposes of my problem and goals I'm only thinking of computed gotos
emitted by the Mercury compiler.

> The problem may be that the hashing function hashes the same string
> to different hash values on platforms with different word sizes. That would
> mean that a .c file containing a hash table constructed on one platform,
> if compiled on another platform, would have the right entries in the wrong slots.
> 
> This should be fixable by having the hash functions mask out bits
> above e.g. 2^30 on all platforms.

That's easy enough to do, and except for the most extreme cases it won't
cause any significant issues.
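As a minimal sketch of the masking idea (this is illustrative only; the
multiplier, the function name, and the mask are assumptions, not Mercury's
actual hash function): doing the arithmetic in a fixed-width 32-bit type and
masking to 30 bits at each step makes the result identical on 32-bit and
64-bit hosts.

```c
#include <stdint.h>

/* Keep only the low 30 bits, as suggested above, so the value is the
 * same regardless of the host's word size.  (Hypothetical sketch.) */
#define HASH_MASK (((uint32_t)1 << 30) - 1)

uint32_t hash_string_30bit(const char *s)
{
    uint32_t h = 0;
    while (*s != '\0') {
        /* uint32_t arithmetic wraps identically on every platform;
         * the mask guarantees the result fits in 30 bits. */
        h = (h * 31u + (uint32_t)(unsigned char)*s) & HASH_MASK;
        s++;
    }
    return h;
}
```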

> > Can anyone think of any other problems with re-enabling smart
> > indexing for the source distribution?
> 
> The above is one.
> 
> > Slightly off-topic but there is perhaps a better way to generate code for
> > large string switches.  A string switch could be decomposed into a nested
> > series of switches on each character (like a radix-search).
> 
> Those are called tries. Basically, when you are doing a switch on e.g.
> 
>      aaaa -> arm1
>      abbb -> arm2
>      bbb   -> arm3
> 
> you want a trie like this:
> 
> switch (str[0]) {
>     case 'a':
>         switch (str[1]) {
>             case 'a':
>                 if (strcmp(str+2, "aa") == 0)
>                     arm1
>                 else
>                     fail;
>                 break;
>             case 'b':
>                 if (strcmp(str+2, "bb") == 0)
>                     arm2
>                 else
>                     fail;
>                 break;
>             default:
>                 fail;
>         }
>         break;
>     case 'b':
>         if (strcmp(str+1, "bb") == 0)
>             arm3
>         else
>             fail;
>         break;
>     default:
>         fail;
> }
> 
> (In real life, you would need explicit bounds checks on the
> lookups of str[N], and you would need engineering to reduce
> the cost of those checks to the minimum.)

Since strings are null-terminated, we should be able to handle this with
cases that detect the null byte.

> The interesting question is: how do you represent code such as
> strcmp(str+N, "...") in the MLDS? You have to represent it somehow,
> because doing a switch for every character EVEN AFTER you know
> that there is only one possible match would be pretty slow.

I considered implementing this as a HLDS-to-HLDS transformation; then it would
also work on all backends that don't provide their own switch on strings.
However, it will only work well if there are some string primitives that
return the Nth character without testing the length of the string on every
call, that return the "rest" of the string without copying it (we might have
this already), and that return the null byte at the end of the string so
that our switch code can test whether the byte is null.  Also, this won't
support Unicode without some extra effort.

Using cases for the null byte at the end of the string will allow us to
easily encode a switch over strings such as:

    aa
    ab
    a
    b
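Concretely, the generated trie for those four arms might look like the
following (a hypothetical sketch in the style of Zoltan's example above; the
function name and arm numbering are my own).  A `case '\0'` in each
character switch distinguishes a shorter key from a longer key with the same
prefix, with no separate bounds check:

```c
/* Returns the matching arm number for the keys "aa", "ab", "a", "b",
 * or -1 when no arm matches.  (Illustrative sketch only.) */
int match_arm(const char *str)
{
    switch (str[0]) {
    case 'a':
        switch (str[1]) {
        case 'a':
            return str[2] == '\0' ? 1 : -1;  /* "aa" */
        case 'b':
            return str[2] == '\0' ? 2 : -1;  /* "ab" */
        case '\0':
            return 3;                        /* "a", ended by NUL */
        default:
            return -1;
        }
    case 'b':
        return str[1] == '\0' ? 4 : -1;      /* "b" */
    default:
        return -1;
    }
}
```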

> 
> If you materialize the first operand, then the memory allocation
> required for that will destroy whatever efficiency you gained from
> using a trie, and then some. However, if you don't, then
> you need to add a new primitive operator representing
> string-comparison-with-offset-for-first-arg. For just one
> use-case, that seems a bit inelegant, and I don't know
> whether that operation can be implemented EFFICIENTLY
> in C#, Java, or Erlang. If you do know, please chime in.
>

I think that in Java this will always allocate new memory.

> > Of course this
> > could also generate very large amounts of code which might cause different
> > issues for C compilers.  Also has gperf been considered, either directly or
> > by re-implementing it's algorithms in Mercury?
> 
> Considered and rejected, as too much work relative to the gain.
> 
> For us, gperf would be SO much more useful if its output was a raw table,
> instead of C or C++ code. When I considered the question, I figured
> I would wait until its developers provided that as an option, since that
> would be MUCH easier than scraping the info we want out of C code text,
> or reimplementing the algorithm. However, I just checked, and they
> still don't have that capability, although it SHOULD be trivial for them
> to provide it.
> 
> Of course, whatever Mercury does, the MSVC compiler should be
> fixed to handle the code we now generate. Has anyone filed a bug
> report about this? Or is this one of those things where MSVC's
> behavior is broken-as-designed? It is ridiculous to have a compiler
> maintained by a few people (mmc) being more robust in its handling
> of large programs than a compiler maintained by a huge corporation.

It may have already been fixed in more recent versions of MSVC.  I don't
have access to these, so I don't know.  If not, then I would guess that they
would laugh at us for being so small but claiming that our compiler is more
robust.  That said, our compiler is probably exercised less than theirs.
Apart from G12 (or whatever it is these days) and PrinceXML, I don't think
Mercury has to deal with machine-generated code very often.

Anyway, I expect nothing to happen if I report this so I don't think it's
worth my time.

-- 
Paul Bone


