[m-dev.] a conditional field update operator

Zoltan Somogyi zoltan.somogyi at runbox.com
Wed Mar 8 17:53:30 AEDT 2017



On Wed, 8 Mar 2017 16:55:20 +1100, Paul Bone <paul at bone.id.au> wrote:
> > > Depending on the code that creates the new value (X), it might be more
> > > appropriate to use a normal equality test.  It probably makes sense to
> > > support both equality tests.
> > 
> > pointer_equal is a builtin that compares its operands as bit vectors;
> > basically, it checks if the update would yield a new structure that is
> > bit-identical to the old one. I don't see any situation in which it is *not*
> > a correct test to use. (For integer and boolean fields, we often use
> > New = Old as the test, but only because it is easier to type and to read;
> > it is semantically identical to the pointer_equal test.)
> 
> Yes, it is the most conservative test.
> 
> However an equality test will catch items that are semantically equal, even
> if not structurally or pointer equal, and avoid the memory allocation in
> those situations also.  This may be what people want, but possibly less
> often, although it might not be correct in some cases.

I thought you meant using normal equality tests only for atomic types,
the way we hand-implement conditional update in the Mercury compiler now.
I now see you are proposing it for nonatomic types as well.
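
For reference, the pattern we hand-write now looks roughly like this
(a sketch with made-up field and variable names, and assuming the builtin
is reached as private_builtin.pointer_equal):

    ( if private_builtin.pointer_equal(NewX, !.Info ^ some_field) then
        true                            % identical: keep the old cell
    else
        !Info ^ some_field := NewX      % different: allocate a new cell
    )

For an atomic field, the test would be NewX = !.Info ^ some_field instead.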

For nonatomic types, I am pretty sure it is a bad idea. Suppose you have a 1% chance
of accidentally using deep equality instead of pointer equality. Then it may be
that 0.5% of the time you compare two 100-byte structures for equality
instead of two pointers, and 0.5% of the time you compare two 10 megabyte
structures for equality instead of two pointers. The cost of that second 0.5%
will be *way* more than what could possibly be justified by the amount of
allocation being saved.

In general, the cost of an allocation is sort-of linear in the amount being
allocated, and therefore the fixed size of the cell that the update applies to
is a sort-of natural bound for the cost of allocation. On the other hand,
the cost of a deep equality test has *no* bound. Paying an unbounded cost
for a bounded gain is not a good general strategy.
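
To put some hypothetical numbers on that: saving the allocation of a small,
fixed-size cell saves at most some tens of cycles, while a deep equality
test that has to traverse two of the 10 megabyte structures from the
example above costs millions of cycles.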

> > The reasons why I am proposing a new operator are precisely that
> > 
> > - this is the nth time we have found a need for it in the compiler,
> >   for largish values of n, and
> > 
> > - I think it is likely that in many cases, it is only the size and (at the moment)
> >   relatively hard-to-read nature of the five-line version that prevents it
> >   from being used instead of the one-line, always-allocate-a-new-structure
> >   version. For example, the compiler has many passes that transform
> >   parts of the HLDS *slightly*, leaving many or most parts of it intact.
> >   If we had the new operator, we could use it to significantly reduce
> >   the amount of memory being allocated by such passes.
> 
> Mmm, true.  I hadn't considered the cases where I _don't_ use such a
> pattern, even though I could/should.

I intended my original mail specifically to ask you guys to think about
*potential future* uses of conditional update, not just actual, current uses.
I see I should have made that more explicit.

> > Is your argument that you think such "rebuild with some rare modifications"
> > operations are rare in general? Mike's previous mail seems to imply that
> > he thinks just the opposite.
> 
> I had _assumed_ that they were rare. But your above example of the HLDS
> reminded me of all the times I had done the same, and wished for something
> like this.  I now think that making it a symbolic operator such as :?= is
> best.

Good; then we are on the same wavelength.

> I assumed that Mike based his message off the idea that, at least in
> Mercury's C grades, a memory allocation is quite expensive while a branch is
> much cheaper.

On current (and foreseeable) architectures, the cost of a branch can range
from effectively zero, if the branch is perfectly predictable, to huge (several
tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
unpredictable, i.e. effectively random. Call these costs epsilon and huge
respectively. For a branch for which the CPU's branch predictor gets things
right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
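
For example, with epsilon at one cycle and huge at 30 cycles, a branch
that is predicted correctly 95% of the time costs on average
(95 * 1 + 5 * 30)/100 = 2.45 cycles, while one that is predicted correctly
only 60% of the time costs (60 * 1 + 40 * 30)/100 = 12.6 cycles.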

The value of N depends not just on the branch but on lots of other things,
because modern branch predictors take into account several aspects of the
history leading up to the execution of the branch when they make
their prediction. (For example, they can learn that this branch is very likely
to be taken if either of the previous two branches were not taken, if that
happens to be true.) However, they are tuned to exploit *local* history,
history within e.g. a C function or Java method, not the significantly less
regular cross-procedural-boundary history that would be useful for Mercury code.

There is a branch in GC_malloc, in the test for whether the freelist for
the selected block size is empty, but it is almost perfectly predictable.

The branch in conditional update, in cases where you want to use it,
will be significantly less predictable. (If it is perfectly predictable that
the new field value is not the same as the old, you would use unconditional
update; if it is perfectly predictable that the new value is the same as the old,
you won't have any update at all.)

The obvious extra cost of allocating a new structure even when you don't need to
is the work involved in allocating the structure and filling it in. However,
both are reasonably cheap as long as the filled-in structure stays in the cache.
The problem is that it almost certainly won't, because even if the new structure
becomes dead soon, GC is rare enough that it is very unlikely to come along
and collect the dead structure before it is flushed from the cache.

So basically, the tradeoff is between the two costliest operations
in modern CPUs: mispredicted branches and memory accesses.
Neither is obviously costlier than the other; which one is costlier
depends on the details.

By the way, Nick Nethercote wrote a paper a few years ago
in which he showed that counts of these two operations, mispredicted branches
and memory accesses, allowed him to predict the runtimes of a selected
set of Haskell programs with about 85% accuracy. One can take this
to mean that, to a first approximation, all other operations the program does
are effectively free.

> > If you make conditional-update use a procedure call as its syntax,
> > then either
> > 
> > - its implementation is also a procedure call, in which case
> >   it is unnecessarily slow; or
> > 
> > - its implementation is not a procedure call but inline code,
> >   it which case the compiler would have to have completely new code
> >   to make the transformation.
> 
> I no longer think this is a good idea.  But wouldn't it be recognised
> directly by the parser, like a reserved word?

Yes, but "parsing conditional update using procedure call syntax"
would still require more new code than "parsing conditional update
as a minor variation of unconditional update syntax", because
unlike the latter case, the former case would *not* be able to reuse
most of the existing code for parsing field updates. This is why I wrote
"completely new code" above.

> > Using the same overall syntax as !Info ^ field := X but with another
> > operator is easier for us as implementors. And for users, the easiest
> > way to make conditional-update easy to use, and to make switching
> > between always-update and conditional-update easy, is to make
> > conditional-update a one-character change from the existing := operator.
> 
> These are good goals.

It seems we are converging on consensus.

Zoltan.



