[m-dev.] a conditional field update operator

Paul Bone paul at bone.id.au
Thu Mar 9 10:55:32 AEDT 2017


On Wed, Mar 08, 2017 at 05:53:30PM +1100, Zoltan Somogyi wrote:
> 
> 
> On Wed, 8 Mar 2017 16:55:20 +1100, Paul Bone <paul at bone.id.au> wrote:
> > > > Depending on the code that creates the new value (X), it might be more
> > > > appropriate to use a normal equality test.  It probably makes sense to
> > > > support both equality tests.
> > > 
> > > pointer_equal is a builtin that compares its operands as bit vectors;
> > > basically, it checks if the update would yield a new structure that is
> > > bit-identical to the old one. I don't see any situation in which it is *not*
> > > a correct test to use. (For integer and boolean fields, we often use
> > > New = Old as the test, but only because it is easier to type and to read;
> > > it is semantically identical to the pointer_equal test.)
> > 
> > Yes, it is the most conservative test.
> > 
> > However, an equality test will catch items that are semantically equal,
> > even if not structurally or pointer equal, and avoid the memory allocation
> > in those situations too.  This may be what people want, though probably
> > less often, and in some cases it might not even be correct.
> 
> I thought you meant using normal equality tests only for atomic types,
> the way we hand-implement conditional update in the Mercury compiler now.
> I now see you are proposing it for nonatomic types as well.
> 
> For those, I am pretty sure it is a bad idea. Suppose you have a 1% chance
> of accidentally using deep equality instead of pointer equality. Then it may be
> that 0.5% of the time you compare two 100-byte structures for equality
> instead of two pointers, and 0.5% of the time you compare two 10 megabyte
> structures for equality instead of two pointers. The cost of that second 0.5%
> will be *way* more than what can be possibly justified by the amount of allocation
> being saved.

Perhaps I'm wrong, but I had assumed that a normal equality test (such as
unification) will first compare pointers, do the deep test only if the
pointers are unequal, short-circuit as soon as two fields are unequal, and
apply the same strategy recursively at every level.  If so, the actual /
amortised cost should be low or trivial.
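
To illustrate the kind of test I'm assuming (only a sketch, with a made-up
type; I haven't checked what the compiler actually generates for unification,
and I've guessed at the module qualification of pointer_equal):

    :- type example
        --->    example(
                    count   :: int,
                    name    :: string
                ).

        % The equality test I'm assuming: compare the pointers first, and
        % only fall back to comparing the fields (stopping at the first
        % unequal field) when the pointers differ.
    :- pred example_equal(example::in, example::in) is semidet.

    example_equal(A, B) :-
        ( if private_builtin.pointer_equal(A, B) then
            true
        else
            A = example(CountA, NameA),
            B = example(CountB, NameB),
            CountA = CountB,    % short-circuits here if the counts differ
            NameA = NameB       % full string comparison only if needed
        ).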

Either way, I don't think this should be the normal comparison used by a :?=
operator, but it might make sense to make it available to developers who want
it.  Of course, they could write it manually; I don't think it would be used
very often relative to your proposal with pointer equality.
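
For example, the manually written deep-equality version would be the same
shape as the pointer-equality one, just with a unification as the test (the
names here are made up, and this sits inside some clause that threads !Info):

    % Only allocate a new !:Info when the new value is semantically
    % different from the old one.
    OldValue = !.Info ^ some_field,
    ( if NewValue = OldValue then
        true
    else
        !Info ^ some_field := NewValue
    )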

> In general, the cost of an allocation is sort-of linear in the amount being
> allocated, and therefore the fixed size of the cell that the update applies to
> is a sort-of natural bound for the cost of allocation. On the other hand,
> the cost of a deep equality test HAS no bound. Paying an unbounded cost
> for a bounded gain is not a good general strategy.

Even if my assumption holds (that the cost is amortised, or that large
unbounded comparisons are pathological), it is still more difficult to
predict than the bounded cost of allocation.  I agree with you.

> > > The reasons why I am proposing a new operator is precisely because
> > > 
> > > - this is the nth time we have found a need for it in the compiler,
> > >   for largish values of n, and
> > > 
> > > - I think it is likely that in many cases, it is only the size and (at the moment)
> > >   relatively hard-to-read nature of the five-line version that prevents it
> > >   from being used instead of the one-line, always-allocate-a-new-structure
> > >   version. For example, the compiler has many passes that transform
> > >   parts of the HLDS *slightly*, leaving many or most parts of it intact.
> > >   If we had the new operator, we could use it to significantly reduce
> > >   the amount of memory being allocated by such passes.
> > 
> > Mmm, true.  I hadn't considered the cases where I _don't_ use such a
> > pattern, even though I could/should.
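
For concreteness, my understanding of the two versions being compared here,
with a made-up field name (and the same caveat as above about where
pointer_equal lives), is:

    % The one-line, always-allocate version:
    !Info ^ proc_body := NewBody

    % The roughly five-line conditional version we write by hand today:
    OldBody = !.Info ^ proc_body,
    ( if private_builtin.pointer_equal(NewBody, OldBody) then
        true
    else
        !Info ^ proc_body := NewBody
    )

    % What the proposed operator would reduce that to (exact spelling
    % still to be settled):
    !Info ^ proc_body :?= NewBody
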
> 
> I intended my original mail specifically to ask you guys to think about
> *potential future* uses of conditional update, not just actual, current uses.
> I see I should have made that more explicit.

I also thought that part of the intent was to ask what uses may occur in
closed code bases such as Prince.  The other developers will know more about
this than me, but AIUI there's less need for this in Prince than in the
Mercury compiler.  Also, when we manipulate tree-like structures (the DOM) we
sometimes do so using mutvars, because the structures contain cycles that are
easier to handle that way.

> > I assumed that Mike based his message off the idea that, at least in
> > Mercury's C grades, a memory allocation is quite expensive while a branch is
> > much cheaper.
> 
> On current (and foreseeable) architectures, the cost of a branch can range
> from effective zero, if the branch is perfectly predictable, to huge (several
> tens of cycles, maybe more than a hundred cycles) if the branch is perfectly
> unpredictable, i.e. effectively random. Call these costs epsilon and huge
> respectively. For a branch for which the CPU's branch predictor gets things
> right N% of the time, its average cost will be (N * epsilon + (100-N) * huge)/100.
> 
> The value of N depends not just on the branch but on lots of other things,
> because modern branch predictors take several aspects of the history
> leading up to the execution of the branch into account when they make
> their prediction. (For example, they can learn that this branch is very likely
> to be taken if either of the previous two branches were not taken, if that
> happens to be true.) However, they are tuned to exploit *local* history,
> history within e.g. a C function or Java method, not the significantly less
> regular cross-procedural-boundary history that would be useful for Mercury code.

Each additional branch also affects the predictability of other branches,
since it may cause their entries to be evicted from the branch predictor's
cache.
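
To put rough numbers on that formula (purely illustrative, taking epsilon as
about 0 cycles and huge as about 50 cycles from your description above):

    N = 99:  (99 * 0 +  1 * 50) / 100 = 0.5 cycles on average
    N = 90:  (90 * 0 + 10 * 50) / 100 = 5 cycles
    N = 60:  (60 * 0 + 40 * 50) / 100 = 20 cycles

The average cost grows quickly once the branch stops being almost perfectly
predictable.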

> There is a branch in GC_malloc, in the test for whether the freelist for
> the selected block size is empty, but it is almost perfectly predictable.
> 
> The branch in conditional update, in cases where you want to use it,
> will be significantly less predictable. (If it is perfectly predictable that
> the new field value is not the same as the old, you would use unconditional
> update; if it is perfectly predictable that the new value is the same as the old,
> you won't have any update at all.)
> 
> The obvious extra cost of allocating a new structure even when you don't need to
> is the work involved in allocating the structure and filling it in. However,
> both are reasonably cheap as long as the filled-in structure stays in the cache.
> The problem is that almost certainly it won't, because even if the new structure
> becomes dead soon, gc is rare enough that it is very unlikely to come along
> and collect the dead structure before it is flushed from the cache.
> 
> So basically, the tradeoff is between the two costliest operations
> in modern CPUs: mispredicted branches and memory accesses.
> Neither is obviously costlier than the other; which one is costlier
> depends on the details.

That makes sense.

Maybe I'm biased against Boehm GC, knowing the other problems that it has
created for us, particularly poor multi-threaded performance.

From what I actually know about Boehm GC, checking the free lists should be a
few (dependent) memory reads and a well-predicted branch on the fast path.
It's important that they are dependent reads (it follows pointers), so it may
create stalls.  Each free list is populated by lazily scanning a memory
block, and depending on the utilisation of that block, this can add a varying
number of items to the list.  Say we have 16-byte objects and 4096-byte pages
that are half full: that would add 128 items to the free list (4096 / 16 =
256 slots, half of them free; I haven't accounted for the page header).  The
branch will then be mis-predicted at least once every 128 allocations, but
since that's exactly when it executes the slow path anyway, it might not be a
problem.  The slow path does a lot more work: it must synchronise with other
threads, get a page, sweep the page, and build the free lists.  That cost is
amortised, but it's difficult to say how well.

To summarise: exactly what this cost is, and how it trades off against a
mis-predicted branch, is an open question; we'd need to measure it.

> By the way, Nick Nethercote has written a paper a few years ago
> where he showed that counts of these two operations, mispredicted branches
> and memory accesses, allowed him to predict the runtimes of a selected
> set of Haskell programs with about 85% accuracy. One can take this
> to mean that to a first approximation, all other operations the program does
> are effectively free.

That's been my understanding for a while, but I wasn't aware of Nick's
paper.  Is this the one? http://dotat.at/tmp/msp02.pdf

> > > If you make conditional-update use a procedure call as its syntax,
> > > then either
> > > 
> > > - its implementation is also a procedure call, in which case
> > >   it is unnecessarily slow; or
> > > 
> > > - its implementation is not a procedure call but inline code,
> > >   in which case the compiler would have to have completely new code
> > >   to make the transformation.
> > 
> > I no longer think this is a good idea.  But wouldn't it be recognised
> > directly by the parser, like a reserved word?
> 
> Yes, but "parsing conditional update using procedure call syntax"
> would still require more new code than "parsing conditional update
> as a minor variation of unconditional update syntax", because
> unlike the latter case, the former case would NOT be able to reuse
> most of the existing code for parsing field updates. This is why I wrote
> "completely new code" above.

Ah, okay.

> > > Using the same overall syntax as !Info ^ field := X but with another
> > > operator is easier for us as implementors. And for users, the easiest
> > > way to make conditional-update easy to use, and to make switching
> > > from always-update to conditional-update, or vice versa, is to make
> > > conditional-update a one-character change from the existing := operator.
> > 
> > These are good goals.
> 
> It seems we are converging on consensus.

Yes, I think so.  My tangents are mostly just that: tangents that may be
interesting but don't really affect this design decision.


-- 
Paul Bone
http://paul.bone.id.au

