[m-dev.] [Mercury-Language/mercury] Segfault in only Mercury after solutions in a particular code path (#72)

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Aug 26 13:36:33 AEST 2019



On Mon, 26 Aug 2019 10:59:40 +1000 (AEST), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote: 
> The bug seems to be in simplification, as shown by the attached HLDS dumps.
> It is moving the assignment to R to *before* its value is determined.
> 
> I will take a look at it.

I have taken a look at it, and simplificiation is not to blame.
It is not the cause of the bug: it only reveals the bug. The true cause
of the bug is a fundamental problem in the interaction between
type-specific unification predicates when operating on partially instantiated
data structures, and the direct arg optimization.

The relevant part of the hlc code we generate when we compile (a slightly
modified copy of) the bug72 test case with inlining looks like this:

main_2_p_0(void)
{
  {
    MR_bool succeeded;
    MR_Word R_4 = (MR_Word) (MR_mkword(MR_mktag(1), (MR_Word) (NULL)));
    MR_Word S_6 = (MR_Word) (MR_mkword(MR_mktag(1), &bug72a_scalar_common_1[1]));
    MR_Word Var_9;
    MR_Word Var_13;
    MR_Word ArgX1_21;

    succeeded = (S_6 != (MR_Word) ((MR_Unsigned) 0U));
    if (succeeded)
    {
      Var_13 = ((MR_Word) ((MR_hl_field(MR_mktag(1), S_6, (MR_Integer) 0))));
      Var_9 = ((MR_Word) ((MR_hl_field(MR_mktag(1), S_6, (MR_Integer) 1))));
      succeeded = ((MR_tag((MR_Word) Var_13)) == (MR_Integer) 1);
      if (succeeded)
      {
        ArgX1_21 = (MR_Word) (MR_body((MR_Word) (Var_13), (MR_Integer) 1));
        R_4 = (MR_Word) (MR_mkword(MR_mktag(1), (MR_Word) (ArgX1_21)));
        succeeded = (Var_9 == (MR_Word) ((MR_Unsigned) 0U));
      }
    }
    ...

This code works.

With --no-inlining, the corresponding code we get is

main_2_p_0(void)
{
  {
    MR_bool succeeded;
    MR_Word R_4 = (MR_Word) (MR_mkword(MR_mktag(1), (MR_Word) (NULL)));
    MR_Word S_6;
    MR_Word Var_9;
    MR_Word Var_13;

    bug72a__sols_1_p_0(&S_6);
    succeeded = (S_6 != (MR_Word) ((MR_Unsigned) 0U));
    if (succeeded)
    {
      Var_13 = ((MR_Word) ((MR_hl_field(MR_mktag(1), S_6, (MR_Integer) 0))));
      Var_9 = ((MR_Word) ((MR_hl_field(MR_mktag(1), S_6, (MR_Integer) 1))));
      succeeded = bug72a____Unify____maybe_reviewed_0_1(R_4, Var_13);
      if (succeeded)
        succeeded = (Var_9 == (MR_Word) ((MR_Unsigned) 0U));
    }
    ...

with the code of the unify function being

bug72a____Unify____maybe_reviewed_0_1(
  MR_Word HeadVar__1_1,
  MR_Word HeadVar__2_2)
{
  {
    MR_bool succeeded = ((MR_tag((MR_Word) HeadVar__2_2)) == (MR_Integer) 1);
    MR_Word ArgX1_5;

    if (succeeded)
    {
      ArgX1_5 = (MR_Word) (MR_body((MR_Word) (HeadVar__2_2), (MR_Integer) 1));
      HeadVar__1_1 = (MR_Word) (MR_mkword(MR_mktag(1), (MR_Word) (ArgX1_5)));
      succeeded = MR_TRUE;
    }
    return succeeded;
  }
}

This is exactly the same code as one would expect from the inlined version,
with some cosmetic differences arising from differences in variable numbering,
and from MLDS optimizations merging successive "if (succeeded) ..." statements.

The root of the problem is the parameter passing. In the inlined version,
when we fill in the argument of R_4, we update R_4. In the non-inlined version,
when we fill in the argument of HeadVar__1, for which main passes R_4,
we update HeadVar__1, but do NOT update R_4 in main. Normally, this is
not a problem: the filled in argument is normally on the heap, and the caller's
pointer also points to it, so the caller also sees the field as being filled in.
However, with the direct arg optimization, the field being filled in is NOT
on the heap; it is in the pointer, next to the primary tag. It is the lack of
any update to this field in the caller's R_4 that leaves the value of R_4 as a
tagged NULL pointer, whose dereferencing leads to the crash.

I see two obvious approaches to fixing this. The quick-and-dirty fix would be
to insist that the calls to unify predicates that have this problem (i.e. they are
for a type with one or more direct arg functors, and some args of those direct-arg
functors are initially free) be inlined even in the presence of --no-inlining.
The other would be to modify the parameter passing conventions for such
unifications, which would require not just nontrivial changes to the code
that handles both the caller and the callee sides of such calls, but probably
also significant changes in all the code generators. Currently all code generators
assume that a call's outputs define *new* variables; making them handle situations
in which a call has one or two outputs that update *existing* variables would also
not be trivial.

On that basis, the first approach would seem more pragmatic, even though
conceptually it is far from satisfying.

Opinions?

Zoltan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug72a.m
Type: text/x-objcsrc
Size: 1162 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/developers/attachments/20190826/76f90b76/attachment.bin>


More information about the developers mailing list