Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Join Full Outer Join with filter is not filtering on some equality clause matches #11599

Open
DanielHunte opened this issue Nov 19, 2024 · 5 comments · May be fixed by #11068
Open

Merge Join Full Outer Join with filter is not filtering on some equality clause matches #11599

DanielHunte opened this issue Nov 19, 2024 · 5 comments · May be fixed by #11068
Assignees
Labels
bug Something isn't working merge-join triage Newly created issue that needs attention.

Comments

@DanielHunte
Copy link

DanielHunte commented Nov 19, 2024

Bug description

Full Outer Joins results are expected to have unmatched rows from either table with nulls for the other table's columns. There is a bug in Velox's implementation where when x rows (x >= 2) match on the equality clause (ex. t0 = u0) but don't pass the filter (ex. t0 > 4), for some reason x - 1 rows are still considered a match and therefore results in missing rows in the output. See the following example.

    SELECT
        *
    FROM (
        VALUES (1), (2), (3)
    ) t(t0)
),
u AS (
    SELECT
        *
    FROM (
        VALUES (2), (3), (4)
    ) t(u0)
)
SELECT
    *
FROM t
FULL OUTER JOIN u
    ON t.t0 = u.u0
    AND t.t0 > 4;

Nothing should match since nothing passes the filter. This query should produce the table:

  t0 | u0
   1 | NULL
   2 | NULL
NULL | 2
   3 | NULL
NULL | 3
NULL | 4

However Velox thinks there is a match on 2 and produces the table:

  t0 | u0
   1 | NULL
   2 | 2
   3 | NULL
NULL | 3
NULL | 4

2 | 2 is an extra row and 2 | NULL and NULL | 2 are missing rows.

Here is the unit test I made within MergeJoinTest.cpp:

TEST_F(MergeJoinTest, fullOuterJoinFilteredOutMatches) {
  auto left = makeRowVector({"t0"}, {makeFlatVector<int64_t>({1, 2, 3})});
  auto right = makeRowVector({"u0"}, {makeFlatVector<int64_t>({2, 3, 4})});
  createDuckDbTable("t", {left});
  createDuckDbTable("u", {right});

  // Full outer join.
  auto planNodeIdGenerator = std::make_shared<core::PlanNodeIdGenerator>();
  auto plan =
      PlanBuilder(planNodeIdGenerator)
          .values({left})
          .mergeJoin(
              {"t0"},
              {"u0"},
              PlanBuilder(planNodeIdGenerator).values({right}).planNode(),
              "t0 > 4",
              {"t0", "u0"},
              core::JoinType::kFull)
          .planNode();

  AssertQueryBuilder(plan, duckDbQueryRunner_)
      .assertResults(
          "SELECT * FROM t FULL OUTER JOIN u ON t.t0 = u.u0 AND t.t0 > 4");
}

System information

Wrote unit test on my linux dev server.

Relevant logs

No response

@DanielHunte DanielHunte added bug Something isn't working triage Newly created issue that needs attention. labels Nov 19, 2024
@DanielHunte DanielHunte self-assigned this Nov 19, 2024
@Yuhta
Copy link
Contributor

Yuhta commented Nov 19, 2024

CC: @pedroerp

@kagamiori
Copy link
Contributor

@DanielHunte Could you also paste the Velox unit test of merge join you created here?

@pedroerp
Copy link
Contributor

I think this PR addresses this issue (still under review):

#11068

Cc: @JkSelf

@DanielHunte
Copy link
Author

@DanielHunte Could you also paste the Velox unit test of merge join you created here?

Sure. I added it to the end of the description.

@JkSelf
Copy link
Collaborator

JkSelf commented Nov 21, 2024

It seems #11068 can fix this issue. I will follow up to resolve @pedroerp's comments. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working merge-join triage Newly created issue that needs attention.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants