Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: validating FK constraint uses extremely slow hash join #33452

Closed
thoszhang opened this issue Jan 3, 2019 · 1 comment
Closed

sql: validating FK constraint uses extremely slow hash join #33452

thoszhang opened this issue Jan 3, 2019 · 1 comment
Assignees
Labels
A-schema-changes C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@thoszhang
Copy link
Contributor

Currently VALIDATE CONSTRAINT for foreign keys performs a hash join on the source and target columns, which can be extremely slow. SCRUB has a much faster implementation of fk validation that uses a merge join: I ran the SCRUB fk validation on a tpcc-100 table on a roachprod cluster, and it only took a few seconds, compared to >1 hour (before I killed it) for the query that would be used in VALIDATE CONSTRAINT. We should try to reimplement VALIDATE CONSTRAINT to use the same approach as SCRUB.

There's no straightforward way to do this via SQL because null values need to be handled correctly, which requires an ON clause that contains a bunch of a=b OR (a IS NULL AND b IS NULL) clauses (as in the current implementation, https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/check.go#L115-L121). The way SCRUB gets around this is to create a plan that uses a merge join, and then have a separate step to set NullEquality to true on all MergeJoinerSpecs in the plan.

This will partially address #32118, but the unrelated issue of long transactions restarting still needs to be fixed.

@thoszhang thoszhang added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-schema-changes labels Jan 3, 2019
@thoszhang thoszhang self-assigned this Jan 3, 2019
@thoszhang
Copy link
Contributor Author

Some of the above no longer applies after the new changes to foreign key matching (#20305). We need to rewrite the FK validation query to conform to the new matching semantics.

I believe that the correct queries will not require clauses of the form a=b OR (a IS NULL AND b IS NULL), so they will automatically use a merge join without further intervention.

craig bot pushed a commit that referenced this issue Feb 5, 2019
34365: sql: fix FK validation join implementation r=lucy-zhang a=lucy-zhang

This PR updates the SQL query used for VALIDATE CONSTRAINT for foreign keys.
The new implementation is compatible with the recent changes to FK matching
semantics (both MATCH FULL and MATCH SIMPLE). It also uses a merge join, which
will improve performance significantly compared to the old hash join
implementation.

Release note (sql change): VALIDATE CONSTRAINT for foreign keys is now
compatible with the new MATCH FULL and MATCH SIMPLE semantics, and is more
performant.

Fixes #33452

Co-authored-by: Lucy Zhang <[email protected]>
@craig craig bot closed this as completed in #34365 Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

1 participant