-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: fix bug in poller that allows it to emit a resolved ts before emitting row updates at the ts #41665
changefeedccl: fix bug in poller that allows it to emit a resolved ts before emitting row updates at the ts #41665
Conversation
7fac9e0
to
d8029c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look forward to the nemesis test which exercises this code path from existing. Can we at least add a small unit test that asserts that we do not emit a resolved timestamp at that scan boundary (we always do before this change). Perhaps it just requires adding an assertion to an existing test.
Reviewed 4 of 4 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @danhhz)
d8029c2
to
ea0ebbf
Compare
FYI I'm looking at this, but it's taking a sec to get the scanBoundaries stuff loaded up again |
Wouldn't the assertion in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @ajwerner)
pkg/ccl/changefeedccl/changefeed.go, line 173 at r1 (raw file):
// that have timestamps less than or equal to any resolved timestamp its forwarded before. if !sf.Frontier().Less(row.updated) { return errors.AssertionFailedf("error: detected timestamp %s that is less than "+
I'm 100% in favor of making this error instead of skip in 20.1 early, but I'm against backporting it at this point. Too risky. I'm not convinced that every edge case is handled for us to get this strong of a guarantee and if this error fires, we'll stop the changefeed. This means it should get its own PR (or at least its own commit)
pkg/ccl/changefeedccl/poller.go, line 269 at r1 (raw file):
// Make sure scan boundaries less than or equal to `resolvedTS` were // added to the `scanBoundaries` list before proceeding. if err := p.tableHist.WaitForTS(ctx, resolvedTS); err != nil {
good catch
pkg/ccl/changefeedccl/poller.go, line 283 at r1 (raw file):
// to trigger the scan first before resolving its scan boundary // timestamp. frontier.Forward(e.resolved.Span, resolvedTS)
it's strange to me that the frontier doesn't match what we've sent to AddResolved. instead of moving the AddResolved call to an else, can we replace the resolvedTS = p.mu.scanBoundaries[0]
line with resolvedTS = p.mu.scanBoundaries[0].Prev()
?
Not for this PR of course, but this whole poller/tableHistory thing is in serious need of some better abstractions. I can see where I was going with it, but the they really didn't hold up as we found things I didn't initially think of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming my suggestion below works
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @ajwerner)
pkg/ccl/changefeedccl/poller.go, line 283 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
it's strange to me that the frontier doesn't match what we've sent to AddResolved. instead of moving the AddResolved call to an else, can we replace the
resolvedTS = p.mu.scanBoundaries[0]
line withresolvedTS = p.mu.scanBoundaries[0].Prev()
?
Hmm, on second thought I'm coming around to what you have here. My suggestion would basically guarantee that we get strange looking resolved timestamps, which seems likely to cause some UX confusion
Can we do both? how about we both skip sending the resolved timestamp as you have here, and also stick a resolvedTs = resolvedTs.Prev()
at the beginning of the if boundaryBreak {
branch with a comment. Something like "A boundary means we're about to do a backfill at this timestamp, so at the changefeed level the boundary itself is not resolved."
ts before emitting row updates at said ts. Fixes: cockroachdb#41415 Currently, the poller keeps track of the most recent table descriptor version for every table being watched by it. It polls to get the current table descriptor periodically, and triggers a backfill if it detects that the previous version of the table descriptor has a mutation but the current one doesn't. It triggers the aforementioned backfill scan at the `ModificationTime` of the new table descriptor (meaning that the row updates resulting from this backfill scan will have the `ModificationTime` of the of the table descriptor as their `updated` timestamps). Now, it is possible that the poller forwarded a resolved timestamp up to the `changeAggregator` that is equal to or greater than this `ModificationTime` *before* the backfill scan was actually triggered. This PR fixes this problem by first making sure that the concerned goroutine waits until the backfill scan boundary has been detected and then skips forwarding all resolved timestamps until the scan boundary is executed upon. Release note (changefeedccl): Fix bug in poller in case of schema change. Release justification: Fix bug in poller that causes it to emit row updates at a timestamp less than or equal to an already forwarded resolved timestamp.
ea0ebbf
to
cd19443
Compare
pkg/ccl/changefeedccl/changefeed.go, line 173 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Done. It now just logs an error message and the tests simply ensure that this message wasn't logged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)
pkg/ccl/changefeedccl/poller.go, line 283 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Hmm, on second thought I'm coming around to what you have here. My suggestion would basically guarantee that we get strange looking resolved timestamps, which seems likely to cause some UX confusion
Can we do both? how about we both skip sending the resolved timestamp as you have here, and also stick a
resolvedTs = resolvedTs.Prev()
at the beginning of theif boundaryBreak {
branch with a comment. Something like "A boundary means we're about to do a backfill at this timestamp, so at the changefeed level the boundary itself is not resolved."
Done.
This is ready for a final look. |
Thanks for the quick reviews! bors r+ |
Build failed |
This seems like a flake, trying again. bors r+ |
bors r+ |
41665: changefeedccl: fix bug in poller that allows it to emit a resolved ts before emitting row updates at the ts r=ajwerner a=aayushshah15 Fixes: #41415 Currently, the poller keeps track of the most recent table descriptor version for every table being watched by it. It polls to get the current table descriptor periodically, and triggers a backfill if it detects that the previous version of the table descriptor has a mutation but the current one doesn't. It triggers the aforementioned backfill scan at the `ModificationTime` of the new table descriptor (meaning that the row updates resulting from this backfill scan will have the `ModificationTime` of the of the table descriptor as their `updated` timestamps). Now, it is possible that the poller forwarded a resolved timestamp up to the `changeAggregator` that is equal to or greater than this `ModificationTime` *before* the backfill scan was actually triggered. This PR fixes this problem by first making sure that the concerned goroutine waits until the backfill scan boundary has been detected and then skips forwarding all resolved timestamps until the scan boundary is executed upon. Release note (changefeedccl): Fix bug in poller in case of schema change. Release justification: Fix bug in poller that causes it to emit row updates at a timestamp less than or equal to an already forwarded resolved timestamp. Co-authored-by: Aayush Shah <[email protected]>
Build succeeded |
Fixes: #41415
Currently, the poller keeps track of the most recent table descriptor
version for every table being watched by it. It polls to get the current
table descriptor periodically, and triggers a backfill if it detects
that the previous version of the table descriptor has a mutation but
the current one doesn't.
It triggers the aforementioned backfill scan at the
ModificationTime
of the new table descriptor (meaning that the row updates resulting from
this backfill scan will have the
ModificationTime
of the of the tabledescriptor as their
updated
timestamps). Now, it is possible that thepoller forwarded a resolved timestamp up to the
changeAggregator
thatis equal to or greater than this
ModificationTime
before thebackfill scan was actually triggered.
This PR fixes this problem by first making sure that the concerned
goroutine waits until the backfill scan boundary has been detected and
then skips forwarding all resolved timestamps until the scan boundary is
executed upon.
Release note (changefeedccl): Fix bug in poller in case of schema change.
Release justification: Fix bug in poller that causes it to emit row
updates at a timestamp less than or equal to an already forwarded
resolved timestamp.