Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/changefeedccl: respect filesystem walk errors in cloudFeed #43006

Merged
merged 1 commit into from
Dec 6, 2019

Conversation

nvanbenschoten
Copy link
Member

Closes #42979.

This commit properly handles errors in cloudFeed's filepath.WalkFunc
implementation. The documentation says:

If there was a problem walking to the file or directory named by
path, the incoming error will describe the problem and the function
can decide how to handle that error (and Walk will not descend into
that directory). If an error is returned, processing stops.

I stressed the test for 10,000 iterations and never saw anything, so it's
possible that this was a fluke. It's not clear what filesystem error this was
throwing so we might still see this pop up again, but at least we'll now
correctly propagate the error and surface it instead of hitting an NPE.
Because of that, I'm closing the issue for now.

Release note: None

@nvanbenschoten nvanbenschoten requested a review from danhhz December 5, 2019 22:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@danhhz
Copy link
Contributor

danhhz commented Dec 5, 2019

:lgtm:

This happens when the file disappears between getting the filenames in the directory and lstat-ing the info to pass it to walkFn. In our tests, this happens because export storage first writes the file under a temporary filename and then renames it to the final path. Previously, the temporary file was written in the system temp, but as of a recent change, it's written in the final dir. This was causing issues, so this morning knz changed the temp files to have a ".tmp" suffix which this walkFn already ignores. If you rebased this commit back to just before #42987, then I bet it would fail under stress.

@nvanbenschoten
Copy link
Member Author

Ah, I missed that fly by. This still seems like a nice complementary solution. TFTR.

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 5, 2019

Build failed (retrying...)

@otan
Copy link
Contributor

otan commented Dec 5, 2019

i think this broke something (/ hid something that is broken):

[run] {"Time":"2019-12-05T18:17:11.027965988-05:00","Action":"output","Package":"github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl","Test":"TestChangefeedNemeses/cloudstorage","Output":"        nemeses_test.go:38: lstat /tmp/TestChangefeedNemeses_cloudstorage598908568/0/2019-12-05/201912052317095153334660000000001-ee65cbd3711bd858-1-20-00000004-foo-12.ndjson518464510.tmp: no such file or directory\n"}

https://teamcity.cockroachdb.com/viewLog.html?tab=buildLog&logTab=tree&filter=debug&expand=all&buildId=1629211&_focus=11613

@nvanbenschoten
Copy link
Member Author

bors r-

@craig
Copy link
Contributor

craig bot commented Dec 5, 2019

Canceled

Closes cockroachdb#42979.

This commit properly handles errors in `cloudFeed`'s `filepath.WalkFunc`
implementation. The documentation says:
> If there was a problem walking to the file or directory named by
> path, the incoming error will describe the problem and the function
> can decide how to handle that error (and Walk will not descend into
> that directory). If an error is returned, processing stops.

I stressed the test for 10,000 iterations and never saw anything, so it's
possible that this was a fluke. It's not clear what filesystem error this was
throwing so we might still see this pop up again, but at least we'll now
correctly propagate the error and surface it instead of hitting an NPE.
Because of that, I'm closing the issue for now.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fix42979 branch from 6b68391 to 0104746 Compare December 5, 2019 23:35
@nvanbenschoten
Copy link
Member Author

Yeah, I think this hit the error case Dan mentioned and broke

a ".tmp" suffix which this walkFn already ignores

The error handling code should have fired only after checking for and ignoring .tmp files. Sorry for the disruption. Done.

bors r+

craig bot pushed a commit that referenced this pull request Dec 6, 2019
42952: pgdate: fix parsing of dates before unix epoch r=otan a=otan

Resolves #42937.

Previously, pgdate handling dates before the unix epoch with time
attached to it rounded up instead of rounding down, meaning times such
as '1969-12-30 01:00:00' rounded up to days = `-1`, when it should be
days = `-2`. This PR addresses that change.

I will probably look at backporting this.

Release note (bug fix): We did not previously handled date casts from
timestamp/timestamptz with time attached to it for times before the unix
epoch correctly. For example, '1969-12-30 01:00:00'::timestamp would
round to '1969-12-31' instead of '1969-12-30'. This PR addresses that
change.

43006: ccl/changefeedccl: respect filesystem walk errors in cloudFeed r=nvanbenschoten a=nvanbenschoten

Closes #42979.

This commit properly handles errors in `cloudFeed`'s `filepath.WalkFunc`
implementation. The documentation says:
> If there was a problem walking to the file or directory named by
> path, the incoming error will describe the problem and the function
> can decide how to handle that error (and Walk will not descend into
> that directory). If an error is returned, processing stops.

I stressed the test for 10,000 iterations and never saw anything, so it's
possible that this was a fluke. It's not clear what filesystem error this was
throwing so we might still see this pop up again, but at least we'll now
correctly propagate the error and surface it instead of hitting an NPE.
Because of that, I'm closing the issue for now.

Release note: None

43009: row: move cfetcher initialization validation steps outside of loop r=yuzefovich a=rohany

Some initialization checks that the cfetcher performed were occuring
in a loop for no reason. This PR moves them out of the loop.

Release note: None

43010: cmd/roachtest: only log sqlsmith on non-errors r=mjibson a=mjibson

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
@craig
Copy link
Contributor

craig bot commented Dec 6, 2019

Build succeeded

@craig craig bot merged commit 0104746 into cockroachdb:master Dec 6, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/fix42979 branch December 27, 2019 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ccl/changefeedccl: TestChangefeedDiff failed
4 participants