changefeedccl: improve performance of gzip, and add zstd compression #88635
Conversation
Reviewed 6 of 6 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna)
Force-pushed from 5a9c603 to 569a1c4
LGTM.
pkg/ccl/changefeedccl/compresion.go (Outdated)
case sinkCompressionZstd:
	return zstd.NewWriter(dest, zstd.WithEncoderLevel(zstd.SpeedFastest))
default:
	return nil, errors.AssertionFailedf("unsupported encoder algorithm %q", algo)
Perhaps?
Suggested change:
- return nil, errors.AssertionFailedf("unsupported encoder algorithm %q", algo)
+ return nil, errors.AssertionFailedf("unsupported compression algorithm %q", algo)
Ack.
cc @nollenr
Force-pushed from 116c20f to c3dd28b
pkg/ccl/changefeedccl/compresion.go (Outdated)
@@ -0,0 +1,70 @@
// Copyright 2022 The Cockroach Authors.
Might want to correct the spelling on the file name.
Ohh.... Ooops... Or .. maybe it was compres'd?
Expand the set of supported compression algorithms in changefeed. A faster implementation of the gzip algorithm is available and is used by default. The gzip implementation can be reverted to the Go standard library gzip via the `changefeed.fast_gzip.enabled` setting. In addition, add support for compressing files with zstd.

Release note (enterprise change): Changefeed can emit files compressed with the zstd algorithm, which provides good compression and is much faster than gzip. In addition, a new, faster implementation of gzip is used by default.
bors r+
Build failed (retrying...):
Build failed (retrying...):
Build succeeded:
Backported to v22.2.1 via #91002
Ensure resources acquired by cloud storage files are released when the sink is closed.

As of cockroachdb#88635, cloud storage uses a faster implementation of the gzip compression algorithm (along with zstd). This new implementation is sufficiently different from the standard gzip implementation in that it requires the compression codec to be closed, even when the caller is terminating. Failure to do so results in memory as well as goroutine leakage. This leakage may become quite noticeable if the changefeed experiences many repeated errors.

This PR modifies the Close() call to make sure that the underlying compression codecs are also closed. (Note: we rely on the high-level logic in distSQL to ensure that the processor gets shut down in an orderly way, and that the shutdown code calls the Close() method; however, there still exists a possibility that the shutdown might not be orderly, and in those cases resource leakage may still occur. This possibility will need to be revisited in a follow-on PR.)

Fixes cockroachdb#106774

Release note (enterprise change): Fix an issue where changefeeds emitting to a cloud sink with compression may experience resource leakage (memory and goroutines) when experiencing transient errors.
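For illustration only, here is a minimal Go sketch of the Close() ordering the fix describes: close the compression codec first (releasing its buffers and background goroutines) and then the underlying destination. The `compressedFile` type and its field names are assumptions made for this sketch, not the PR's actual types.

```go
package sketch

import (
	"io"

	"github.com/cockroachdb/errors"
)

// compressedFile is an illustrative wrapper pairing a cloud storage file
// with an optional compression codec (e.g. a gzip or zstd writer).
type compressedFile struct {
	codec io.WriteCloser // nil when compression is disabled
	dest  io.WriteCloser // underlying cloud storage file
}

// Close releases the compression codec first, so that its buffers and
// background goroutines are freed even when the caller is abandoning the
// file because of an error, and then closes the underlying destination.
func (f *compressedFile) Close() error {
	var codecErr error
	if f.codec != nil {
		codecErr = f.codec.Close()
	}
	return errors.CombineErrors(codecErr, f.dest.Close())
}
```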
106786: util/log: allow custom crash report tags r=pjtatlow a=pjtatlow

Today it can be difficult to trace a sentry event back to the CC cluster where it originated, especially for serverless clusters. This change enables a new environment variable (COCKROACH_CRASH_REPORT_TAGS), which allows the database operator to provide additional information that will be included in the sentry event.

Release note: None

106795: changefeedccl: Cleanup resources when closing file r=miretskiy a=miretskiy

Ensure resources acquired by cloud storage files are released when the sink is closed. As of #88635, cloud storage uses a faster implementation of the gzip compression algorithm (along with zstd). This new implementation is sufficiently different from the standard gzip implementation in that it requires the compression codec to be closed, even when the caller is terminating. Failure to do so results in memory as well as goroutine leakage. This leakage may become quite noticeable if the changefeed experiences many repeated errors. This PR modifies the Close() call to make sure that the underlying compression codecs are also closed. (Note: we rely on the high-level logic in distSQL to ensure that the processor gets shut down in an orderly way, and that the shutdown code calls the Close() method; however, there still exists a possibility that the shutdown might not be orderly, and in those cases resource leakage may still occur. This possibility will need to be revisited in a follow-on PR.)

Fixes #106774

Release note (enterprise change): Fix an issue where changefeeds emitting to a cloud sink with compression may experience resource leakage (memory and goroutines) when experiencing transient errors.

106827: insights: fix flaky TestInsightsIntegrationForContention r=j82w a=j82w

The test is sometimes flaky because there was a minimum-time check on the contention duration. The problem is that other parts of crdb could be slow, which caused the contention time to be less than expected. The test now only checks that the value is greater than 0 and less than 1 minute. waiting_txn_fingerprint_id should always have a value; the check now makes sure it is not the default value, and variables were renamed to match the value.

Fixes: #106622

Release note: None

Epic: none

106840: schemachanger: Fix CREATE SEQUENCE OWNED BY failure r=Xiang-Gu a=Xiang-Gu

Previously, stmts like `CREATE SEQUENCE s OWNED BY col` where the table name is missing would fail with an internal error. This commit fixes this.

Fix #106838

Release note: None

Co-authored-by: PJ Tatlow <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
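As a purely hypothetical sketch of how an operator-supplied variable like COCKROACH_CRASH_REPORT_TAGS could be consumed: the actual format and plumbing are not shown in this thread, so comma-separated key=value pairs are an assumption made only for illustration.

```go
package sketch

import (
	"os"
	"strings"
)

// crashReportTagsFromEnv parses COCKROACH_CRASH_REPORT_TAGS into a tag map.
// The comma-separated key=value format is assumed for illustration; the real
// variable's format is defined by the linked PR, not by this sketch.
func crashReportTagsFromEnv() map[string]string {
	tags := map[string]string{}
	for _, kv := range strings.Split(os.Getenv("COCKROACH_CRASH_REPORT_TAGS"), ",") {
		if k, v, ok := strings.Cut(kv, "="); ok && strings.TrimSpace(k) != "" {
			tags[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return tags
}
```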
Expand the set of supported compression algorithms in changefeed.
A faster implementation of the gzip algorithm is available and is used
by default. The gzip implementation can be reverted to the Go standard
library gzip via the `changefeed.fast_gzip.enabled` setting.
In addition, add support for compressing files with zstd.

Addresses #88585

Release note (enterprise change): Changefeed can emit files compressed
with the zstd algorithm, which provides good compression and is much
faster than gzip. In addition, a new, faster implementation of
gzip is used by default.
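To make the description concrete, here is a minimal sketch of how a compression writer could be chosen from the requested algorithm and the `changefeed.fast_gzip.enabled` setting. It assumes the klauspost/compress packages for the fast gzip and zstd writers; the helper name, the algorithm strings, and the use of fmt.Errorf are illustrative rather than the PR's actual code (which, as seen in the review snippet above, uses errors.AssertionFailedf).

```go
package sketch

import (
	stdgzip "compress/gzip" // Go standard library gzip (fallback)
	"fmt"
	"io"

	gzip "github.com/klauspost/compress/gzip" // faster gzip implementation (assumed library)
	"github.com/klauspost/compress/zstd"
)

// newCompressionWriter picks a compression writer for a changefeed file.
// fastGzip mirrors the changefeed.fast_gzip.enabled cluster setting.
func newCompressionWriter(algo string, fastGzip bool, dest io.Writer) (io.WriteCloser, error) {
	switch algo {
	case "gzip":
		if fastGzip {
			// Faster gzip implementation, used by default.
			return gzip.NewWriterLevel(dest, gzip.DefaultCompression)
		}
		// Revert to the standard library when the setting is disabled.
		return stdgzip.NewWriterLevel(dest, stdgzip.DefaultCompression)
	case "zstd":
		// zstd at its fastest level: good compression, much faster than gzip.
		return zstd.NewWriter(dest, zstd.WithEncoderLevel(zstd.SpeedFastest))
	default:
		return nil, fmt.Errorf("unsupported compression algorithm %q", algo)
	}
}
```

Whichever writer is chosen must be Close()d to flush its output and, for the zstd encoder in particular, to release its background goroutines, which is exactly the leak addressed in the follow-up PR referenced earlier.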