Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: improve performance of gzip, and add zstd compression #88635

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

miretskiy
Copy link
Contributor

@miretskiy miretskiy commented Sep 23, 2022

Expand the set of supported compression algorithms in changefeed.

A faster implementation of gzip algorithm is avaible, and is used
by default. The gzip algorithm implementation can be reverted
to Go standard gzip implementation via the
changefeed.fast_gzip.enabled setting.

In addition, add support for compression files with zstd.

Addresses #88585

Release notes (enterprise change): Changefeed can emit files compressed
with zstd algorithm -- which provides good compression, and is much
faster than gzip. In addition, a new, faster implementation of
gzip is used by default.

@miretskiy miretskiy requested review from a team as code owners September 23, 2022 22:21
@miretskiy miretskiy requested review from stevendanna and removed request for a team September 23, 2022 22:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@shermanCRL shermanCRL changed the title changefeedccl: Expand the set of available compression algorithms] changefeedccl: improve performance of gzip, and add zstd compression Sep 23, 2022
Copy link
Contributor

@HonoreDB HonoreDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna)

@miretskiy miretskiy force-pushed the gzip branch 5 times, most recently from 5a9c603 to 569a1c4 Compare September 24, 2022 22:28
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

case sinkCompressionZstd:
return zstd.NewWriter(dest, zstd.WithEncoderLevel(zstd.SpeedFastest))
default:
return nil, errors.AssertionFailedf("unsupported encoder algorithm %q", algo)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps?

Suggested change
return nil, errors.AssertionFailedf("unsupported encoder algorithm %q", algo)
return nil, errors.AssertionFailedf("unsupported compression algorithm %q", algo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

@shermanCRL
Copy link
Contributor

cc @nollenr

@miretskiy miretskiy force-pushed the gzip branch 3 times, most recently from 116c20f to c3dd28b Compare September 27, 2022 01:22
@@ -0,0 +1,70 @@
// Copyright 2022 The Cockroach Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to correct the spelling on the file name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh.... Ooops... Or .. maybe it was compres'd?

Expand the set of supported compression algorithms in changefeed.

A faster implementation of gzip algorithm is avaible, and is used
by default.  The gzip algorithm implementation can be reverted
to Go standard gzip implementation via the
`changefeed.fast_gzip.enabled` setting.

In addition, add support for compression files with zstd.

Release notes (enterprise change): Changefeed can emit files compressed
with zstd algorithm -- which provides good compression, and is much
faster than gzip.  In addition, a new, faster implementation of
gzip is used by default.
@miretskiy
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 27, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 28, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 28, 2022

Build succeeded:

@craig craig bot merged commit 4982322 into cockroachdb:master Sep 28, 2022
@shermanCRL
Copy link
Contributor

Backported to v22.2.1 via #91002

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Jul 13, 2023
Ensure resources acquired by cloud storage files are released
when the sink is closed.

As of cockroachdb#88635,
cloud storage uses faster implementation of gzip compression
algorithm (along with zstd).  This new implementation
is sufficiently different from the standard gzip implementation
in that it requires the compression codec to be closed, even
when the caller is terminating.  Failure to do so results in the
memory as well as the goroutine leakage.

This resource leakage may become sufficiently noticable
if the changefeed experiences many repeated errors.

This PR modifies Close() call to make sure that the underlying
compression codecs are also closed (Note: we rely on the high level
logic in distSQL to ensure that the processor gets orderly shut down,
and the shutdown code calls Close() method; However, there is still
exists a possiblity that the shutdown might not be orderly, and in those
cases resource leakage may still occur.  This possiblity will need
to be revisited in the follow on PR).

Fixes cockroachdb#106774

Release note (enterprise change): Fix an issue where the changefeeds
emitting to cloud sink with compression may experience resource leakage
(memory and go routines) when experiencing transient errors.
craig bot pushed a commit that referenced this pull request Jul 14, 2023
106786: util/log: allow custom crash report tags r=pjtatlow a=pjtatlow

Today it can be difficult to trace back a sentry event to the CC cluster where
it originated, especially for serverless clusters. This change enables a new
environment variable (COCKROACH_CRASH_REPORT_TAGS), which allows
the database operator to provide additional information that will be included
in the sentry event.

Release Note: None

106795: changefeedccl: Cleanup resources when closing file r=miretskiy a=miretskiy

Ensure resources acquired by cloud storage files are released when the sink is closed.

As of #88635, cloud storage uses faster implementation of gzip compression algorithm (along with zstd).  This new implementation is sufficiently different from the standard gzip implementation in that it requires the compression codec to be closed, even when the caller is terminating.  Failure to do so results in the memory as well as the goroutine leakage.

This resource leakage may become sufficiently noticable if the changefeed experiences many repeated errors.

This PR modifies Close() call to make sure that the underlying compression codecs are also closed (Note: we rely on the high level logic in distSQL to ensure that the processor gets orderly shut down, and the shutdown code calls Close() method; However, there is still exists a possiblity that the shutdown might not be orderly, and in those cases resource leakage may still occur.  This possiblity will need to be revisited in the follow on PR).

Fixes #106774

Release note (enterprise change): Fix an issue where the changefeeds emitting to cloud sink with compression may experience resource leakage (memory and go routines) when experiencing transient errors.

106827: insights: fix flakey TestInsightsIntegrationForContention r=j82w a=j82w

The test is flakey sometimes because there was a minimum time check the contention duration. The problem is other parts of the crdb could be slow which caused the contention time to be less than expected. The test now only checks if the value is greater than 0 and is less than 1 minute. 

waiting_txn_fingerprint_id should always have a value. Fixed the check to make sure it's not the default value and renamed variables to match the value.

Fixes: #106622
Release note: None
Epic: none

106840: schemachanger: Fix CREATE SEQUENCE OWNED BY failure r=Xiang-Gu a=Xiang-Gu

Previously, stmts like `CREATE SEQUENCE s OWNED BY col` where table name is missing will fail with an internal error. This commit fix this.

Fix #106838
Release note: None

Co-authored-by: PJ Tatlow <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
miretskiy pushed a commit that referenced this pull request Jul 14, 2023
Ensure resources acquired by cloud storage files are released
when the sink is closed.

As of #88635,
cloud storage uses faster implementation of gzip compression
algorithm (along with zstd).  This new implementation
is sufficiently different from the standard gzip implementation
in that it requires the compression codec to be closed, even
when the caller is terminating.  Failure to do so results in the
memory as well as the goroutine leakage.

This resource leakage may become sufficiently noticable
if the changefeed experiences many repeated errors.

This PR modifies Close() call to make sure that the underlying
compression codecs are also closed (Note: we rely on the high level
logic in distSQL to ensure that the processor gets orderly shut down,
and the shutdown code calls Close() method; However, there is still
exists a possiblity that the shutdown might not be orderly, and in those
cases resource leakage may still occur.  This possiblity will need
to be revisited in the follow on PR).

Fixes #106774

Release note (enterprise change): Fix an issue where the changefeeds
emitting to cloud sink with compression may experience resource leakage
(memory and go routines) when experiencing transient errors.
miretskiy pushed a commit that referenced this pull request Jul 14, 2023
Ensure resources acquired by cloud storage files are released
when the sink is closed.

As of #88635,
cloud storage uses faster implementation of gzip compression
algorithm (along with zstd).  This new implementation
is sufficiently different from the standard gzip implementation
in that it requires the compression codec to be closed, even
when the caller is terminating.  Failure to do so results in the
memory as well as the goroutine leakage.

This resource leakage may become sufficiently noticable
if the changefeed experiences many repeated errors.

This PR modifies Close() call to make sure that the underlying
compression codecs are also closed (Note: we rely on the high level
logic in distSQL to ensure that the processor gets orderly shut down,
and the shutdown code calls Close() method; However, there is still
exists a possiblity that the shutdown might not be orderly, and in those
cases resource leakage may still occur.  This possiblity will need
to be revisited in the follow on PR).

Fixes #106774

Release note (enterprise change): Fix an issue where the changefeeds
emitting to cloud sink with compression may experience resource leakage
(memory and go routines) when experiencing transient errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants