changefeedccl: Do not block on file size based flushes #88395
Conversation
Force-pushed from 9a4c534 to 3ac7fff
Prior to this change, the cloud storage sink triggered a file-size-based flush whenever a new row would push the file size beyond the configured threshold. This significantly reduced throughput whenever such an event occurred -- no additional events could be added to the cloud storage sink while the previous flush was active.

This is not necessary. The cloud storage sink can trigger file-based flushes asynchronously. The only requirement is that if a real, non-file-based flush arrives, or if we need to emit resolved timestamps, then we must wait for all of the active flush requests to complete.

In addition, because every event added to the cloud storage sink has an associated allocation, which is released when the file is written out, performing flushes asynchronously is safe with respect to memory usage and accounting.

Release note (enterprise change): Changefeeds using the cloud storage sink now have better throughput.

Release justification: performance fix
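To make the mechanism described above concrete, here is a minimal, hypothetical sketch of the pattern (names such as asyncSink, EmitRow, uploadFile, and Flush are illustrative assumptions, not the actual cloudStorageSink code): size-triggered flushes run on background goroutines, while the explicit flush first waits for all of them to complete.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type asyncSink struct {
	mu        sync.Mutex
	buf       []byte
	threshold int

	wg       sync.WaitGroup
	errOnce  sync.Once
	flushErr error // first error observed by a background flush
}

// uploadFile stands in for writing a completed file to cloud storage.
func (s *asyncSink) uploadFile(ctx context.Context, data []byte) error {
	fmt.Printf("uploaded %d bytes\n", len(data))
	return nil
}

// EmitRow buffers a row; once the file exceeds the size threshold it starts
// a background flush instead of blocking the caller.
func (s *asyncSink) EmitRow(ctx context.Context, row []byte) {
	s.mu.Lock()
	s.buf = append(s.buf, row...)
	if len(s.buf) < s.threshold {
		s.mu.Unlock()
		return
	}
	data := s.buf
	s.buf = nil
	s.mu.Unlock()

	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		if err := s.uploadFile(ctx, data); err != nil {
			s.errOnce.Do(func() { s.flushErr = err })
		}
	}()
}

// Flush is the "real" flush: it must wait for every outstanding
// size-triggered flush before resolved timestamps can be emitted.
func (s *asyncSink) Flush(ctx context.Context) error {
	s.mu.Lock()
	data := s.buf
	s.buf = nil
	s.mu.Unlock()

	s.wg.Wait() // wait for all active async flushes
	if s.flushErr != nil {
		return s.flushErr
	}
	if len(data) == 0 {
		return nil
	}
	return s.uploadFile(ctx, data)
}

func main() {
	s := &asyncSink{threshold: 64}
	for i := 0; i < 10; i++ {
		s.EmitRow(context.Background(), []byte("some-row-payload-"))
	}
	if err := s.Flush(context.Background()); err != nil {
		fmt.Println("flush error:", err)
	}
}
```

The design point the sketch tries to capture: waiting in the explicit Flush is what keeps correctness intact, because resolved timestamps are only emitted after a flush that has drained every outstanding async flush.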
@jayshrivastava @ajwerner -- I think based on performance numbers so far, this PR can be reviewed/merged/backported.
Tests on a 340G table indicate a 2.5x performance improvement.
This is great work. Left minor comments/discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/ccl/changefeedccl/sink_cloudstorage.go
line 623 at r1 (raw file):
```go
	}
}
s.asyncFlushActive = asyncFlushEnabled
```
This code looks good, but we don't explicitly test changing this setting in the middle of operation. There may be some subtle bug. Also, if we use s.asyncFlushActive here, we can assume it won't change while the changefeed is running. This would make it easier to reason about what's happening in case of any CI failures or in case we want to update this in the future. If you're confident, feel free to keep it.
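If it helps to picture the suggestion, here is a tiny, hypothetical sketch (assumed names and a simplified, func-valued stand-in for the real cluster-settings API) of capturing the value once at sink construction so it cannot change while the changefeed is running:

```go
package main

import "fmt"

type sink struct {
	// asyncFlushEnabled is captured once at construction time; later changes
	// to the cluster setting do not affect this sink instance.
	asyncFlushEnabled bool
}

func newSink(enabled func() bool) *sink {
	return &sink{asyncFlushEnabled: enabled()}
}

func main() {
	setting := true
	s := newSink(func() bool { return setting })
	setting = false // flipping the setting later has no effect on s
	fmt.Println(s.asyncFlushEnabled)
}
```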
pkg/ccl/changefeedccl/sink_cloudstorage.go
line 694 at r1 (raw file):
```go
func (s *cloudStorageSink) Close() error {
	s.files = nil
	return errors.CombineErrors(s.waitAsyncFlush(), s.es.Close())
}
```
I was concerned about waiting for a flush here, but I think this is the right behavior. If a changefeed stops because it's done, we should wait for files to be flushed. If we close due to an error, then there is no harm in flushing what is in flight.
This makes me wonder about the parallel consumer. Do you think it should also wait for in-flight events to be emitted to the sink? The change aggregator calls eventConsumer.Close() then sink.Close(). Right now, it does not wait for events to be flushed to the sink.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)
pkg/ccl/changefeedccl/sink_cloudstorage.go
line 623 at r1 (raw file):
Previously, jayshrivastava wrote…
This code looks good, but we don't explicitly test changing this setting in the middle of operation. There may be some subtle bug. Also, if we use s.asyncFlushActive here, we can assume it won't change while the changefeed is running. This would make it easier to reason about what's happening in case of any CI failures or in case we want to update this in the future. If you're confident, feel free to keep it.
I mean, asyncFlushActive gets assigned the current value of the enable setting. It's fine if you change this setting -- the fact that I wait above makes sure that the old async behavior is reverted (see the sketch below). I also tested it on the cluster. I could write a test specifically around that... Not sure if it will be that valuable though.
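A minimal sketch of the "wait above" point, under assumed names (sink, waitAsyncFlush, flushFiles; not the actual sink_cloudstorage.go code): when the setting flips from enabled to disabled, outstanding async flushes are drained first, so the old behavior is fully reverted before continuing.

```go
package main

import "fmt"

type sink struct {
	asyncFlushActive bool
	pending          []string // stand-in for in-flight async flushes
}

// waitAsyncFlush stands in for waiting on (and reaping) flush goroutines.
func (s *sink) waitAsyncFlush() error {
	for _, f := range s.pending {
		fmt.Println("waiting for", f)
	}
	s.pending = nil
	return nil
}

// flushFiles switches modes safely: turning async off drains old work first.
func (s *sink) flushFiles(asyncEnabled bool) error {
	if s.asyncFlushActive && !asyncEnabled {
		if err := s.waitAsyncFlush(); err != nil {
			return err
		}
	}
	s.asyncFlushActive = asyncEnabled
	return nil
}

func main() {
	s := &sink{asyncFlushActive: true, pending: []string{"file-1", "file-2"}}
	_ = s.flushFiles(false) // the setting was turned off mid-run: drain first
}
```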
pkg/ccl/changefeedccl/sink_cloudstorage.go
line 694 at r1 (raw file):
Previously, jayshrivastava wrote…
I was concerned about waiting for a flush here, but I think this is the right behavior. If a changefeed stops because it's done, we should wait for files to be flushed. If we close due to an error, then there is no harm in flushing what is in flight.
This makes me wonder about the parallel consumer. Do you think it should also wait for in-flight events to be emitted to the sink? The change aggregator calls eventConsumer.Close() then sink.Close(). Right now, it does not wait for events to be flushed to the sink.
It is definitely the right behavior. Whether it stops because it's done or because of an error, we have to wait; if the context given to this sink was cancelled, then whatever IO was in progress will be wrapped up, we would return an error (into flushErr), and we would wait and reap goroutines. And it is also the right thing to do if we shut down properly -- without an error: we must flush any in-progress data. (A sketch of this wait-and-reap behavior follows after the list below.)

This flushing stuff is subtle. TLDR: no need to flush.

Two cases to consider:
- We are exiting because of an error. No need to flush in the close method -- just clean up (i.e. close external storage, wait for goroutines, etc.).
- We are exiting because we are done. We could be done because of user action (cancel) -- no need to flush, though I suppose we could. Or we are exiting because of e.g. schema change policy = stop, or perhaps we are running with initial scan only. In those cases, we emit resolved span events with an EXIT marker. This gets noticed by the change aggregator, which flushes the sink explicitly once its frontier advances -- and then this gets forwarded to the coordinator, and then we exit. That is: there is a flush happening anyway.
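Here is a minimal sketch, under assumed names, of the wait-and-reap behavior referenced above: Close always waits for flush goroutines, and a cancelled context simply turns in-flight IO into an error surfaced via flushErr. This is illustrative, not the real cloudStorageSink.Close.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type closableSink struct {
	wg       sync.WaitGroup
	mu       sync.Mutex
	flushErr error
}

// startAsyncFlush pretends to upload a file; if the context is cancelled,
// the error is recorded instead of being lost.
func (s *closableSink) startAsyncFlush(ctx context.Context) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		select {
		case <-time.After(time.Second): // pretend the upload completed
		case <-ctx.Done():
			s.mu.Lock()
			if s.flushErr == nil {
				s.flushErr = ctx.Err()
			}
			s.mu.Unlock()
		}
	}()
}

// Close waits for and reaps all flush goroutines regardless of why we exit;
// the real sink combines this with closing the external storage.
func (s *closableSink) Close() error {
	s.wg.Wait()
	return s.flushErr
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	s := &closableSink{}
	s.startAsyncFlush(ctx)
	cancel() // simulate exiting because of an error
	fmt.Println("close:", s.Close())
}
```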
bors r+
Build succeeded.
I think this is straight up wrong. I think for a given change aggregator we have to wait for an earlier file flush to succeed before starting to flush the later one.
I think that's a pessimistic assessment. Explicit flushes wait for outstanding async flushes. The chances of reordering with multi-megabyte files are slim.