
changefeedccl: changefeed concurrency and frontier observability #67206 #67268

Merged
merged 2 commits on Jul 6, 2021

Conversation

miretskiy
Contributor

Stop relying on ExportRequestLimit to determine the number of concurrent
export requests, and introduce a dedicated ScanRequestLimit setting.

If the setting is specified, it is used; otherwise, the default value is
computed as 3 * (number of nodes in the cluster), which is the old behavior,
but we cap this number so that concurrency does not get out of hand when
running in a very large cluster.

Fixes #67190
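
Roughly, the default computation amounts to the following sketch (illustrative names only, not the actual implementation; the cap of 100 matches the code comment discussed in the review below):

```go
// Sketch of the scan-concurrency default: honor an explicit ScanRequestLimit
// if set, otherwise fall back to 3 scans per node, capped so that very large
// clusters do not run away.
package main

import "fmt"

const (
	scansPerNode          = 3   // the long-standing default
	maxDefaultConcurrency = 100 // arbitrary cap on the computed default
)

// concurrentScanRequests returns the number of concurrent scan requests to
// issue during a backfill. limitSetting stands in for the ScanRequestLimit
// cluster setting; zero means "not set".
func concurrentScanRequests(limitSetting, numNodes int) int {
	if limitSetting > 0 {
		return limitSetting
	}
	n := scansPerNode * numNodes
	if n > maxDefaultConcurrency {
		n = maxDefaultConcurrency
	}
	return n
}

func main() {
	fmt.Println(concurrentScanRequests(0, 4))   // 12: 3 per node
	fmt.Println(concurrentScanRequests(0, 200)) // 100: capped in very large clusters
	fmt.Println(concurrentScanRequests(8, 200)) // 8: explicit setting wins
}
```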

Improve observability of change frontier updates.

Add a metric to keep track of the number of frontier updates in the
changefeed. Add logging when job progress updates take an excessive amount
of time.

Fixes #67192
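
In outline, the observability change amounts to something like this (a hedged sketch with illustrative names and an illustrative threshold, not the actual changefeed code):

```go
// Sketch: count frontier updates and log when persisting job progress takes
// an excessive amount of time.
package main

import (
	"log"
	"sync/atomic"
	"time"
)

var frontierUpdates int64 // stand-in for a real metrics counter

// slowProgressUpdateThreshold is an illustrative cutoff for "excessive".
const slowProgressUpdateThreshold = 30 * time.Second

func updateJobProgress(persist func() error) error {
	atomic.AddInt64(&frontierUpdates, 1)
	start := time.Now()
	err := persist()
	if d := time.Since(start); d > slowProgressUpdateThreshold {
		log.Printf("slow job progress update: took %s", d)
	}
	return err
}

func main() {
	_ = updateJobProgress(func() error {
		time.Sleep(10 * time.Millisecond) // simulate persisting progress
		return nil
	})
	log.Printf("frontier updates so far: %d", atomic.LoadInt64(&frontierUpdates))
}
```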

Release note: Provide better configurability of scan request
concurrency. Scan requests are issued by changefeeds during the
backfill.

@miretskiy miretskiy requested review from stevendanna and HonoreDB July 6, 2021 13:59
@cockroach-teamcity
Member

This change is Reviewable

@miretskiy
Contributor Author

@HonoreDB I pushed the PR you already looked at to another branch. Can you take another look?

Collaborator

@stevendanna stevendanna left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

	// This is all hand-wavy: 3 per node used to be the default for a very long time.
	// However, this could get out of hand if the clusters are large.
	// So cap the max at an arbitrary value of 100.

I like the addition of the cap (although perhaps moving it to a constant might be nice).

One question I have is why we want to scale by the node count at this point at all. We will naturally be scaling the number of scans that happen by the node count because change aggregators for different ranges are distributed across all of the nodes, so why account for it again here?

In a conversation about this, one item that came up is that in the case of core changefeeds, this makes a bit more sense.

To keep it backportable I think it makes sense to leave it as you have it here, but perhaps this is something we could follow-up on.

Contributor

@HonoreDB HonoreDB left a comment


Reviewed 3 of 3 files at r1, 1 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @stevendanna)


pkg/ccl/changefeedccl/changefeed_processors.go, line 1291 at r2 (raw file):

	}()

	cf.metrics.Flushes.Inc(1)

Is Flushes the right metric to be incrementing here?
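
For reference, a dedicated frontier-updates counter incremented at this point might look roughly like this (a self-contained sketch with stand-in types; not the code from this PR):

```go
// Sketch of a dedicated frontier-updates counter, as an alternative to
// reusing Flushes here. The Metrics struct and counter are stand-ins for
// the real changefeed metrics machinery.
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics stands in for the changefeed metrics struct.
type Metrics struct {
	Flushes         int64
	FrontierUpdates int64
}

// frontierUpdated increments the dedicated counter rather than Flushes,
// keeping the two signals distinguishable in monitoring.
func (m *Metrics) frontierUpdated() {
	atomic.AddInt64(&m.FrontierUpdates, 1)
}

func main() {
	var m Metrics
	m.frontierUpdated()
	fmt.Println(m.FrontierUpdates, m.Flushes) // 1 0
}
```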


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I like the addition of the cap (although perhaps moving it to a constant might be nice).

One question I have is why we want to scale by the node count at this point at all. We will naturally be scaling the number of scans that happen by the node count because change aggregators for different ranges are distributed across all of the nodes, so why account for it again here?

In a conversation about this, one item that came up is that in the case of core changefeeds, this makes a bit more sense.

To keep it backportable I think it makes sense to leave it as you have it here, but perhaps this is something we could follow-up on.

I think the rationale is that it's different kinds of nodes that can in theory be scaled independently? But fair point, as-is this doesn't really help with that, it just implicitly assumes that our ability to serve ExportRequests is O(number of changefeed aggregator nodes * total number of nodes =~ number of nodes squared), if I understand correctly, which grows faster than our actual capacity.

@miretskiy miretskiy requested a review from stevendanna July 6, 2021 14:58
Contributor Author

@miretskiy miretskiy left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, and @stevendanna)


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

I think the rationale is that it's different kinds of nodes that can in theory be scaled independently? But fair point, as-is this doesn't really help with that, it just implicitly assumes that our ability to serve ExportRequests is O(number of changefeed aggregator nodes * total number of nodes =~ number of nodes squared), if I understand correctly, which grows faster than our actual capacity.

I would like to keep things as they are so that this change can be backported.
We should pass an argument to kvfeed indicating whether or not the changefeed is a core-style changefeed.

Yevgeniy Miretskiy added 2 commits July 6, 2021 11:25
Stop relying on ExportRequestLimit to determine the number of concurrent
export requests, and introduce a dedicated ScanRequestLimit setting.

If the setting is specified, it is used; otherwise, the default value is
computed as 3 * (number of nodes in the cluster), which is the old behavior,
but we cap this number so that concurrency does not get out of hand when
running in a very large cluster.

Fixes cockroachdb#67190

Release note: Provide better configurability of scan request
concurrency. Scan requests are issued by changefeeds during the
backfill.
Add a metric to keep track of the number of frontier updates in the
changefeed. Add logging when job progress updates take an excessive amount
of time.

Fixes cockroachdb#67192

Release Notes: None
@miretskiy miretskiy force-pushed the backfill_concurrency branch from fefdf03 to e5a133e on July 6, 2021 15:26
Contributor

@HonoreDB HonoreDB left a comment


Reviewed 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @HonoreDB, @miretskiy, and @stevendanna)

@miretskiy
Contributor Author

tftr
bors r+

@craig
Contributor

craig bot commented Jul 6, 2021

Build succeeded:
