
misc cleanups & bump gazette #1873

Merged: 5 commits into master from johnny/shard-failure on Jan 24, 2025
Conversation

@jgraettinger (Member) commented on Jan 16, 2025

Description:

Various minor improvements and refactor cleanups which were rebased / extracted from an abandoned work branch.

No functional changes aside from improved flowctl errors.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)



@jgraettinger requested a review from psFried on January 16, 2025 at 03:39
The former produces the complete error chain, while the latter doesn't.
Remove `legacyCheckpoint` and `legacyState` migration mechanism, as the
migration has been completed.

Refactor `taskBase.heartbeatLoop()` and call it earlier in the lifecycle
of captures / derivations / materializations. We can query for the
current container at the time of periodic stats generation, rather than
waiting until after a first container is up to start the loop.
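For illustration, a minimal sketch of that shape (the signature, the `Container` type, and the callbacks are hypothetical stand-ins, not the actual `taskBase` API): the loop starts as soon as the task initializes and resolves the current container, which may still be nil, at each periodic stats tick.

```go
package runtime

import (
	"context"
	"time"
)

// Container is an illustrative stand-in for the runtime's container handle.
type Container struct{ Name string }

// heartbeatLoop starts ticking when the task initializes, and looks up the
// current container (possibly nil) each time periodic stats are emitted,
// rather than waiting for a first container before starting the loop.
func heartbeatLoop(
	ctx context.Context,
	interval time.Duration,
	currentContainer func() *Container,
	emitStats func(*Container),
) {
	var ticker = time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// A nil container simply means there's no usage to attribute yet.
			emitStats(currentContainer())
		}
	}
}
```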
`controlPlane` encapsulates commonalities in calling control plane APIs
on behalf of a data-plane task context.
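For illustration, a minimal sketch of what such an encapsulation could look like (all names, fields, and the endpoint shape are hypothetical, not the actual type):

```go
package runtime

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// controlPlane bundles the context shared by every control-plane API call
// made on behalf of a data-plane task (illustrative sketch only).
type controlPlane struct {
	endpoint  *url.URL
	authToken string
	taskName  string
	client    *http.Client
}

// post marshals a request, issues an authorized call, and decodes the
// response, so each API helper only supplies its path and payload types.
func (cp *controlPlane) post(ctx context.Context, path string, request, response any) error {
	var body bytes.Buffer
	if err := json.NewEncoder(&body).Encode(request); err != nil {
		return fmt.Errorf("encoding request: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, cp.endpoint.JoinPath(path).String(), &body)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+cp.authToken)
	req.Header.Set("Content-Type", "application/json")

	resp, err := cp.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s returned %s", path, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(response)
}
```

With something like this, individual API helpers only name their path and payload types instead of repeating auth and encoding plumbing.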
@jgraettinger changed the title from "misc cleanups" to "misc cleanups & bump gazette" on Jan 21, 2025
@jgraettinger requested review from jshearer and removed the request for psFried on January 22, 2025 at 21:57
@jgraettinger (Member, Author)

Ping

@jshearer (Contributor) left a comment:

LGTM

"assignment", shard.Assignment().Decoded,
)

// TODO(johnny): Notify control-plane of failure.
@jshearer (Contributor):
What does this mean? AFAIK we currently find out about task errors because they log shard failed. Would this be a different mechanism?

@jgraettinger (Member, Author) replied on Jan 24, 2025:

This commit is lifted from a feature branch I wrote a couple of months ago, when the plan was to have reactors call out to the control plane here. Instead, we're going to do more with the shard failed logs we already produce. This should just be removed.

}

if sc := httpResp.StatusCode; sc >= 500 && sc < 600 {
skim.RetryMillis = rand.Uint64N(4_750) + 250 // Random backoff in range [0.250s, 5s].
@jshearer (Contributor):
Are there any downstream consequences of never surfacing these? Like, is there any reason we should limit the number of times we retry a 5xx error before surfacing it?

So far I've only seen these pop up transiently when there's heavy load on agent-api, in which case a retry seems like the right solution.

@jgraettinger (Member, Author) replied on Jan 24, 2025:

They do get surfaced in the agent-api Cloud Run service, both as plots and as logs.

We ourselves never return 500s; they're coming from Cloud Run, for its own reasons, and are uncorrelated with anything we're doing. If it has a longer outage, this retry is the best handling we can have that I'm aware of.

(Note also: if we logged them here, a service-wide Cloud Run disruption would explosively increase our own log volume.)
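To make that concrete, a minimal sketch of the retry behavior (the function and its signature are hypothetical, not the actual agent-api client): 5xx responses are retried indefinitely with a randomized backoff in [0.250s, 5s), which spreads retries out during a Cloud Run disruption instead of logging every failure.

```go
package client

import (
	"context"
	"math/rand/v2"
	"net/http"
	"time"
)

// retryOn5xx retries the given request function whenever it returns a 5xx
// status, sleeping for a random 0.250s-5s backoff between attempts.
func retryOn5xx(ctx context.Context, do func() (*http.Response, error)) (*http.Response, error) {
	for {
		resp, err := do()
		if err != nil {
			return nil, err
		}
		if sc := resp.StatusCode; sc < 500 || sc >= 600 {
			return resp, nil // Success, or a non-retryable status.
		}
		resp.Body.Close()

		// Same backoff as the diff above: random in [0.250s, 5s).
		var delay = time.Duration(rand.Uint64N(4_750)+250) * time.Millisecond
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(delay):
		}
	}
}
```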

@jgraettinger merged commit beb66df into master on Jan 24, 2025
4 checks passed
@jgraettinger deleted the johnny/shard-failure branch on January 24, 2025 at 16:44