feat(globalconfig): Only report upstream errors after elapsed interval #2628

iker-barriocanal · 2023-10-18T12:33:16Z

Especially during deployments, Relay sometimes receives 503s or timeouts from upstream and reports an error for each one. A handful of such errors is noisy and doesn't help identify when a real problem exists.

This change introduces an interval that the global config service must wait, while fetches keep failing, before it reports such errors.

Note that the interval does not affect other types of errors, such as failing to send requests to upstream (e.g. network is down) or the global config missing from the response. These errors are consistent and should be reported as early as possible.
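For illustration, the gating described above could look roughly like the sketch below. It is not the code from this PR: `FetchError`, `ErrorGate`, `on_success` and `should_report` are hypothetical names, and only `last_fetched` and `upstream_failure_interval` come from the diff. The sketch also assumes that `last_fetched` is reset on every successful fetch and that the comparison is inclusive.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the failure kinds named in the description.
#[derive(Debug)]
enum FetchError {
    /// Upstream answered with an error (e.g. a 503) or timed out; may be transient.
    UpstreamFailure,
    /// The request could not be sent at all (e.g. network is down).
    SendFailure,
    /// The response did not contain a global config.
    MissingConfig,
}

/// Hypothetical helper that decides whether an error is worth reporting.
struct ErrorGate {
    /// Assumption: updated on every successful fetch.
    last_fetched: Instant,
    /// How long fetches must keep failing before upstream errors are reported.
    upstream_failure_interval: Duration,
}

impl ErrorGate {
    fn new(now: Instant, upstream_failure_interval: Duration) -> Self {
        Self { last_fetched: now, upstream_failure_interval }
    }

    /// Record a successful fetch, which restarts the interval.
    fn on_success(&mut self, now: Instant) {
        self.last_fetched = now;
    }

    /// Upstream failures are reported only once the interval has elapsed
    /// since the last success; all other errors are reported immediately.
    fn should_report(&self, error: &FetchError, now: Instant) -> bool {
        match error {
            FetchError::UpstreamFailure => {
                now.duration_since(self.last_fetched) >= self.upstream_failure_interval
            }
            FetchError::SendFailure | FetchError::MissingConfig => true,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut gate = ErrorGate::new(start, Duration::from_secs(35));

    // Simulated points in time; no real waiting happens here.
    let after_10s = start + Duration::from_secs(10);
    let after_40s = start + Duration::from_secs(40);

    println!("{}", gate.should_report(&FetchError::UpstreamFailure, after_10s));
    println!("{}", gate.should_report(&FetchError::UpstreamFailure, after_40s));
    println!("{}", gate.should_report(&FetchError::SendFailure, after_10s));

    // A successful fetch restarts the interval.
    gate.on_success(after_40s);
    println!("{}", gate.should_report(&FetchError::UpstreamFailure, after_40s));
}
```

Passing `now` in explicitly keeps the sketch testable without sleeping; a real service would more likely read the clock itself.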

Resolves #2609.

iker-barriocanal · 2023-10-18T12:35:55Z

relay-server/src/actors/global_config.rs

@@ -164,6 +170,8 @@ impl GlobalConfigService {
            internal_rx,
            upstream,
            fetch_handle: SleepHandle::idle(),
+            last_fetched: Instant::now(),
+            upstream_failure_interval: Duration::from_secs(35),


35 seconds is an arbitrary value, but it should leave time for 3 requests (after 10s each).

Should this be in Config?

We may not need it for now, but I'll keep this in mind to address it in the future.

jjbayer

I would make the interval configurable, apart from that LGTM!
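One way the interval could be made configurable is sketched below. This is only an assumption about the shape of such an option, not Relay's actual `Config`: `GlobalConfigOptions` and `upstream_failure_interval_secs` are hypothetical names. The default stays at the 35 seconds from the diff, and the 10-second spacing is taken from the review comment above.

```rust
use std::time::Duration;

/// Hypothetical options struct; not part of Relay's real `Config`.
struct GlobalConfigOptions {
    /// Seconds of consecutive upstream failures before errors are reported.
    upstream_failure_interval_secs: u64,
}

impl Default for GlobalConfigOptions {
    fn default() -> Self {
        // Same value as the hard-coded `Duration::from_secs(35)` in the diff.
        Self { upstream_failure_interval_secs: 35 }
    }
}

impl GlobalConfigOptions {
    /// Expose the setting as a `Duration` for use by the service.
    fn upstream_failure_interval(&self) -> Duration {
        Duration::from_secs(self.upstream_failure_interval_secs)
    }
}

fn main() {
    let options = GlobalConfigOptions::default();
    let interval = options.upstream_failure_interval();

    // Three requests spaced 10 seconds apart fit within the default interval.
    let three_requests = Duration::from_secs(10) * 3;
    println!("interval = {:?}, three requests = {:?}", interval, three_requests);
    println!("fits: {}", three_requests <= interval);
}
```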

jjbayer · 2023-10-18T13:44:11Z

relay-server/src/actors/global_config.rs

@@ -164,6 +170,8 @@ impl GlobalConfigService {
            internal_rx,
            upstream,
            fetch_handle: SleepHandle::idle(),
+            last_fetched: Instant::now(),
+            upstream_failure_interval: Duration::from_secs(35),


Should this be in Config?

iker-barriocanal requested a review from a team October 18, 2023 12:33

iker-barriocanal self-assigned this Oct 18, 2023

update changelog

937545d

iker-barriocanal commented Oct 18, 2023

jjbayer approved these changes Oct 18, 2023

iker-barriocanal merged commit a378fff into master Oct 18, 2023

iker-barriocanal deleted the iker/feat/globalconfig-error-interval branch October 18, 2023 13:50
