Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jitter to strict and logical dns clusters #35745

Merged
merged 14 commits into from
Aug 25, 2024

Conversation

Stevenjin8
Copy link
Contributor

@Stevenjin8 Stevenjin8 commented Aug 19, 2024

Resolves #35641. Adding DNS jitter to resolvers makes it so that envoy doesnt stampede the DNS server when it has multiple entries with the same expiration.

Testing is still WIP. I am open to any suggestions.

Commit Message: dns: add jitter to strict dns
Additional Description:
Risk Level: low
Testing: unit tests
Docs Changes:
Release Notes:
for the :ref:strict DNS <arch_overview_service_discovery_types_strict_dns> and :ref:logical DNS <arch_overview_service_discovery_types_logical_dns> cluster types,
the new :ref:dns_jitter <envoy_v3_api_field_config.cluster.v3.Cluster.dns_jitter> field, if
provided, will causes the cluster to refresh DNS entries later by a random amount of time as to
avoid stampedes of DNS requests. This field sets the upper bound (exclusive) for the random amount.
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Copy link

Hi @Stevenjin8, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #35745 was opened by Stevenjin8.

see: more, trace.

Signed-off-by: Steven Jin Xuan <[email protected]>
@adisuissa
Copy link
Contributor

Thanks for adding this feature.
Testing can be done by using a mock random number generator that will make your test deterministic.
Can you please also add some context to the PR description (first comment)?

Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
High-level question: is this also needed for logical-dns and other DNS-related resolvers?

@Stevenjin8
Copy link
Contributor Author

Thanks! High-level question: is this also needed for logical-dns and other DNS-related resolvers?

My guess is yes, but the original issue mentions respect_dns_ttl=true which is available for both strict and logical resolvers (?). @howardjohn

if (jitter >= ttl_refresh_rate) {
jitter = std::chrono::milliseconds(0);
}
final_refresh_rate = ttl_refresh_rate - jitter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do +- jitter? Or always subtract

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did subtract so to avoid having expired entries.

Signed-off-by: Steven Jin Xuan <[email protected]>
@Stevenjin8
Copy link
Contributor Author

Stevenjin8 commented Aug 20, 2024

@adisuissa Do you have any tips on mocking the random generator? The problem is that currently, the random generator is a private member that is initialized within the constructor, so I can't find a way to access it or make it a mock. I could change the initializer and StrictDnsClusterImpl::create to take in a RandomGenerator&, but then its unclear to me who would own the RandomGenerator. I could pass in a unique pointer instead of a reference, but that feels like an overkill. Similarly, I could template with StrictDnsClusterImpl with T such that T is the type of random generator (RandomGeneratorImpl or MockRandomGenerator), but that's even worse.

A simpler way would be to test that the values affected by the random generator fall within a range. For example, instead of

EXPECT_CALL(*resolver.timer_, enableTimer(std::chrono::milliseconds(5000), _));

we would have something like

EXPECT_CALL(*resolver.timer_, enableTimer(testing::Between(std::chrono::milliseconds(5000 - 512), std::chrono::milliseconds(5000)), _));

@@ -74,6 +77,7 @@ class StrictDnsClusterImpl : public BaseDynamicClusterImpl {
Network::DnsLookupFamily dns_lookup_family_;
uint32_t overprovisioning_factor_;
bool weighted_priority_health_;
Random::RandomGeneratorImpl random_generator_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should always use the random generator from factory context.

Copy link
Contributor Author

@Stevenjin8 Stevenjin8 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbpcode Where is the random generator from the factory context? I'm looking in cluster_factory.h and I see that it includes random_generator.h, but I don't see any usages of a random generator.

There is one from a a grandparent class, so I think I'll use that. Maybe its the same one.

Signed-off-by: Steven Jin Xuan <[email protected]>
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #35745 was synchronize by Stevenjin8.

see: more, trace.

Signed-off-by: Steven Jin Xuan <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the jitter support!
Left a few comments.

api/envoy/config/cluster/v3/cluster.proto Show resolved Hide resolved
api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
Signed-off-by: Steven Jin Xuan <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
Left some minor comments, but otherwise LGTM.
Please also add a release note.
/assign-from @envoyproxy/senior-maintainers

api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
Copy link

@envoyproxy/senior-maintainers assignee is @mattklein123

🐱

Caused by: a #35745 (review) was submitted by @adisuissa.

see: more, trace.

Stevenjin8 and others added 4 commits August 23, 2024 14:47
Co-authored-by: Adi (Suissa) Peleg <[email protected]>
Signed-off-by: Steven Jin <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]>
Signed-off-by: Steven Jin <[email protected]>
Signed-off-by: Steven Jin Xuan <[email protected]>
@Stevenjin8
Copy link
Contributor Author

Thanks. Left some minor comments, but otherwise LGTM. Please also add a release note. /assign-from @envoyproxy/senior-maintainers

Thank you for the review

@Stevenjin8 Stevenjin8 changed the title WIP: Add jitter to strict dns Add jitter to strict dns Aug 23, 2024
@Stevenjin8 Stevenjin8 changed the title Add jitter to strict dns Add jitter to strict and logical dns clusters Aug 23, 2024
@mattklein123 mattklein123 merged commit 20e2788 into envoyproxy:main Aug 25, 2024
46 checks passed
@KBaichoo KBaichoo mentioned this pull request Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stampede of DNS requests when TTL expires causes strain on upstream DNS servers
5 participants