Support proactive checks in overload manager Api #18256
Conversation
Signed-off-by: Kateryna Nezdolii <[email protected]>
@antoniovicente @mattklein123 @KBaichoo as discussed some time ago in #15707, I broke the huge change into smaller parts.
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
need to update
Unfortunately, I currently don't have a lot of cycles for Envoy reviews. Assigning to Kevin to do a first pass.
Thanks for working on this. Here's a first pass.
Is there a doc on the rest of the chain of PRs that I can read to understand the bigger picture? Thanks
/**
 * Returns current resource usage tracked by monitor.
 */
virtual int64_t currentResourceUsage() const PURE;
Naive question: is the semantics of this to block on that read, or will implementors get the most recent read?
Callers will get the most recent read; added an explanation to the API docs.
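For illustration, one possible wording of the clarified doc comment (a hypothetical sketch; the exact text added in the PR may differ):

/**
 * Returns the most recently sampled resource usage tracked by the monitor.
 * This call does not block or trigger a new measurement; it reads the last
 * value recorded by the monitor.
 */
virtual int64_t currentResourceUsage() const PURE;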
class ProactiveResourceMonitor {
public:
  ProactiveResourceMonitor() = default;
Seems like we're using both uint64_t and int64_t in this interface, perhaps we can simplify to one to avoid lots of casts. Thoughts?
Switched all to int64_t.
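A rough sketch of what a single signed type looks like end to end. The tryAllocateResource/tryDeallocateResource/currentResourceUsage names come from the quoted diffs; maxResourceUsage and the exact signatures are assumptions, not the final PR code:

// Sketch only: every quantity is int64_t, so callers never mix signed and unsigned values.
class ProactiveResourceMonitor {
public:
  virtual ~ProactiveResourceMonitor() = default;
  // Attempts to reserve `increment` units; returns false when the quota is exhausted.
  virtual bool tryAllocateResource(int64_t increment) PURE;
  // Releases `decrement` previously reserved units.
  virtual bool tryDeallocateResource(int64_t decrement) PURE;
  // Most recently observed usage and the configured maximum.
  virtual int64_t currentResourceUsage() const PURE;
  virtual int64_t maxResourceUsage() const PURE;
};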
    OverloadProactiveResourceName::GlobalDownstreamMaxConnections}};
};

using OverloadProactiveResourceNames = ConstSingleton<OverloadProactiveResourceNameValues>;
The enum and this alias are almost the same apart from an 's'. Perhaps there's a better name?
@@ -46,6 +67,32 @@ class ThreadLocalOverloadState : public ThreadLocal::ThreadLocalObject {
public:
  // Get a thread-local reference to the value for the given action key.
  virtual const OverloadActionState& getState(const std::string& action) PURE;
  /**
   * Invokes corresponding resource monitor to allocate resource for given resource monitor in
"Invokes the corresponding" (same below).
  /**
   * Invokes corresponding resource monitor to allocate resource for given resource monitor in
   * thread safe manner. Returns true if there is enough resource quota available and allocation has
in a thread safe manner.
Same below.
  createProactiveResourceMonitor(const Protobuf::Message& config,
                                 ResourceMonitorFactoryContext& context) PURE;

  std::string category() const override { return "envoy.resource_monitors"; }
What's the difference between this factory and the resource monitor factory? (Sorry if I'm being dense, they just look the same and I wonder whether they could be collapsed.) Thanks
The only difference is the return type of the newly created monitor. We could instead have had two methods within ResourceMonitorFactory to create regular and proactive resource monitors. The downside is that each resource monitor config factory would always need to provide implementations for both methods, and one of them would always be "unimplemented". I see that as worse (more code duplication) than duplicating the factory class here.
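To make the trade-off concrete, a rough sketch of the two options. The createProactiveResourceMonitor signature and category string are taken from the quoted diff; the return types, base class, and the combined variant (CombinedResourceMonitorFactory) are hypothetical illustrations:

// Option chosen here (sketch): a separate factory per monitor kind.
class ProactiveResourceMonitorFactory : public Config::TypedFactory {
public:
  virtual ProactiveResourceMonitorPtr
  createProactiveResourceMonitor(const Protobuf::Message& config,
                                 ResourceMonitorFactoryContext& context) PURE;
  std::string category() const override { return "envoy.resource_monitors"; }
};

// Rejected alternative (sketch): one factory exposing both methods, forcing every
// concrete config factory to stub out the variant it does not support.
class CombinedResourceMonitorFactory : public Config::TypedFactory {
public:
  virtual ResourceMonitorPtr
  createResourceMonitor(const Protobuf::Message& config,
                        ResourceMonitorFactoryContext& context) PURE;
  virtual ProactiveResourceMonitorPtr
  createProactiveResourceMonitor(const Protobuf::Message& config,
                                 ResourceMonitorFactoryContext& context) PURE;
};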
  } else {
    ENVOY_LOG_MISC(warn, " {Failed to allocate unknown proactive resource }");
    // Resource monitor is not configured, pass through mode.
    return true;
This seems not to be what the API specifies, e.g. I think the API says to return false if the resource is not registered.
good catch!
  if (proactive_resource->second.tryAllocateResource(increment)) {
    return true;
  } else {
    return false;
  }
Could just do return proactive_resource->second.tryAllocateResource(increment);
// proactive and regular resource monitors in configuration API. But internally we will maintain
// two distinct collections of proactive and regular resources. Proactive resources are not
// subject to periodic flushes and can be recalculated/updated on demand by invoking
// `tryAllocateResource/tryDeallocateResource` via thread local overload state.
This comment was very helpful; is there a doc somewhere where I can read up on these differences some more? Thanks!
@KBaichoo we don't have a design doc that describes the latest proposed implementation for proactive checks in detail; there were multiple iterations. I will prep a doc, good idea.
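In the meantime, a hedged sketch of the on-demand path the comment above describes. The identifiers come from the quoted test code; the call site and the overload_manager variable are hypothetical:

// Sketch only: reserve one unit against the proactive resource before accepting
// a downstream connection, and release it when the connection closes.
auto& overload_state = overload_manager.getThreadLocalOverloadState();
const bool allowed = overload_state.tryAllocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
if (!allowed) {
  // Over the configured limit: reject immediately instead of waiting for a
  // periodic flush, since proactive resources are checked on demand.
  return;
}
// ... handle the connection ...
overload_state.tryDeallocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);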
/wait
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@KBaichoo documented the motivation and technical solution here: https://docs.google.com/document/d/1utDdgePeX8jTTYazi4FO-M73niQk8BQ_6cvrm0bHeIo/edit?usp=sharing. Please let me know if it is detailed enough.
Spent time working on an integration test and then realised it is not yet possible to write an overload integration test for proactive checks, because there is no client code using them yet.
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Thanks for putting together a design doc, added some comments. This LGTM.
  absl::optional<std::reference_wrapper<ResourceUpdateCallbacks>> callbacks_;
};

class FakeProactiveResourceMonitor : public ProactiveResourceMonitor {
Maybe consider refactoring this and the factory like FakeResourceMonitor. See:
class FakeResourceMonitor : public Server::ResourceMonitor {
@KBaichoo should the refactoring go in the same PR or a separate one, wdyt? (I would make it a separate one.)
Whoops, missed this. I think a separate one sgtm.
Signed-off-by: Kateryna Nezdolii <[email protected]>
Clang-tidy check fails with an error which I don't think is related to this change:
/retest
Retrying Azure Pipelines:
The CI failure does indeed look unrelated. Could you perhaps merge the main branch again and see if it helps? /wait
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
/retest
Retrying Azure Pipelines:
This time there is a new problem with CI: #19060.
/retest
Retrying Azure Pipelines:
Thanks for bearing with us @nezdolik. @yanavlasov were you planning to review this as well? This LGTM apart from the nits pointed out.
      OverloadProactiveResourceName::GlobalDownstreamMaxConnections));
  bool resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
  EXPECT_EQ(true, resource_allocated);
nit: EXPECT_TRUE()
  EXPECT_EQ(true, resource_allocated);
  resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 3);
  EXPECT_EQ(false, resource_allocated);
nit: EXPECT_FALSE()
  bool resource_deallocated = manager->getThreadLocalOverloadState().tryDeallocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
  EXPECT_EQ(true, resource_deallocated);
likewise
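Applied to the quoted test, the assertions would read roughly like this (sketch based on the diff above):

  // Same test body with the suggested EXPECT_TRUE/EXPECT_FALSE matchers applied.
  bool resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
  EXPECT_TRUE(resource_allocated);
  resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 3);
  EXPECT_FALSE(resource_allocated);
  bool resource_deallocated = manager->getThreadLocalOverloadState().tryDeallocateResource(
      Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
  EXPECT_TRUE(resource_deallocated);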
Signed-off-by: Kateryna Nezdolii <[email protected]>
Thanks @KBaichoo @yanavlasov, applied review comments.
@yanavlasov would you be able to take a look?
Signed-off-by: Kateryna Nezdolii [email protected]
This PR adds proactive checks to the overload manager/thread local state, so that callers of the overload manager API can check resource usage instantaneously, on demand. This is the first PR in a series (full code here: #15707); the next PR will add a new proactive monitor for downstream connections.
Commit Message: Support proactive checks in overload manager Api
Additional Description:
Risk Level: Low/Medium
Testing:
Docs Changes: NA
Release Notes:
Platform Specific Features:
Fixes #12419