
Support proactive checks in overload manager Api #18256

Merged: 11 commits merged into envoyproxy:main on Dec 2, 2021

Conversation

@nezdolik (Member) commented Sep 24, 2021

Signed-off-by: Kateryna Nezdolii [email protected]

This PR adds proactive checks to the overload manager / thread local state, so that callers of the overload manager API can check resource usage instantaneously, on demand. This is the first PR of a series (full code here: #15707); the next PR will add a new proactive monitor for downstream connections.

Commit Message: Support reactive checks in overload manager Api
Additional Description:
Risk Level: Low/Medium
Testing:
Docs Changes: NA
Release Notes:
Platform Specific Features:
Fixes #12419
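
A rough caller-side sketch of the proactive check flow described above, pieced together from the test code later in this PR (treat names and signatures as illustrative rather than the final API):

// Illustrative only: a caller asks the thread local overload state for quota before
// doing work (e.g. accepting a downstream connection), and releases it afterwards.
auto& overload_state = overload_manager.getThreadLocalOverloadState();
const bool allowed = overload_state.tryAllocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
if (!allowed) {
  // Over the configured limit: reject instead of proceeding.
  return;
}
// ... do the work that consumes the resource ...
overload_state.tryDeallocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);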

@nezdolik requested a review from htuch as a code owner on September 24, 2021, 15:01
@nezdolik (Member Author) commented Sep 24, 2021

@antoniovicente @mattklein123 @KBaichoo as discussed some time ago in #15707, I broke the huge change into smaller parts.

@nezdolik changed the title from "Support reactive checks in overload manager Api" to "Support proactive checks in overload manager Api" on Sep 24, 2021
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik (Member Author):

Need to update overload_integration_test to support proactive resource monitors as well.

@antoniovicente (Contributor):

Unfortunately, I currently don't have a lot of cycles for Envoy reviews. Assigning to Kevin to do a first pass.

@KBaichoo (Contributor) left a comment:

Thanks for working on this. Here's a first pass.

Is there a doc on the rest of the chain of PRs that I can read to understand the bigger picture? Thanks

/**
* Returns current resource usage tracked by monitor.
*/
virtual int64_t currentResourceUsage() const PURE;
Contributor:

Naive question: are the semantics of this to block on that read, or will implementors get the most recent read?

Member Author:

Callers will get the most recent read; added an explanation to the API docs.
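
For intuition, a self-contained sketch (not the actual Envoy implementation; method names only mirror the interface quoted above) of a monitor whose current usage is a lock-free read of the last committed value rather than a blocking re-measurement:

#include <atomic>
#include <cstdint>

// Illustrative sketch of a proactive monitor backed by an atomic counter.
class AtomicProactiveMonitorSketch {
public:
  explicit AtomicProactiveMonitorSketch(int64_t max) : max_(max) {}

  bool tryAllocateResource(int64_t increment) {
    int64_t current = current_.load();
    // Commit the increment only while it stays within the configured maximum;
    // compare_exchange_weak reloads `current` on failure, so the limit is re-checked.
    while (current + increment <= max_) {
      if (current_.compare_exchange_weak(current, current + increment)) {
        return true;
      }
    }
    return false;
  }

  bool tryDeallocateResource(int64_t decrement) {
    int64_t current = current_.load();
    while (current >= decrement) {
      if (current_.compare_exchange_weak(current, current - decrement)) {
        return true;
      }
    }
    return false;
  }

  // Non-blocking: returns the most recently committed value, as discussed in this thread.
  int64_t currentResourceUsage() const { return current_.load(); }

private:
  const int64_t max_;
  std::atomic<int64_t> current_{0};
};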


class ProactiveResourceMonitor {
public:
ProactiveResourceMonitor() = default;
Contributor:

Seems like we're using both uint64_t and int64_t in this interface; perhaps we can standardize on one to avoid lots of casts. Thoughts?

Member Author:

switched all to int64_t

OverloadProactiveResourceName::GlobalDownstreamMaxConnections}};
};

using OverloadProactiveResourceNames = ConstSingleton<OverloadProactiveResourceNameValues>;
Contributor:

The enum and this alias are almost the same apart from an 's'. Perhaps there's a better name?

@@ -46,6 +67,32 @@ class ThreadLocalOverloadState : public ThreadLocal::ThreadLocalObject {
public:
// Get a thread-local reference to the value for the given action key.
virtual const OverloadActionState& getState(const std::string& action) PURE;
/**
* Invokes corresponding resource monitor to allocate resource for given resource monitor in
Contributor:

Invokes the corresponding

same below

@@ -46,6 +67,32 @@ class ThreadLocalOverloadState : public ThreadLocal::ThreadLocalObject {
public:
// Get a thread-local reference to the value for the given action key.
virtual const OverloadActionState& getState(const std::string& action) PURE;
/**
* Invokes corresponding resource monitor to allocate resource for given resource monitor in
* thread safe manner. Returns true if there is enough resource quota available and allocation has
Contributor:

in a thread safe manner.

Same below.

createProactiveResourceMonitor(const Protobuf::Message& config,
ResourceMonitorFactoryContext& context) PURE;

std::string category() const override { return "envoy.resource_monitors"; }
Contributor:

What's the difference between this factory and the resource monitor factory? (Sorry if I'm being dense, they just look the same and I wonder whether they could be collapsed.) Thanks

Member Author:

The only difference is the return type of the newly created monitor. We could instead have had two methods within ResourceMonitorFactory, one creating a regular and one a proactive resource monitor. The downside there is that each resource monitor config factory would always need to provide implementations for both methods, and one of them would always be "unimplemented". I see that as more evil (more code duplication) than duplicating the factory class here.
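
A sketch of the two factory interfaces being compared here, approximated from the diff context above (exact base class and pointer typedef names are assumptions):

// Existing factory: creates regular (periodically flushed) resource monitors.
class ResourceMonitorFactory : public Config::TypedFactory {
public:
  virtual ResourceMonitorPtr
  createResourceMonitor(const Protobuf::Message& config,
                        ResourceMonitorFactoryContext& context) PURE;

  std::string category() const override { return "envoy.resource_monitors"; }
};

// New factory: identical shape, but returns a proactive monitor, so each extension
// only implements the variant matching the monitor type it actually provides.
class ProactiveResourceMonitorFactory : public Config::TypedFactory {
public:
  virtual ProactiveResourceMonitorPtr
  createProactiveResourceMonitor(const Protobuf::Message& config,
                                 ResourceMonitorFactoryContext& context) PURE;

  std::string category() const override { return "envoy.resource_monitors"; }
};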

} else {
ENVOY_LOG_MISC(warn, " {Failed to allocate unknown proactive resource }");
// Resource monitor is not configured, pass through mode.
return true;
Contributor:

This seems not to be what the API specifies; i.e., I think the API says to return false if the resource is not registered.

See: https://github.com/envoyproxy/envoy/pull/18256/files#diff-2f26eb90eacceaaace91ff4f8dd5084b053cf741d7facc156b11b34c2c5c867dR73

Member Author:

good catch!
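
For reference, a sketch of the corrected behaviour (container and member names here are approximations of the surrounding diff, not verbatim Envoy code):

bool ThreadLocalOverloadStateImpl::tryAllocateResource(
    OverloadProactiveResourceName resource_name, int64_t increment) {
  const auto proactive_resource = proactive_resources_.find(resource_name);
  if (proactive_resource == proactive_resources_.end()) {
    // Unconfigured proactive resource: fail the allocation instead of silently
    // passing it through, matching the documented API contract.
    ENVOY_LOG_MISC(warn, "Failed to allocate unknown proactive resource");
    return false;
  }
  return proactive_resource->second.tryAllocateResource(increment);
}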

Comment on lines 51 to 55
if (proactive_resource->second.tryAllocateResource(increment)) {
return true;
} else {
return false;
}
Contributor:

Could just do return proactive_resource->second.tryAllocateResource(increment);

// proactive and regular resource monitors in configuration API. But internally we will maintain
// two distinct collections of proactive and regular resources. Proactive resources are not
// subject to periodic flushes and can be recalculated/updated on demand by invoking
// `tryAllocateResource/tryDeallocateResource` via thread local overload state.
Contributor:

This comment was very helpful; is there a doc somewhere where I can read up on these differences some more? Thanks!

Member Author:

@KBaichoo we don't have a design doc that describes the latest proposed implementation for proactive checks in detail; there were multiple iterations. I will prep a doc, good idea.
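
As a rough illustration of the split described in that code comment (field and type names below are illustrative, not the actual Envoy members):

// Illustrative only: the overload manager keeps the two kinds of monitors apart.
struct OverloadManagerStateSketch {
  // Regular monitors: sampled on the periodic flush timer; their readings drive
  // overload actions through the existing pressure/threshold machinery.
  std::vector<ResourceMonitorPtr> regular_monitors_;

  // Proactive monitors: never flushed; updated synchronously whenever a caller
  // invokes tryAllocateResource()/tryDeallocateResource() on the thread local
  // overload state.
  absl::flat_hash_map<OverloadProactiveResourceName, ProactiveResourceMonitorPtr>
      proactive_monitors_;
};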

@KBaichoo (Contributor) commented Oct 5, 2021

/wait

Kateryna Nezdolii added 2 commits October 15, 2021 14:11
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik (Member Author) commented Oct 15, 2021

@KBaichoo documented the motivation and technical solution here: https://docs.google.com/document/d/1utDdgePeX8jTTYazi4FO-M73niQk8BQ_6cvrm0bHeIo/edit?usp=sharing. Please let me know if it is detailed enough.

@nezdolik (Member Author):

Spent time working on the integration test and then realised it's not possible yet to have an overload integration test for proactive checks, because there is no client code using proactive checks yet.

Kateryna Nezdolii added 2 commits October 15, 2021 20:11
Signed-off-by: Kateryna Nezdolii <[email protected]>
KBaichoo previously approved these changes Oct 15, 2021
@KBaichoo (Contributor) left a comment:

Thanks for putting together a design doc; added some comments. This LGTM.

absl::optional<std::reference_wrapper<ResourceUpdateCallbacks>> callbacks_;
};

class FakeProactiveResourceMonitor : public ProactiveResourceMonitor {
Contributor:

Maybe consider refactoring this and the factory like FakeResourceMonitor. See:

class FakeResourceMonitor : public Server::ResourceMonitor {

Member Author:

@KBaichoo should refactoring go in the same PR or a separate one, wdyt? (I would make it a separate one.)

Contributor:

Whoops, missed this. I think a separate one sgtm.

@nezdolik (Member Author):

The clang-tidy check fails with an error which I don't think is related to this change:

clang-tidy check failed, potentially fixed by clang-apply-replacements:
Diagnostics:
- BuildDirectory: /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy
  DiagnosticMessage:
    FileOffset: 313
    FilePath: ./source/common/common/hash.h
    Message: '''xxhash.h'' file not found'
    Replacements: []
  DiagnosticName: clang-diagnostic-error
  Level: Error
  Ranges:
  - {FileOffset: 313, FilePath: ./source/common/common/hash.h, Length: 10}

@nezdolik (Member Author):

/retest

@repokitteh-read-only:

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18256 (comment) was created by @nezdolik.

see: more, trace.

@yanavlasov self-assigned this on Nov 10, 2021
@rojkov (Member) commented Nov 15, 2021

The CI failure does indeed look unrelated. Could you perhaps merge the main branch again and see if it helps?

/wait

Kateryna Nezdolii added 2 commits November 19, 2021 10:56
Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik (Member Author):

/retest

@repokitteh-read-only:

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18256 (comment) was created by @nezdolik.

see: more, trace.

@rojkov (Member) commented Nov 19, 2021

This time there is a new CI problem: #19060.

@nezdolik (Member Author):

/retest

@repokitteh-read-only:

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18256 (comment) was created by @nezdolik.

see: more, trace.

KBaichoo previously approved these changes Nov 22, 2021
@KBaichoo (Contributor) left a comment:

Thanks for bearing with us @nezdolik. @yanavlasov were you planning to review this as well? This LGTM apart from the nits pointed out.

OverloadProactiveResourceName::GlobalDownstreamMaxConnections));
bool resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
EXPECT_EQ(true, resource_allocated);
Contributor:

nit: EXPECT_TRUE()

EXPECT_EQ(true, resource_allocated);
resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 3);
EXPECT_EQ(false, resource_allocated);
Contributor:

nit: EXPECT_FALSE()


bool resource_deallocated = manager->getThreadLocalOverloadState().tryDeallocateResource(
Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
EXPECT_EQ(true, resource_deallocated);
Contributor:

likewise
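
With the three nits above applied, the assertions in the test read as follows (same calls as in the diff, only the matchers change):

bool resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
EXPECT_TRUE(resource_allocated);

resource_allocated = manager->getThreadLocalOverloadState().tryAllocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 3);
EXPECT_FALSE(resource_allocated);

bool resource_deallocated = manager->getThreadLocalOverloadState().tryDeallocateResource(
    Server::OverloadProactiveResourceName::GlobalDownstreamMaxConnections, 1);
EXPECT_TRUE(resource_deallocated);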

Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik (Member Author):

Thanks @KBaichoo @yanavlasov, applied review comments.

@nezdolik (Member Author) commented Dec 1, 2021

@yanavlasov would you be able to take a look?

@yanavlasov merged commit 4b5eee6 into envoyproxy:main on Dec 2, 2021
Successfully merging this pull request may close these issues.

overload manager: overload signals based on number of downstream connections and active requests