
Factor GC overhead in circuit breakers #40115

Closed
Bukhtawar opened this issue Mar 15, 2019 · 7 comments
Labels
:Core/Infra/Circuit Breakers (Track estimates of memory consumption to prevent overload), discuss

Comments

@Bukhtawar
Contributor

Bukhtawar commented Mar 15, 2019

Problem Description

There are a bunch of circuit breakers that track estimated bytes consumed by the major memory contributors, but there is still room for unaccounted memory (#35564, #20837, #20250, etc.). The newly introduced real-memory circuit breaker (#31767) isn't foolproof either, as a large time gap between reservation and actual allocation can allow a few bulky concurrent requests to cause an OOM.

Proposal

Large GC overhead is an early sign of a struggling GC and of non-collectible heap that may keep growing. The proposal is to factor GC overhead into the circuit breaker decision, in addition to actual heap consumption, based on a user-configurable setting gc_overhead_threshold, so that these symptoms are detected early. If the node is already running at a high GC overhead, based on a weighted average of the past few GC runs, we start to trip requests on the node.
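
A minimal sketch of that check, assuming the overhead is sampled from the JMX GC counters over a wall-clock window; the class name, field names, and the meaning of gc_overhead_threshold below are illustrative only and not Elasticsearch code:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Sketch of the proposed check: compare the fraction of wall-clock time
 * spent in GC since the last sample against a configurable
 * gc_overhead_threshold and trip requests while it is exceeded.
 * Class and field names are hypothetical, not Elasticsearch code.
 */
class GcOverheadBreaker {
    private final double gcOverheadThreshold;   // e.g. 0.25 == 25% of wall-clock time spent in GC
    private long lastSampleTimeMillis = System.currentTimeMillis();
    private long lastCollectionTimeMillis = totalCollectionTimeMillis();

    GcOverheadBreaker(double gcOverheadThreshold) {
        this.gcOverheadThreshold = gcOverheadThreshold;
    }

    /** Sum of accumulated collection time across all collectors, in milliseconds. */
    private static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {          // getCollectionTime() may return -1 if unsupported
                total += t;
            }
        }
        return total;
    }

    /** Called periodically, or before admitting a request: should we trip? */
    synchronized boolean shouldTrip() {
        long now = System.currentTimeMillis();
        long collectionTime = totalCollectionTimeMillis();
        long elapsed = Math.max(1, now - lastSampleTimeMillis);
        double overhead = (double) (collectionTime - lastCollectionTimeMillis) / elapsed;
        lastSampleTimeMillis = now;
        lastCollectionTimeMillis = collectionTime;
        return overhead >= gcOverheadThreshold;
    }
}
```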

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

not-napoleon added the >enhancement and :Core/Infra/Circuit Breakers labels on Mar 15, 2019
@Bukhtawar
Contributor Author

Bukhtawar commented Apr 11, 2019

Summarizing the proposal

  1. Compute a linearly weighted moving average (LWMA) of avg_collection_time = collection_time / collections (a measure of GC effectiveness) and of overhead = 100 * current_gc_collection_time / time_elapsed (a measure of GC throughput) per GC run, and possibly expose these metrics via the /_nodes/stats API. The computations are the same as in JVMGcMonitorService, which currently serves only logging (see the sketch after this list).
  2. The circuit breaker can start to trip requests based on configurable thresholds (probably controlled through new settings) or on some factor of the monitor.jvm.* settings (will add more details), consuming the stats from [1].
  3. Maintain a per-node circuit breaker OPEN/CLOSED state and the time elapsed since the last state change, and also expose these stats.
  4. Since we are tripping requests to keep nodes from going OOM, it could also happen that the garbage is non-collectible (possible memory leaks) or stays around longer (Fail put-mapping requests sooner if they will exceed the field number limit #35564) and the node takes considerably longer to go OOM (longer outages). The node should gracefully shut down if, even after tripping requests, it is unable to recover from slow/long GCs, based on the stats from [3] and some configurable thresholds (will add more details).
  5. The nodes can then be restarted the same way as in OOM scenarios (mostly externally).
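
A rough sketch of the bookkeeping in step 1, assuming a small fixed window and linear weights; the class name, window size, and method names are illustrative, and this is not the code of JVMGcMonitorService:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of step 1: per GC run, record
 *   avg_collection_time = collection_time / collections and
 *   overhead = 100 * collection_time / elapsed_time,
 * then smooth both with a linearly weighted moving average where the most
 * recent run weighs the most. Names and window size are hypothetical.
 */
class GcStatsTracker {
    private static final int WINDOW = 5;                        // number of recent GC runs to remember
    private final Deque<double[]> samples = new ArrayDeque<>(); // [avgCollectionTimeMs, overheadPercent]

    /** Record one GC observation, e.g. derived from deltas of the GC MXBean counters. */
    void onGc(long collectionTimeMs, long collections, long elapsedMs) {
        double avgCollectionTime = collections > 0 ? (double) collectionTimeMs / collections : 0.0;
        double overhead = elapsedMs > 0 ? 100.0 * collectionTimeMs / elapsedMs : 0.0;
        if (samples.size() == WINDOW) {
            samples.removeFirst();                               // drop the oldest sample
        }
        samples.addLast(new double[] { avgCollectionTime, overhead });
    }

    /** Linearly weighted moving average of one of the two tracked metrics. */
    private double lwma(int metricIndex) {
        double weightedSum = 0;
        long weightTotal = 0;
        int weight = 1;                                          // oldest sample gets weight 1, newest gets weight N
        for (double[] sample : samples) {
            weightedSum += weight * sample[metricIndex];
            weightTotal += weight;
            weight++;
        }
        return weightTotal == 0 ? 0.0 : weightedSum / weightTotal;
    }

    double lwmaAvgCollectionTime() { return lwma(0); }
    double lwmaOverheadPercent()   { return lwma(1); }
}
```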

@not-napoleon Please share your thoughts on this.

@Bukhtawar
Contributor Author

Hi @not-napoleon, please let me know what you think of the proposal. I'll be more than happy to work on the PR.

@not-napoleon
Member

@jaymode Would you mind commenting on this? I labeled it when it came in, but it's not really in my focus area and I don't have a good sense of what the right direction is.

@jaymode
Member

jaymode commented Apr 16, 2019

Hi @Bukhtawar, thanks for opening the issue and providing an idea. We plan to discuss this as a team and will summarize our thoughts afterwards. @danielmitterdorfer @dakrone you may be interested in this based on the work you've done on circuit breakers in the past.

@Bukhtawar
Contributor Author

Hi @jaymode @danielmitterdorfer @dakrone, did we get a chance to review this? I'm awaiting a response on the proposal.

@jpountz
Contributor

jpountz commented Jun 14, 2019

We discussed it in FixitFriday and made the following points:

  • the real-memory circuit breaker is expected to already help with this, since GC pressure typically correlates with high memory usage
  • the problem is that circuit breakers don't intercept all memory allocations; some of the linked issues are about deserializing mappings, for instance, which doesn't let circuit breakers know about its memory allocations - this is a very hard problem to solve with complex graphs of tiny Java objects
  • we don't see a path forward that wouldn't also increase the risk of rejecting a request that would otherwise have run fine

As a consequence we decided not to move forward on this and close the issue. Thank you for bringing this up though @Bukhtawar!

@jpountz jpountz closed this as completed Jun 14, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Aug 28, 2019
Recovery requests are likely to trip the circuit breaker under heavy load,
which might lead to unfortunate side effects on nodes under pressure.
A node temporarily under high load will then be less likely to fail recoveries,
which would otherwise lead to permanent changes in allocation.
Moreover, the requests sent by the recovery mechanism are all bounded in
byte size by design. If they are sent in rapid succession then the memory
used by strong references to them will be bounded, but the memory used
by unreferenced objects that resulted from them could fluctuate heavily.
This makes the real-memory circuit breaker's memory-use estimation, which
does not account for GC, particularly ineffective when it comes to
recoveries (elastic#40115).

Turn off the circuit breaker for recoveries. Users have other means of
limiting the memory use of recoveries by setting the recovery chunk
size and parallelism. Given the bounded amount of memory used by
recoveries, a user can either lower the amount of resources allocated
to recoveries in the settings or adjust the real-memory circuit breaker
limits slightly to account for this change.
I think it's a fair assumption that the number of clusters that would
see nodes running out of memory as a result of this change is small.
Also, it would be a subset of those clusters that currently see recovery
failures as a result of the circuit breaker, and those should be fixed
regardless.

Unfortunately, I don't think we can use the same solution for the
case of replication requests tripping the circuit breaker, as those
are not naturally bounded in size.

Relates elastic#44484
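
For illustration only, the "bounded amount of memory" argument in the commit above can be made concrete with back-of-the-envelope arithmetic over the recovery chunk size and parallelism; the chunk size and concurrency values below are assumptions for the example, not figures taken from the commit:

```java
/**
 * Back-of-the-envelope bound on memory held by in-flight recovery chunks,
 * illustrating why recovery memory can be treated as bounded.
 * All values below are assumed for this example.
 */
class RecoveryMemoryBound {
    public static void main(String[] args) {
        long chunkSizeBytes = 512 * 1024;  // assumed recovery chunk size (512 KiB)
        int concurrentFileChunks = 2;      // assumed chunks in flight per recovery
        int concurrentRecoveries = 2;      // assumed concurrent recoveries per node

        long upperBound = chunkSizeBytes * concurrentFileChunks * concurrentRecoveries;
        System.out.printf("Rough upper bound on in-flight recovery chunk memory: %d KiB%n",
                upperBound / 1024);        // 2048 KiB with these assumptions
    }
}
```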