-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Circuit-break based on real memory usage #31767
Circuit-break based on real memory usage #31767
Conversation
With this commit we introduce a new circuit-breaking strategy to the parent circuit breaker. Contrary to the current implementation which only accounts for memory reserved via child circuit breakers, the new strategy measures real heap memory usage at the time of reservation. This allows us to be much more aggressive with the circuit breaker limit so we bump it to 95% by default. The new strategy can be turned on with the new cluster setting `indices.breaker.total.userealmemory`. It is on by default if the heap size is smaller than 1GB, otherwise it is turned off.
Pinging @elastic/es-core-infra |
A few more comments on this change for reviewers: Initially I was worried about the overhead of
CPUs ran at their base frequency with the performance CPU governor to reduce measurement noise. So we can expect an overhead of several hundred nanoseconds per call even if we call that operation from lots of threads concurrently. Therefore, I concluded that this operation is cheap enough to call for every check in the parent breaker. I also tested the effectiveness of this new circuit breaker strategy with several macrobenchmarks using the Elasticsearch default distribution with X-Pack Security enabled. I set the heap size for Elasticsearch to 256MB and with the new circuit breaker strategy it could absorb the load in all but one of the benchmarks in our macrobenchmarking suite (bulk-indexing and querying). Of course, the clients got request errors from Elasticsearch - for the full-text benchmark ( Finally, I did a dedicated test to see how Elasticsearch behaves when an aggregation produces a crazy amount of buckets on a larger heap (16GB). I ingested the whole {
"aggs": {
"size_over_time": {
"date_histogram": {
"field": "@timestamp",
"interval": "second"
},
"aggs": {
"sizes": {
"percentiles": {
"field": "size",
"tdigest": {
"compression": 1000
}
}
}
}
}
}
}
While this PR greatly improves the situation w.r.t |
@@ -44,10 +47,24 @@ | |||
|
|||
private static final String CHILD_LOGGER_PREFIX = "org.elasticsearch.indices.breaker."; | |||
|
|||
private static final MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this var name in all caps since it's static and final? We tend to do that all over the place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. That was definitely unintentional.
private final ConcurrentMap<String, CircuitBreaker> breakers = new ConcurrentHashMap<>(); | ||
|
||
public static final Setting<Boolean> USE_REAL_MEMORY_USAGE_SETTING = | ||
Setting.boolSetting("indices.breaker.total.userealmemory", settings -> { | ||
ByteSizeValue maxHeapSize = new ByteSizeValue(memoryMXBean.getHeapMemoryUsage().getMax()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, memoryMXBean.getHeapMemoryUsage().getMax()
returns the maximum configured, not the maximum currently sized right? If someone did:
-Xms1g
-Xmx8g
They aren't going to get 1.37gb because that happens to be what the JVM is currently sized as for the "max"? (In otherwords, there's no heap resizing issue with this right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns the maximum configured heap, i.e. they would get 8589934592
(i.e. 8GB) in your example (I verified this).
for (CircuitBreaker breaker : this.breakers.values()) { | ||
parentEstimated += breaker.getUsed() * breaker.getOverhead(); | ||
} | ||
return parentEstimated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be parentEstimated + newBytesReserved
? We add it to the real memory usage if tracking memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because this is for the current strategy which sums up the total memory reserved by all child circuit breakers. As the corresponding child circuit breaker accounts for that amount of memory already, we do not need to do that again in the parent breaker.
For the new strategy which is based on real memory usage, we do not rely on the child memory circuit breakers but rather only on current memory usage. Hence, we need to consider the amount of memory that is about to be reserved here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -396,6 +396,14 @@ private Settings getRandomNodeSettings(long seed) { | |||
builder.put(MappingUpdatedAction.INDICES_MAPPING_DYNAMIC_TIMEOUT_SETTING.getKey(), new TimeValue(RandomNumbers.randomIntBetween(random, 10, 30), TimeUnit.SECONDS)); | |||
} | |||
|
|||
if (random.nextBoolean()) { | |||
builder.put(HierarchyCircuitBreakerService.USE_REAL_MEMORY_USAGE_SETTING.getKey(), true); | |||
// allow full allocation in tests to avoid breaking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, were we breaking tests somewhere with this set to 95%? It seems like we shouldn't hit 95% with our regular tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do. I have the impression that this is mainly caused by two facts:
- We run multiple integration tests in the same JVM process so garbage will add up.
- We do not set a garbage collection algorithm in the integration tests (except for our g1 builds via
-Dtests.jvm.argline
in the Jenkins configuration). This means we let the JVM choose the algorithm. For the Java 10 builds this isG1
, for Java 8 it isParallelGC
. From my tests I had the impression that especially G1 had trouble keeping up with allocation in our tests (as G1 cleans up concurrently) and I experienced circuit breaker issues basically all the time.
I am not sure how we should move forward here in general. G1 is not really optimal for such small heaps but on the other hand we should also run our tests with G1. I already added this to our group's agenda to discuss. Finally, the 95% are heap usage + amount to be reserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, the 95% are heap usage + amount to be reserved.
I suppose in that case it would force a real GC if the heap were at 95% and then a new amount was reserved using new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the GC should get triggered even earlier. For CMS we have configured -XX:CMSInitiatingOccupancyFraction=75
. But as the garbage collector is running concurrently with the application, it can still allocate while the GC is active.
The parent-level breaker can be configured with the following setting: | ||
The parent-level breaker can be configured with the following settings: | ||
|
||
`indices.breaker.total.userealmemory`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rename this setting to be
indices.breaker.total.use_real_memory
Since we usually use _
when using multiple words for setting names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, excited for this to get in though!
|
||
Whether the parent breaker should take real memory usage into account (`true`) or only | ||
consider the amount that is reserved by child circuit breakers (`false`). Defaults to `true` | ||
if the JVM heap size is smaller than 1GB, otherwise `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked about today, I think we may want to think about why this defaults to off and put it in the documentation (or at least mention it in this PR). Right now it comes off as more of "it's off by default for > 1gb heaps for no reason in particular" which seems strange given that we're trying to be as safe as possible by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to change the default to the real memory circuit breaker for all cases but initially I wanted to be a bit more conservative. I felt that for larger deployments it is ok to stick with the current implementation. I also thought that it might be one thing less to consider in a major version upgrade if we keep the default for now for those deployments.
My idea behind changing it only for lower heap sizes below 1GB deviation between real memory usage and actively tracked memory usage starts to matter more and we want to ensure we push back accordingly. I did not choose this heap size by coincidence but based on our benchmarks. With 1GB heap, Elasticsearch can handle our macrobenchmark suite but it starts to struggle below that point (e.g. for 768MB).
I could ask how the rest of the team feels turning it on in all cases. Wdyt?
With this commit we request to run a garbage collection after the integration test cluster has started. This avoids garbage from piling up and thus we avoid that the real memory circuit breaker trips.
@elasticmachine test this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea behind changing it only for lower heap sizes below 1GB deviation between real memory usage and actively tracked memory usage starts to matter more and we want to ensure we push back accordingly. I did not choose this heap size by coincidence but based on our benchmarks. With 1GB heap, Elasticsearch can handle our macrobenchmark suite but it starts to struggle below that point (e.g. for 768MB).
I could ask how the rest of the team feels turning it on in all cases. Wdyt?
I think that would be good to pursue, but that doesn't need to block this PR, I think we can discuss it for a follow-up.
LGTM assuming CI is happy :)
for (CircuitBreaker breaker : this.breakers.values()) { | ||
parentEstimated += breaker.getUsed() * breaker.getOverhead(); | ||
} | ||
return parentEstimated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -396,6 +396,14 @@ private Settings getRandomNodeSettings(long seed) { | |||
builder.put(MappingUpdatedAction.INDICES_MAPPING_DYNAMIC_TIMEOUT_SETTING.getKey(), new TimeValue(RandomNumbers.randomIntBetween(random, 10, 30), TimeUnit.SECONDS)); | |||
} | |||
|
|||
if (random.nextBoolean()) { | |||
builder.put(HierarchyCircuitBreakerService.USE_REAL_MEMORY_USAGE_SETTING.getKey(), true); | |||
// allow full allocation in tests to avoid breaking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, the 95% are heap usage + amount to be reserved.
I suppose in that case it would force a real GC if the heap were at 95% and then a new amount was reserved using new
With this commit we turn off the real memory circuit breaker for all integration tests with an internal test cluster because it leads to spurious test failures which are of no value (we cannot fully control heap memory usage in tests)
@dakrone after our discussion yesterday I pushed the following changes:
Can you please have one more look at these changes and give your ok if you're fine with them? Also, in one of my (very many) test runs I spotted the following test error:
This happened on MacOS 10.12.6 with JDK 10 (build 10.0.1+10). The stack trace points to an issue in the (native code of the) JDK. After analyzing the code, I suspect that the problem is the calculation of
I suggest that we handle this as follows:
long currentMemoryUsage() {
try {
return MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed();
} catch (IllegalArgumentException ex) {
logger.warn("Could not determine real memory usage due to JDK issue.", ex);
return 0L;
}
} I think it is better to avoid the As discussed yesterday, this change will also be backported to 6.x but the real memory circuit breaker will be turned off by default (I'll create a separate PR for this). I suggest that we integrate this workaround already in the backport. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, the steps you outlined in regard to the JDK bug sound good as well.
I noticed the Netbeans bug report is for OSX also, so perhaps it's an OSX-only issue?
Thank you for the review @dakrone! The JDK issue that I mentioned above is now raised against the OpenJDK repo in https://bugs.openjdk.java.net/browse/JDK-8207200. |
With this commit we disable the real-memory circuit breaker in REST tests as this breaker is based on real memory usage over which we have no (full) control in tests and the REST client is not yet ready to retry on circuit breaker exceptions. This is only meant as a temporary measure to avoid spurious test failures while we ensure that the REST client can handle those situations appropriately. Closes elastic#32050 Relates elastic#31767 Relates elastic#31986
With this commit we disable the real-memory circuit breaker in REST tests as this breaker is based on real memory usage over which we have no (full) control in tests and the REST client is not yet ready to retry on circuit breaker exceptions. This is only meant as a temporary measure to avoid spurious test failures while we ensure that the REST client can handle those situations appropriately. Closes #32050 Relates #31767 Relates #31986 Relates #32074
With this commit we raise the limit of the child circuit breaker used in the unit test for the circuit breaker service so it is high enough to trip only the parent circuit breaker. The previous limit was 300 bytes but theoretically (considering overhead) we could reach 346 bytes. Thus any value larger than 300 bytes could trip the child circuit breaker leading to spurious failures. Relates #31767
With this commit we implement a workaround for https://bugs.openjdk.java.net/browse/JDK-8207200 which is a race condition in the JVM that results in `IllegalArgumentException` to be thrown in rare cases when we determine memory usage via `MemoryMXBean`. As we do not want to fail requests in those cases we always return zero memory usage. Relates elastic#31767
With this commit we implement a workaround for https://bugs.openjdk.java.net/browse/JDK-8207200 which is a race condition in the JVM that results in `IllegalArgumentException` to be thrown in rare cases when we determine memory usage via `MemoryMXBean`. As we do not want to fail requests in those cases we always return zero memory usage. Relates #31767 Relates #33125
When are we planning on back porting this to 6.x versions even now that the OpenJDK bug stands resolved |
Hi, we are using zing jdk11 and we getting errors from breaker about "Data too large...which is larger than the limit of 44GB":
From client side:
jdk version:
breakers stats:
Zing conf:
JVM options:
@danielmitterdorfer @dakrone do you have any idea how to debug the issue? |
Notice that the question on zing was dealt with on discuss. |
With this commit we introduce a new circuit-breaking strategy to the parent
circuit breaker. Contrary to the current implementation which only accounts for
memory reserved via child circuit breakers, the new strategy measures real heap
memory usage at the time of reservation. This allows us to be much more
aggressive with the circuit breaker limit so we bump it to 95% by default. The
new strategy is turned on by default and can be controlled with the new cluster
setting
indices.breaker.total.use_real_memory
.Note that we turn it off for all integration tests with an internal test cluster
because it leads to spurious test failures which are of no value (we cannot
fully control heap memory usage in tests). All REST tests, however, will make
use of the real memory circuit breaker.