Configure IndexSearcher.maxClauseCount() based on Node characteristics #81525
Conversation
Pinging @elastic/es-search (Team:Search)
I'm not totally happy with the testing on this yet, and it will need some documentation as well, but it's good enough for a preliminary review I think.
public static void configureMaxClauses(ThreadPool threadPool) {
    int searchThreadPoolSize = threadPool.info(ThreadPool.Names.SEARCH).getMax();
    long heapSize = JvmStats.jvmStats().getMem().getHeapMax().getGb();
    configureMaxClauses(searchThreadPoolSize, heapSize);
}
What happens if you start Elasticsearch with 512MB of heap, does it mean you get zero max clauses?
If `heapSize` is zero then we stick with the default value from Lucene, currently 1024. But maybe we should stick with the current ES default of 4096 instead?
Have updated this so that we will always return a minimum of 4096, or larger if the heap/cpu size permits.
Would this work if we used the heap size in GB as a fractional number instead?
    return; // If we don't know how to size things, keep the lucene default
}

int maxClauseCount = Math.toIntExact(heapInGb * 65_536 / threadPoolSize);
Let's add some explanation as to where this 65,536 number comes from?
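For illustration, the heuristic under discussion could be sketched as below. The class name and the fallback behaviour are assumptions based on this thread, not the final implementation; the 65,536 factor amounts to a budget of roughly 64k clauses per GB of heap, divided across the search threads, with the previous Elasticsearch default of 4096 as a floor.

```java
// Hedged sketch of the sizing heuristic discussed in this PR.
// Names and fallback behaviour are assumptions, not the merged code.
public class MaxClauseHeuristic {

    static final int MINIMUM = 4096; // previous Elasticsearch default, used as a floor

    static int calculateMaxClauseValue(int threadPoolSize, long heapInGb) {
        if (threadPoolSize <= 0 || heapInGb <= 0) {
            // If we don't know how to size things, keep the old default.
            return MINIMUM;
        }
        // ~64k clauses per GB of heap, shared across search threads.
        int maxClauseCount = Math.toIntExact(heapInGb * 65_536 / threadPoolSize);
        return Math.max(MINIMUM, maxClauseCount);
    }

    public static void main(String[] args) {
        // 1 GB heap, 4 search threads: 65_536 / 4 = 16_384
        System.out.println(calculateMaxClauseValue(4, 1));
        // Unknown heap size: falls back to the 4096 minimum
        System.out.println(calculateMaxClauseValue(4, 0));
    }
}
```

Under this sketch the limit grows with heap and shrinks with a larger search thread pool, which is why the discussion below leans on memory rather than CPU count as the scaling lever.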
+ "This limit can be set by changing the ["
+ SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.getKey()
+ "] setting."
"Number of filters is too large, must be less than or equal to: [" + maxFilters + "] but was [" + maxFiltersPlusOne + "]."
Let's add guidance as to how to address this error? E.g. can we check if the user overrode the size of the search thread pool and suggest undoing the change? Or otherwise recommend scaling up, or avoiding running queries with such large numbers of clauses?
@@ -268,7 +267,8 @@
     4096,
     1,
     Integer.MAX_VALUE,
-    Setting.Property.NodeScope
+    Setting.Property.NodeScope,
+    Setting.Property.DeprecatedWarning
When would we raise a deprecation warning for node settings, only on startup?
Have confirmed with @pgomulka that this will only be emitted on startup, yes.
I've reworked things a bit: the utils class now only calculates the max clauses value, it is set directly in the node startup code, and we use the current default value (4096) as a minimum so that we won't inadvertently limit users' query sizes when they upgrade.
Interesting, this causes |
@elasticmachine run elasticsearch-ci/part-2
of clauses of a single `bool` query. It now applies to the total number of
clauses of the rewritten query. To reduce chances of breaks, its
default value has been bumped from 1024 to 4096.
Elasticsearch will now dynamically set the maximum number of allowed clauses
Maybe we need a word about the fact that Elasticsearch now checks the number of clauses across the entire query rather than on a per-boolean-query basis? So that anyone who has used the hack of splitting boolean queries knows that they can undo it.
Elasticsearch will now dynamically set the maximum number of allowed clauses
in a query, using a heuristic based on the size of the search thread pool and
the size of the heap allocated to the JVM. This limit has a minimum value of
4096 (the previous default) and will in most cases be larger (for example,
1024 was the previous value, not 4096?
you might need to increase it further.
Queries with many clauses should be avoided whenever possible.
If you previously bumped this setting to accommodate heavy queries,
you might need to increase the amount of memory available to elasticsearch,
- you might need to increase the amount of memory available to elasticsearch,
+ you might need to increase the amount of memory available to Elasticsearch,
default value has been bumped from 1024 to 4096.
Elasticsearch will now dynamically set the maximum number of allowed clauses
in a query, using a heuristic based on the size of the search thread pool and
the size of the heap allocated to the JVM. This limit has a minimum value of
I'm tempted to clarify that this number increases with heap size and decreases with the size of the threadpool, wdyt?
assertEquals(4096, SearchUtils.calculateMaxClauseValue(4, 0));

// Number of processors not available
assertEquals(4096, SearchUtils.calculateMaxClauseValue(-1, 1));
Is it actually something that can happen? While the number of cores can be unknown, we are actually using the thread pool size, which should always be defined?
I'm fairly sure that this would always be set, but I put this in as a safety net. But maybe it should be replaced with an assertion that the thread pool size is > 1?
I'm not sure what you mean here?
We currently get the Java heap in GB as a long. A side-effect of this is that the formula always returns 0 as the maximum number of clauses if the heap is less than 1GB, since the amount of memory allocated to the JVM is rounded down. So I wonder if we should use a fractional representation of the heap size instead, e.g. 0.5 if the JVM is given 512MB of heap? This way, we might also not need to resort to taking the max of the return value and 4096; it should scale better to small heaps?
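A minimal sketch of that fractional-heap idea (the class and method names here are hypothetical): treating the heap size as a `double` means 512 MB contributes 0.5 rather than rounding down to 0, so small heaps still get a non-zero clause budget from the same formula.

```java
// Hypothetical variant of the heuristic using a fractional heap size,
// per the suggestion above. Not the merged implementation.
public class FractionalHeapSketch {

    static int calculateMaxClauseValue(int threadPoolSize, double heapInGb) {
        if (threadPoolSize <= 0 || heapInGb <= 0) {
            return 4096; // fall back to the old default when sizing is unknown
        }
        // 512 MB of heap now contributes 0.5 instead of rounding down to 0.
        return (int) (heapInGb * 65_536 / threadPoolSize);
    }

    public static void main(String[] args) {
        // 512 MB heap, 4 search threads: 0.5 * 65_536 / 4, non-zero below 1 GB
        System.out.println(calculateMaxClauseValue(4, 0.5));
    }
}
```

With this shape the 4096 floor is only needed as a fallback for unknown sizing, not to paper over sub-1GB heaps.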
or to reduce the size of your search thread pool so that more memory is
available to each concurrent search.

In previous versions of lucene you could get around this limit by nesting
- In previous versions of lucene you could get around this limit by nesting
+ In previous versions of Lucene you could get around this limit by nesting
@@ -280,7 +279,7 @@ public void testTooLargeMatrix() throws Exception {

     // Create more filters than is permitted by Lucene Bool clause settings.
     MapBuilder filtersMap = new MapBuilder();
-    int maxFilters = SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.get(Settings.EMPTY);
+    int maxFilters = IndexSearcher.getMaxClauseCount();
This number might be super high now, how slow is the test? Maybe we need to set an artificially low limit for this test to keep it reasonable?
+ SearchModule.INDICES_MAX_CLAUSE_COUNT_SETTING.getKey()
+ "] setting."
+ "]. "
+ "You can increase this limit by scaling up your java heap or number of CPUs"
scaling up the number of CPUs would increase the size of the search threadpool and reduce the number of allowed clauses, so maybe just mention memory?
💚 Backport successful
This commit deprecates the
indices.query.bool.max_clause_count
node setting, and instead configures the maximum clause count for Lucene based on the available
heap and the size of the thread pool.
Closes #46433