-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize doc-level monitor workflow for index patterns #1097
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1097 +/- ##
============================================
+ Coverage 67.72% 67.77% +0.05%
Complexity 105 105
============================================
Files 160 160
Lines 10343 10363 +20
Branches 1522 1521 -1
============================================
+ Hits 7005 7024 +19
- Misses 2672 2673 +1
Partials 666 666
|
monitorCtx.clusterService!!, | ||
monitorCtx.indexNameExpressionResolver!! | ||
) | ||
val updatedIndexName = indexName.replace("*", "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we do this indexName.replace("*", "_")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because query string query fields
with *
have a different meaning & is also not allowed in some cases.
docLevelMonitorInput.indices, | ||
monitorCtx.clusterService!!, | ||
monitorCtx.indexNameExpressionResolver!! | ||
) | ||
if (concreteIndices.isEmpty()) { | ||
throw IndexNotFoundException(docLevelMonitorInput.indices[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can we add error log with monitor Id and indices list which were not found
- in exception why are we logging only 0th index element of
indices
variable? can we log all elements in array/list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it now. it was copied from eariler logic.
// TODO: If dryrun, we should make it so we limit the search as this could still potentially give us lots of data | ||
if (isTempMonitor) { | ||
indexLastRunContext[shard] = max(-1, (indexUpdatedRunContext[shard] as String).toInt() - 10) | ||
docLevelMonitorInput.indices.forEach { indexName -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we change this list being looped from computed concrete indices list to the monitor indices list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is because we want to expand 1 index pattern
& process all its concrete indices
separately. Previous logic was we expand all index patterns
& process all the concrete indices
one by one.
To create query index mappings for an index pattern would be a very big optimization but it works only under the assumption that an index pattern test* resolves to indices that do not have fields with same names but different data types. this fails in the following scenario: when we store the field mapping in query index for |
we can add this optimization for the scenarios where we can identify all indices covered in index pattern are guaranteed to have the same mappings i.e. data streams, rolling indices with index templates. should we store a field in index mapping called |
Since we are batching all mappings can we check if the above mentioned field mapping mismatch is happening for any fields and then ingest one for the generic test* and one mapping for the specific index. Similarly while fetching mappings look for both index mappings with test* and test1. If latter is found prefer it. if not found, use the former. |
hi @eirsep , that is the exact fix i'm working on. |
hi @eirsep , just addressed the comment with the test case https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R503. |
/** | ||
* Adjusts max field limit index setting for query index if source index has higher limit. | ||
* This will prevent max field limit exception, when source index has more fields then query index limit | ||
*/ | ||
private suspend fun checkAndAdjustMaxFieldLimit(sourceIndex: String, concreteQueryIndex: String) { | ||
private suspend fun checkMaxFieldLimit(sourceIndex: String): Long { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the java doc description for this method seems like it should describe adjustMaxFieldLimitForQueryIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we update the code comments for this method and below method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it.
"nested_field": { "test1": "some text" }, | ||
"test_field": "12345" | ||
}""" | ||
indexDoc(testIndex2, "1", testDoc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT; since you are not explicitly creating index with different mappings
can you plz fetch second index's mapping and assert that the field type is actually different for test_field
so as to validate the test scenario. that would make the test more readable. right now it's a bit hard to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifying explicit mappings now.
@@ -495,6 +500,375 @@ class DocumentMonitorRunnerIT : AlertingRestTestCase() { | |||
assertEquals("Findings saved for test monitor expected 14, 51 and 10", 3, foundFindings.size) | |||
} | |||
|
|||
fun `test execute monitor with indices having fields with same name but different data types`() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we plz add asserts on the expected mappings and content of the query index. that would be the essence of these tests and help us visualize the changes made in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asserts added on actual queries stored in query index.
@@ -717,17 +738,36 @@ object DocumentLevelMonitorRunner : MonitorRunner() { | |||
*/ | |||
private fun transformDocumentFieldNames( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very critical piece fo code. can we update java docs description of this method incorporating the new change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the documentation for the method.
|
||
fun getAllConflictingFields(clusterState: ClusterState, concreteIndices: List<String>): Set<String> { | ||
val conflictingFields = mutableSetOf<String>() | ||
val allFlattenPaths = mutableMapOf<String, MutableMap<String, Any>>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the below flatten logic handle both the nested and non-nested type objects correctly?
what would be the difference after flattening into string in the following 2 cases
"nested_field": {
"type": "nested",
"properties": {
"test1": {
"type": "keyword"
}
}
and
"nested_field": {
"properties": {
"test1": {
"type": "keyword"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nested fields
dont work with query string queries
. https://stackoverflow.com/questions/69857071/how-can-i-use-query-string-to-match-both-nested-and-non-nested-fields-at-the-sam
but if we pair them with an index containing an object field, it will work. https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R636
// | ||
val leafNodeProcessor = | ||
fun(fieldName: String, _: String, props: MutableMap<String, Any>): Triple<String, String, MutableMap<String, Any>> { | ||
return Triple(fieldName, fieldName, props) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need field name twice?
can you add some code comments around what is the Triple type object being returned to make it more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is used intentionally because leafNodeProcessor
is used at another place where the 2nd param is actually used to pass modified field name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for making these changes @sbcd90
i have some minor comments not related to logic but more around readiblity and testsing.
Kindly also respond to earlier comments.
Signed-off-by: Subhobrata Dey <[email protected]>
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/alerting/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/alerting/backport-2.x
# Create a new branch
git switch --create backport-1097-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7f0c7c7e77a9213ad5e976c2f1573321bc26b919
# Push it to GitHub
git push --set-upstream origin backport-1097-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/alerting/backport-2.x Then, create a pull request where the |
…oject#1097) Signed-off-by: Subhobrata Dey <[email protected]>
…oject#1097) Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]>
* log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize doc-level monitor workflow for index patterns (#1097) Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize doc-level monitor execution workflow for datastreams (#1302) * optimize doc-level monitor execution for datastreams Signed-off-by: Subhobrata Dey <[email protected]> * add more tests to address comments Signed-off-by: Subhobrata Dey <[email protected]> * add integTest for multiple datastreams inside a single index pattern * add integTest for multiple datastreams inside a single index pattern Signed-off-by: Subhobrata Dey <[email protected]> --------- Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Bulk index findings and sequentially invoke auto-correlations (#1355) * Bulk index findings and sequentially invoke auto-correlations Signed-off-by: Megha Goyal <[email protected]> * Bulk index findings in batches of 10000 and make it configurable Signed-off-by: Megha Goyal <[email protected]> * Addressing review comments Signed-off-by: Megha Goyal <[email protected]> * Add integ tests to test bulk index findings Signed-off-by: Megha Goyal <[email protected]> * Fix ktlint formatting Signed-off-by: Megha Goyal <[email protected]> --------- Signed-off-by: Megha Goyal <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Add jvm aware setting and max num docs settings for batching docs for percolate queries (#1435) * add jvm aware and max docs settings for batching docs for percolate queries Signed-off-by: Surya Sashank Nistala <[email protected]> * fix stats logging Signed-off-by: Surya Sashank Nistala <[email protected]> * add queryfieldnames field in findings mapping Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize to fetch only fields relevant to doc level queries in doc level monitor instead of entire _source for each doc (#1441) * optimize to fetch only fields relevant to doc level queries in doc level monitor Signed-off-by: Surya Sashank Nistala <[email protected]> * fix test for settings check Signed-off-by: Surya Sashank Nistala <[email protected]> * fix ktlint Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize sequence number calculation and reduce search requests in doc level monitor execution (#1445) * optimize sequence number calculation and reduce search requests by n where n is number of shards being queried in the executino Signed-off-by: Surya Sashank Nistala <[email protected]> * fix tests Signed-off-by: Surya Sashank Nistala <[email protected]> * optimize check indices and execute to query only write index of aliases and datastreams during monitor creation Signed-off-by: Surya Sashank Nistala <[email protected]> * fix test Signed-off-by: Surya Sashank Nistala <[email protected]> * add javadoc Signed-off-by: Surya Sashank Nistala <[email protected]> * add tests to verify seq_no calculation Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Fix tests Signed-off-by: Chase Engelbrecht <[email protected]> * Fix BWC tests Signed-off-by: Chase Engelbrecht <[email protected]> * clean up doc level queries on dry run (#1430) Signed-off-by: Joanne Wang <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Fix import Signed-off-by: Chase Engelbrecht <[email protected]> * Fix tests Signed-off-by: Chase Engelbrecht <[email protected]> * Fix BWC version Signed-off-by: Chase Engelbrecht <[email protected]> * Fix another test Signed-off-by: Chase Engelbrecht <[email protected]> * Revert order of operations change Signed-off-by: Chase Engelbrecht <[email protected]> --------- Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> Signed-off-by: Megha Goyal <[email protected]> Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Joanne Wang <[email protected]> Co-authored-by: Surya Sashank Nistala <[email protected]> Co-authored-by: Subhobrata Dey <[email protected]> Co-authored-by: Megha Goyal <[email protected]> Co-authored-by: Joanne Wang <[email protected]>
Description of changes:
This pr addresses following performance problems in
doc-level monitor execution workflow
.doc-level monitor
is monitoring anindex pattern
& a new index is introduced which matches the pattern, thedoc-level monitor
duplicates all the field mappings & queries for that index. This is reproducible using integ test https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R503Now, say, we have
1000
queries & each concrete index behind the index-pattern has 1000 field mappings. Also, lets assume1 concrete index
is generated everyday. We also know thedefault number of field mappings
an index can have is1000
& today if the no. of field mappings go over1000
in the query index, we rollover the query index.This would mean, we create a new rollover query index everyday & keep on ingesting 1000 queries in it everyday. In 30 days, we create
30 indices
(which means by default1 primary & 1 replica shards
per index) which contains30000
duplicate queries.This causes the
data nodes
to get full resulting in cluster crash.opensearch-project/security-analytics#509
https://forum.opensearch.org/t/security-analytics-error/14639/11
We do not notice this problem for small no. of queries but duplication of queries piles up over a period of time.
This pr addresses this issue by continously updating
1 set of queries
for all theconcrete indices
belonging to anindex-pattern
. this provides huge storage optimization.concrete index
, we make anupdate mapping
api call in no particular order. So, say, indextest1
has fieldsf1 & f2
&test2
has fieldf4
& both of them match patterntest*
, if we make firstupdate mapping
call fortest1
, then query index getsf1 & f2
but in the nextupdate mapping
call fortest2
we completely overwrite it withf4
. This is reproducible using integ test https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R551This pr addresses this issue by first collecting field mappings of all concrete indices belonging to an index-pattern together, & then making a
single call to update mappings api
.concrete index
, we make anupdate mapping
api call in no particular order. So, if there are100
concrete indices behind anindex-pattern
we make100 update mapping api
calls.This pr optimizes the time complexity by making a
single call to update mappings api
for eachindex pattern
.CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.