Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize doc-level monitor workflow for index patterns #1097

Merged
merged 1 commit into from
Sep 6, 2023

Conversation

sbcd90
Copy link
Collaborator

@sbcd90 sbcd90 commented Aug 17, 2023

Description of changes:
This pr addresses following performance problems in doc-level monitor execution workflow.

Now, say, we have 1000 queries & each concrete index behind the index-pattern has 1000 field mappings. Also, lets assume 1 concrete index is generated everyday. We also know the default number of field mappings an index can have is 1000 & today if the no. of field mappings go over 1000 in the query index, we rollover the query index.

This would mean, we create a new rollover query index everyday & keep on ingesting 1000 queries in it everyday. In 30 days, we create 30 indices(which means by default 1 primary & 1 replica shards per index) which contains 30000 duplicate queries.
This causes the data nodes to get full resulting in cluster crash.
opensearch-project/security-analytics#509
https://forum.opensearch.org/t/security-analytics-error/14639/11

We do not notice this problem for small no. of queries but duplication of queries piles up over a period of time.
This pr addresses this issue by continously updating 1 set of queries for all the concrete indices belonging to an index-pattern. this provides huge storage optimization.

This pr addresses this issue by first collecting field mappings of all concrete indices belonging to an index-pattern together, & then making a single call to update mappings api.

  • In the current workflow, for every concrete index, we make an update mapping api call in no particular order. So, if there are 100 concrete indices behind an index-pattern we make 100 update mapping api calls.

This pr optimizes the time complexity by making a single call to update mappings api for each index pattern.

  • Here is how the optimization looks like
    Screenshot 2023-08-29 at 2 08 52 PM

CheckList:

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Merging #1097 (6d5b661) into main (778e7ce) will increase coverage by 0.05%.
Report is 2 commits behind head on main.
The diff coverage is 92.15%.

❗ Current head 6d5b661 differs from pull request most recent head 4a85d13. Consider uploading reports for the commit 4a85d13 to get more accurate results

@@             Coverage Diff              @@
##               main    #1097      +/-   ##
============================================
+ Coverage     67.72%   67.77%   +0.05%     
  Complexity      105      105              
============================================
  Files           160      160              
  Lines         10343    10363      +20     
  Branches       1522     1521       -1     
============================================
+ Hits           7005     7024      +19     
- Misses         2672     2673       +1     
  Partials        666      666              
Files Changed Coverage Δ
.../kotlin/org/opensearch/alerting/util/IndexUtils.kt 79.31% <ø> (-0.69%) ⬇️
...opensearch/alerting/util/DocLevelMonitorQueries.kt 77.27% <88.00%> (+1.79%) ⬆️
.../opensearch/alerting/DocumentLevelMonitorRunner.kt 84.53% <96.15%> (+0.42%) ⬆️

... and 4 files with indirect coverage changes

monitorCtx.clusterService!!,
monitorCtx.indexNameExpressionResolver!!
)
val updatedIndexName = indexName.replace("*", "_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we do this indexName.replace("*", "_")?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because query string query fields with * have a different meaning & is also not allowed in some cases.

docLevelMonitorInput.indices,
monitorCtx.clusterService!!,
monitorCtx.indexNameExpressionResolver!!
)
if (concreteIndices.isEmpty()) {
throw IndexNotFoundException(docLevelMonitorInput.indices[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. can we add error log with monitor Id and indices list which were not found
  2. in exception why are we logging only 0th index element of indices variable? can we log all elements in array/list

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it now. it was copied from eariler logic.

// TODO: If dryrun, we should make it so we limit the search as this could still potentially give us lots of data
if (isTempMonitor) {
indexLastRunContext[shard] = max(-1, (indexUpdatedRunContext[shard] as String).toInt() - 10)
docLevelMonitorInput.indices.forEach { indexName ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we change this list being looped from computed concrete indices list to the monitor indices list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because we want to expand 1 index pattern & process all its concrete indices separately. Previous logic was we expand all index patterns & process all the concrete indices one by one.

@eirsep
Copy link
Member

eirsep commented Aug 23, 2023

To create query index mappings for an index pattern would be a very big optimization but it works only under the assumption that an index pattern test* resolves to indices that do not have fields with same names but different data types.

this fails in the following scenario:
indices test1 and test2 are resolved from test* pattern. test1 has a field named f1 of type long and test2 also has a field named f1 BUT with type keyword

when we store the field mapping in query index for f1_test* this would break the query execution due to incorrect mapping

@eirsep
Copy link
Member

eirsep commented Aug 23, 2023

we can add this optimization for the scenarios where we can identify all indices covered in index pattern are guaranteed to have the same mappings i.e. data streams, rolling indices with index templates.

should we store a field in index mapping called index_template if we are creating index from index template? food for throught

@lezzago lezzago self-requested a review August 25, 2023 20:34
@eirsep
Copy link
Member

eirsep commented Aug 25, 2023

@sbcd90

To create query index mappings for an index pattern would be a very big optimization but it works only under the assumption that an index pattern test* resolves to indices that do not have fields with same names but different data types.

this fails in the following scenario: indices test1 and test2 are resolved from test* pattern. test1 has a field named f1 of type long and test2 also has a field named f1 BUT with type keyword

when we store the field mapping in query index for f1_test* this would break the query execution due to incorrect mapping

Since we are batching all mappings can we check if the above mentioned field mapping mismatch is happening for any fields and then ingest one for the generic test* and one mapping for the specific index. Similarly while fetching mappings look for both index mappings with test* and test1. If latter is found prefer it. if not found, use the former.

@sbcd90
Copy link
Collaborator Author

sbcd90 commented Aug 26, 2023

@sbcd90

To create query index mappings for an index pattern would be a very big optimization but it works only under the assumption that an index pattern test* resolves to indices that do not have fields with same names but different data types.
this fails in the following scenario: indices test1 and test2 are resolved from test* pattern. test1 has a field named f1 of type long and test2 also has a field named f1 BUT with type keyword
when we store the field mapping in query index for f1_test* this would break the query execution due to incorrect mapping

Since we are batching all mappings can we check if the above mentioned field mapping mismatch is happening for any fields and then ingest one for the generic test* and one mapping for the specific index. Similarly while fetching mappings look for both index mappings with test* and test1. If latter is found prefer it. if not found, use the former.

hi @eirsep , that is the exact fix i'm working on.

@sbcd90
Copy link
Collaborator Author

sbcd90 commented Aug 29, 2023

hi @eirsep , just addressed the comment with the test case https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R503.
Can you please look into the pr again?

/**
* Adjusts max field limit index setting for query index if source index has higher limit.
* This will prevent max field limit exception, when source index has more fields then query index limit
*/
private suspend fun checkAndAdjustMaxFieldLimit(sourceIndex: String, concreteQueryIndex: String) {
private suspend fun checkMaxFieldLimit(sourceIndex: String): Long {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the java doc description for this method seems like it should describe adjustMaxFieldLimitForQueryIndex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we update the code comments for this method and below method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it.

"nested_field": { "test1": "some text" },
"test_field": "12345"
}"""
indexDoc(testIndex2, "1", testDoc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT; since you are not explicitly creating index with different mappings
can you plz fetch second index's mapping and assert that the field type is actually different for test_field so as to validate the test scenario. that would make the test more readable. right now it's a bit hard to understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifying explicit mappings now.

@@ -495,6 +500,375 @@ class DocumentMonitorRunnerIT : AlertingRestTestCase() {
assertEquals("Findings saved for test monitor expected 14, 51 and 10", 3, foundFindings.size)
}

fun `test execute monitor with indices having fields with same name but different data types`() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we plz add asserts on the expected mappings and content of the query index. that would be the essence of these tests and help us visualize the changes made in this PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts added on actual queries stored in query index.

@@ -717,17 +738,36 @@ object DocumentLevelMonitorRunner : MonitorRunner() {
*/
private fun transformDocumentFieldNames(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very critical piece fo code. can we update java docs description of this method incorporating the new change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the documentation for the method.


fun getAllConflictingFields(clusterState: ClusterState, concreteIndices: List<String>): Set<String> {
val conflictingFields = mutableSetOf<String>()
val allFlattenPaths = mutableMapOf<String, MutableMap<String, Any>>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the below flatten logic handle both the nested and non-nested type objects correctly?

what would be the difference after flattening into string in the following 2 cases

"nested_field": {
                      "type": "nested",
                      "properties": {
                        "test1": {
                          "type": "keyword"
                        }
                      }
                      

and

     "nested_field": {
                      
                      "properties": {
                        "test1": {
                          "type": "keyword"
                        }
                      }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//
val leafNodeProcessor =
fun(fieldName: String, _: String, props: MutableMap<String, Any>): Triple<String, String, MutableMap<String, Any>> {
return Triple(fieldName, fieldName, props)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need field name twice?
can you add some code comments around what is the Triple type object being returned to make it more readable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is used intentionally because leafNodeProcessor is used at another place where the 2nd param is actually used to pass modified field name.

Copy link
Member

@eirsep eirsep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for making these changes @sbcd90

i have some minor comments not related to logic but more around readiblity and testsing.

Kindly also respond to earlier comments.

@sbcd90 sbcd90 merged commit 7f0c7c7 into opensearch-project:main Sep 6, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/alerting/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/alerting/backport-2.x
# Create a new branch
git switch --create backport-1097-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7f0c7c7e77a9213ad5e976c2f1573321bc26b919
# Push it to GitHub
git push --set-upstream origin backport-1097-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/alerting/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport-1097-to-2.x.

engechas pushed a commit to engechas/alerting that referenced this pull request Mar 18, 2024
@engechas engechas mentioned this pull request Mar 18, 2024
1 task
engechas pushed a commit to engechas/alerting that referenced this pull request Mar 18, 2024
engechas added a commit that referenced this pull request Mar 18, 2024
* log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900)

* log errors and clean up monitor when indexing doc level queries or metadata creation fails
* refactor delete monitor action to re-use delete methods
Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* optimize doc-level monitor workflow for index patterns (#1097)

Signed-off-by: Subhobrata Dey <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* optimize doc-level monitor execution workflow for datastreams (#1302)

* optimize doc-level monitor execution for datastreams

Signed-off-by: Subhobrata Dey <[email protected]>

* add more tests to address comments

Signed-off-by: Subhobrata Dey <[email protected]>

* add integTest for multiple datastreams inside a single index pattern

* add integTest for multiple datastreams inside a single index pattern

Signed-off-by: Subhobrata Dey <[email protected]>

---------

Signed-off-by: Subhobrata Dey <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* Bulk index findings and sequentially invoke auto-correlations (#1355)

* Bulk index findings and sequentially invoke auto-correlations

Signed-off-by: Megha Goyal <[email protected]>

* Bulk index findings in batches of 10000 and make it configurable

Signed-off-by: Megha Goyal <[email protected]>

* Addressing review comments

Signed-off-by: Megha Goyal <[email protected]>

* Add integ tests to test bulk index findings

Signed-off-by: Megha Goyal <[email protected]>

* Fix ktlint formatting

Signed-off-by: Megha Goyal <[email protected]>

---------

Signed-off-by: Megha Goyal <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* Add jvm aware setting and max num docs settings for batching docs for percolate queries (#1435)

* add jvm aware and max docs settings for batching docs for percolate queries

Signed-off-by: Surya Sashank Nistala <[email protected]>

* fix stats logging

Signed-off-by: Surya Sashank Nistala <[email protected]>

* add queryfieldnames field in findings mapping

Signed-off-by: Surya Sashank Nistala <[email protected]>

---------

Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* optimize to fetch only fields relevant to doc level queries in doc level monitor instead of entire _source for each doc (#1441)

* optimize to fetch only fields relevant to doc level queries in doc level monitor

Signed-off-by: Surya Sashank Nistala <[email protected]>

* fix test for settings check

Signed-off-by: Surya Sashank Nistala <[email protected]>

* fix ktlint

Signed-off-by: Surya Sashank Nistala <[email protected]>

---------

Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* optimize sequence number calculation and reduce search requests in doc level monitor execution (#1445)

* optimize sequence number calculation and reduce search requests by n where n is number of shards being queried in the executino

Signed-off-by: Surya Sashank Nistala <[email protected]>

* fix tests

Signed-off-by: Surya Sashank Nistala <[email protected]>

* optimize check indices and execute to query only write index of aliases and datastreams during monitor creation

Signed-off-by: Surya Sashank Nistala <[email protected]>

* fix test

Signed-off-by: Surya Sashank Nistala <[email protected]>

* add javadoc

Signed-off-by: Surya Sashank Nistala <[email protected]>

* add tests to verify seq_no calculation

Signed-off-by: Surya Sashank Nistala <[email protected]>

---------

Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix tests

Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix BWC tests

Signed-off-by: Chase Engelbrecht <[email protected]>

* clean up doc level queries on dry run (#1430)

Signed-off-by: Joanne Wang <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix import

Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix tests

Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix BWC version

Signed-off-by: Chase Engelbrecht <[email protected]>

* Fix another test

Signed-off-by: Chase Engelbrecht <[email protected]>

* Revert order of operations change

Signed-off-by: Chase Engelbrecht <[email protected]>

---------

Signed-off-by: Subhobrata Dey <[email protected]>
Signed-off-by: Chase Engelbrecht <[email protected]>
Signed-off-by: Megha Goyal <[email protected]>
Signed-off-by: Surya Sashank Nistala <[email protected]>
Signed-off-by: Joanne Wang <[email protected]>
Co-authored-by: Surya Sashank Nistala <[email protected]>
Co-authored-by: Subhobrata Dey <[email protected]>
Co-authored-by: Megha Goyal <[email protected]>
Co-authored-by: Joanne Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants