-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log error messages and clean up monitor when indexing doc level queries or metadata creation fails #900
Conversation
client.execute( | ||
AlertingActions.DELETE_MONITOR_ACTION_TYPE, | ||
DeleteMonitorRequest(indexResponse.id, RefreshPolicy.IMMEDIATE) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets catch this as well and log that we couldnt do the monitor cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, we should separate out the logic of deleting monitors from the TransportDeleteMonitor class and call those helper functions here.
The same should be done for the other CRUD actions.
client.execute( | ||
AlertingActions.DELETE_MONITOR_ACTION_TYPE, | ||
DeleteMonitorRequest(indexResponse.id, RefreshPolicy.IMMEDIATE) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets catch this as well and log that we couldnt do the monitor cleanup. Also we would need to delete the metadata that we created as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete monitor transport action deletes the metadata as well.
6c11123
to
3178347
Compare
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #900 +/- ##
============================================
+ Coverage 75.95% 76.14% +0.19%
Complexity 110 110
============================================
Files 125 125
Lines 6969 6992 +23
Branches 1043 1043
============================================
+ Hits 5293 5324 +31
+ Misses 1143 1130 -13
- Partials 533 538 +5
|
Lets add security tests for this scenario where its with and without filter by backend role setting |
request.monitor = request.monitor.copy(id = indexResponse.id) | ||
var (metadata, created) = MonitorMetadataService.getOrCreateMetadata(request.monitor) | ||
if (created == false) { | ||
log.warn("Metadata doc id:${metadata.id} exists, but it shouldn't!") | ||
var metadata: MonitorMetadata? | ||
try { // delete monitor if metadata creation fails, log the right error and re-throw the error to fail listener | ||
request.monitor = request.monitor.copy(id = indexResponse.id) | ||
var (monitorMetadata: MonitorMetadata, created: Boolean) = MonitorMetadataService.getOrCreateMetadata(request.monitor) | ||
if (created == false) { | ||
log.warn("Metadata doc id:${monitorMetadata.id} exists, but it shouldn't!") | ||
} | ||
metadata = monitorMetadata | ||
} catch (t: Exception) { | ||
log.error("failed to create metadata for monitor ${indexResponse.id}. deleting monitor") | ||
cleanupMonitorAfterPartialFailure(indexResponse) | ||
throw t | ||
} | ||
if (request.monitor.monitorType == Monitor.MonitorType.DOC_LEVEL_MONITOR) { | ||
indexDocLevelMonitorQueries(request.monitor, indexResponse.id, metadata, request.refreshPolicy) | ||
try { | ||
if (request.monitor.monitorType == Monitor.MonitorType.DOC_LEVEL_MONITOR) { | ||
indexDocLevelMonitorQueries(request.monitor, indexResponse.id, metadata, request.refreshPolicy) | ||
} | ||
// When inserting queries in queryIndex we could update sourceToQueryIndexMapping | ||
MonitorMetadataService.upsertMetadata(metadata, updating = true) | ||
} catch (t: Exception) { | ||
log.error("failed to index doc level queries monitor ${indexResponse.id}. deleting monitor", t) | ||
cleanupMonitorAfterPartialFailure(indexResponse) | ||
throw t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have one big try catch for getting/creating the metadata and updating it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an outer try catch block, which returns listener failure, wrapping all this
creating and updating metadata are not consecutive calls. the update call is done for source To Query Index Mapping
we want the right error message for doc level queries related failures and metadata creation failures. Hence the separate try-catch blocks.
fun `test execute monitor without create when no monitors exists`() { | ||
val docQuery = DocLevelQuery(query = "test_field:\"us-west-2\"", name = "3") | ||
fun `test cleanup monitor on partial create monitor failure`() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing an existing test?
…tadata creation fails Signed-off-by: Surya Sashank Nistala <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good
} catch (e: Exception) { | ||
// we only log the error and don't fail the request because if monitor document has been deleted, | ||
// we cannot retry based on this failure | ||
log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
monitor metadata instead of workflow metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
} catch (e: Exception) { | ||
// we only log the error and don't fail the request because if monitor document has been deleted successfully, | ||
// we cannot retry based on this failure | ||
log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
monitor metadata instead of workflow metadata
createMonitor(monitor) | ||
fail("monitor creation should fail due to incorrect analyzer name in test setup") | ||
} catch (e: Exception) { | ||
Assert.assertEquals(client().search(SearchRequest(SCHEDULED_JOBS_INDEX)).get().hits.hits.size, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert queryIndex has 0 docs and no new mappings applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cant assert that because we dont fail delete monitor request when query index docs clean up fails.
So even though we call delete monitor there is no guarantee of that. It's only a best effort clean up
} catch (e: Exception) { | ||
// we only log the error and don't fail the request because if monitor document has been deleted, | ||
// we cannot retry based on this failure | ||
log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets put the workflow metadata id in here, so it gives us the ability to delete it manually if needed.
} catch (e: Exception) { | ||
// we only log the error and don't fail the request because if monitor document has been deleted, | ||
// we cannot retry based on this failure | ||
log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Signed-off-by: Surya Sashank Nistala <[email protected]>
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-900-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-900-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7 Then, create a pull request where the |
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
…es or metadata creation fails (#900) (#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
…es or metadata creation fails (#900) (#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> (cherry picked from commit 3440a4d)
…es or metadata creation fails (opensearch-project#900) (opensearch-project#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
…es or metadata creation fails (#900) (#912) (#914) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.3 2.3
# Navigate to the new working tree
cd .worktrees/backport-2.3
# Create a new branch
git switch --create backport/backport-900-to-2.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.3 Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.5 2.5
# Navigate to the new working tree
cd .worktrees/backport-2.5
# Create a new branch
git switch --create backport/backport-900-to-2.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.5
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.5 Then, create a pull request where the |
…es or metadata creation fails (opensearch-project#900) (opensearch-project#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: AWSHurneyt <[email protected]>
* [Backport 2.x] QueryIndex rollover when field mapping limit is reached (#729) Signed-off-by: Petar Dzepina <[email protected]> Signed-off-by: AWSHurneyt <[email protected]> * log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) (#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: AWSHurneyt <[email protected]> * [Backport 2.x] Notification security fix (#861) * Notification security fix (#852) * added injecting whole user object in threadContext before calling notification APIs so that backend roles are available to notification plugin Signed-off-by: Petar Dzepina <[email protected]> * compile fix Signed-off-by: Petar Dzepina <[email protected]> * refactored user_info injection to use InjectSecurity Signed-off-by: Petar Dzepina <[email protected]> * ktlint fix Signed-off-by: Petar Dzepina <[email protected]> --------- Signed-off-by: Petar Dzepina <[email protected]> (cherry picked from commit e0b7a5a) * remove unneeded import Signed-off-by: Ashish Agrawal <[email protected]> --------- Signed-off-by: Ashish Agrawal <[email protected]> Co-authored-by: Petar Dzepina <[email protected]> Co-authored-by: Ashish Agrawal <[email protected]> Signed-off-by: AWSHurneyt <[email protected]> * Added missing imports. Signed-off-by: AWSHurneyt <[email protected]> * Multiple indices support in DocLevelMonitorInput (#784) (#808) Signed-off-by: Petar Dzepina <[email protected]> * Removed redundant calls to initDocLevelQueryIndex and indexDocLevelQueries. Signed-off-by: AWSHurneyt <[email protected]> * Fixed a bug that prevented alerts from being generated for doc level monitors that use wildcard characters in index names. (#894) (#902) Signed-off-by: AWSHurneyt <[email protected]> (cherry picked from commit 8c033b9) Co-authored-by: AWSHurneyt <[email protected]> Signed-off-by: AWSHurneyt <[email protected]> * Resolved backport issue for PR 729. Signed-off-by: AWSHurneyt <[email protected]> * Resolved backport issue for PR 758. Signed-off-by: AWSHurneyt <[email protected]> --------- Signed-off-by: Petar Dzepina <[email protected]> Signed-off-by: AWSHurneyt <[email protected]> Signed-off-by: Ashish Agrawal <[email protected]> Co-authored-by: Petar Dzepina <[email protected]> Co-authored-by: Surya Sashank Nistala <[email protected]> Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com> Co-authored-by: Ashish Agrawal <[email protected]>
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]>
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]>
* log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize doc-level monitor workflow for index patterns (#1097) Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize doc-level monitor execution workflow for datastreams (#1302) * optimize doc-level monitor execution for datastreams Signed-off-by: Subhobrata Dey <[email protected]> * add more tests to address comments Signed-off-by: Subhobrata Dey <[email protected]> * add integTest for multiple datastreams inside a single index pattern * add integTest for multiple datastreams inside a single index pattern Signed-off-by: Subhobrata Dey <[email protected]> --------- Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Bulk index findings and sequentially invoke auto-correlations (#1355) * Bulk index findings and sequentially invoke auto-correlations Signed-off-by: Megha Goyal <[email protected]> * Bulk index findings in batches of 10000 and make it configurable Signed-off-by: Megha Goyal <[email protected]> * Addressing review comments Signed-off-by: Megha Goyal <[email protected]> * Add integ tests to test bulk index findings Signed-off-by: Megha Goyal <[email protected]> * Fix ktlint formatting Signed-off-by: Megha Goyal <[email protected]> --------- Signed-off-by: Megha Goyal <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Add jvm aware setting and max num docs settings for batching docs for percolate queries (#1435) * add jvm aware and max docs settings for batching docs for percolate queries Signed-off-by: Surya Sashank Nistala <[email protected]> * fix stats logging Signed-off-by: Surya Sashank Nistala <[email protected]> * add queryfieldnames field in findings mapping Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize to fetch only fields relevant to doc level queries in doc level monitor instead of entire _source for each doc (#1441) * optimize to fetch only fields relevant to doc level queries in doc level monitor Signed-off-by: Surya Sashank Nistala <[email protected]> * fix test for settings check Signed-off-by: Surya Sashank Nistala <[email protected]> * fix ktlint Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * optimize sequence number calculation and reduce search requests in doc level monitor execution (#1445) * optimize sequence number calculation and reduce search requests by n where n is number of shards being queried in the executino Signed-off-by: Surya Sashank Nistala <[email protected]> * fix tests Signed-off-by: Surya Sashank Nistala <[email protected]> * optimize check indices and execute to query only write index of aliases and datastreams during monitor creation Signed-off-by: Surya Sashank Nistala <[email protected]> * fix test Signed-off-by: Surya Sashank Nistala <[email protected]> * add javadoc Signed-off-by: Surya Sashank Nistala <[email protected]> * add tests to verify seq_no calculation Signed-off-by: Surya Sashank Nistala <[email protected]> --------- Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Fix tests Signed-off-by: Chase Engelbrecht <[email protected]> * Fix BWC tests Signed-off-by: Chase Engelbrecht <[email protected]> * clean up doc level queries on dry run (#1430) Signed-off-by: Joanne Wang <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> * Fix import Signed-off-by: Chase Engelbrecht <[email protected]> * Fix tests Signed-off-by: Chase Engelbrecht <[email protected]> * Fix BWC version Signed-off-by: Chase Engelbrecht <[email protected]> * Fix another test Signed-off-by: Chase Engelbrecht <[email protected]> * Revert order of operations change Signed-off-by: Chase Engelbrecht <[email protected]> --------- Signed-off-by: Subhobrata Dey <[email protected]> Signed-off-by: Chase Engelbrecht <[email protected]> Signed-off-by: Megha Goyal <[email protected]> Signed-off-by: Surya Sashank Nistala <[email protected]> Signed-off-by: Joanne Wang <[email protected]> Co-authored-by: Surya Sashank Nistala <[email protected]> Co-authored-by: Subhobrata Dey <[email protected]> Co-authored-by: Megha Goyal <[email protected]> Co-authored-by: Joanne Wang <[email protected]>
In Monitor creation flow adds logic to log correct error message and clean up monitor when there is failure in indexing doc level queries or in monitor metadata creation.
Secure test run
Issue #, if available:
#897
Description of changes:
CheckList:
[x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.