separate doc-level monitor query indices created by detectors #1324
Conversation
Signed-off-by: Subhobrata Dey <[email protected]>
@@ -887,7 +887,7 @@ private IndexMonitorRequest createDocLevelMonitorMatchAllRequest(

  Monitor monitor = new Monitor(monitorId, Monitor.NO_VERSION, monitorName, false, detector.getSchedule(), detector.getLastUpdateTime(), null,
      Monitor.MonitorType.DOC_LEVEL_MONITOR.getValue(), detector.getUser(), 1, docLevelMonitorInputs, triggers, Map.of(),
-     new DataSources(detector.getRuleIndex(),
+     new DataSources(detector.getRuleIndex() + "_chained_findings",
why do we need a separate index?
Each doc-level monitor has its own query index now, so the chained findings doc-level monitor also has its own index. The Sigma Rules doc-level monitor and the bucket-level monitor's chained monitor run asynchronously and thus cannot share the same query index, as it is created and deleted on every run.
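As a rough sketch of what the diff above does (the `_chained_findings` suffix is from the diff; the helper class and method names here are hypothetical, not the plugin's actual API):

```java
import java.util.Locale;

public class ChainedFindingsNaming {
    // Hypothetical helper mirroring the diff: the chained-findings doc-level
    // monitor gets its own query index by suffixing the detector's rule index.
    public static String chainedFindingsQueryIndex(String detectorRuleIndex) {
        return detectorRuleIndex + "_chained_findings";
    }

    public static void main(String[] args) {
        String ruleIndex = String.format(Locale.getDefault(),
                ".opensearch-sap-%s-detectors-queries", "windows");
        // Each monitor type now resolves to a distinct query index name, so
        // the asynchronously running monitors never race on a shared index.
        System.out.println(chainedFindingsQueryIndex(ruleIndex));
    }
}
```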
}

public static String getRuleIndexOptimized(String logType) {
    return String.format(Locale.getDefault(), ".opensearch-sap-%s-detectors-queries-optimized-%s", logType, UUID.randomUUID());
we are moving from index pattern to index?
No. This method is utilized by a few integ tests, hence the change.
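For context, the method under discussion generates a fresh, detector-specific query index name on every call; a self-contained sketch using the same format string as the diff (everything else here is illustrative):

```java
import java.util.Locale;
import java.util.UUID;

public class RuleIndexOptimized {
    // Same shape as the method in the diff: a concrete index name (not a
    // pattern), made unique per call by the trailing random UUID.
    public static String getRuleIndexOptimized(String logType) {
        return String.format(Locale.getDefault(),
                ".opensearch-sap-%s-detectors-queries-optimized-%s",
                logType, UUID.randomUUID());
    }

    public static void main(String[] args) {
        // Two calls for the same log type yield two distinct index names,
        // which is why integ tests can no longer rely on a fixed name.
        System.out.println(getRuleIndexOptimized("windows"));
        System.out.println(getRuleIndexOptimized("windows"));
    }
}
```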
@@ -1369,7 +1369,11 @@ void onGetResponse(Detector currentDetector, User user) {
  request.getDetector().setAlertsHistoryIndexPattern(DetectorMonitorConfig.getAlertsHistoryIndexPattern(ruleTopic));
  request.getDetector().setFindingsIndex(DetectorMonitorConfig.getFindingsIndex(ruleTopic));
  request.getDetector().setFindingsIndexPattern(DetectorMonitorConfig.getFindingsIndexPattern(ruleTopic));
- request.getDetector().setRuleIndex(DetectorMonitorConfig.getRuleIndex(ruleTopic));
+ if (currentDetector.getRuleIndex().contains("optimized")) {
If the name contains "optimized", shouldn't we set the rule index name to the getRuleIndexOptimized value? Is this if-else condition flipped?
The if-else condition checks whether the detector is already using a dedicated query index; if so, it keeps using it. Otherwise, it switches from the shared query index to a new dedicated query index when the detector is updated.
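A condensed sketch of the decision described above (method and parameter names here are illustrative stand-ins; the real code operates on `Detector` and `DetectorMonitorConfig`):

```java
public class RuleIndexChoice {
    // Hedged reconstruction of the update-path check discussed above: a
    // detector already on a dedicated ("optimized") query index keeps it,
    // while a detector still on the shared index is moved to a freshly
    // generated dedicated index when it is updated.
    public static String chooseRuleIndex(String currentRuleIndex,
                                         String freshOptimizedIndex) {
        if (currentRuleIndex.contains("optimized")) {
            return currentRuleIndex;     // already dedicated: keep it
        }
        return freshOptimizedIndex;      // shared: migrate on this update
    }

    public static void main(String[] args) {
        // Detector already on a dedicated index: unchanged.
        System.out.println(chooseRuleIndex(
                ".opensearch-sap-windows-detectors-queries-optimized-ab12",
                ".opensearch-sap-windows-detectors-queries-optimized-cd34"));
        // Detector on the shared index: migrated to the fresh dedicated one.
        System.out.println(chooseRuleIndex(
                ".opensearch-sap-windows-detectors-queries",
                ".opensearch-sap-windows-detectors-queries-optimized-cd34"));
    }
}
```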
" }\n" + | ||
" }\n" + | ||
"}"; | ||
SearchResponse response = executeSearchAndGetResponse(DetectorMonitorConfig.getRuleIndex(randomDetectorType()), request, true); |
Why has this been removed? Please don't reduce test coverage; fix the test as necessary.
Query indices are created and deleted on each run now, so it is difficult to assert the number of queries in the query index.
What happens when someone updates a detector that doesn't have an exclusive query index?
@@ -825,21 +825,10 @@ public void testMultipleAggregationAndDocRules_alertSuccess() throws IOException

  Response createResponse = makeRequest(client(), "POST", SecurityAnalyticsPlugin.DETECTOR_BASE_URI, Collections.emptyMap(), toHttpEntity(detector));

  String request = "{\n" +
Please revert and fix the test as necessary.
Query indices are created and deleted on each run now, so it is difficult to assert the number of queries in the query index.
@@ -1209,21 +1209,11 @@ public void testCreateDetectorWithNotCondition_verifyFindingsAndNoFindings_succe

  Response createResponse = makeRequest(client(), "POST", SecurityAnalyticsPlugin.DETECTOR_BASE_URI, Collections.emptyMap(), toHttpEntity(detector));

  String request = "{\n" +
Please revert.
Query indices are created and deleted on each run now, so it is difficult to assert the number of queries in the query index.
Signed-off-by: Subhobrata Dey <[email protected]>
The detector is updated to use the exclusive query index.
Signed-off-by: Subhobrata Dey <[email protected]>
Security tests fail as the Docker image is not updated.
Looks like we create aliases, not indices, from the `queryIndex` field in monitor data sources.
Can we add tests for the scenario where the setting is off? Create 2 detectors and verify the query index name; then turn on the setting, update 1 detector, and verify the query index name.
    request.getDetector().setRuleIndex(DetectorMonitorConfig.getRuleIndexOptimized(ruleTopic));
} else {
    String ruleTopicIndex = DetectorMonitorConfig.getRuleIndex(ruleTopic);
    request.getDetector().setRuleIndex(ruleTopicIndex.substring(0, ruleTopicIndex.length() - 1));
Why did we change the existing code from `request.getDetector().setRuleIndex(DetectorMonitorConfig.getRuleIndex(ruleTopic));`?
`ruleIndex` is changing to the dedicated `ruleIndexOptimized` now; that's why the if-else condition exists.
@@ -22,7 +24,11 @@ public class DetectorMonitorConfig {
  public static final String OPENSEARCH_SAP_RULE_INDEX_TEMPLATE = ".opensearch-sap-detectors-queries-index-template";

  public static String getRuleIndex(String logType) {
-     return String.format(Locale.getDefault(), ".opensearch-sap-%s-detectors-queries", logType);
+     return String.format(Locale.getDefault(), ".opensearch-sap-%s-detectors-queries*", logType);
Why `*`? This is an alias.
fixed it.
@@ -1369,7 +1379,16 @@ void onGetResponse(Detector currentDetector, User user) {
  request.getDetector().setAlertsHistoryIndexPattern(DetectorMonitorConfig.getAlertsHistoryIndexPattern(ruleTopic));
  request.getDetector().setFindingsIndex(DetectorMonitorConfig.getFindingsIndex(ruleTopic));
  request.getDetector().setFindingsIndexPattern(DetectorMonitorConfig.getFindingsIndexPattern(ruleTopic));
- request.getDetector().setRuleIndex(DetectorMonitorConfig.getRuleIndex(ruleTopic));
+ if (currentDetector.getRuleIndex().contains("optimized")) {
+     request.getDetector().setRuleIndex(currentDetector.getRuleIndex());
Please add a comment that if we turn the setting OFF after turning it on, updating the detector will not change it from optimized back to normal, and we need to re-create the detector.
added a comment. added a test too.
    request.getDetector().setRuleIndex(DetectorMonitorConfig.getRuleIndexOptimized(ruleTopic));
} else {
    String ruleTopicIndex = DetectorMonitorConfig.getRuleIndex(ruleTopic);
    request.getDetector().setRuleIndex(ruleTopicIndex.substring(0, ruleTopicIndex.length() - 1));
this line is very confusing
fixed it.
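For readers puzzling over the same `substring` line: in this revision `getRuleIndex` returns a pattern ending in `*`, and the substring call strips that trailing wildcard to recover the concrete shared index name. A tiny standalone illustration (helper names hypothetical):

```java
import java.util.Locale;

public class WildcardStrip {
    // Mirrors the pattern-returning getRuleIndex from the diff above.
    public static String getRuleIndexPattern(String logType) {
        return String.format(Locale.getDefault(),
                ".opensearch-sap-%s-detectors-queries*", logType);
    }

    // The confusing line: drop the trailing '*' to get the plain index name.
    public static String stripTrailingWildcard(String pattern) {
        return pattern.substring(0, pattern.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingWildcard(getRuleIndexPattern("windows")));
    }
}
```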
@@ -266,6 +266,7 @@ void setDebugLogLevel() throws IOException, InterruptedException {

  makeRequest(client(), "PUT", "_cluster/settings", Collections.emptyMap(), se, new BasicHeader("Content-Type", "application/json"));
  updateClusterSetting("plugins.security_analytics.enable_detectors_with_dedicated_query_indices", "true");
Are there tests with the setting turned off?
added a test for setting turned off.
@@ -1722,12 +1725,16 @@ public void testAzureMappings() throws IOException {
  DetectorInput input = new DetectorInput("windows detector for security analytics", List.of(indexName), List.of(),
          getPrePackagedRules("azure").stream().map(DetectorRule::new).collect(Collectors.toList()));
  Detector detector = randomDetectorWithInputs(List.of(input), "azure");
  createDetector(detector);

  List<SearchHit> hits = executeSearch(".opensearch-sap-azure-detectors-queries-000001", matchAllSearchBody);
Can you retain this test and add a new one? Basically, test for both the setting turned on and turned off.
Added a scenario for the flag turned off in the test cases, and updated this test case to fetch queries from the index pattern: https://github.com/sbcd90/security-analytics/blob/query_idx/src/test/java/org/opensearch/securityanalytics/mapper/MapperRestApiIT.java#L1727
@@ -1104,21 +1094,11 @@ public void testCreateDetector_verifyWorkflowCreation_success_WithGroupByRulesIn

  Response createResponse = makeRequest(client(), "POST", SecurityAnalyticsPlugin.DETECTOR_BASE_URI, Collections.emptyMap(), toHttpEntity(detector));

  String request = "{\n" +
We should not remove this code; we lose test coverage, and that's an anti-pattern. Let's fix these tests.
fixed it.
Signed-off-by: Subhobrata Dey <[email protected]>
Added both test cases. 1st scenario: https://github.com/sbcd90/security-analytics/blob/c514a780d4b3f65ac828e3ff868c56f46f2a82b7/src/test/java/org/opensearch/securityanalytics/resthandler/DetectorRestApiIT.java#L1729
@@ -83,7 +83,8 @@ private void getAllRuleIndices(ActionListener<List<String>> listener) {
  listener.onResponse(
      logTypes
          .stream()
-         .map(logType -> DetectorMonitorConfig.getRuleIndex(logType))
+         // use index pattern here to define rule topic index template for all query indices which match the pattern
+         .map(logType -> DetectorMonitorConfig.getRuleIndex(logType) + "*")
Shouldn't this be detector-specific? Why do we have this method, `getAllRuleIndices`?
A single index template containing the specific settings needed for Sigma rule execution is applied to all rule indices.
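The explanation above can be sketched as a stream over log types, yielding one wildcard pattern per log type for the shared index template to match (a hypothetical standalone version of `getAllRuleIndices`; the real method works against an `ActionListener`):

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class RuleIndexPatterns {
    // Same shape as DetectorMonitorConfig.getRuleIndex in this revision,
    // minus the trailing wildcard, which is appended below.
    static String getRuleIndex(String logType) {
        return String.format(Locale.getDefault(),
                ".opensearch-sap-%s-detectors-queries", logType);
    }

    // One index template, registered against these patterns, carries the
    // settings needed for Sigma rule execution on every matching query index.
    public static List<String> ruleIndexPatterns(List<String> logTypes) {
        return logTypes.stream()
                .map(logType -> getRuleIndex(logType) + "*")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(ruleIndexPatterns(List.of("windows", "azure")));
    }
}
```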
CI is failing.
CI passed now.
Security tests failed.
Signed-off-by: Subhobrata Dey <[email protected]> (cherry picked from commit 1ca4090) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
The backport to 2.15 failed.

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/security-analytics/backport-2.15 2.15
# Navigate to the new working tree
pushd ../.worktrees/security-analytics/backport-2.15
# Create a new branch
git switch --create backport-1324-to-2.15
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1ca40908ac7b050cc7eb90df3108ab8a21a1f6d1
# Push it to GitHub
git push --set-upstream origin backport-1324-to-2.15
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/security-analytics/backport-2.15

Then, create a pull request where the base branch is 2.15 and the compare/head branch is backport-1324-to-2.15.
The backport to 2.16 failed.

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/security-analytics/backport-2.16 2.16
# Navigate to the new working tree
pushd ../.worktrees/security-analytics/backport-2.16
# Create a new branch
git switch --create backport-1324-to-2.16
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1ca40908ac7b050cc7eb90df3108ab8a21a1f6d1
# Push it to GitHub
git push --set-upstream origin backport-1324-to-2.16
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/security-analytics/backport-2.16

Then, create a pull request where the base branch is 2.16 and the compare/head branch is backport-1324-to-2.16.
…arch-project#1324) Signed-off-by: Subhobrata Dey <[email protected]>
…#1330) Signed-off-by: Subhobrata Dey <[email protected]>
…#1330) Signed-off-by: Subhobrata Dey <[email protected]> (cherry picked from commit 038d60a)
…arch-project#1324) Signed-off-by: Subhobrata Dey <[email protected]>
…#1344) Signed-off-by: Subhobrata Dey <[email protected]>
…#1330) (#1343) Signed-off-by: Subhobrata Dey <[email protected]> (cherry picked from commit 038d60a) Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
@sbcd90 Good morning. It appears in 2.14 that the change to document-level fan-out, opensearch-project/alerting#1521, could be the issue. Is the change above to the way fan-outs are handled an attempt to fix the performance of the Security Analytics detections?
Hi @ELKHostmaster, please bounce (stop and start) the OpenSearch process on every data node; this will clean up those tasks. After deleting the detectors, you may also want to delete the .opensearch-sap-detectors* indices and clean up the monitors linked to them. In the end, it should result in better performance.
Hi @sbcd90, I am having a similar issue to this and have been following it. I want to try your suggestion, but the part I do not get is this: how do I define this during detector creation? I use the API to create detectors. Are you talking about the index pattern you define in the inputs.detector_input.indices value? I currently use an alias that points to a data stream. Thank you for any help at all!
Hi @mimicbox, if you're currently using an alias (e.g. api_output-000001 currently points as the write index for the index alias api_output), provide the alias as the detector input.
@sbcd90 If I understand correctly, you're saying point the detector to the index alias and use ISM to roll over the indices at regular intervals. I'm already doing that. There is something related to the fan-out change introduced in 2.14 that I think has really caused the performance issues. Is this fixed in 2.18? Is upgrading from 2.16 a possible solution?
@ELKHostmaster , are you using Sigma Aggregation Rules? |
@sbcd90 Do you mean using any of the aggregation functions in Sigma rules (https://help.sigmacomputing.com/docs/aggregate-functions)? No, I don't believe so, but I am scanning through them now.
No correlation rules either. I've stripped it down to just our Sigma detections, as we've been trying to get to the bottom of the issue for the last month.
Hi @ELKHostmaster, please bounce (stop and start) the OpenSearch process on every data node. This will clean up those tasks.
@sbcd90 Thank you. Yes, I have stopped/started every data node several times over the last month; it's the only way to keep the cluster running at this point. I'm currently on 2.16. I thought this fix was added in 2.18; are you saying it was backported to 2.17.1? I didn't think that backports resulted in a new release package being built, so I didn't think the fix would be included. Also, when you say "odd" release, is there something different about odd/even releases? Thanks again for the responses!
Hi @ELKHostmaster, I don't think data nodes need to be stopped/started frequently. I would say take a thread dump of a data node when doc-level fan-out tasks are stuck; that would help root-cause the issue. Yes, I think all these fixes should be in
@sbcd90 Good evening. I waited until a couple data nodes got really busy and took jstack dumps for you to look at. I also included a list of tasks running on the cluster. You can see a lot of tasks are running for 8 hours now and growing since the last reboot of the nodes. I appreciate any help you can give. Thanks! |
@sbcd90 Good morning. Could I ask a favor of you? Could you take a look at the thread dumps above and see if the issue is related to the fan-out change? Our cluster is unusable right now with detectors enabled/running: circuit-breaker issues, Java heap exceptions, nodes dropping out, 99% CPU on data nodes, etc. This was an upgrade from 2.13, where everything was fine. The only change was upgrading to 2.16, with the same detectors, rules, etc. Do you think the fixes to fan-out in 2.18 will help us? Thank you in advance!
Description
separate doc-level monitor query indices created by detectors
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.