
add joins, insertion & search to correlation engine #7771

Closed
wants to merge 4 commits into from


sbcd90
Contributor

sbcd90 commented May 26, 2023

Description

This PR focuses on the Join Handler, Insertion Handler, and Search Handler components of the Events Correlation Engine. The component-level architecture diagram can be found here.

Join Handler

The Join Handler component determines the immediate neighbors of a particular event using the correlation rules the user has defined for the log indices (or data streams) they wish to correlate.

Here is a brief schematic class diagram of the Join Handler layer.
[Image: Join Handler class diagram]
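A rough sketch of the join step, for illustration only: for an event in a given index, collect the queries of every correlation rule that also touches that index, grouped by the other (neighbor) index they target. `CorrelationQuery`, `CorrelationRule`, and `JoinHandlerSketch` are hypothetical, simplified stand-ins rather than the plugin's classes, and the sketch omits checking that the event actually matches its own index's query in the rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: one query of a correlation rule, scoped to a single index. */
final class CorrelationQuery {
    final String index;
    final String query;
    final String timestampField;

    CorrelationQuery(String index, String query, String timestampField) {
        this.index = index;
        this.query = query;
        this.timestampField = timestampField;
    }
}

/** Hypothetical stand-in: a user-defined rule joining two or more indices. */
final class CorrelationRule {
    final String name;
    final List<CorrelationQuery> queries;

    CorrelationRule(String name, List<CorrelationQuery> queries) {
        this.name = name;
        this.queries = queries;
    }
}

final class JoinHandlerSketch {
    /**
     * For an event stored in eventIndex, returns the queries to run against each
     * neighbor index, i.e. every other index that shares at least one rule with eventIndex.
     */
    static Map<String, List<CorrelationQuery>> neighborQueries(String eventIndex, List<CorrelationRule> rules) {
        Map<String, List<CorrelationQuery>> byIndex = new LinkedHashMap<>();
        for (CorrelationRule rule : rules) {
            boolean joinsEventIndex = rule.queries.stream().anyMatch(q -> q.index.equals(eventIndex));
            if (!joinsEventIndex) {
                continue; // this rule does not involve the event's index at all
            }
            for (CorrelationQuery q : rule.queries) {
                if (!q.index.equals(eventIndex)) {
                    byIndex.computeIfAbsent(q.index, k -> new ArrayList<>()).add(q);
                }
            }
        }
        return byIndex;
    }
}
```

Each map entry would then become one search against that neighbor index.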

Insertion Handler

In the Insertion Handler layer, events are converted to k-dimensional vectors and stored, along with their correlations, in the vector storage layer of the architecture referenced above.

Here is a brief schematic class diagram of the Insertion Handler layer.
[Image: Insertion Handler class diagram]
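The PR text does not describe how an event becomes a k-dimensional vector. Purely to illustrate the idea of a fixed-length event encoding, the sketch below uses feature hashing (the "hashing trick"), a swapped-in technique that is not necessarily what the plugin does: each `field=value` pair is hashed into one of k buckets with a ±1 sign, and the result is L2-normalized. `EventVectorizer` is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical vectorizer: feature hashing of field=value pairs into k buckets. */
final class EventVectorizer {
    private final int k;

    EventVectorizer(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    /** Maps an event's fields to a fixed-length vector, L2-normalized unless every bucket cancels to zero. */
    float[] vectorize(Map<String, String> fields) {
        double[] acc = new double[k];
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String token = e.getKey() + "=" + e.getValue();
            int bucket = Math.floorMod(token.hashCode(), k);
            // A second hash picks the sign, so colliding tokens do not always add up in the same direction.
            int sign = Math.floorMod(("sign:" + token).hashCode(), 2) == 0 ? 1 : -1;
            acc[bucket] += sign;
        }
        double norm = 0;
        for (double v : acc) {
            norm += v * v;
        }
        norm = Math.sqrt(norm);
        float[] out = new float[k];
        for (int i = 0; i < k; i++) {
            out[i] = norm == 0 ? 0f : (float) (acc[i] / norm);
        }
        return out;
    }
}
```

Because the same fields always hash to the same buckets, identical events get identical vectors; how well vector distance reflects correlation depends entirely on the encoding, which is the part this sketch does not claim to reproduce.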

Search Handler

The Search Handler layer lets the user specify a particular event, converts it to a k-dimensional vector, and uses that vector to query the event's neighbors, which are its correlated events within a time window.

Here is a brief schematic class diagram of the Search Handler layer.
[Image: Search Handler class diagram]
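To make the search step concrete, here is a brute-force sketch: keep only stored events whose timestamp lies within `timeWindowMs` of the input event, rank them by Euclidean distance to the input event's vector, and return the closest `nearbyEvents`. The score uses 1 / (1 + d²), the form Lucene uses for its EUCLIDEAN vector similarity. `StoredEvent`, `ScoredEvent`, and `CorrelatedEventSearch` are hypothetical; a real implementation would query an approximate nearest-neighbor index in the vector storage layer rather than scan a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stored record: an event plus the vector computed for it at insertion time. */
final class StoredEvent {
    final String index;
    final String id;
    final long timestampMs;
    final float[] vector;

    StoredEvent(String index, String id, long timestampMs, float[] vector) {
        this.index = index;
        this.id = id;
        this.timestampMs = timestampMs;
        this.vector = vector;
    }
}

/** Hypothetical result row, shaped like the entries of the "events" array in the GET response. */
final class ScoredEvent {
    final String index;
    final String id;
    final double score;

    ScoredEvent(String index, String id, double score) {
        this.index = index;
        this.id = id;
        this.score = score;
    }
}

final class CorrelatedEventSearch {
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Brute-force nearest neighbors restricted to events within timeWindowMs of the input event. */
    static List<ScoredEvent> search(StoredEvent input, List<StoredEvent> store, long timeWindowMs, int nearbyEvents) {
        List<ScoredEvent> candidates = new ArrayList<>();
        for (StoredEvent e : store) {
            if (e.index.equals(input.index) && e.id.equals(input.id)) {
                continue; // an event is not its own neighbor
            }
            if (Math.abs(e.timestampMs - input.timestampMs) > timeWindowMs) {
                continue; // outside the time window
            }
            double score = 1.0 / (1.0 + squaredDistance(input.vector, e.vector));
            candidates.add(new ScoredEvent(e.index, e.id, score));
        }
        candidates.sort(Comparator.comparingDouble((ScoredEvent s) -> s.score).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(nearbyEvents, candidates.size())));
    }
}
```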

REST APIs

This PR introduces the following REST APIs.

Correlate an event

POST /_correlation/events

Request:
{
  "index": "app_logs",
  "event": "l2UZD4kBSz-dGVL1EZFJ",
  "store": false
}

Response:
{
  "is_orphan": false,
  "neighbor_events": {
    "windows": [
      "DmyTDI8B1GiWmmBKbkSu"
    ]
  }
}
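For reference, the POST request above can be built in Java with the JDK's `java.net.http` client. This is a minimal sketch that assumes a cluster at `http://localhost:9200` without the security plugin; `CorrelateEventRequests` and its methods are hypothetical, and only the endpoint path and body fields come from this PR.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CorrelateEventRequests {
    /** Minimal JSON string escaping: backslashes and quotes only; control characters are not handled. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Body with the three fields shown in the example request. */
    static String body(String index, String eventId, boolean store) {
        return "{\"index\":" + jsonString(index) + ",\"event\":" + jsonString(eventId) + ",\"store\":" + store + "}";
    }

    static HttpRequest correlate(String baseUrl, String index, String eventId, boolean store) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_correlation/events"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(index, eventId, store)))
            .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which throws `IOException` and `InterruptedException`.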

Search correlated events for an input event

GET /_correlation/events?index=windows&event=FZ-9DI8BfF4uBOcpmLhq&timestamp_field=winlog.timestamp&time_window=300000&nearby_events=5

Response:
{
  "events": [
    {
      "index": "app_logs",
      "event": "Fp-9DI8BfF4uBOcpmLj1",
      "score": 0.5,
      "tags": []
    }
  ]
}
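The GET variant carries its inputs as query parameters. The sketch below assembles that URI with `URLEncoder`, again assuming `http://localhost:9200`; the class and method names are hypothetical. In the example, `time_window=300000` looks like milliseconds (5 minutes), but that unit is an assumption.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SearchCorrelatedEventsUri {
    static URI build(String baseUrl, String index, String eventId, String timestampField,
                     long timeWindow, int nearbyEvents) {
        // LinkedHashMap keeps the parameters in the same order as the example request.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("index", index);
        params.put("event", eventId);
        params.put("timestamp_field", timestampField);
        params.put("time_window", Long.toString(timeWindow));
        params.put("nearby_events", Integer.toString(nearbyEvents));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(baseUrl + "/_correlation/events?" + query);
    }
}
```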

Related Issues

#6854

Check List

  • New functionality includes testing.
    • All tests pass
  • [] New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

opensearch-trigger-bot added the stalled label Jun 25, 2023
sbcd90 marked this pull request as ready for review June 26, 2023 19:52
sbcd90 requested a review from sachinpkale as a code owner June 26, 2023 19:52
github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

codecov

codecov bot commented Jun 26, 2023

Codecov Report

Attention: Patch coverage is 68.37325% with 383 lines in your changes missing coverage. Please review.

Project coverage is 71.83%. Comparing base (b15cb0c) to head (46dafcb).
Report is 441 commits behind head on main.

| File | Patch % | Lines |
|---|---|---|
| ...nts/transport/TransportStoreCorrelationAction.java | 66.33% | 104 Missing and 32 partials ⚠️ |
| ...nts/transport/TransportIndexCorrelationAction.java | 58.29% | 86 Missing and 17 partials ⚠️ |
| ...ansport/TransportSearchCorrelatedEventsAction.java | 64.45% | 39 Missing and 20 partials ⚠️ |
| ...h/plugin/correlation/utils/CorrelationIndices.java | 11.11% | 40 Missing ⚠️ |
| ...h/plugin/correlation/events/model/Correlation.java | 86.82% | 3 Missing and 14 partials ⚠️ |
| ...ch/plugin/correlation/EventsCorrelationPlugin.java | 20.00% | 4 Missing ⚠️ |
| ...lation/events/action/IndexCorrelationResponse.java | 80.00% | 4 Missing ⚠️ |
| ...lugin/correlation/events/model/EventWithScore.java | 91.66% | 1 Missing and 3 partials ⚠️ |
| ...events/resthandler/RestIndexCorrelationAction.java | 66.66% | 4 Missing ⚠️ |
| .../resthandler/RestSearchCorrelatedEventsAction.java | 78.94% | 4 Missing ⚠️ |
| ... and 5 more | | |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7771      +/-   ##
============================================
+ Coverage     71.42%   71.83%   +0.41%     
- Complexity    59978    62306    +2328     
============================================
  Files          4985     5135     +150     
  Lines        282275   293039   +10764     
  Branches      40946    42306    +1360     
============================================
+ Hits         201603   210508    +8905     
- Misses        63999    65154    +1155     
- Partials      16673    17377     +704     


Contributor

❌ Gradle check result for 4b4a92f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Subhobrata Dey <[email protected]>
Contributor

✅ Gradle check result for 5e649c0: SUCCESS

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ject#9956)

Fix is pending in opensearch-project#7771, but that PR may take some time to land in main
so muting for the time being.

Signed-off-by: Andrew Ross <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
/**
 * Rest action for indexing an event and its correlations
 */
public class RestIndexCorrelationAction extends BaseRestHandler {
Member

do these actions support timeouts?

Contributor Author

added search timeouts to these actions.

client.search(searchRequest, new ActionListener<>() {
    @Override
    public void onResponse(SearchResponse response) {
        if (response.isTimedOut()) {
Member

It can also be a 429 (Too Many Requests), not just a timeout.

Contributor Author

added check for 429
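One way to cover both review points, sketched under assumptions: set a timeout on each search (on the OpenSearch side, `SearchSourceBuilder.timeout(TimeValue)`), then treat a timed-out response and a 429 rejection (`RestStatus.TOO_MANY_REQUESTS`) as distinct outcomes, retrying 429s with capped exponential backoff. The code is dependency-free; `SearchOutcome`, `classify`, and `backoffMillis` are hypothetical helpers, and the retry policy is an illustrative choice rather than what the PR implements.

```java
enum SearchOutcome { OK, TIMED_OUT, RETRY_LATER, FAILED }

final class SearchResultPolicy {
    static SearchOutcome classify(int httpStatus, boolean timedOut) {
        if (httpStatus == 429) {
            return SearchOutcome.RETRY_LATER; // the node rejected the request (e.g. a full queue); back off and retry
        }
        if (httpStatus >= 200 && httpStatus < 300) {
            // A successful status can still carry partial results when the search timed out.
            return timedOut ? SearchOutcome.TIMED_OUT : SearchOutcome.OK;
        }
        return SearchOutcome.FAILED;
    }

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at capMillis. */
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long factor = 1L << Math.min(attempt, 30);
        return Math.min(capMillis, baseMillis * factor);
    }
}
```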

int idx = 0;
for (MultiSearchResponse.Item response : responses) {
    if (response.isFailure()) {
        log.error("error:", response.getFailure());
Member

Add more details to the log message.

Contributor Author

fixed this.
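As an example of what "more details" could mean, the sketch builds a message naming the position of the failed multi-search item, the index it targeted, and the event being correlated. `describeItemFailure` is a hypothetical helper; in the transport action the string would go to the logger with the exception as the last argument (for example `log.error(message, response.getFailure())`) so the stack trace is kept. `Locale.ROOT` keeps number formatting independent of the JVM's default locale.

```java
import java.util.Locale;

final class CorrelationLogMessages {
    /** Identifies which multi-search item failed, which index it queried, and the underlying cause. */
    static String describeItemFailure(int itemPosition, String index, String event, Throwable failure) {
        return String.format(
            Locale.ROOT,
            "correlation multi-search item [%d] against index [%s] for event [%s] failed: %s",
            itemPosition, index, event, failure.getMessage());
    }
}
```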

 *
 * @opensearch.internal
 */
public class TransportIndexCorrelationAction extends HandledTransportAction<IndexCorrelationRequest, IndexCorrelationResponse> {
Member

we should move the core logic to EventsCorrelationManager or something of that sort.

Member

Applicable for all transport classes

Contributor Author

Can address this in a follow-up PR.

if (response.getStatus().equals(RestStatus.OK)) {
    onOperation(true, new HashMap<>());
} else {
    onFailures(new OpenSearchStatusException("Failed to store correlations", RestStatus.INTERNAL_SERVER_ERROR));
Member

Do you need specific error handling here when indexing the vectors fails? For example, why would it fail with a 4xx (besides 429)? Also, for an intermittent error where the document was indexed but the response failed, is that an issue?

Contributor Author

Currently, RestStatus.OK means success and any other status means failure. The status is returned by a custom transport action.

Contributor Author

I can enhance the error handling in a follow-up PR.
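On the case where the write succeeds but its response is lost: a common mitigation, offered only as a suggestion and not something this PR does, is to give each stored correlation a deterministic document ID, so a retry overwrites the same document instead of creating a duplicate. The sketch derives a name-based (version 3) UUID from the unordered pair of events; `CorrelationIds` and the key format are hypothetical, and the key assumes event IDs contain no `|`.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class CorrelationIds {
    /** The same pair of events, in either order, always yields the same ID. */
    static String correlationId(String indexA, String eventA, String indexB, String eventB) {
        String a = indexA + "/" + eventA;
        String b = indexB + "/" + eventB;
        // Order the two endpoints so (A, B) and (B, A) map to the same key.
        String key = a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

With the OpenSearch index API, such an ID would be set through `IndexRequest.id(...)`.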

List<CorrelationQuery> correlationQueries = indexQueriesEntry.getValue();

// assuming all queries belonging to an index use the same timestamp field.
String timestampField = correlationQueries.get(0).getTimestampField();
Member

Do you want to validate the timestamp-field assumption here?

Contributor Author

added validation.
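A minimal version of such a check, assuming the timestamp fields of an index's queries have already been collected into a list (for example from `getTimestampField()` on each query, as in the snippet above). `TimestampFieldValidation` is a hypothetical name, and rejecting the request with an exception is one possible policy.

```java
import java.util.List;
import java.util.Objects;

final class TimestampFieldValidation {
    /** Returns the timestamp field shared by all queries for an index, or throws if they disagree. */
    static String commonTimestampField(String index, List<String> timestampFields) {
        if (timestampFields.isEmpty()) {
            throw new IllegalArgumentException("no correlation queries for index [" + index + "]");
        }
        String expected = timestampFields.get(0);
        for (String field : timestampFields) {
            if (!Objects.equals(expected, field)) {
                throw new IllegalArgumentException("correlation queries for index [" + index
                    + "] use different timestamp fields [" + expected + "] and [" + field + "]");
            }
        }
        return expected;
    }
}
```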

continue;
}

Iterator<SearchHit> searchHits = response.getResponse().getHits().iterator();
Member

This may not return all results, right? You are not setting the size param.

Contributor Author

Yes, pagination is missing. I can address this in follow-up PRs.
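For context on the missing pagination: a search returns at most `size` hits (10 unless set), so reading every match requires paging. The usual OpenSearch approach is to sort on a unique key and pass the last hit's sort values back via `search_after` (`SearchSourceBuilder.searchAfter(Object[])` on the Java side). The sketch simulates that loop in memory; `PagedHit`, `nextPage`, and `fetchAll` are hypothetical, and the `(timestamp, id)` sort key is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class PagedHit {
    final long timestamp;
    final String id;

    PagedHit(long timestamp, String id) {
        this.timestamp = timestamp;
        this.id = id;
    }
}

final class SearchAfterPaging {
    // Timestamp first, id as tiebreaker, so the order is total and pages never overlap or skip hits.
    static final Comparator<PagedHit> SORT =
        Comparator.<PagedHit>comparingLong(h -> h.timestamp).thenComparing((PagedHit h) -> h.id);

    /** Simulates one search request: hits strictly after `after` in sort order, at most `size` of them. */
    static List<PagedHit> nextPage(List<PagedHit> allMatches, PagedHit after, int size) {
        return allMatches.stream()
            .sorted(SORT)
            .filter(h -> after == null || SORT.compare(h, after) > 0)
            .limit(size)
            .collect(Collectors.toList());
    }

    /** Requests pages, feeding the last hit back as the cursor, until a page comes back empty. */
    static List<PagedHit> fetchAll(List<PagedHit> allMatches, int size) {
        List<PagedHit> all = new ArrayList<>();
        PagedHit cursor = null;
        while (true) {
            List<PagedHit> page = nextPage(allMatches, cursor, size);
            if (page.isEmpty()) {
                return all;
            }
            all.addAll(page);
            cursor = page.get(page.size() - 1);
        }
    }
}
```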

searchRequest.indices(index);
searchRequest.source(searchSourceBuilder);

client.search(searchRequest, new ActionListener<>() {
Member

Multiple search requests are chained and PIT (point in time) is not being used. Can these results become inconsistent, since the requests will be executed against different segments?

onFailures(new OpenSearchStatusException(response.toString(), RestStatus.REQUEST_TIMEOUT));
}

Iterator<SearchHit> hits = response.getHits().iterator();
Member

Are there any limits on the number of hits?

Contributor Author

10000 is the default limit.
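For reference, 10,000 matches the default of the `index.max_result_window` index setting, which caps `from + size` for a single request; requests beyond it are rejected rather than truncated. The sketch below is a hypothetical client-side guard that clamps the requested size to what the window allows; the helper name and the clamping policy are assumptions.

```java
final class ResultWindow {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000; // default of index.max_result_window

    /** Returns a size such that from + size stays within maxResultWindow; throws if `from` leaves no room for hits. */
    static int clampSize(int from, int requestedSize, int maxResultWindow) {
        if (from < 0 || requestedSize < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        if (from >= maxResultWindow) {
            throw new IllegalArgumentException("from [" + from + "] is beyond max_result_window ["
                + maxResultWindow + "]; page with search_after instead");
        }
        return Math.min(requestedSize, maxResultWindow - from);
    }
}
```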

});
}

private void prepRulesForCorrelatedEventsGeneration(String index, String event, List<CorrelationRule> correlationRules) {
Member

These are very long methods; can you break them into multiple smaller private methods?

Contributor Author

Can address this refactoring in a follow-up PR.

@Override
protected void doExecute(Task task, IndexCorrelationRequest request, ActionListener<IndexCorrelationResponse> listener) {
    AsyncIndexCorrelationAction asyncAction = new AsyncIndexCorrelationAction(request, listener);
    asyncAction.start();
Member

These are heavyweight transport actions that also chain multiple queries. Which threadpool will they run in? They shouldn't run on the transport threadpool.

Contributor Author

Today, all of them run in the transport threadpool.
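To illustrate the reviewer's point, the sketch keeps the calling (transport) thread free by handing the heavy, multi-query work to a dedicated fixed-size pool and completing the caller asynchronously. Inside OpenSearch this would more likely mean forking onto a named pool, for example `threadPool.executor(ThreadPool.Names.GENERIC)`, or a plugin-specific pool registered through `getExecutorBuilders`; the plain-JDK `ForkingExecutor` here is a hypothetical stand-in.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ForkingExecutor implements AutoCloseable {
    private final ExecutorService pool;

    ForkingExecutor(String namePrefix, int threads) {
        AtomicInteger counter = new AtomicInteger();
        this.pool = Executors.newFixedThreadPool(threads, runnable -> {
            Thread t = new Thread(runnable, namePrefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive just for this pool
            return t;
        });
    }

    /** Runs heavy work on the dedicated pool; the caller's thread returns immediately. */
    <T> CompletableFuture<T> fork(Supplier<T> heavyWork) {
        return CompletableFuture.supplyAsync(heavyWork, pool);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

In a transport action, `doExecute` would call `fork(...)` and complete its `ActionListener` from `whenComplete`, so the transport thread never blocks on the chained searches.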

 *
 * @opensearch.internal
 */
public class TransportSearchCorrelatedEventsAction extends HandledTransportAction<
Member

Are there any stats planned for CorrelatedEvents Actions?
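If stats were added, one lightweight shape is a per-action set of counters (requests, failures, cumulative latency) updated with `LongAdder`, so concurrent requests do not contend on a single lock. Everything here, including `CorrelationActionStats` and the counter names, is hypothetical; the thread does not say what stats, if any, are planned.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CorrelationActionStats {
    private final LongAdder requests = new LongAdder();
    private final LongAdder failures = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Times one invocation of an action; only RuntimeExceptions are counted as failures in this sketch. */
    <T> T record(Supplier<T> action) {
        long start = System.nanoTime();
        requests.increment();
        try {
            return action.get();
        } catch (RuntimeException e) {
            failures.increment();
            throw e;
        } finally {
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long requestCount() {
        return requests.sum();
    }

    long failureCount() {
        return failures.sum();
    }

    long totalTimeMillis() {
        return TimeUnit.NANOSECONDS.toMillis(totalNanos.sum());
    }
}
```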

Contributor

❕ Gradle check result for 46dafcb: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Member

andrross left a comment

I do think a correlation engine is useful and believe that the OpenSearch Project should deliver this feature. However, I remain unconvinced that the OpenSearch repository maintainers are the right people to take ownership of this code base. I personally do not have the bandwidth to build up the domain expertise required to properly review this code, and the fact that this PR has languished for over a year suggests that the case is similar for other maintainers. Since this feature is being architected as a plugin, I don't think there is any technical reason this code needs to live inside the OpenSearch repository. I also believe there are significant advantages to the plugin being able to iterate and evolve at its own pace, without being coupled to all the bottlenecks and pain points (such as flaky tests) that exist in the OpenSearch repository. Also note that a convention has recently been codified that plugins should be developed in their own repository as opposed to inside the OpenSearch repository. If another maintainer wants to make the case that this is the right place for this feature, then I will happily withdraw my objection.

/cc @reta @dblock @shwetathareja

dblock
Member

dblock commented Jun 25, 2024

I agree with @andrross, should we close this?

praveensameneni
Member

@dblock, @andrross, @shwetathareja, @reta
I can understand the reservations about bringing the plugin into the core OpenSearch repository. As per the conventions: " .. If you think it should be there anyway, please explain why in the RFC stage." We discussed this topic last year when we first started the RFC, and had one core maintainer opine here and another here.

I understand maintainers are not permanent and there will be new maintainers and ideas, however as an open source project, what is the guiding principle for these kind of changes. The convention says discuss it during RFC stage and we duly followed the process.

This would be a great precedent for a non-maintainer to contribute and discuss the feature and benefits and qualify to become a core maintainer or we will always be restricting contributions and not grow the project.

reta
Collaborator

reta commented Jun 26, 2024

I understand maintainers are not permanent and there will be new maintainers and ideas, however as an open source project, what is the guiding principle for these kind of changes. The convention says discuss it during RFC stage and we duly followed the process.

@praveensameneni the OpenSearch ecosystem evolves fast and along the way the things that do work or don't become more apparent (plus new challenges always come in). The recently published guidelines (which we did not have a year ago) are trying to address where new plugins belong. I agree with @andrross and @dblock that it should be out of core at this point in time.

This would be a great precedent for a non-maintainer to contribute and discuss the feature and benefits and qualify to become a core maintainer or we will always be restricting contributions and not grow the project.

The contributor could become the maintainer on the plugin repo (and continue contribution to core to become a maintainer there as well), we have quite a few successful stories like that already.

praveensameneni
Member

@reta
It's currently a plugin inside the core repository; creating an additional plugin repository would also introduce additional dependency and operational overhead.
Example: Security Analytics has several use cases around correlations, and that is where we started, iterated on and evolved the functionality. Similarly Observability and any log analytics (e.g. application logs) would also have a use case of correlations - show me correlated events when there is an increase in 5xx or response times. And if we follow the separate plugin path, each of the plugins would have to take a dependency on the new plugin (e.g. correlations).

While the guidelines were recently added, the discussion from last year when we created the RFC still holds true.
Also, the recently added guideline says " ..If you think it should be there anyway, please explain why in the RFC stage."
It appears the above sentence does not truly mean what it says, unless I am missing something.

I understand the guidelines around new plugins, which unfortunately are biased toward maintainers' domain expertise and do not fully account for integrating dependencies in the plugin ecosystem. Additionally, it would be better to take into account the use case and work backwards from the problem it's trying to solve and the value it adds for the community.

andrross
Member

And if we follow the separate plugin path, each of the plugins would have to take a dependency on the new plugin (e.g. correlations).

@praveensameneni I just want to reiterate that the need to take a dependency on a new plugin is true regardless of where this code lives. The OpenSearch Project publishes many plugins, some of which reside in this repo, some of which reside in other repos, and consumers need not know or care.

the OpenSearch ecosystem evolves fast and along the way the things that do work or don't become more apparent

What has become apparent to me in this case is that the decision for the OpenSearch maintainers to take ownership of this code has not worked, as shown by this PR being open for over a year. The current state of the correlation engine is that some basic functionality has been committed in this repo on the main branch (I don't think enough is there to be practically useful) and none of it has been backported to 2.x so it has never been released.

praveensameneni
Member

@praveensameneni I just want to reiterate that the need to take a dependency on a new plugin is true regardless of where this code lives. The OpenSearch Project publishes many plugins, some of which reside in this repo, some of which reside in other repos, and consumers need not know or care.

If it's a separate plugin repo outside of core repo, you take a dependency through common-utils, the plugin repo jar among others, however when shipped as a plugin (e.g. ActionPlugin, ScriptPlugin, ReloadablePlugin, ExtensiblePlugin) inside of core repo, we do not have to take an additional jar, common-utils dependency.

Just to clarify, we will continue to support and maintain the code.
We are adding fundamentals of correlations and the consumers (e.g. Observability, Security Analytics) will leverage these.

reta
Collaborator

reta commented Jun 26, 2024

If it's a separate plugin repo outside of core repo, you take a dependency through common-utils, the plugin repo jar among others, however when shipped as a plugin (e.g. ActionPlugin, ScriptPlugin, ReloadablePlugin, ExtensiblePlugin) inside of core repo, we do not have to take an additional jar, common-utils dependency.

@praveensameneni the plugins are distributed as ZIP archives (bundles), not as JAR files. The OpenSearch distribution comes with some plugins preinstalled / prebundled. What is the plugin repo jar you are referring to?

praveensameneni
Member

Thank you @reta and @andrross for your insightful comments.
We will create a new plugin repository to support correlations and iterate on its development and release. However, if and when there is more interest in correlations, we would like to integrate it into opensearch-core (not as a plugin) to provide an integrated experience with search and query capabilities, leveraging some of the core Lucene constructs.

praveensameneni
Member

We will bring this PR and the previously merged PRs into the new opensearch-correlations plugin repository.

andrross
Member

andrross commented Jun 27, 2024

If it's a separate plugin repo outside of core repo, you take a dependency through common-utils, the plugin repo jar among others, however when shipped as a plugin (e.g. ActionPlugin, ScriptPlugin, ReloadablePlugin, ExtensiblePlugin) inside of core repo, we do not have to take an additional jar, common-utils dependency.

For posterity, I want to clarify a few points here. There are a couple different types of dependencies between plugins as far as I know.

  1. A plugin adds a capability in the core, and another plugin depends on that capability. Example: the k-NN plugin adds vector storage and search capabilities to the existing indexing and search APIs. Plugins like neural-search then build additional features on top of those vector primitives. In this particular case, neural-search also takes a compile dependency on the k-NN jar to get access to some Java classes defined by k-NN.
  2. A plugin itself defines an extensible capability that other plugins can extend using the Java SPI mechanism. job-scheduler is an example of this.

I'm honestly not sure what category the correlation engine will fall into (thus far I haven't seen it defining an SPI interface so I would guess category 1), but the overall point stands that to a consumer plugin where the dependent plugin code lives should not be important. In both the k-NN and job-scheduler cases they are plugins defined in external repos. I'm not actually familiar with common-utils but I believe it is really just a common library for sharing code across multiple plugins.
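For readers unfamiliar with the second category, plain Java SPI works roughly like this: the extensible plugin publishes an interface, an extending plugin lists its implementation class in a `META-INF/services/<fully-qualified-interface-name>` file inside its jar, and the extensible plugin discovers implementations at runtime with `java.util.ServiceLoader`. The `CorrelationExtension` interface below is hypothetical (as noted above, the correlation engine has not defined one so far).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical extension point that other plugins could implement. */
interface CorrelationExtension {
    String name();
}

final class ExtensionDiscovery {
    /**
     * Finds every CorrelationExtension implementation registered in a
     * META-INF/services file visible to the given class loader.
     */
    static List<CorrelationExtension> load(ClassLoader loader) {
        List<CorrelationExtension> found = new ArrayList<>();
        for (CorrelationExtension ext : ServiceLoader.load(CorrelationExtension.class, loader)) {
            found.add(ext);
        }
        return found;
    }
}
```

With no provider files on the classpath the list is simply empty, which is also how an extensible plugin behaves when nothing extends it.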

andrross
Member

I'm going to close this PR as it looks like we have a path forward as a separate repository. As the feature matures and the code evolves, if the plugin model is no longer the right fit we can absolutely re-evaluate integrating the feature directly into the core, but for now I believe a separate repository is the right place.

andrross closed this Jun 27, 2024