Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cogburn/ai descriptions #606

Merged
merged 12 commits into from
Aug 8, 2024
Merged

Cogburn/ai descriptions #606

merged 12 commits into from
Aug 8, 2024

Conversation

coreyogburn
Copy link
Contributor

Each engine is configured to manage the same AI summary repo and branch. These summaries are kept in memory and applied to detections that are requested over the GET /api/detection/{id} route. These summaries will be displayed in the UI if 1) the summary has been reviewed by a human or 2) the grid has been configured to show unreviewed summaries. Because we have fingerprints of the rule bodies, we can even indicate in the UI when a particular summary might be stale (i.e. generated for an older instance of the rule). Users may configure their grid to not show AI summaries if they so choose.

Each engine now keeps track of the AI Summaries generated for all the rules of that engine. When the module starts and during Syncs, an engine will update the AI repo and reload the YAML file.

The UI will show the AI Summary if it is present and marked as reviewed. Otherwise, the UI falls back to the extracted description.

On the detection getter, call the new MergeAuxilleryData function on the proper engine so that this AI info is present on the detection when requested. This is the only endpoint that returns AI data, searches will not contain these fields.

readAiSummary is mocked to make progress. It will be implemented soon.

All engines should be configured to put the AiRepo in the same location but this location should NOT be the same as Sigma's rule repos. This would present a problem if somebody didn't use SO's Sigma rules.
Moved AiSummary from detections to model so I could mock AiLoader without creating an import cycle.

To-do:
1) Ensure AiSummary's fields line up with the final model.
2) Ensure detections.readAiSummary looks for the summaries in the correct place.
Pass the language name instead of the engine name into RefreshAiSummaries to properly build the expected filename.
Refactored readAiSummary to work with publicId based dictionary.

Updated test to properly simulate repo contents.
Since the Ai Summary doc has the MD5 fingerprints of every rule to track when a rule changes and needs a new summary, I can use that on the backend to know if the summary applies to the latest rule or not. Because the rules don't change drastically, we'll continue to show the summary, but with an additional line that it might be old.

Included a way for users to turn off AI summaries on a per engine basis.
Added some logging. Updated existing log statements for consistency.
Instead of rendering this or that, I've intermingled the rendering code because the section didn't get as complicated as I expected.
New client parameter to allow AI Summaries to show even if they haven't been reviewed yet.

When cloning a repository, a branch may be specified to clone. If no branch is specified, the server will respond with the default branch.

Fix an issue where the old description would still show under the AI summary.
The branch is also necessary when pulling from a non-standard branch repo. Updated IOManager.PullRepo to accept a branch.

Because of the number of Suricata rules, parsing the AI Summaries yaml file can take 30+ seconds. Other YAML libraries weren't reading any faster. readAiSummary was refactored to watch the value of isRunning and abandon the unmarshalling (safe, no side effects) early if the module terminates before unmarshalling is finished. This allows for a good response time to quit on ctrl+c.

Renamed mio to iom in tests for consistency.
When the service starts, all 3 engines will attempt to refresh the exact same AI summary repo at about the same time. Before this change, they'd line up one at a time to pull the repo and then read the repo contents. Now, the first engine to refresh the AI summary repo successfully will allow the other engines to skip the refresh if they try to update in the next 5 seconds. Using a RWMutex allows them all to read the summaries at the same time. This means every engine loads their summaries quickly and none of them have to wait on suricata's long lock hold if the timings aren't optimal. This really only helps during service startup, but it helps it feel snappier. After that, the engines will probably be more than 5 seconds out of sync with each other.

These additions don't impact tests as unit tests aren't running multiple engines and cypress tests aren't checking for AI summaries yet.
Grammar changes, detailed logging fields, branch name tweaks, future proofing.
mc-wright
mc-wright previously approved these changes Aug 8, 2024
"publicId": detectId,
}).Error("retrieved detection with unsupported engine")
} else {
err = eng.MergeAuxilleryData(detect)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "auxiliary," not auxillery.

@coreyogburn coreyogburn merged commit 0be0347 into 2.4/dev Aug 8, 2024
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 8, 2024
@coreyogburn coreyogburn deleted the cogburn/ai-descriptions branch August 8, 2024 21:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants