Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7.11][Telemetry] Diagnostic Alert Telemetry #84422

Merged
merged 24 commits into from
Dec 10, 2020

Conversation

pjhampton
Copy link
Contributor

@pjhampton pjhampton commented Nov 26, 2020

Summary

This PR extends the existing security telemetry collection by transmitting diagnostic alerts via a Kibana task manager.

Related PRs:

Implementation

  • @tsg - create a task manager task that is executed every ~5 minutes
  • @pjhampton - each execution:
    • Confirm telemetry is enabled. If it's disabled, don't query for diag alerts.
    • sorting by the event.ingested field.
    • Limit the result set to 100 events.
  • @pjhampton - Call the queueTelemetryEvents function from the EventsTelemetry (See: [Security] Alert Telemetry for the Security app #77200)
  • Query the index we decide to for the time since last execution to present. Record the last execution time

Checklist

For maintainers

@pjhampton pjhampton added Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! v7.11.0 labels Nov 26, 2020
@pjhampton pjhampton self-assigned this Nov 26, 2020
@pjhampton pjhampton mentioned this pull request Nov 26, 2020
14 tasks
@pjhampton
Copy link
Contributor Author

Just dropping an update on this work item - there turned out to be a couple of background data plumbing pieces that needed to be put in place before this could be tested e2e. I hope to get them all in within the next day or 2.

@pjhampton pjhampton changed the title Diagnostic Alert Telemetry [7.10][Telemetry] Diagnostic Alert Telemetry Dec 4, 2020
@pjhampton pjhampton changed the title [7.10][Telemetry] Diagnostic Alert Telemetry [7.11][Telemetry] Diagnostic Alert Telemetry Dec 4, 2020
Remove 2nd var to track telemetry opt in.

Add ES client to start querying index.

Use query to get docs from a dummy index.

Change how index is queried.

Get diagnostic alerts to send to staging cluster.

Record last timestamp.

PoC on telemetry opt in via 2 processes.

Revert to original solution
@pjhampton pjhampton force-pushed the pjhampton/diagnostic-alert-telemetry branch from bace39b to e091be3 Compare December 4, 2020 16:56
@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@pjhampton pjhampton marked this pull request as ready for review December 8, 2020 16:34
@pjhampton pjhampton requested review from a team as code owners December 8, 2020 16:34
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-security (Team:Security)

@pjhampton pjhampton requested a review from jeska December 8, 2020 16:35
@legrego legrego added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Dec 8, 2020
@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

sort: [
{
'event.ingested': {
order: 'asc',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want asc here so we get the most recent events?

Copy link
Contributor Author

@pjhampton pjhampton Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it wrong. Updated here 5638da9
desc will order by most recent I believe from my testing.

public async fetchDiagnosticAlerts() {
const query = {
expand_wildcards: 'open,hidden',
index: 'logs-endpoint.diagnostic.collection-default*',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need logs-endpoint.diagnostic.collection-* here, because I think @ferullo was saying that the diagnostic alerts will respect the namespace setting, so they might come with something else than default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated here: 2018132

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Member

@jeska jeska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc in analytics-staging looks good!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Distributable file count

id before after diff
default 46981 47743 +762

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@stevewritescode stevewritescode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I reviewed the new allowlist fields and the content is consistent with other data that we're already collecting, so good to go on that front.

One minor question about the task scheduler.

return `${TelemetryDiagTaskConstants.TYPE}:${TelemetryDiagTaskConstants.VERSION}`;
};

public runTask = async (taskId: string, searchFrom: string, searchTo: string) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an objection in the code, just a question: If there are multiple Kibana instances, it is possible for this task to be running simultaneously in multiple instances? Or does the task manager ensure that only one execution can happen at any given time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevewritescode Good question. The taskManager uses a distributed model which ensures that only a single Kibana instance will run the task. Each Kibana instance polls for new tasks on an interval and attempts to "claim" tasks that are ready to fire. If another Kibana instance tries to claim the same task, only one will succeed and the others will get a 409 Conflict as per OCC. If I'm remembering all this correctly. :)

Copy link
Contributor

@tsg tsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

@pjhampton pjhampton merged commit 6e7fb4a into master Dec 10, 2020
pjhampton added a commit that referenced this pull request Dec 10, 2020
* Port @tsg's work on task manager.

Remove 2nd var to track telemetry opt in.

Add ES client to start querying index.

Use query to get docs from a dummy index.

Change how index is queried.

Get diagnostic alerts to send to staging cluster.

Record last timestamp.

PoC on telemetry opt in via 2 processes.

Revert to original solution

* Update on agreed method. Fixes race condition.

* Expand wildcards.

* stage.

* Add rule.ruleset collection.

* Update telemetry sender with correct query for loading diag alerts.

* Add similar task tests to endpont artifact work.

* Fix broken import statement.

* Create sender mocks.

* Update test to check for func call.

* Update unused reference.

* record last run.

* Update index.

* fix import

* Fix test.

* test fix.

* Pass unit to time diff calc.

* Tests should pass now hopefully.

* Add additional process fields to allowlist.

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
@pjhampton pjhampton deleted the pjhampton/diagnostic-alert-telemetry branch February 3, 2021 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:enhancement Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants