-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Telemetry] Move Monitoring collection strategy to a collector #82638
[Telemetry] Move Monitoring collection strategy to a collector #82638
Conversation
b2c9661
to
2eea853
Compare
2eea853
to
a448d23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-review explaining a few of my thoughts (and planned follow-up PRs) about some of these changes.
@@ -13,7 +13,6 @@ | |||
], | |||
"optionalPlugins": [ | |||
"infra", | |||
"telemetryCollectionManager", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The monitoring
plugin no longer depends on the telemetryCollectionManager
🎉
// TODO: For the telemetry plugin to work, we need to provide the new ES client. | ||
// The new client should be inititalized with a similar config to `this.cluster` but, since we're not using | ||
// the new client in Monitoring Telemetry collection yet, setting the local client allows progress for now. | ||
// The usage collector `fetch` method has been refactored to accept a `collectorFetchContext` object, | ||
// exposing both es clients and the saved objects client. | ||
// We will update the client in a follow up PR. | ||
this.telemetryElasticsearchClient = elasticsearch.client; | ||
this.telemetrySavedObjectsService = savedObjects; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These clients can be removed now that the monitoring
plugin does not need to catch up with the changes in the telemetryCollectionManager
🎉
@@ -174,6 +157,11 @@ export class Plugin { | |||
}); | |||
|
|||
registerCollectors(plugins.usageCollection, config, cluster.callAsInternalUser); | |||
registerMonitoringTelemetryCollection( | |||
plugins.usageCollection, | |||
cluster.callAsInternalUser, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, if we add to the fetch
context the KibanaRequest
(only when available), the collector itself can decide to scope the request as needed (it would also apply to the other collectors registered in registerCollectors
), and avoid exposing data to users that shouldn't see it. We can revisit it in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afharo we had decided to hold off on adding the KibanaRequest
to the fetch
context until there was a need for it. Is there a new need for that now and, if so, how urgently do we need to address that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TinaHeiligers, thank you for confirming that. I am aware of that current action being held until a need is identified. That's also why I preferred to deal with this issue on a separate PR. So we can properly discuss about it.
The need I can see in this situation (both in lines 159 and 162) is that those collectors are always using the internal user so, if a user with fewer permissions requests the telemetry sample flyout, we'll expose data about the monitored clusters, potentially exposing data that user shouldn't see.
In this monitoring use-case, its collectors can't use the esClient
/callCluster
provided in the fetch
context because they need the cluster
client instead (pointing to the actual monitoring data). If we decide to provide the KibanaRequest
(only when available), these collectors (the new one I did and the existing ones) can properly scope their request to avoid any unwanted exposure of data.
Regarding the urgency, I guess it's up to us to decide. Personally, I'd vote for having it sorted before 7.11's FF because of the potential leak of information. Once we have this PR merged, I can build a quick POC to identify the implications it may have, if you think that's OK 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a reference to the existing issue: #75875 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created the PR #83413 to fix this. Once it's merged, I'll rebase this one and update accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR has been merged and I've applied the changes in this one to use it
delete stats.stack_stats.kibana!.plugins.monitoringTelemetry; | ||
} | ||
return [...acc, stats, ...(monitoringTelemetry || [])]; | ||
}, [] as TelemetryAggregatedStats[]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may feel like we are mixing concepts, and that this telemetry collection strategy needs to know about other existing collectors to reshape the format that is ultimately sent to the telemetry service. But, afaik, that's one of the duties in the telemetry collection strategies (the OSS strategy also extracts some collectors to shape them into something else).
On an unrelated note, I'm planning on a follow-up PR that will change the behaviour of the telemetryCollectionManager: It will use this X-Pack strategy as a replacement of the OSS strategy (instead of the current priority-based loop, trying the next lower-priority strategy if the previous one fails).
const { body } = await supertest | ||
.post('/api/telemetry/v2/clusters/_stats') | ||
.set('kbn-xsrf', 'xxx') | ||
.send({ timestamp, unencrypted: true }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp
parameter was only used for the monitoring
strategy to retrieve the right time-frame of telemetry data. Now it's useless and confusing. I'll create a follow-up PR to remove it.
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering we can de-couple even more than this. What concerns me is that there is still a good chunk of telemetry code within the monitoring
plugin.
What does the telemetry code need to function outside of the monitoring
plugin? I'm guessing at a minimum, it needs access to the monitoring cluster, or at least an exposed callCluster
that is connected to it.
Does anyone else feel that this level of separation will be ideal and attainable?
@chrisronline ideally, Monitoring will just be another usage collector and provide the data that it needs to for telemetry purposes. I'll leave it up to @afharo to discuss the details but in short, there are a number of cases that we can't really get around easily right now. |
I just worry that there is code inside of the I know we previously agreed upon this solution, but after seeing it in action, I wonder if we get to a better endgame. |
@chrisronline I'm sure we can. For starters, The only reason I didn't implement it here is to make the PRs as concise and easy to review as possible. It's been proven in the past that too-long PRs are easier to introduce bugs. However, I'm happy to push all the changes to this PR if you think it'll be clearer to see the final stage 🙂
I feel your concerns, and I agree we need to be cautious so we don't break things downstream. On the flip side, I don't think that code can be fully owned by the telemetry team: we don't fully understand the content of the That's part of the reason for us now to report I also agree, though, that it shouldn't be fully owned by the monitoring team. It feels to me more like a joined effort.
I'm not a big fan of this solution for this specific purpose. Other plugins can exploit it to gain access to |
@afharo Thanks for the thoughts, much appreciated. Let's move forward with this PR as-is! |
…ove-monitoring-strategy
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…82638) (#83686) Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
… into add-logs-to-node-details * 'add-logs-to-node-details' of github.com:phillipb/kibana: (87 commits) [Maps] Add 'crossed' & 'exited' events to tracking alert (elastic#82463) Updating code-owners to use new core/app-services team names (elastic#83731) Add Managed label to data streams and a view switch for the table (elastic#83049) [Maps] Add query bar inputs to geo threshold alerts tracked points & boundaries (elastic#80871) fix(NA): search examples kibana version declaration (elastic#83182) Fixed console error, which appears when saving changes in Edit Alert flyout (elastic#83610) [Alerting] Add `alert.updatedAt` field to represent date of last user edit (elastic#83578) Not resetting server log level if level is defined (elastic#83651) disable incremenetal build for legacy tsconfig.json (elastic#82986) [Workplace Search] Migrate SourceLogic from ent-search (elastic#83593) [Workplace Search] Port Box changes from ent-search (elastic#83675) [APM] Improve router types (elastic#83620) Bump flat to v4.1.1 (elastic#83647) Bump y18n@5 to v5.0.5 (elastic#83644) Bump jsonpointer to v4.1.0 (elastic#83641) Bump is-my-json-valid to v2.20.5 (elastic#83642) [Telemetry] Move Monitoring collection strategy to a collector (elastic#82638) Update typescript eslint to v4.8 (elastic#83520) [ML] Persist URL state for Anomaly detection jobs using metric function (elastic#83507) [ML] Performance improvements to annotations editing in Single Metric Viewer & buttons placement (elastic#83216) ...
…ic#82638) Co-authored-by: Kibana Machine <[email protected]>
Summary
As agreed in previous discussions, this PR is migrating the telemetry collection strategy registered by Monitoring to a StatsCollector.
This change allows us to evolve the Telemetry and the Monitoring plugins without affecting each other as much as they currently do. It also improves the amount and quality of data we collect, fixing use cases that would skip sending telemetry about production monitoring clusters, or missing fields only sent by local collectors.
For full details about the discussions, refer to the RFC.
Planned follow-up PRs
As stated in my initial self-review, for the sake of easier review, I think that we need to implement a few follow-up PRs (already added to the teams' roadmap):
telemetryCollectionManager
to use this X-Pack strategy as a replacement of the OSS strategy (comment).monitoring
indices #83521)timestamp
in/api/telemetry/v2/clusters/_stats
(comment)KibanaRequest
(only when available) in thefetch
method's context so collectors can scope their requests when possible (comment)Checklist
Delete any items that are not applicable to this PR.
For maintainers