[Fleet] Add usage telemetry for package policy upgrade conflicts #109870
Pinging @elastic/fleet (Team:Fleet)
This is great, thanks for filing this!
So I came up with this format to add to a new custom collector under the Fleet usage collectors:

```json
{
  "package_policy_upgrades": [
    {
      "package_name": "apache",
      "current_version": "0.3.3",
      "new_version": "1.1.1",
      "status": "success"
    },
    {
      "package_name": "aws",
      "current_version": "0.3.3",
      "new_version": "1.1.1",
      "status": "failure",
      "error": [
        {
          "key": "inputs.cloudtrail-aws-s3.streams.aws.cloudtrail.vars.queue_url",
          "message": ["Queue URL is required"]
        }
      ]
    }
  ]
}
```
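The payload above could be modeled with a small TypeScript sketch. The type and helper names here are illustrative assumptions, not the actual Fleet plugin interfaces:

```typescript
// Hypothetical types mirroring the proposed collector payload.
interface UpgradeError {
  key: string;
  message: string[];
}

interface PackagePolicyUpgradeUsage {
  package_name: string;
  current_version: string;
  new_version: string;
  status: 'success' | 'failure';
  error?: UpgradeError[];
}

// Build a usage entry from an upgrade attempt; errors are attached only on failure.
function makeUpgradeUsage(
  packageName: string,
  currentVersion: string,
  newVersion: string,
  errors: UpgradeError[] = []
): PackagePolicyUpgradeUsage {
  const status: 'success' | 'failure' = errors.length === 0 ? 'success' : 'failure';
  const usage: PackagePolicyUpgradeUsage = {
    package_name: packageName,
    current_version: currentVersion,
    new_version: newVersion,
    status,
  };
  if (status === 'failure') {
    usage.error = errors;
  }
  return usage;
}
```

A real collector would return an array of these entries from its `fetch` method.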
There are two ways to add upgrade telemetry: via a usage collector, or via the event-based service. Pros and cons of the event-based service:

Cons:
- Technically, for each data type we would have to create a new channel, indexer, and job in the new telemetry cluster.
Telemetry is challenging with our release model because any bugs in the collection process cannot be fixed for long periods of time. Processing events on the ingest side also feels much simpler when we're trying to answer questions like "how often does X happen?". Basic counts are simple enough to do with a usage collector, but that breaks down quickly if you want to segment on any property (e.g. package name, package version, user role). Events naturally give us this with a very simple collection mechanism that is unlikely to have bugs, plus the flexibility to massage the data afterwards (if needed; often it won't be).

IMO, experimenting with an event approach could be well worth the effort, give us deeper insight into how users are using our application, and lower maintenance cost. I lean towards writing a very simple event sender, largely based on the one that Security Solution has already built. For reference, Security Solution's implementation lives here:
I think we'll want some guardrails in the initial implementation to be sure we don't send too much data. A cap on the size of payloads and periodic batching would be adequate for this purpose.
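The guardrails described above could be sketched as a small bounded queue: events accumulate, a periodic flush sends them in one batch, and anything that would push the pending payload past a size cap is dropped. This is a hypothetical illustration (payload size is approximated by JSON string length), not Fleet's actual sender:

```typescript
type TelemetryEvent = Record<string, unknown>;

// Queues events up to a payload-size cap; flush() sends everything in one batch.
class BoundedEventQueue {
  private queue: TelemetryEvent[] = [];
  private queuedBytes = 0;

  constructor(
    private readonly maxPayloadBytes: number,
    private readonly send: (events: TelemetryEvent[]) => void
  ) {}

  // Returns false (and drops the event) when the cap would be exceeded.
  enqueue(event: TelemetryEvent): boolean {
    const size = JSON.stringify(event).length;
    if (this.queuedBytes + size > this.maxPayloadBytes) {
      return false;
    }
    this.queue.push(event);
    this.queuedBytes += size;
    return true;
  }

  // In a real implementation this would run on a timer (periodic batching).
  flush(): number {
    const batch = this.queue;
    this.queue = [];
    this.queuedBytes = 0;
    if (batch.length > 0) {
      this.send(batch);
    }
    return batch.length;
  }
}
```

Dropping oversized events (rather than splitting them) keeps the sender simple; a production version might instead truncate or count the drops.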
An update on this: as for the upgrade telemetry, I got it working locally, both with a collector and by sending directly to a Fleet channel. @jen-huang @mostlyjason @joshdover

Example using collectors:

```json
{
  "stack_stats": {
    "kibana": {
      "plugins": {
        "fleet": {
          "package_policy_upgrades": [
            {
              "package_name": "apache",
              "current_version": "0.3.3",
              "new_version": "1.1.1",
              "status": "success"
            },
            {
              "package_name": "aws",
              "current_version": "0.6.1",
              "new_version": "1.3.0",
              "status": "failure",
              "error": [
                {
                  "key": "inputs.cloudtrail-aws-s3.streams.aws.cloudtrail.vars.queue_url",
                  "message": [
                    "Queue URL is required"
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  }
}
```
Example using the event-based service:

```json
{
  "package_policy_upgrade": {
    "package_name": "apache",
    "current_version": "0.3.3",
    "new_version": "1.1.1",
    "status": "success"
  }
}
```

Also, I have been playing around with the data, and it might make sense to add some categorization to error messages. E.g. required fields contain their field name in the error message; at a minimum we could change this to a generic "Field is required" message. These are the possible input validation errors that I found in the code: https://github.com/elastic/kibana/blob/master/x-pack/plugins/fleet/common/services/validate_package_policy.ts#L227
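The categorization idea could look like the sketch below: strip field-specific details out of validation messages before sending them, so errors group into generic buckets. The patterns here are assumptions based on messages like "Queue URL is required", not the actual Fleet validation code:

```typescript
// Map a field-specific validation message to a generic category so telemetry
// aggregations don't fragment on field names. Patterns are illustrative.
function categorizeErrorMessage(message: string): string {
  if (/is required$/.test(message)) {
    return 'Field is required';
  }
  if (/is not a valid/.test(message)) {
    return 'Field is not valid';
  }
  // Fall back to the raw message for anything unrecognized.
  return message;
}
```

The `key` field in the error payload already identifies the input, so no information is lost by making the message generic.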
@juliaElastic Is this now closed by #115180?
Closing this as changes are done.
Hi @juliaElastic
Build details:
Steps followed:
Could you please confirm if we are missing anything? cc: @EricDavisX
@amolnater-qasource As discussed on Slack, the description was outdated: the solution is not using collectors, only sending events directly to the new telemetry API. So the only way to verify this is to check the debug logs and check the events on the telemetry staging link.
Hi @juliaElastic As discussed, this telemetry staging link is not accessible to us. @EricDavisX Please let us know if we can skip this test. Thanks
Hi - I'll ask the team whether they have coverage over telemetry, or whether the risk is minimal enough that they don't need manual tests. @juliaElastic and @jen-huang, it's your call. We can deprecate this case or modify it to a 'best effort' one, or we can submit whatever request is needed to grant access to the telemetry cluster if desired/appropriate. Please advise.
@EricDavisX I think the risk is minimal here, since we are only adding telemetry. To verify that the events were sent, you can check the Kibana debug logs. I don't recall having to request access to the telemetry staging link; you could ask on the #telemetry channel about what is needed.
In #106048 we're adding the ability to upgrade package policies, both manually and automatically when possible. During some package policy upgrades, users will be required to take manual action when a package's inputs change in a way that makes the updated package policy fail validation. This can happen due to changes like:

In some of these scenarios, there are additional enhancements we may want to consider to eliminate these conflict scenarios and make package upgrades as seamless as possible for users. In order to know where to focus our efforts, we should collect telemetry on package policy upgrades to answer questions like:

Related: