-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kubernetes] Bump package-spec format_version to 2.9.0 #7144
[Kubernetes] Bump package-spec format_version to 2.9.0 #7144
Conversation
Upgrade the integration package-spec format_version from 1.0.0 to 2.9.0 Here're the adjustments needed due to the new validations in place: - remove the deprecated attributes `license` and `release` from the manifest.yml - remove duplicated `pod.ip` definitions - update datasets in `sample_event.json` files We need this upgrade in preparation for other changes (routing rules) that require a more recent versions of the package-spec.
🌐 Coverage report
|
@@ -81,7 +81,7 @@ | |||
"address": "kube-state-metrics:8080" | |||
}, | |||
"event": { | |||
"dataset": "kubernetes.node", | |||
"dataset": "kubernetes.state_node", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I wonder how this change occured? It looks like a breaking change to me 🤔 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the Metricbeat module, and it seems this metricset is using the kubernetes.state_node
dataset since the early days.
Is it possible that we manually edited the sample_event.json
file, and we never noticed this error until the elastic-package
validation caught it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a reference to the relevant section in the Troubleshooting upgrades from Package Spec v1 to v2 guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to elastic/beats@b3f6632#diff-a60c719544b7cf37e25376068d5f8f95c520a6d0d181a1c7a8d7b0d96abdca2eR112 the state_
prefix should not be there.
I suggest to wait a bit for @constanca-m and @gizas to double check this.
Not sure what we miss here and when but looks important to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What values are we getting for event.dataset
by running the latest released integration version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, for the records, here's the error that the elastic-package returns during execution:
$ elastic-package test static -v
2023/07/26 14:36:37 DEBUG Enable verbose logging
Run static tests for the package
--- Test results for package: kubernetes - START ---
FAILURE DETAILS:
kubernetes/state_node Verify sample_event.json:
[0] field "event.dataset" should have value "kubernetes.state_node", it has "kubernetes.node"
kubernetes/state_persistentvolume Verify sample_event.json:
[0] field "event.dataset" should have value "kubernetes.state_persistentvolume", it has "kubernetes.persistentvolume"
kubernetes/state_resourcequota Verify sample_event.json:
[0] field "event.dataset" should have value "kubernetes.state_resourcequota", it has "kubernetes.resourcequota"
kubernetes/state_storageclass Verify sample_event.json:
[0] field "event.dataset" should have value "kubernetes.state_storageclass", it has "kubernetes.storageclass"
@@ -1,10 +1,6 @@ | |||
- name: kubernetes.pod | |||
type: group | |||
fields: | |||
- name: ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened here? don't we have pod.ip anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have one. According to elastic-package
with the latest package-spec this one is a duplicate.
I applied my best judgement to pick the one to remove, but maybe it is worth double-check with you. Here's the output of elastic-package
when executed on today's main
branch with the package-spec format_version
set to 2.9.0
:
# elastic-package check -v
....
Error: checking package failed: linting package failed: found 4 validation errors:
1. file "/Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/manifest.yml" is invalid: field (root): Additional property release is not allowed
2. file "/Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/manifest.yml" is invalid: field (root): Additional property license is not allowed
3. field "kubernetes.pod.ip" is defined multiple times for data stream "pod", found in: /Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/data_stream/pod/fields/base-fields.yml, /Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/data_stream/pod/fields/fields.yml
4. field "kubernetes.pod.ip" is defined multiple times for data stream "state_pod", found in: /Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/data_stream/state_pod/fields/base-fields.yml, /Users/zmoog/code/projects/zmoog/integrations/packages/kubernetes/data_stream/state_pod/fields/fields.yml
Here's the relevant last two errors, unpacked:
field "kubernetes.pod.ip" is defined multiple times for data stream "pod", found in:
.../packages/kubernetes/data_stream/pod/fields/base-fields.yml,
.../packages/kubernetes/data_stream/pod/fields/fields.yml
field "kubernetes.pod.ip" is defined multiple times for data stream "state_pod", found in:
.../packages/kubernetes/data_stream/state_pod/fields/base-fields.yml,
.../packages/kubernetes/data_stream/state_pod/fields/fields.yml
@gsantoro do you think updating fields.yml
is the right choice, or should we update base-fields.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the details. I didn't fully understand it until now.
So both pod and state_pod define kubernetes.pod.ip twice each in fields.yml and base-fields.yml.
That's clearly a mistake. It's crazy that was only caught now.
I'm not sure which is the preferred file for that property fields or base-fields.yml. I don't think it matters much but maybe @ChrsMark knows better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to clarify #7144 (comment) before moving forward (even if not directly related to this patch).
@gizas @constanca-m could you have a look?
Yep, it makes sense. I cherry-picked this change from the main PR to discuss it earlier with you. Thanks for checking. |
@gizas @constanca-m, let me know what you think when you have time 😇 |
Quick tests on myside also prove what you said, it is state_node (8.9.0 and k8s integration version: 1.43.1) What still confuses me is that in beats still the dataset is kubernetes.node (see https://github.com/elastic/beats/blob/main/metricbeat/module/kubernetes/state_node/_meta/testdata/ksm.v2.4.2.plain-expected.json#L4). Probably then is a test error? Another question is regarding this comment: #7144 (comment) Can we have two datastreams write to same field? What if user decides to enable only one of the two datastreams then we will miss the value is not it? |
I think found the reason here: https://github.com/elastic/beats/blob/main/metricbeat/helper/kubernetes/state_metricset.go#L47 The init function will initialise the metricset as So that is why all tests have kubernetes.node inside the state_* subfloders as the replacement takes place later. |
Did you test it? If so which versions does this affect? Which PR introduced this? Since I believe that was not intentional we have a regression and we need to spot where this change was introduced and fix+backport accordingly. |
I can see this here: elastic/beats@b3f6632#diff-e68559ac427c7449ad78073317eb417a1163be73f47fb249282343130ab43941L86 So before we were passing state_* . So @constanca-m as your PR introduced this change , could you please open a new issue to track any work that needs fixing this regression? |
Sorry for being late to the party, but I am confused about what the problem is. @gizas @ChrsMark I checked the history of the |
Hey @constanca-m ! You are right I'm not sure either how this occured. |
So I did a quick test with k -n kube-system logs -f metricbeat-f8gmd | grep dataset
...
"dataset": "kubernetes.pod"
"dataset": "kubernetes.pod",
"dataset": "kubernetes.container"
"dataset": "kubernetes.pod",
"dataset": "kubernetes.container",
"dataset": "kubernetes.pod",
"dataset": "kubernetes.container",
"dataset": "kubernetes.container"
"dataset": "kubernetes.pod",
"dataset": "kubernetes.pod",
"dataset": "kubernetes.pod",
"dataset": "kubernetes.container",
"dataset": "kubernetes.pod",
"dataset": "kubernetes.container",
"dataset": "kubernetes.container",
... So maybe sth has changed on Agent side that overrides that field? |
I tested directly on metricbeat. This PR was backported in 8.6.0, so I checked 8.5.0. I found this: So we already had some |
I checked what was the difference in the documents between And |
@constanca-m that's weird that you see both 🤔 Is your ES empty? With 8.9 I don't see any |
Yes @ChrsMark , I created a new instance just to check Kubernetes module. |
What if you delete all KSM deployments to generate documents with error messages? @ChrsMark |
I tested again with a new instance, this time for 8.6.0. I started metricbeat without KSM and then applied KSM. I saw the same behavior again: If I filter the document with If I filter documents with So the conclusion: when KSM is not deployed, then we receive documents that use |
@constanca-m I can confirm that switching on/off the KSM deployment seems to be the reason. Thanks for spotting this :)! @zmoog could you verify if the update happens with a running KSM instance or not? In any case it seems that under normal circumstances the dataset values should not be as For Agent though maybe there is a replacement because of elastic/beats#20076? |
So to summarise above is that we see for kube-state metricsets:
I have opened related issue to track effort: elastic/beats#36227 |
Testing with:
I started collecting data with KSM available and received documents with After a few minutes, I deleted the KSM replicaset. Without KSM, all documents had an |
So we have verified that for Beats the value of I any case @zmoog if you can rollback the changes in the sample_event.json files I suggest we can unblock this PR and continue the investigation at elastic/beats#36227. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from elastic/beats#36227, it looks good to me. if we can roll back those specific changes it's good to go :).
@@ -6,7 +6,7 @@ | |||
"event": { | |||
"module": "kubernetes", | |||
"duration": 12149615, | |||
"dataset": "kubernetes.persistentvolume" | |||
"dataset": "kubernetes.state_persistentvolume" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rollback those changes until we figure out why it's happening?
I don't have any immediate idea. I'd first suggest looking at an agent diagnostic to figure out exactly what configuration is being passed to metricbeat in this scenario in the |
I am not sure I can easily rollback this change, IIRC it comes from a validation error reported by elastic-package after upgrading package-spec from 1.0.0 to 2.9.0. I’ll check it again to get the exact error message. |
It might be elastic/beats#20076 that applies this override. We already have some
Let's continue the investigation at elastic/beats#36227. |
I updated the PR description, adding the failed validations from |
What does this PR do?
Upgrade the integration package-spec format_version from
1.0.0
to2.9.0
and addresses all the reported by elastic-package.Here're the adjustments needed due to the new validations introduced with the newer package-spec versions:
license
andrelease
from the manifest.ymlpod.ip
definitionssample_event.json
filesWe need this upgrade in preparation for the introduction of the routing rules that require a more recent versions of the package-spec.
Failed validations
Context
Changes
elastic-package check
elastic-package test static
Checklist
changelog.yml
file.Related issues