Add `reason` tag to `kubernetes_state.job.failed` #25103
Merged
9 commits
1be85ef add reason tags to KSM and kubelet metrics (keisku)
13f3195 add releasenote (keisku)
b014cde Revert "add releasenote" (keisku)
6a7ae4a Revert "add reason tags to KSM and kubelet metrics" (keisku)
28f358d Merge remote-tracking branch 'origin/main' into keisku/support1663984 (keisku)
313399c add reason tag to kubernetes_state.job.failed (keisku)
2805b99 make validJobReason case-insensive (keisku)
a56a5ed remove empty reason tag as well (keisku)
1d21ebf rename to validJobFailureReason (keisku)
@@ -421,10 +421,26 @@ func trimJobTag(tag string) (string, bool) {
 	return trimmed, tag != trimmed
 }
 
+var jobFailureReasons = map[string]struct{}{
+	"backofflimitexceeded": {},
+	"deadlineexceeded":     {},
+}
+
+func validJobFailureReason(reason string) bool {
+	_, ok := jobFailureReasons[strings.ToLower(reason)]
+	return ok
+}
+
 // validateJob detects active jobs and adds the `kube_cronjob` tag
 func validateJob(val float64, tags []string) ([]string, bool) {
 	kubeCronjob := ""
-	for _, tag := range tags {
+	for i, tag := range tags {
+		if strings.HasPrefix(tag, "reason:") {
+			if v := strings.TrimPrefix(tag, "reason:"); !validJobFailureReason(v) {
+				tags = append(tags[:i], tags[i+1:]...)
+				continue
+			}
+		}
 		split := strings.Split(tag, ":")
 		if len(split) == 2 && split[0] == "kube_job" || split[0] == "job" || split[0] == "job_name" {
 			// Trim the timestamp suffix to avoid high cardinality
@@ -482,10 +498,6 @@ func jobStatusFailedTransformer(s sender.Sender, name string, metric ksmstore.DD
 		return
 	}
 
-	if reasonTagIndex != -1 {
-		tags = append(tags[:reasonTagIndex], tags[reasonTagIndex+1:]...)
-	}
-
Comment on lines -485 to -488: This logic is now in
 	jobMetric(s, metric, ksmMetricPrefix+"job.failed", hostname, tags)
 }
 
12 changes: 12 additions & 0 deletions
releasenotes-dca/notes/add-reason-tags-to-kube_job_status_failed-755bbfbb67d7e4c6.yaml
@@ -0,0 +1,12 @@
+# Each section from every release note are combined when the
+# CHANGELOG.rst is rendered. So the text needs to be worded so that
+# it does not depend on any information only available in another
+# section. This may mean repeating some details, but each section
+# must be readable independently of the other.
+#
+# Each section note must be formatted as reStructuredText.
+---
+enhancements:
+  - |
+    Add ``reason:backofflimitexceeded,deadlineexceeded`` to the
+    ``kubernetes_state.job.failed`` metric to help users understand why a job failed.
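To make the release note concrete, here is a small sketch of which `reason` values would pass the case-insensitive allowlist from the diff. `jobFailureReasons` and `validJobFailureReason` are copied from the change; the sample inputs in `main` are assumptions chosen for illustration, not values taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// jobFailureReasons mirrors the allowlist added in the diff.
var jobFailureReasons = map[string]struct{}{
	"backofflimitexceeded": {},
	"deadlineexceeded":     {},
}

// validJobFailureReason reports whether reason is in the allowlist,
// ignoring case.
func validJobFailureReason(reason string) bool {
	_, ok := jobFailureReasons[strings.ToLower(reason)]
	return ok
}

func main() {
	// Sample inputs are illustrative only.
	for _, r := range []string{"BackoffLimitExceeded", "deadlineexceeded", "Evicted", ""} {
		fmt.Printf("%q -> %v\n", r, validJobFailureReason(r))
	}
}
```

With this check, mixed-case and lowercase spellings of the two listed reasons are accepted, while any other value, including an empty one, is rejected and its tag is dropped.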
Unless we update `kube-state-metrics/v2` to at least `v2.9.0`, which contains the bug fix kubernetes/kube-state-metrics#2046, we cannot add `reason:deadlineexceeded` to `kubernetes_state.job.failed`. The `kube-state-metrics/v2` update should be in a different PR, given the context below.

Why do we currently use k8s.io/kube-state-metrics/v2 v2.8.2?

This is because we have to align with an interface change before bumping `kube-state-metrics/v2`. The current implementations in https://github.com/DataDog/datadog-agent/tree/2feb83da045935df7986e56504bd297922a32ebb/pkg/collector/corechecks/cluster/ksm/customresources don't follow the `type RegistryFactory interface` updated by kubernetes/kube-state-metrics#1851.