-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ignore prometheus parsing errors #37383
Conversation
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services) |
This PR is supposed to replace entirely #37357 which was created by mistake from the wrong git branch |
❕ Build Aborted
Expand to view the summary
Build stats
Steps errors
Expand to view the steps failures
|
💔 Build Failed
Expand to view the summary
Build stats
Pipeline error
❕ Flaky test reportNo test was executed to be analysed. 🤖 GitHub commentsExpand to view the GitHub comments
To re-run your PR in the CI, just comment with:
|
@@ -496,8 +498,15 @@ func ParseMetricFamilies(b []byte, contentType string, ts time.Time) ([]*MetricF | |||
) | |||
if et, err = parser.Next(); err != nil { | |||
// TODO: log here | |||
// if errors.Is(err, io.EOF) {} | |||
break | |||
if errors.Is(err, io.EOF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add here again the comment from Tania https://github.com/elastic/beats/pull/37357/files#r1420471049. I agree is important because now we have no clue what is happening in a similar case in the future.
Import "github.com/elastic/elastic-agent-libs/logp"
Possible suggestion:
if et, err = parser.Next(); err != nil {
if errors.Is(err, io.EOF) {
log.Errorf("EOF is reached: %v ", err)
break
}
log.Debugf("Parse Prometheus Family error : %v ", err)
break
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion.
I am a bit skeptical about using a global logger and I thought we had to pass prometheus logger to the ParseMetricFamilies instead.
I tested both solutions. They both work, I think it is about styling. The current changes use Kubernetes logger since that's the only other logger globally available.
if we wanted to use Prometheus logger there a few more changes required in both the code and the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry got confused with what is global and prometheus logger.
Currently I see that ParseMetricFamilies
is calle from GetStateMetricsFamilies which is used from state metricsets here :https://github.com/elastic/beats/blob/main/metricbeat/helper/kubernetes/state_metricset.go#L115
So if we pass the error back to the state_metricset.go in abolve logger this would be able to log it. Is not it? Just to make sure that we return it in ParseMetricFamilies
Do you agree Giuseppe? Then in this case we dont want to add a new logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I should have explain it better.
Unfortunately we can't return the error from the inner loop since there is some code that has to be executed inside the same function after that loop.
So far the logic is:
- EOF ==> not an error -> break from loop
- "data does not end with # EOF" ==> similar to EOF. it has reach the end but no EOF has been found. This "error" has surfaced today since a lot of tests were failing because of timeout when I was using "continue" instead of "break"
- "invalid metric type" ==> need to ignore the error and keep processing the remaining metrics. This is the reason for this PR
- any other error ==> we are breaking the loop and logging the error at debug level
Because we can't return from the loop, the "errors" needs to be logged from inside textparse.go/ParseMetricFamilies() and thus either using a global logger or pass Prometheus logger from the caller of that function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gsantoro ! Now I think I got it. We need to return "invalid metric type" message back.
Because we already have prometheus logger, I guess is preferrable to pass it from caller function and not create another logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding logging!
Can you please also add a test for this use case? That info
is skipped but rest of metrics are parsed correctly for ContentTypeTextFormat
❕ Build Aborted
Expand to view the summary
Build stats
Test stats 🧪
Test errors
Expand to view the tests failures> Show only the first 10 test failures
|
💔 Tests Failed
Expand to view the summary
Build stats
Test stats 🧪
Test errors
Expand to view the tests failures> Show only the first 10 test failures
|
Sample log entry formatted for readability {
"log.level": "debug",
"@timestamp": "2023-12-12T10:50:03.576Z",
"log.logger": "kubernetes",
"log.origin":
{
"function": "github.com/elastic/beats/v7/metricbeat/helper/prometheus.ParseMetricFamilies",
"file.name": "prometheus/textparse.go",
"file.line": 506,
},
"message": 'Ignored Parsing error from Prometheus : invalid metric type "info" ',
"service.name": "metricbeat",
"ecs.version": "1.6.0",
} |
💔 Build Failed
Expand to view the summary
Build stats
Pipeline error
❕ Flaky test reportNo test was executed to be analysed. 🤖 GitHub commentsExpand to view the GitHub comments
To re-run your PR in the CI, just comment with:
|
Reusing the same prometheus logger I get this debug log
to notice that the log.logger is different now. it returns the base.Logger() as the prometheus logger. |
❕ Build Aborted
Expand to view the summary
Build stats
Steps errors
Expand to view the steps failures
|
❕ Build Aborted
Expand to view the summary
Build stats
Test stats 🧪
Steps errors
Expand to view the steps failures
|
* ignore prometheus parsing errors (cherry picked from commit 6f13899)
* ignore prometheus parsing errors (cherry picked from commit 6f13899) Co-authored-by: Giuseppe Santoro <[email protected]>
* ignore prometheus parsing errors
Proposed commit message
Ignore parser errors from unsupported metrics types on Prometheus client and continue parsing until EOF is reached.
For example Prometheus client with default
text/plain
format doesn't supportinfo
metrics and it will generate an error at https://github.com/prometheus/prometheus/blob/965e603fa792bca0900ac76eb45ae84c81af1cdf/model/textparse/promparse.go#L319.Currently that error breaks the parsing and it ignores any remaining metrics in the list. With this change instead we will correctly parse all the supported metrics types and ignore the unsupported ones.
Metrics of type
info
are correctly supported by OpenMetrics format. KSM supports OpenMetrics text format as well but in Beats the helper is hardcoded to use only plain text format atbeats/metricbeat/helper/prometheus/prometheus.go
Line 34 in 8b18c36
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
See related issue #37198 on how to test these changes
Related issues
Use cases
Screenshots
Logs