[Feature]: SPM Provide option to fill null gaps with zero in monitor tab #4229

Closed
Ashmita152 opened this issue Feb 9, 2023 · 3 comments · Fixed by #4985

Comments

@Ashmita152
Contributor

Ashmita152 commented Feb 9, 2023

Requirement

Currently, if there are no errors in the traces (i.e. no spans carry the error=true tag), otel-collector doesn't send any error metrics to Prometheus (i.e. it doesn't create any metric with the Prometheus label status_code = "STATUS_CODE_ERROR"). This results in an empty dashboard on the Jaeger Monitor tab.

Problem

An empty dashboard on the Monitor tab is ambiguous. It can mean either of the following:

  • data is not being ingested through the pipeline
  • data has no error

Proposal

To prevent this confusion, it would be better to show a dashboard with zero values instead of an empty dashboard.

cc @albertteoh

@albertteoh albertteoh changed the title [Feature]: Provide option to fill null gaps with zero in monitor tab [Feature]: SPM Provide option to fill null gaps with zero in monitor tab Feb 9, 2023
@yurishkuro
Member

If I understand the situation correctly, this results in no data being written to Prometheus. In this case, how can Jaeger know if it's an ingestion problem or no-errors traffic? Filling gaps with zeros by Jaeger does not answer that question, it just masks the problem, making it look like there is data with zeros.

@albertteoh
Contributor

> If I understand the situation correctly, this results in no data being written to Prometheus. In this case, how can Jaeger know if it's an ingestion problem or no-errors traffic? Filling gaps with zeros by Jaeger does not answer that question, it just masks the problem, making it look like there is data with zeros.

Error metrics are a subset of all calls/requests, selected with the query:

calls_total{service_name =~ "...", status_code = "STATUS_CODE_ERROR"}

If there is data resulting from the request count query:

calls_total{service_name =~ "..."}

Would that give us enough information to know that there were no ingestion issues? Conversely, if there is no request count data, we can't assume the traffic was error-free, in which case we should display "no data" instead of filling in zeros.

What do you think?

@yurishkuro
Member

Strictly speaking, the error metric is a distinct time series stored independently, but that's too far in the weeds, so I agree: if calls_total has data, it's reasonably safe to assume there are no ingestion issues.

albertteoh added a commit that referenced this issue Dec 4, 2023
## Which problem is this PR solving?
- Resolves #4229

## Description of the changes
- Ensures SPM displays a 0% error rate if there are no error metrics
_and_ call rates exist.
- If call rates don't exist, the error rate will also be null.
- This ensures SPM is able to differentiate "no data" from "no errors".

## How was this change tested?
- Add unit tests to cover happy and error cases.
- Tested locally to confirm "No data" is shown in the Error graph when
there is no data, then when call rates are available, a 0% rate is
displayed.

<img width="1710" alt="Screenshot 2023-12-03 at 8 01 36 pm"
src="https://github.com/jaegertracing/jaeger/assets/26584478/e38bdefc-2e2e-4a9c-a873-2ad1857f2098">

<img width="1696" alt="Screenshot 2023-12-03 at 8 00 45 pm"
src="https://github.com/jaegertracing/jaeger/assets/26584478/3e10d5fb-03e4-4ff3-b260-0dd8045eafbe">

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Albert Teoh <[email protected]>
Co-authored-by: Albert Teoh <[email protected]>
RipulHandoo pushed a commit to RipulHandoo/jaeger that referenced this issue Dec 4, 2023