Production - [Alerting] DotNetEng Status Failed Requests/Hour alert #3003

Closed
dotnet-eng-status bot opened this issue Jun 4, 2024 · 62 comments
Labels
  • Active Alert: Issues from Grafana alerts that are now active
  • Critical
  • Grafana Alert: Issues opened by Grafana
  • Ops - First Responder
  • Production: Tied to the Production environment (as opposed to Staging)

Comments

@dotnet-eng-status

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 36

Metric Graph

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-d2dd705a6c724ed68fcf6955561c06dd

dotnet-eng-status bot added the Active Alert, Critical, Grafana Alert, Ops - First Responder, Production, and Inactive Alert (Issues from Grafana alerts that are now "OK") labels and removed the Active Alert label Jun 4, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert label and removed the Inactive Alert label Jun 5, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 23

Metric Graph

Go to rule

dotnet-eng-status bot added the Inactive Alert label and removed the Active Alert label Jun 5, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert label and removed the Inactive Alert label Jun 5, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 22

Metric Graph

Go to rule

dotnet-eng-status bot added the Inactive Alert label and removed the Active Alert label Jun 5, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert label and removed the Inactive Alert label Jun 5, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 23

Metric Graph

Go to rule

dotnet-eng-status bot added the Inactive Alert label and removed the Active Alert label Jun 5, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot removed the Inactive Alert label Jun 5, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 22

Metric Graph

Go to rule

dotnet-eng-status bot added and removed the Active Alert label Jun 5, 2024
dotnet-eng-status bot added the Active Alert label Jun 7, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 21

Metric Graph

Go to rule

dotnet-eng-status bot removed the Active Alert label Jun 7, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added the Inactive Alert and Active Alert labels and removed the Inactive Alert label Jun 7, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 22

Metric Graph

Go to rule

dotnet-eng-status bot removed the Active Alert label Jun 7, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added and removed the Inactive Alert label Jun 7, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 26

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert and Inactive Alert labels and removed the Active Alert label Jun 7, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert label and removed the Inactive Alert label Jun 7, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 24

Metric Graph

Go to rule

dotnet-eng-status bot removed the Active Alert label Jun 7, 2024
Author

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Metric Graph

Go to rule

dotnet-eng-status bot added and removed the Inactive Alert label Jun 7, 2024
Author

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod, and to investigate staging, run the query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 24

Metric Graph

Go to rule

dotnet-eng-status bot added the Active Alert label Jun 7, 2024
missymessa self-assigned this Jun 7, 2024
@missymessa
Member

Errors are related to the overflow issue with Octokit's issue comment IDs.

(Octokit's PR to fix this issue: octokit/octokit.net#2928)
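
For readers unfamiliar with this class of failure, here is a minimal sketch of how an ID overflow of this kind can surface. It assumes the problem is a comment ID that no longer fits in a signed 32-bit integer; the thread itself only says "overflow", so the width is an assumption. The ID value and helper names are hypothetical and are not taken from Octokit.

```python
import struct

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32(value: int) -> bool:
    """Return True if value is representable as a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


def pack_as_int32(value: int) -> bytes:
    """Serialize value as a 4-byte signed integer.

    Raises struct.error when the value is out of range, which mirrors
    a client model that stores IDs in a 32-bit field.
    """
    return struct.pack("<i", value)


def pack_as_int64(value: int) -> bytes:
    """Serialize value as an 8-byte signed integer (the wider field)."""
    return struct.pack("<q", value)


# Hypothetical comment ID just past the 32-bit limit (assumption, not real data).
hypothetical_comment_id = INT32_MAX + 1

try:
    pack_as_int32(hypothetical_comment_id)
    overflowed = False
except struct.error:
    overflowed = True

# The same ID round-trips without loss once the field is 64 bits wide.
round_tripped = struct.unpack("<q", pack_as_int64(hypothetical_comment_id))[0]
```

In this sketch, every request that touches an oversized ID fails the same way, which would be consistent with a steady failure count hovering just above the alert threshold rather than a single spike.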

@riarenas
Member

riarenas commented Jun 7, 2024

If this is still ongoing, the alert will just reopen soon.

2 participants