Add Allocation Service Metrics #414
Great work! Can you add a short description to the PR? Thanks!
Hi @dgkanatsios, I've added a description to the PR. Also, a heads-up: we are still adding some code changes, so the PR is currently in draft status.
Hey @abbasahmed, let me know how I can help land this PR! I appreciate all the hard work, thanks!
@abbasahmed @ghov, kind ping: we want to release 0.6 next week, and it would be great to include your changes!
@dgkanatsios, I will resolve the conflicts and publish the PR today. Sorry for the holdup on this PR!
Thanks @abbasahmed, let me know if you need any help!
Only one correction; other than that, LGTM!
Problem: Thundernetes currently exposes few metrics for the allocation API service. Reference: Issue #384
Solution:
This PR adds six metrics:
- The time taken for a successful allocation to complete.
- The number of retries taken for an allocation to complete.
- The number of 429 (Too Many Requests) errors during allocation.
- The number of 404 (Not Found) errors during allocation.
- The number of 500 (Internal Server Error) errors during allocation.
- The number of 409 (Conflict) errors during allocation.
These new metrics let us monitor the allocation service's speed and reliability. They also surface allocation errors quickly, so we can diagnose problems and make decisions faster.
Along with the metrics, we have added a couple of panels to the Grafana dashboard to visualize them.