Add Allocation Service Metrics #414

Merged on Oct 12, 2022 (14 commits)

abbasahmed
Contributor

@abbasahmed abbasahmed commented Sep 24, 2022

Problem: Thundernetes currently exposes few metrics for the Allocation API service. Reference: Issue #384

Solution:

This PR adds six metrics:

  • AllocationsTimeTakenDuration
    The time taken for a successful allocation to complete.
  • AllocationsRetriesCounter
    The number of retries taken for an allocation to complete.
  • Allocations429ErrorsCounter
    The number of 429 (too many requests) errors during allocation.
  • Allocations404ErrorsCounter
    The number of 404 (not found) errors during allocation.
  • Allocations500ErrorsCounter
    The number of 500 (internal server error) errors during allocation.
  • Allocation409ErrorsCounter
    The number of 409 (request conflict) errors during allocation.
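
To illustrate how these six metrics relate to a single allocation request, here is a minimal standard-library sketch. It is not the code from this PR (that lives in pkg/operator/controllers/metrics.go). The names `allocationMetrics`, `recordError`, and `allocateWithRetries` are hypothetical, the in-memory fields stand in for real exported metrics, and the retry policy (retry on any non-200 response) is an assumption made only for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// allocationMetrics is a hypothetical in-memory stand-in for the six metrics
// listed above. A real exporter would register them with a metrics library.
type allocationMetrics struct {
	mu           sync.Mutex
	errorsByCode map[int]int     // counts of 404, 409, 429 and 500 responses
	retries      int             // total retries across all allocations
	durations    []time.Duration // time taken by each successful allocation
}

func newAllocationMetrics() *allocationMetrics {
	return &allocationMetrics{errorsByCode: make(map[int]int)}
}

// recordError counts only the four status codes the PR description names;
// any other code is ignored.
func (m *allocationMetrics) recordError(code int) {
	switch code {
	case http.StatusNotFound, http.StatusConflict,
		http.StatusTooManyRequests, http.StatusInternalServerError:
		m.mu.Lock()
		m.errorsByCode[code]++
		m.mu.Unlock()
	}
}

// allocateWithRetries calls attempt up to maxAttempts times and returns the
// last status code (0 if maxAttempts <= 0). attempt is a hypothetical function
// that performs one allocation try and returns its HTTP status code.
// Assumption for this sketch: every non-200 response is retried.
func (m *allocationMetrics) allocateWithRetries(maxAttempts int, attempt func() int) int {
	start := time.Now()
	code := 0
	for i := 0; i < maxAttempts; i++ {
		if i > 0 {
			// Every attempt after the first counts as one retry.
			m.mu.Lock()
			m.retries++
			m.mu.Unlock()
		}
		code = attempt()
		if code == http.StatusOK {
			// Duration is recorded only for successful allocations.
			m.mu.Lock()
			m.durations = append(m.durations, time.Since(start))
			m.mu.Unlock()
			return code
		}
		m.recordError(code)
	}
	return code
}

func main() {
	m := newAllocationMetrics()
	// Simulated responses: two 429s followed by a success.
	responses := []int{http.StatusTooManyRequests, http.StatusTooManyRequests, http.StatusOK}
	i := 0
	code := m.allocateWithRetries(5, func() int {
		c := responses[i]
		i++
		return c
	})
	fmt.Println(code, m.retries, m.errorsByCode[http.StatusTooManyRequests], len(m.durations))
	// prints "200 2 2 1"
}
```

Keeping the duration measurement around the whole retry loop, rather than around a single attempt, matches the description of AllocationsTimeTakenDuration as the time for a successful allocation to complete.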

These metrics let us monitor the speed and reliability of the allocation service. They also surface allocation errors quickly, so we can diagnose problems and make decisions faster.

Along with the metrics, we have added a couple of panels to the Grafana dashboard to visualize these events.

image

image

@ghost

ghost commented Sep 24, 2022

CLA assistant check
All CLA requirements met.

@dgkanatsios
Collaborator

Great work! Can you add a small description on the PR? Thanks!

@abbasahmed
Contributor Author

Hi @dgkanatsios, I've added a description of the PR. Also, a heads-up: we are still making some code changes, so the PR is in draft status for now.

@abbasahmed abbasahmed closed this Sep 24, 2022
@abbasahmed abbasahmed reopened this Sep 24, 2022
@dgkanatsios
Collaborator

hey @abbasahmed, let me know how I can help to land this PR! Appreciate all the hard work, thanks!

@dgkanatsios
Collaborator

@abbasahmed @ghov kind ping, we want to release 0.6 next week and it would be great to include your changes!

@abbasahmed
Contributor Author

@dgkanatsios will resolve the conflicts and publish the PR today. Sorry for the hold up on this PR!

@dgkanatsios
Collaborator

thanks @abbasahmed, let me know if you need any help!

Collaborator

@dgkanatsios dgkanatsios left a comment

only 1 correction, other than that LGTM!

Review comment on pkg/operator/controllers/metrics.go (outdated, resolved)
@dgkanatsios dgkanatsios marked this pull request as ready for review October 12, 2022 17:54
@dgkanatsios dgkanatsios merged commit c6eed87 into PlayFab:main Oct 12, 2022

3 participants