PCS has a lot of 403 and Rate Limit Exceeded exceptions #4410

Closed
dkurepa opened this issue Feb 3, 2025 · 1 comment

Comments

@dkurepa
Member

dkurepa commented Feb 3, 2025

Context

Since December 20th, we have been getting RateLimiting and 403 (Forbidden) exceptions from GitHub.
The running theory is that the 403s are also caused by rate limiting; the GitHub API documentation says this is a possibility.

The main evidence for this theory: looking at a single replica at a point in time, we can clearly see that the same API call sometimes fails with a 403 and sometimes succeeds, even though both calls are made with the same cached token.
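
One way to test the theory is to check the headers on the failing responses. A minimal sketch, in Python for brevity and not taken from the PCS codebase: GitHub's REST API documentation says a rate-limited request gets a 403 or 429, with `x-ratelimit-remaining: 0` for the primary limit and possibly a `retry-after` header for secondary limits. The function name `looks_rate_limited` is hypothetical.

```python
def looks_rate_limited(status: int, headers: dict) -> bool:
    """Return True when a 403/429 response carries rate-limit signals.

    Hypothetical helper; header names follow GitHub's REST API docs.
    """
    if status not in (403, 429):
        return False
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    # Secondary rate limits may send retry-after (in seconds).
    if "retry-after" in h:
        return True
    # Primary rate limit exhausted.
    return h.get("x-ratelimit-remaining") == "0"
```

A 403 with neither signal is not necessarily a real permissions failure: a secondary rate limit can arrive without these headers, so a `False` here is only a hint and the response body is worth logging too.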

Goals

Stop these exceptions from happening. A good start might be to add some logging so we can figure out how many calls we actually make per hour. Currently the only related metric is the number of token cache hits per hour (around 4.3k), but that number is only a lower bound: there are a number of places where we reuse the same HttpClient without going to the token cache again.
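
Counting at the point where each request is sent, rather than at the token cache, would capture the reused-client calls as well. A rough sketch of such a counter, in Python for brevity; `HourlyCallCounter` and its methods are hypothetical names, and in practice this would more likely be a metric emitted from a shared HTTP handler.

```python
import threading
import time
from collections import Counter


class HourlyCallCounter:
    """Counts outgoing API calls in one-hour buckets (hypothetical helper)."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so tests can fake time
        self._lock = threading.Lock()
        self._counts = Counter()     # hour bucket -> number of calls

    def record(self) -> None:
        """Call once per outgoing request, right before it is sent."""
        bucket = int(self._clock() // 3600)
        with self._lock:
            self._counts[bucket] += 1

    def count_for_hour(self, epoch_seconds: float) -> int:
        """Number of calls recorded in the hour containing epoch_seconds."""
        with self._lock:
            return self._counts[int(epoch_seconds // 3600)]
```

Comparing this per-hour count against the `x-ratelimit-limit` value GitHub returns would show how close each token gets to its limit.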

We might also have to start respecting the backoff timer we are sent in these rate limiting responses.
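
A sketch of how the wait could be derived from the response headers, following the order GitHub's REST API documentation recommends: use `retry-after` if present; otherwise, if `x-ratelimit-remaining` is 0, wait until `x-ratelimit-reset` (UTC epoch seconds); otherwise wait at least a minute. It is written in Python for brevity and is not the PCS implementation; `retry_delay_seconds` and its parameters are hypothetical.

```python
import time


def retry_delay_seconds(headers: dict, now: float = None, default: float = 60.0) -> float:
    """Seconds to wait before retrying a rate-limited GitHub request.

    Assumes retry-after is given in seconds, as GitHub sends it.
    """
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # The reset time is an absolute timestamp; convert to a relative wait.
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    # No usable header: fall back to the documented one-minute minimum.
    return default
```

For example, `retry_delay_seconds({"Retry-After": "30"})` returns `30.0`. A caller would sleep for the returned time before retrying and cap the number of attempts.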

@dkurepa dkurepa self-assigned this Feb 3, 2025
dkurepa added a commit that referenced this issue Feb 3, 2025
#4410
premun added a commit that referenced this issue Feb 6, 2025
#4410

Respects the rate limiting headers that GitHub returns when the limit is
hit.
The PR also improves how we log failed attempts.
@premun
Member

premun commented Feb 7, 2025

We have figured out why these happen and logged an issue with the immediate next steps. We will go from there, so closing this.

@premun premun closed this as completed Feb 7, 2025