PCS has a lot of 403 and Rate Limit Exceeded exceptions #4410

dkurepa · 2025-02-03T14:40:10Z

Context

Since December 20th, PCS has been getting rate-limiting and 403 (Forbidden) exceptions from GitHub.
The running theory is that the 403s are also caused by rate limiting; the GitHub API documentation says this is possible.

The main evidence for this theory: on a single replica, within the same time window, the same API call sometimes fails with a 403 and sometimes succeeds, even though both calls use the same cached token.
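One way to test the theory is to check failed responses for GitHub's rate-limit markers. The sketch below is a rough illustration in Python, not PCS code: the function name `looks_rate_limited` is hypothetical, and it assumes the response headers are available as a plain mapping. It relies on GitHub's documented behavior that a rate-limited request returns 403 or 429, with `x-ratelimit-remaining` at `0` for the primary limit or a `retry-after` header for some secondary limits. A secondary limit that sends neither header would be missed and would need a check of the error message instead.

```python
from typing import Mapping


def looks_rate_limited(status: int, headers: Mapping[str, str]) -> bool:
    """Return True when a 403/429 response carries rate-limit markers."""
    if status not in (403, 429):
        return False
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    # retry-after: sent with some secondary rate limits.
    # x-ratelimit-remaining == "0": the primary limit is exhausted.
    return "retry-after" in lowered or lowered.get("x-ratelimit-remaining") == "0"
```

Logging this flag next to each failed call would show whether the intermittent 403s line up with an exhausted quota.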

Goals

Stop these exceptions from happening. A good start might be to add logging so we can work out how many calls we actually make per hour. Currently the only related metric is the number of token cache hits per hour (around 4.3k), but that is only a lower bound: in a number of places we reuse the same HttpClient without going to the token cache again.
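A minimal sketch of per-hour call counting, assuming every outgoing GitHub request can be routed through a single hook. The class `HourlyCallCounter` and its methods are hypothetical names used for illustration, written in Python rather than in the service's own stack. Counting at the request level, instead of at the token cache, would also capture calls made through a reused HttpClient.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Optional


class HourlyCallCounter:
    """Counts outgoing API calls in one-hour UTC buckets."""

    def __init__(self) -> None:
        self._buckets = Counter()

    def record(self, when: Optional[datetime] = None) -> None:
        # Default to the current UTC time when no timestamp is supplied.
        when = when or datetime.now(timezone.utc)
        # Bucket key looks like "2025-02-03T14".
        self._buckets[when.strftime("%Y-%m-%dT%H")] += 1

    def calls_in_hour(self, hour_key: str) -> int:
        # A Counter returns 0 for hours with no recorded calls.
        return self._buckets[hour_key]
```

Calling `record()` once per request and emitting `calls_in_hour(...)` as a metric would give the real hourly volume to compare against the rate limit.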

We might also have to start respecting the backoff timer that GitHub sends in these rate-limiting responses.
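Respecting the backoff timer could look roughly like the sketch below. It follows the order GitHub documents for rate-limited responses: honor `retry-after` if present, otherwise wait until `x-ratelimit-reset` when `x-ratelimit-remaining` is `0`, otherwise wait at least a minute and back off exponentially. The function name `backoff_seconds` and its `attempt` parameter are hypothetical, and the code assumes `retry-after` is given in seconds rather than as an HTTP date.

```python
import time
from typing import Mapping, Optional


def backoff_seconds(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    attempt: int = 0,
) -> float:
    """Return how long to wait before retrying a rate-limited request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    now = time.time() if now is None else now
    if "retry-after" in lowered:
        # Assumption: the value is a number of seconds, not an HTTP date.
        return float(lowered["retry-after"])
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        # x-ratelimit-reset is a UTC epoch timestamp in seconds.
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    # No guidance in the headers: start at one minute and double per attempt.
    return 60.0 * (2 ** attempt)
```

A caller would sleep for the returned duration before retrying and increase `attempt` each time the request fails again.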



premun · 2025-02-07T08:46:35Z

We have figured out why these happen and filed an issue with the immediate next steps. We will continue from there, so I am closing this one.

dkurepa self-assigned this Feb 3, 2025

dkurepa mentioned this issue Feb 3, 2025

Enable Http Logging in PCS #4411

Merged

dkurepa added a commit that referenced this issue Feb 3, 2025

Enable Http Logging in PCS (#4411)

e629d2e

 #4410

dkurepa assigned premun Feb 4, 2025

dkurepa added the Ops - First Responder label Feb 4, 2025

premun mentioned this issue Feb 5, 2025

Handle GitHub rate limiting gracefully #4421

Merged

premun added a commit that referenced this issue Feb 6, 2025

Handle GitHub rate limiting gracefully (#4421)

c6687e9

#4410 Respects the rate limiting headers that GitHub returns when the limit is hit. The PR also improves how we log failed attempts.

premun closed this as completed Feb 7, 2025
