x-pack/filebeat/input/entityanalytics/provider/okta: Rate limiting hangs #40106

chrisberkhout · 2024-07-04T14:05:10Z

Around the time of a rate limit reset, the rate limiting code may set a negative target rate and wait forever before the next request.

The rate comes out negative if the current time is after the reset time returned in the response headers. If a negative rate is set at a time when no request budget has accumulated, the limiter will not recover. This example code shows how previous events and timing affect the outcome.

We can avoid setting a negative rate by changing == 0 to <= 0 here.
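
As a minimal sketch of that guard (not the actual Filebeat code): the function name targetRate, its parameters and the fallback value are all hypothetical. It spreads the remaining request budget over the time left until the reset, and refuses to divide by a zero or negative interval.

```go
package main

import (
	"fmt"
	"time"
)

// targetRate returns a request rate, in requests per second, that spreads the
// remaining budget over the time left until the reset. The name, parameters
// and fallback are hypothetical. When the reset time has already passed
// (untilReset <= 0) it returns fallback instead of a negative or infinite rate.
func targetRate(remaining float64, untilReset time.Duration, fallback float64) float64 {
	secs := untilReset.Seconds()
	if secs <= 0 { // a check for "== 0" only would let a negative rate through
		return fallback
	}
	return remaining / secs
}

func main() {
	fmt.Println(targetRate(10, 5*time.Second, 1))  // reset is 5s away
	fmt.Println(targetRate(10, -2*time.Second, 1)) // reset passed 2s ago
}
```

With an equality check only, the second call would yield 10 / -2 = -5 requests per second, which is the negative rate described above.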

There are some other corrections that can be made to the rate limiting logic, listed below. Beyond correctness, there are improvements that could be made for better operability, fault tolerance and user feedback.

A similar set of changes should be considered in the CEL input and related Mito code (in OktaRateLimit and in DraftRateLimit).

Fix rate limiting logic - #41583

Avoid setting a negative rate
Stop requests until reset rather than doing one more burst when x-rate-limit-remaining: 0
Use a separate rate limiter for each endpoint (/api/v1/users vs /api/v1/users/<userid>/groups)
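
One possible shape for the per-endpoint item, as a sketch only: endpointKey, limitState and limiters are hypothetical names, and limitState stands in for whatever limiter type is really used (for example a *rate.Limiter). The idea is to normalise the user ID out of the path so that all group lookups share one limiter, separate from the user listing.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointKey maps a request path to the endpoint it is assumed to be rate
// limited under. Only the /api/v1/users/<userid>/groups shape is normalised;
// any other path is returned unchanged (apart from slash trimming).
func endpointKey(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) == 5 && parts[0] == "api" && parts[1] == "v1" &&
		parts[2] == "users" && parts[4] == "groups" {
		parts[3] = "<userid>"
	}
	return "/" + strings.Join(parts, "/")
}

// limitState is a hypothetical stand-in for a per-endpoint limiter.
type limitState struct {
	remaining int
}

// limiters holds one limitState per endpoint key.
type limiters map[string]*limitState

// forPath returns the state for the endpoint that path belongs to,
// creating it on first use.
func (l limiters) forPath(path string) *limitState {
	key := endpointKey(path)
	s, ok := l[key]
	if !ok {
		s = &limitState{}
		l[key] = s
	}
	return s
}

func main() {
	l := limiters{}
	l.forPath("/api/v1/users").remaining = 0
	l.forPath("/api/v1/users/00u1/groups").remaining = 50
	// A different user ID resolves to the same groups limiter.
	fmt.Println(l.forPath("/api/v1/users/00u2/groups").remaining)
	fmt.Println(len(l))
}
```

This sketch is not safe for concurrent use; a real implementation would need a mutex or similar around the map.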

Improve operability - #41977

Add debug logging around waits (x-pack/filebeat/input/entityanalytics/provider/{,internal/}okta: improve debug logging #40347)
Add a timeout to the context passed into x/time/rate, so very long waits return an error immediately
Add configuration options to override normal rate limiting with a constant rate, for use as a workaround
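
The timeout item relies on x/time/rate's Wait returning an error when the expected wait exceeds the context's deadline, rather than sleeping. The sketch below models that behaviour with the standard library only; waitOrFail, withBudget and errWaitTooLong are hypothetical names, not part of x/time/rate or Filebeat.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errWaitTooLong is a hypothetical sentinel error for this sketch.
var errWaitTooLong = errors.New("wait would exceed context deadline")

// waitOrFail blocks for wait, but returns an error at once, without
// sleeping, if the context's deadline would pass before the wait ends.
func waitOrFail(ctx context.Context, wait time.Duration) error {
	if deadline, ok := ctx.Deadline(); ok && time.Now().Add(wait).After(deadline) {
		return errWaitTooLong
	}
	timer := time.NewTimer(wait)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// withBudget runs waitOrFail under a context that times out after budget.
func withBudget(wait, budget time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return waitOrFail(ctx, wait)
}

func main() {
	fmt.Println(withBudget(time.Millisecond, time.Second)) // short wait completes
	fmt.Println(withBudget(24*time.Hour, time.Second))     // fails without sleeping
}
```

The point of the design is that a pathological wait surfaces as an error the caller can log and act on, instead of a silent hang.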

Improve fault tolerance and user feedback - #42094

Don't fail sync on HTTP 429
Consider any adjustments appropriate to handle 429 responses for exceeding the concurrent rate limit
Publish events progressively rather than after receiving all data for a full sync, which may take hours

elasticmachine · 2024-07-04T14:05:13Z

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

chrisberkhout · 2024-07-04T14:17:08Z

Stop requests until reset rather than doing one more burst when x-rate-limit-remaining: 0

Setting a negative rate in the distant past avoids generating new tokens for the period from now until waitUntil (if the rate were zero, a concurrent caller could still consume the burst). That is true, but it won't clear existing tokens:

diff --git a/x-pack/filebeat/input/entityanalytics/provider/okta/internal/okta/okta.go b/x-pack/filebeat/input/entityanalytics/provider/okta/internal/okta/okta.go
index 58495cbcd6..b33c7be7e6 100644
--- a/x-pack/filebeat/input/entityanalytics/provider/okta/internal/okta/okta.go
+++ b/x-pack/filebeat/input/entityanalytics/provider/okta/internal/okta/okta.go
@@ -444,6 +444,7 @@ func oktaRateLimit(h http.Header, window time.Duration, limiter *rate.Limiter) e
 		// estimate will be overwritten when we make the next
 		// permissible API request.
 		next := rate.Limit(lim / window.Seconds())
+		limiter.SetLimitAt(time.Time{}, rate.Limit(-1))
 		limiter.SetLimitAt(waitUntil, next)
 		limiter.SetBurstAt(waitUntil, burst)
 		return nil

chrisberkhout · 2024-12-06T17:20:03Z

In Go's x/time/rate/rate.go, a reservation is made as follows:

// reserveN is a helper method for AllowN, ReserveN, and WaitN.
// maxFutureReserve specifies the maximum reservation wait duration allowed.
// reserveN returns Reservation, not *Reservation, to avoid allocation in AllowN and WaitN.
func (lim *Limiter) reserveN(t time.Time, n int, maxFutureReserve time.Duration) Reservation {
	lim.mu.Lock()
	defer lim.mu.Unlock()

	if lim.limit == Inf {
		return Reservation{
			ok:        true,
			lim:       lim,
			tokens:    n,
			timeToAct: t,
		}
	} else if lim.limit == 0 {
		var ok bool
		if lim.burst >= n {
			ok = true
			lim.burst -= n
		}
		return Reservation{
			ok:        ok,
			lim:       lim,
			tokens:    lim.burst,
			timeToAct: t,
		}
	}

	t, tokens := lim.advance(t)

	// Calculate the remaining number of tokens resulting from the request.
	tokens -= float64(n)

	// Calculate the wait duration
	var waitDuration time.Duration
	if tokens < 0 {
		waitDuration = lim.limit.durationFromTokens(-tokens)
	}

	// Decide result
	ok := n <= lim.burst && waitDuration <= maxFutureReserve

	// Prepare reservation
	r := Reservation{
		ok:    ok,
		lim:   lim,
		limit: lim.limit,
	}
	if ok {
		r.tokens = n
		r.timeToAct = t.Add(waitDuration)

		// Update state
		lim.last = t
		lim.tokens = tokens
		lim.lastEvent = r.timeToAct
	}

	return r
}

If t is before lim.last, lim.advance(t) returns only the existing tokens and no new ones, because no time has elapsed in which to accumulate them. This case is handled explicitly in advance(). waitDuration is then set to the time it would take to accumulate the missing tokens.

However, new tokens won't start accumulating until after lim.last. The correct way to calculate waitDuration would be:

// Calculate the wait duration
var waitDuration time.Duration
if tokens < 0 {
	// time.Time values are compared with Before, not with <.
	if t.Before(lim.last) {
		waitDuration += lim.last.Sub(t) // non-accumulating duration
	}
	waitDuration += lim.limit.durationFromTokens(-tokens)
}

The reservation result (ok := n <= lim.burst && waitDuration <= maxFutureReserve) and the time to act (r.timeToAct = t.Add(waitDuration)) depend on having a correct value for waitDuration.

Also, lim.last should not be moved back to an earlier time; otherwise tokens for the same interval can be accumulated twice.
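
To put numbers on the waitDuration correction, here is a standalone sketch that does not use x/time/rate. The helper names and the untilLast parameter (lim.last minus t) are hypothetical, and durationFromTokens is only modelled on the library helper of the same name. It compares the wait computed from the token deficit alone with the wait that also counts the non-accumulating gap.

```go
package main

import (
	"fmt"
	"time"
)

// durationFromTokens reports how long it takes to accumulate the given
// number of tokens at limit tokens per second.
func durationFromTokens(limit, tokens float64) time.Duration {
	return time.Duration(tokens / limit * float64(time.Second))
}

// naiveWait considers only the token deficit. tokens is the balance after
// subtracting the request, so a negative value is a deficit.
func naiveWait(tokens, limit float64) time.Duration {
	if tokens >= 0 {
		return 0
	}
	return durationFromTokens(limit, -tokens)
}

// correctedWait also adds the time until lim.last, during which no tokens
// accumulate. untilLast is positive when t is before lim.last.
func correctedWait(untilLast time.Duration, tokens, limit float64) time.Duration {
	if tokens >= 0 {
		return 0
	}
	var wait time.Duration
	if untilLast > 0 {
		wait += untilLast // non-accumulating duration
	}
	return wait + durationFromTokens(limit, -tokens)
}

func main() {
	// One token short, 1 token/s, and lim.last is 10s after t.
	fmt.Println(naiveWait(-1, 1))
	fmt.Println(correctedWait(10*time.Second, -1, 1))
}
```

In this example the deficit-only calculation gives a wait of 1s, while the corrected one gives 11s, because the first 10s generate no tokens.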

chrisberkhout added the bug and Team:Security-Service Integrations (Security Service Integrations Team) labels Jul 4, 2024

chrisberkhout self-assigned this Jul 4, 2024

chrisberkhout assigned chrisberkhout and efd6 and unassigned chrisberkhout Jul 4, 2024

chrisberkhout mentioned this issue Jul 16, 2024

x-pack/filebeat/i.../entitya.../.../okta: Avoid a negative request rate #40267

Merged

6 tasks

efd6 mentioned this issue Jul 16, 2024

x-pack/filebeat/input/cel: avoid a negative request rate #40270

Merged

6 tasks

This was referenced Jul 24, 2024

[8.14](backport #40270) x-pack/filebeat/input/cel: avoid a negative request rate #40343

Merged

[8.15](backport #40270) x-pack/filebeat/input/cel: avoid a negative request rate #40344

Merged

efd6 mentioned this issue Jul 25, 2024

x-pack/filebeat/input/entityanalytics/provider/{,internal/}okta: improve debug logging #40347

Merged

6 tasks

This was referenced Aug 8, 2024

[8.14](backport #40267) x-pack/filebeat/i.../entitya.../.../okta: Avoid a negative request rate #40459

Merged

[8.15](backport #40267) x-pack/filebeat/i.../entitya.../.../okta: Avoid a negative request rate #40460

Merged

chrisberkhout mentioned this issue Nov 11, 2024

x-pack/filebeat/input/entityanalytics/provider/okta: Rate limiting fixes #41583

Merged

6 tasks

chrisberkhout mentioned this issue Dec 10, 2024

.../input/entityanalytics/provider/okta: Rate limiting fix, improvements #41977

Merged

6 tasks

mergify bot mentioned this issue Dec 12, 2024

[8.x](backport #41977) .../input/entityanalytics/provider/okta: Rate limiting fix, improvements #42008

Merged

6 tasks

chrisberkhout mentioned this issue Dec 17, 2024

.../input/entityanalytics/provider/okta: Handle 429s, concurrent limits #42094

Open

6 tasks
