-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Event-based entry cache] replayMissedEvents queries DB in loop for every missed ID #5342
Comments
As a further enhancement I'd say that the batch query for missed events should also return unique foreign keys, so that the next set of follow-on queries do not make duplicate requests in the event that multiple missed events correlate to the same entity. |
Spoke to @edwbuck and @sorindumitru over slack. Edwin is planning on tackling this to optimize for the multiple transaction loops that the query currently does by implementing:
Where we should also choose to support @sorindumitru 's suggestion:
To ensure that the list doesn't get too long and create a long running transaction. |
@amoore877 Please disregard the prior deleted post, I misunderstood which items you were referring to. We are processing events, and it is quite possible that two events validly update one record. For example, I might alter an entry's SPIFFE ID, and then decide to also alter the entry's selectors, after verifying that the SPIFFE ID was updates. In such cases, we expect two DB events, both referring to one SPIRE Entry. Currently, we did not think to collapse the two events into one update, because the logic to do so would add complexity, and we believe that the scenarios where it occurs seem rare. With that in mind, I'll create a new Issue to cover the scenario, as "multi-issues" create difficulty in getting them scheduled, worked, and completed (the last concern holds up the completed concerns). For those who might be skimming, this embedded request does not affect open transaction / skipped db event id logic, thus this request is not about optimizing any polling, but is about optimizing the single batch of fetches after an event is detected, fetching the affected entry once when it has seen two or more events in one cache refresh cycle. |
@amoore877 #5349 should cover the optimization you requested, please track it there, and let's keep this open for the issue of replaying the event id polling when a transaction is cancelled. |
@amartinezfayo Please assign this issue to me. |
We will be looking at the implementation of a different algorithm for the events-based cache event tracking in #5624. |
In the event based cache implementation, replayMissedEvents queries DB in loop for every missed ID. Since this is invoked every time we update the cache i.e. every
cacheReloadInterval
seconds, it can cause unintended load on the DB if there are a lot of missed events (eg due to a canceled transaction)Ideally we should query all the missed events at once (or batch with pagination) and process them instead of querying for each missed event.
The text was updated successfully, but these errors were encountered: