Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Event-based entry cache] replayMissedEvents queries DB in loop for every missed ID #5342

Open
chiragk25 opened this issue Jul 30, 2024 · 6 comments
Assignees
Labels
priority/backlog Issue is approved and in the backlog

Comments

@chiragk25
Copy link
Contributor

In the event based cache implementation, replayMissedEvents queries DB in loop for every missed ID. Since this is invoked every time we update the cache i.e. every cacheReloadInterval seconds, it can cause unintended load on the DB if there are a lot of missed events (eg due to a canceled transaction)

Ideally we should query all the missed events at once (or batch with pagination) and process them instead of querying for each missed event.

  • Version: 01bedb8
  • Platform: x86_64 GNU/Linux
  • Subsystem: server
@amoore877
Copy link
Member

As a further enhancement I'd say that the batch query for missed events should also return unique foreign keys, so that the next set of follow-on queries do not make duplicate requests in the event that multiple missed events correlate to the same entity.

@stevend-uber
Copy link
Contributor

Spoke to @edwbuck and @sorindumitru over slack. Edwin is planning on tackling this to optimize for the multiple transaction loops that the query currently does by implementing:

SELECT * WHERE EVENT_ID IN (id1, id2, ..., idn)

Where we should also choose to support @sorindumitru 's suggestion:

You could select event ids from a list and go through all of them. Probably with a limit to how many events you want to pull in one go.

To ensure that the list doesn't get too long and create a long running transaction.

@MarcosDY MarcosDY added the triage/in-progress Issue triage is in progress label Aug 1, 2024
@edwbuck
Copy link
Contributor

edwbuck commented Aug 2, 2024

@amoore877 Please disregard the prior deleted post, I misunderstood which items you were referring to.

We are processing events, and it is quite possible that two events validly update one record. For example, I might alter an entry's SPIFFE ID, and then decide to also alter the entry's selectors, after verifying that the SPIFFE ID was updates. In such cases, we expect two DB events, both referring to one SPIRE Entry.

Currently, we did not think to collapse the two events into one update, because the logic to do so would add complexity, and we believe that the scenarios where it occurs seem rare. With that in mind, I'll create a new Issue to cover the scenario, as "multi-issues" create difficulty in getting them scheduled, worked, and completed (the last concern holds up the completed concerns).

For those who might be skimming, this embedded request does not affect open transaction / skipped db event id logic, thus this request is not about optimizing any polling, but is about optimizing the single batch of fetches after an event is detected, fetching the affected entry once when it has seen two or more events in one cache refresh cycle.

@edwbuck
Copy link
Contributor

edwbuck commented Aug 2, 2024

@amoore877 #5349 should cover the optimization you requested, please track it there, and let's keep this open for the issue of replaying the event id polling when a transaction is cancelled.

@edwbuck
Copy link
Contributor

edwbuck commented Sep 10, 2024

@amartinezfayo Please assign this issue to me.

@rturner3 rturner3 modified the milestones: 1.11.0, 1.11.1 Oct 3, 2024
@amartinezfayo amartinezfayo removed this from the 1.11.1 milestone Nov 5, 2024
@amartinezfayo
Copy link
Member

We will be looking at the implementation of a different algorithm for the events-based cache event tracking in #5624.
The new algorithm should solve existing issues related with the events-based cache. Once the algorithm described in #5624 is implemented, we can go back to this issue and figure out if any additional work is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Issue is approved and in the backlog
Projects
None yet
Development

No branches or pull requests

8 participants