Prevent cache backend reuse #11022
Closed
Conversation
espadolini force-pushed the espadolini/sqlite-corruption branch from 65c2539 to 4fc8b86 on March 10, 2022 11:06
espadolini force-pushed the espadolini/ephemeral-cache branch from 7f934ad to 3d6abeb on March 10, 2022 11:06
espadolini force-pushed the espadolini/ephemeral-cache branch from 3d6abeb to 982e4ff on March 10, 2022 11:11
rosstimothy reviewed on Mar 15, 2022
espadolini force-pushed the espadolini/ephemeral-cache branch 2 times, most recently from d30ba60 to 65b01f4 on March 15, 2022 17:31
espadolini force-pushed the espadolini/ephemeral-cache branch 2 times, most recently from 38f26ba to 5b5c856 on March 17, 2022 16:35
As the cache system cannot support multiple caches running over the same backend storage, we create a temporary cacheDir for each TeleportProcess instance, and we use it to store all the sqlite backends for the caches. With on-disk caches being single-use, we can run the underlying sqlite backend with the MEMORY rollback journal to improve performance and reduce filesystem activity.
As we're never reopening the same cache backend twice, this is no longer useful.
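The commit message above mentions running each single-use sqlite backend with the MEMORY rollback journal. A minimal sketch of what such a connection string could look like, assuming the mattn/go-sqlite3 driver's `_journal` DSN parameter (the helper name and directory layout are illustrative, not Teleport's actual code):

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// ephemeralDSN builds a sqlite connection string for a single-use,
// on-disk cache backend. Because the database is never reopened after
// the process exits, the rollback journal can live in memory
// (journal_mode=MEMORY), avoiding journal files on disk and reducing
// filesystem activity. "_journal" is the mattn/go-sqlite3 spelling of
// "PRAGMA journal_mode"; other drivers would issue the PRAGMA directly.
func ephemeralDSN(cacheDir, component string) string {
	path := filepath.Join(cacheDir, component+".db")
	q := url.Values{}
	q.Set("_journal", "MEMORY")
	return fmt.Sprintf("file:%s?%s", path, q.Encode())
}

func main() {
	fmt.Println(ephemeralDSN("/tmp/teleport-cache-123", "node"))
	// → file:/tmp/teleport-cache-123/node.db?_journal=MEMORY
}
```

The trade-off is that a crash mid-transaction can leave the database file corrupt, which is acceptable here precisely because each backend is discarded after a single use.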
espadolini force-pushed the espadolini/ephemeral-cache branch from 5b5c856 to 1a0333d on March 18, 2022 08:39
This was referenced Mar 21, 2022
We currently rely on opening the same sqlite databases as the backend for our caches (in `data_dir/cache/componentname`), because we also use the cache as a kind of persistent resource storage in case auth is not available at start time. However, every non-auth service except SSH simply blocks at init time waiting for a connection to auth, and the SSH service gets halfway through init and then blocks while determining whether the connection to auth is direct (meaning the service should open a listening port) or goes through the proxy (meaning the service should only serve connections through the reverse tunnel). As it stands, cold starts and restarts while auth is unavailable effectively work only for SSH when the proxy is available and the SSH service runs in reverse tunnel mode (and even then, I'm not sure that anything would be usable while the proxy itself has no connectivity to auth).

In addition, during restarts (both in-process restarts caused primarily by CA rotations and graceful process restarts caused by HUP/USR2) we end up opening multiple instances of `Cache` that use the same underlying storage (unless we're using in-memory caches). There isn't (and probably can't be) a mechanism that lets two caches gracefully share the same backend storage, especially not across the process boundary (the fanout system is entirely in memory); as it stands, the caches fight each other and leave the underlying storage (and thus the actual view that the cache has over the upstream) in a potentially incoherent state.

Given that the persistent cache storage is not actually used in practice (as soon as a connection to auth is established, each cache wipes itself and refills with fresh data), and that the particular way it's supposed to be used is broken in every case except a cold start, this PR removes the concept of cache backend reuse: it changes the default cache type to `in-memory`, sets up a temporary directory unique to each `TeleportProcess` instance to house the `sqlite` cache backends when that cache type is explicitly configured, and removes the (useless, at that point) cache tombstone mechanism. Support for on-disk caches is maintained for the sake of (partial) backwards compatibility, even though memory usage for an `in-memory` cache for the `Node` component wouldn't meaningfully increase over the memory already used by RBAC for one or two inbound connections.

Persistent resource storage for startup while auth is not available should be handled in some other, more explicit (and perhaps opt-in?) way, but that is out of scope for this PR, which just addresses the cache correctness problem we currently have during restarts with on-disk caches.
Product decision: the default cache type is now `in-memory`. Should we leave it as `sqlite` instead? Should we get rid of sqlite caches altogether?