-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggressive S3 metadata caching #556
Conversation
I will release API server based on this branch as I need good perf for tomorrow's demo. |
@@ -82,11 +82,25 @@ impl DatasetRepositoryLocalFs { | |||
)) | |||
} | |||
|
|||
fn build_dataset(layout: DatasetLayout, event_bus: Arc<EventBus>) -> Arc<dyn Dataset> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that I have severed ties between DatasetRepository
s and DatasetFactory
.
After making a bunch of changes to make DatasetFactory
super configurable, I realized that DatasetFactory
should be used only when talking to remote repositories, and DatasetRepository
implementations should have independent low-level control over how they create dataset instances. Attempting to combine the two just results in configuration and dependency nightmare.
@@ -415,3 +564,27 @@ impl DatasetRepository for DatasetRepositoryS3 { | |||
} | |||
|
|||
///////////////////////////////////////////////////////////////////////////////////////// | |||
|
|||
pub struct S3RegistryCache { | |||
state: Arc<Mutex<State>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is far from elegant but I think we will factor it out into a proper registry component in future.
I'd rather save this refactoring until #342.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note to #342
@@ -415,3 +564,27 @@ impl DatasetRepository for DatasetRepositoryS3 { | |||
} | |||
|
|||
///////////////////////////////////////////////////////////////////////////////////////// | |||
|
|||
pub struct S3RegistryCache { | |||
state: Arc<Mutex<State>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note to #342
Description
Related to: kamu-data/kamu-node#61
Adds two layers of caching:
ObjectRepositoryCachingLocalFS
- a read-through cache that stores objects in a local directory, which I use for metadata blocksS3RegistryCache
- a singleton in-memory cache of datasets contained in S3 bucketTogether they turned some of my sample queries from taking 4.5s to taking just 9ms.
Checklist before requesting a review