
memoize s3 client #5377

Merged
merged 5 commits into main on Sep 4, 2024
Conversation

trinity-1686a
Contributor

Description

Memoize S3 clients, keeping them until they seem unused. I don't think we ever use more than a single S3StorageConfig, so we could store a single value, but a map we garbage-collect seems less confusing if we start having multiple configs at once (a rough sketch of the idea follows the issue references below).
fix #4933
fix #5236
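A minimal sketch of the memoization described above, with hypothetical names (S3Config, S3Client, ClientCache) standing in for the real S3StorageConfig and SDK client; the actual PR code differs. The map holds weak handles, so a client is dropped once nothing references it anymore, and dead entries are swept on access:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

// Hypothetical stand-ins for S3StorageConfig and the SDK client.
#[derive(Clone, PartialEq, Eq, Hash)]
struct S3Config {
    region: Option<String>,
    endpoint: Option<String>,
}

struct S3Client {
    config: S3Config,
}

// Clients keyed by config. Entries are weak, so a client is freed once the
// last strong handle is dropped; dead entries are garbage-collected on access.
struct ClientCache {
    clients: Mutex<HashMap<S3Config, Weak<S3Client>>>,
}

impl ClientCache {
    fn get_or_create(&self, config: &S3Config) -> Arc<S3Client> {
        let mut clients = self.clients.lock().unwrap();
        // "gc" step: drop entries whose client is no longer referenced.
        clients.retain(|_, weak| weak.strong_count() > 0);
        if let Some(client) = clients.get(config).and_then(Weak::upgrade) {
            return client;
        }
        let client = Arc::new(S3Client {
            config: config.clone(),
        });
        clients.insert(config.clone(), Arc::downgrade(&client));
        client
    }
}
```

With a single S3StorageConfig the map only ever holds one entry, so it behaves like the single cached value mentioned above while staying correct if several configs show up at once.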

How was this PR tested?

Ran a cluster and looked at the searcher logs. Didn't look at the janitor logs, though there is no reason for them to behave differently.


github-actions bot commented Sep 3, 2024

On SSD:

Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3235, ref commit: aac8b49

On GCS:

Average search latency is 0.983x that of the reference (lower is better).
Ref run id: 3236, ref commit: aac8b49

@trinity-1686a force-pushed the trinity/memoize-s3client branch from b85cfa3 to 68207c4 on September 3, 2024 14:46
use crate::{
DebouncedStorage, S3CompatibleObjectStorage, Storage, StorageFactory, StorageResolverError,
};

/// S3 compatible object storage resolver.
pub struct S3CompatibleObjectStorageFactory {
storage_config: S3StorageConfig,
// we cache the S3Client so we don't rebuild one every time we need to connect to S3.
Contributor

Even without this caching we didn't create a new client for every connection as long as the Storage object was reused.

Contributor Author

it wasn't reused across requests

Contributor

This makes me wonder if the best approach wouldn't be to cache the resolved storages in the storage resolver. That would solve the problem once and for all, for every storage type. But I think this solution is already good, as it solves the linked issues.

Contributor Author

the storages are tied to a Uri (the S3Client isn't), which changes for each index. It would be interesting to share the debouncer though, which isn't done here; that could make concurrent requests slightly more efficient.
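For comparison, a minimal sketch of the resolver-level cache suggested above, again with hypothetical names (CachingStorageResolver, and the Storage trait reduced to a marker); it keys on the URI rather than on the config:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Marker trait standing in for the real Storage trait.
trait Storage: Send + Sync {}

// Caches the resolved Storage per URI, so every backend benefits, not just S3.
struct CachingStorageResolver {
    cache: Mutex<HashMap<String, Arc<dyn Storage>>>,
}

impl CachingStorageResolver {
    fn resolve(
        &self,
        uri: &str,
        build: impl FnOnce() -> Arc<dyn Storage>,
    ) -> Arc<dyn Storage> {
        let mut cache = self.cache.lock().unwrap();
        // Reuse the cached Storage for this URI, or build and cache a new one.
        cache.entry(uri.to_string()).or_insert_with(build).clone()
    }
}
```

As the comment above points out, such a cache is keyed per index URI and, unless entries are evicted, grows with the number of indexes, whereas the per-config client map stays small.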

@trinity-1686a enabled auto-merge (squash) September 4, 2024 13:04
@trinity-1686a merged commit 0820c90 into main on Sep 4, 2024
5 checks passed
@trinity-1686a deleted the trinity/memoize-s3client branch September 4, 2024 13:18
@PSeitz mentioned this pull request Sep 10, 2024
Successfully merging this pull request may close these issues.

Janitor logs a lot. Reduce the amount of logs using S3 region defined in storage config region