
libraries/doltcore/remotestorage: Improve connection reuse when fetching chunks from remote storage. #8522

Merged
merged 2 commits into main on Nov 2, 2024

Conversation

@reltuk (Contributor) commented Nov 2, 2024

Improves performance of fetches from DoltHub, sql-server, and doltlab.

Improves behavior in situations where routers and network access points respond poorly to clients that open a large number of connections to the same TCP endpoint, many of which are short-lived.

http.DefaultClient comes with a default MaxIdleConnsPerHost of 2. Fetching with a high number of concurrent downloads results in many connections not being reused efficiently: the idle pool is so small that most connections are closed instead of returning to it. Here we increase MaxIdleConnsPerHost to be large enough that all of our active connections during a fetch fit within it.

Note: S3 limits a single HTTP connection to 100 requests, after which it is closed server-side. So currently dolt can still require a large number of new connections over the course of a pull, and the rate at which they are opened will depend on a number of factors, including available throughput, round-trip time when resolving storage locations, etc. But this change will always be a big improvement over the old behavior.

@coffeegoddd (Contributor)

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
8d174fa   ok      5937457

version   total_tests
8d174fa   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor)

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
c61adf4   ok      5937457

version   total_tests
c61adf4   5937457

correctness_percentage: 100.0

@bheni (Contributor) left a comment


lgtm

@reltuk reltuk merged commit 92e628c into main Nov 2, 2024
21 checks passed
@tbantle22 tbantle22 deleted the aaron/grpc-remotestorage-MaxIdleConnsPerHost branch November 21, 2024 20:57