Updates readline logic for azure to match s3 #826
Loosely copies the readline buffer management from s3 to azure, improving performance.
LGTM
@mpenkov can you review and merge?
@quantumfusion can you add a smart_open/smart_open/tests/test_azure.py Line 364 in a55794d
Did not need to add test, already exists further down in the file. This reverts commit 35a8c5e.
@ddelange I added it briefly, but missed doing the linting which caught that
ah my bad! @mpenkov can you run CI?
@mpenkov can you include this one in the next release?
@mpenkov kind reminder :)
Thank you @quantumfusion and @ddelange. Apologies for the delay, been a bit busy with real life recently :)
Title
Updates readline buffer management for azure, improving performance.
Motivation
Observed slow read times when iterating with readlines() over a 100MB CSV file in Azure Blob Storage. Simple copies and reads were fine; the slowness only appeared in code that explicitly called readlines().
This change borrows from the logic in the S3 implementation, which was updated to be more efficient and performs fewer scans over the local chunk of the remote file.
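To illustrate the kind of buffer management described above, here is a minimal sketch. It is not the code from smart_open's S3 or Azure modules: the class name ChunkedLineBuffer, its methods, and the tiny chunk size are all hypothetical, and io.BytesIO stands in for the remote blob. The point it tries to show is that after fetching a new chunk, the search for the line terminator resumes where the previous scan stopped instead of rescanning the whole local buffer.

```python
import io


class ChunkedLineBuffer:
    """Hypothetical helper: buffers chunks from a reader and hands out lines."""

    def __init__(self, raw, chunk_size=4):
        self._raw = raw                # any object with read(n) -> bytes
        self._chunk_size = chunk_size  # deliberately tiny for the demo
        self._buf = b""                # bytes fetched from the source so far
        self._pos = 0                  # index of the first unread byte in _buf

    def _fill(self):
        """Fetch one more chunk; return False once the source is exhausted."""
        chunk = self._raw.read(self._chunk_size)
        if not chunk:
            return False
        # Drop bytes that were already handed out, then append the new chunk.
        self._buf = self._buf[self._pos:] + chunk
        self._pos = 0
        return True

    def readline(self, terminator=b"\n"):
        """Return the next line including its terminator, or b"" at EOF."""
        scan_from = self._pos
        while True:
            idx = self._buf.find(terminator, scan_from)
            if idx >= 0:
                end = idx + len(terminator)
                line = self._buf[self._pos:end]
                self._pos = end
                return line
            unread = len(self._buf) - self._pos
            if not self._fill():
                # Source exhausted: return whatever is left (possibly b"").
                line = self._buf[self._pos:]
                self._pos = len(self._buf)
                return line
            # Resume just before the new chunk, backing up only enough to
            # catch a terminator that straddles the chunk boundary.
            scan_from = max(0, unread - len(terminator) + 1)


def read_all_lines(raw, chunk_size=4):
    """Collect every line from raw using the sketch above."""
    buf = ChunkedLineBuffer(raw, chunk_size=chunk_size)
    lines = []
    while True:
        line = buf.readline()
        if not line:
            return lines
        lines.append(line)
```

For example, read_all_lines(io.BytesIO(b"alpha\nbeta\n\ngamma")) splits the data into four lines even though the chunks are only four bytes long. A real reader would use a much larger chunk size and fetch chunks with ranged requests rather than from an in-memory stream.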
Tests
Confirmed a clean run with:
pytest -k test_azure
Workflow
Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult for the reviewer, because they are no longer able to distinguish between code that has already been reviewed and unreviewed code.