delay s3 request until read #379
I think adding some tests would be better.
Looks like a very solid start.
I left you some comments, please have a look.
smart_open/s3.py
Outdated
else:
binary = self._body.read(size)
except IncompleteReadError:
# the underlying connection of streaming body was closed by peer
Minor English fixup.
- # the underlying connection of streaming body was closed by peer
+ # The underlying connection of the streaming body was closed by the remote peer.
smart_open/s3.py
Outdated
binary = self._body.read(size)
except IncompleteReadError:
# the underlying connection of streaming body was closed by peer
# close the current one explicitly, prepare to open a new and try to read again
- # close the current one explicitly, prepare to open a new and try to read again
+ # Close the current connection explicitly, prepare to open a new one and try to read again.
smart_open/s3.py
Outdated
# close the current one explicitly, prepare to open a new and try to read again
self._body.close()
self._body_reload = True
return self.read(size)
You're setting up an infinite recursion here, in theory.
Can you think of a way to break the cycle? We could count the number of consecutive IncompleteReadErrors, and then bail out if we see more than e.g. 10.
What about adding a member field to count the number of retries? While the counter is below the maximum limit, retry; otherwise, terminate the read by raising an error.
That's OK, but you need to make sure you're counting consecutive failures, as opposed to total failures. So, you reset the counter to zero on each successful attempt.
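The consecutive-failure counter discussed above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `RetryingReader`, `open_body`, and `MAX_CONSECUTIVE_FAILURES` are hypothetical names, and the local `IncompleteReadError` class stands in for the botocore exception so the sketch runs on its own. It uses a loop rather than recursion, and resets the counter after every successful read.

```python
class IncompleteReadError(Exception):
    """Stand-in for the botocore exception of the same name (assumption)."""


class RetryingReader:
    # Hypothetical limit, following the "e.g. 10" suggested in the review.
    MAX_CONSECUTIVE_FAILURES = 10

    def __init__(self, open_body):
        # open_body is a hypothetical callable: byte offset -> file-like object.
        self._open_body = open_body
        self._body = None
        self._position = 0
        self._consecutive_failures = 0

    def read(self, size=-1):
        while True:
            if self._body is None:
                # (Re)open the stream at the current offset.
                self._body = self._open_body(self._position)
            try:
                binary = self._body.read(size)
            except IncompleteReadError:
                self._consecutive_failures += 1
                if self._consecutive_failures > self.MAX_CONSECUTIVE_FAILURES:
                    # Too many failures in a row: give up instead of looping forever.
                    raise
                # Drop the broken body so the next iteration opens a new one.
                self._body = None
                continue
            # Success: only consecutive failures count, so reset the counter.
            self._consecutive_failures = 0
            self._position += len(binary)
            return binary
```

Because the counter is reset on success, a long download that hits an occasional dropped connection keeps working, while a stream that fails on every attempt stops after the limit is exceeded.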
Looks like your tests are failing because you're dereferencing |
smart_open/s3.py
Outdated
return b''

if self._body is None:
# After seek, or the first read, the self._body is None.
How about closed?
After init, self._body is None; the connection is not established until the first read.
So you can still read from a closed reading stream, interesting. Does this work as designed?
Yes, we can read from a closed stream, even after the close() operation, but no such API is offered to users.
In addition, according to the botocore reference, the streaming body doesn't have a closed attribute.
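For comparison, reading from a closed stdlib stream such as `io.BytesIO` raises `ValueError`. A reader that opens its body lazily could mirror that by tracking its own closed flag, since (per the comment above) it cannot rely on the streaming body for this. The sketch below is only an illustration under those assumptions: `LazyReader` and `open_body` are hypothetical names, not smart_open's actual implementation.

```python
import io


class LazyReader:
    def __init__(self, open_body):
        # open_body is a hypothetical zero-argument callable returning a
        # file-like object; it is not called until the first read.
        self._open_body = open_body
        self._body = None
        self._closed = False

    @property
    def closed(self):
        return self._closed

    def close(self):
        if self._body is not None:
            self._body.close()
            self._body = None
        self._closed = True

    def read(self, size=-1):
        if self._closed:
            # Match the stdlib: reading from a closed stream is an error.
            raise ValueError("I/O operation on closed file")
        if self._body is None:
            # First read: this is where the request would actually be made.
            self._body = self._open_body()
        return self._body.read(size)


reader = LazyReader(lambda: io.BytesIO(b"example"))
print(reader.read(2))  # the body is opened here, not in __init__
reader.close()
```

With this shape, `self._body is None` only means "not opened yet", and the closed state is checked separately before any attempt to open a connection.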
smart_open/s3.py
Outdated
binary = self._read_from_body(size)
except IncompleteReadError:
# The underlying connection of the streaming body was closed by the remote peer.
self._body.close()
Do we need this?
I will delete the line.
@mpenkov, When executing This makes the unit test fail.
OK, some questions:
We probably want to fix this, as it deviates from the behavior of file streams in the stdlib.
@Gapex What happened here? Why did you close the PR?
Yes, I will open a new pull request.
delay request seek until read
Motivation