delay s3 request until read #379

Closed
wants to merge 0 commits into from

Conversation

@Gapex (Contributor) commented Nov 7, 2019

@vuryleo commented Nov 8, 2019

I think adding some tests would be better.

@mpenkov (Collaborator) left a comment

Looks like a very solid start.

I left you some comments; please take a look.

smart_open/s3.py Outdated
else:
binary = self._body.read(size)
except IncompleteReadError:
# the underlying connection of streaming body was closed by peer
Collaborator:

Minor English fixup.

Suggested change
# the underlying connection of streaming body was closed by peer
# The underlying connection of the streaming body was closed by the remote peer.

smart_open/s3.py Outdated
binary = self._body.read(size)
except IncompleteReadError:
# the underlying connection of streaming body was closed by peer
# close the current one explicitly, prepare to open a new and try to read again
Collaborator:

Suggested change
# close the current one explicitly, prepare to open a new and try to read again
# Close the current connection explicitly, prepare to open a new one and try to read again.

smart_open/s3.py Outdated
# close the current one explicitly, prepare to open a new and try to read again
self._body.close()
self._body_reload = True
return self.read(size)
Collaborator:

You're setting up an infinite recursion here, in theory.

Can you think of a way to break the cycle? We could count the number of consecutive IncompleteReadErrors, and then bail out if we see more than e.g. 10.

Contributor Author:

What about adding a member field to count the number of retries? While the counter is less than the max limit, retry; otherwise, terminate the action by raising an error.

Collaborator:

That's OK, but you need to make sure you're counting consecutive failures, as opposed to total failures. So, you reset the counter to zero on each successful attempt.
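The consecutive-failure counter described above could look like the sketch below. This is a hypothetical illustration, not the PR's actual code: the class name, field names, and limit are made up, and IncompleteReadError is stubbed locally so the snippet runs without botocore.

```python
class IncompleteReadError(Exception):
    """Local stand-in for botocore.exceptions.IncompleteReadError."""


MAX_CONSECUTIVE_FAILURES = 10  # bail-out limit, as suggested above


class RetryingReader:
    """Sketch: reopen the streaming body on IncompleteReadError,
    counting consecutive (not total) failures."""

    def __init__(self, open_body):
        self._open_body = open_body  # callable returning a fresh streaming body
        self._body = None
        self._failures = 0

    def read(self, size=-1):
        while True:
            if self._body is None:
                self._body = self._open_body()
            try:
                binary = self._body.read(size)
            except IncompleteReadError:
                # The peer closed the connection: drop this body and retry,
                # unless we have failed too many times in a row.
                self._body.close()
                self._body = None
                self._failures += 1
                if self._failures >= MAX_CONSECUTIVE_FAILURES:
                    raise
                continue
            self._failures = 0  # reset on every successful read
            return binary
```

A loop with an explicit counter also avoids the unbounded recursion of calling self.read(size) from inside the exception handler.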

@mpenkov (Collaborator) commented Nov 11, 2019

Looks like your tests are failing because you're dereferencing None, please have a look at the buildbot output (https://travis-ci.org/RaRe-Technologies/smart_open/jobs/610158039?utm_medium=notification&utm_source=github_status).

smart_open/s3.py Outdated
return b''

if self._body is None:
# After seek, or the first read, the self._body is None.

how about closed?

Contributor Author:

After __init__, self._body is None; the connection is not established until the first read.

@vuryleo (Nov 11, 2019):

So you can still read from a closed reading stream, interesting. Does this work as designed?

Contributor Author:

Yes, we can read from a closed stream, even after the close() operation, but no such API is offered to users.
In addition, according to the botocore reference, StreamingBody doesn't have a closed attribute.
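The lazy initialization under discussion (self._body stays None after __init__ and after a seek, and the S3 request is deferred until the first read) could be sketched roughly as below. The class name and structure are assumptions for illustration, not the PR's exact code; only the boto3 get_object call with a Range header reflects the real API.

```python
class LazySeekableReader:
    """Sketch: defer the S3 GET until data is actually read."""

    def __init__(self, client, bucket, key):
        self._client = client
        self._bucket = bucket
        self._key = key
        self._position = 0
        self._body = None  # no request has been made yet

    def seek(self, offset):
        # Seeking only records the new position and drops any open body;
        # no S3 request is issued here.
        if self._body is not None:
            self._body.close()
            self._body = None
        self._position = offset
        return self._position

    def read(self, size=-1):
        if self._body is None:
            # First read after __init__ or seek: open the stream now,
            # starting at the current position.
            response = self._client.get_object(
                Bucket=self._bucket,
                Key=self._key,
                Range='bytes=%d-' % self._position,
            )
            self._body = response['Body']
        binary = self._body.read(size if size >= 0 else None)
        self._position += len(binary)
        return binary
```

This is what gives the PR its name: a seek followed by another seek never touches the network, and each read opens at most one connection.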

smart_open/s3.py Outdated
binary = self._read_from_body(size)
except IncompleteReadError:
# The underlying connection of the streaming body was closed by the remote peer.
self._body.close()

do we need this?

Contributor Author:

I will delete the line.

@Gapex (Contributor Author) commented Nov 12, 2019

@mpenkov, when executing python setup.py test, reading from a closed streaming body always returns 0 bytes instead of raising an IncompleteReadError, so the reopening of a new streaming body is never triggered. As a result, SeekableBufferedInputBase thinks the raw reader has reached EOF and just returns the remaining data in the buffer.

This makes the unit test fail.
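One way around this (and a way to act on the earlier suggestion to add tests) would be a mock-based unit test that forces the IncompleteReadError path, since a real closed body won't raise it. The helper name and test below are a hypothetical sketch, not the PR's code.

```python
from unittest import mock


class IncompleteReadError(Exception):
    """Local stand-in for botocore.exceptions.IncompleteReadError."""


def read_with_retry(body_factory, size, max_attempts=10):
    """Read from a streaming body, reopening it on IncompleteReadError."""
    for _ in range(max_attempts):
        body = body_factory()
        try:
            return body.read(size)
        except IncompleteReadError:
            body.close()
    raise IncompleteReadError()


# The first body fails as if the remote peer closed the connection;
# the retry opens a second body, which succeeds.
failing = mock.Mock()
failing.read.side_effect = IncompleteReadError()
working = mock.Mock()
working.read.return_value = b'hello'
factory = mock.Mock(side_effect=[failing, working])

assert read_with_retry(factory, 5) == b'hello'
assert factory.call_count == 2
failing.close.assert_called_once()
```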

@mpenkov (Collaborator) commented Nov 13, 2019

OK, some questions:

  • Who returns 0 bytes when reading from a closed stream?
  • What depends on the existing behavior (return 0 bytes)?
  • The correct behavior is to raise an exception, right?

We probably want to fix this, as it deviates from the behavior of file streams in the stdlib.
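For comparison, stdlib streams refuse reads after close() rather than silently returning 0 bytes:

```python
import io

f = io.BytesIO(b'hello')
f.close()
try:
    f.read()
except ValueError:
    # Stdlib streams raise ValueError('I/O operation on closed file')
    # instead of returning b'' like a closed StreamingBody does.
    print('read after close raises ValueError')
```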

@Gapex Gapex closed this Nov 13, 2019
@Gapex Gapex deleted the delay-_get_-2-read branch November 13, 2019 04:35
@mpenkov (Collaborator) commented Nov 13, 2019

@Gapex What happened here? Why did you close the PR?

@Gapex (Contributor Author) commented Nov 13, 2019

Yes, I will open a new pull request.

Successfully merging this pull request may close these issues.

s3 open seek operation tries to read the rest of the file into the buffer, which gives subsequent reads a timeout risk
3 participants