This repository has been archived by the owner on Jul 19, 2018. It is now read-only.

added: Option for number of retries for HcfMiddleware #61

Open: wants to merge 1 commit into base: master


starrify
Member

Requests made by the upstream python-hubstorage package sometimes fail, e.g.:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1169, in run
        self.mainLoop()
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib/pymodules/python2.7/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
    --- <exception caught here> ---
      File "/usr/lib/pymodules/python2.7/scrapy/core/engine.py", line 106, in _next_request
        request = next(slot.start_requests)
      File "/tmp/eggs-3lbV9T/__main__.egg/radius/middlewares/hcf.py", line 147, in process_start_requests

      File "/tmp/eggs-3lbV9T/__main__.egg/radius/middlewares/hcf.py", line 220, in _get_new_requests

      File "/usr/lib/python2.7/dist-packages/hubstorage/frontier.py", line 60, in read
        return self.apiget((frontier, 's', slot, 'q'), params=params)
      File "/usr/lib/python2.7/dist-packages/hubstorage/resourcetype.py", line 40, in apiget
        return self.apirequest(_path, method='GET', **kwargs)
      File "/usr/lib/python2.7/dist-packages/hubstorage/resourcetype.py", line 34, in apirequest
        return jldecode(self._iter_lines(_path, **kwargs))
      File "/usr/lib/python2.7/dist-packages/hubstorage/resourcetype.py", line 27, in _iter_lines
        r = self.client.session.request(**kwargs)
      File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 461, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 610, in send
        r.content
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 730, in content
        self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 662, in generate
        raise ConnectionError(e)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='storage.scrapinghub.com', port=80): Read timed out.

Simply retrying the job is often enough to get past the error.

The HubstorageClient class accepts an optional max_retries argument, so I think it would be a good idea to add a new option, HS_MAX_RETRIES, that HcfMiddleware passes through to the client.
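
A rough sketch of how such an option might be wired through, not the actual patch. HS_MAX_RETRIES is the setting name proposed here and max_retries is the client argument mentioned above. The helper name `hubstorage_client_kwargs` and the HS_AUTH / HS_ENDPOINT settings it reads are assumptions made for illustration.

```python
# Sketch only: the helper and the HS_AUTH / HS_ENDPOINT names are
# assumptions; HS_MAX_RETRIES is the option proposed in this PR.

def hubstorage_client_kwargs(settings):
    """Build keyword arguments for HubstorageClient from a settings mapping.

    `settings` only needs a dict-like .get(), so a plain dict works as
    well as a Scrapy settings object.
    """
    kwargs = {
        "auth": settings.get("HS_AUTH"),
        "endpoint": settings.get("HS_ENDPOINT"),
    }
    max_retries = settings.get("HS_MAX_RETRIES")
    # Only pass max_retries when the option is set, so the client's own
    # default applies otherwise.
    if max_retries is not None:
        kwargs["max_retries"] = int(max_retries)
    return kwargs


# Intended use inside the middleware (not executed here):
#   from hubstorage import HubstorageClient
#   client = HubstorageClient(**hubstorage_client_kwargs(crawler.settings))
```

Leaving max_retries out when the option is unset keeps the current behaviour for projects that do not configure it.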

@redapple
Contributor

redapple commented Nov 7, 2016

@starrify, could you re-open this PR against https://github.com/scrapy-plugins/scrapy-hcf ?
Thanks.
