import logging
from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser, XbrlInstance

# just to see which files are downloaded
logging.basicConfig(level=logging.INFO)

cache: HttpCache = HttpCache('./cache')
cache.set_headers({'From': '[email protected]', 'User-Agent': 'py-xbrl/2.1.0'})
parser = XbrlParser(cache)

schema_url = "https://www.sec.gov/Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm"
inst: XbrlInstance = parser.parse_instance(schema_url)
When I used the headers in that format (using my own email) this was the result:
>>> inst: XbrlInstance = parser.parse_instance(schema_url)
urllib3.exceptions.ResponseError: too many 403 error responses
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
return self.urlopen(
^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
return self.urlopen(
^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
return self.urlopen(
^^^^^^^^^^^^^
[Previous line repeated 2 more times]
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 938, in urlopen
retries = retries.increment(method, url, response=response, _pool=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.sec.gov', port=443): Max retries exceeded with url: /Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm (Caused by ResponseError('too many 403 error responses'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\instance.py", line 740, in parse_instance
return parse_ixbrl_url(uri, self.cache) if is_url(uri) else parse_ixbrl(uri, self.cache, instance_url, encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\instance.py", line 425, in parse_ixbrl_url
instance_path: str = cache.cache_file(instance_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\cache.py", line 81, in cache_file
query_response = self.connection_manager.download(file_url, headers=self.headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\helper\connection_manager.py", line 51, in download
response = self._session.get(url, headers=headers, allow_redirects=True, verify=self.verify_https)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\adapters.py", line 510, in send
raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPSConnectionPool(host='www.sec.gov', port=443): Max retries exceeded with url: /Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm (Caused by ResponseError('too many 403 error responses'))
Example from documentation
This documentation page:
https://py-xbrl.readthedocs.io/en/latest/usage.html
has the following example:
Note that it sets the headers as follows:
Issue
When I used the headers in that format (using my own email) this was the result:
Solution
When I set the headers as follows:
I no longer ran into the issue described above.
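The header values that resolved the problem did not survive in this copy of the issue, so what follows is only a sketch of one plausible fix, not necessarily the exact change reported here. It assumes the 403 responses come from SEC EDGAR rejecting requests whose User-Agent does not identify the requester; SEC's access guidance asks for a User-Agent of the form `Sample Company Name AdminContact@<sample company domain>.com`. `build_sec_headers` and `make_parser` are hypothetical helpers introduced for illustration, and the name and address are placeholders to replace with your own.

```python
def build_sec_headers(contact_name: str, contact_email: str) -> dict:
    """Build headers with a descriptive User-Agent (hypothetical helper).

    Assumption: the server wants "<name> <email>" rather than a
    library-style string such as 'py-xbrl/2.1.0'.
    """
    if "@" not in contact_email:
        raise ValueError("contact_email must look like an email address")
    return {
        "From": contact_email,
        "User-Agent": f"{contact_name} {contact_email}",
    }


def make_parser(cache_dir: str, headers: dict):
    """Create an XbrlParser whose HttpCache sends the given headers."""
    # Imported lazily so build_sec_headers works without py-xbrl installed.
    from xbrl.cache import HttpCache
    from xbrl.instance import XbrlParser

    cache = HttpCache(cache_dir)
    cache.set_headers(headers)
    return XbrlParser(cache)


# Placeholder identity; substitute your own name and a reachable address.
headers = build_sec_headers("Sample Company Name", "admin@example.com")
```

With py-xbrl installed, `make_parser('./cache', headers).parse_instance(schema_url)` would then issue the same request as the documentation example, only with the descriptive User-Agent. If the 403 persists, the cause is likely something other than the header format, for example rate limiting.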