documentation example : headers #149

dharmatech · 2024-11-11T13:03:41Z

Example from documentation

This documentation page:

https://py-xbrl.readthedocs.io/en/latest/usage.html

has the following example:

import logging
from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser, XbrlInstance
# just to see which files are downloaded
logging.basicConfig(level=logging.INFO)

cache: HttpCache = HttpCache('./cache')
cache.set_headers({'From': '[email protected]', 'User-Agent': 'py-xbrl/2.1.0'})
parser = XbrlParser(cache)

schema_url = "https://www.sec.gov/Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm"
inst: XbrlInstance = parser.parse_instance(schema_url)

Note that it sets the headers as follows:

cache.set_headers({'From': '[email protected]', 'User-Agent': 'py-xbrl/2.1.0'})

Issue

When I used the headers in that format (with my own email address), this was the result:

>>> inst: XbrlInstance = parser.parse_instance(schema_url)
urllib3.exceptions.ResponseError: too many 403 error responses

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 948, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\connectionpool.py", line 938, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.sec.gov', port=443): Max retries exceeded with url: /Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm (Caused by ResponseError('too many 403 error responses'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\instance.py", line 740, in parse_instance
    return parse_ixbrl_url(uri, self.cache) if is_url(uri) else parse_ixbrl(uri, self.cache, instance_url, encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\instance.py", line 425, in parse_ixbrl_url
    instance_path: str = cache.cache_file(instance_url)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\cache.py", line 81, in cache_file
    query_response = self.connection_manager.download(file_url, headers=self.headers)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\xbrl\helper\connection_manager.py", line 51, in download
    response = self._session.get(url, headers=headers, allow_redirects=True, verify=self.verify_https)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dharm\python-environments\env-3.12-pandas\Lib\site-packages\requests\adapters.py", line 510, in send
    raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPSConnectionPool(host='www.sec.gov', port=443): Max retries exceeded with url: /Archives/edgar/data/0000320193/000032019321000105/aapl-20210925.htm (Caused by ResponseError('too many 403 error responses'))

Solution

When I set the headers as follows:

headers = {
    'User-Agent': 'COMPANY [email protected]'
}

cache.set_headers(headers)

I no longer ran into the issue cited above.
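For anyone who wants to avoid typos in that header, here is a minimal sketch of a helper that builds a User-Agent in the same "name plus email" shape as the headers that worked above. The function name `sec_headers` and the email check are my own assumptions for illustration; they are not part of py-xbrl, and the check is only a rough sanity test, not the SEC's actual validation.

```python
import re

# Hypothetical helper (not part of py-xbrl). The regex is a loose sanity
# check for "something@something.tld", not a full email validator.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def sec_headers(company: str, email: str) -> dict[str, str]:
    """Return a headers dict with a '<company> <email>' User-Agent."""
    company = company.strip()
    email = email.strip()
    if not company:
        raise ValueError("company must be a non-empty string")
    if not _EMAIL_RE.match(email):
        raise ValueError(f"not a plausible email address: {email!r}")
    return {"User-Agent": f"{company} {email}"}


if __name__ == "__main__":
    # Placeholder values; substitute your own name and contact address.
    headers = sec_headers("COMPANY", "admin@example.com")
    print(headers)
    # With py-xbrl, the dict would then be passed as in the snippet above:
    # cache.set_headers(headers)
```

The helper only builds the dictionary; it makes no network request, so whether the SEC accepts a given value still has to be confirmed by running the documentation example.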


manusimidt · 2024-11-13T19:41:28Z

True, it seems the SEC has updated its requirements for the User-Agent header.
Thanks @dharmatech!

https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data

manusimidt self-assigned this Nov 13, 2024

manusimidt added the documentation label Nov 13, 2024

manusimidt added a commit that referenced this issue Nov 13, 2024

Implemented #149

658f225

manusimidt closed this as completed Nov 13, 2024
