ModelXbrl.load doesn't handle concurrency well #1128
Comments
Hi @jdangerx, thanks for reporting this, and glad to hear you're getting good use out of Arelle! Regarding concurrency and filesystem operations: we'll work on a ticket to avoid the race condition you mentioned, but there will still be other scenarios where multiple Arelle processes can run into race conditions writing to the web cache. There are two ways to avoid these issues:
Of these two, taxonomy packages are typically preferred because most major taxonomies provide a package that can be loaded by XBRL processors. For both security and performance reasons, most production users of Arelle operate in offline mode, and download filings to the local system before processing them with Arelle using one of the two methods mentioned above.
Ticket requirements: The Arelle web cache doesn't handle concurrency well. Multiple Arelle processes can run into race conditions downloading and populating the web cache with the same files. To avoid these issues, we should:
I suspect drives/mounts will cause complications (e.g., Windows temp directory on C: but cache on D:, Linux tmpfs vs cache directory, etc.), so you'll probably want to use
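
To illustrate the drive/mount concern, here is a minimal sketch of one common pattern for populating a shared cache safely: write the download to a temporary file in the same directory as the final cache path, then rename it into place. This is not Arelle's code and not necessarily what the ticket will implement; `atomic_write` and its parameters are hypothetical names. Because the temporary file lives next to its destination, the rename never crosses a filesystem boundary, which is where a temp directory on another drive or on tmpfs would cause trouble.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(dest: Path, data: bytes) -> None:
    """Write data to dest so readers never observe a partial file.

    Hypothetical helper for illustration only; not part of Arelle.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file beside the destination so both paths are on
    # the same filesystem; a rename across filesystems is not atomic
    # and can fail outright.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites an existing destination, so when two
        # processes download the same file the last rename simply wins.
        os.replace(tmp_name, dest)
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, concurrent processes may still download the same file more than once, but each one publishes a complete copy, so no reader ends up with a half-written cache entry.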
What happened?
OS: macOS 14 Sonoma (which is missing from your issue template, FYI), Ubuntu 22.04 (which is what I think the GHA ubuntu-latest images correspond to)
Hello! Thanks for all the work you've done to make XBRL parsing straightforward; it's been a great help to us at Catalyst!
Unfortunately, when running some automated XBRL ingestion processes we found that the taxonomy loading seems to run into a race condition when run with concurrency.
Specifically, the error occurs in these lines of WebCache.py:
not filepathExists, so they initialize the file download
I think we can work around it in our use cases (catalyst-cooperative/pudl#3449 and catalyst-cooperative/pudl-archiver#285) but I wonder if this would be a good thing to fix upstream!
Traceback inside ✨
Here's a snippet that I've been using to reproduce it fairly reliably:
Documents
No response
If running from the command line, what command did you run?
No response
Interface
Python library (pip install)
Version
2.23.13
Download
pip install
Operating System
Other (please specify)