Nightly Build Failure 2024-03-07 #3449
Comments
@jdangerx Should we say that this seems to have fixed itself for the moment?
I've been able to reproduce this fairly consistently with the following snippet:

    from arelle import Cntlr, ModelManager, ModelXbrl, WebCache
    from concurrent.futures import ProcessPoolExecutor
    import multiprocessing as mp


    def load_tax(_i):
        # Every call builds its own controller and loads the same taxonomy URL.
        cntlr = Cntlr.Cntlr()
        model_manager = ModelManager.initialize(cntlr)
        taxonomy_url = "https://eCollection.ferc.gov/taxonomy/form60/2022-01-01/form/form60/form-60_2022-01-01.xsd"
        taxonomy = ModelXbrl.load(model_manager, taxonomy_url)
        return 1


    if __name__ == '__main__':
        # Clear the web cache before starting the workers.
        cntlr = Cntlr.Cntlr()
        cache = WebCache.WebCache(cntlr, None)
        cache.clear()
        with ProcessPoolExecutor(max_workers=10, mp_context=mp.get_context('fork')) as executor:
            taxonomies = list(executor.map(load_tax, range(5)))

The issue is, I think, that we split up the

    if reload or not filepathExists:
        return filepath if self._downloadFile(url, filepath) else None

P1 and P2 both see that
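A minimal sketch of that kind of check-then-download race, assuming both processes evaluate the existence check before either one has finished writing the file. The function names, the file name, and the payload are hypothetical stand-ins for illustration, not Arelle code, and the two "processes" are simulated as sequential steps so the interleaving is deterministic:

```python
import tempfile
from pathlib import Path


def needs_download(filepath: Path, reload: bool = False) -> bool:
    """Same shape as the quoted condition: download if reloading or the file is missing."""
    return reload or not filepath.exists()


def download(filepath: Path, payload: bytes, log: list, who: str) -> None:
    """Hypothetical stand-in for a real download: record the caller, then write the file."""
    log.append(who)
    filepath.write_bytes(payload)


def simulate_race(cache_dir: Path) -> list:
    filepath = cache_dir / "taxonomy.xsd"
    downloads = []
    # Step 1: both workers run the check before either has written the file.
    p1_missing = needs_download(filepath)
    p2_missing = needs_download(filepath)
    # Step 2: each worker acts on its own, now stale, check result.
    if p1_missing:
        download(filepath, b"<schema/>", downloads, "P1")
    if p2_missing:
        download(filepath, b"<schema/>", downloads, "P2")
    return downloads


with tempfile.TemporaryDirectory() as tmp:
    downloads = simulate_race(Path(tmp))

print(downloads)  # ['P1', 'P2']: the same file was downloaded twice
```

In a real multi-process run the two writes can overlap, so a third process may read the file while it is only partly written.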
I think what we need to do is warm the cache by making an op that fetches all the taxonomies ahead of time. Then all the actual extraction ops can depend on the warmed cache - which means for all processes,
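A rough sketch of the warm-then-read idea, under some assumptions: the cache is a plain directory keyed by a hash of the URL, and `warm_cache`, `read_cached`, `cache_path`, and `fake_fetch` are hypothetical names made up for illustration (they are not Arelle, Dagster, or PUDL APIs). The warming step runs once, in a single process, before any parallel work starts; the readers never download anything.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a file name inside the cache directory."""
    return cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()


def warm_cache(urls: Iterable[str], cache_dir: Path, fetch: Callable[[str], bytes]) -> int:
    """Fetch every distinct URL once, serially. Returns the number of downloads."""
    fetched = 0
    for url in dict.fromkeys(urls):  # drop duplicates, keep order
        target = cache_path(cache_dir, url)
        if not target.exists():
            target.write_bytes(fetch(url))
            fetched += 1
    return fetched


def read_cached(cache_dir: Path, url: str) -> bytes:
    """What a worker would call: read only, and fail loudly on a cold cache."""
    target = cache_path(cache_dir, url)
    if not target.exists():
        raise FileNotFoundError(f"cache was not warmed for {url}")
    return target.read_bytes()


# Demo with a fake fetcher so nothing touches the network.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return f"schema for {url}".encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    urls = [
        "https://example.com/a.xsd",
        "https://example.com/b.xsd",
        "https://example.com/a.xsd",
    ]
    n_fetched = warm_cache(urls, cache_dir, fake_fetch)
    payloads = [read_cached(cache_dir, u) for u in urls]
```

Because every download happens before the workers start, no worker ever runs the check-then-download path, so the order in which the workers are scheduled no longer matters.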
This is the same problem as catalyst-cooperative/pudl-archiver#285, but we need to solve it separately because we're not operating in a Dagster environment. More thoughts there.
@jdangerx I think this has been fixed with the new version of the extractor, so I'm closing.
Overview
This seems to be another iteration of the failure from 2 days ago in #3441, stemming from Arelle having trouble with a cached file that it downloads from xbrl.org.
This issue may be related to the problems with the XBRL archiver.
The first questionable errors that show up in the logs seem to be:
Next steps
Verify that everything is fixed!
Once you've applied any necessary fixes, make sure that the nightly build outputs are all in their right places.
Tasks
Relevant logs
[link to build logs from internal distribution bucket]( PLEASE FIND THE ACTUAL LINK AND FILL IN HERE )