Automate additional data sources using GHA #284
.github/workflows/run-archiver.yml (outdated)

  - ferc1
  - ferc2
  - ferc6
  - ferc60
  - ferc714
Do we still expect the FERC XBRL archives to always appear completely new, even if nothing has changed in their contents?
IIRC they were doing something like autogenerating new IDs for every post in the RSS feed every time the feed was read, rather than using persistent unique IDs for each post. Did we find some way around that?
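One possible workaround, as a minimal sketch: ignore the regenerated IDs and fingerprint each feed entry on fields that are assumed to stay stable between reads. The `FeedEntry` type, its fields, and both functions are hypothetical and not part of the archiver. Whether FERC's feed actually keeps the link, title, and publication date stable is also an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedEntry:
    """Hypothetical RSS entry; `guid` may be regenerated on every read."""
    guid: str
    link: str
    title: str
    published: str


def entry_fingerprint(entry: FeedEntry) -> str:
    """Hash only the fields assumed to be stable across reads, ignoring `guid`."""
    stable = "\n".join([entry.link, entry.title, entry.published])
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()


def feed_changed(previous: list[FeedEntry], current: list[FeedEntry]) -> bool:
    """Report a change only if the set of stable fingerprints differs."""
    old_prints = {entry_fingerprint(e) for e in previous}
    new_prints = {entry_fingerprint(e) for e in current}
    return old_prints != new_prints
```

For example, two reads that differ only in `guid` compare as unchanged:

```python
old = [FeedEntry("id-1", "https://example.com/a.xbrl", "Filing A", "2024-01-01")]
new = [FeedEntry("id-9", "https://example.com/a.xbrl", "Filing A", "2024-01-01")]
feed_changed(old, new)  # False
```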
FERC archivers are certainly not working at present, but I'm tracking this in #285. Maybe this is a known and intended failure and I'm missing something, in which case these archivers shouldn't be candidates for automation.
Hmm, #285 looks like real brokenness in the FERC archivers.
If we haven't addressed the unique ID thing, we'd just see that all the FERC XBRL archives get updated every time the archiver is run, but I think we'd still be able to get an idea of how much new data there is from the change in the size of the archives, and saving interim data doesn't seem like a bad idea given how flaky FERC's data curation is!
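The size-based comparison could look like the following minimal sketch. It computes per-file size deltas between two archive versions and assumes each version can be summarized as a mapping from filename to size in bytes. The function names and filenames are hypothetical. If every file is re-uploaded because of the ID issue but all deltas are zero, that would suggest no new data.

```python
def size_changes(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Per-file size change in bytes between two archive versions.

    Files absent from one version count as 0 bytes there, so added files
    show a positive delta and removed files a negative one. Keys are
    returned in sorted order to keep the report easy to scan.
    """
    names = sorted(old.keys() | new.keys())
    return {name: new.get(name, 0) - old.get(name, 0) for name in names}


def net_growth(old: dict[str, int], new: dict[str, int]) -> int:
    """Total bytes added across the archive (negative if it shrank)."""
    return sum(size_changes(old, new).values())
```

Usage with made-up sizes:

```python
old = {"ferc1-2021.zip": 1_000, "ferc1-2022.zip": 2_000}
new = {"ferc1-2021.zip": 1_000, "ferc1-2022.zip": 2_500, "ferc1-2023.zip": 800}
size_changes(old, new)  # {'ferc1-2021.zip': 0, 'ferc1-2022.zip': 500, 'ferc1-2023.zip': 800}
net_growth(old, new)    # 1300
```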
Overview
Partially addresses #276
What problem does this address?
Adds all archivers currently running out of the box to GHA, and sorts `file_changes` to improve the legibility of reported changes.

What did you change in this PR?
- `run-archiver` method to prevent stale archives, and adding some quality-of-life improvements to the summary.
- `mshamines` locally with the `refresh-metadata` flag to update keywords that were causing an error, but this did not resolve the `previous_version` error, so removed this archive from the list and moved it into Fix broken archivers #285.
- `mshamines` and `epacamd_eia` to fix `previous_version` errors.
- `mshamines` partition name to `form` from `dataset` to avoid downstream errors in extraction.

Out of scope:
Testing
How did you make sure this worked? How can a reviewer verify this?
Run `run-archiver` in GHA. All datasets should pass.

To-do list
Tasks