Automate additional data sources using GHA #284

Merged 8 commits on Feb 20, 2024

@e-belfer e-belfer commented Feb 20, 2024

Overview

Partially addresses #276

What problem does this address?
Adds every archiver that currently runs out of the box to the GitHub Actions (GHA) workflow, and sorts file_changes so that the reported changes are easier to read.
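For illustration, sorting the reported changes could look roughly like the sketch below. The `FileChange` class, its fields, and the file names are hypothetical stand-ins rather than the archiver's actual data model; the point is only that a stable ordering by file name makes successive run summaries easy to compare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    """Hypothetical record of one changed file in a run summary."""

    name: str
    diff_type: str  # e.g. "CREATE", "UPDATE", "DELETE" (assumed labels)
    size_diff: int  # change in size, in bytes


def sort_file_changes(changes):
    """Return the changes ordered by file name, then by change type."""
    return sorted(changes, key=lambda c: (c.name, c.diff_type))


# Made-up example data, listed in arbitrary order.
changes = [
    FileChange("eia860-2022.zip", "UPDATE", 1024),
    FileChange("eia860-2001.zip", "CREATE", 2048),
    FileChange("datapackage.json", "UPDATE", 12),
]

sorted_names = [c.name for c in sort_file_changes(changes)]
print(sorted_names)
```

With a fixed ordering, two summaries of the same dataset list their files in the same sequence, so a reviewer can scan them side by side.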

What did you change in this PR?

  • Added the remaining data sources to the run-archiver method to prevent stale archives, and made some quality-of-life improvements to the summary.
  • Ran mshamines locally with the refresh-metadata flag to update the keywords that were causing an error. This did not resolve the previous_version error, so this archive was removed from the list and moved to Fix broken archivers #285.
  • Manually approved new archives for mshamines and epacamd_eia to fix previous_version errors.
  • Renamed the mshamines partition from dataset to form to avoid downstream errors in extraction.

Out of scope:

Testing

How did you make sure this worked? How can a reviewer verify this?
Run run-archiver in GHA. All datasets should pass.

@e-belfer e-belfer added the automation Issues relating to automated archiver runs label Feb 20, 2024
@e-belfer e-belfer self-assigned this Feb 20, 2024
@e-belfer e-belfer linked an issue Feb 20, 2024 that may be closed by this pull request
Comment on lines 26 to 30
- ferc1
- ferc2
- ferc6
- ferc60
- ferc714
Member

Do we still expect the FERC XBRL archives will always appear completely new, even if nothing has changed in their contents?

IIRC they were doing something like autogenerating new IDs for every post in the RSS feed every time the feed was read, rather than using persistent unique IDs for each post. Did we find some way around that?
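One possible workaround, sketched here purely as an assumption and not as something the archiver is known to do, is to compare entries by a digest of their content instead of by the feed-assigned ID. The function names, IDs, and payloads below are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw content."""
    return hashlib.sha256(data).hexdigest()


def new_entries(previous: dict, current: dict) -> set:
    """Return the IDs in `current` whose content was not seen in `previous`.

    Both arguments map a feed-assigned ID to raw bytes. The IDs themselves
    are ignored for comparison, so regenerated IDs do not count as changes.
    """
    seen = {content_digest(data) for data in previous.values()}
    return {
        entry_id
        for entry_id, data in current.items()
        if content_digest(data) not in seen
    }


# Made-up example: every ID changed between reads, but only one filing is new.
previous = {"id-001": b"filing A", "id-002": b"filing B"}
current = {"id-901": b"filing A", "id-902": b"filing B", "id-903": b"filing C"}

added = new_entries(previous, current)
print(added)
```

This only helps if the underlying content is byte-for-byte stable between reads; if the files themselves embed regenerated IDs or timestamps, the digests would differ too.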

Member Author

@e-belfer e-belfer Feb 20, 2024

FERC archivers are certainly not working at present, but I'm tracking this in #285. Maybe this is a known and intended failure and I'm missing something, in which case these archivers shouldn't be candidates for automation.

Member

Hmm, #285 looks like real brokenness in the FERC archivers.

If we haven't addressed the unique ID thing, we'd just see all the FERC XBRL archives get updated every time the archiver is run. I think we'd still be able to get an idea of how much new data there is from the change in the size of the archives, and saving interim data doesn't seem like a bad idea given how flaky FERC's data curation is!
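As a rough sketch of the size comparison described here, per-file size deltas between two archive versions could be computed as below. The function name, file names, and byte counts are made up for illustration and do not reflect real archive contents.

```python
def size_deltas(old_sizes: dict, new_sizes: dict) -> dict:
    """Return the per-file change in bytes between two archive versions.

    Files missing from one side are treated as having size 0 there, so
    additions show up as positive deltas and removals as negative ones.
    """
    names = sorted(set(old_sizes) | set(new_sizes))
    return {name: new_sizes.get(name, 0) - old_sizes.get(name, 0) for name in names}


# Made-up example sizes, in bytes.
old = {"ferc1-xbrl-2021.zip": 900, "ferc1-xbrl-2022.zip": 1000}
new = {
    "ferc1-xbrl-2021.zip": 900,
    "ferc1-xbrl-2022.zip": 1250,
    "ferc1-xbrl-2023.zip": 300,
}

deltas = size_deltas(old, new)
total_growth = sum(deltas.values())
print(deltas, total_growth)
```

Size is only a proxy: a file can change without changing size, so a zero delta suggests, but does not prove, that nothing new was published.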

@e-belfer e-belfer changed the title from "Automate everything! Add remaining data sources to GHA" to "Automate additional data sources using GHA" Feb 20, 2024
@e-belfer e-belfer mentioned this pull request Feb 20, 2024
@e-belfer e-belfer merged commit ce0f16d into main Feb 20, 2024
15 of 16 checks passed
@e-belfer e-belfer deleted the automate-everything branch February 20, 2024 16:56
Successfully merging this pull request may close these issues.

Automate remaining archive runs
2 participants