Harvest generator operating dates when they're within a year of one another #3419

Merged
merged 16 commits into from
Feb 27, 2024

Conversation

e-belfer

@e-belfer e-belfer commented Feb 22, 2024

Overview

Closes #3340.

What problem does this address?
When a generator's reported operating dates all fall within a year of one another, keep the last date observed in that year, then try to harvest these dates again. This fixes 38 of the 59 generators reported without an operational date in #3340. The remaining 21 still have inconsistent datetimes that should not be harvested.
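As a rough illustration of the rule described above (not the code in this PR), here is a minimal pandas sketch. The function name `harvest_operating_dates`, the column names, and the reading of "within a year" as a spread of at most 365 days are all assumptions made for the example.

```python
import pandas as pd


def harvest_operating_dates(
    reports: pd.DataFrame,
    id_cols: list[str],
    date_col: str,
    max_spread: pd.Timedelta = pd.Timedelta(days=365),  # assumed meaning of "within a year"
) -> pd.Series:
    """Keep the latest reported date per entity when all its dates are close together.

    Hypothetical helper for illustration only. Entities whose reported dates
    span more than ``max_spread`` get NaT, i.e. nothing is harvested for them.
    """
    grouped = reports.groupby(id_cols)[date_col]
    earliest = grouped.min()
    latest = grouped.max()
    # Keep the last date only where the spread of reported dates is small enough.
    return latest.where((latest - earliest) <= max_spread)


# Toy data: generator A reports dates a few months apart, generator B five years apart.
reports = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2],
        "generator_id": ["A", "A", "A", "B", "B"],
        "generator_operating_date": pd.to_datetime(
            ["2001-01-01", "2001-06-01", "2001-06-01", "1990-01-01", "1995-01-01"]
        ),
    }
)

harvested = harvest_operating_dates(
    reports,
    id_cols=["plant_id_eia", "generator_id"],
    date_col="generator_operating_date",
)
```

In this toy data, generator A would get its last reported date and generator B would stay null, which matches the split between the fixed generators and those that should still not be harvested.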

What did you change?
Added a _gen_operating_date() method that mirrors the _lat_long() method. Also fixed the harvesting so that changes made to special-case columns are actually applied in the main harvesting process. The same approach can be applied to other static entity columns.
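To make the second point concrete, here is a hedged sketch of the general pattern: a per-column special-case cleaner runs first, and its output (not the raw column) is what the main consistency check sees. Everything here is hypothetical and simplified: the `SPECIAL_CASE_CLEANERS` registry, the rounding cleaner, `most_consistent_value`, and the 0.7 threshold are assumptions for the example, not the project's actual implementation.

```python
from collections.abc import Callable

import pandas as pd

# Hypothetical registry: column name -> function that pre-cleans that column.
SPECIAL_CASE_CLEANERS: dict[str, Callable[[pd.Series], pd.Series]] = {
    "latitude": lambda s: s.round(1),
    "longitude": lambda s: s.round(1),
}


def most_consistent_value(values: pd.Series, strictness: float = 0.7):
    """Return the most common non-null value if its share is at least ``strictness``."""
    values = values.dropna()
    if values.empty:
        return None
    counts = values.value_counts()  # sorted by count, most common first
    top_value, top_count = counts.index[0], counts.iloc[0]
    return top_value if top_count / len(values) >= strictness else None


def harvest_column(df: pd.DataFrame, id_col: str, col: str, strictness: float = 0.7) -> dict:
    """Harvest one static column, applying its special-case cleaner first if it has one."""
    cleaner = SPECIAL_CASE_CLEANERS.get(col)
    # The important step: the cleaned values replace the raw ones before the main harvest.
    cleaned = df.assign(**{col: cleaner(df[col])}) if cleaner else df
    return {
        entity_id: most_consistent_value(group[col], strictness)
        for entity_id, group in cleaned.groupby(id_col)
    }


plants = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2],
        "latitude": [40.01, 40.02, 40.04, 35.0, 36.0],
    }
)
result = harvest_column(plants, id_col="plant_id_eia", col="latitude")
raw_plant_1 = most_consistent_value(plants.loc[plants["plant_id_eia"] == 1, "latitude"])
```

With the cleaner applied, plant 1's slightly different latitudes agree after rounding and a value is harvested; on the raw values no single latitude is consistent enough. Plant 2 stays unharvested either way because its values genuinely disagree.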

Testing

How did you make sure this worked? How can a reviewer verify this?
Run debug_harvesting.ipynb. Look at the generators highlighted in the issue and confirm that those with dates within a year of one another have been harvested with the last date kept.

To-do list


@e-belfer e-belfer added the admin (Catalyst operational tasks not related to coding), eia860 (Anything having to do with EIA Form 860), and harvest (Normalization of poorly normalized inputs and reconciliation of internal inconsistencies) labels and removed the admin label Feb 22, 2024
@e-belfer e-belfer self-assigned this Feb 22, 2024

@zaneselvans zaneselvans left a comment

I think turning this on for the other operating data columns makes sense if it's easy (which it seems like it should be).

@e-belfer e-belfer enabled auto-merge February 27, 2024 16:52
@e-belfer e-belfer added this pull request to the merge queue Feb 27, 2024
Merged via the queue into main with commit e075bff Feb 27, 2024
12 checks passed
@e-belfer e-belfer deleted the gen-operating-date branch February 27, 2024 18:02
katie-lamb pushed a commit that referenced this pull request Mar 5, 2024
…nother (#3419)

* Stash debugging process

* Add _gen_operating_date method

* Restore harvesting notebook

* Actually implement special col case fixes

* Clean up logs

* Add to release notes

* Fix some docstring cut-and-paste issues.

* String / docstring cleanup.

* Update EIA860m temporal coverage in README

* Clean up docs and generalize, assert static col

---------

Co-authored-by: Zane Selvans
github-merge-queue bot pushed a commit that referenced this pull request Mar 6, 2024
* take out test set

* add in model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* debugging model

* Update conda environment to include new splink.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* working splink model

* update splink version

* add devtools notebook and fix fuel type filling

* updates to devtools notebook

* updates to notebook

* Update conda lockfiles after merging main.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* experiment with other blocking rules

* add initial experiment tracking

* add more metrics to experiment tracking

* update blocking rules

* add experiment tracking to old model to better compare

* change blocking rules

* change blocking rules

* Harvest generator operating dates when they're within a year of one another (#3419)

* Add RMI beta access to parquet.catalyst.coop (#3434)

* Add RMI beta access to builds.catalyst.coop

* s/builds/parquet

* Add new citations of Catalyst / PUDL (#3435)

* Add new citations of Catalyst / PUDL.

* Add issue/PR to harvesting bugfix release notes

* Fix some capitalization in BibTex inputs.

* Add book references, fix bad DOI formatting and capitalization.

* Fix minor citation formatting issues.

* Fix minor citation formatting issues.

* dynamically generate fuel type list

* remove old model

* debug ferc to ferc

* fix cleaning pipeline

* take out test set

* add in model

* debugging model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* working splink model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* update splink version

* add devtools notebook and fix fuel type filling

* updates to devtools notebook

* updates to notebook

* experiment with other blocking rules

* Update conda lockfiles after merging main.

* add initial experiment tracking

* add more metrics to experiment tracking

* update blocking rules

* add experiment tracking to old model to better compare

* change blocking rules

* change blocking rules

* dynamically generate fuel type list

* remove old model

* debug ferc to ferc

* fix cleaning pipeline

* update release notes and add accuracy

* update devtools notebook

* update accuracy sig figs

* clean up after rebase

* update pyproject

* add in accuracy metric

* take out todo comments

* update notebook with correct paths

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zane Selvans
Co-authored-by: Katie Lamb
Co-authored-by: E. Belfer
Co-authored-by: Dazhong Xia
Labels
eia860: Anything having to do with EIA Form 860
harvest: Normalization of poorly normalized inputs and reconciliation of internal inconsistencies
Projects
Archived in project
2 participants