Re-gigger backfilling technology_description & make prime_mover_code an annually harvested column #1600
Codecov Report
@@ Coverage Diff @@
## dev #1600 +/- ##
=====================================
Coverage 84.0% 84.0%
=====================================
Files 65 65
Lines 7176 7181 +5
=====================================
+ Hits 6034 6039 +5
Misses 1142 1142
This looks robust to me! Just a few bb comments
src/pudl/output/eia860.py
Outdated
@@ -368,13 +367,15 @@ def generators_eia860(
 def fill_generator_technology_description(gens_df: pd.DataFrame) -> pd.DataFrame:
-    """Fill in missing ``technology_description`` based on generator and energy source.
+    """Fill in missing ``technology_description`` based by backfilling & unquie mapping.
Tiny comment, but unique is spelled wrong
lol of course ty!
As a result, more than 95% of all generator records end up having a
``technology_description`` associated with them.
Do you know what the coverage is now that you've integrated prime_mover_code?
ha it's 97%... which is a HUGE improvement from 96%.
with moving the PM code.... this is now 98.1% 😎
I tried adding the PM code into the backfilling. This resulted in *sliiightly* fewer tech-types (a grand total of 1609 records), all from generators that have no PM code. After testing the staging.. it felt better to do the backfilling w/ the completely consistent map between PM/ESC:Tech first bc it feels pretty conservative. And then we come back in w/ the bfill w/o the PM code.
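For anyone following along, here is a rough sketch of the ordering described above: first fill from the (ESC, PM) -> tech pairs that are unambiguous, then backfill within each generator. This is not the code in this PR. The function name `fill_tech_desc_sketch`, the toy records, and the PM codes in them are made up for illustration; only the column names come from the discussion.

```python
import pandas as pd


def fill_tech_desc_sketch(gens: pd.DataFrame) -> pd.DataFrame:
    """Illustrative two-step fill; NOT the actual PUDL implementation."""
    keys = ["energy_source_code_1", "prime_mover_code"]
    gen_id = ["plant_id_eia", "generator_id"]

    # Step 1: build the (ESC, PM) -> tech map from records where all three are known.
    known = gens.dropna(subset=keys + ["technology_description"])
    pairs = known[keys + ["technology_description"]].drop_duplicates()
    # keep=False drops every (ESC, PM) pair that shows up with more than one tech,
    # so only unambiguous pairs survive.
    tech_map = pairs.drop_duplicates(subset=keys, keep=False).rename(
        columns={"technology_description": "tech_from_map"}
    )

    # Two key columns -> one value column, so use a merge rather than Series.map(dict).
    out = gens.merge(tech_map, on=keys, how="left", validate="m:1")
    out["technology_description"] = out["technology_description"].fillna(
        out["tech_from_map"]
    )
    out = out.drop(columns="tech_from_map")

    # Step 2: backfill whatever is still missing within each generator, oldest first.
    out = out.sort_values(gen_id + ["report_date"])
    out["technology_description"] = out.groupby(gen_id)[
        "technology_description"
    ].bfill()
    return out


# Toy records (hypothetical values): generator (1, "a") burned coal in 2009 and
# was later converted to gas; generator (4, "d") has no PM code in 2009.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 3, 4, 4],
        "generator_id": ["a", "a", "b", "c", "d", "d"],
        "report_date": pd.to_datetime(
            ["2009-01-01", "2019-01-01", "2019-01-01",
             "2019-01-01", "2009-01-01", "2019-01-01"]
        ),
        "energy_source_code_1": ["BIT", "NG", "BIT", "BIT", "WAT", "WAT"],
        "prime_mover_code": ["ST", "ST", "ST", "CC", None, "HY"],
        "technology_description": [
            None,
            "Natural Gas Steam Turbine",
            "Conventional Steam Coal",
            "Coal Integrated Gasification Combined Cycle",
            None,
            "Conventional Hydroelectric",
        ],
    }
)
filled = fill_tech_desc_sketch(gens)
```

In this toy case the 2009 coal record picks up its tech type from the map before the backfill runs, so it does not inherit the later gas tech type. The record with no PM code cannot be matched by the map and is only filled by the backfill.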
I added another layer to this PR. Now, on top of the backfilling of the
ALSO i still need to run
Okay @aesharpe I changed a few things and ran all of the validation tests and they check out!!
Does this mean that before now we were overwriting certain PM values based on whatever we decided the "static" value was? Are there any tables that might get messed up by the addition of PM to annual? Or new documentation to add?
Which table structure changes? The annual table? The output tables would only change insofar as the PM column differs based on the new annual data, right, so no structure changes there?
@aesharpe good questions
Yes. 98% of PMs never change ever so we lumped this column in as "static" (i.e. you never change ever and any change is probably a data entry issue)... but within that 2% there were some clear instances of actual change.
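A quick way to eyeball that kind of number, as a sketch only: count how many generators report more than one distinct value for a column across years. The helper name `share_with_changing_value` and the toy records are hypothetical, and this is not how the harvesting process itself decides what counts as static vs. annual.

```python
import pandas as pd


def share_with_changing_value(df: pd.DataFrame, id_cols: list, col: str) -> float:
    """Fraction of entities whose non-null values of ``col`` are not all the same."""
    # nunique() ignores missing values by default.
    n_values = df.groupby(id_cols)[col].nunique()
    return float((n_values > 1).mean())


# Toy records (hypothetical): four generators, two report years each.
toy = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2, 3, 3, 4, 4],
        "generator_id": ["a", "a", "b", "b", "c", "c", "d", "d"],
        "report_date": pd.to_datetime(["2009-01-01", "2019-01-01"] * 4),
        "prime_mover_code": ["ST", "ST", "GT", "GT", "ST", "CC", "HY", "HY"],
    }
)
share = share_with_changing_value(
    toy, ["plant_id_eia", "generator_id"], "prime_mover_code"
)
```

Here only generator (3, "c") changes its PM code, so one of the four toy generators would be flagged.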
Fortunately, the only mode of access to these tables that we've baked in is through
On documentation, the docs should be built off of the metadata and that is now updated. This change should definitely be added to release notes when we make a new release. I've already flagged this for @zaneselvans... (should we make a running
A column (
Exactly. The structure of
Yes, if there isn't already a v0.7.0 section in the release notes definitely add one. We should make edits to the release notes in every significant PR.
Update: I incorporated a change of prime_mover_code from a "static" -> "annual" harvested/normalized column!!
The previous fill-in methodology was filling in the technology_description by backfilling and then using the unique mapping between ESC and tech type. This PR edits the unique mapping fill-in portion to ALSO include the prime_mover_code.
Because this suggestion includes a two-column:one-column map, I had to convert the map(dict) methodology into a split-apply via merge-combine methodology.
how I got here...
I came across this by looking at the older years of plant_id_eia == 1961. This generator was converted into a gas plant so the backfilling didn't work. And this was an energy_source_code_1 == "BIT" generator. BIT has mapped to either Conventional Steam Coal or Coal Integrated Gasification Combined Cycle tech types. This is not a unique mapping so no BITs were getting filled in.
Out of 135 potential esc/pm -> tech mappings, only 7 of them had a non-unique map: