Re-gigger backfilling `technology_description` & make `prime_mover_code` an annually harvested column #1600

cmgosnell · 2022-04-29T19:12:31Z

Update: I incorporated a change of prime_mover_code from a "static" -> "annual" harvested/normalized column!!

The previous fill-in methodology filled in the technology_description by backfilling and then using the unique mapping between energy source code (ESC) and tech type. This PR changes the unique-mapping portion to ALSO key on the prime_mover_code.

Because this change uses a two-column:one-column map, I had to convert the map(dict) methodology into a merge-based one (split-apply-combine via merge).
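To illustrate the map(dict) to merge conversion described above, here is a minimal sketch on toy data. It is not the code from this PR: the column names follow the PR, but the rows, the helper column `technology_description_map`, and the prime mover values are illustrative assumptions.

```python
import pandas as pd

# Toy generator records. Column names follow the PR; the values are made up.
gens = pd.DataFrame(
    {
        "energy_source_code_1": ["BIT", "BIT", "BIT", "NG", "NG"],
        "prime_mover_code": ["ST", "ST", "IG", "GT", "GT"],
        "technology_description": [
            "Conventional Steam Coal",
            None,
            "Coal Integrated Gasification Combined Cycle",
            "Natural Gas Fired Combustion Turbine",
            None,
        ],
    }
)

keys = ["energy_source_code_1", "prime_mover_code"]

# Build the lookup: keep only (ESC, PM) pairs that map to exactly one tech type.
unique_map = (
    gens.dropna(subset=["technology_description"])
    .groupby(keys)["technology_description"]
    .agg(n_tech="nunique", technology_description_map="first")
    .query("n_tech == 1")
    .drop(columns="n_tech")
    .reset_index()
)

# A left merge keeps every generator record; fillna only touches the gaps.
filled = gens.merge(unique_map, on=keys, how="left", validate="m:1")
filled["technology_description"] = filled["technology_description"].fillna(
    filled["technology_description_map"]
)
filled = filled.drop(columns="technology_description_map")
```

A dict-based `Series.map` only works when the key is a single column. With a two-column key, the merge does the same lookup, and `validate="m:1"` raises if the lookup table is not unique per key.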

how I got here...

I came across this by looking at the older years of plant_id_eia == 1961. This generator was converted into a gas plant, so the backfilling didn't work. It was an energy_source_code_1 == "BIT" generator, and BIT maps to either the Conventional Steam Coal or the Coal Integrated Gasification Combined Cycle tech type. That is not a unique mapping, so no BITs were getting filled in.

Out of 135 potential esc/pm -> tech mappings, only 7 of them had a non-unique mapping:

# Count the distinct tech types seen for each (ESC, PM) pair.
test = (
    gens.groupby(["energy_source_code_1", "prime_mover_code"])
    [["technology_description"]]
    .nunique()
)
len(test)
# 135
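The snippet above only shows the total number of pairs. A sketch of how the non-unique pairs could be picked out of the same `test` frame follows. The toy `gens` frame and its inconsistent row are hypothetical, so the counts here are not the 135 and 7 reported above.

```python
import pandas as pd

# Hypothetical stand-in for the real generators table.
gens = pd.DataFrame(
    {
        "energy_source_code_1": ["BIT", "BIT", "BIT", "NG", "NG"],
        "prime_mover_code": ["ST", "ST", "ST", "GT", "GT"],
        "technology_description": [
            "Conventional Steam Coal",
            "Coal Integrated Gasification Combined Cycle",  # made-up inconsistency
            "Conventional Steam Coal",
            "Natural Gas Fired Combustion Turbine",
            None,  # nunique() ignores missing values by default
        ],
    }
)

test = (
    gens.groupby(["energy_source_code_1", "prime_mover_code"])
    [["technology_description"]]
    .nunique()
)

# Pairs that map to more than one tech type cannot be used for filling.
non_unique = test[test["technology_description"] > 1]
n_pairs = len(test)
n_non_unique = len(non_unique)
```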

codecov · 2022-04-29T20:42:58Z

Codecov Report

Merging #1600 (0dd285d) into dev (5dbc337) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##             dev   #1600   +/-   ##
=====================================
  Coverage   84.0%   84.0%           
=====================================
  Files         65      65           
  Lines       7176    7181    +5     
=====================================
+ Hits        6034    6039    +5     
  Misses      1142    1142

Impacted Files	Coverage Δ
src/pudl/metadata/resources/eia.py	`100.0% <ø> (ø)`
src/pudl/metadata/resources/eia860.py	`100.0% <ø> (ø)`
src/pudl/output/eia860.py	`66.5% <100.0%> (+0.6%)`	⬆️
src/pudl/transform/eia860.py	`96.4% <100.0%> (+<0.1%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

aesharpe

This looks robust to me! Just a few bb comments

aesharpe · 2022-04-29T22:27:27Z

src/pudl/output/eia860.py

@@ -368,13 +367,15 @@ def generators_eia860(


 def fill_generator_technology_description(gens_df: pd.DataFrame) -> pd.DataFrame:
-    """Fill in missing ``technology_description`` based on generator and energy source.
+    """Fill in missing ``technology_description`` based by backfilling & unquie mapping.


Tiny comment, but unique is spelled wrong

lol of course ty!

aesharpe · 2022-04-29T22:28:29Z

src/pudl/output/eia860.py

    As a result, more than 95% of all generator records end up having a
    ``technology_description`` associated with them.


Do you know what the coverage is now that you've integrated prime_mover_code?

ha it's 97%... which is a HUGE improvement from 96%.

with moving the PM code.... this is now 98.1% 😎

I tried adding the PM code into the backfilling. This resulted in *sliiightly* fewer tech types (a grand total of 1609 records), all from generators that have no PM code. After testing the staging, it felt better to do the backfilling w/ the completely consistent map between PM/ESC:Tech first bc it feels pretty conservative. And then we come back in w/ the bfill w/o the PM code.
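For the second stage (the bfill w/o the PM code), a rough sketch of a within-generator backfill is below. The grouping columns are an assumption, not confirmed by this thread, and the generator_id, dates, and tech type are invented for illustration.

```python
import pandas as pd

# Hypothetical history of one generator that switched from coal to gas.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1961] * 4,
        "generator_id": ["1"] * 4,  # made-up ID
        "report_date": pd.to_datetime(
            ["2011-01-01", "2012-01-01", "2013-01-01", "2014-01-01"]
        ),
        "energy_source_code_1": ["BIT", "BIT", "NG", "NG"],
        "technology_description": [None, None, None, "Natural Gas Steam Turbine"],
    }
)

# Assumed grouping: generator plus energy source, with no prime_mover_code.
group_cols = ["plant_id_eia", "generator_id", "energy_source_code_1"]

gens = gens.sort_values("report_date")
# Pull later reported values back into earlier years within each group.
gens["technology_description"] = gens.groupby(group_cols)[
    "technology_description"
].bfill()
```

Grouping on the energy source keeps the gas tech type from leaking into the coal years. Those years stay empty unless a unique mapping can fill them.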

review-notebook-app · 2022-05-03T15:23:23Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

cmgosnell · 2022-05-03T15:31:09Z

I added another layer to this PR. Now, on top of the backfilling of the technology_description, I've also moved the prime_mover_code from a "static" to an "annual" harvested column. See details on why from this comment

cmgosnell · 2022-05-03T15:35:32Z

ALSO I still need to run tox -e nuke to run all the validation tests before merging.... this feels important given the change in the table structure. (even though, believe it or not, there seem to be NO changes required in the output layer)

cmgosnell · 2022-05-04T15:14:13Z

Okay @aesharpe I changed a few things and ran all of the validation tests and they check out!!

aesharpe · 2022-05-04T22:00:38Z

Update: I incorporated a change of prime_mover_code from a "static" -> "annual" harvested/normalized column!!

Does this mean that before now we were overwriting certain PM values based on whatever we decided the "static" value was?

Are there any tables that might get messed up by the addition of PM to annual? Or new documentation to add?

aesharpe · 2022-05-04T22:02:04Z

this feels important given the change in the table structure.

Which table structure changes? The annual table? The output tables would only change insofar as the PM column differs based on the new annual data, right, so no structure changes there?

cmgosnell · 2022-05-04T22:33:09Z

@aesharpe good questions

Does this mean that before now we were overwriting certain PM values based on whatever we decided the "static" value was?

Yes. 98% of PMs never change, ever, so we lumped this column into the "static" group (i.e. you never change and any change is probably a data entry issue)... but within that 2% there were some clear instances of actual change.
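One way to measure how "static" a column is would be to count the distinct values each generator reports over time. The sketch below uses invented records, so its share is not the 98% quoted above.

```python
import pandas as pd

# Invented annual records for two generators.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2, 2],
        "generator_id": ["a", "a", "a", "b", "b", "b"],
        "report_date": pd.to_datetime(
            ["2018-01-01", "2019-01-01", "2020-01-01"] * 2
        ),
        "prime_mover_code": ["ST", "ST", "ST", "ST", "CA", "CA"],
    }
)

# Number of distinct PM codes each generator has ever reported.
n_codes = gens.groupby(["plant_id_eia", "generator_id"])[
    "prime_mover_code"
].nunique()

share_static = (n_codes <= 1).mean()  # fraction of generators that never change
changers = n_codes[n_codes > 1].index.tolist()  # generators worth a closer look
```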

Are there any tables that might get messed up by the addition of PM to annual? Or new documentation to add?

Fortunately, the only mode of access to these tables that we've baked in is through pudl.output.eia860.generators_eia860. And this output function was always merging all of the columns of the annually harvested tables and the static harvested generator tables together. So weirdly no output methods needed updating. I was vv surprised but that is the niceness of layering our access via the output tables.
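A toy illustration of why the output did not need to change: if the output function merges the annual and static tables on the generator IDs, the merged columns are the same whichever table holds `prime_mover_code`. The `build_output` helper and the tiny tables are hypothetical and much simpler than pudl.output.eia860.generators_eia860.

```python
import pandas as pd

ids = ["plant_id_eia", "generator_id"]


def build_output(entity: pd.DataFrame, annual: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for an output function that joins both tables."""
    return annual.merge(entity, on=ids, how="left", validate="m:1")


# Before: prime_mover_code lives in the static (entity) table.
entity_before = pd.DataFrame(
    {"plant_id_eia": [1], "generator_id": ["a"], "prime_mover_code": ["ST"]}
)
annual_before = pd.DataFrame(
    {
        "plant_id_eia": [1, 1],
        "generator_id": ["a", "a"],
        "report_date": pd.to_datetime(["2019-01-01", "2020-01-01"]),
        "capacity_mw": [10.0, 10.0],
    }
)

# After: the column moves to the annual table and can vary by year.
entity_after = entity_before.drop(columns="prime_mover_code")
annual_after = annual_before.assign(prime_mover_code=["ST", "CA"])

out_before = build_output(entity_before, annual_before)
out_after = build_output(entity_after, annual_after)
same_columns = set(out_before.columns) == set(out_after.columns)
```

Only the values differ: `out_before` repeats the single static code for every year, while `out_after` carries the year-specific codes.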

On documentation, the docs should be built off of the metadata and that is now updated. This change should definitely be added to release notes when we make a new release. I've already flagged this for @zaneselvans... (should we make a running 0.7.0 (2022-XX-XX) section??)

Which table structure changes? The annual table?

A column (prime_mover_code) got moved from the static table (generators_entity_eia) to the annual table (generators_eia860).

The output tables would only change insofar as the PM column differs based on the new annual data, right, so no structure changes there?

Exactly. The structure of pudl_out.gens_eia860() is exactly the same. Some of the PM codes of those 2% of gens that change over time will be different.

zaneselvans · 2022-05-04T23:46:44Z

Yes, if there isn't already a v0.7.0 section in the release notes definitely add one. We should make edits to the release notes in every significant PR.

cmgosnell added 2 commits April 29, 2022 14:56

map tech type with energy_source_code AND prime_mover_code

a9bb444

Merge branch 'dev' into non_static_tech

26beeef

cmgosnell added the output, data-repair, and rmi labels Apr 29, 2022

cmgosnell requested a review from aesharpe April 29, 2022 19:12

cmgosnell assigned cmgosnell and aesharpe Apr 29, 2022

left merge (duh!) for tech fill in

325e950

aesharpe reviewed Apr 29, 2022

cmgosnell added 5 commits May 2, 2022 11:19

first attempt to integrate manual overrides of prime_mover_codes

b9554c8

reorder filling of tech description and make backfill more strict

b0f7502

Move the PM code from a static to an annual harvested column

d3348d5

update harvest notebook to work w/ new(lol not so new) settings/etl p…

67d4dac

…rocess

cmgosnell mentioned this pull request May 3, 2022

check for and convert any EIA plant-part record_id_eia w/ changed PM's in overrides catalyst-cooperative/rmi-ferc1-eia#215

Open

cmgosnell changed the title ~~Fill in tech w/ energy_source_code_1 & prime_mover_code~~ Re-gigger backfilling technology_description & make prime_mover_code an annually harvested column May 3, 2022

Merge branch 'dev' into non_static_tech

c6c5f84

add overview of moving the primve_mover_code into release notes

0dd285d

aesharpe merged commit 3b9e6bd into dev May 5, 2022

zaneselvans deleted the non_static_tech branch May 5, 2022 04:50

zaneselvans mentioned this pull request Jul 7, 2022

Boiler Fuel Allocation Improvements #1608

Merged
