Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs in allocate_gen_fuel #3690

Merged
merged 9 commits into from
Sep 5, 2024
Merged

Conversation

grgmiller
Copy link
Collaborator

@grgmiller grgmiller commented Jun 22, 2024

Overview

This PR fixes several issues that we identified in the analysis.allocate_gen_fuel module and had fixed in our fork of pudl for OGE. We are now trying to get rid of our dependency on the pudl code, so want to migrate all of our changes over to pudl so that we can directly use the output table from this module.

This is part of work we are tracking in OGE here: singularity-energy/open-grid-emissions#369

First, this addresses an issue where some retiring generators were incorrectly identified and being dropped: singularity-energy#1

This PR fixes a bug where during the generation and fuel allocation process, data for report months after the reported retirement date was getting dropped for generators that retire mid-year. For example, the retirement date of both generators at plant 50937 was "2022-09-01", and the previous behavior was to drop all data after september, even though this plant continued to report fuel consumption after september. This fix keeps all report dates through the end of the current year to avoid dropping this data.

Second, this addresses issues with duplicate generators, as described in this PR: singularity-energy#3

When running the pudl.analysis.allocate_gen_fuel pipeline for 2016 and 2017, we were getting a TypeError at group_duplicate_keys(), because this function was trying to groupby().sum() non-numeric columns like generator_retirement_date.
The group_duplicate_keys() will only work if we drop any datetime and boolean columns before using this function, and considering carefully whether we want to sum any of the frac columns or not. This PR, however, does not touch this function, but rather fixes the issue upstream.
We were only running into this issue with group_duplicate_keys() because there were duplicate keys in the dataframe, so this PR addresses the root cause of where duplicate keys were getting introduced in the first place.
It turns out that when creating the gen_assoc table with associate_generator_tables(), one of the steps is remove_inactive_generators(), which removes certain generators by creating six different dataframes with different generators based on their operating status: existing, retiring_generators, retired_plants, proposed_generators, proposed_plants, and unassociated_plants. These six dataframes are then concat'ed together. Previously our assumption was that these six dataframes should be non-overlapping. However, it turns out that this is not always the case.
For example, in 2016, plant 56846 generator GTG1 ended up in both proposed_generators and proposed_plants, which was causing it to be duplicated.
We fix this by simply adding .drop_duplicates() after these six dataframes are concat'ed together. This fixes the issue that we were experiencing in 2016 and 2017.
For now, we will leave group_duplicate_keys() alone even though it does not work. It effectively acts as an error if there are ever any duplicate keys since it will raise a typeerror like we saw for 2016 and 2017.

Testing

We have successfully run this after importing pudl and running it in the OGE pipeline. However, we had previously been testing this with an older release of pudl (v2023.12.01).

I have a Windows machine so there are not great instructions on getting the pudl dev environment set up on Windows. This is a pretty small amount of code change, so I'm hoping that someone with the dev environment already set up may be able to help test this.

To-do list

Preview Give feedback

@grgmiller grgmiller requested a review from cmgosnell June 23, 2024 00:11
@grgmiller
Copy link
Collaborator Author

@cmgosnell not sure who would be best to review this so added you for now.

@zaneselvans zaneselvans added analysis Data analysis tasks that involve actually using PUDL to figure things out, like calculating MCOE. community labels Jun 25, 2024
@cmgosnell
Copy link
Member

hey @grgmiller !! I'm so sorry that it's taken me absolutely forever to get to this PR!! I will commit to making sure we get this merged in quickly now that my attention is turned to it.

two intentions are:

  1. make sure there are never duplicated generators
  2. for retiring generators, make sure that there are additional months included to make sure any net gen or fuel reporting post-retirement is included in the allocation

For the first one, the simple drop dupes make sense np.

The desire for the second fix makes a lot of sense. But what I interpret from what you have done here doesn't exactly make sense to me. It looks like you are trying to include any records ever from a generator that says its retired but reported producing mwh. Which I believe means if a generator retired in January 2001 but reported MWh in February, all subsequent months of that generator would be included. That seems incorrect to me. I'd propose two different alternative solutions:

  • add non-null fuel into the mask finding the retiring generator/months. Right now it is only selecting for non-full MWh.
  • change the selection to make it include retiring records for the full year (or for some set number of future months) instead of in perpetuity.

If either of these options sounds okay to you I'm happy to try to implement and use your example plant 50937 as a test case to make sure we achieve your desired behavior.

@grgmiller
Copy link
Collaborator Author

grgmiller commented Aug 28, 2024

Thanks @cmgosnell. I think what you proposed makes sense, and I think aligns with what I was thinking, although maybe I didn't explain clearly.

In my original example, I said:

For example, the retirement date of both generators at plant 50937 was "2022-09-01", and the previous behavior was to drop all data after september, even though this plant continued to report fuel consumption after september. This fix keeps all report dates through the end of the current year to avoid dropping this data.

In this case, by "current year" I meant the year in which it retired (2022), not the actual current year today (2024). So for this plant, if it retired in 2022-09-01, we'd only keep records through 2022-12-01 (which I think is consistent with what you were proposing?).

I think some of the confusion might be because in OGE we only run the pipeline a single year at a time, rather than for all years, so the code as I proposed, while compatible with our use case, might not work as intended when running all years.

I also think that adding non-null fuel (in addition to MWh) also makes sense.

@cmgosnell
Copy link
Member

cmgosnell commented Aug 28, 2024

hey @grgmiller ! Thanks for your response! Yea i was wondering/fuzzily remembering that in OGE you are working with only one year. So yes I think I understand your need here and will tweak your pr a bit to enable retiring generator (status=retired but either burning fuel or making MWh) records to show up for every month in the report year which they are retiring.

@aesharpe aesharpe added the bug Things that are just plain broken. label Sep 3, 2024
@aesharpe aesharpe self-requested a review September 3, 2024 18:52
@zaneselvans zaneselvans added the eia923 Anything having to do with EIA Form 923 label Sep 4, 2024
@cmgosnell cmgosnell requested a review from aesharpe September 5, 2024 18:49
@cmgosnell cmgosnell enabled auto-merge September 5, 2024 18:53
Comment on lines +17 to +19
* Include more retiring generators in the net generation and fuel consumption
allocation. Thanks to :user:`grgmiller` for this contirbution :pr:`3690`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, could update this to specifically call out the inclusion of mid-year retirement cases.

@cmgosnell cmgosnell added this pull request to the merge queue Sep 5, 2024
@aesharpe aesharpe removed this pull request from the merge queue due to a manual request Sep 5, 2024
@cmgosnell cmgosnell added this pull request to the merge queue Sep 5, 2024
Merged via the queue into main with commit 382a32d Sep 5, 2024
17 checks passed
@cmgosnell cmgosnell deleted the grgmiller/fix_allocate_gen_fuel_bugs branch September 5, 2024 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis Data analysis tasks that involve actually using PUDL to figure things out, like calculating MCOE. bug Things that are just plain broken. community eia923 Anything having to do with EIA Form 923
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants