Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform f1_elc_op_mnt_expn #2162

Merged
merged 20 commits into from
Dec 30, 2022
Merged

Transform f1_elc_op_mnt_expn #2162

merged 20 commits into from
Dec 30, 2022

Conversation

aesharpe
Copy link
Member

No description provided.

@aesharpe aesharpe changed the title Transform f1 elc op mnt expn Transform f1_elc_op_mnt_expn Dec 27, 2022
@codecov
Copy link

codecov bot commented Dec 27, 2022

Codecov Report

Base: 85.5% // Head: 85.5% // Increases project coverage by +0.0% 🎉

Coverage data is based on head (1fb5d46) compared to base (6a02432).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2162   +/-   ##
=====================================
  Coverage   85.5%   85.5%           
=====================================
  Files         73      73           
  Lines       8851    8865   +14     
=====================================
+ Hits        7569    7583   +14     
  Misses      1282    1282           
Impacted Files Coverage Δ
src/pudl/extract/ferc1.py 86.0% <ø> (ø)
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/metadata/resources/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/ferc1.py 95.2% <100.0%> (+<0.1%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

…able, add align_row_numbers_dbf param and column rename param for xbrl
Add annual_oandm_expense and oandm_expense_type to fields.py

Update the resource_metadata scheme for electric_oandm_ferc1 to include all columns we want in the output.

Add process_dbf function to the transformer class for electric_oandm_ferc1 to get rid of one row that interfers with the drop_duplicates

Add duration_xbrl rename params

Add wide_to_tody, drop_duplicate_rows_dbf, and merge_xbrl_metadata params
…1 table because dropping of non annual rows is already taken care of by the process_duration_xbrl function's select_current_year_annual_records_duration_xbrl
@aesharpe
Copy link
Member Author

aesharpe commented Dec 28, 2022

Comparing current and previous year's data

This table contains both previous year and current year data. The "previous year" data is duplicate information and gets removed. However, before it gets deleted, I want to use it to sanity check the "current year" data. Ideally I would add a check into the code, but first I wanted to know if it would actually pass...

I did some preliminary tests on the dbf data to see how often the "current year" data matches next year's "previous year" data. Here's what I found:

You can only compare data when there are multiple consecutive years of data reported. If you only have one year of data, the "current year" data cannot be compared to next year's "previous year" data. Therefore only a certain subset of the data can be compared using this method.

  • The full df is 500042 rows long.
  • There are 461166 rows that are not the last year of their utility/expense type category (these rows have no "previous year" data to compare their "current year" data to.
  • Of these eligible rows, 6283 don't match their "current" values with the next year's "previous" values. That's only 1.36%!!

These non-matches consist of:

  • 1.37% (86 rows) sign miss-matches (one is positive and the other is negative)
  • The rest of the values are simply different from one another. Considering the absolute value of each of the values, the median percent difference between the miss-matched current and previous year values is about 12%.

Now, the question is...what should we do about this?

A) Set a threshold that non-matching current-previous values cannot comprise more than 2% of the data.

B) Keep the previous year data in the table so users can do their own comparisons / select the data they want to use (vs. removing it which is what we do now).

C) Create a flag when the prev year and current year data don't match up, indicating a likely data error/not to trust that value.

D) Ignore it because its only 1.36% of the data (even less if you're counting all the rows),

As much as I love getting rid of ugly data, my gut says our users would appreciate it if we keep the previous year data in there and added flags (i.e., a combination of A, B, and C). But let's ask RMI what they think.


This is now encapsulated in Issue #2164

@aesharpe aesharpe requested a review from cmgosnell December 28, 2022 20:04
@aesharpe aesharpe marked this pull request as ready for review December 29, 2022 20:36
Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was relatively straightforward 🎉

I have mostly naming suggestions.

My high-level naming request is that we work on the name of the table itself. oandm takes me a while to understand. I'm wondering if operation_and_maintenance_ferc1 would work? electric_operation_and_maintenance_ferc1 is so long but feels correct. Or electric_opex_ferc1 or opex_electric_ferc1 or just opex_ferc1?

Is the electric necessary? Probably because there are other references to non-electric maintenance... Does our current shorthand of opex include maintenance? some quick internet searching tells me yes it does. and we have opex_maintenance so i hope it does lol. In that case I might go with electric_opex_ferc1.

…nsformer for consistency with other records. Use this to drop the bad duplicate from 2002 instead of doing that inside the process_dbf function
@aesharpe
Copy link
Member Author

electric_opex_ferc1

electric_opex_ferc1 it is!

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wahoo!

@zaneselvans
Copy link
Member

I think your tests are failing because you have not added electric_operations_and_maintenance_expenses_320 to the list of XBRL tables which need to be extracted in etl_fast.yml

@aesharpe
Copy link
Member Author

I think your tests are failing because you have not added electric_operations_and_maintenance_expenses_320 to the list of XBRL tables which need to be extracted in etl_fast.yml

Ah yep, I just saw that, thanks.

@zaneselvans
Copy link
Member

@cmgosnell is working on an update to the settings validations in #2168 that would catch this (since she and I also both had our own run-in with this error).

@zaneselvans zaneselvans linked an issue Dec 30, 2022 that may be closed by this pull request
2 tasks
@aesharpe aesharpe merged commit 50d68c3 into dev Dec 30, 2022
@zaneselvans
Copy link
Member

Can we delete this branch?

@aesharpe aesharpe deleted the transform_f1_elc_op_mnt_expn branch December 30, 2022 20:03
@aesharpe
Copy link
Member Author

Deleted

@jrea-rmi
Copy link
Collaborator

jrea-rmi commented Jan 3, 2023

Overall, I think the output format of this table is great.

Without labels of operation/maintenance expense and steam/nuclear/transmission/etc., it's difficult to compare the output table to the original .pdf, which is useful for understanding the categorizations of the original table. Alphabetical organization mixes that up, I'd keep it in order by row number.

Or, is the plan still to solve this with connections from each expense_type or ferc_account to those labels?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transform f1_elc_op_mnt_expn xbrl + dbf
4 participants