Transform f1_elc_op_mnt_expn
#2162
Codecov Report
Base: 85.5% // Head: 85.5% // Increases project coverage by
Additional details and impacted files
@@          Coverage Diff          @@
##            dev   #2162   +/-   ##
=====================================
  Coverage  85.5%   85.5%
=====================================
  Files        73      73
  Lines      8851    8865   +14
=====================================
+ Hits       7569    7583   +14
  Misses     1282    1282
…able, add align_row_numbers_dbf param and column rename param for xbrl
…es_to_report_year_instant_xbrl function
Add annual_oandm_expense and oandm_expense_type to fields.py. Update the resource_metadata scheme for electric_oandm_ferc1 to include all columns we want in the output. Add a process_dbf function to the transformer class for electric_oandm_ferc1 to get rid of one row that interferes with drop_duplicates. Add duration_xbrl rename params. Add wide_to_tidy, drop_duplicate_rows_dbf, and merge_xbrl_metadata params.
…rom the previous year
…1 table because dropping of non annual rows is already taken care of by the process_duration_xbrl function's select_current_year_annual_records_duration_xbrl
Comparing current and previous year's data

This table contains both previous year and current year data. The "previous year" data is duplicate information and gets removed. However, before it gets deleted, I want to use it to sanity check the "current year" data. Ideally I would add a check into the code, but first I wanted to know if it would actually pass...

I did some preliminary tests on the dbf data to see how often the "current year" data matches next year's "previous year" data. Here's what I found:

You can only compare data when there are multiple consecutive years of data reported. If you only have one year of data, the "current year" data cannot be compared to next year's "previous year" data. Therefore only a certain subset of the data can be compared using this method.
These non-matches consist of:
Now, the question is... what should we do about this?

A) Set a threshold that non-matching current-previous values cannot comprise more than 2% of the data.
B) Keep the previous year data in the table so users can do their own comparisons / select the data they want to use (vs. removing it, which is what we do now).
C) Create a flag when the previous year and current year data don't match up, indicating a likely data error and a value not to trust.
D) Ignore it because it's only 1.36% of the data (even less if you're counting all the rows).

As much as I love getting rid of ugly data, my gut says our users would appreciate it if we kept the previous year data in there and added flags (i.e., a combination of A, B, and C). But let's ask RMI what they think.

This is now encapsulated in Issue #2164.
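For reference, here is a rough sketch of how the current-vs-previous comparison, the flag from option C, and the threshold from option A could fit together in pandas. It is not the code in this PR: the column names (utility_id, expense_type, report_year, current_year_value, prev_year_value), the function names, and the toy numbers are all made up for illustration, and the real table's keys may differ.

```python
import pandas as pd

# Hypothetical key columns; the real table's identifiers may differ.
ID_COLS = ["utility_id", "expense_type"]


def flag_prev_year_mismatches(df: pd.DataFrame, tol: float = 0.0) -> pd.DataFrame:
    """Compare each year's current value with next year's "previous year" value."""
    # Relabel next year's "previous year" value with the year it describes,
    # so it lines up with that year's "current year" row.
    nxt = df[ID_COLS + ["report_year", "prev_year_value"]].copy()
    nxt["report_year"] = nxt["report_year"] - 1
    nxt = nxt.rename(columns={"prev_year_value": "restated_value"})
    out = df.merge(nxt, on=ID_COLS + ["report_year"], how="left")
    # Rows without a consecutive following year cannot be compared.
    out["comparable"] = out["restated_value"].notna()
    # Option C: flag rows where the two reported values disagree.
    out["mismatch"] = out["comparable"] & (
        (out["current_year_value"] - out["restated_value"]).abs() > tol
    )
    return out


def mismatch_share(flagged: pd.DataFrame) -> float:
    """Share of comparable rows whose values disagree."""
    comparable = flagged["comparable"].sum()
    return float(flagged["mismatch"].sum() / comparable) if comparable else 0.0


# Toy data: utility 1 reports three consecutive years, utility 2 only one.
df = pd.DataFrame(
    {
        "utility_id": [1, 1, 1, 2],
        "expense_type": ["fuel", "fuel", "fuel", "fuel"],
        "report_year": [2019, 2020, 2021, 2019],
        "current_year_value": [100.0, 110.0, 120.0, 50.0],
        "prev_year_value": [90.0, 100.0, 115.0, 45.0],
    }
)

flagged = flag_prev_year_mismatches(df)
share = mismatch_share(flagged)

# Option A: check the share against a threshold (2% as proposed above).
MAX_MISMATCH_SHARE = 0.02
exceeds_threshold = share > MAX_MISMATCH_SHARE
```

In the toy data only the 2020 row is flagged, because the 2021 report restates it as 115 rather than 110. The 2021 row and utility 2's single year have nothing to compare against, so they are left out of the share rather than counted as matches.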
This looks like it was relatively straightforward 🎉
I have mostly naming suggestions.
My high-level naming request is that we work on the name of the table itself. oandm takes me a while to understand. I'm wondering if operation_and_maintenance_ferc1 would work? electric_operation_and_maintenance_ferc1 is so long but feels correct. Or electric_opex_ferc1 or opex_electric_ferc1 or just opex_ferc1?
Is the electric necessary? Probably, because there are other references to non-electric maintenance... Does our current shorthand of opex include maintenance? Some quick internet searching tells me yes, it does. And we have opex_maintenance, so I hope it does lol. In that case I might go with electric_opex_ferc1.
…nsformer for consistency with other records. Use this to drop the bad duplicate from 2002 instead of doing that inside the process_dbf function
…d table transformer name from ElectricOperationAndMaintenanceFerc1TableTransformer to ElectricOpexFerc1TableTransformer
…xpense to expense and oandm_expense_type to expense_type
…y) to release notes
…les in the validate/ferc1_test.py module
wahoo!
I think your tests are failing because you have not added
Ah yep, I just saw that, thanks.
@cmgosnell is working on an update to the settings validations in #2168 that would catch this (since she and I also both had our own run-in with this error).
Can we delete this branch?
Deleted
Overall, I think the output format of this table is great. But without labels for operation/maintenance expense and steam/nuclear/transmission/etc., it's difficult to compare the output table to the original .pdf, which is useful for understanding the categorizations of the original table. Alphabetical organization mixes that up, so I'd keep it in order by row number. Or is the plan still to solve this with connections from each expense_type or ferc_account to those labels?