
Add metadata for ATB, EIA 930 and AEO data #3474

Merged (4 commits) on Mar 18, 2024

Conversation

e-belfer

@e-belfer e-belfer commented Mar 15, 2024

Overview

Closes #3473.

What problem does this address?
Adds new data sources to pudl.metadata.sources to enable archiving.

What did you change?
Added new datasets to our sources.

Testing

How did you make sure this worked? How can a reviewer verify this?
Review existing docs and check links to make sure all information is correct.

To-do list


@e-belfer e-belfer self-assigned this Mar 15, 2024
@e-belfer e-belfer added eia930 Related to the EIA Form 930 gridlab Work related to open modeling input data integration funded/coordinated by GridLab eiaaeo EIA Annual Energy Outlook nrelatb NREL's Annual Technology Baseline data metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. labels Mar 15, 2024
@e-belfer e-belfer requested a review from zaneselvans March 15, 2024 19:18

@zaneselvans zaneselvans left a comment


I could be wrong, but I think working_partitions is only supposed to list the partitions that we are actually extracting and transforming and expect to work. For NREL ATB and EIA AEO, I think that will only be the most recent release.

"license_raw": LICENSES["us-govt"],
"license_pudl": LICENSES["cc-by-4.0"],
},
"eia_aeo": {

This ID is going to end up everywhere. Do we want to use eia_aeo or eiaaeo? Personally, I think eia_bulk_elec isn't great because it differs from all of the other data source IDs we use, which are single alphanumeric strings like eia923, so this looks like a different format. Similar question for nrel_atb below.
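The convention described in this comment can be written down as a small check. This is only a sketch: the pattern and the `is_conventional_source_id` helper are hypothetical names introduced for illustration, and the rule itself (lowercase letters and digits, no separators) is inferred from the eia923 example rather than taken from PUDL.

```python
import re

# Assumed convention, inferred from the comment above: a data source ID is a
# single lowercase alphanumeric string with no separators (e.g. "eia923").
SOURCE_ID_PATTERN = re.compile(r"[a-z][a-z0-9]*")


def is_conventional_source_id(source_id: str) -> bool:
    """Return True if the ID is one unbroken lowercase alphanumeric string."""
    return SOURCE_ID_PATTERN.fullmatch(source_id) is not None


# Print each candidate ID alongside whether it follows the assumed convention.
for candidate in ["eia923", "eiaaeo", "nrelatb", "eia_aeo", "nrel_atb", "eia_bulk_elec"]:
    print(candidate, is_conventional_source_id(candidate))
```

Under that assumed rule, the underscore-separated IDs are the ones that stand out.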

Member Author

@e-belfer e-belfer Mar 18, 2024


I find this more legible, but I agree it's less consistent with how we've done things thus far. I'll drop the underscore.

},
"field_namespace": "nrel_atb",
"working_partitions": {
"years": sorted(set(range(2015, 2024))),

I only saw 2019-2023 in their cloud buckets as Parquet files. The older data is in a variety of other formats (like spreadsheets) and would have to be downloaded from other locations. I could see potentially archiving it, but I doubt we'll want to do any transforms. Given the tight hours we've got, I think we should just archive all the data in the current format for now and initially integrate only the most recent year, which I think means we just want years: [2023] for the working_partitions, right?

aws s3 ls --no-sign-request s3://oedi-data-lake/ATB/electricity/parquet/
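To make the distinction between archived and working partitions concrete, here is a minimal sketch. The `nrel_atb_sketch` dictionary, `ARCHIVED_PARQUET_YEARS`, and `is_working_year` are hypothetical names for illustration only; the years come from the comment above, and the real metadata entry has more fields than shown.

```python
# Years that the comment above reports as available in Parquet format.
ARCHIVED_PARQUET_YEARS = list(range(2019, 2024))  # 2019 through 2023

# Hypothetical, trimmed-down metadata entry. Only "working_partitions" matters
# here: it lists the partitions expected to be extracted and transformed.
nrel_atb_sketch = {
    "working_partitions": {"years": [2023]},
}


def is_working_year(source: dict, year: int) -> bool:
    """Return True if the year is listed as an ETL-able partition."""
    return year in source["working_partitions"]["years"]


# Years that would be archived but not yet integrated under this proposal.
not_yet_integrated = [
    year for year in ARCHIVED_PARQUET_YEARS if not is_working_year(nrel_atb_sketch, year)
]
print(not_yet_integrated)
```

The point of the sketch is that the archive can cover more years than the working partitions do, and the two lists can then grow independently.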

},
"field_namespace": "eia",
"working_partitions": {
"years": sorted(set(range(2014, 2024))),

For the moment I think we'll probably only be extracting and working with the most recent (2023) data, although we can archive all of them. But IIRC the working_partitions are the ones that are supposed to be ETL-able.

@@ -607,6 +691,41 @@
"license_raw": LICENSES["us-govt"],
"license_pudl": LICENSES["cc-by-4.0"],
},
"nrel_atb": {
"title": "NREL Annual Technology Baseline (ATB)",

NREL publishes an ATB for both Electricity and Transportation. It seems unlikely that we'll ever be working with the transportation data, but maybe it's worth noting that we're talking about the electricity data in the title and description.

Suggested change
- "title": "NREL Annual Technology Baseline (ATB)",
+ "title": "NREL Annual Technology Baseline (ATB) for Electricity",

"title": "NREL Annual Technology Baseline (ATB)",
"path": "https://atb.nrel.gov/",
"description": (
"The NREL Annual Technology Baseline (ATB) publishes annual projections of "

Suggested change
- "The NREL Annual Technology Baseline (ATB) publishes annual projections of "
+ "The NREL Annual Technology Baseline (ATB) for Electricity publishes annual projections of "

@e-belfer e-belfer requested a review from zaneselvans March 18, 2024 13:41
Comment on lines 304 to 308
"half_year": [
f"{str(q).lower()}h{half}"
for q in pd.period_range(start="2015", end="2023", freq="Y")
for half in [1, 2]
][1:-1] # Begins in H2 of 2015 and currently ends in H1 of 2024

I think you meant to include all of 2023 and the first half of 2024, right? Sorry I didn't catch this before.

Suggested change
- "half_year": [
-     f"{str(q).lower()}h{half}"
-     for q in pd.period_range(start="2015", end="2023", freq="Y")
-     for half in [1, 2]
- ][1:-1] # Begins in H2 of 2015 and currently ends in H1 of 2024
+ "half_year": [
+     f"{year}h{half}" for year in range(2015, 2025) for half in [1, 2]
+ ][1:-1] # Begins in H2 of 2015 and currently ends in H1 of 2024

Member Author


Oops yes good catch!
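The off-by-one-year issue in this thread is easy to check by hand. The sketch below uses a hypothetical `half_years` helper and plain `range` in place of `pd.period_range`, assuming the period labels render as plain years such as "2015"; it is meant to illustrate the slicing, not to reproduce the merged code.

```python
def half_years(first_year: int, last_year: int) -> list[str]:
    """Build 'YYYYhN' labels from H2 of first_year through H1 of last_year."""
    labels = [
        f"{year}h{half}"
        for year in range(first_year, last_year + 1)
        for half in [1, 2]
    ]
    # Dropping the first and last entries removes H1 of the first year
    # and H2 of the last year.
    return labels[1:-1]


# Stand-in for the original version, whose last year was 2023.
original = half_years(2015, 2023)
# The suggested version, range(2015, 2025), whose last year is 2024.
suggested = half_years(2015, 2024)

print(original[0], original[-1], len(original))
print(suggested[0], suggested[-1], len(suggested))
```

Because the `[1:-1]` slice trims the final half year, the last year in the range has to be 2024 for the list to end in H1 of 2024, which matches the inline comment on the code.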

@e-belfer e-belfer requested a review from zaneselvans March 18, 2024 15:25
@e-belfer e-belfer enabled auto-merge March 18, 2024 15:27
@e-belfer e-belfer added this pull request to the merge queue Mar 18, 2024
Merged via the queue into main with commit b026139 Mar 18, 2024
12 checks passed
@e-belfer e-belfer deleted the gridlab-pudl-metadata branch March 18, 2024 17:08

Successfully merging this pull request may close these issues.

Define eia_aeo, nrel_atb and eia930 data source metadata
2 participants