Missing climate models and ensemble members #13

Open
jirvin16 opened this issue Jul 4, 2024 · 1 comment
Labels: dataset, fixed in v2 (Issue that will be addressed in the next version)

Comments

@jirvin16

jirvin16 commented Jul 4, 2024

Thank you for putting together an amazing dataset for the AI+climate community!

It looks like the dataset hosted on Hugging Face is missing several files. It seems to have only 21 climate models (rather than the 36 stated in the paper), and among the included climate models, several ensemble members seem to be missing (e.g., CAMS-CSM1-0 has only 1, but the paper states it has 2). I believe several scenarios are missing as well.

Would it be possible to upload the missing data, or was their exclusion intentional?

Thanks again.

@liellnima
Collaborator

Hi Jeremy,

I am happy you find ClimateSet helpful!

Yes, the dataset is indeed missing several files (and still has some issues here and there). To separate the issues:

  • Climate models: We have included only 21 climate models because we ran into issues with the remaining ones that we still need to track down. Of those 21, I recommend using only the following 15, since our data loader has had issues with the others in the past: AWI-CM-1-1-MR, BCC-CSM2-MR, CAS-ESM2-0, CNRM-CM6-1-HR, EC-Earth3, EC-Earth3-Veg-LR, FGOALS-f3-L, GFDL-ESM4, INM-CM4-8, INM-CM5-0, MPI-ESM1-2-HR, MRI-ESM2-0, NorESM2-LM, NorESM2-MM, TaiESM1.
  • Ensemble members: Right now, we provide only 1 ensemble member per climate model in the core dataset, since we want to make sure that no single climate model is overrepresented in the data. However, we would like to add, for example, the 97 ensemble members of the EC-Earth3-Veg model, so that it can be used to assess intra-model variability.
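
For anyone who wants to restrict their experiments to the recommended models in the meantime, here is a minimal sketch in plain Python. It is not part of the ClimateSet code base: `RECOMMENDED_MODELS` and `split_by_recommendation` are hypothetical names used only for illustration, and the list of available models is assumed to come from wherever you keep your local copy of the data.

```python
# Hypothetical helper, not part of ClimateSet: keeps only the 15 climate
# models recommended in this comment.
RECOMMENDED_MODELS = frozenset({
    "AWI-CM-1-1-MR", "BCC-CSM2-MR", "CAS-ESM2-0", "CNRM-CM6-1-HR",
    "EC-Earth3", "EC-Earth3-Veg-LR", "FGOALS-f3-L", "GFDL-ESM4",
    "INM-CM4-8", "INM-CM5-0", "MPI-ESM1-2-HR", "MRI-ESM2-0",
    "NorESM2-LM", "NorESM2-MM", "TaiESM1",
})


def split_by_recommendation(available):
    """Partition model names into (recommended, other), preserving input order."""
    recommended = [name for name in available if name in RECOMMENDED_MODELS]
    other = [name for name in available if name not in RECOMMENDED_MODELS]
    return recommended, other


# Example: CAMS-CSM1-0 is in the hosted dataset but not on the recommended list.
recommended, other = split_by_recommendation(
    ["AWI-CM-1-1-MR", "CAMS-CSM1-0", "TaiESM1"]
)
```

Matching is by exact model name, so a name with different capitalization or a typo ends up in `other` rather than being silently accepted.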

In summary: the exclusion is intentional, but we would like to add the missing data.

We are currently redoing the whole ClimateSet pipeline and hope to provide a ClimateSet Python package that includes the full dataset and a smooth pipeline by the end of this year (2024). Unfortunately, everyone working on this (including me) is doing it on the side, alongside the main tasks and research projects that keep us occupied.

I think our new approach will give us a cleaner setup and dataset and help us track down the issues with the currently missing datasets :)

If you want to contribute and accelerate things, please let me know - I am super happy to include anyone who has time for this :)

@liellnima liellnima self-assigned this Jul 16, 2024
@liellnima liellnima added fixed in v2 Issue that will be addressed in the next version dataset labels Jul 16, 2024