Matching historical and sspXYZ using df with incomplete columns make the historical period disappear #286

Closed
1 task
coxipi opened this issue Nov 14, 2023 · 5 comments · Fixed by #287
Labels
bug Something isn't working

Comments

@coxipi
Contributor

coxipi commented Nov 14, 2023

Setup Information

  • xscen version: 0.7.3-beta
  • Python version: 3.11.5
  • Operating System: Linux Mint

Description

I parse a local directory to create a df. The following fields:

"activity"
'bias_adjust_institution'
'bias_adjust_project'

are unspecified, which causes the historical period to disappear when I proceed to make a catalog that matches hist and fut.

Steps To Reproduce

import xscen as xs
from xscen.catutils import parse_directory

df = parse_directory(
    directories=["/home/eridup1/tank/etiages/CMIP6_ornl_gov/"],
    patterns=[
        "{variable}_{?}_{source}_{experiment}_{member}_{?grid}_{DATES}.nc"
    ],
    homogenous_info={
        "mip_era": "CMIP6",
        "type": "simulation",
        "institution": "our",
        "processing_level": "extracted",
        "xrfreq": "MS",
        "frequency": "mon",
        "domain": "global",
        # I need to fill some or all of these fields, else historical
        # datasets just disappear with match_hist_and_fut
        "activity": ".",
        "bias_adjust_institution": ".",
        "bias_adjust_project": ".",
    },
    read_from_file=["variable", "date_start", "date_end"],
)
subcat = xs.ProjectCatalog.from_df(df)
ds_dict = xs.search_data_catalogs(
    subcat,
    variables_and_freqs={"rsds": "MS", "rsus": "MS"},
    match_hist_and_fut=True,
)

The elements in ds_dict contain the historical period only when I fill the fields specified above. Otherwise, the historical period is ignored.

If I remove match_hist_and_fut=True, the historical periods remain, and they're separated. I checked, and both sspXYZ and historical have [None, None, None] for the three fields above, so the cause doesn't seem to be mismatched fields. What makes the difference is the presence of None rather than some arbitrary string.

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to address this bug.
@coxipi coxipi added the bug Something isn't working label Nov 14, 2023
@juliettelavoie
Contributor

The problem is the missing activity. The relevant line is:

for activity_id in set(sdf.activity) - {"HighResMip", np.NaN}:

The code assumes that you are passing an official catalog with all the right columns filled in (with historical in activity CMIP and sspXYZ in ScenarioMIP).
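
To illustrate why an unfilled activity column leaves nothing to match, here is a simplified stand-in for that loop. It is not xscen's code: `activities_to_match` is a hypothetical helper, and it assumes the unfilled values behave as nulls that the loop skips.

```python
import numpy as np
import pandas as pd


def activities_to_match(df, skipped=("HighResMip",)):
    """Hypothetical helper: activity values a matching loop would iterate over."""
    # Null activities are dropped, mirroring the exclusion of np.NaN in the loop.
    values = df["activity"].dropna().unique()
    return sorted(v for v in values if v not in skipped)


# A catalog where "activity" was never filled in.
unfilled = pd.DataFrame(
    {"experiment": ["historical", "ssp245"], "activity": [np.nan, np.nan]}
)
# The same catalog with the official activity values.
filled = unfilled.assign(activity=["CMIP", "ScenarioMIP"])

print(activities_to_match(unfilled))  # nothing to iterate over
print(activities_to_match(filled))
```

With no activity to iterate over, the loop body never runs, so no historical dataset gets attached to a future one.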

Maybe we could throw a warning if necessary columns are not filled?
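
A minimal sketch of what such a warning could look like, using only pandas and the standard library. The function name `warn_if_unfilled` and the choice of required columns are assumptions for illustration, not what xscen implements.

```python
import warnings

import pandas as pd


def warn_if_unfilled(df, required=("activity", "experiment")):
    """Hypothetical check: warn when a column needed for matching is absent or all null."""
    unfilled = [
        col for col in required if col not in df.columns or df[col].isna().all()
    ]
    if unfilled:
        warnings.warn(
            f"Columns {unfilled} are not filled; historical and future "
            "simulations may not be matched.",
            UserWarning,
            stacklevel=2,
        )
    return unfilled


# Example catalog where "activity" was left unspecified.
example_df = pd.DataFrame(
    {"experiment": ["historical", "ssp245"], "activity": [None, None]}
)
```

Calling `warn_if_unfilled(example_df)` would emit a `UserWarning` naming the `activity` column instead of silently dropping the historical period.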

@coxipi
Contributor Author

coxipi commented Nov 14, 2023

I see. I had the impression that the "experiment" column would be used for this. I think a warning would be good, yes, because otherwise I would characterize this as a silent failure. Then again, if my use case is just not the intended way to work with these tools, feel free to ignore this issue.

@juliettelavoie
Contributor

Both columns are used (activity and experiment).
I can do a PR to add a warning. I don't think passing your own catalog is the "wrong" way to use this, even if it is not the typical case. Though, in general, I would encourage you to fill in as many columns as you can when creating your own catalogue: https://xscen.readthedocs.io/en/latest/columns.html
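
For a self-made catalog, one way to fill the activity column before building the catalog is to derive it from experiment, following the convention mentioned earlier in this thread (historical under CMIP, sspXYZ under ScenarioMIP). This is only a sketch with pandas: `fill_activity_from_experiment` is a hypothetical helper, not part of xscen.

```python
import pandas as pd


def fill_activity_from_experiment(df):
    """Hypothetical helper: fill missing 'activity' values from 'experiment'."""
    out = df.copy()
    # Work on an object column so string values can be assigned safely.
    out["activity"] = out["activity"].astype(object)
    missing = out["activity"].isna()
    is_hist = out["experiment"] == "historical"
    is_ssp = out["experiment"].astype(str).str.startswith("ssp")
    out.loc[missing & is_hist, "activity"] = "CMIP"
    out.loc[missing & is_ssp, "activity"] = "ScenarioMIP"
    return out


# Example: two unfilled rows and one row that already has a value.
parsed = pd.DataFrame(
    {
        "experiment": ["historical", "ssp585", "ssp245"],
        "activity": [None, None, "Custom"],
    }
)
completed = fill_activity_from_experiment(parsed)
```

Only missing values are filled, so anything already set in the parsed DataFrame is left untouched.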

@aulemahal
Collaborator

I can't remember the reason we explicitly skip np.NaN as the activity. I guess HighResMIP is skipped because such an SSP wouldn't be compatible with a CMIP hist, but why skip NaN?

@juliettelavoie
Contributor

I don't remember either...

@juliettelavoie juliettelavoie mentioned this issue Nov 16, 2023
6 tasks

3 participants