Fix buffer CRS #440

Merged
merged 6 commits into from
Aug 8, 2024

Conversation

RondeauG
Collaborator

@RondeauG RondeauG commented Aug 8, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • (If applicable) Documentation has been added / updated (for bug fixes / features).
  • (If applicable) Tests have been added.
  • This PR does not seem to break the templates.
  • CHANGELOG.rst has been updated (with summary of main changes).
    • Link to issue (:issue:number) and pull request (:pull:number) has been added.

What kind of change does this PR introduce?

  • The buffer argument in _subset_shape needs to be in the same units as the shapefile, so it's simpler to always project the shapefile to WGS84 before the call to clisops, when using tile_buffer.
  • Also a small fix in ensure_time to accurately catch missing timesteps.
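A minimal sketch of the unit issue described in the first point, not the actual xscen code. It assumes geopandas and shapely are available; the region, the UTM CRS and the 0.5 degree buffer are hypothetical values chosen for illustration. Once the shape is projected to WGS84, a buffer expressed in degrees shares the geometry's units:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical region stored in a projected CRS whose units are metres (UTM zone 18N).
region = gpd.GeoDataFrame(
    geometry=[box(500_000, 5_000_000, 600_000, 5_100_000)],
    crs="EPSG:32618",
)

# Assumed buffer, expressed in degrees (e.g. derived from a lat/lon grid resolution).
buffer_deg = 0.5

# Project to WGS84 so that the geometry is in degrees, like the buffer.
region_wgs84 = region.to_crs("EPSG:4326")

unit_before = region.crs.axis_info[0].unit_name        # units of the original CRS
unit_after = region_wgs84.crs.axis_info[0].unit_name   # units after reprojection

# Buffer the reprojected polygon directly with shapely; the distance is now in degrees.
polygon = region_wgs84.geometry.iloc[0]
buffered = polygon.buffer(buffer_deg)

print(unit_before, unit_after)
print(polygon.bounds)
print(buffered.bounds)
```

Applying the same 0.5 to the original UTM geometry would have meant half a metre, which is why the shape and the buffer need to agree on units before subsetting.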

Does this PR introduce a breaking change?

  • This changes the behaviour, but this is a bug fix.

Other information:

@aulemahal
Collaborator

@RondeauG Your fix is the right one. The change is hidden in the reindex that convert_calendar performs when missing is given. Previously, reindexing a single chunk did not change the number of chunks. Now the chunk size is preserved, so if the reindex adds points, they end up in an extra chunk at the end.

interpolate_na requires a single chunk along time, so chunk(time=-1) is the right solution.

Example:

import xarray as xr

ds = xr.open_zarr('xscen/docs/notebooks/_data/CMIP6_ScenarioMIP_NCC_NorESM2-MM_ssp245_r1i1p1f1_example-region.finer-grid.biasadjusted.day.zarr')

# Convert to a 360-day calendar, then gather the time axis into a single chunk.
dsc = ds.convert_calendar('360_day', align_on='date', missing=0).chunk(time=-1)

# Converting back adds time steps through a reindex; inspect the resulting chunks.
dsc.convert_calendar('standard', missing=-9999, align_on='year').chunks
# dask 2024.08.0: Frozen({'time': (25550, 17), 'lon': (4,), 'lat': (5,)})
# earlier dask:   Frozen({'time': (25567,), 'lon': (4,), 'lat': (5,)})

This comes from the first change listed in the dask changelog: https://docs.dask.org/en/stable/changelog.html#highlights
When xarray does a reindex, it indexes the array with a list of integers. If that list is longer than the initial array (i.e. we reindex onto a longer coordinate), the new dask preserves the chunk size, whereas earlier versions did not. According to the changelog, this is now more efficient.

Low-level example:

import dask.array as dsk

a = dsk.array([0, 1, 2, 3])   # chunks = (4,)

a[[0, 0, 0, 0, 0, 0, 1]]  # more indices in the indexer than elements in the array
# dask 2024.08.0: chunks = (4, 3)
# earlier dask:   chunks = (7,)
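The fix discussed above (rechunking to a single time chunk before interpolate_na) can be sketched on synthetic data. This is an illustration under assumptions, not the code changed in this PR: the dates, values and initial chunk size are made up, and the chunk layout right after the reindex depends on the installed dask version, so it is only printed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series with two missing days (hypothetical data).
full = pd.date_range("2000-01-01", periods=10, freq="D")
kept = full.delete([3, 6])
values = np.delete(np.arange(10.0), [3, 6])
da = xr.DataArray(values, dims="time", coords={"time": kept}).chunk(time=4)

# Reindexing onto the full axis inserts NaN for the missing days.
# How the time axis is chunked afterwards depends on the dask version.
reindexed = da.reindex(time=full)
print(reindexed.chunks)

# Gather the time axis into one chunk, then fill the gaps by linear interpolation.
single = reindexed.chunk(time=-1)
filled = single.interpolate_na(dim="time")

print(single.chunks)
print(filled.values)
```

Calling chunk(time=-1) right before the interpolation keeps the code independent of how a given dask version lays out chunks after a reindex.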

@RondeauG
Collaborator Author

RondeauG commented Aug 8, 2024

Thanks for the detective work!

Contributor

@Zeitsperre Zeitsperre left a comment

This receives the @aulemahal approval as well

@RondeauG RondeauG merged commit 183d2e3 into main Aug 8, 2024
16 checks passed
@RondeauG RondeauG deleted the fixbuffer branch August 8, 2024 18:55
3 participants