-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set_geometry #889
set_geometry #889
Conversation
# Conflicts: # CHANGELOG.md
…really _are_ failing now
This reverts commit d6d2859.
# Conflicts: # CHANGELOG.md # climada/hazard/base.py # climada/hazard/centroids/centr.py # climada/hazard/centroids/test/test_centr.py # doc/tutorial/climada_hazard_Hazard.ipynb
# Conflicts: # climada/util/coordinates.py
Quick note: while the fix works, I think this is not a very forward-looking fix. Currently, the method assumes that the inputs are geodataframes without a geometry column, but instead latitude and longitude columns. This is what we are trying to remove from the codebase (done in centroids, ongoing in exposures). I suggest that we at least flag this method as one that should be refactored as soon as exposures have proper geodataframes. |
You don't need a temporary column. |
Er - are you sure? When running with a scheduler, dask.dataframe complains if there is no column to assign the values to. Isn't it? |
No need to flag this. Any occurrence of geodataframes with latitude columns will eventually show up in the list of failed tests. 😁 |
Ah, sorry. I have no idea about dask dataframes 🙈 For the single process using only GeoDataFrame, it should not be necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this! I think the CRS handling can be improved a bit.
However, I am wondering if the set_df_geometry_points
needs the bit about the scheduler at all. geopandas.points_from_xy
has become very efficient since Shapely v2.0, because the latter now uses multithreading. As the final geometry is anyway stored in a regular GeoDataFrame, and not in a distributed dask dataframe, I think we can safely remove the whole scheduler thing from set_df_geometry_points
and make the function much simpler.
The linter complains anyway that the scheduler part is not tested 😅
climada/util/coordinates.py
Outdated
try: | ||
crs = df_val.geometry.crs | ||
except AttributeError: | ||
crs = None | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any GeoDataFrame has the attribute .crs
. If it is not set, the attribute already returns None. So this can simply be
crs = df_val.crs if crs is None else crs # crs might now still be None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@peanutfun : I'm inclined to believe you but would like to separate that from this PR, as it is not related to #888 |
Changes proposed in this PR:
This PR fixes #888
PR Author Checklist
develop
)PR Reviewer Checklist