optimize concatenation of centroids #989
@NicolasColombi Thanks for coming up with this bugfix! If I understand correctly, the problem is that You solve the issue by introducing a new attribute and function, which I think is error-prone. I would suggest that

    def append(self, *centr):
        for cc in centr:
            if not u_coord.equal_crs(self.crs, cc.crs):
                raise ValueError(
                    f"The given centroids use different CRS: {self.crs}, {cc.crs}. "
                    "The centroids are incompatible and cannot be concatenated."
                )
        self.gdf = pd.concat([self.gdf] + [cc.gdf for cc in centr])

I think this is a better solution because calling
Hi @peanutfun, fine by me! With the way you propose we could even get rid completely of
Not necessarily. It's a bit unclear, but
Arguably,
Yes, that is why I thought that a single function could do both. But as you mentioned, it is probably better to keep them separate, one of the reasons being that all tests pass this way and it shouldn't compromise the rest of the code.
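The difference between concatenating on every append and concatenating once at the end can be sketched with plain pandas. This is only an illustration under stated assumptions: the helper names `concat_pairwise` and `concat_once` and the sample data are hypothetical, and ordinary `DataFrame` objects stand in for the `GeoDataFrame` held by the centroids.

```python
import pandas as pd


def concat_pairwise(frames):
    """Grow the result one frame at a time (one concat call per frame)."""
    result = frames[0]
    for frame in frames[1:]:
        # Each call copies everything accumulated so far into a new table.
        result = pd.concat([result, frame])
    return result


def concat_once(frames):
    """Collect all frames first, then concatenate them in a single call."""
    return pd.concat(frames)


# Hypothetical stand-in data: three small tables of coordinates.
frames = [
    pd.DataFrame({"lat": [float(i), i + 0.5], "lon": [10.0 + i, 10.5 + i]})
    for i in range(3)
]

pairwise = concat_pairwise(frames)
once = concat_once(frames)

# Both strategies yield the same table; only the number of copies differs.
print(pairwise.equals(once))
```

Both helpers return the same rows in the same order. The pairwise version re-copies the accumulated rows on every iteration, which is the cost that a single `pd.concat` over all inputs avoids.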
Thanks for updating the PR! It looks much cleaner now. Please fix the remaining small issues and update the PR description.
climada/hazard/centroids/centr.py
Outdated
"The centroids are incompatible and cannot be concatenated."
)
self.gdf = pd.concat([self.gdf, centr.gdf])
for cc in centr:
Please fix the linter issue: the variable name cc is too short, maybe use other?
@@ -331,11 +331,16 @@ def from_pnt_bounds(cls, points_bounds, res, crs=DEF_CRS):
}
)

def append(self, centr):
"""Append Centroids
def append(self, *centr):
Please add a dedicated test for append with multiple arguments
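A test along these lines might look like the sketch below. It does not use the real `Centroids` class: `ToyCentroids` is a hypothetical stand-in that stores a plain pandas `DataFrame` and compares CRS labels as strings (the actual method relies on `u_coord.equal_crs`), so the test names and data are assumptions made for illustration.

```python
import pandas as pd


class ToyCentroids:
    """Hypothetical stand-in for Centroids: a CRS label plus a table."""

    def __init__(self, lat, lon, crs="EPSG:4326"):
        self.crs = crs
        self.gdf = pd.DataFrame({"lat": lat, "lon": lon})

    def append(self, *others):
        # Validate every argument first so a failure leaves self unchanged.
        for other in others:
            if other.crs != self.crs:  # assumption: plain string comparison
                raise ValueError(
                    f"The given centroids use different CRS: {self.crs}, {other.crs}."
                )
        # Concatenate once, after all inputs have been checked.
        self.gdf = pd.concat([self.gdf] + [other.gdf for other in others])


def test_append_multiple():
    base = ToyCentroids([0.0], [0.0])
    first = ToyCentroids([1.0, 2.0], [1.0, 2.0])
    second = ToyCentroids([3.0], [3.0])
    base.append(first, second)
    assert base.gdf["lat"].tolist() == [0.0, 1.0, 2.0, 3.0]


def test_append_crs_mismatch():
    base = ToyCentroids([0.0], [0.0])
    good = ToyCentroids([1.0], [1.0])
    bad = ToyCentroids([2.0], [2.0], crs="EPSG:3857")
    try:
        base.append(good, bad)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for mismatched CRS")
    # Nothing was appended because the check runs before the concatenation.
    assert len(base.gdf) == 1


test_append_multiple()
test_append_crs_mismatch()
```

The second test covers the case where only one of several arguments has a different CRS, which a single-argument test cannot exercise.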
Co-authored-by: Lukas Riedel <[email protected]>
@peanutfun I added a test to verify that
Thanks a lot for these improvements, they make a lot of sense and the performance gain is significant indeed! A few small comments, but it looks good to me overall.
climada/hazard/centroids/centr.py
Outdated
centr : Centroids
Centroids to append. The centroids need to have the same CRS.
centr : list
List of Centroids to append. The centroids need to have the same CRS.
Since centr is given as *args, it does not have to be a list but can be several centroids, no? Formulated differently, can this method be called these two ways with equivalent results or does only one way work?
centr.append(new_centrs1, new_centrs2)
centr.append([new_centrs1, new_centrs2])
To my understanding, the code as currently written would work if you call either centr.append(new_centrs1, new_centrs2) or centr.append(*[new_centrs1, new_centrs2]) but would not work calling centr.append([new_centrs1, new_centrs2]), since it wouldn't unpack the list into single centroid objects before the loop for other in centr which checks the CRS.
@NicolasColombi and @peanutfun, should we maybe consider adapting the method so that it would also work when passing a list without *?
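The three call styles discussed here can be checked with a small stand-in. `SketchCentroids` and `try_call` below are hypothetical names, and the class only mimics the `*args` signature and the CRS loop; it is not the actual `Centroids` implementation.

```python
class SketchCentroids:
    """Hypothetical stand-in: only a CRS label and a list of points."""

    def __init__(self, points, crs="EPSG:4326"):
        self.points = list(points)
        self.crs = crs

    def append(self, *others):
        for other in others:
            # A list has no .crs attribute, so passing a list fails here.
            if other.crs != self.crs:
                raise ValueError("The given centroids use different CRS.")
        for other in others:
            self.points.extend(other.points)


def try_call(call):
    """Return 'ok' or the name of the exception raised by call()."""
    try:
        call()
    except Exception as err:
        return type(err).__name__
    return "ok"


new_centrs1 = SketchCentroids([(1, 1)])
new_centrs2 = SketchCentroids([(2, 2)])

# Separate positional arguments.
separate = try_call(lambda: SketchCentroids([(0, 0)]).append(new_centrs1, new_centrs2))
# A list unpacked with * at the call site.
unpacked = try_call(lambda: SketchCentroids([(0, 0)]).append(*[new_centrs1, new_centrs2]))
# A list passed as one argument, without unpacking.
as_list = try_call(lambda: SketchCentroids([(0, 0)]).append([new_centrs1, new_centrs2]))

print(separate, unpacked, as_list)
```

In this sketch the first two calls succeed, while the third raises an `AttributeError` inside the CRS loop because the list itself is treated as a single centroid object.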
Hi Benoit, I think you are right. I will update the docstrings; it shouldn't be a list.
climada/hazard/centroids/centr.py
Outdated
others : list of Centroids
Centroids contributing to the union.
others : list
List of Centroids contributing to the union.
Same comment as above: if it's an *args then it's not a list, no?
Looks really good to me, thanks a lot!
Just fixed some minor typos in the docstrings.
As @bguillod highlighted, there could be some inconsistency associated with how the centr argument is passed to the method and how it is defined in the docstrings. I think we should either add a fix to support different ways of passing this argument or clarify the docstrings (if my understanding is correct here, happy to defer to your expertise @NicolasColombi & @peanutfun).
@sarah-hlsn exactly! I personally wouldn't modify it so that it can accept a list.
Makes perfect sense, we can just adapt the docstrings as you say :)
Update docstrings to clarify the type of argument of union and append
Co-authored-by: Sarah Hülsen <[email protected]>
May I merge? 👍 @peanutfun
Changes proposed in this PR:
Refer to #988 for detailed information on the profiling results.
Modify the hazard.append() function in climada/hazard/centroids/centr.py so that concatenation of centroids is made only once, when all centroids have already been appended, instead of concatenating every time a centroid is appended (extremely time consuming)
batch_gdf in hazard.append()
batch_gdf
finalize_append()
finalize_append() to concatenate at the end of the appending process
batch_gdf attribute
This PR fixes #988
Optimize the function TropCyclone.from_tracks from 53 minutes to 4.03 minutes.
PR Author Checklist
develop)
PR Reviewer Checklist