TropCyclone.from_tracks() takes considerably longer in climada 5.0.0 #988

Open
sarah-hlsn opened this issue Dec 18, 2024 · 3 comments · May be fixed by #989

@sarah-hlsn
Collaborator

TropCyclone.from_tracks() takes orders of magnitude longer in climada 5.0.0 than in 3.3.0 (all else being equal: number of tracks, centroid resolution, etc.).

Code example:

from climada.hazard import TCTracks, Centroids, TropCyclone
import numpy as np

# get tracks
tracks_SI_proba = TCTracks.from_ibtracs_netcdf(year_range=(1990, 2023), basin='SI')
tracks_SI_proba.equal_timestep(time_step_h=1)
tracks_SI_proba.calc_perturbed_trajectories(nb_synth_tracks=9, max_shift_ini=0.75, max_dspeed_rel=0.3)

# create centroids
min_lat, max_lat, min_lon, max_lon = -26, -12, 40, 54
resol = 10
grid = np.mgrid[min_lat:max_lat:complex(0, resol),
                min_lon:max_lon:complex(0, resol)].reshape(2, resol * resol).transpose()
cent = Centroids.from_lat_lon(grid[:,0], grid[:,1])
cent.id = np.arange(cent.lat.size)

# Calculate windfields
tc_SI_proba = TropCyclone.from_tracks(tracks_SI_proba, centroids=cent)  # this is the line that takes so long
tc_SI_proba.check()
tc_SI_proba.plot_intensity(event=0, smooth=False);
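For reference, the centroid grid in the example is small: np.mgrid with an imaginary step such as complex(0, 10) returns 10 evenly spaced points including both endpoints (like np.linspace), so the grid has 10 x 10 = 100 centroids spaced (-12 - (-26)) / 9 = 14/9 ≈ 1.56° apart. The sketch below rebuilds the same grid with np.column_stack as a more readable equivalent; it uses only NumPy and does not touch climada.

```python
import numpy as np

# Same bounding box and resolution as the example above
min_lat, max_lat, min_lon, max_lon = -26, -12, 40, 54
resol = 10

# An imaginary step (complex(0, n)) makes np.mgrid return n points that
# include both endpoints, instead of interpreting the value as a step size.
lat_grid, lon_grid = np.mgrid[min_lat:max_lat:complex(0, resol),
                              min_lon:max_lon:complex(0, resol)]

# One (lat, lon) row per centroid, in the same order as reshape(...).transpose()
grid = np.column_stack([lat_grid.ravel(), lon_grid.ravel()])

print(grid.shape)  # (100, 2)
```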

Screenshots: two screenshots taken on 2024-12-18 (16:18:22 and 16:18:34) were attached; the images are not available.

Climada Version: 5.0.0

System Information:
Run on IAC hub

sarah-hlsn added the bug label Dec 18, 2024
@NicolasColombi
Collaborator

NicolasColombi commented Dec 18, 2024

Okay, so profiling the function shows that 84% of the time is spent on line 375 of tropical_cyclone.py:
haz = cls.concat(tc_haz_list). Within that function, 100% of the time is spent on line 1056: haz_concat.append(*haz_list). @emanuel-schmid, did the appending of hazards change between 3.0.0 and 5.0.0?

Note: The profiling results were obtained using only:

tracks_SI_proba = TCTracks.from_ibtracs_netcdf(year_range=(1990, 2023), basin='SI')

without the following code, since that takes 1 h per run. I will run the profiling for the slow version as well, but I would assume, maybe wrongly, that the behaviour will be the same.

tracks_SI_proba.equal_timestep(time_step_h=1)
tracks_SI_proba.calc_perturbed_trajectories(nb_synth_tracks=9, max_shift_ini=0.75, max_dspeed_rel=0.3)

@emanuel-schmid I can send you the profiling data via Slack; GitHub does not support the .lprof format.
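Per-line percentages like the ones quoted above are what a line-level profiler such as line_profiler reports (its kernprof tool writes .lprof files). For a quick function-level view that needs only the standard library, cProfile and pstats can be used as in the sketch below. slow_pipeline is a hypothetical stand-in for the TropCyclone.from_tracks(...) call, and the output file name is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_pipeline():
    """Hypothetical stand-in for TropCyclone.from_tracks(); swap in the real call."""
    parts = []
    for i in range(2000):
        parts = parts + [i]  # deliberately quadratic: copies the list on every step
    return parts


profiler = cProfile.Profile()
profiler.enable()
result = slow_pipeline()
profiler.disable()

# Sort by cumulative time so the most expensive call chain is listed first
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # keep only the 10 most expensive entries
report = stream.getvalue()
print(report)

# Save the raw data in pstats' binary format so it can be shared and reloaded later
path = os.path.join(tempfile.gettempdir(), "from_tracks.prof")
profiler.dump_stats(path)
reloaded = pstats.Stats(path)
```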

@NicolasColombi
Collaborator

NicolasColombi commented Dec 19, 2024

Update: haz = cls.concat(tc_haz_list) is also the slowest line with the perturbed trajectories.

One step further (without perturbed trajectories): in hazard.append() the bottleneck is line 916 of base.py:
centroids = Centroids.union(*[haz.centroids for haz in haz_list]). Here it is Centroids.union that takes the time, and within it
centroids.append(cent) --> self.gdf = pd.concat([self.gdf, centr.gdf]) is the culprit.
(At least without perturbed trajectories.)

climada/hazard/centroids/centr.py line 347:

    def append(self, centr):
        """Append Centroids

        Note that the result might contain duplicate points if the object to append has an overlap
        with the current object.

        Parameters
        ----------
        centr : Centroids
            Centroids to append. The centroids need to have the same CRS.

        Raises
        ------
        ValueError

        See Also
        --------
        union : Union of Centroid objects.
        remove_duplicate_points : Remove duplicate points in a Centroids object.
        """
        if not u_coord.equal_crs(self.crs, centr.crs):
            raise ValueError(
                f"The given centroids use different CRS: {self.crs}, {centr.crs}. "
                "The centroids are incompatible and cannot be concatenated."
            )
        self.gdf = pd.concat([self.gdf, centr.gdf])
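A generic illustration of why this pattern can get slow: if Centroids.union calls append once per hazard, as the call chain above suggests, every pd.concat copies all rows gathered so far, so n hazards cost on the order of n² row copies instead of n. The sketch below uses plain pandas DataFrames as stand-ins for the centroid GeoDataFrames (make_parts is hypothetical); it only contrasts incremental and single-call concatenation and is not necessarily the change proposed in #989.

```python
import time

import pandas as pd


def make_parts(n_parts, rows_per_part=10):
    """Hypothetical stand-ins for the per-hazard centroid tables (centr.gdf)."""
    return [
        pd.DataFrame({"lat": [float(i)] * rows_per_part, "lon": [float(i)] * rows_per_part})
        for i in range(n_parts)
    ]


def concat_incrementally(parts):
    # Mirrors calling append() once per object: each pd.concat copies every
    # row accumulated so far, so total work grows roughly with n_parts**2.
    result = parts[0]
    for part in parts[1:]:
        result = pd.concat([result, part])
    return result


def concat_once(parts):
    # A single pd.concat over the whole list copies each row once.
    return pd.concat(parts)


parts = make_parts(1000)
for func in (concat_incrementally, concat_once):
    start = time.perf_counter()
    out = func(parts)
    print(f"{func.__name__}: {len(out)} rows in {time.perf_counter() - start:.3f} s")
```

Both functions return the same frame; only the number of intermediate copies differs.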

@NicolasColombi
Collaborator

I think I found a potential solution; I'll make a PR 🙃

@NicolasColombi NicolasColombi linked a pull request Dec 19, 2024 that will close this issue