Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major bug? get_land_geometry and tc_tracks.track_land_params #495

Closed
bguillod opened this issue Jun 27, 2022 · 18 comments
Closed

Major bug? get_land_geometry and tc_tracks.track_land_params #495

bguillod opened this issue Jun 27, 2022 · 18 comments
Labels

Comments

@bguillod
Copy link
Collaborator

I think I found a major bug in the assignment of TC track points to land or ocean, which occurs at least in tc_track_synth. This seems to be related to how the land geom is retrieved from the track extents. However I am afraid this might have implications elsewhere, too.

In essence, in climada.hazard.tc_tracks_synth, land_geom is retrieved using:

extent = tracks.get_extent()
land_geom = climada.util.coordinates.get_land_geometry(
    extent=extent, resolution=10
)

However, if I read global data as follows

tracks = TCTracks.from_ibtracs_netcdf(
  provider = 'usa'
)
extent = tracks.get_extent()
land_geom = u_coord.get_land_geometry(
    extent=extent, resolution=10
)

Then I get the following value for extent: (31.4, 373.6, -44.800000762939455, 70.79999694824218), and the following for land_geom:
image

Hence the extent of the land geometry that is being loaded only covers longitude from 31.4 to 180 degrees East. Looks like the underlying function get_land_geometry does not handle spherical coordinates.

The implication is that any track westward of 31.4 degrees East will be said not to be over land. The issue probably won't show up if I had loaded data only for the, e.g. North Atlantic basin, because then the extent would probably be within the [-180, +180] longitude range.

This is a major issue and it might apply to many other code instances than just for the TC synthetic tracks generation. My questions are then:

  • Is there a way for get_land_geometry to properly account for spherical coordinates? That would be the easiest way to fix. If not, then at least the function should raise an error if longitude values are outside of [-180, +180] (and since we're at it let's also make sure latitude never goes outside of [-90,90], although this case is very unlikely).
  • Where else in the code might this issue lead to mistakes, too?
@bguillod bguillod added the bug label Jun 27, 2022
@bguillod
Copy link
Collaborator Author

As an add-on, I think generally the function coord_on_land might not do what's expected if the tracks longitude are normalized and go outside of [-180, +180], which for instance is the case for tracks that cross the anti-meridian in from_ibtracs_netcdf I believe. If get_land_geometry cannot easily be fixed to account for this, then probably coord_on_land must re-normalize longitude values back to [-180, +180] degrees within this function.

@chahank
Copy link
Member

chahank commented Jul 21, 2022

@bguillod : if you have worked on this, it would be great if you could share the proposal in the form of a pull request.

@bguillod
Copy link
Collaborator Author

Since I'm working on improvements to the TC random walk model I will make sure to include a fix within the same PR. Fine for you @chahank?

@chahank
Copy link
Member

chahank commented Jul 27, 2022

Yes, thank you very much!

In case you modify the helper methods in the utils, it might however be easier to review if you then make separate pull requests for the utils and for the TC elements.

@bguillod
Copy link
Collaborator Author

Sounds good, if I see changes to utils are needed I'll make a separate PR.

@bguillod
Copy link
Collaborator Author

Hi @tovogt,

On this one I think you might know a very simple way to solve it. I recall in some discussion previously with a similar problem in a different context you mentioned you can shift the longitude range in shapely object or so. Would that be applicable here?

For a more concrete example: if I load all IBTraCS data from 1980 to 220 I get the following extent:

extent = tracks_gaps.get_extent()
extent
# output: (31.4, 373.6, -44.800000762939455, 70.79999694824218)

Would it be possible to do something like:

land_geom = u_coord.get_land_geometry(
    extent=(-180, 180, extent[2], extent[3]), resolution=10
)

and then split land_geom, shift longitude and re-merge so I then get land_geom within longitude range (31.4, 373.6)?

That would be much more efficient than what I'm trying to implement right now: in tc_track.track_land_params something like:

lon_0 = track.lon.values.copy()
u_coord.lon_normalize(lon_0)
track['on_land'] = ('time',
                    u_coord.coord_on_land(track.lat.values, lon_0, land_geom))
track['dist_since_lf'] = ('time', _dist_since_lf(track))

What do you think? This could be implemented directly into u_coord.get_land_geometry so that anytime someone in climada uses that function with extent bound crossing the anti-meridian the land geom returned is within the same bounds as the input parameter asks for.

Although, thinking about it twice: the extent of tracks does not mean that all track points are within that extent, so my adjustment would still likely be needed. Unless we also adjust u_coord.coord_on_land so the adjustment is happening there...

What do you think?

If you don't have time or opinion just let me know and I'll see how I go about it.

@bguillod
Copy link
Collaborator Author

bguillod commented Jul 29, 2022

Look at this for instance, both points within Fiji, one on each side of the anti-meridian:

lon_test = np.array([179.18, 180.05])
lat_test = np.array([-16.56, -16.85])

(
    u_coord.coord_on_land(lat = lat_test, lon = lon_test),
    u_coord.coord_on_land(lat = lat_test, lon = u_coord.lon_normalize(lon_test))
)

output:

(array([ True, False]), array([ True,  True]))

We can blame shapely for not handling spherical coordinates, but that library is very clear about this. I think we need to find a way to ensure these kind of things don't cause mistakes in climada's calculations, because currently they likely do, and this at many places in the code. Best would probably to handle this robustly in all u_coord functions with checks on extents compatibility and corrections in case they don't match.

@tovogt
Copy link
Collaborator

tovogt commented Jul 29, 2022

Thanks, this is indeed a serious bug that is quite easy to solve. The solution is to store the land geometries in a GeoDataFrame and then rewrap the longitudes according to the central longitude as extracted from the given extent.

I would suggest to rewrite the current implementation of get_land_geometries by starting off with gpd.read_file instead of using the shapereader.Reader (cf. the current implementation of get_country_geometries). The important ingredient is then to use GeoPandas' to_crs method together with a "lon_wrap" parameter:

df = gpd.read_file(shp_file)
if country_names is not None:
    country_mask = np.isin(df[['ISO_A3', 'WB_A3', 'ADM0_A3']].values, country_names).any(axis=1)
    df = df[country_mask]
if extent is not None:
    lon_mid = 0.5 * (extent[0] + extent[1])
    # we don't really change the CRS when rewrapping, so we reset the CRS attribute afterwards
    df = df.to_crs({"proj": "longlat", "lon_wrap": lon_mid}).set_crs(DEF_CRS, allow_override=True)
    bbox = Polygon([
        (extent[0], extent[2]),
        (extent[0], extent[3]),
        (extent[1], extent[3]),
        (extent[1], extent[2])
    ])
    bbox = gpd.GeoSeries(bbox, crs=DEF_CRS)
    bbox = gpd.GeoDataFrame({'geometry': bbox}, crs=DEF_CRS)
    df = gpd.overlay(df, bbox, how="intersection")
return df.geometry.unary_union

I wrote the code from scratch without testing, so please check it thoroughly. Please also consider applying the same fix to the function get_country_geometries while you're at it.

@bguillod
Copy link
Collaborator Author

Thanks @tovogt. I'll give it a try.

However, since the climada.hazard.tracks.TCTracks.get_extent() method does not imply that all individual tracks are within the bounds, we'd still need to normalize longitude for each track in climada.hazard.tracks.track_land_param().
Can you just confirm that this is correct?

Of course solving the issue in get_land_geometries and get_country_geometries is needed in any case.

@tovogt
Copy link
Collaborator

tovogt commented Jul 29, 2022

However, since the climada.hazard.tracks.TCTracks.get_extent() method does not imply that all individual tracks are within the bounds, we'd still need to normalize longitude for each track in climada.hazard.tracks.track_land_param(). Can you just confirm that this is correct?

The individual tracks are not within the bounds, but all track points are within the bounds. The only thing that might go wrong is that a track segment between two subsequent track points might not lie completely within the bounds. But I don't think that this is relevant for the track_land_param function.

@bguillod
Copy link
Collaborator Author

The individual tracks are not within the bounds, but all track points are within the bounds. The only thing that might go wrong is that a track segment between two subsequent track points might not lie within the bounds. But I don't think that this is relevant for the track_land_param function.

I'm not sure I agree. In from_ibtracs_netcdf I see:

lons = ibtracs_ds[tc_var].values.copy()
lon_valid_mask = np.isfinite(lons)
lons[lon_valid_mask] = u_coord.lon_normalize(lons[lon_valid_mask], center=0.0)
ibtracs_ds[tc_var].values[:] = lons
# Make sure that the longitude is always chosen positive if a track crosses the
# antimeridian:
crossing_mask = ((ibtracs_ds[tc_var] > 170).any(dim="date_time")
& (ibtracs_ds[tc_var] < -170).any(dim="date_time")
& (ibtracs_ds[tc_var] < 0)).values
ibtracs_ds[tc_var].values[crossing_mask] += 360

Doesn't that imply that any track not crossing the anti-meridian is within [-180,+180] but tracks crossing it are more like within [0,360]? So if one track has all longitude values within [-179.9, 0] and another has longitude range [-200, 160], wouldn't the first track keep its longitude range and the second one have range [160, 220]? Thereby the total real exent would be (on a plane) [-179.9, 220], but the get_extent method handels this and would provide something like [160, 220]?

BTW, currently in get_land_geometries if country_names is provided then extent is not used. Is this intended? If so I think this should be documented.

I'm not sure in which use case one would have to retain only certain countries AND crop them to an extent, but making that clear would be cleaner.

@tovogt
Copy link
Collaborator

tovogt commented Jul 29, 2022

I'm not sure I agree. In from_ibtracs_netcdf I see:

I think we are talking about different things: I say that all track points actually lie, geographically, within the extent returned by get_extent. It seems to me like you are saying that the lat-lon-representation of track points stored in TCTracks might not always lie within the (numerical) extent returned by get_extent. I think we are both right. But I don't understand how any of those observations might cause any harm.

BTW, currently in get_land_geometries if country_names is provided then extent is not used. Is this intended? If so I think this should be documented.

That's true and I don't think that it is intended. It should work the same as in get_country_geometries.

I'm not sure in which use case one would have to retain only certain countries AND crop them to an extent, but making that clear would be cleaner.

That's a different question. You basically say that if a user provides both the country_names and the extent kwarg than it's most likely not intentional. I would say that it might happen. For example, the user might define a rectangular extent around the Danube river, but is only interested in those parts of that territory which are within the European Union, so they might additionally provide a list of all EU member states.

@bguillod
Copy link
Collaborator Author

Excellent, yes we do agree on everything now. Many thanks!

And since get_land_geometry does essentially the same thing as get_country_geometries but just return the unary_union of the geometry, I will simply adjust this in the latter and call it from the former. If you don't agree please just shout ;-)

@tovogt
Copy link
Collaborator

tovogt commented Jul 29, 2022

Yes, there is a lot of redundancy in the two functions, and I would very much appreciate if you remove that. However, make sure to take the best from both implementations: get_land_geometry not only considers the ISO_A3 field, but also some other fields. On the other hand, get_country_geometries deals with some gaps in natural earth.

@bguillod
Copy link
Collaborator Author

Will do. Your approach does not work as well as thought because shapely is bad at splitting shapes:
image
I'm working on splitting the extent into two boxes if it crossed the anti-meridian, that should do it.

@bguillod
Copy link
Collaborator Author

Ok so I've made an attempt which I think works well in PR #524 . Please have a look. I've already added a small test but feel free to add some or suggests further tests.

@tovogt
Copy link
Collaborator

tovogt commented Aug 1, 2022

Yes, true. The intersection needs to be done in the [-180, 180] range, with two bounding boxes, if necessary. Then the lon_wrap is applied afterwards to the result. Thanks for double-checking!

@bguillod
Copy link
Collaborator Author

bguillod commented Aug 8, 2022

Fixed in #524.

@bguillod bguillod closed this as completed Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants