
Adding ability to track on one dataset, segment on another #242

Merged
merged 29 commits into from Jul 10, 2023
Changes from 6 commits
Commits
cfe3b67
added new functionality to transform points to track on one dataset a…
freemansw1 Feb 8, 2023
e4a0348
added documentation on transformation
freemansw1 Feb 9, 2023
b23536e
2D coordinate fix
freemansw1 Feb 9, 2023
67b3e63
fixes to time issues
freemansw1 Feb 9, 2023
c681991
Merge remote-tracking branch 'upstream/RC_v1.5.0' into lat_lon_transform
freemansw1 Feb 9, 2023
402421c
fixes to update tests to v1.5
freemansw1 Feb 9, 2023
870c189
Merge remote-tracking branch 'upstream/RC_v1.5.0' into lat_lon_transform
freemansw1 Mar 17, 2023
b06bbb7
Merge remote-tracking branch 'upstream/RC_v1.5.0' into lat_lon_transform
freemansw1 May 22, 2023
d4d1f67
merge in v1.5.0 changes
freemansw1 May 22, 2023
856bfc2
updates to tutorial notebook
freemansw1 May 22, 2023
55230ce
switch max time/space away defaults to 0
freemansw1 May 22, 2023
730f111
refactor: speed up query of tree in point transform
freemansw1 May 24, 2023
fa9c847
refactor: remove extra distance calculation
freemansw1 May 24, 2023
8a3364a
removed case sensitivity for "auto" on latitude/longitude name.
freemansw1 May 24, 2023
f195321
added transform_segmentation to side bar of documentation
freemansw1 May 24, 2023
5d19390
added ability to transform with 3D feature
freemansw1 May 24, 2023
56f65c7
consolidate internal testing
freemansw1 May 24, 2023
cdf62ac
updates to transform docstrings
freemansw1 May 24, 2023
cdb8637
updates to make time more robust
freemansw1 May 24, 2023
cafb8c8
Merge remote-tracking branch 'upstream/RC_v1.5.0' into lat_lon_transform
freemansw1 Jul 5, 2023
b8397aa
updating sat_radar_combined picture to not use green on green
freemansw1 Jul 5, 2023
c4e01e3
added warning for dropping features
freemansw1 Jul 5, 2023
1ca94f2
updates based on comments
freemansw1 Jul 5, 2023
dbc4fb2
switch "auto" to None for vertical coordinate
freemansw1 Jul 6, 2023
d916a9c
update checking notebooks workflow
freemansw1 Jul 6, 2023
c1b0ef1
adding s3fs explicitly to example requirements
freemansw1 Jul 7, 2023
e807bab
adding more comments to the utilities tests
freemansw1 Jul 7, 2023
44b6881
updates to deprecate "auto" as a coordinate name
freemansw1 Jul 7, 2023
c77396e
update to radar/satellite image
freemansw1 Jul 7, 2023
Binary file added doc/images/sat_radar_combined.png
9 changes: 9 additions & 0 deletions doc/transform_segmentation.rst
@@ -0,0 +1,9 @@
Track on one dataset, segment on another
----------------------------------------

*tobac* also has the capability to combine datasets through :doc:`segmentation`, which includes the ability to track on one dataset (e.g., gridded radar data) and run segmentation on a different dataset *on a different grid* (e.g., satellite data).

.. image:: images/sat_radar_combined.png
:width: 500 px

To do this, users should first run :doc:`feature_detection_overview` with a dataset that contains latitude and longitude coordinates, such that they appear in the output dataframe from Feature Detection. Next, use :func:`tobac.utils.transform_feature_points` to transform the feature dataframe into the new coordinate system. Finally, use the output from :func:`tobac.utils.transform_feature_points` to run segmentation with the new data.
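A minimal end-to-end sketch of the workflow described above (the input variables radar_cube and satellite_cube and all parameter values are illustrative assumptions, not part of this PR):

import tobac

# Detect features on the tracking dataset (e.g., gridded radar data).
# The input cube needs latitude/longitude coordinates so that they
# propagate into the output feature dataframe.
features = tobac.feature_detection_multithreshold(
    radar_cube, dxy=1000, threshold=[30, 40, 50]
)

# Transform the detected feature points onto the grid of the dataset
# to segment (e.g., satellite data on a different grid).
transformed_features = tobac.utils.transform_feature_points(
    features, satellite_cube
)

# Run segmentation on the new dataset using the transformed features.
segmentation_mask, segmented_features = tobac.segmentation_2D(
    transformed_features, satellite_cube, dxy=2000, threshold=250
)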
3 changes: 2 additions & 1 deletion tobac/segmentation.py
@@ -31,6 +31,7 @@
"""

import logging
import numpy as np

import skimage
import numpy as np
@@ -588,7 +589,7 @@ def segmentation(

for i, field_i in enumerate(field_time):
time_i = field_i.coord("time").units.num2date(field_i.coord("time").points[0])
-features_i = features.loc[features["time"] == time_i]
+features_i = features.loc[features["time"] == np.datetime64(time_i)]
segmentation_out_i, features_out_i = segmentation_timestep(
field_i,
features_i,
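For context on the one-line change above: iris's num2date can return time objects (e.g., cftime datetimes, depending on the calendar) that do not compare equal to the numpy datetime64 values stored in the features dataframe, so the explicit np.datetime64 conversion pins the comparison down. A small self-contained illustration, with made-up values:

import datetime
import numpy as np
import pandas as pd

# A "time" column as it typically appears in the features dataframe.
times = pd.Series(np.array(["2023-01-01T00:00"], dtype="datetime64[ns]"))

# A plain datetime, as num2date may return for a standard calendar.
time_i = datetime.datetime(2023, 1, 1, 0, 0)

# Converting before comparing keeps the row selection robust to the
# exact type num2date returned.
print((times == np.datetime64(time_i)).values)  # -> [ True]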
33 changes: 33 additions & 0 deletions tobac/tests/test_internal_utils.py
@@ -0,0 +1,33 @@
import tobac.utils.internal as internal_utils
import numpy as np
import xarray as xr
import pytest


@pytest.mark.parametrize(
"lat_name, lon_name, lat_name_test, lon_name_test, expected_result",
[
("lat", "lon", "auto", "auto", ("lat", "lon")),
("lat", "long", "auto", "auto", ("lat", "long")),
("lat", "altitude", "auto", "auto", ("lat", None)),
("lat", "longitude", "lat", "longitude", ("lat", "longitude")),
],
)
def test_detect_latlon_coord_name(
lat_name, lon_name, lat_name_test, lon_name_test, expected_result
):
"""Tests tobac.utils.internal.detect_latlon_coord_name"""

in_arr = np.empty((50, 50))
lat_vals = np.empty(50)
lon_vals = np.empty(50)

in_xr = xr.Dataset(
{"data": ((lat_name, lon_name), in_arr)},
coords={lat_name: lat_vals, lon_name: lon_vals},
)
out_lat_name, out_lon_name = internal_utils.detect_latlon_coord_name(
in_xr["data"].to_iris(), lat_name_test, lon_name_test
)
assert out_lat_name == expected_result[0]
assert out_lon_name == expected_result[1]
79 changes: 79 additions & 0 deletions tobac/tests/test_utils.py
@@ -14,6 +14,7 @@
import pandas.testing as pd_test
import numpy as np
from scipy import fft
import xarray as xr


def lists_equal_without_order(a, b):
@@ -399,3 +400,81 @@ def test_combine_tobac_feats():
)
assert np.all(list(combined_feat["old_feat_column"].values) == [1, 1])
assert np.all(list(combined_feat["feature"].values) == [1, 2])


def test_transform_feature_points():
"""Tests tobac.utils.general.transform_feature_points"""

orig_feat_df_1 = tb_test.generate_single_feature(0, 95, max_h1=1000, max_h2=1000)
orig_feat_df_2 = tb_test.generate_single_feature(5, 105, max_h1=1000, max_h2=1000)

orig_feat_df = tb_utils.combine_tobac_feats([orig_feat_df_1, orig_feat_df_2])

orig_feat_df["latitude"] = orig_feat_df["hdim_1"]
orig_feat_df["longitude"] = orig_feat_df["hdim_2"]

test_lat = np.linspace(-25, 24, 50)
test_lon = np.linspace(90, 139, 50)
in_xr = xr.Dataset(
{"data": (("latitude", "longitude"), np.empty((50, 50)))},
coords={"latitude": test_lat, "longitude": test_lon},
)

new_feat_df = tb_utils.general.transform_feature_points(
orig_feat_df, in_xr["data"].to_iris()
)

assert np.all(new_feat_df["hdim_1"] == [25, 30])
assert np.all(new_feat_df["hdim_2"] == [5, 15])

# now test max space apart
test_lat = np.linspace(-49, 0, 50)
in_xr = xr.Dataset(
{"data": (("latitude", "longitude"), np.empty((50, 50)))},
coords={"latitude": test_lat, "longitude": test_lon},
)

new_feat_df = tb_utils.general.transform_feature_points(
orig_feat_df, in_xr["data"].to_iris(), max_space_away=20000
)

assert np.all(new_feat_df["hdim_1"] == [49])
assert np.all(new_feat_df["hdim_2"] == [5])

# now test max time apart
test_lat = np.linspace(-25, 24, 50)
in_xr = xr.Dataset(
{"data": (("time", "latitude", "longitude"), np.empty((2, 50, 50)))},
coords={
"latitude": test_lat,
"longitude": test_lon,
"time": [
datetime.datetime(2023, 1, 1, 0, 0),
datetime.datetime(2023, 1, 1, 0, 5),
],
},
)

orig_feat_df["time"] = datetime.datetime(2023, 1, 1, 0, 0, 5)
new_feat_df = tb_utils.general.transform_feature_points(
orig_feat_df,
in_xr["data"].to_iris(),
max_time_away=datetime.timedelta(minutes=1),
)
# we should still have both features, but they should have the new time.
assert np.all(new_feat_df["hdim_1"] == [25, 30])
assert np.all(new_feat_df["hdim_2"] == [5, 15])
assert np.all(
new_feat_df["time"]
== [datetime.datetime(2023, 1, 1, 0, 0), datetime.datetime(2023, 1, 1, 0, 0)]
)

orig_feat_df["time"] = datetime.datetime(2023, 1, 2, 0, 0)
new_feat_df = tb_utils.general.transform_feature_points(
orig_feat_df,
in_xr["data"].to_iris(),
max_time_away=datetime.timedelta(minutes=1),
)

assert np.all(new_feat_df["hdim_1"] == [])
assert np.all(new_feat_df["hdim_2"] == [])
1 change: 1 addition & 0 deletions tobac/utils/__init__.py
@@ -4,6 +4,7 @@
get_spacings,
get_bounding_box,
combine_tobac_feats,
transform_feature_points,
)
from .mask import (
mask_cell,
135 changes: 131 additions & 4 deletions tobac/utils/general.py
@@ -1,9 +1,14 @@
"""General tobac utilities

"""
import copy
import logging
import sklearn.neighbors
from . import internal as internal_utils
import numpy as np
import sklearn
import sklearn.neighbors
import datetime


def add_coordinates(t, variable_cube):
@@ -72,7 +77,6 @@ def add_coordinates(t, variable_cube):
logging.debug("adding coord: " + coord)
# interpolate 2D coordinates:
if variable_cube.coord(coord).ndim == 1:

if variable_cube.coord_dims(coord) == (hdim_1,):
f = interp1d(
dimvec_1,
@@ -91,7 +95,6 @@

# interpolate 2D coordinates:
elif variable_cube.coord(coord).ndim == 2:

if variable_cube.coord_dims(coord) == (hdim_1, hdim_2):
f = interp2d(dimvec_2, dimvec_1, variable_cube.coord(coord).points)
coordinate_points = [f(a, b) for a, b in zip(t["hdim_2"], t["hdim_1"])]
@@ -104,7 +107,6 @@
# mainly workaround for wrf latitude and longitude (to be fixed in future)

elif variable_cube.coord(coord).ndim == 3:

if variable_cube.coord_dims(coord) == (ndim_time, hdim_1, hdim_2):
f = interp2d(
dimvec_2, dimvec_1, variable_cube[0, :, :].coord(coord).points
@@ -151,7 +153,6 @@ def add_coordinates(t, variable_cube):
def add_coordinates_3D(
t, variable_cube, vertical_coord="auto", assume_coords_fixed_in_time=True
):

"""Function adding coordinates from the tracking cube to the trajectories
for the 3D case: time, longitude&latitude, x&y dimensions, and altitude

@@ -553,3 +554,129 @@ def combine_tobac_feats(list_of_feats, preserve_old_feat_nums=None):

combined_df = combined_df.reset_index(drop=True)
return combined_df


@internal_utils.irispandas_to_xarray
def transform_feature_points(
features,
new_dataset,
latitude_name="auto",
longitude_name="auto",
max_time_away=datetime.timedelta(minutes=1),
max_space_away=20000,
):
"""Function to transform input feature dataset horizontal grid points to a different grid.
The typical use case for this function is to transform detected features to perform
segmentation on a different grid.

The input feature dataset must have latitude/longitude coordinates associated
with each feature, and new_dataset must provide latitude/longitude coordinates
under the same names.

Parameters
----------
features: pd.DataFrame
Input feature dataframe
new_dataset: iris.cube.Cube or xarray
The dataset whose grid the feature points will be transformed to.
latitude_name: str
The name of the latitude coordinate. If "auto", tries to auto-detect.
longitude_name: str
The name of the longitude coordinate. If "auto", tries to auto-detect.
max_time_away: datetime.timedelta
The maximum time difference allowed between a feature and the closest time in new_dataset; features farther away in time are dropped.
max_space_away: float
The maximum horizontal distance (in meters) allowed between a feature and the closest grid point in new_dataset; features farther away are dropped.

Returns
-------
transformed_features: pd.DataFrame
A new feature dataframe, with the coordinates transformed to
the new grid, suitable for use in segmentation

"""
from .. import analysis as tb_analysis

lat_coord, lon_coord = internal_utils.detect_latlon_coord_name(
new_dataset, latitude_name=latitude_name, longitude_name=longitude_name
)

if lat_coord not in features or lon_coord not in features:
raise ValueError("Cannot find latitude and/or longitude coordinate")

lat_vals_new = new_dataset[lat_coord].values
lon_vals_new = new_dataset[lon_coord].values

if len(lat_vals_new.shape) != len(lon_vals_new.shape):
raise ValueError(
"Cannot work with lat/lon coordinates of unequal dimensionality"
)

# the lat/lons must be a 2D grid, so if they aren't, make them one.
if len(lat_vals_new.shape) == 1:
lon_vals_new, lat_vals_new = np.meshgrid(lon_vals_new, lat_vals_new)

# we have to convert to radians because scikit-learn's haversine
# requires that the input be in radians.
flat_lats = np.deg2rad(lat_vals_new.flatten())
flat_lons = np.deg2rad(lon_vals_new.flatten())

# we have to drop NaN values.
either_nan = np.logical_or(np.isnan(flat_lats), np.isnan(flat_lons))
# we need to remember where these values are in the array so that we can
# appropriately unravel them.
loc_arr = np.arange(0, len(flat_lats), 1)
loc_arr_trimmed = loc_arr[~either_nan]
flat_lats_nona = flat_lats[~either_nan]
flat_lons_nona = flat_lons[~either_nan]
ll_tree = sklearn.neighbors.BallTree(
np.array([flat_lats_nona, flat_lons_nona]).T, metric="haversine"
)
new_h1 = list()
new_h2 = list()
# there is almost certainly room for speedup in here.
for index in features["index"]:
dist, closest_pt = ll_tree.query(
[
[
np.deg2rad(features["latitude"][index]),
np.deg2rad(features["longitude"][index]),
]
]
)
unraveled_h1, unraveled_h2 = np.unravel_index(
loc_arr_trimmed[closest_pt[0][0]], np.shape(lat_vals_new)
)

new_h1.append(unraveled_h1)
new_h2.append(unraveled_h2)
ret_features = copy.deepcopy(features)
ret_features["hdim_1"] = ("index", new_h1)
ret_features["hdim_2"] = ("index", new_h2)

# find where distances are too large and drop them.
new_lat = lat_vals_new[new_h1, new_h2]
new_lon = lon_vals_new[new_h1, new_h2]
dist_apart = (
tb_analysis.haversine(
new_lat, new_lon, features[lat_coord], features[lon_coord]
)
* 1000.0
)
ret_features = ret_features.where(dist_apart < max_space_away, drop=True)

# force times to match, where appropriate.
if "time" in new_dataset.coords:
# this is necessary due to the iris/xarray/pandas weirdness that we have.
old_feat_times = ret_features["time"].astype("datetime64[s]")
closest_times = np.min(np.abs(old_feat_times - new_dataset["time"]), axis=1)
closest_time_locs = np.abs(old_feat_times - new_dataset["time"]).argmin(axis=1)
# force to seconds to deal with iris not accepting ms
ret_features["time"] = new_dataset["time"][closest_time_locs].astype(
"datetime64[s]"
)
ret_features = ret_features.where(
closest_times < np.timedelta64(max_time_away), drop=True
)

return ret_features
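The core of transform_feature_points is a nearest-grid-point lookup with a BallTree under the haversine metric. A stand-alone sketch of that technique (grid values and the query point are illustrative; scikit-learn's haversine metric expects [lat, lon] in radians and returns great-circle distances in radians):

import numpy as np
import sklearn.neighbors

# Target grid: 1D lat/lon expanded to 2D, as in the function above.
lats = np.linspace(-25, 24, 50)
lons = np.linspace(90, 139, 50)
lon2d, lat2d = np.meshgrid(lons, lats)

# Build the tree over all flattened grid points, in radians.
tree = sklearn.neighbors.BallTree(
    np.deg2rad(np.column_stack([lat2d.ravel(), lon2d.ravel()])),
    metric="haversine",
)

# Query the nearest grid point to (lat=0, lon=95) and unravel the flat
# index back to 2D (hdim_1, hdim_2) indices.
dist, idx = tree.query(np.deg2rad([[0.0, 95.0]]))
h1, h2 = np.unravel_index(idx[0][0], lat2d.shape)

# Distances come back in radians; multiply by an approximate Earth
# radius in meters to get a physical distance.
print(h1, h2, dist[0][0] * 6371000.0)  # -> 25 5 0.0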
43 changes: 43 additions & 0 deletions tobac/utils/internal.py
@@ -662,3 +662,46 @@ def find_hdim_axes_3D_iris(field_in, vertical_coord=None, vertical_axis=None):
all_axes[np.logical_not(np.isin(all_axes, [time_axis, vertical_coord_axis]))]
)
return output_vals


@irispandas_to_xarray
def detect_latlon_coord_name(in_dataset, latitude_name="auto", longitude_name="auto"):
"""Function to detect the name of latitude/longitude coordinates

Parameters
----------
in_dataset: iris.cube.Cube, xarray.Dataset, or xarray.DataArray
Input dataset to detect names from
latitude_name: str
The name of the latitude coordinate. If "auto", tries to auto-detect.
longitude_name: str
The name of the longitude coordinate. If "auto", tries to auto-detect.

Returns
-------
lat_name, lon_name: tuple(str)
The detected names of the latitude and longitude coordinates. If a
coordinate is not detected, None is returned for that coordinate.

"""
out_lat = None
out_lon = None
test_lat_names = ["lat", "latitude"]
test_lon_names = ["lon", "long", "longitude"]
if latitude_name != "auto":
if latitude_name in in_dataset.coords:
out_lat = latitude_name
else:
for test_lat_name in test_lat_names:
if test_lat_name in in_dataset.coords:
out_lat = test_lat_name
break
if longitude_name != "auto":
if longitude_name in in_dataset.coords:
out_lon = longitude_name
else:
for test_lon_name in test_lon_names:
if test_lon_name in in_dataset.coords:
out_lon = test_lon_name
break
return out_lat, out_lon
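A short usage sketch of the auto-detection, mirroring the new tests (the dataset here is illustrative):

import numpy as np
import xarray as xr
import tobac.utils.internal as internal_utils

# Coordinates named "lat"/"long" should be picked up without passing
# explicit latitude_name/longitude_name arguments.
ds = xr.Dataset(
    {"data": (("lat", "long"), np.zeros((4, 4)))},
    coords={"lat": np.arange(4.0), "long": np.arange(4.0)},
)

print(internal_utils.detect_latlon_coord_name(ds["data"].to_iris()))
# expected: ("lat", "long")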