
Add more comprehensive tests to consolidate.add_latlon #769

Closed
leewujung opened this issue Aug 4, 2022 · 10 comments
Labels: enhancement (This makes echopype better), tests
@leewujung
Member

This is a leftover from #749, in which we added a new function to interpolate lat-lon locations from the EchoData["Platform"] group to calibrated Sv data. That PR includes a basic test that runs add_latlon on an EK60 file and checks for the existence of the latitude and longitude variables.

We still need tests to cover the following cases:

  • Data with empty latitude and longitude variables in the Platform group
  • Data with a single latitude and longitude point in the Platform group
  • Comparing the lat/lon values interpolated by the add_latlon function against values computed directly with scipy.interpolate
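The third check could be sketched as a small self-contained test. The synthetic data and variable names below are illustrative only; an actual echopype test would call compute_Sv and add_latlon on real files rather than interpolating a toy DataArray:

```python
import numpy as np
import xarray as xr
from scipy.interpolate import interp1d

# Synthetic Platform-like latitude on a time1 axis
time1 = np.array([0.0, 10.0, 20.0, 30.0])
lat = xr.DataArray(
    [45.0, 45.1, 45.2, 45.3], dims="time1", coords={"time1": time1}
)

# Target ping_time axis to interpolate onto
ping_time = np.array([5.0, 15.0, 25.0])

# xarray interpolation (the mechanism add_latlon relies on)
lat_xr = lat.interp(time1=ping_time)

# Direct scipy interpolation for comparison
lat_scipy = interp1d(time1, lat.values)(ping_time)

assert np.allclose(lat_xr.values, lat_scipy)
```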
@leewujung
Member Author

This will ensure #730 runs through without problem.

@emiliom
Collaborator

emiliom commented Mar 20, 2023

@leewujung in add_location, the time1 coordinate variable is dropped at the very end:

return interp_ds.drop_vars("time1")

I've looked closely at a couple of sample datasets. For two EK60 datasets, after compute_Sv, there are no variables with a time1 dimension. The only other time dimension in use besides ping_time is time3, used with water_level. But for an AZFP dataset, after compute_Sv, water_level uses time1, not time3. Off the top of my head I don't remember which time dimension water_level is supposed to have or whether it'd be the same regardless of instrument. If water_level having a time1 dimension is in fact valid, then it shouldn't be dropped for AZFP.
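One quick way to audit this across instruments is a small helper that lists which variables carry a given time dimension; the toy dataset below stands in for a calibrated Sv dataset (the helper name and data are illustrative, not echopype API):

```python
import numpy as np
import xarray as xr


def vars_with_dim(ds: xr.Dataset, dim: str) -> list:
    """Return the names of variables in ds that use the given dimension."""
    return [name for name, var in ds.variables.items() if dim in var.dims]


# Toy dataset standing in for compute_Sv output from an EK60 file
ds = xr.Dataset(
    {
        "Sv": (("ping_time", "range_sample"), np.zeros((3, 2))),
        "water_level": (("time3",), np.zeros(4)),
    }
)

assert vars_with_dim(ds, "time3") == ["water_level"]
assert vars_with_dim(ds, "time1") == []  # nothing would be lost by dropping time1 here
```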

I also investigated the result of interpolation. For some reason that I haven't investigated further, the first interpolated value appears always to be NaN (in two EK60 example files). For now, I'm going to modify add_location to set the first value of interpolated latitude and longitude equal to the second value. We can discuss here or in the PR that I'll submit shortly.
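The workaround described above might look roughly like this (an illustrative sketch, not the actual PR code):

```python
import numpy as np
import xarray as xr

# Interpolated latitude whose first value came out NaN
lat = xr.DataArray([np.nan, 45.1, 45.2], dims="ping_time")

# Backfill only the leading NaN with the second value
if np.isnan(lat.values[0]):
    lat.values[0] = lat.values[1]

assert lat.values[0] == 45.1
```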

@emiliom emiliom moved this from Todo to In Progress in Echopype Mar 20, 2023
@leewujung
Member Author

I looked at the docs and couldn't completely remember what's up with our various time* dimensions, but I think within a single group the time* coordinate changes depending on the source of the data (since sources can be on different time bases), which varies with the sonar_model.

I found this issue #656 that documents what those time* are. I think we should replicate those in the docs.

You can see that water_depth is called out here to be aligned with time3.

AZFP should not have a water_depth variable stored by default, see code below -- which data file were you looking at?

ds = xr.Dataset(
    {
        "tilt_x": (["time2"], unpacked_data["tilt_x"]),
        "tilt_y": (["time2"], unpacked_data["tilt_y"]),
        **{
            var: ([], np.nan, self._varattrs["platform_var_default"][var])
            for var in [
                "MRU_offset_x",
                "MRU_offset_y",
                "MRU_offset_z",
                "MRU_rotation_x",
                "MRU_rotation_y",
                "MRU_rotation_z",
                "position_offset_x",
                "position_offset_y",
                "position_offset_z",
                "transducer_offset_x",
                "transducer_offset_y",
                "transducer_offset_z",
                "vertical_offset",
                "water_level",
            ]
        },
    },
)

@emiliom
Collaborator

emiliom commented Mar 21, 2023

I looked at the AZFP file that I've used in test_consolidate_integration.py: "azfp/17082117.01A" with XML "azfp/17041823.XML". This file is used in other tests, too. A water_level (not water_depth) variable is present, but it contains all NaNs. Looking more closely, it appears to be an outcome of update_platform (with external data), so it's possible that the presence of a time1 dimension in the water_level variable (which is empty anyway) is an artifact or bug in echodata.update_platform. Worth following up later in a separate issue, but now it's looking less central to how consolidate.add_latlon behaves.

@emiliom
Collaborator

emiliom commented Mar 21, 2023

This is outside the scope of this issue, but how about we add an option to calibrate.compute_Sv that would run consolidate.add_latlon if Platform lat-lon data are present in the echodata object? That could even be the default behavior that could be turned off with an argument like add_latlon=False.

I can open a new issue if this idea has legs. I don't know if it's something we'd have the bandwidth to fit into 0.7.0.

@leewujung
Member Author

Hmm, I think that may be meddling too much with the workflow side of things. But, interestingly, I was pondering a similar scenario for adding split-beam angle for broadband data. There, one of the computing steps (pulse compression, which is in fact expensive) is shared, so calibrating first and then computing the split-beam angle later does the work twice. There I could see adding an option flag to do both together. In the case of add_latlon, since the computation is completely separate, I suggest that we keep this option on the workflow side.
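Keeping the composition on the workflow side could be as simple as a user-level helper like the one below. The stub functions are placeholders; only the names compute_Sv and add_location mirror echopype's, and the wrapper itself is purely hypothetical:

```python
# Stubs standing in for echopype's calibrate.compute_Sv and
# consolidate.add_location (placeholders, not the real functions)
def compute_Sv(echodata):
    return {"Sv": [0.0], "source": echodata}


def add_location(ds, echodata):
    return {**ds, "latitude": [45.0], "longitude": [-125.0]}


def calibrate_workflow(echodata, add_latlon=True):
    """User-side composition: calibrate, then optionally attach lat/lon."""
    ds = compute_Sv(echodata)
    if add_latlon:
        ds = add_location(ds, echodata)
    return ds


result = calibrate_workflow("echodata-placeholder")
assert "Sv" in result and "latitude" in result
```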

@emiliom
Collaborator

emiliom commented Mar 23, 2023

In the case of add_latlon, since the computation is completely separate, I suggest that we keep this option on the workflow side.

Sounds good, at least at this stage. I'll close this issue now, since its goals were met in PR #1000.

@emiliom
Collaborator

emiliom commented Mar 23, 2023

I've created a new issue (#1003) to follow up on the issue of the time* dimension in the water_level variable.

@emiliom
Collaborator

emiliom commented Mar 24, 2023

I'm reopening this issue, temporarily ...

In PR #1000 I added several new consolidate.add_latlon tests to test_consolidate_integration.py, including two new test datasets (including one AZFP). One thing I did not add was an EK80 test file. While running consolidate.add_latlon on two EK80 test files just now, I ran into the same error during the interpolation step (return position_var.interp(time1=ds["ping_time"])):

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The two files are ek80_bb_with_calibration/2018115-D20181213-T094600.raw and ek80/Summer2018--D20180905-T033113.raw. Examining the data a bit, I found two factors in common among those two files:

  • Their time-varying lat-lon locations are effectively at a single point, +/- very small floating-point variations.
  • They span a tiny time range, of a couple of seconds.

The error is actually emitted by Pandas. In searching for this error message, I see some references to problems with datetimes. I wasted a lot of time thinking the problem might be with the interpolation of near-constant values, but I now think it's somehow related to the tiny datetime range.
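The pandas side of this can be reproduced directly: get_indexer (which underlies reindexing) refuses to operate on a non-unique index, which is exactly what a tiny time range with colliding timestamps can produce. The timestamps below are made up for illustration:

```python
import pandas as pd

# Two identical timestamps make the index non-unique, as can happen
# when a file spans only a couple of seconds
times = pd.DatetimeIndex(["2018-09-05T03:31:13", "2018-09-05T03:31:13"])
assert not times.is_unique

# Reindexing against a non-unique index raises InvalidIndexError
try:
    times.get_indexer(pd.DatetimeIndex(["2018-09-05T03:31:14"]))
    raised = False
except pd.errors.InvalidIndexError as err:
    raised = True
    assert "uniquely valued" in str(err)

assert raised
```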

I'm going to set this aside as an edge case that'll take too much more effort to track down for 0.7.0. I'll open a new issue later.

@emiliom
Collaborator

emiliom commented Mar 25, 2023

I'll close this again, now that I've opened #1009.

@emiliom emiliom closed this as completed Mar 25, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Echopype Mar 25, 2023