Support lazy-loaded EK80 broadband-complex data #1311
Conversation
The failed tests are a little weird. I've just looked into one of them:
@leewujung if everything here looks good, could you merge this?
echopype/calibrate/ek80_complex.py
Outdated
```python
replica = np.flipud(np.conj(tx))
replica_dict[str(channel.values)] = np.flipud(np.conj(tx))

# Concatenate backscatter channels with dropped NaN beam dimensions.
backscatter_NaN_beam_drop_all = xr.concat(backscatter_NaN_beam_drop_all, dim="channel")
```
Question: I think the original intention of `dropna` with `dim="beam"` is to remove the padded sections along the `range_sample` dimension, since some channels contain data that are shorter. So, if you `dropna` for a particular channel but other channels are longer, wouldn't this `xr.concat` recreate the dropped NaN values?
Oh yeah, I think you're right. Let me try a toy example.
Code:

```python
import numpy as np
import xarray as xr

data = np.array([
    [np.nan, np.nan], [4, 5], [np.nan, 6], [8, 9]
])
da = xr.DataArray(
    data,
    dims=["channel", "beam"],
    coords={
        "channel": np.arange(4),
        "beam": np.arange(2),
    },
)

dropped_nan_beam_da_list = []
for channel_index in range(len(da["channel"])):
    dropped_da = da.isel(channel=channel_index).dropna("beam", how="all")
    print(f"`dropped_da` shape (chan idx {channel_index}): {dropped_da.shape[0]}")
    dropped_nan_beam_da_list.append(dropped_da)

print("concatenated da:", xr.concat(dropped_nan_beam_da_list, dim="channel"))
```
Output:

```
`dropped_da` shape (chan idx 0): 0
`dropped_da` shape (chan idx 1): 2
`dropped_da` shape (chan idx 2): 1
`dropped_da` shape (chan idx 3): 2
concatenated da: <xarray.DataArray (channel: 4, beam: 2)> Size: 64B
array([[nan, nan],
       [ 4.,  5.],
       [nan,  6.],
       [ 8.,  9.]])
Coordinates:
  * beam     (beam) int64 16B 0 1
  * channel  (channel) int64 32B 0 1 2 3
```
So yeah, you're right, this padding does happen. Let me tinker with the implementation that's currently in this PR. There's probably a clever way to skip this concat.
But I guess even if we can skip this concat, the concat and padding happen anyway during the `apply_ufunc` call, so the final array that comes out of `apply_ufunc` could have all-NaN beam vectors. The benefit of the old implementation, where we looped through channels, was that we could skip having to convolve on all-NaN beam vectors; the tradeoff was that the operation could not be delayed. So tricky 😆
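For a concrete picture of that tradeoff, here is a toy standalone sketch (not echopype's actual implementation; `np.convolve` just stands in for the real pulse-compression convolution, and the array shapes are made up). Once the channels sit in one padded array, `apply_ufunc` convolves the NaN padding along with the real samples, and the padding simply re-emerges as NaNs in the output:

```python
import numpy as np
import xarray as xr

# Channel 0 carries NaN padding along range_sample; apply_ufunc convolves the
# whole padded array in one delayed-friendly call, so the padding is convolved too.
backscatter = xr.DataArray(
    np.array([[1.0, 2.0, 3.0, np.nan, np.nan, np.nan],
              [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]),
    dims=["channel", "range_sample"],
)
replica = np.array([1.0, 1.0])

convolved = xr.apply_ufunc(
    lambda arr: np.apply_along_axis(np.convolve, -1, arr, replica, mode="full"),
    backscatter,
    input_core_dims=[["range_sample"]],
    output_core_dims=[["range_sample_conv"]],
)
print(convolved.values)  # samples overlapping the NaN padding come out as NaN
```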
Oh, I think your code above is missing the `range_sample` dimension, which is where that padding happens.

But yeah, I think the easiest is to just not `dropna` here. We have 2 options:

- try dropping the NaNs in the `_convolve_per_channel` function, since there you do have a loop that goes channel by channel; the only catch is that you have to drop the NaNs and add them back to ensure the output length is the same (see the sketch after this list)
- just pay the computational price and convolve with the NaN padding in place

Maybe you could try seeing how the speed differs for these 2 approaches? It is possible that with all the optimization people have put into convolution, the speed difference is small (and hopefully the memory consumption is reasonable too).
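A rough sketch of what option 1 could look like inside the per-channel loop (the helper name is hypothetical, and `np.convolve` again stands in for the actual convolution): trim the NaN padding before convolving, then pad the result back to a common length so all channels still align.

```python
import numpy as np

def convolve_channel_dropping_nans(vals: np.ndarray, replica: np.ndarray) -> np.ndarray:
    """Hypothetical per-channel helper (option 1): trim NaN padding, convolve,
    then restore NaNs so every channel returns the same output length."""
    out_len = vals.size + replica.size - 1           # "full" convolution length for the padded input
    valid = vals[~np.isnan(vals)]                    # drop the NaN padding
    conv = np.convolve(valid, replica, mode="full")  # convolve only the real samples
    padded = np.full(out_len, np.nan, dtype=conv.dtype)
    padded[:conv.size] = conv                        # add the NaNs back at the tail
    return padded

# Example: a channel whose last three range samples are padding
print(convolve_channel_dropping_nans(
    np.array([1.0, 2.0, 3.0, np.nan, np.nan, np.nan]), np.array([1.0, 1.0])
))
```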
That sounds good, I'll try it out. I have a suspicion that the 2nd option of just not dropping NaNs might be faster: the additional time needed to convolve arrays that contain a lot of NaNs may be less than the additional time needed to remove the NaNs and add them back, because the writers of the convolve operation made it fast for arrays with NaNs (I think). I could be completely wrong though haha
Yeah, I agree, it very possibly would be that way!
Recent changes will come from: leewujung#37 (No Longer Drop Beam in Pulse Compression)
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1311      +/-   ##
==========================================
+ Coverage   83.52%   89.19%    +5.67%
==========================================
  Files          64       11       -53
  Lines        5686      990     -4696
==========================================
- Hits         4749      883     -3866
+ Misses        937      107      -830
```
```diff
- if dask_array:
-     ep_vals = ds_Sv["Sv"].sel(channel=ch_sel).squeeze().data.compute()
- else:
-     ep_vals = ds_Sv["Sv"].sel(channel=ch_sel).squeeze().data
+ ep_vals = ds_Sv["Sv"].sel(channel=ch_sel).squeeze().data.compute()
```
Does this mean that right now, with the `dask_array=False` test case, the data is still lazy-loaded? Asking because I seem to remember the original intention of this was to check explicitly that lazy-loading works, but I could also just be misremembering things.
Ah yeah, it's because of the `.chunk({"range_sample": -1, "channel": -1})` that's now in the convolution. Let me make this chunk conditional on whether or not the backscatter variable is already chunked.
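Something like this minimal sketch could work (variable names here are illustrative, not the actual echopype code): only force single chunks along `range_sample`/`channel` when the backscatter is already dask-backed, so the in-memory case stays un-chunked.

```python
import numpy as np
import xarray as xr

# Illustrative only: `backscatter` stands in for the variable fed to the convolution.
backscatter = xr.DataArray(
    np.zeros((2, 8)), dims=["channel", "range_sample"]
).chunk({"range_sample": 4})

# .chunks is None for a plain numpy-backed DataArray, so the rechunk is skipped there
if backscatter.chunks is not None:
    backscatter = backscatter.chunk({"range_sample": -1, "channel": -1})
```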
@leewujung Thanks! Should be ready for another short review haha
Awesome! Thanks @ctuguinay, this looks great. I'll merge it now.
This PR addresses the "Indexing with a boolean dask array is not allowed" issues encountered when calibrating EK80 complex data that is lazy-loaded.

This issue essentially calls for forcing a compute in the places where such indexing occurs. The cases I've seen (and "fixed") so far use boolean indexing to select channels (based on data from the first ping), so the impact on performance is likely small. Right now this is fine since we restrict the use case to when the transmit signals are identical across all `ping_time` (but can vary across `channel`). Performance issues may arise when we want to accommodate transmit signals that change across `ping_time` (#925).

I also adjusted one place in the convolution code to allow dask to function in parallel, by rechunking the `range_sample` dimension into 1 chunk.

Note: this is not yet ready for review.
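As an illustration of the pattern behind the channel-selection fixes described above (variable and dimension names here are made up for the example, not echopype internals): the small per-channel mask derived from the first ping is computed up front, and the now in-memory mask is then used for boolean indexing.

```python
import numpy as np
import xarray as xr

# Hypothetical example data: 3 channels x 2 pings, chunked so the values are dask-backed
ds = xr.Dataset(
    {"backscatter_r": (("channel", "ping_time"), np.arange(6.0).reshape(3, 2))},
    coords={"channel": ["ch1", "ch2", "ch3"], "ping_time": [0, 1]},
).chunk({"ping_time": 1})

# Mask based on the first ping only; .compute() forces the tiny boolean mask into memory
# (cheap, since it is one flag per channel) so it can be used as a boolean index
mask = (ds["backscatter_r"].isel(ping_time=0) > 1.0).compute()

selected_channels = ds["channel"].values[mask.values]  # plain numpy boolean indexing
selected = ds.sel(channel=selected_channels)
print(selected["channel"].values)  # ['ch2' 'ch3']
```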