Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose func arg and skip_na to compute_MVBS users #1269

Merged

Conversation

ctuguinay
Copy link
Collaborator

@ctuguinay ctuguinay commented Feb 29, 2024

PR for #1268

In addition, modified test_compute_MVBS_values to include a test for adding NaNs to the ds_Sv to illustrate how that would affect the computation of compute_MVBS using both func="nanmean" and func="mean".

@ctuguinay ctuguinay self-assigned this Feb 29, 2024
@ctuguinay ctuguinay added enhancement This makes echopype better feature request labels Feb 29, 2024
@ctuguinay ctuguinay changed the title add func arg add func arg to compute_MVBS Feb 29, 2024
@ctuguinay ctuguinay requested a review from leewujung February 29, 2024 22:27
@ctuguinay ctuguinay changed the base branch from main to dev March 1, 2024 03:57
@ctuguinay ctuguinay changed the base branch from dev to main March 1, 2024 03:59
@ctuguinay ctuguinay modified the milestone: 0.8.3 Mar 1, 2024
@ctuguinay ctuguinay changed the title add func arg to compute_MVBS expose func arg and skip_na to compute_MVBS users Mar 6, 2024
@leewujung leewujung added this to the v0.8.4 milestone Mar 14, 2024
Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ctuguinay : Thanks for the investigation and changes! I was thinking through the use of func and skipna, and am thinking that we should probably avoid the confusion (which stemmed from the way xarray_reduce of using func and skipna at the same time.

For compute_MVBS, since it is supposed to be a mean operation, we do not have to allow for other func, but just need the skipna parameter. The same for compute_raw_MVBS. The place where we really need the func is _groupby_x_along_channels. So, I think we can have:

  • compute_MVBS and compute_raw_MVBS will only have skipna as a parameter
  • _groupby_x_along_channels have both func and skipna, and since it is under the hood, we can make sure func="nanmean"+skipna=True or func="mean"+skipna=False are used with it.

The compute_NASC computation comes to mind naturally, that we could allow the same skipna argument, and under the hood use either func="nansum"+skipna=True or func="sum" + skipna=False

Thoughts?

@ctuguinay
Copy link
Collaborator Author

Hey @leewujung, thanks for the review!

I agree with you on the implementation details. I also think that not all combinations should be allowed (ie the combinations func = nansum and skipna = False is a bit contradictory) and I was gonna add some restriction as to the combinations that the user could pass in, but this solution is much cleaner.

Would you also like me to make this same change of exposing skipna and using func under the hood to compute_NASC?

@leewujung
Copy link
Member

Would you also like me to make this same change of exposing skipna and using func under the hood to compute_NASC?

Sounds good if you could add this for compute_NASC in this PR also. Thanks!

@ctuguinay
Copy link
Collaborator Author

ctuguinay commented Mar 27, 2024

Hey @leewujung, I made the changes, but only the changes for compute_MVBS.

I do have a few questions on the compute_NASC part of this PR:

In compute_raw_NASC, the operation change should only be in the setting of mean/nanmean in the sv_mean = _groupby_x_along_channels(...,func=func="nanmean" if skipna else "mean",...) right? Not sum/nansum? Since sv_mean and not sv_sum is being calculated here. Let me know if this implementation is incorrect, since your comment above said nansum /sum.

But, I do also see the sum operations being used to calculate h_mean_denom and h_mean_num and I'm not sure if in those computations I should do anything with nansum. The same applies with ds_ping_time and it using nanmean currently. Should these three computations' aggregate functions interact at all with the skipna being passed into compute_raw_NASC?

@leewujung
Copy link
Member

In compute_raw_NASC, the operation change should only be in the setting of mean/nanmean in the sv_mean = _groupby_x_along_channels(...,func=func="nanmean" if skipna else "mean",...) right? Not sum/nansum? Since sv_mean and not sv_sum is being calculated here. Let me know if this implementation is incorrect, since your comment above said nansum /sum.

Oh right, you're right that it would still be mean/nanmean in the compute_NASC function the way it is implemented.

But, I do also see the sum operations being used to calculate h_mean_denom and h_mean_num and I'm not sure if in those computations I should do anything with nansum. The same applies with ds_ping_time and it using nanmean currently. Should these three computations' aggregate functions interact at all with the skipna being passed into compute_raw_NASC?

For h_mean_* and ds_ping_time, I think we can hard-code it to use nanmean/nansum and skipna=False, because:

  • the h_mean_* provides the "height" or the steps in the integral, so NaN comes from the Sv side but not the steps
  • ds_ping_time is used to come up with a new coordinate along the time or distance dimension, so even if the values end up being NaN, the coordinate should not be

@ctuguinay
Copy link
Collaborator Author

Sounds good, thanks!

@ctuguinay
Copy link
Collaborator Author

ctuguinay commented Apr 3, 2024

This one is taking me a bit since I am getting some weird NaN values in the compute_NASC tests when I shouldn't be getting them (in the case where I am using nanmean/nansum). I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this...I think. This may take me a while longer, so I'll come back to this after addressing a few of the Echopype user issues.

@leewujung
Copy link
Member

I think if I take a very thorough look at the bins that are being created, that will allow me to make sense of this...I think.

Yeah, and making the test data with smaller dimensions may help with figuring out what is going on. For example computing NASC using just 2 pings of data should give you some flexibility to look into where NaNs should be created and where it should not.

@ctuguinay ctuguinay marked this pull request as draft April 5, 2024 04:32
@ctuguinay
Copy link
Collaborator Author

Looking back at that compute MVBS test that I modified, I think I'll revert the change I made for that and have two separate tests (1 for compute MVBS and 1 for compute NASC) with smaller data arrays to do as you suggested above. The _parse_nans may hide some unintentional NaNs in the ds_MVBS and it's just so messy working with such a large dataset.

@ctuguinay ctuguinay marked this pull request as ready for review April 22, 2024 16:55
@ctuguinay ctuguinay requested a review from leewujung April 22, 2024 17:07
@leewujung
Copy link
Member

@ctuguinay : I direct-pushed some changes to fix mock_Sv_dataset_irregular outputs, see below:

  • for Sv datasets generated by compute_Sv, the Sv and the echo_range dataarrays will have identical NaN elements. You can see it in at the end of the compute_Sv code. The original mock_Sv_dataset_irregular fixture only set some echo_range elements to NaN but not the corresponding Sv elements.
  • once I fixed that, I removed 2 lines in your test_compute_MVBS_NASC_skipna_nan_and_non_nan_values to set the NaN elements
  • and then I fixed test_compute_NASC_values by adding skipna=True in the compute_NASC call, and a few fixtures called in it in conftest.py and testing.py

@leewujung leewujung self-requested a review April 23, 2024 14:01
Comment on lines 464 to 465
# Already has 2 channels, so subset for only ping time and range sample
subset_ds_Sv = ds_Sv.isel(ping_time=slice(0,2), range_sample=slice(0,2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having only 2 depth values makes the mock dataset too small for the test to actually check whether the number and location of NaN elements at the output of NASC or MVBS computation is correct. I think you can select 2 of the pings, with one of them having some NaN elements and the other doesn't, but retain all elements along echo_range. This dataarray size should be manageable but would be a lot more realistic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll make these changes.

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctuguinay : see my comments for increasing the test data size along echo_range. Thanks!

ctuguinay and others added 6 commits April 23, 2024 10:01
@codecov-commenter
Copy link

codecov-commenter commented Apr 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.54%. Comparing base (9f56124) to head (8e95b99).
Report is 47 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1269       +/-   ##
===========================================
- Coverage   83.52%   64.54%   -18.99%     
===========================================
  Files          64       14       -50     
  Lines        5686      925     -4761     
===========================================
- Hits         4749      597     -4152     
+ Misses        937      328      -609     
Flag Coverage Δ
unittests 64.54% <100.00%> (-18.99%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@leewujung leewujung self-requested a review April 23, 2024 22:04
@leewujung leewujung merged commit 1f50c5f into OSOceanAcoustics:main Apr 23, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This makes echopype better feature request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants