Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attributes for Sp and MVBS from index binning #615

Merged
merged 11 commits into from
Apr 14, 2022

Conversation

emiliom
Copy link
Collaborator

@emiliom emiliom commented Apr 6, 2022

Addresses remaining tasks in #582 and #481

@emiliom emiliom added data conversion data format Anything about data format labels Apr 6, 2022
@emiliom emiliom added this to the 0.6.0 milestone Apr 6, 2022
@emiliom emiliom requested a review from leewujung April 6, 2022 07:58
_set_MVVBS_attrs(ds_MVBS)
ds_MVBS["Sv"] = ds_MVBS["Sv"].assign_attrs(
{
"comment": "MVBS binned on the basis of range_sample and ping number specified as index numbers",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leewujung pay attention to this addition. I couldn't come up with a cell_methods alternative. I was getting hung up with the specification of bins based on index numbers rather than physical units.
FYI, comment is a CF attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what you had for the case when MVBS is computed based on physical units (actual range in meters and ping time), how about something like below?

{
            "cell_methods": (
                f"ping_time: mean (interval: {ping_num} pings "
                "comment: ping_time is the interval start) "
                f"echo_range: mean (interval: {range_sample_num} samples along range "
                "comment: echo_range is the interval start)"
            ),
            "binning_mode": "sample number",
            "range_meter_interval": str(range_sample_num) + "samples along range",
            "ping_time_interval": str(ping_num) + "pings",
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding link to where I discussed CF cell_methods: #594 (review)

Copy link
Collaborator Author

@emiliom emiliom Apr 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A challenge with your suggestion is the interval units string. Per CF:

The syntax is (interval: value unit), where value is a numerical value and unit is a string that can be recognized by UNIDATA’s Udunits package [UDUNITS].

So, using "pings" and "samples along range" would not be CF compliant. The difficulty is in trying to specify a constant interval that's based on the index number rather than the dimension units. Though CF also says this:

The unit will usually be dimensionally equivalent to the unit of the corresponding dimension, but this is not required (which allows, for example, the interval for a standard deviation calculated from points evenly spaced in distance along a parallel to be reported in units of length even if the zonal coordinate of the cells is given in degrees).

Let me chew on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"range_meter_interval": str(range_sample_num) + "samples along range",
"ping_time_interval": str(ping_num) + "pings",

This seems confusing, since the value after "range_meter_interval" will not be in meters, nor will the value after "ping_time_interval" be in time units. It seems like the goal of these custom echopype attributes was to record the function parameters used, as is. If we find a way to settle on using cell_methods, the linkage between ping_time and echo_range and the binning method (mean) & interval values used would be explicit, so I'd then vote for retaining the original attribute names of "range_sample_num" and "ping_num".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely understanding what you are proposing, but when proposing the cell_methods I actually forgot to check the attribute names. It should be

"range_sample_interval": str(range_sample_num) + "samples along range",
"ping_interval": str(ping_num) + "pings",

ie "range_sample_interval" instead of "range_meter_interval", "ping_time_interval" instead of "ping_time_interval"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pointing out that these custom (echopype) attributes generated by compute_MVBS and compute_MVBS_index_binning seemed to mirror the parameters and the values fed to those functions, as is. But I realize now that wasn't true, though it was close.
I've updated the code to reflect what you have here.

@codecov-commenter
Copy link

codecov-commenter commented Apr 10, 2022

Codecov Report

Merging #615 (5a685f4) into dev (642154c) will decrease coverage by 5.70%.
The diff coverage is 94.87%.

@@            Coverage Diff             @@
##              dev     #615      +/-   ##
==========================================
- Coverage   78.74%   73.04%   -5.71%     
==========================================
  Files          40       19      -21     
  Lines        3614     1165    -2449     
==========================================
- Hits         2846      851    -1995     
+ Misses        768      314     -454     
Flag Coverage Δ
unittests 73.04% <94.87%> (-5.71%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
echopype/echodata/echodata.py 56.50% <0.00%> (-30.78%) ⬇️
echopype/preprocess/api.py 70.58% <90.90%> (-19.21%) ⬇️
echopype/calibrate/__init__.py 100.00% <100.00%> (ø)
echopype/calibrate/api.py 83.78% <100.00%> (+0.45%) ⬆️
echopype/calibrate/calibrate_azfp.py 95.74% <100.00%> (ø)
echopype/calibrate/calibrate_base.py 84.94% <100.00%> (-2.16%) ⬇️
echopype/calibrate/calibrate_ek.py 92.77% <100.00%> (-0.73%) ⬇️
echopype/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
echopype/echodata/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
echopype/preprocess/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
... and 36 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 12, 2022

Pushed new commits:

@leewujung
Copy link
Member

Great, thanks @emiliom !
The failed pre-commit check are just 2 lines that are too long:

f"ping_time: mean (interval: {ping_time_bin_resvalue} {ping_time_bin_resunit_label} "

and
"comment": "MVBS binned on the basis of range_sample and ping number specified as index numbers",

For readability an #noqa may be in order.

Comment on lines 75 to 77
if "water_level" in echodata.platform.data_vars.keys():
# add water_level to the created xr.Dataset
sv_dataset["water_level"] = echodata.platform.water_level
Copy link
Member

@leewujung leewujung Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emiliom : Not sure if this should be in another PR:

Looking at this now, I realized the check and addition of water_level should be in the outputs for both compute_Sv and compute_TS so that users can use this to convert echo_range to depth easily.

Also in the realm of this PR, the attributes of water_level can be set in _set_MVBS_attrs below under echopype/preprocess/api.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #516 #583 #586

Copy link
Collaborator Author

@emiliom emiliom Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a separate PR. But this looks like an easy change w/o additional ramifications that will lead to protracted discussions. It's also in line with the broader improvements to compute_Sp/TS. So, I'll just add it to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thank you @emiliom !

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in the realm of this PR, the attributes of water_level can be set in _set_MVBS_attrs below under echopype/preprocess/api.py.

Actually, water_level retains its original attributes from the echodata Platform group. So, no attribute assignment is taking place in calibrate/api.py.

I've implemented the addition of water_level to compute_TS (and in the process it made the code quite a bit simpler!). I'll push it to the PR in a bit.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 13, 2022

For readability an #noqa may be in order.

Done

@leewujung
Copy link
Member

I think you had a typo on noqa 🤪

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 13, 2022

I think you had a typo on noqa

😩 Sigh. I'll fix it now.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 13, 2022

pre-commit is now passing. Nice!

@leewujung
Copy link
Member

Thanks @emiliom ! I'll add a small thing about TS applicability in the docstring later today, and then merge this.

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 13, 2022

I think what remains an open question is the cell_methods implementation. As I wrote earlier (above):

A challenge with your suggestion is the interval units string. Per CF:

The syntax is (interval: value unit), where value is a numerical value and unit is a string that can be recognized by UNIDATA’s Udunits package [UDUNITS].

So, using "pings" and "samples along range" would not be CF compliant. The difficulty is in trying to specify a constant interval that's based on the index number rather than the dimension units.

I don't have a solution. I think we should just merge this PR (after your additions), then come back to it in the future.

@leewujung
Copy link
Member

leewujung commented Apr 14, 2022

@emiliom : I think you should make a decision here since I am not very familiar with the CF conventions in general.

I will merge this after my addition of TS-related docstring.

@leewujung
Copy link
Member

@emiliom : do you wanna capture this leftover dilemma in a new issue? Thanks!

@emiliom
Copy link
Collaborator Author

emiliom commented Apr 14, 2022

@emiliom : do you wanna capture this leftover dilemma in a new issue? Thanks

Done. #629

@emiliom emiliom deleted the sp_mvbs_attr branch December 24, 2022 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data conversion data format Anything about data format
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants