Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transient and impulse noise masks #1142

Conversation

beatfactor
Copy link

Overview

This PR introduces two new functions, get_impulse_noise_mask and get_transient_noise_mask that provide masks for impulsive noise (noise sources shorter than 1 ping) and transient noise (short, but still multi-ping noise sources). These are ported from the echopy library:
mask_impulse.py
mask_transient.py

Methodologies

The methodology employed uses both previously published methods defined:

  • Reducing bias due to noise and attenuation in open-ocean echo integration. ICES Journal of Marine Science 72, 2482-2493 – Ryan T. E., Downie R. A., Kloser R. J., Keith G. (2015) (ryan as mask generation method)
  • A noise removal algorithm for acoustic data with strong interference based on post-processing techniques CCAMLR SG-ASAM: 15/02 – Wang et al. (2015) (wang, for impulse noise only - unlike in echopy, this has been modified to provide a mask rather than a modified Sv array).
  • otherwise-unpublished methods implemented by the authors of echopy (fielding as mask-generation method, for transient noise only).

Key Features

  1. Impulse noise - implements the Ryan two-sided comparison method as "ryan", an iterable modification of it that allows it to detect adjacent noise spikes, and the Wang erosion/dilation/median filtering method as "wang"
  2. Transient noise - implements the Ryan identified Sv spikes method as "ryan" and a previously unpublished method proposed by Fielding et al, which compares the data, ping by ping, with respect to a bloc in a reference layer – should the ping median be greater than the block median by a user-defined threshold, the ping will be masked until transient noise disappears or until it gets to the minimum range allowed by the user

Key Considerations

In order to ensure compatibility between the echopy and echopype libraries, several important factors were taken into account while adapting the echopy code – responsible for noise masking:

  • Data type discrepancies: the differences in input and output data types between the two libraries required adjustment.
  • Mask delivery variation: in echopy, two separate masks are typically generated: one indicating the locations of noise (represented as true) and another showing where the method applied may not have been suitable for certain samples. In contrast, echopype usually delivers a single mask marking samples that are free of noise.
  • Differences in the ‘pings’ and ‘range’ axes: the two libraries have divergent approaches to these axes, which needed to be addressed.

For each noise type, specific decisions were made to combine and adapt the masks appropriately within the echopype environment.

Function Signatures

def get_impulse_noise_mask(
    source_Sv: xr.Dataset,
    desired_channel: str,
    thr: Union[Tuple[float, float], int, float],
    m: Optional[Union[int, float]] = None,
    n: Optional[Union[int, Tuple[int, int]]] = None,
    erode: Optional[List[Tuple[int, int]]] = None,
    dilate: Optional[List[Tuple[int, int]]] = None,
    median: Optional[List[Tuple[int, int]]] = None,
    method: str = "ryan",
) -> xr.DataArray:

def get_transient_noise_mask(
    source_Sv: Union[xr.Dataset, str, pathlib.Path],
    desired_channel: str,
    mask_type: str = "ryan",
    **kwargs
) -> xr.DataArray:

Usage

The functions can be utilised to create masks for Sv data based on identified single-ping or short multi-ping noise sources.

mihaiboldeanu and others added 17 commits August 7, 2023 18:25
* Completed feature #3 transient noise mask
---------

Co-authored-by: Mihai Boldeanu <[email protected]>
Co-authored-by: Raluca Simedroni <[email protected]>
Co-authored-by: ruxandra valcu <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Aug 30, 2023

Codecov Report

Merging #1142 (ec60163) into dev (6b734eb) will decrease coverage by 9.89%.
Report is 49 commits behind head on dev.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##              dev    #1142      +/-   ##
==========================================
- Coverage   80.71%   70.82%   -9.89%     
==========================================
  Files          67       10      -57     
  Lines        6056     1042    -5014     
==========================================
- Hits         4888      738    -4150     
+ Misses       1168      304     -864     
Flag Coverage Δ
unittests 70.82% <77.77%> (-9.89%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
echopype/utils/mask_transformation.py 62.99% <62.99%> (ø)
echopype/mask/mask_transient_noise.py 90.00% <90.00%> (ø)
echopype/mask/mask_impulse_noise.py 99.12% <99.12%> (ø)
echopype/mask/__init__.py 100.00% <100.00%> (ø)

... and 66 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@beatfactor
Copy link
Author

Codecov Report

@mihaiboldeanu Looks like we could do a with a few more tests.

@mihaiboldeanu mihaiboldeanu force-pushed the feature/add_transient_noise_mask branch from c08a9d1 to 6ac0450 Compare August 31, 2023 08:47
@leewujung
Copy link
Member

Hey folks, thanks for the PR!

I did a high-level review of the function/module organization and have the following suggestions. @emiliom also contributed to the discussion.

  • Could you please revert all the changes in tests/mask/test_mask.py? These changes are due to formatting, and even though the current one is not optimal, this would cause conflicts for the ongoing other PR Improved input syntax for frequency_differencing operation #1106 .
  • Based on what we have for the subpackage organization (see here), these noise mask generation functions would be in the clean subpackage. It is true that the functions you added here are generating masks, but noise assessment is a pretty central task for data processing that having its own subpackage is logical. So, could you move them? In the comment below I'll use clean to avoid confusion, but let me know if it is confusing!
  • Could you use pytest fixtures for the setup parts of the two test modules test_mask_transient.py and test_mask_impulse_noise.py, and put them in tests/clean/conftest.py so that they can be reused in the test modules. Also suggest renaming:
    • test_mask_transient.py to test_clean_transient_noise.py
    • test_mask_impulse_noise.py to test_clean_transient_noise.py
  • For get_impulse_noise_mask.py:
    • move the function get_impulse_noise_mask to within echopype/clean/api.py
    • move all the subfunctions (ryan, ryan_iterable, wang, etc) to a separate module echopype/clean/impulse_noise.py, and add _ to the beginning of the function call so it is clear they are private functions
  • For get_transient_noise_mask.py:
    • move the function get_transient_noise_mask to within echopype/clean/api.py
    • move the other two functions _get_transient_noise_mask_ryan and _get_transient_noise_mask_fielding to a separate module echopype/clean/transient_noise.py, and rename them to just _ryan and _fielding

For the last 2 points, our convention has been to keep the main API call under a subpackage in subpackage/api.py, and put supporting functions into separate modules (.py files) with filenames indicating which api function they are used in.

In addition, in clean we currently only have one function with a generic name (remove_noise), even though that function implements a specific algorithm that aims to estimate the time-varying background noise. With your addition of new noise removal functions (once masked), we need to change this function name and figuring out the API pattern for noise-related functions that generate masks (like these added here) and functions that suppress some noise (the current remove_noise). But all that would be a separate PR.

ruxandra-valcu and others added 8 commits October 4, 2023 11:27
… that we had had to adapt it quite a bit to make it generate a mask rather than modify the Sv directly (which made it quite aggressive) and trying to rebuild it to work directly on binary image operations rather than grayscale ones (dask image only has binary erode/dilate implemented currently) would have worsened it even further. Will likely make a separate PR for it once grayscale image operators are introduced in dask image.
)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4.7.0 to 4.7.1.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4.7.0...v4.7.1)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](pre-commit/pre-commit-hooks@v4.4.0...v4.5.0)
- [github.com/codespell-project/codespell: v2.2.5 → v2.2.6](codespell-project/codespell@v2.2.5...v2.2.6)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
simedroniraluca and others added 5 commits October 16, 2023 13:55
* Refactor _fielding function for consistent return type

* Refactor signal attenuation functions for consistent return type
updates:
- [github.com/psf/black: 23.9.1 → 23.10.1](psf/black@23.9.1...23.10.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@ruxandra-valcu
Copy link

Hello @leewujung

I've rewritten this PR (both the already-reviewed impulse noise algorithms and the transient noise and signal attenuation ones) to use xarray functionality rather than numpy.

Please let me know if there are any other improvements I should make!

ruxandra-valcu and others added 13 commits November 2, 2023 14:25
* added support for AZFP multiple phase attributes parsing (OSOceanAcoustics#1182)

* added support for multiple phase attributes parsing

* minor changes to code

* Update echopype/convert/set_groups_azfp.py

Co-authored-by: Emilio Mayorga <[email protected]>

* Update set_groups_azfp.py

* Update test_convert_azfp.py

---------

Co-authored-by: Emilio Mayorga <[email protected]>

* ci: Update CI to barebone python [all tests ci] (OSOceanAcoustics#1192)

* ci: Add installed packages listing

* ci: Add pip listing

* ci: Remove conda from PR action

* ci: Update worklows and add P2Z skip

* ci: More cleanup to workflow

* ci: Bump actions/setup-python from 4.7.0 to 4.7.1

* fix: Fix encoding in 'to_zarr' to remove 'preferred_chunks' before saving (OSOceanAcoustics#1128)

* Encoding and Chunk matching only if DaskArray

* Removing preffered chunks from encoding dict

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* `parse_azfp` now parses AZFP pressure data according to given matlab code (OSOceanAcoustics#1189)

* added support for parsing azfp pressure data

* fixed failing ci tests

* Revert "fixed failing ci tests"

This reverts commit c378dc0.

* minor changes to tests

* Update test_convert_azfp.py

* Update echopype/tests/convert/test_convert_azfp.py

---------

Co-authored-by: Emilio Mayorga <[email protected]>

* Enhanced `update_platform` to Auto-assign first `ping_time` as `Platform` timestamp for fixed-location update case without timestamp. [all tests ci] (OSOceanAcoustics#1196)

* first  is auto assigned as  timestamp for lon and lat update without timestamp

* fix small typo

---------

Co-authored-by: Wu-Jung Lee <[email protected]>

* Added `depth_from_pressure` method required for the calculation of `vertical_offset` value (OSOceanAcoustics#1207)

* added depth_from_pressure method

* Function arguments can be scalars or sequences; and add more tests (#2)

* Reverse order of equation terms to match UNESCO 1983 source

* For depth_from_pressure, accept scalars, lists and arrays for all 3 arguments. Include consistency checks.

* Add a greater variety of depth_from_pressure tests.

* Update echopype/utils/misc.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Emilio Mayorga <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(commongrid): fix bugs and improve `compute_NASC` using flox (OSOceanAcoustics#1167)

* chore(deps): add flox dependency >=0.7.2

* fix(commongrid): fixes 'compute_MVBS' so it can work better and scale

Under the hood, binning along ping time and echo range now uses flox.
This allows for scalability and more community-maintained.

* docs: add small code comment

* refactor: change how ping_time index is retrieved

* refactor: remove for loop for channel

* test(mvbs): add mock Sv datasets and tests for dims (#2)

Note that @leewujung also changed mean to nanmean for skipping NaNs in each bin.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: change dask to numpy

Changed the use of dask for log10 to numpy instead since numpy can also handle dask array inputs properly.

* feat: Add method argument

Added 'method' argument to 'get_MVBS_along_channels'
and also expose additional keyword arguments control for
flox.

* fix(commongrid): Fixed to include lat lon

Fixed 'compute_MVBS' function to now
include latitude and longitude if the variables
exists in the Sv dataset.

Additionally, the flox method and keyword arguments
are now exposed within the 'compute_MVBS' function.

Ref: Issue OSOceanAcoustics#1002

* refactor: Set defaults to recommended

After some investigation, @lsetiawan
concluded that at this time the
method 'map-reduce', engine 'numpy',
and reindex True works the best,
so this is now set as default.

Also, getting echo range maximum is
through direct data slicing rather than
computation.

* feat(commongrid): Add 'range_var' argument to 'compute_MVBS'

Added a new argument 'range_var' so that user can set
the range variable to perform binning with.
There are 2 options of 'echo_range' and 'depth':

- 'echo_range': When this is set, variable 'water_level'
                is now included in the resulting MVBS dataset
- 'depth': A check is in place to ensure that this variable exists
           before moving forward and use this to perform range binning.

Ref: Issue OSOceanAcoustics#1002

* fix: Add missing attributes for lat lon

* test: Update test to use random generator

* fix: Add case for no 'water_level'

Added a case for dataset that doesn't have water level variable.

* test(nasc): Remove 'compute_NASC' import to avoid failure

* fix: Removed assumption on echo range max

Reverted back the echo range max computation
to computing on the fly since there may be some
NaN values.

* test: Extract api test and add markings

Extracted fixtures to conftest.py for commongrid.
Additionally, clean up unused functions and mark
tests b/w unit and integration. Added a new test module
called 'test_api.py' for 'commongrid.api'.

* test: Add latlon test for 'compute_MVBS'

Added a test for Sv dataset that contains latitude
and longitude going through 'compute_MVBS'
to ensure that those variables gets propagated through.

Ref: OSOceanAcoustics#1002

* test: Add small get_MVBS_along_channels test

Added test for 'get_MVBS_along_channels' with either 'depth'
as the 'range_var' or checking for 'has_positions' is True
or False.

* refactor: Integrate suggested changes

Integrated suggested changes from review such as
additional comments in code, fixing some variable names,
and extracting out the lin2log and log2lin functions.

Additionally, now echopype imports pint library
to start having unit checks in the input for compute_MVBS.

* test: Added check for position values

* test: Update range_meter_bin to strings

* test: Added 'compute_MVBS' values test

* Update echopype/tests/utils/test_processinglevels_integration.py

compute_MVBS now should preserve the processing level attributes. So, test for presence rather than absence

* test: Add 'nan' sprinkles

Sprinkled 'nan' values all over 'echo_range' to ensure that
computed values from 'compute_MVBS' doesn't take into account
the 'nan'. Added check for the expected distribution of 'nan'
in the resulting array.

Ref: OSOceanAcoustics#1124 (comment)

* revert: Revert the use of 'pint'

Removed dependency to 'pint' and use simple regex
to ensure that 'range_bin' input is unit 'm'.
Renamed 'range_meter_bin' argument to 'range_bin'.

Ref: OSOceanAcoustics#1124 (comment)

* feat: Allow 'range_bin' to have space

* fix: Apply suggestions from code review

Applied fix for regex not capturing decimal values by @emiliom

Ref: https://github.com/OSOceanAcoustics/echopype/pull/1124/files#r1320422121

Co-authored-by: Emilio Mayorga <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test: Fix test text for wrong unit

* test: Remove the 'e.g.' part on pytest

Removed the part with '(e.g., '10m')' since
it's messing up pytests regex matching.

* revive the function to make changes easier to see

* add TODOs

* add computation steps

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused _lin2log

* test: Remove remnant for test_ek.py

* refactor: Extract range_bin parsing and add close arg

Extracts out the 'range_bin' string to float into a private function.
Additionally now there's a fine tune argument for bin close edges
so user can specify either close is 'left' or 'right'.
Bins are converted to pandas interval index before passing
into 'get_MVBS_along_channels'.

* refactor: Update arg types to include interval index

Added argument type options for 'range_interval' and 'ping_interval' to
also be interval index.

* test: Update tests to have brute force creation

Changed mock mvbs to be created by doing brute force
rather than hard coding.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test: Fix brute force mvbs gen

Fixes the generation of expected mvbs with brute force
as well as tweaks to mvbs_along_channel test.

* chore: Clean up old code for doing compute MVBS

Removes old code that perfoms compute_MVBS since
now we've switched over to flox

* chore(pytest): Added custom markers 'unit' and 'integration'

* docs: Update docstring for `compute_MVBS`

Added options for literal arguments

* refactor: Change 'parse_range_bin' to 'parse_x_bin'

Make bin parsing to be more general by making it
to 'parse_x_bin'.

* refactor: Initial unification of MVBS and NASC

Added setup and validate function for shared checks
between compute MVBS and NASC so only unique
checks are in its individual function.

* fix typo when porting from notebook

* correct attribute units from m to nmi

* refactor: Add typehints and use method

* feat: Add get_x_along_channels

Added 'get_x_along_channels' function that generalizes
the reduction routines from 'get_MVBS_along_channels'.
This now removes the old function in mvbs.py module.

Additionally, uses of 'get_MVBS_along_channels' has
been removed from the test and code for 'compute_MVBS'.

* feat: Implement new 'compute_NASC'

Use 'get_x_along_channels' for 'compute_NASC'
and turn on old 'test_nasc.py' for initial nasc testing

* test: Renamed and moved get_x_along_channels test

* fix: Use 'ffill' and 'bfill'

Fixes the 'FutureWarning' coming from pandas
since as of pandas version 2.1.0 the 'method'
argument for 'fillna' is deprecated.

Ref: OSOceanAcoustics#1167 (comment)

* feat: Allow import 'compute_NASC' from 'commongrid' module

* fix: Fix bug on setup and validate and test

Fixes bug on 'setup_and_validate' during variable checks.

Also added simple testing for values from flox vs echoview.

* test: Update simple NASC integration test

* test: Add brute force values test for NASC

* chore: Apply suggestions from code review

Co-authored-by: Wu-Jung Lee <[email protected]>

* refactor: Extract position reduction

* refactor: Separate sv mean and raw computations

* test: Remove empty test_nasc.py

* docs: Update docs for functions

* refactor: Move helper funcs to utils.py

* add L4 processing level to compute_NASC

---------

Co-authored-by: Landung 'Don' Setiawan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Emilio Mayorga <[email protected]>

---------

Co-authored-by: Praneeth Ratna <[email protected]>
Co-authored-by: Emilio Mayorga <[email protected]>
Co-authored-by: Don Setiawan <[email protected]>
Co-authored-by: Soham Butala <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wu-Jung Lee <[email protected]>
* Added signal attenuation fix

* Test fix

* Added fill value parametrization for log/lin functions
* Ryan transient testing

* Ryan transient testing
* Memory optimizations for signal & transient noise

* Memory optimizations for signal & transient noise
@ctuguinay
Copy link
Collaborator

Closing this since the Echopy functions were added in #1316 and #1326

@ctuguinay ctuguinay closed this Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants