
Adapt signal_fixpeaks to deal with larger gaps in data #650

Closed
danibene opened this issue Jun 3, 2022 · 4 comments

Comments

@danibene
Collaborator

danibene commented Jun 3, 2022

Describe the solution you'd like
I noticed that the "neurokit" method of signal_fixpeaks does not work well with larger gaps in the data, sometimes leading to negative indices or getting stuck in a loop: https://github.com/danibene/fixpeaks_with_large_gaps/blob/main/example_signal_with_large_gaps.ipynb

How could we do it?
I think there are a few possible ways to go about dealing with larger gaps (not mutually exclusive):

  1. Adapting the number of NaNs depending on the interval size rather than always inserting 2 NaNs for large intervals
  2. Interpolating the peaks directly rather than the intervals
  3. Providing a warning if there are negative indices & breaking the loop after a certain number of iterations
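Suggestion 1 could be sketched roughly as follows; this is a hypothetical helper for illustration, not existing NeuroKit code:

```python
import numpy as np

def n_nans_for_gap(gap_interval, mean_interval):
    """Hypothetical helper: scale the number of inserted NaNs with the
    gap size, instead of always inserting 2 for any large interval."""
    # A gap spanning ~k mean intervals hides roughly k - 1 missing peaks.
    return max(int(round(gap_interval / mean_interval)) - 1, 1)

# e.g. a 4 s gap when beats are ~1 s apart suggests ~3 missing peaks
print(n_nans_for_gap(4.0, 1.0))  # -> 3
```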

If you're interested in any of the above, I'm happy to try my hand & open a PR!

@DominiqueMakowski
Member

Fixing peaks is a tricky issue, but there's definitely room for improvement!
No. 3 would be a good starting point; I'm not sure what's best between 1 and 2, but please do open a PR so we can check.

It would also be nice to add an example/study explaining and benchmarking these methods.

@danibene
Collaborator Author

danibene commented Jun 6, 2022

> Fixing peaks is a tricky issue

Indeed!

I've opened a PR with the 3 changes I mentioned above. It's a work in progress, but I thought I should get your input before overcomplicating things: #647

The new version is able to fix ECG peaks for data where the old version got stuck in a loop.
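A guard along the lines of suggestion 3 (warn on negative indices and cap the number of iterations) could look roughly like this sketch; the function name, `MAX_ITERATIONS`, and the overall structure are assumptions for illustration, not the actual PR code:

```python
import warnings

MAX_ITERATIONS = 100  # assumed cap; not NeuroKit's actual value

def fix_peaks_iteratively(peaks, fix_once):
    """Repeatedly apply a single fixing pass until it converges,
    warning and stopping if indices go negative or the cap is hit."""
    for _ in range(MAX_ITERATIONS):
        new_peaks = fix_once(peaks)
        if any(p < 0 for p in new_peaks):
            warnings.warn("Peak fixing produced negative indices; stopping.")
            break
        if new_peaks == peaks:  # converged, nothing left to fix
            break
        peaks = new_peaks
    return peaks
```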

I still see a couple of issues though:

  • there is no extrapolation in case there is a NaN at the final index (so when interpolating the peaks directly, the index stays constant rather than increasing)
    → modify the signal_interpolate function to extrapolate?
    → could remove NaNs at the end?
  • sometimes the indices are not negative, but still out of order
    → provide warning for peak indices that are not increasing?
  • if there are many missing peaks, it might be better to exclude them instead of linearly interpolating for certain analyses, e.g. short term HRV
    → have option to keep NaNs rather than interpolate above a certain threshold?
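For the first point above, one option is SciPy's `interp1d`, which supports linear extrapolation via `fill_value="extrapolate"`; a minimal illustration of the difference (not NeuroKit code, toy values assumed):

```python
import numpy as np
from scipy.interpolate import interp1d

# Known peak indices with a NaN at the final position
x = np.array([0, 1, 2, 3])
y = np.array([100.0, 200.0, 300.0, np.nan])
valid = ~np.isnan(y)

# With fill_value="extrapolate", the linear trend continues past the
# last valid sample instead of the index staying constant.
f = interp1d(x[valid], y[valid], kind="linear", fill_value="extrapolate")
print(float(f(3)))  # -> 400.0 (continues the increasing trend)
```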

@danibene
Collaborator Author

Hi @DominiqueMakowski & @JanCBrammer , I've noticed a couple of problematic parts of the code from the previous PR

  1. If the intervals are standardized, the calculation of the number of NaNs to insert based on the mean interval does not make sense:

        interval = signal_period(peaks, sampling_rate=sampling_rate, desired_length=None)
        interval = standardize(interval, robust=robust)
        peaks = _interpolate_missing(
            peaks=peaks,
            interval=interval,
            interval_max=relative_interval_max,
            sampling_rate=sampling_rate,
        )

        interval_without_outliers = interval[np.invert(outliers)]
        mean_interval = np.nanmean(interval_without_outliers)

    E.g.:

        import neurokit2 as nk
        peaks = [250, 1250, 1350, 2250, 3250]
        interval = nk.signal_period(peaks, sampling_rate=1000, desired_length=None)
        interval
        # array([0.75, 1.  , 0.1 , 0.9 , 1.  ])

        nk.standardize(interval)
        # array([ 0.        ,  0.66226618, -1.72189206,  0.39735971,  0.66226618])
  2. The calculation of the number of NaNs to insert always uses the floor rather than rounding:

        # compute number of NaNs to insert based on the mean interval
        n_nan = int(interval[loc] / mean_interval)

This is probably not ideal if interval[loc] / mean_interval is closer to the next integer:

    int(1.99)    # -> 1
    round(1.99)  # -> 2
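Combining the two fixes — computing the NaN count from the raw (non-standardized) intervals and rounding instead of flooring — might look like this sketch; the variable values are made up for illustration:

```python
import numpy as np

# Hypothetical example: raw beat-to-beat intervals in seconds, with a
# large gap at index 2 flagged as an outlier.
raw_interval = np.array([0.75, 1.0, 4.5, 0.9, 1.0])
outliers = np.array([False, False, True, False, False])

# Mean computed from the raw intervals (not standardized ones).
mean_interval = np.nanmean(raw_interval[~outliers])

loc = 2  # index of the large gap
ratio = raw_interval[loc] / mean_interval  # ~4.93 mean intervals wide

n_nan_floor = int(ratio)         # floors to 4
n_nan_round = int(round(ratio))  # rounds to 5, closer to the true gap
print(n_nan_floor, n_nan_round)  # -> 4 5
```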

Should I open a new PR? Sorry I didn't notice these earlier 🙈

@danibene reopened this Jul 19, 2022
@DominiqueMakowski
Member

> Should I open a new PR? Sorry I didn't notice these earlier 🙈

No worries, better late than never :) Yes, please do open a PR.
