
Refactors and fixes sample entropy (last M values must be ignored) #15

Merged
merged 2 commits into from
Jan 10, 2020

Conversation

CSchoel
Contributor

@CSchoel CSchoel commented Dec 28, 2019

@DominiqueMakowski called my attention to a discrepancy between this implementation of sample entropy and my implementation in nolds in this issue: neuropsychology/NeuroKit#53.

I think there is a small issue in pyEntropy that causes this inconsistency: sample entropy is the conditional probability that two pieces of the input sequence that are similar for M time steps will remain similar for M+1 time steps. When counting similar template-vector pairs of length M, we must therefore ignore the last template vector, since it cannot be followed for another time step. Including it would introduce a bias that underestimates the proportion of template pairs that remain similar for M+1 time steps.

Reference: Richman and Moorman (2000), page H2042
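To make the off-by-one concrete, here is a minimal, self-contained sketch of sample entropy with the corrected counting. This is illustrative code, not pyEntropy's actual implementation; the function name and structure are my own.

```python
import numpy as np

def sampen_sketch(x, m, tol):
    """Sample entropy sketch: -ln(A/B), where B counts similar template
    pairs of length m and A counts similar pairs of length m + 1.
    Similarity means Chebyshev distance below tol, self-matches excluded."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def similar_pairs(templ_len, n_templates):
        # One row per template vector of the given length.
        t = np.array([x[i:i + templ_len] for i in range(n_templates)])
        count = 0
        for i in range(n_templates - 1):
            # Chebyshev distance from template i to all later templates
            # (only pairs with i < j, so self-matches are excluded).
            dist = np.max(np.abs(t[i + 1:] - t[i]), axis=1)
            count += np.sum(dist < tol)
        return count

    # n - m + 1 templates of length m exist, but the last one is dropped:
    # it cannot be followed for another time step (this is the fix).
    b = similar_pairs(m, n - m)
    # All n - m templates of length m + 1 are used.
    a = similar_pairs(m + 1, n - m)
    return -np.log(a / b)

signal = np.cos(np.linspace(start=0, stop=30, num=100))
print(sampen_sketch(signal, m=2, tol=0.2))
```

With the cosine signal from the comparison below (num=100, dim=2, tol=0.2), this sketch should agree with the fixed implementations.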

Following Dominique's hint, I found similar issues in entro-py (ixjlyons/entro-py#2) and pyeeg (forrestbao/pyeeg#29). With the fix suggested in this pull request, pyEntropy produces the same output as nolds and the R package pracma (which I used as the reference for the nolds implementation), as well as the fixed versions of pyeeg and entro-py:

import nolds
import entropy as e
import pyentrp.entropy as pent
import pyeeg.entropy as peeg
import numpy as np

num = 100
dim = 2
tol = 0.2
signal = np.cos(np.linspace(start=0, stop=30, num=num))
print("entro-py", e.sampen(signal, dim, tol, False))
print(" pyentrp", pent.sample_entropy(signal, dim + 1, tol)[-1])
print("   pyeeg", peeg.samp_entropy(signal, dim, tol))
print("   nolds", nolds.sampen(signal, emb_dim=dim, tolerance=tol))

Output:

entro-py 0.27672305866206137
 pyentrp 0.2767230586620615
   pyeeg 0.27672305866206137
   nolds 0.2767230586620615

Since I found the code in pyEntropy hard to understand, I took the liberty of refactoring it. Once that was done, I could also identify the actual culprit in the original code. So if you would like to incorporate the fix but not my refactored version, you can alternatively pull the branch fix_sampen in my fork.

reason for fix: the last M values in the time series cannot be followed for M more steps
reference: Richman and Moorman (2000), page H2042

reason for refactoring: the code was unnecessarily complicated, with a for loop
disguised as a while loop and an unnecessary separation of the first
iteration;
also switched from assert_array_equal to assert_allclose for correct floating-point comparisons in the tests
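As a quick illustration of why assert_allclose is the right tool here (the two values are the outputs quoted above; the snippet itself is mine):

```python
import numpy as np
from numpy.testing import assert_allclose

# The implementations above agree only up to the last float bits:
a = 0.27672305866206137  # entro-py / pyeeg
b = 0.2767230586620615   # pyentrp / nolds

# These are distinct float64 values, so an exact comparison such as
# numpy.testing.assert_array_equal(a, b) would raise an AssertionError.
assert a != b

# assert_allclose compares within a tolerance (default rtol=1e-7), so it
# treats the two results as equal:
assert_allclose(a, b)
print("difference:", abs(a - b))
```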
@codecov

codecov bot commented Dec 28, 2019

Codecov Report

Merging #15 into master will decrease coverage by 0.46%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #15      +/-   ##
==========================================
- Coverage   92.25%   91.79%   -0.47%     
==========================================
  Files           2        2              
  Lines         142      134       -8     
==========================================
- Hits          131      123       -8     
  Misses         11       11
Impacted Files Coverage Δ
pyentrp/entropy.py 88.77% <100%> (-0.85%) ⬇️
tests/test_entropy.py 100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae2bf71...1f8792f.

@CSchoel
Contributor Author

CSchoel commented Dec 28, 2019

Update: I also adjusted the tests accordingly (and used assert_allclose instead of assert_array_equal for correct float handling).

@nikdon
Owner

nikdon commented Jan 10, 2020

Hey @CSchoel. Thanks for the PR. I think your refactoring is good to merge, and I don't see any objections either :) I'll merge it shortly.
