Refactors and fixes sample entropy (last M values must be ignored) #15
@DominiqueMakowski drew my attention to a discrepancy between this implementation of the sample entropy and my implementation in nolds in this issue: neuropsychology/NeuroKit#53.
I think there is a small issue in pyEntropy that causes this inconsistency: The sample entropy is the conditional probability that two pieces of the input sequence that are similar for M time steps will remain similar for M+1 time steps. If we count the number of similar template vector pairs of length M, we must therefore ignore the last template vector, since it cannot be followed for another time step. If we included it in the calculation, this would introduce a bias that underestimates the proportion of template vectors that remain similar for a length of M+1.
Reference: Richman and Moorman (2000), page H2042
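To make the off-by-one concrete, here is a minimal standalone sketch of the corrected counting (this is illustrative, not the refactored pyEntropy code; I assume the Chebyshev distance and a tolerance `r`, as in Richman and Moorman):

```python
import numpy as np

def sample_entropy(x, m, r):
    """Illustrative sample entropy sketch (Chebyshev distance, tolerance r)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Build all templates of length m+1; the first m columns then give the
    # length-m templates. Crucially, this yields n-m rows, i.e. the last
    # length-m template (which has no continuation) is dropped -- this is
    # exactly the off-by-one that this pull request fixes.
    templates = np.array([x[i:i + m + 1] for i in range(n - m)])

    def count_similar(tmpl):
        # Chebyshev distance between all template pairs
        d = np.max(np.abs(tmpl[:, None, :] - tmpl[None, :, :]), axis=-1)
        # Count pairs within tolerance, excluding self-matches
        return np.sum(d <= r) - len(tmpl)

    b = count_similar(templates[:, :m])  # similar pairs of length m
    a = count_similar(templates)         # pairs that stay similar for m+1
    return -np.log(a / b)                # -inf if no pair extends to m+1
```

For example, `sample_entropy(x, 2, 0.2 * np.std(x))` for a signal `x` corresponds to the common parameter choice M = 2, r = 0.2 * SD.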
Following Dominique's hint, I found similar issues with entro-py (ixjlyons/entro-py#2) and pyeeg (forrestbao/pyeeg#29). With the fix suggested in this pull request, pyEntropy produces the same output as nolds and the R package pracma (which I used as the reference for the implementation of nolds), as well as the fixed versions of pyeeg and entro-py.
Since I found the code in pyEntropy hard to understand, I took the liberty of refactoring it. Once that was done, I was also able to identify the actual culprit in the original code. So if you would like to incorporate the fix but not my refactored version, you can alternatively pull the branch fix_sampen in my fork.