Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sampenc implementation #27

Open
kohleth opened this issue Oct 5, 2019 · 2 comments
Open

sampenc implementation #27

kohleth opened this issue Oct 5, 2019 · 2 comments

Comments

@kohleth
Copy link

kohleth commented Oct 5, 2019

Hi,

I am trying to understand the logic behind sampenc.
I figured it is a modification of douglas lake's code?
But there seems to be 2 issues:

  1. the line p <- A[2]/B[1] I think should be p<-A[M]/B[M-1]

issue 2 is, I don't quite get lake's code (is it correct?).
I tried to follow his argument on https://www.physionet.org/content/sampen/1.0.0/ especially

If a particular run ends up being of length 4, for example, then that means that 1 is added to the count for template matches of length 4. In addition, there are 2 template matches of length 3, 3 of length 2, and 4 of length 1 that need to be added to the corresponding counts.

But I couldn't verify this behaviour using some simple simulated series.

So I

  1. re-implement Richman and Moorman original algorithm literally
  2. use pracma::sample_entropy
  3. re-implement Lake's matlab literally
  4. modify 'tsfeatures::sampenc` according to point 1 above

and did a comparison

my_sam_en=function(ts,M,r){  ## richmand and moormand literally
  N=length(ts)
  subts=function(.ts,l,exclude_last=T){
    N=length(.ts)
    if(exclude_last)return(lapply(1:(N-l),function(i).ts[i:(i+l-1)]))
    lapply(1:(N-l+1),function(i).ts[i:(i+l-1)])
  }
  ## all length M sequences, up to begining from N-M
  x_M=subts(ts,M,exclude_last = T)
  x_Mp1=subts(ts,M+1,exclude_last=F)
  
  count_r=function(xx){
    # browser()
    d=Vectorize(function(a,b)max(abs(xx[[a]]-xx[[b]])),c('a','b'))
    N=length(xx)
    O=outer(1:N,1:N,d)
    sum(O<(r-1e-10))-nrow(O)
  }
  C_M_r=count_r(x_M)
  C_Mp1_r=count_r(x_Mp1)
  
  -log(C_Mp1_r/C_M_r)  
}

modified_sampenc=function (y, M = 6, r = 0.3)   ## tsfeatures::sampenc
{
  N <- length(y)
  lastrun <- numeric(N)
  run <- numeric(N)
  A <- numeric(M)
  B <- numeric(M)
  for (i in 1:(N - 1)) {
    y1 <- y[i]
    for (jj in 1:(N - i)) {
      j <- i + jj
      if (isTRUE(abs(y[j] - y1) < r)) {
        run[jj] <- lastrun[jj] + 1
        M1 <- min(M, run[jj])
        A[1:M1] <- A[1:M1] + 1
        if (j < N) 
          B[1:M1] <- B[1:M1] + 1
      }
      else {
        run[jj] <- 0
      }
    }
    for (j in 1:(N - i)) {
      lastrun[j] <- run[j]
    }
  }
  p <- A[M]/B[M-1]
  e <- -log(p)
  return(e)
}
lake=function(y,M,r){
  n=length(y);
  lastrun=rep(0,n);
  run=rep(0,n);
  A=rep(0,M);
  B=rep(0,M);
  p=rep(0,M);
  e=rep(0,M);
  for(i in 1:(n-1)){
    nj=n-i;
    y1=y[i];
    for(jj in 1:nj){
      j=jj+i;      
      if(abs(y[j]-y1)<r){
        run[jj]=lastrun[jj]+1;
        M1=min(M,run[jj]);
        for(m in 1:M1){
          A[m]=A[m]+1;
          if(j<n)B[m]=B[m]+1;
        }           
      }else{
        run[jj]=0;
      }
    }
    for(j in 1:nj){
      lastrun[j]=run[j];    
    }
  }
  N=n*(n-1)/2;
  B=c(N,B[1:(M-1)]);
  p=A/B;
  e=-log(p);
  e[length(e)] ## only return the last value
}
y=sin(1:100)
M=2;r=0.3
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
  my_sam_en(y,M,r),
  modified_sampenc(y,M,r),
  lake(y,M,r),
  tsfeatures::sampenc(y,M,r))
# [1] 0.1006652 0.1006652 0.8065081 0.8065081 0.8065081
 
M=3;r=0.2
c(pracma::sample_entropy(sin(1:100),edim=M,r=r),
  my_sam_en(y,M,r),
  modified_sampenc(y,M,r),
  lake(y,M,r),
  tsfeatures::sampenc(y,M,r))
# [1] 0.0000000 0.0000000 0.1459539 0.1459539 0.9808293

So pracma:;sample_entropy (and my translation of the original paper) more or less agree while strongly disagreeing with lake's code (and thus tsfeatures::sampenc).

But again, I couldn't follow lake's algorithm at all.

@wgova
Copy link

wgova commented Apr 24, 2022

@robjhyndman - is this repo/package still being maintained?

Are the features computed here equivalent to those in FEASTS - or are the two slowly diverging to make way for multi-series computations?

@robjhyndman
Copy link
Owner

tsfeatures is still being maintained, at least fixing bugs, etc., and we will continue to ensure it stays on CRAN. The features in feasts may diverge as that is under more active development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants