Replies: 2 comments 2 replies
-
That looks promising!
…On Wed, May 18, 2022 at 5:58 PM Gibs ***@***.***> wrote:
@mortonjt <https://github.com/mortonjt> and I discussed using BIRDMAn as
a classifier.
The idea is that you can fit parameters on a training dataset and then
reuse those parameters on a testing dataset. The testing model evaluates
the log-likelihood of each test sample under each class's parameters, and
the class with the highest log-likelihood is the predicted class.
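As a toy sketch of that idea (plain scipy with a Poisson model, not BIRDMAn's actual negative binomial models; the rates and sample sizes here are made up):

```python
# Toy illustration of classification by log-likelihood:
# fit per-class parameters on training counts, then assign each test
# sample to the class whose fitted model gives it the higher log-likelihood.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

# Simulated training counts for two classes with different rates
train_a = rng.poisson(5, size=200)
train_b = rng.poisson(12, size=200)

# "Fit" the model per class (the Poisson MLE is just the sample mean)
lam_a, lam_b = train_a.mean(), train_b.mean()

def classify(counts):
    """Return 0 (class A) or 1 (class B) per sample by log-likelihood."""
    ll_a = poisson.logpmf(counts, lam_a)
    ll_b = poisson.logpmf(counts, lam_b)
    return (ll_b > ll_a).astype(int)

test = rng.poisson(12, size=100)   # truly class B
accuracy = classify(test).mean()   # fraction correctly assigned to B
```

The Stan models below do the same thing, except the per-class likelihood is negative binomial with sequencing depth as an offset, and the log-likelihoods come from posterior draws rather than a point estimate.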
Training model:
data {
  int<lower=1> N;
  int<lower=1> p;
  vector[N] depth;
  int y[N];
  matrix[N, p] x;
  real<lower=0> B_p;
  real<lower=0> phi_s;
}
parameters {
  vector[p] beta_var;
  real<lower=0> reciprocal_phi;
}
transformed parameters {
  real phi = 1 / reciprocal_phi;
  vector[N] lam = x * beta_var + depth;
}
model {
  beta_var[1] ~ normal(-6, B_p);
  for (j in 2:p) {
    beta_var[j] ~ normal(0, B_p);
  }
  reciprocal_phi ~ cauchy(0, phi_s);
  y ~ neg_binomial_2_log(lam, phi);
}
generated quantities {
  vector[N] y_predict;
  vector[N] log_lhood;
  for (n in 1:N) {
    y_predict[n] = neg_binomial_2_log_rng(lam[n], phi);
    log_lhood[n] = neg_binomial_2_log_lpmf(y[n] | lam[n], phi);
  }
}
Testing model:
data {
  int<lower=1> D;                          // Number of microbes
  int<lower=1> N;                          // Number of samples
  int<lower=1> draws;                      // Number of draws
  real log_depths[N];                      // Log sequencing depths
  int<lower=0> y[N, D];                    // Count data
  matrix[N, 2] x;                          // Design matrix (all ones)
  array[draws] matrix[2, D] post_beta_var; // Posterior draws
  array[draws] vector[D] post_phi;         // Overdispersion per microbe
}
parameters {
}
model {
}
generated quantities {
  array[draws] matrix[N, 2] all_log_lhood; // Log-likelihood of each class
  for (i in 1:draws) {
    matrix[N, 2] log_lhood = rep_matrix(rep_row_vector(0, 2), N);
    matrix[2, D] beta_var = post_beta_var[i];
    matrix[N, D] lam1 = col(x, 1) * row(beta_var, 1); // Intercept only
    matrix[N, D] lam2 = x * beta_var;                 // Intercept + beta
    vector[D] phi = post_phi[i];
    for (n in 1:N) {
      for (d in 1:D) {
        log_lhood[n, 1] += neg_binomial_2_log_lpmf(y[n, d] | lam1[n, d] + log_depths[n], phi[d]);
        log_lhood[n, 2] += neg_binomial_2_log_lpmf(y[n, d] | lam2[n, d] + log_depths[n], phi[d]);
      }
    }
    all_log_lhood[i] = log_lhood;
  }
}
I've tried this out on a small dataset (Qiita study ID: 11402) and it
seems pretty promising. Using only an intercept + one predictor, we get
60% accuracy. With a stronger microbial effect and a more robust model, we
should hopefully see better performance. Log-likelihoods were summed across
all chains and draws.
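That summing step could be done downstream roughly like this (a hedged sketch: the array here is a random stand-in with a hypothetical shape; in practice `all_log_lhood` would be extracted from the fit's posterior draws, with chains stacked along the draw axis):

```python
# Hypothetical post-processing of the testing model's output.
# all_log_lhood has shape (chains * draws, N, 2): per-draw log-likelihood
# of each sample under class 1 (intercept only) and class 2 (full model).
import numpy as np

rng = np.random.default_rng(42)
n_draws, n_samples = 8, 10

# Stand-in values; real ones come from the generated quantities block
all_log_lhood = rng.normal(-500.0, 5.0, size=(n_draws, n_samples, 2))

# Sum log-likelihoods over all chains and draws, per sample and class
total_ll = all_log_lhood.sum(axis=0)   # shape (N, 2)

# Most likely class per sample = argmax over the class axis
predicted = total_ll.argmax(axis=1)    # 0 or 1 per sample
```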
[image: image]
<https://user-images.githubusercontent.com/4030868/169162448-6e82adda-7870-42cf-b274-563233bc15cf.png>
-
Try a lognormal prior for the dispersion parameters: https://github.com/flatironinstitute/q2-matchmaker/blob/main/q2_matchmaker/assets/nb_case_control_single.stan#L48