Add QIIME 2 plugin #26

Open
gibsramen opened this issue Apr 7, 2021 · 5 comments

@gibsramen
Collaborator

With the upcoming release of QIIME 2 2021.4, the underlying version of Python is being updated from 3.6 to 3.8. Hopefully this will allow BIRDMAn to be converted into a valid plugin.

We need to consider a semantic type and transformer for multi-dimensional arrays (see the relevant Q2 forum post; cc @mortonjt).

See preview for additional details.

@mortonjt

mortonjt commented Apr 7, 2021

This is going to be very exciting! CC @ebolyen, @thermokarst

I've been prototyping what the underlying types could look like in q2-differential. Right now, I'm converging on xarray / arviz, since the netCDF format seems to fit the bill (see _transformer.py). Furthermore, I've had some success with these types in a couple of other packages, namely q2-batch and q2-fido.

Regarding turning this into a pure qiime2 plugin, I'm not exactly sure what the best route is. The problem is that MCMC is extremely compute-hungry, and processing standard microbiome datasets is not practical without cluster support. I've found dask-jobqueue to be extremely useful for parallelizing these processes (see PR here), but qiime2 doesn't currently play nicely with dask-jobqueue due to network issues that I don't yet understand.

So I see 3 possible options to consider:

  1. Create a qiime2 plugin that operates on a single taxon at a time; that way, power-hungry users can wrap qiime2 with Slurm et al. and parallelize as fast as the scheduler allows.
  2. Forget the qiime2 plugin_setup.py, just load / save qiime2 Artifacts directly, and use cluster schedulers in the back end. This won't make the plugin visible, track provenance, or enable type checking.
  3. Push to figure out how to integrate dask into qiime2.
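
A minimal sketch of option 1, assuming each taxon can be fit independently of the others. `fit_taxon` and `fit_all_taxa` are hypothetical names (not BIRDMAn or QIIME 2 API), the "fit" is a cheap placeholder rather than MCMC, and threads stand in for the scheduler jobs a cluster setup would actually submit.

```python
# Sketch of option 1: one independent fit per taxon, so each call could become
# its own scheduler job. All names here are hypothetical, not BIRDMAn/QIIME 2 API.
import math
from concurrent.futures import ThreadPoolExecutor


def fit_taxon(taxon_id, counts):
    """Hypothetical stand-in for a single-taxon MCMC fit.

    A real fit would sample a posterior; this returns the mean log(count + 1)
    so the example runs instantly.
    """
    mean_log = sum(math.log(c + 1) for c in counts) / len(counts)
    return taxon_id, mean_log


def fit_all_taxa(table, max_workers=4):
    """Run fit_taxon once per taxon and collect results keyed by taxon ID.

    `table` maps taxon ID -> list of per-sample counts. Threads stand in for
    the cluster jobs (e.g. one Slurm task per taxon) a real setup would use.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(fit_taxon, taxon_id, counts)
            for taxon_id, counts in table.items()
        ]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    table = {"taxon_a": [0, 3, 7], "taxon_b": [10, 12, 9]}
    print(fit_all_taxa(table))
```

Because each taxon is fit in isolation, the per-taxon function should not need to change if the thread pool is swapped for one scheduler job per taxon.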

gibsramen added this to the v0.1.0 milestone Apr 7, 2021
@mortonjt

More thoughts on the qiime2 plugin: I think the more pressing issue is finalizing the types.

It'll be difficult to anticipate all of the possible use cases of the FeatureTensor, but perhaps there are a few things we can set in stone, namely that this tensor type:

  1. Can have an optional sampleid axis
  2. Is required to have a featureid axis
  3. Is required to have a monte_carlo_samples axis

Of course, there can be multiple subtypes, such as FeatureTensor[Differential], which would be required to have a covariates axis, or FeatureTensor[Longitudinal], which would be required to have a time axis and a covariate axis. From what I can tell, xarray and/or arviz fit the bill for this. The key thing here is figuring out how to standardize the required vs. optional axes.
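
One way to pin down required vs. optional axes is a small schema check. This is a hypothetical sketch: the axis and subtype names come from this thread, but the dictionary layout, `validate_axes`, and the rule that unlisted axes are rejected are assumptions.

```python
# Hypothetical axis schema for FeatureTensor and its subtypes. Axis/subtype
# names follow this thread; rejecting unlisted axes is an assumed design choice.
BASE_REQUIRED = {"featureid", "monte_carlo_samples"}
BASE_OPTIONAL = {"sampleid"}

# Extra axes each subtype would require on top of the base type.
SUBTYPE_REQUIRED = {
    None: set(),
    "Differential": {"covariates"},
    "Longitudinal": {"time", "covariate"},
}


def validate_axes(dims, subtype=None):
    """Return True if `dims` fits the schema for `subtype`; else raise ValueError."""
    dims = set(dims)
    required = BASE_REQUIRED | SUBTYPE_REQUIRED[subtype]
    allowed = required | BASE_OPTIONAL
    missing = required - dims
    if missing:
        raise ValueError(f"missing required axes: {sorted(missing)}")
    unexpected = dims - allowed
    if unexpected:
        raise ValueError(f"unexpected axes: {sorted(unexpected)}")
    return True


if __name__ == "__main__":
    print(validate_axes(["sampleid", "featureid", "monte_carlo_samples"]))
    print(validate_axes(["featureid", "monte_carlo_samples", "covariates"], "Differential"))
```

Under this schema, any layout without a monte_carlo_samples axis is rejected, which is the trade-off of making that axis mandatory for every FeatureTensor.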

@gibsramen
Collaborator Author

I agree with (1) & (2), but I'm not sure about (3). Do we know that all use cases of FeatureTensor will involve MC sampling? I'm not super familiar with q2-micom, but in the Q2 forum post @cdiener mentioned that fluxes are stored as (sample) x (taxon) x (flux for each reaction), which would seem applicable here.

Related to this, I think xarray.Dataset would be better than arviz.InferenceData for generalizability.

@mortonjt

@gibsramen, what do you think about narrowing this to a MonteCarloTensor to make it explicitly for Bayesian computation?
I'm suggesting this since we already have quite a few use cases among upcoming qiime2 Bayesian plugins, whose requirements we have a good handle on.

I'm less familiar with q2-micom, but @cdiener feel free to comment and we can brainstorm additional tensor types.
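
A rough sketch of what a MonteCarloTensor might expose, assuming its defining feature is a monte_carlo_samples axis that downstream tools summarize over. Only the type name comes from this thread; the fields, `posterior_mean`, and `prob_positive` are hypothetical, and plain lists stand in for the xarray storage discussed earlier.

```python
# Hypothetical MonteCarloTensor-like container. Only the name comes from the
# thread; plain lists stand in for xarray-backed storage.
from dataclasses import dataclass
from statistics import mean


@dataclass
class MonteCarloTensor:
    feature_ids: list  # one entry per feature (featureid axis)
    draws: list        # draws[i] holds the Monte Carlo samples for feature_ids[i]

    def __post_init__(self):
        if len(self.feature_ids) != len(self.draws):
            raise ValueError("need one list of draws per feature")
        lengths = {len(d) for d in self.draws}
        if len(lengths) > 1:
            raise ValueError("every feature needs the same number of draws")
        if 0 in lengths:
            raise ValueError("each feature needs at least one draw")

    def posterior_mean(self):
        """Average over the monte_carlo_samples axis, per feature."""
        return {fid: mean(d) for fid, d in zip(self.feature_ids, self.draws)}

    def prob_positive(self):
        """Fraction of draws above zero, per feature."""
        return {
            fid: sum(x > 0 for x in d) / len(d)
            for fid, d in zip(self.feature_ids, self.draws)
        }


if __name__ == "__main__":
    mct = MonteCarloTensor(
        ["taxon_a", "taxon_b"],
        [[0.5, 1.5, -0.2, 0.8], [-1.0, -0.5, 0.1, -0.3]],
    )
    print(mct.posterior_mean())
    print(mct.prob_positive())
```

Keeping the sample axis explicit is what lets summaries like these be computed generically, regardless of which Bayesian plugin produced the draws.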

@gibsramen
Collaborator Author

Yeah, I think MonteCarloTensor is a great idea for standardizing specifically the Bayesian work going on.

2 participants