Non-human species #19
Comments
Hi @FarmOmics. Yes, Avocado can be applied to any compendium of bulk genomic experiments. However, you need many experiments across tissues and assays for Avocado to be accurate, and I don't know whether that many genomic experiments have been performed in cattle. Yes, the human model can be extended to other species (see https://www.biorxiv.org/content/10.1101/801183v3) if you have an alignment between the species' genomes or can remap the reads from the experiments performed in human to the cattle genome. The first option is less computationally intensive, because you don't need to remap several thousand experiments. Either way, you still need many experiments performed in cattle. Let me know if you have any other questions.
I have cattle ChIP-seq data for five marks and ~20 tissues. Is this set of data enough to train a model?
Avocado is set up so that it can make predictions, even across species, for any assay that is measured at least once and any cell type that is assayed at least once. However, the predictions will be higher quality the more assays are available and the more related they are to the activity you're trying to predict. If you're trying to predict the binding of a very cell type-specific TF and only have a few histone modifications, you probably won't get great accuracy. But if you're just trying to predict transcription from those histone modifications, you'll likely do pretty well, because many histone modifications are correlated with expression.

To get your data into the model, extract the -log10 p-values from your bigWig files (probably using pyBigWig) and bin those values at 25 bp resolution, taking the average across the positions in each bin. You can drop the last bin if your genome length isn't divisible by 25.

Lifting over across species is more challenging because I didn't write clean code for that part. If you have some compute available, I'd actually recommend that you remap the human experiments you think are relevant to the cattle genome. The mapper automatically takes care of all the issues you might run into when using an alignment chain file (which is the approach I took). LiftOver would probably work as well. Let me know if you have any other questions.
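For illustration, the binning step described above (read the -log10 p-values from a bigWig, average them in 25 bp bins, drop any trailing partial bin) could be sketched roughly as follows. This is not code from the Avocado repository. The function names `bin_signal` and `bin_bigwig` are hypothetical, the bigWig is assumed to already hold -log10 p-values, binning is done per chromosome, and positions with no signal are treated as 0. All of those are assumptions rather than documented Avocado behavior.

```python
import numpy as np

BIN_SIZE = 25  # resolution mentioned in the thread


def bin_signal(values, bin_size=BIN_SIZE):
    """Average per-base values into fixed-size bins.

    A trailing partial bin is dropped, as suggested above.
    Assumption: positions without signal (NaN) are counted as 0.
    """
    values = np.nan_to_num(np.asarray(values, dtype=float), nan=0.0)
    n_bins = len(values) // bin_size
    trimmed = values[: n_bins * bin_size]
    return trimmed.reshape(n_bins, bin_size).mean(axis=1)


def bin_bigwig(path, bin_size=BIN_SIZE):
    """Return {chromosome: binned signal} for one bigWig file (hypothetical helper)."""
    import pyBigWig  # imported here so bin_signal works without pyBigWig installed

    binned = {}
    bw = pyBigWig.open(path)
    try:
        # bw.chroms() maps chromosome names to their lengths
        for chrom, length in bw.chroms().items():
            # bw.values() returns one value per base, NaN where there is no data
            binned[chrom] = bin_signal(bw.values(chrom, 0, length), bin_size)
    finally:
        bw.close()
    return binned
```

Reading a whole chromosome into memory at once is simple but can be heavy for large genomes, so processing in chunks whose size is a multiple of 25 may be preferable. How the resulting arrays should be named and stored for training is not covered in this thread.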
Can this software be used in non-human species, e.g. cattle? If yes, how can I build a pre-trained model? Can the human model be extended to other species?