[WIP] Outline the Study section #136

Merged 2 commits on Dec 20, 2016
references/tags.tsv: 4 changes (4 additions, 0 deletions)
@@ -1,2 +1,6 @@
tag citation
Zhou2015_deep_sea doi:10.1038/nmeth.3547
Chen2015_trans_species doi:10.1093/bioinformatics/btv315
Arvaniti2016_rare_subsets doi:10.1101/046508
Angermueller2016_single_methyl doi:10.1101/055715
Shaham2016_batch_effects arxiv:1610.04181
sections/04_study.md: 90 changes (90 additions, 0 deletions)
@@ -0,0 +1,90 @@
## How is deep learning used to study basic biological processes in a manner that may provide future insights into human disease?

*The (awkward) placeholder section title is intended to help define the scope.
We do not want this section to become a miscellaneous collection of everything
that does not fit in the Categorize and Treat sections.*

*One proposal is to organize this section roughly by what is being predicted,
which generally corresponds to the type of data being used. For each
sub-section we can briefly introduce the prediction problem and cite examples
of its relevance to disease. Hypothetically, if we had an algorithm that
produced perfect predictions on the task, what would we learn and how could
those predictions be used?*

*Existing reviews could be mentioned briefly.*

*It may not fit here, but there could be a general discussion of why different
neural network architectures are particularly well-suited to different types
of input data. For example, CNNs and RNNs applied to 1-dimensional data appear
in several of the categories below.*
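As a hypothetical illustration of what a CNN on 1-dimensional biological data does (not taken from any cited paper), the sketch below one-hot encodes a DNA sequence, scans it with a single hand-set convolutional filter, applies a ReLU, and max-pools over positions. The function names and the `TATA` filter are illustrative assumptions; a real model would learn many such filters from data rather than fixing one by hand.

```python
# Hypothetical sketch: a single 1-D convolutional filter scanning one-hot DNA.
BASES = "ACGT"


def one_hot(seq):
    """Return one length-4 indicator vector (A, C, G, T) per sequence position."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(x, kernel):
    """Valid (no padding) 1-D cross-correlation of a one-hot sequence with a
    kernel of shape (width, 4); returns one score per window start."""
    width = len(kernel)
    scores = []
    for start in range(len(x) - width + 1):
        s = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                s += x[start + offset][channel] * kernel[offset][channel]
        scores.append(s)
    return scores


def relu(values):
    """Elementwise rectified linear unit."""
    return [max(0.0, v) for v in values]


# Illustrative filter: +1 for the base matching "TATA" at each offset, -1 otherwise.
TATA_FILTER = [[1.0 if b == m else -1.0 for b in BASES] for m in "TATA"]

scores = relu(conv1d(one_hot("GCTATAAGC"), TATA_FILTER))
print(max(scores))  # global max-pooling: the strongest motif match anywhere
```

Max-pooling over positions is what lets such a filter detect a motif regardless of where it occurs in the sequence, which is one reason convolutional architectures suit sequence inputs.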

*A few suggestions for sub-sections follow. Some of these could be left out
because our goal is not an exhaustive enumeration of methods. Some are
important areas of biology but may have little deep learning-specific content
to present. Others may be important areas where we lack expertise; in those
cases we may acknowledge the application area without assessing the merits or
weaknesses of individual methods.*

### Gene expression
Review comment (Member): If you want to assign these bits out like you did with the metagenomics section, you can assign this and splicing to me. I may see if we can snag a splicing guru from Penn (Yoseph Barash). If not, I have read a few of those papers and could write that bit.


*Predicting gene expression levels, and unsupervised approaches for learning
from gene expression data. These could be divided into separate sub-sections.*

### Splicing
Review comment (Member): I like splicing as separate from gene expression, unless we change things to "transcript expression".


*A separate sub-section from the general gene expression sub-section above.*

### Transcription factors and RNA-binding proteins

*Existing reviews have covered some of these papers rather well and we do not
want to repeat what has already been well-stated elsewhere. This could
be split into two sub-sections or kept very brief.*

### Promoters, enhancers, and related epigenomic tasks

*We may want to be selective about what we discuss and not list every
application in this area.*

### Micro-RNA binding

*miRNAs are important biologically, but have neural networks produced anything
particularly notable in this area?*

### Protein secondary and tertiary structure

*We have not surveyed this area comprehensively yet.*

### Signaling

*There is not much content here. Can [@tag:Chen2015_trans_species] be covered
elsewhere?*

### Cellular phenotypes

*These are primarily imaging-based phenotypes. We have not surveyed this area
very comprehensively. We could decide not to make imaging a primary focus,
refer to existing reviews, and mention only a few particularly noteworthy
representative papers. Alternatively, if someone wants to take responsibility
for this sub-section, we need to expand our literature review and summaries
immediately.*

*Transfer learning from non-biological datasets to biological imaging
data could fit here, and that does seem like an important topic. Or
transfer learning could be a more general topic for the Discussion section.*

### Single-cell

*There are not many neural network papers in this area (yet), unless we count
imaging applications. But there is still plenty to discuss. The existing
methods [@tag:Arvaniti2016_rare_subsets @tag:Angermueller2016_single_methyl]
use interesting network architectures to approach single-cell data.
[@tag:Shaham2016_batch_effects] could fit here.*

### Metagenomics

Review comment (Member): Do you want to update this to confirm that there will be a section? @gailrosen has sufficient content to fill out a section.

*@gailrosen will write this*

### Sequencing and variant calling

*We have one nanopore paper in the GitHub issues and very recent work on variant
calling that looks worthy of inclusion.*