genemine/BioBasics

Fundamentals to master for learning bioinformatics (under construction...).

1. Statistical and machine learning algorithms

Normal distribution
Poisson distribution
Probability density function (PDF) and probability mass function (PMF)
t-test, Mann-Whitney U test
Why multiple testing correction? Bonferroni correction, FDR correction
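
To make the multiple testing item concrete, here is a minimal sketch of the two corrections named above, using only the Python standard library. The function names `bonferroni` and `benjamini_hochberg` and the example p-values are illustrative assumptions, and Benjamini-Hochberg is used here as one common FDR procedure, not necessarily the only one intended.

```python
def bonferroni(pvalues):
    """Bonferroni correction: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (one common FDR procedure)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four independent tests.
pvals = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(pvals)
fdr = benjamini_hochberg(pvals)
print("Bonferroni:", bonf)
print("BH (FDR):  ", fdr)
```

With many tests, some raw p-values fall below 0.05 by chance alone. Bonferroni controls the chance of any false positive and is the stricter of the two, while the FDR approach controls the expected fraction of false positives among the reported hits.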

t-SNE
Principal component analysis (PCA)
Eigenvectors and eigenvalues of a matrix
Canonical correlation analysis
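
PCA and eigenvectors are closely linked: the first principal component is the eigenvector of the data's covariance matrix with the largest eigenvalue. The sketch below illustrates this with power iteration in plain Python. The helper names (`covariance_matrix`, `power_iteration`) and the toy data are assumptions made for illustration. In practice a linear algebra library would be used instead.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix maps the current vector to zero")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


# Toy data lying exactly on the line y = 2x, so PC1 should point along (1, 2).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = covariance_matrix(data)
top_eigenvalue, pc1 = power_iteration(cov)
print("variance along PC1:", top_eigenvalue)
print("PC1 direction:", pc1)
```

The eigenvalue is the variance of the data along that direction, which is why PCA ranks components by eigenvalue.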

Hierarchical clustering
K-means clustering

Ordinary least squares regression
Partial least squares
Logistic regression
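
As a small illustration of the first item in this group, ordinary least squares with one predictor has a closed-form solution: the slope is the covariance of x and y divided by the variance of x. The function name `fit_simple_ols` and the data are hypothetical, chosen only to keep the sketch self-contained.

```python
def fit_simple_ols(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_simple_ols(xs, ys)
print("slope:", slope, "intercept:", intercept)
```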

Random Forests
Support vector machines

ROC curves/AUROC
Precision recall curves/AUPRC
Sensitivity
Specificity
F1-score
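
Sensitivity, specificity and the F1-score listed above all derive from the four cells of a confusion matrix. The following sketch computes them from binary labels. The function name `classification_metrics` and the example labels are assumptions for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, precision and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators by reporting 0.0 in that case.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical labels: 4 true positives and 4 true negatives in the ground truth.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

ROC and precision-recall curves are obtained by recomputing these quantities while sweeping the decision threshold of a scoring classifier.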

2. Programming

Languages you must master: Python, R

3. Biology

  1. What are chromosomes, DNA and genes? How are they related?

  2. Central dogma: how does genetic information flow from DNA to proteins?

  3. What is transcription? What is translation?

  4. For a given organism such as a mouse or a human being, the genetic material (i.e. the DNA) is almost the same in every cell. So why do different tissues have such disparate cell types and functions? Understand that the differences among tissues are shaped by different gene transcription networks: different tissues have the same set of genes, but their "working" (or "expressed", "active") genes are different. Genotypes are static, whereas molecular phenotypes such as gene expression and methylation are dynamic.

  5. Understand exons and introns.

  6. Understand alternative splicing (AS). Through AS, a single multi-exon gene can produce multiple transcript isoforms, and in turn multiple protein isoforms, that may differ in sequence, structure and function.

  7. Understand the basics of microarrays. The microarray is a generic technique. Depending on the design of the chip, a microarray can be used to measure gene expression, SNPs, methylation, etc.

  8. Understand the basics of DNA sequencing techniques. Sequencing techniques can be used to measure DNA sequences, gene expression, methylation, etc.

  9. Understand that each tissue contains different cell types and sub-cell types. Different cell types have different functions. So, tissues are heterogeneous in terms of cell composition.

  10. Single-cell sequencing: a revolutionary technique for understanding and dissecting tissue complexity by measuring omics data at single-cell resolution.

  11. DNA differs from individual to individual. These differences are sequence variants, such as single-nucleotide polymorphisms (SNPs) and structural variants.

  12. Some genes are protein-coding, others are not. The non-coding genes are also important and may carry out regulatory functions.

  13. If a gene is expressed, the chromatin containing the gene is open and can thus be accessed by, e.g., transcription factors. Methods such as DNase-seq and ATAC-seq can be used to measure chromatin accessibility.

  14. What determines the "open" or "closed" status of chromatin? To answer this, you may want to learn some epigenetics. Genetics studies changes in the DNA sequence, while epigenetics focuses on changes that do not alter the DNA sequence.

  15. What are transcription factors (TFs)? How many TFs are there for humans?

  16. How do transcription factors work? What DNA sequences do TFs prefer to bind to?

  17. What are motifs? DNA, RNA and proteins all have motifs. Transcription factor binding sites (TFBS) are an example of motifs. Sequence logos provide an important and intuitive way to visualize motifs. You can search databases such as JASPAR to find motifs.

  18. Genes may be mutated. Genes may gain new functions or lose functions as a result of mutation.

  19. Structures determine functions. Functions are what genes do. Understand the Gene Ontology (GO) functional annotation database. A gene may have multiple functions.

  20. Life is all about genes, gene products and their interactions. Changes in genes, gene products or their interactions may result in abnormalities or disease.

  21. A set of genes may work together for some purpose; they form a pathway or a network. The KEGG and MSigDB databases provide known pathways (gene sets).

  22. For complex diseases, it is essential to understand the genetic basis: the causative genes and their related pathways.

  23. Genome-wide association studies (GWAS)
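
Items 2 and 3 above (the central dogma, transcription and translation) can be illustrated with a short, simplified sketch. It assumes the input is the coding strand of a gene with no introns, uses the standard genetic code, and starts translating at the first AUG. The function names `transcribe` and `translate` and the example sequence are illustrative assumptions, not part of any particular library.

```python
from itertools import product

# Standard genetic code, laid out in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Transcription (simplified): the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation (simplified): read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # raises KeyError on non-ACGU letters
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base coding sequence: start codon, two codons, stop codon.
dna = "ATGGCCTTTTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(dna, "->", mrna, "->", protein)
```

Real genes are more involved: eukaryotic transcripts are spliced before translation (items 5 and 6), and the template strand, not the coding strand, is what RNA polymerase reads.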
