
PST manuscript methods

Each section of the PST manuscript methods maps to a repository or to files located in this main repository. Files referenced in these notebooks are located in the DRYAD repository (datasets, supplementary data, or supplementary tables). The supplementary tables can also be found with the manuscript itself.

The files should have the same names as those referenced in these notebooks. However, because the combined size of all datasets and files exceeds 170 GB, the files are grouped into separate tarballs in the DRYAD repository. The DRYAD README lists which tarball contains each file.

Be warned that reproducing some of these analyses with the full datasets can require up to 1 TB of memory.

ESM2 protein language model embeddings
Modified Leave-One-Group-Out cross validation and hyperparameter tuning
- Part of the specific implementation is also found here
GenSLM open reading frame (ORF) and genome embeddings
Hyena-DNA genome embeddings
Tetranucleotide frequency vectors as simple genome embeddings
Clustering genome and protein embeddings
- Genomes
- Proteins
Genome and protein clustering evaluation
- Genome viral and host taxonomy purity
- Protein functional purity
Average amino acid identity (AAI)
- Averaging AAI over each genome cluster found here
Average amino acid identity (AAI) genome clustering
Protein functional annotation
- Re-categorizing VOG
Protein attention scaling and analysis
Protein annotation improvement
Protein function co-clustering
Protein functional module detection
Capsid structure searches
Graph-based host prediction framework
Constructing the virus-host interaction network
- Specific knowledge graphs can be found here
Host prediction model evaluation
- Choosing the best models
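
For readers unfamiliar with the "tetranucleotide frequency vectors" entry above, the general idea can be sketched as follows. This is a minimal illustration and not the code used for the manuscript: the function name `tetranucleotide_frequencies`, the choice to count a single strand only, the handling of ambiguous bases, and the normalization to relative frequencies are all assumptions made here for illustration.

```python
from itertools import product

# All 256 tetranucleotides in a fixed lexicographic order over A, C, G, T.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequencies(sequence):
    """Return a 256-element list of 4-mer relative frequencies.

    Hypothetical helper: counts overlapping 4-mers on the given strand only
    and skips any window containing a character other than A, C, G, or T.
    """
    seq = sequence.upper()
    counts = [0] * len(KMERS)
    for start in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[start:start + 4])
        if idx is not None:  # ignore windows with ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        # No valid 4-mer windows: return an all-zero vector.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = tetranucleotide_frequencies("ACGTACGT")
    print(len(vec), vec[KMER_INDEX["ACGT"]])
```

Each genome is thereby reduced to a fixed-length vector regardless of its length, which is what lets it serve as a simple baseline embedding alongside the learned ones.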
