FASTG to Protein Library

This package generates a candidate protein library in two phases:

Phase 1: Parsing a FASTG file to create graph traversals of longer stretches of DNA
- FASTG is parsed into a directed graph. A depth-first search (DFS) is run over all connecting edges, and each DFS traversal is used to concatenate the DNA sequences along its path.
- DNA sequences are transcribed to mRNA, translated, and split into candidate proteins at stop codons. Each DFS traversal can produce multiple candidate protein sequences.
- Protein sequences are filtered on length and amino acid redundancy.
- Protein sequences are cleaved into peptide sequences.
- DFS traversals, proteins and peptides are stored in a SQLite database. The linking relationship between all three is maintained in the DB.
- A FASTA file of peptides is produced for the user. This FASTA file is intended for use in a search against MS/MS data.
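
The first phase can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the function names (`dfs_paths`, `concat_path`, `translate`, `candidate_proteins`, `cleave`), the toy graph, the single forward reading frame, the minimum-length filter, and the trypsin-like cleavage rule are all hypothetical choices made for the example. The SQLite storage and FASTA output steps are omitted.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# Codons are read directly from DNA (T in place of U), which is
# equivalent to transcribing to mRNA first.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def dfs_paths(graph, start):
    """Yield each simple path found by an iterative DFS from `start`.

    Assumption: one traversal is one path that ends where no unvisited
    successor remains.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors:
            yield path
        for nxt in successors:
            stack.append((nxt, path + [nxt]))


def concat_path(path, seqs):
    """Concatenate the DNA sequences of the nodes along one path."""
    return "".join(seqs[node] for node in path)


def translate(dna):
    """Translate DNA in the first forward frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def candidate_proteins(dna, min_len=2):
    """Split the translation at stop codons ('*') and filter on length."""
    return [p for p in translate(dna).split("*") if len(p) >= min_len]


def cleave(protein):
    """Cleave after K or R unless followed by P (trypsin-like rule, assumed)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


if __name__ == "__main__":
    # Hypothetical toy graph: edge e1 connects to e2 and e3.
    graph = {"e1": ["e2", "e3"], "e2": [], "e3": []}
    seqs = {"e1": "ATGGCCAAA", "e2": "GGTTGA", "e3": "CCGCGTTAA"}
    for path in dfs_paths(graph, "e1"):
        dna = concat_path(path, seqs)
        for protein in candidate_proteins(dna):
            print(path, protein, cleave(protein))
```

A fuller version would likely also consider the other reading frames and the reverse strand; the sketch keeps a single frame to stay short.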
Phase 2: Using verified peptides as a filter to produce a final candidate peptide library
- The user invokes the code with
  - the SQLite database (DB)
  - a list of peptide sequences or a peptide FASTA file
- The submitted peptides are expected to have been verified against MS/MS data, so that they represent found and identified peptide sequences.
- The verified peptides are used to filter proteins in the database; the proteins that pass the filter become the final library.
- The verified peptides are used to score the proteins for
  - coverage
  - percentage of verified versus total associated peptides
- Final user output
  - SQLite database
  - Protein score text file, comma delimited
  - Filtered protein FASTA file
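
The two scores named above can be illustrated with a short Python sketch. It is not the package's code: the function names (`coverage`, `verified_fraction`, `score_proteins`), the CSV column names, and the in-memory `proteins` dictionary standing in for the SQLite database are hypothetical. It also assumes that coverage means the fraction of protein residues covered by at least one verified peptide, and that a protein is kept when at least one of its associated peptides is verified.

```python
import csv
import io


def coverage(protein, verified):
    """Fraction of residues covered by at least one verified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in verified:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def verified_fraction(peptides, verified):
    """Share of a protein's associated peptides that were verified."""
    if not peptides:
        return 0.0
    return sum(1 for pep in peptides if pep in verified) / len(peptides)


def score_proteins(proteins, verified):
    """Keep proteins with at least one verified peptide and score them.

    `proteins` maps a protein name to (sequence, associated peptides).
    """
    rows = []
    for name, (seq, peptides) in proteins.items():
        if not any(pep in verified for pep in peptides):
            continue  # filtered out of the final library
        rows.append((
            name,
            round(coverage(seq, verified), 3),
            round(verified_fraction(peptides, verified), 3),
        ))
    return rows


if __name__ == "__main__":
    # Hypothetical data in place of the SQLite database.
    proteins = {
        "p1": ("MAKGLLR", ["MAK", "GLLR"]),
        "p2": ("MSTK", ["MSTK"]),
    }
    verified = {"MAK"}
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["protein", "coverage", "verified_fraction"])
    writer.writerows(score_proteins(proteins, verified))
    print(out.getvalue())
```

Writing through `csv.writer` keeps the score file comma delimited, as in the output list above, without hand-building the lines.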
