refactor, added examples
Robaina committed Sep 3, 2022
1 parent 8545862 commit 43ee17f
Showing 21 changed files with 469 additions and 78,510 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ gbk_data/*
data/*
tree_test/
demo_data/*
traits/*

# Files
test.ipynb
309 changes: 51 additions & 258 deletions README.ipynb

Large diffs are not rendered by default.

96 changes: 95 additions & 1 deletion README.md
@@ -1 +1,95 @@
# Genbankpy
# Downloading and parsing GenBank files from Python

## Installation
1. Fork the repository on GitHub and clone your fork to your local machine, or simply clone the main branch with:
```
git clone https://github.com/Robaina/GenBankpy.git
```
2. Change to the project directory and create the conda environment, if it does not already exist:
```
conda env create -n genbankpy -f environment.yml
```

3. Activate environment:
```
conda activate genbankpy
```


```python
from pathlib import Path
from genbankpy.parser import GenBankFastaWriter, GBK


sp_list = [
    'Halobacterium salinarum',
    'Escherichia coli',
    'Pseudomonas aeruginosa',
    'Proteus mirabilis',
    'Klebsiella pneumoniae',
    'Prochlorococcus marinus',
    'Pelagibacter ubique'
]

gbkwriter = GenBankFastaWriter.fromSpecies(
    species_list=sp_list,
    only_latest=True,
    data_dir="demo_data"
)

# Alternatively, if already downloaded:
# gbkwriter = GenBankFastaWriter.fromGBKdirectory(gbk_dir="demo_data")

gbkwriter.writeSequencesInFasta(
    gene_keywords={'product': ['pyruvate kinase']},
    output_fasta='demo_results/pyruvate_kinase_demo.faa',
    sequence='protein'
)
```

Now we infer a phylogenetic tree using [MUSCLE](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-113) and [FastTree](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0009490). Results will be stored in demo_results/demo_tree.
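
The alignment and tree-building calls are not shown here. As a rough sketch of this step, the two commands could be assembled and run from Python as below. This assumes MUSCLE v3-style flags (`-in`, `-out`) and a `FastTree` executable on the PATH; the helper names `build_commands` and `infer_tree` are hypothetical and not part of genbankpy.

```python
import subprocess
from pathlib import Path


def build_commands(input_fasta, out_dir, prefix="ref_database"):
    """Return (muscle_cmd, fasttree_cmd, tree_path) for the given inputs."""
    out_dir = Path(out_dir)
    alignment = out_dir / f"{prefix}.faln"
    tree = out_dir / f"{prefix}.newick"
    # Assumption: MUSCLE v3 syntax. MUSCLE v5 uses "-align" and "-output" instead.
    muscle_cmd = ["muscle", "-in", str(input_fasta), "-out", str(alignment)]
    # FastTree reads the alignment and writes the Newick tree to stdout.
    fasttree_cmd = ["FastTree", str(alignment)]
    return muscle_cmd, fasttree_cmd, tree


def infer_tree(input_fasta, out_dir):
    """Run both tools in sequence; requires muscle and FastTree to be installed."""
    muscle_cmd, fasttree_cmd, tree = build_commands(input_fasta, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(muscle_cmd, check=True)
    with open(tree, "w") as handle:
        subprocess.run(fasttree_cmd, stdout=handle, check=True)
    return tree
```

With both tools installed, `infer_tree("demo_results/pyruvate_kinase_demo.faa", "demo_results/demo_tree")` would write `ref_database.faln` and `ref_database.newick` into the demo tree directory.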

Next, we visualize the inferred tree with [ETE3](http://etetoolkit.org/) (not included in the provided genbankpy conda environment):


```python
# Visualizing tree
from ete3 import Tree


t = Tree("demo_results/demo_tree/ref_database.newick")
t.render("%%inline")
```





![png](README_figures/output_3_0.png)
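
If ETE3 is not installed, the leaf labels of the Newick file can still be listed with the standard library. The sketch below is not part of genbankpy: `leaf_names` is a hypothetical helper, and its regular expression assumes unquoted labels that are each followed by a branch length, as in the demo tree.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a Newick string with unquoted labels."""
    # A leaf label directly follows "(" or "," and ends at ":".
    # Support values follow ")" and are therefore skipped.
    return re.findall(r"[(,]\s*([^(),:;\s]+)\s*:", newick)


# Small made-up tree in the same shape as the demo output.
example = "((A_1:0.1,B_2:0.2)0.93:0.05,C_3:0.3);"
print(leaf_names(example))
```

For the demo, the string can be read from `demo_results/demo_tree/ref_database.newick` and passed to the same function.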




# Additional examples of the filtering capabilities


```python
# Write fasta containing nucleotide sequences corresponding to Urease alpha
gbkwriter.writeSequencesInFasta(
    gene_keywords={'product': ['urease', 'alpha']},
    output_fasta='demo_results/ureC.fasta',
    sequence='nucleotide'
)

# Write fasta containing peptide sequences corresponding to Urease alpha
gbkwriter.writeSequencesInFasta(
    gene_keywords={'product': ['urease', 'alpha']},
    output_fasta='demo_results/ureC.faa',
    sequence='protein'
)

# Write fasta containing nucleotide sequences corresponding to 16S
gbkwriter.writeSequencesInFasta(
    gene_keywords={'product': ['16S']},
    output_fasta='demo_results/16s.fasta',
    sequence='nucleotide'
)
```
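
To check what a filter actually retrieved, the records in an output FASTA file can be tallied by species. The following is a minimal sketch using only the standard library. It assumes headers shaped like those in the demo output (`>GCF_<assembly>_<Genus>_<species>_<protein id>_<product>`), and `count_records_per_species` is a hypothetical helper rather than a genbankpy function.

```python
from collections import Counter


def count_records_per_species(fasta_lines):
    """Count FASTA records per species from headers like
    >GCF_000008865_Escherichia_coli_NP_310591.1_pyruvate_kinase."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].strip().split("_")
            # Assumed layout: "GCF", assembly number, genus, species, ...
            counts[" ".join(fields[2:4])] += 1
    return counts


# Headers taken from the demo output; sequences truncated for brevity.
lines = [
    ">GCF_000008865_Escherichia_coli_NP_310410.1_pyruvate_kinase",
    "MKKTKIVCTIGPKTESEE",
    ">GCF_000008865_Escherichia_coli_NP_310591.1_pyruvate_kinase",
    "MSRRLRRTKIVTTLGPAT",
    ">GCF_004799605_Halobacterium_salinarum_WP_010902196.1_pyruvate_kinase",
    "MRNAKIVCTLGPATNDDA",
]
print(count_records_per_species(lines))
```

An open file handle works as input too, for example `count_records_per_species(open('demo_results/ureC.faa'))`, since the function only iterates over lines.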
Binary file added README_figures/output_3_0.png
86 changes: 86 additions & 0 deletions demo_results/16s.fasta

Large diffs are not rendered by default.

132 changes: 132 additions & 0 deletions demo_results/demo_tree/ref_database.faln
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
>GCF_004799605_Halobacterium_salinarum_WP_010902196.1_pyruvate_kinase
------MRNAKIVCTLGPATNDDASVRALADAGMTVARINASHGDAAARRELIETTRAVD
AQTDKPLAVMLDTQGPEVRTAPIDDDGTIHIETGSDVAFV---EGDTATPDRIGISTS--
IAAAEPGDTVLLDDGRIEADVDRVA--GDTVHATVVSGGSLGSRKGVNVPGVELDVDVVT
EKDRRDLALAAELDVDYVAASFVRDADDVLAVNRVLESH-GADIPIVAKIERAGAV---E
NLDGIIDAAQGVMVARGDLGVECPMEDVPMIQKRIIAQCRDAGVPVITATEMLDSMVHAR
RPTRAEASDVANAVLDGTDGVMLSAETAVGDNPTRVVETMDRIVREVEGSAEYSERQEQ-
AVPTADGTAKTDALARSARYLARDID-ASAVVVASESGYTARKAAKFRPSVPVVCATPSH
DVRRQLALNWGVHANYAAVAEGDATTVVERAVQAAVDSGVVASGDTVVVLVGMMTELEGA
STTNTLKVHVAAETLSTGRAVVDGRTTGRTYRAD----SGDLSDAPENAVVLLAHGFDGE
FDGDLSKIGAIVSADAGLTGYPAVIARELDVPMVGDVDVDAVPAGDLVTVDGERGVVYEA
DQ---------
>GCF_000069965_Proteus_mirabilis_WP_004245632.1_pyruvate_kinase
------MRKTKIVATLGPASCSEQMIEKLILAGANVFRLNFSHGTREQHQSTAATIRQVA
KKHRVFIGILADLQGPKIRIASFKNN-AIQLTQGDSFILNADLDSTQGDEQQVGLDYPQL
VQEVTPGNILLLDDGNIQLKVRAVN--NNNIETVVTVGGKLSNRKGINLLGGGLSAPALT
DKDKQDIHTAAAIQADYIAVSFPRNGADIEYARRLIVAA-GSQAKIVAKVERAEVVSCQA
NMDDIIQASDVIMVARGDLAVEIGDACLPGAQKQLIARCRALGCPVITATQMMESMIENP
MPTRAEVMDIANAVGDGTDAVMLSAETAAGKYPEEAVRAMARVAEGA--ERSFAANAENP
WQSPSYYSQTGRWIALAATTAAFHNDKHLSVAALTDNGQSVTLLSRFMPNNNIYALTDNP
ALAGQLTVLRGVTPVTYQRQN--NNDGDENIMQKLQTEGLLTDIHSLLITRLSTFEKTG-
-ESDCCHLVPVKQVTTALT-----------------------------------------
------------------------------------------------------------
-----------
>GCF_000006765_Pseudomonas_aeruginosa_NP_253019.1_pyruvate_kinase
----MSVRRTKIVATLGPASNSPEVLEQLILAGIDVARLNFSHGTPDEHRARARLVRELA
AKHGRFVALLGDLQGPKIRIAKFANK-RIELQVGDKFRFSTSHARDAGTQEVVGIDYPDL
VKDCGVGDELLLDDGRVVMVVEEVA--ADELRCRVLIGGPLSDHKGINRRGGGLTAPALT
DKDKADIKLAADMDLDYVAVSFPRDAKDMEYARRLLTEA-GGKAWLVAKIERAEAVADDD
ALDGLIRASDAVMVARGDLGVEIGDAELVGIQKKIILHARRNNKVVITATQMMESMIHSP
MPTRAEVSDVANAVLDYTDAVMLSAESAAGEYPVEAVKAMARVCQGA--EKHPTSQKSS-
HRLGQTFDRCDESIALASMYTANHFPGIKAIICLTESGFTPLIMSRIRSSVPIYAYSPHR
ETQARVAMFRGVETIPFDPAALPAEKVSQAAVDELLKRGVVTKGDWVILTKGDSYTAQG-
-GTNTMKVLHVGDLLV--------------------------------------------
------------------------------------------------------------
-----------
>GCF_000069965_Proteus_mirabilis_WP_004242765.1_pyruvate_kinase
--MSRRLRRTKIVTTLGPATDRDNNLEKIITAGANVVRLNFSHGSAEDHLARANRTREIA
ARLGRHVAILGDLQGPKIRVSTFKEG-KVFLNVGDKFLLDANLEKGEGDQTKVGIDYKGL
PADVVPGDILLLDDGRVQLKVLKVE--GLKVFTEVTVGGPLSNNKGINKLGGGLSAEALT
EKDKQDIITAAKIGVDYLAVSFPRTGEDLNYARRLARDA-GCECQIVAKVERAEAVANDE
IIDEIILASDVVMVARGDLGVEIGDPELVGVQKKLIRRARQLNRVVITATQMMESMITNP
MPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPAETVAAMAQVCLGA--EKMPAANVSK-
HRLDMVFDNAEEAIAMSTMYAANHMKGVNAIIAMTESGRTARMMSRISTGLPIFSMSRHE
KTLNQTALYRGVTPVYCSTHT-DGIAAANEAIMRLCEKGFLVSGDLVLVTQGDQMGTIG-
-STNTCRILTVE------------------------------------------------
------------------------------------------------------------
-----------
>GCF_000008865_Escherichia_coli_NP_310591.1_pyruvate_kinase
--MSRRLRRTKIVTTLGPATDRDNNLEKVIAAGANVVRMNFSHGSPEDHKMRADKVREIA
AKLGRHVAILGDLQGPKIRVSTFKEG-KVFLNIGDKFLLDANLGKGEGDKEKVGINYKGL
PADVVPGDILLLDDGRVQLKVLEVQ--GMKVFTEVTVGGPLSNNKGINKLGGGLSAEALT
EKDKADIKTAALIGVDYLAVSFPRCGEDLNYARRLARDA-GCDAKIVAKVERAEAVCSQD
AMDDIILASDVVMVARGDLGVEIGDPELVGIQKALIRRARQLNRAVITATQMMESMITNP
MPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPSETVAAMARVCLGA--EKIPSINVSK-
HRLDVQFDNVEEAIAMSAMYAANHLKGVTAIITMTESGRTALMTSRISSGLPIFAMSRHE
RTLNLTALYRGVTPVHFDSAN-DGVAAASEAVNLLRDKGYLMSGDLVIVTQGDVMSTVG-
-STNTTRILTVE------------------------------------------------
------------------------------------------------------------
-----------
>GCF_000240185_Klebsiella_pneumoniae_YP_005227690.1_pyruvate_kinase
--MSRRLRRTKIVTTLGPATDRDNNLEKVIAAGANVVRMNFSHGTPEDHQLRADKVREIA
AKLGRHVAILGDLQGPKIRVSTFKEG-KIFLNVGDKFLLDANLGKGEGDKEKVGIDYKGL
PADVVPGDILLLDDGRVQLKVLEVQ--GMKVFTEVTVGGPLSNNKGINKLGGGLSAEALT
DKDKADIVTAAKIGVDYLAVSFPRCGEDLNYARRLARDA-GCDAKIVAKVERAEAVCDQD
AMDDVILASDVVMVARGDLGVEIGDPELVGIQKALIRRARQLNRSVITATQMMESMITNP
MPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPSETVAAMARVCLGA--EKIPSLNVSK-
HRLDVQFDNVEEAIAMSAMYAANHLKGITAIITMTESGRTALMTSRISSGLPIFALSRHE
RTLNLTALYRGVTPVFFDSQN-DGVAAAHDAVNLLRDKGYLVSGDLVVVTQGDVMSTIG-
-STNTTRILTVE------------------------------------------------
------------------------------------------------------------
-----------
>GCF_000006765_Pseudomonas_aeruginosa_NP_250189.1_pyruvate_kinase
MTA---DKKAKILATLGPATRSRDDIRALVEAGANLLRLNFSHGDYADHAQRFAWVREVE
AELNYPIGVLMDLQGPKLRVGRFAAG-AVQLQRGQTFTLDLS-DAP-GDERRVNLPHPEI
IHALEPGMSLLLDDGKIRLEVVNCH--SDAIETRVAVGGELSDRKGVNVPEAVLQLSPLT
DKDRRDLAFGLELGVDWVALSFVQRPEDIDEARGLI----GDKAFLMAKIEKPSAV---S
AIEAIAERADAIMVARGDLGVEVPAESVPGIQKRIVQVCRQLGKPVVVATQMLESMRFSP
APTRAEVTDVATAVGAGADAVMLSAETASGQYPREAVEMMAKIVRQVEAEPDYHVQLEV-
NRPQPDA-TVSDAISCAIRRVSRILP-VAVLVNYTESGNSTLRAARERPKAPILSLTPNL
RTARRLTVAWGVYSVVNEQLA-HVDEICSTALDIALAQRMARRGDTVVVTAGVPFGRPG-
-STNMLRIETV---------------------------APPLGDL---------------
------------------------------------------------------------
-----------
>GCF_000007925_Prochlorococcus_marinus_WP_011125075.1_pyruvate_kinase
MTTIDLKRRTKIVATIGPATESPEKITELIKAGATTFRLNFSHGDHEEHAKRIKTIRSVA
SDLGVNIGILQDLQGPKIRLGRFKEG-PVNLKTGDVFALTSE-NKDC-NQEIANVTYENL
VNEVEKGKRILLDDGRVEMIVENVDKKNKSLICKVTVGGILSNNKGVNFPDVQLSINALT
EKDKIDLSFGLKQGVDWVALSFVRNPADIQEIKELIRRH-GYTTPIVAKIEKFEAI---D
QIDSILSLCDGVMVARGDLGVEMDAEEVPLLQKELIKKANSLGIPIITATQMLDSMASSP
RPTRAEVSDVANAILDGTDAVMLSNETAVGDYPIEAVETMAKIARRI--ERDYPQRALE-
SHLPST---IPNAISAAVSTIARQLN-AAAILPLTKSGATAHNVSKFRPATPILAITSEI
SVARRLQLVWGVSPLLIDAQK-STSKTFGIAMEQAMDMKLLKPGDQVVETAGTLTGISG-
-STDLIKVGIVSKIVASGKTKPIVKEGTISGKLRVINKATDLSDLKSGEILVLAENVQYN
LDTKTHISAMIFEGEYSLINTNNQIEDNNIIPAIYNVEGACTKFKNGEIVTLDLKDGNII
KGLAQDLKTYN
>GCF_000069965_Proteus_mirabilis_WP_004248168.1_pyruvate_kinase_PykF
------MKKTKIVCTIGPKTESEEKLTQLLDAGMNVMRLNFSHGDYEEHGNRIKNLRNVC
AKTGKKAAILLDTKGPEIRTIKLEGGNDVSLVAGQTFTFTTD-TSVVGNKDRVAVTYDGF
ARDLTVGNTVLVDDGLIGMKVIKVT--DTEVVCEVLNNGDLGENKGVNLPGVSIGLPALA
EKDKQDLIFGCEQGVDFVAASFIRKRSDVEEMRAHLKAHGGENIMIISKIENQEGL---N
NFDEILEASDGIMVARGDLGVEIPVEEVIFAQKMMIEKCNAARKVVITATQMLDSMIKNP
RPTRAEAGDVANAILDGTDAVMLSGESAKGKYPVEAVTIMATICDRT--DRIMKSRLES-
YQLGAKL-RVTEAVCRGAVEMAEKLD-APLIVVATYGGKSARSIRKYFPTAPILALTNNE
ETARQLLLVKGVTTQLVNKIA-STDDFYRIGKDAALSSGLAHAGDRVVMVTGALVD-SG-
-TTNTSSVHVL-------------------------------------------------
------------------------------------------------------------
-----------
>GCF_000008865_Escherichia_coli_NP_310410.1_pyruvate_kinase
------MKKTKIVCTIGPKTESEEMLAKMLDAGMNVMRLNFSHGDYAEHGQRIQNLRNVM
SKTGKTAAILLDTKGPEIRTMKLEGGNDVSLKAGQTFTFTTD-KSVIGNSEMVAVTYEGF
TTDLSVGNTVLVDDGLIGMEVTAIE--GNKVICKVLNNGDLGENKGVNLPGVSIALPALA
EKDKQDLIFGCEQGVDFVAASFIRKRSDVIEIREHLKAHGGENIHIISKIENQEGL---N
NFDEILEASDGIMVARGDLGVEIPVEEVIFAQKMMIEKCIRARKVVITATQMLDSMIKNP
RPTRAEAGDVANAILDGTDAVMLSGESAKGKYPLEAVSIMATICERT--DRVMNSRLEF-
NNDNRKL-RITEAVCRGAVETAEKLD-APLIVVATQGGKSARAVRKYFPDATILALTTNE
KTAHQLVLSKGVVPQLVKEIT-STDDFYRLGKELALQSGLAHKGDVVVMVSGALVP-SG-
-TTNTASVHVL-------------------------------------------------
------------------------------------------------------------
-----------
>GCF_000240185_Klebsiella_pneumoniae_YP_005227418.1_pyruvate_kinase
------MKKTKIVCTIGPKTESEEMLTKMLEAGMNVMRLNFSHGDYAEHGQRIQNLRNVM
SKTGKKAAILLDTKGPEIRTIKLEGGNDVSLKAGQTFTFTTD-KSVIGNNEIVAVTYEGF
TSDLAVGNTVLVDDGLIGMEVTAIE--GNKVICKVLNNGDLGENKGVNLPGVSIALPALA
EKDKQDLIFGCEQGVDFVAASFIRKRSDVVEIREHLKAHGGENIQIISKIENQEGL---N
NFDEILEASDGIMVARGDMGVEIPVEEVIFAQKMIIEKCIRARKVVITATQMLDSMIKNP
RPTRAEAGDVANAILDGTDAVMLSGESAKGKYPLEAVTIMATICERT--DRVMTSRLDF-
NNDNRKL-RITEAVCRGAVETAEKLE-APLIVVATQGGKSARAVRKYFPDATILALTTNE
TTARQLVLSKGVVPQLVEEIA-STDDFYHLGKDLALKSGLARKGDVVVMVSGALVP-SG-
-TTNTASVHVL-------------------------------------------------
------------------------------------------------------------
-----------
1 change: 1 addition & 0 deletions demo_results/demo_tree/ref_database.newick
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
((GCF_000069965_Proteus_mirabilis_WP_004245632.1_pyruvate_kinase:0.540564072,(GCF_000006765_Pseudomonas_aeruginosa_NP_253019.1_pyruvate_kinase:0.462181802,(GCF_000069965_Proteus_mirabilis_WP_004242765.1_pyruvate_kinase:0.097084550,(GCF_000008865_Escherichia_coli_NP_310591.1_pyruvate_kinase:0.024134197,GCF_000240185_Klebsiella_pneumoniae_YP_005227690.1_pyruvate_kinase:0.023515710)0.931:0.057851236)1.000:0.280157502)0.957:0.161444222)1.000:0.415252181,(GCF_004799605_Halobacterium_salinarum_WP_010902196.1_pyruvate_kinase:0.849641007,(GCF_000069965_Proteus_mirabilis_WP_004248168.1_pyruvate_kinase_PykF:0.068872124,(GCF_000008865_Escherichia_coli_NP_310410.1_pyruvate_kinase:0.025305557,GCF_000240185_Klebsiella_pneumoniae_YP_005227418.1_pyruvate_kinase:0.025183099)0.984:0.100534754)1.000:0.556959195)0.948:0.156115197,(GCF_000006765_Pseudomonas_aeruginosa_NP_250189.1_pyruvate_kinase:0.660070918,GCF_000007925_Prochlorococcus_marinus_WP_011125075.1_pyruvate_kinase:0.520263194)0.879:0.120950005);
22 changes: 22 additions & 0 deletions demo_results/pyruvate_kinase_demo.faa
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
>GCF_004799605_Halobacterium_salinarum_WP_010902196.1_pyruvate_kinase
MRNAKIVCTLGPATNDDASVRALADAGMTVARINASHGDAAARRELIETTRAVDAQTDKPLAVMLDTQGPEVRTAPIDDDGTIHIETGSDVAFVEGDTATPDRIGISTSIAAAEPGDTVLLDDGRIEADVDRVAGDTVHATVVSGGSLGSRKGVNVPGVELDVDVVTEKDRRDLALAAELDVDYVAASFVRDADDVLAVNRVLESHGADIPIVAKIERAGAVENLDGIIDAAQGVMVARGDLGVECPMEDVPMIQKRIIAQCRDAGVPVITATEMLDSMVHARRPTRAEASDVANAVLDGTDGVMLSAETAVGDNPTRVVETMDRIVREVEGSAEYSERQEQAVPTADGTAKTDALARSARYLARDIDASAVVVASESGYTARKAAKFRPSVPVVCATPSHDVRRQLALNWGVHANYAAVAEGDATTVVERAVQAAVDSGVVASGDTVVVLVGMMTELEGASTTNTLKVHVAAETLSTGRAVVDGRTTGRTYRADSGDLSDAPENAVVLLAHGFDGEFDGDLSKIGAIVSADAGLTGYPAVIARELDVPMVGDVDVDAVPAGDLVTVDGERGVVYEADQ
>GCF_000008865_Escherichia_coli_NP_310410.1_pyruvate_kinase
MKKTKIVCTIGPKTESEEMLAKMLDAGMNVMRLNFSHGDYAEHGQRIQNLRNVMSKTGKTAAILLDTKGPEIRTMKLEGGNDVSLKAGQTFTFTTDKSVIGNSEMVAVTYEGFTTDLSVGNTVLVDDGLIGMEVTAIEGNKVICKVLNNGDLGENKGVNLPGVSIALPALAEKDKQDLIFGCEQGVDFVAASFIRKRSDVIEIREHLKAHGGENIHIISKIENQEGLNNFDEILEASDGIMVARGDLGVEIPVEEVIFAQKMMIEKCIRARKVVITATQMLDSMIKNPRPTRAEAGDVANAILDGTDAVMLSGESAKGKYPLEAVSIMATICERTDRVMNSRLEFNNDNRKLRITEAVCRGAVETAEKLDAPLIVVATQGGKSARAVRKYFPDATILALTTNEKTAHQLVLSKGVVPQLVKEITSTDDFYRLGKELALQSGLAHKGDVVVMVSGALVPSGTTNTASVHVL
>GCF_000008865_Escherichia_coli_NP_310591.1_pyruvate_kinase
MSRRLRRTKIVTTLGPATDRDNNLEKVIAAGANVVRMNFSHGSPEDHKMRADKVREIAAKLGRHVAILGDLQGPKIRVSTFKEGKVFLNIGDKFLLDANLGKGEGDKEKVGINYKGLPADVVPGDILLLDDGRVQLKVLEVQGMKVFTEVTVGGPLSNNKGINKLGGGLSAEALTEKDKADIKTAALIGVDYLAVSFPRCGEDLNYARRLARDAGCDAKIVAKVERAEAVCSQDAMDDIILASDVVMVARGDLGVEIGDPELVGIQKALIRRARQLNRAVITATQMMESMITNPMPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPSETVAAMARVCLGAEKIPSINVSKHRLDVQFDNVEEAIAMSAMYAANHLKGVTAIITMTESGRTALMTSRISSGLPIFAMSRHERTLNLTALYRGVTPVHFDSANDGVAAASEAVNLLRDKGYLMSGDLVIVTQGDVMSTVGSTNTTRILTVE
>GCF_000006765_Pseudomonas_aeruginosa_NP_250189.1_pyruvate_kinase
MTADKKAKILATLGPATRSRDDIRALVEAGANLLRLNFSHGDYADHAQRFAWVREVEAELNYPIGVLMDLQGPKLRVGRFAAGAVQLQRGQTFTLDLSDAPGDERRVNLPHPEIIHALEPGMSLLLDDGKIRLEVVNCHSDAIETRVAVGGELSDRKGVNVPEAVLQLSPLTDKDRRDLAFGLELGVDWVALSFVQRPEDIDEARGLIGDKAFLMAKIEKPSAVSAIEAIAERADAIMVARGDLGVEVPAESVPGIQKRIVQVCRQLGKPVVVATQMLESMRFSPAPTRAEVTDVATAVGAGADAVMLSAETASGQYPREAVEMMAKIVRQVEAEPDYHVQLEVNRPQPDATVSDAISCAIRRVSRILPVAVLVNYTESGNSTLRAARERPKAPILSLTPNLRTARRLTVAWGVYSVVNEQLAHVDEICSTALDIALAQRMARRGDTVVVTAGVPFGRPGSTNMLRIETVAPPLGDL
>GCF_000006765_Pseudomonas_aeruginosa_NP_253019.1_pyruvate_kinase
MSVRRTKIVATLGPASNSPEVLEQLILAGIDVARLNFSHGTPDEHRARARLVRELAAKHGRFVALLGDLQGPKIRIAKFANKRIELQVGDKFRFSTSHARDAGTQEVVGIDYPDLVKDCGVGDELLLDDGRVVMVVEEVAADELRCRVLIGGPLSDHKGINRRGGGLTAPALTDKDKADIKLAADMDLDYVAVSFPRDAKDMEYARRLLTEAGGKAWLVAKIERAEAVADDDALDGLIRASDAVMVARGDLGVEIGDAELVGIQKKIILHARRNNKVVITATQMMESMIHSPMPTRAEVSDVANAVLDYTDAVMLSAESAAGEYPVEAVKAMARVCQGAEKHPTSQKSSHRLGQTFDRCDESIALASMYTANHFPGIKAIICLTESGFTPLIMSRIRSSVPIYAYSPHRETQARVAMFRGVETIPFDPAALPAEKVSQAAVDELLKRGVVTKGDWVILTKGDSYTAQGGTNTMKVLHVGDLLV
>GCF_000069965_Proteus_mirabilis_WP_004242765.1_pyruvate_kinase
MSRRLRRTKIVTTLGPATDRDNNLEKIITAGANVVRLNFSHGSAEDHLARANRTREIAARLGRHVAILGDLQGPKIRVSTFKEGKVFLNVGDKFLLDANLEKGEGDQTKVGIDYKGLPADVVPGDILLLDDGRVQLKVLKVEGLKVFTEVTVGGPLSNNKGINKLGGGLSAEALTEKDKQDIITAAKIGVDYLAVSFPRTGEDLNYARRLARDAGCECQIVAKVERAEAVANDEIIDEIILASDVVMVARGDLGVEIGDPELVGVQKKLIRRARQLNRVVITATQMMESMITNPMPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPAETVAAMAQVCLGAEKMPAANVSKHRLDMVFDNAEEAIAMSTMYAANHMKGVNAIIAMTESGRTARMMSRISTGLPIFSMSRHEKTLNQTALYRGVTPVYCSTHTDGIAAANEAIMRLCEKGFLVSGDLVLVTQGDQMGTIGSTNTCRILTVE
>GCF_000069965_Proteus_mirabilis_WP_004248168.1_pyruvate_kinase_PykF
MKKTKIVCTIGPKTESEEKLTQLLDAGMNVMRLNFSHGDYEEHGNRIKNLRNVCAKTGKKAAILLDTKGPEIRTIKLEGGNDVSLVAGQTFTFTTDTSVVGNKDRVAVTYDGFARDLTVGNTVLVDDGLIGMKVIKVTDTEVVCEVLNNGDLGENKGVNLPGVSIGLPALAEKDKQDLIFGCEQGVDFVAASFIRKRSDVEEMRAHLKAHGGENIMIISKIENQEGLNNFDEILEASDGIMVARGDLGVEIPVEEVIFAQKMMIEKCNAARKVVITATQMLDSMIKNPRPTRAEAGDVANAILDGTDAVMLSGESAKGKYPVEAVTIMATICDRTDRIMKSRLESYQLGAKLRVTEAVCRGAVEMAEKLDAPLIVVATYGGKSARSIRKYFPTAPILALTNNEETARQLLLVKGVTTQLVNKIASTDDFYRIGKDAALSSGLAHAGDRVVMVTGALVDSGTTNTSSVHVL
>GCF_000069965_Proteus_mirabilis_WP_004245632.1_pyruvate_kinase
MRKTKIVATLGPASCSEQMIEKLILAGANVFRLNFSHGTREQHQSTAATIRQVAKKHRVFIGILADLQGPKIRIASFKNNAIQLTQGDSFILNADLDSTQGDEQQVGLDYPQLVQEVTPGNILLLDDGNIQLKVRAVNNNNIETVVTVGGKLSNRKGINLLGGGLSAPALTDKDKQDIHTAAAIQADYIAVSFPRNGADIEYARRLIVAAGSQAKIVAKVERAEVVSCQANMDDIIQASDVIMVARGDLAVEIGDACLPGAQKQLIARCRALGCPVITATQMMESMIENPMPTRAEVMDIANAVGDGTDAVMLSAETAAGKYPEEAVRAMARVAEGAERSFAANAENPWQSPSYYSQTGRWIALAATTAAFHNDKHLSVAALTDNGQSVTLLSRFMPNNNIYALTDNPALAGQLTVLRGVTPVTYQRQNNNDGDENIMQKLQTEGLLTDIHSLLITRLSTFEKTGESDCCHLVPVKQVTTALT
>GCF_000240185_Klebsiella_pneumoniae_YP_005227418.1_pyruvate_kinase
MKKTKIVCTIGPKTESEEMLTKMLEAGMNVMRLNFSHGDYAEHGQRIQNLRNVMSKTGKKAAILLDTKGPEIRTIKLEGGNDVSLKAGQTFTFTTDKSVIGNNEIVAVTYEGFTSDLAVGNTVLVDDGLIGMEVTAIEGNKVICKVLNNGDLGENKGVNLPGVSIALPALAEKDKQDLIFGCEQGVDFVAASFIRKRSDVVEIREHLKAHGGENIQIISKIENQEGLNNFDEILEASDGIMVARGDMGVEIPVEEVIFAQKMIIEKCIRARKVVITATQMLDSMIKNPRPTRAEAGDVANAILDGTDAVMLSGESAKGKYPLEAVTIMATICERTDRVMTSRLDFNNDNRKLRITEAVCRGAVETAEKLEAPLIVVATQGGKSARAVRKYFPDATILALTTNETTARQLVLSKGVVPQLVEEIASTDDFYHLGKDLALKSGLARKGDVVVMVSGALVPSGTTNTASVHVL
>GCF_000240185_Klebsiella_pneumoniae_YP_005227690.1_pyruvate_kinase
MSRRLRRTKIVTTLGPATDRDNNLEKVIAAGANVVRMNFSHGTPEDHQLRADKVREIAAKLGRHVAILGDLQGPKIRVSTFKEGKIFLNVGDKFLLDANLGKGEGDKEKVGIDYKGLPADVVPGDILLLDDGRVQLKVLEVQGMKVFTEVTVGGPLSNNKGINKLGGGLSAEALTDKDKADIVTAAKIGVDYLAVSFPRCGEDLNYARRLARDAGCDAKIVAKVERAEAVCDQDAMDDVILASDVVMVARGDLGVEIGDPELVGIQKALIRRARQLNRSVITATQMMESMITNPMPTRAEVMDVANAVLDGTDAVMLSAETAAGQYPSETVAAMARVCLGAEKIPSLNVSKHRLDVQFDNVEEAIAMSAMYAANHLKGITAIITMTESGRTALMTSRISSGLPIFALSRHERTLNLTALYRGVTPVFFDSQNDGVAAAHDAVNLLRDKGYLVSGDLVVVTQGDVMSTIGSTNTTRILTVE
>GCF_000007925_Prochlorococcus_marinus_WP_011125075.1_pyruvate_kinase
MTTIDLKRRTKIVATIGPATESPEKITELIKAGATTFRLNFSHGDHEEHAKRIKTIRSVASDLGVNIGILQDLQGPKIRLGRFKEGPVNLKTGDVFALTSENKDCNQEIANVTYENLVNEVEKGKRILLDDGRVEMIVENVDKKNKSLICKVTVGGILSNNKGVNFPDVQLSINALTEKDKIDLSFGLKQGVDWVALSFVRNPADIQEIKELIRRHGYTTPIVAKIEKFEAIDQIDSILSLCDGVMVARGDLGVEMDAEEVPLLQKELIKKANSLGIPIITATQMLDSMASSPRPTRAEVSDVANAILDGTDAVMLSNETAVGDYPIEAVETMAKIARRIERDYPQRALESHLPSTIPNAISAAVSTIARQLNAAAILPLTKSGATAHNVSKFRPATPILAITSEISVARRLQLVWGVSPLLIDAQKSTSKTFGIAMEQAMDMKLLKPGDQVVETAGTLTGISGSTDLIKVGIVSKIVASGKTKPIVKEGTISGKLRVINKATDLSDLKSGEILVLAENVQYNLDTKTHISAMIFEGEYSLINTNNQIEDNNIIPAIYNVEGACTKFKNGEIVTLDLKDGNIIKGLAQDLKTYN
8 changes: 8 additions & 0 deletions demo_results/ureC.faa
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
>GCF_000008865_Escherichia_coli_NP_309351.2_urease_subunit_alpha
MSNISRQAYADMFGPTTGDKIRLADTELWIEVEDDLTTYGEEVKFGGGKVIRDGMGQGQMLSAGCADLVLTNALIIDYWGIVKADIGVKDGRIFAIGKAGNPDIQPNVTIPIGVSTEIIAAEGRIVTAGGVDTHIHWICPQQAEEALTSGITTMIGGGTGPTAGSNATTCTPGPWYIYQMLQAADSLPVNIGLLGKGNCSNPDALREQVAAGVIGLKIHEDWGATPAVINCALTVADEMDVQVALHSDTLNESGFVEDTLTAIGGRTIHTFHTEGAGGGHAPDIITACAHPNILPSSTNPTLPYTVNTIDEHLDMLMVCHHLDPDIAEDVAFAESRIRQETIAAEDVLHDLGAFSLTSSDSQAMGRVGEVVLRTWQVAHRMKVQRGPLPEESGDNDNVRVKRYIAKYTINPALTHGIAHEVGSIEVGKLADLVLWSPAFFGVKPATIVKGGMIAMAPMGDINGSIPTPQPVHYRPMFAALGSARHRCRVTFLSQAAAANGVAEQLNLHSTTAVVKGCRTVQKADMRHNSLLPDITVDSQTYEVRINGELITSEPADILPMAQRYFLF
>GCF_000006765_Pseudomonas_aeruginosa_NP_253555.1_urease_subunit_alpha
MKISRQAYADMFGPTVGDRVRLADTDLWIEVERDFTVYGEEVKFGGGKVIRDGMGQSQLGAAQVVDTVITNALILDHWGVVKADVGLKDGRIQAIGKAGNPDIQPGVNIAIGAGTEVIAGEGMILTAGGIDTHIHFICPQQIEEALMSGVTTMIGGGTGPAAGTNATTCTSGPWHMARMLQAADAFPMNIGFTGKGNASLPLPLEEQVLAGAIGLKLHEDWGSTPAAIDNCLEVAERHDIQVAIHTDTLNESGFVETTLGAFKGRTIHTYHTEGAGGGHAPDIIKACGFANVLPSSTNPTRPFTRNTIDEHLDMLMVCHHLDPAIAEDVAFAESRIRRETIAAEDILHDLGAFSMISSDSQAMGRVGEVITRTWQTADKMKRQRGRLDGDGARNDNFRARRYIAKYTINPAITHGISHEVGSVEAGKWADLVLWRPAFFGVKPSLILKGGAIAASLMGDINGSIPTPQPVHYRPMFASYAGSRHATSLTFVSQAAFAAGVPQQLGLRKAIGVVSGCRGVQKTDLIHNGYLPTIEVDAQNYQVRADGQLLWCEPADVLPMAQRYFLF
>GCF_000069965_Proteus_mirabilis_WP_004245262.1_urease_subunit_alpha
MKTISRQAYADMFGPTTGDRLRLADTELFLEIEKDFTTYGEEVKFGGGKVIRDGMGQSQVVSAECVDVLITNAIILDYWGIVKADIGIKDGRIVGIGKAGNPDVQPNVDIVIGPGTEVVAGEGKIVTAGGIDTHIHFICPQQAQEGLVSGVTTFIGGGTGPVAGTNATTVTPGIWNMYRMLEAVDELPINVGLFGKGCVSQPEAIREQITAGAIGLKIHEDWGATPMAIHNCLNVADEMDVQVAIHSDTLNEGGFYEETVKAIAGRVIHVFHTEGAGGGHAPDVIKSVGEPNILPASTNPTMPYTINTVDEHLDMLMVCHHLDPSIPEDVAFAESRIRRETIAAEDILHDMGAISVMSSDSQAMGRVGEVILRTWQCAHKMKLQRGTLAGDSADNDNNRIKRYIAKYTINPALAHGIAHTVGSIEKGKLADIVLWDPAFFGVKPALIIKGGMVAYAPMGDINAAIPTPQPVHYRPMYACLGKAKYQTSMIFMSKAGIEAGVPEKLGLKSLIGRVEGCRHITKASMIHNNYVPHIELDPQTYIVKADGVPLVCEPATELPMAQRYFLF
>GCF_000240185_Klebsiella_pneumoniae_YP_005228898.1_urease_subunit_alpha
MSNISRQAYADMFGPTVGDKVRLADTELWIEVEDDLTTYGEEVKFGGGKVIRDGMGQGQMLAADCVDLVLTNALIVDHWGIVKADIGVKDGRIFAIGKAGNPDIQPNVTIPIGASTEVIAAEGKIVTAGGIDTHIHWICPQQAEEALVSGVTTMVGGGTGPAAGTHATTCTPGPWYISRMLQAADSLPVNIGLLGKGNVSQPDALREQVAAGVIGLKIHEDWGATPAAIDCALTVADEMDIQVALHSDTLNESGFVEDTLAAIGGRTIHTFHTEGAGGGHAPDIITACAHPNILPSSTNPTLPYTLNTIDEHLDMLMVCHHLDPDIAEDVAFAESRIRRETIAAEDVLHDLGAFSLTSSDSQAMGRVGEVILRTWQVAHRMKVQRGALAEETGDNDNFRVKRYIAKYTINPALTHGIAHEVGSIEVGKLADLVVWSPAFFGVKPATVIKGGMIAIAPMGDINASIPTPQPVHYRPMFGALGSARHHCRLTFLSQAAAANGVAERLNLRSAIAVVKGCRTVQKADMVHNSLQPNITVDAQTYEVRVDGELITSEPADVLPMAQRYFLF