MIToS defines methods and types useful for any MSA. The `Pfam` module uses other MIToS
modules in the context of Pfam MSAs, where it is possible to determine how structure and
sequence information should be mapped. This module defines functions that go from a Pfam
MSA to the protein contact prediction performance of pairwise scores estimated from that MSA.
using MIToS.Pfam # to load the Pfam module
- [Download and read](@ref Getting-a-Pfam-MSA) Pfam MSAs.
- Obtain [PDB information](@ref Getting-PDB-information-from-an-MSA) from alignment annotations.
- [Map](@ref Getting-PDB-information-from-an-MSA) between sequence/alignment residues/columns and PDB structures.
- Measure the [AUC](@ref PDB-contacts-and-AUC) (area under the ROC curve) of MI scores for [protein contact](@ref PDB-contacts-and-AUC) prediction.
The function `downloadpfam` takes a Pfam accession and downloads a Pfam MSA in Stockholm
format. Use the `read` function with the `Stockholm` `FileFormat` to get an
`AnnotatedMultipleSequenceAlignment` object with the MSA and its Pfam annotations.
You must set `generatemapping` and `useidcoordinates` to `true` the first time you read
the downloaded MSA. This is necessary for some of the methods in the `Pfam` module.
using MIToS.Pfam
pfamfile = downloadpfam("PF12464")
msa = read(pfamfile, Stockholm, generatemapping=true, useidcoordinates=true)
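Pfam sequence identifiers such as `MAA_ECOLI/7-58` carry the start and end coordinates of the aligned region, which is the information `useidcoordinates` relies on. The following is a minimal sketch of that naming convention only; `parse_pfam_id` is a hypothetical helper written for illustration and is not part of MIToS.

```julia
# Hypothetical helper (not a MIToS function): split a Pfam-style sequence
# identifier of the form "NAME/start-stop" into its three parts.
function parse_pfam_id(id::AbstractString)
    m = match(r"^(.+)/(\d+)-(\d+)$", id)
    m === nothing && return nothing  # identifier carries no coordinates
    return (name = String(m.captures[1]),
            start = parse(Int, m.captures[2]),
            stop = parse(Int, m.captures[3]))
end

coords = parse_pfam_id("MAA_ECOLI/7-58")

# Residue numbers spanned by the aligned region under this convention
residue_numbers = coords.start:coords.stop
```

An identifier without the `/start-stop` suffix makes the helper return `nothing`, which is one reason the coordinates cannot always be taken from the name.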
The function `getseq2pdb` parses the MSA annotations to return a `Dict` from each sequence
identifier in the MSA to its PDB and chain codes.
getseq2pdb(msa)
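To make the shape of that result concrete, here is a rough sketch that builds a similar dictionary from raw Stockholm `#=GS ... DR PDB; ...` annotation lines. It assumes the usual Pfam layout for those lines; `seq2pdb_sketch` is a hypothetical function, not the MIToS implementation, and the input lines are illustrative rather than copied from the real PF12464 file.

```julia
# Hypothetical sketch (not MIToS code): collect (PDB code, chain) pairs per
# sequence identifier from Stockholm "#=GS <id> DR PDB; <code> <chain>; ..." lines.
function seq2pdb_sketch(lines)
    mapping = Dict{String,Vector{Tuple{String,String}}}()
    pattern = r"^#=GS\s+(\S+)\s+DR\s+PDB;\s*(\w{4})\s+(\w);"
    for line in lines
        m = match(pattern, line)
        m === nothing && continue  # skip lines without a PDB cross-reference
        seqid, pdbcode, chain = String.(m.captures)
        push!(get!(mapping, seqid, Tuple{String,String}[]), (pdbcode, chain))
    end
    return mapping
end

# Illustrative annotation lines written for this sketch
example_lines = [
    "# STOCKHOLM 1.0",
    "#=GS MAA_ECOLI/7-58 DR PDB; 1OCX A; 7-58;",
    "#=GS MAA_ECOLI/7-58 DR PDB; 1OCX C; 7-58;",
]

seq2pdb = seq2pdb_sketch(example_lines)
```

One sequence can map to several chains, so each value is a vector of pairs rather than a single pair.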
Once you know the association between PDB chains and sequences, you can use that
information together with the `msacolumn2pdbresidue` function to get the PDB residue
number that corresponds to each MSA column for a given sequence and PDB chain.
That function downloads information from SIFTS to generate the mapping.
col2res = msacolumn2pdbresidue(msa, "MAA_ECOLI/7-58", "1OCX", "C")
The returned dictionary can be used to get the PDB residue associated with each column
(using the `msaresidues` function)...
using MIToS.PDB
pdbfile = downloadpdb("1OCX")
pdb = read(pdbfile, PDBML)
resdict = @residuesdict pdb model "1" chain "C" group "ATOM" residue All
msaresidues(msa, resdict, col2res)
...or to delete the columns without PDB residues (using the `hasresidues` function):
using MIToS.MSA
filtercolumns!(msa, hasresidues(msa, col2res))
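The idea behind this filtering step can be sketched with plain Julia containers: build a Boolean mask that is `true` for columns with a mapped PDB residue, then keep only those columns. The matrix and dictionary below are toy values made up for illustration, and the variable names are not MIToS API.

```julia
# Toy alignment: 2 sequences x 4 columns (values are illustrative only)
toy_msa = ['M' 'K' '-' 'L';
           'M' 'R' 'A' 'L']

# Toy column => PDB residue number mapping; column 3 has no mapped residue
toy_col2res = Dict(1 => "7", 2 => "8", 4 => "10")

# true for every column that has a non-empty residue number
mask = [get(toy_col2res, col, "") != "" for col in axes(toy_msa, 2)]

# Keep only the columns that map to a PDB residue
filtered = toy_msa[:, mask]
```

In this sketch the third column is dropped because the dictionary has no entry for it.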
The `Dict` between MSA columns and PDB residue numbers can also be used to generate a
protein contact map associated with the MSA.
cmap = msacontacts(msa, resdict, col2res)
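As a rough illustration of what a contact map encodes, the sketch below marks two columns as being in contact when their residues lie within a distance limit. It is a simplification under stated assumptions: one hypothetical representative coordinate per column, a `limit` of 6.05 Å chosen here as an assumption, and `NaN` for pairs involving a column without a mapped residue. `contact_matrix` is not the MIToS implementation.

```julia
using LinearAlgebra: norm

# Hypothetical coordinates (in Å), one representative point per MSA column;
# `nothing` marks a column without a mapped PDB residue.
toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], nothing, [20.0, 0.0, 0.0]]

# Simplified sketch: 1.0 for contact, 0.0 for no contact, NaN when undefined
function contact_matrix(coords; limit = 6.05)
    n = length(coords)
    cmap = fill(NaN, n, n)
    for i in 1:n, j in 1:n
        i == j && continue  # leave the diagonal undefined
        (coords[i] === nothing || coords[j] === nothing) && continue
        cmap[i, j] = norm(coords[i] - coords[j]) <= limit ? 1.0 : 0.0
    end
    return cmap
end

toy_cmap = contact_matrix(toy_coords)
```

Keeping undefined pairs as `NaN` lets a later evaluation step skip them instead of counting them as non-contacts.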
That protein contact map can be used to calculate the Area Under the ROC Curve (AUC) for a
given score with the `AUC` function.
using MIToS.Information
ZMIp, MIp = buslje09(msa)
using ROCAnalysis # You need to load ROCAnalysis to use the AUC function
AUC(ZMIp, cmap)
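For intuition about the number this returns, the AUC can be read as the probability that a randomly chosen contact pair scores higher than a randomly chosen non-contact pair. The sketch below computes it directly from that pairwise definition; `auc_sketch` is a hypothetical function with made-up scores, and it is not how ROCAnalysis or MIToS compute the value.

```julia
# Hypothetical sketch: AUC as the fraction of (contact, non-contact) pairs
# in which the contact has the higher score; ties count as one half.
function auc_sketch(scores::AbstractVector{<:Real}, is_contact::AbstractVector{Bool})
    pos = scores[is_contact]     # scores of residue pairs in contact
    neg = scores[.!is_contact]   # scores of residue pairs not in contact
    (isempty(pos) || isempty(neg)) && return NaN
    wins = 0.0
    for p in pos, n in neg
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return wins / (length(pos) * length(neg))
end

# Made-up scores and labels for five column pairs
toy_scores = [2.5, 0.1, 1.7, -0.3, 0.9]
toy_labels = [true, false, true, false, false]

toy_auc = auc_sketch(toy_scores, toy_labels)
```

A value near 0.5 would mean the score ranks contacts no better than chance, while a value near 1 means contacts consistently receive the highest scores.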