-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy path03_RMSF.jl
83 lines (69 loc) · 3.27 KB
/
03_RMSF.jl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
# # Root Mean Squared Fluctuation (RMSF)
#
# md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/cookbook/notebooks/03_RMSF.ipynb)
# md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/cookbook/notebooks/03_RMSF.ipynb)
#
#
# ## Problem description
#
# The [Root Mean Squared Fluctuation (RMSF)](https://en.wikipedia.org/wiki/Mean_squared_displacement)
# is a common way to measure residue flexibility in a structural ensemble.
# It is a measure of how far is the residue moving from its average position
# in the group of structures. Usually, we represent a residue position with
# the spatial coordinates of its alpha carbon.
#
# The protein structures should be previously superimposed to calculate the
# RMSF, for example, by using the `superimpose` function of the
# [`PDB` module of `MIToS`](@ref Module-PDB). In this example, we are going
# to measure the RMSF of each residue from an NMR ensemble using the
# `rmsf` function.
#
# The structure superimposition could be the most complicated step of the
# process, depending on the input data. In particular, it structures come
# from different PDB structures or homologous proteins can require the use
# of external programs,
# as [MAMMOTH-mult](https://ub.cbm.uam.es/software/online/mamothmult.php) or
# [MUSTANG](https://lcb.infotech.monash.edu/mustang/) among others,
# tailored for this task.
#
# In this case, we are going to use an NMR ensemble. Therefore, we are not
# going to need to superimpose the structures as NMR models have the
# same protein sequence and are, usually, well-aligned.
#
#
# ## MIToS solution
import MIToS
using MIToS.PDB
using Plots
# Lets read the NMR ensemble:
pdb_file = abspath(pathof(MIToS), "..", "..", "test", "data", "1AS5.pdb")
pdb_res = read_file(pdb_file, PDBFile, occupancyfilter = true)
#md nothing # hide
# We set `occupancyfilter` to `true` to ensure that we have one single set of
# coordinates for each atom. That filter isn't essential for NMR structures,
# but It can avoid multiple alpha carbons in crystallographic structures with
# disordered atoms.
# We can get an idea of the alpha carbon positions by plotting these residues:
scatter(pdb_res, legend = false)
# As we saw in the previous plot, the structure doesn't need to be
# superimposed. Now, we are going to separate each model into different
# vectors, storing each vector into a `Dict`:
models = Dict{String,Vector{PDBResidue}}()
for res in pdb_res
push!(get!(models, res.id.model, []), res)
end
# Then, we simply need to collect all the PDB models in the values
# of the `Dict`, to get the vector of `PDBResidues` vectors required
# to calculate the RMSF.
pdb_models = collect(values(models))
#md nothing # hide
# And, finally, call the `rmsf` function on the list of structures. It is
# important that all the vectors has the same number of `PDBResidue`s. This
# function assumes that the nth element of each vector corresponds to the same
# residue:
RMSF = rmsf(pdb_models)
# This return the vector of RMSF values for each residue, calculated using
# the coordinates of the alpha carbons.
# You can plot this vector to get an idea of the which are the most flexible
# position in your structure:
plot(RMSF, legend = false, xlab = "Residue", ylab = "RMSF [Å]")