-
Notifications
You must be signed in to change notification settings - Fork 13
/
Copy pathsolute.py
1016 lines (921 loc) · 40 KB
/
solute.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
"""
========
Solute
========
:Author: Orion Cohen
:Year: 2021
:Copyright: GNU Public License v3
Solute is the core of solvation_analysis. Once you have specified the
solute and solvent, Solute can calculate the coordination, networking,
pairing, speciation, and residence information for every solute in
the system.
If you are just getting started with solvation_analysis, the
tutorials are the best place to start. If you are looking to learn
more, this page will provide a full API overview of the Solute class
and go into more detail on it's inner working. There are three core
concepts to understand in the Solute class: solute_atoms vs solvents,
solvation data DataFrame, and the ``atom_solutes`` dictionary.
**solute vs solvents**
To instantiate a Solute, one must first use ``MDAnalysis`` to select
the solute_atoms and solvents. The solute_atoms can be organized
in different ways (described below) but the solvents are always
a dictionary of solvent names (strings) and solvent AtomGroups. In
some cases, specifying the solvent and solutes is the hardest part.
Note that one species might be both the solute and a solvent. Such
as if you are interested in hydrogen bond networks in water.
**the solvation data DataFrame**
The solvation data DataFrame is the core of the Solute class. It contains
the full solvation shell of every solute in the system at every timestep.
It is a labeled adjacency matrix of the solvation network. From this, all
other solvation information is derived: coordination numbers, residence
times, solute-solvent pairing, etc. The DataFrame does not need to be
accessed by the user, but it is at the core of how solvation_analysis works.
**the atom_solutes dictionary**
Solvation can be considered on two scales, the scale of the solute molecule
or the scale of atoms in the solute. For example, one might be interested in
the solvation shell around a whole solute molecule, or around just a specific
functional group. The ``Solute`` allows both scales to be analyzed simultaneously.
It first constructs a ``Solute`` for every unique atom in the `solute_atoms`
AtomGroup and then combines them into a single ``Solute`` representing the whole
molecule. The ``Solute`` for each atom is stored in the ``atom_solutes`` dictionary.
The key's of the dictionary are the ``solute_name`` of each ``Solute``, and
depending on on how the ``Solute`` was instantiated, these might be generic names
(e.g. "solute_0", "solute_1", etc.) or user-specified.
A Solute can be instantiated with the ``from_atoms``, ``from_atoms_dict``, or
``from_solutes_list`` methods. Below, the basic use of each constructor is
demonstrated.
``from_atoms`` takes an ``MDAnalysis.AtomGroup`` as input and
automatically parses it into individual ``atom_solutes``. This is the
least flexible constructor and does the most work behind the scenes.
.. code-block:: python
water = u.select_atoms("resname water")
water_O = water.select_atoms("element O")
water_O_solute = Solute.from_atoms(water_O, {"water": water})
# OR
water_solute = Solute.from_atoms(water, {"water": water})
The `from_atoms_dict` constructor takes a dict of solute names (strings)
and `AtomGroups`. Each `AtomGroup` must have only have one atom per
solute. This provides an intermediate level of flexibility because
it allows you to name your atom solutes.
.. code-block:: python
water_solute = Solute.from_atoms_dict({"water_O": water_O, "water_H1": water_H1, "water_H2": water_H2}, {"water": water)
Finally, the `from_solutes_list` is the most flexible because
it allows you to create all the solutes individually with
different settings. It might be useful if you wanted to use
specify radii for each ``solute_atom``.
.. code-block:: python
water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})
water_solute = Solute.from_solutes_list([solute_O, solute_H1, solute_H2])
Once the ``Solute`` is instantiated, it must be run by calling ``Solute.run()``.
This will calculate the solvation data and instantiate the analysis classes. At
this point, the ``Solute`` is ready to be analyzed. Solute inherits from
MDAnalysis.analysis.base.AnalysisBase, so ``run`` supports all arguments
from the base class.
By default, ``Coordination``, ``Pairing``, and ``Speciation`` are instantiated
as attributes of the ``Solute``. ``Networking`` and ``Residence`` can also be
added. These are covered in greater detail in their respective API pages.
Solute also provides several functions to select a particular solute and its
solvation shell, returning an AtomGroup for visualization or further analysis.
This is covered in the visualization tutorial.
"""
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Optional, Union
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import MDAnalysis as mda
from MDAnalysis.analysis.base import AnalysisBase
from MDAnalysis.analysis.rdf import InterRDF
from MDAnalysis.lib.distances import capped_distance
import numpy as np
from solvation_analysis._utils import (
verify_solute_atoms,
verify_solute_atoms_dict,
get_closest_n_mol,
get_radial_shell,
)
from solvation_analysis.rdf_parser import identify_cutoff_scipy
from solvation_analysis.coordination import Coordination
from solvation_analysis.networking import Networking
from solvation_analysis.pairing import Pairing
from solvation_analysis.residence import Residence
from solvation_analysis.speciation import Speciation
from solvation_analysis._column_names import (
FRAME,
DISTANCE,
SOLUTE,
SOLUTE_IX,
SOLVENT,
SOLVENT_IX,
SOLVENT_ATOM_IX,
SOLUTE_ATOM_IX,
)
try:
import rdkit
from rdkit.Chem import Draw
from rdkit.Chem import rdCoordGen
from rdkit.Chem.Draw.MolDrawing import DrawingOptions
except ImportError:
rdkit = None
class Solute(AnalysisBase):
"""
Analyze the solvation structure of a solute.
Solute determines the coordination between the solute and solvents,
collecting that data in a pandas.DataFrame (solvation_data) for further
analysis. The solvation data DataFrame is a labeled adjacency matrix of
the first-shell solvation network.
By default, Solute will determine the coordination cutoff
radii from the Solute-solvent RDF. Radii can also be supplied
with the radii parameter. The RDF's of each solute-solvent
pari can be visualize with the plot_solvation_radius method.
Solute instantiates Speciation, Coordination, and Pairing
objects from the solvation_data, providing a convenient interface for
further analysis. Networking and Residence can be added by modifying
the ``analysis_classes`` parameter.
The parameters below are the ``kwargs`` supplied to the ``from_*``
constructors.
Parameters
----------
radii : dict of {str: float}, optional
an optional dictionary of solvent names and associated solvation radii
e.g. {"name_2": radius_2, "name_5": radius_5} Any radii not given will
be calculated. The solvent names should match the solvents parameter.
rdf_kernel : function, optional
this function must take RDF bins and data as input and return
a solvation radius as output. e.g. rdf_kernel(bins, data) -> 3.2. By default,
the rdf_kernel is `solvation_analysis.rdf_parser.identify_cutoff_scipy`.
kernel_kwargs : dict, optional
kwargs passed to rdf_kernel. See `identify_cutoff_scipy` for options. This can
be used to set a default fallback radius for all solvents.
rdf_init_kwargs : dict, optional
kwargs passed to the initialization of the MDAnalysis.InterRDF used to plot
the solute-solvent RDFs. By default, ``range`` will be set to (0, 7.5).
rdf_run_kwargs : dict, optional
kwargs passed to the internal MDAnalysis.InterRDF.run() command
e.g. ``inner_rdf.run(**rdf_run_kwargs)``. By default, step, start, and
stop will use any kwargs provided to ``solute.run(**kwargs)``.
skip_rdf : bool, optional
skip the RDF generation step. This is useful if you have already
calculated the RDFs and have the radii. If True, all radii must be
specified in the radii parameter.
solute_name: str, optional
the name of the solute, used for labeling.
analysis_classes : List[str] or str, optional
a list of the analysis classes to be instantiated, current options are:
"pairing", "coordination", "speciation", "residence", and "networking".
By default, only "pairing", "coordination", and "residence" are instantiated.
If networking is included, the networking_solvents kwarg must be specified.
Enter 'all' to include all analysis classes.
networking_solvents : str, optional
see the `solvents` parameter of the Networking class.
verbose : bool, optional
Turn on more logging and debugging, default ``False``
Attributes
----------
u : Universe
An MDAnalysis.Universe object
atom_solutes : dict of {str: Solute}
a dictionary of solute names and their associated Solute objects.
n_solutes : int
number of solute atoms
radii : dict
a dictionary of solvation radii for each solvent
e.g. {"name_2": radius_2, "name_2": radius_2, ...}
rdf_data : dict
a dictionary of RDF data, keys are solvent names and values
are (bins, data) tuples.
solvation_data : pandas.DataFrame
a dataframe of solvation data with columns "frame", "solute_atom", "solvent_atom",
"distance", "solvent_name", and "solvent". If multiple entries share a frame, solute_atom,
and solvent_atom, all but the closest atom is dropped.
solvation_data_dupicates : pandas.DataFrame
All rows that are dropped from solvation_data when duplicates are dropped.
solute_res_ix : np.array
a numpy array of the residue indices of every solute.
solute_atom_ix : np.array
a numpy array of the atom indices of every solute.
solvent_counts : dict of {str: int}
a dictionary of the number of residues for each solvent.
res_name_map : pandas.Series
a map from residue indices in the Universe to solvent and solute names from
the solute. For example, if the first residue in the universe was in
``self.solvent['res_one']``, then ``res_name_map[0] == 'res_one'``.
pairing : pairing.Pairing (optional)
pairing analyzes the fraction solutes and solvents that are coordinated.
coordination : coordination.Coordination (optional)
coordination analyses the coordination numbers of solvents and which
solvent atoms are coordinated.
speciation : speciation.Speciation (optional)
speciation provides an interface for finding and selecting the solvation shells
surrounding each solute.
residence : residence.Residence (optional)
residence calculates the residence times of each solvent on the solute. Only
instantiated if 'residence' is included in the analysis_classes kwarg.
networking : networking.Networking (optional)
networking analyses the connectivity of solute-solvent networks. Only instantiated
if 'networking' is included in the analysis_classes kwarg. the networking_solvents
kwarg must be specified.
"""
def __init__(
self,
solute_atoms: mda.AtomGroup,
solvents: dict[str, mda.AtomGroup],
atom_solutes: Optional[dict[str, "Solute"]] = None,
radii: Optional[dict[str, float]] = None,
rdf_kernel: Optional[Callable[[np.ndarray, np.ndarray], float]] = None,
kernel_kwargs: Optional[dict[str, Any]] = None,
rdf_init_kwargs: Optional[dict[str, Any]] = None,
rdf_run_kwargs: Optional[dict[str, Any]] = None,
skip_rdf: bool = False,
solute_name: str = "solute_0",
analysis_classes: Optional[list[str]] = None,
networking_solvents: Optional[str] = None,
verbose: bool = False,
internal_call: bool = False,
):
"""
This method is not intended to be called directly. Instead, use
``from_atoms``, ``from_atoms_dict`` or ``from_solute_list`` to create a Solute.
"""
if not internal_call:
raise RuntimeError(
"Please use Solute.from_atoms, Solute.from_atoms_dict, or "
"Solute.from_solute_list instead of the default constructor."
)
super(Solute, self).__init__(solute_atoms.universe.trajectory, verbose=verbose)
self.solute_atoms = solute_atoms # TODO: this shit!
self.atom_solutes = atom_solutes
if self.atom_solutes is None or len(atom_solutes) <= 1:
self.atom_solutes = {solute_name: self}
self.radii = radii or {}
self.solvent_counts = {
name: atoms.n_residues for name, atoms in solvents.items()
}
self.kernel = rdf_kernel or identify_cutoff_scipy
self.kernel_kwargs = kernel_kwargs or {}
self.rdf_init_kwargs = rdf_init_kwargs or {}
self.rdf_run_kwargs = rdf_run_kwargs or {}
self.has_run = False
self.u = solute_atoms.universe
self.n_solutes = solute_atoms.n_residues
self.solute_res_ix = pd.Series(
solute_atoms.atoms.resindices, solute_atoms.atoms.ix
)
self.solute_name = solute_name
self.solvents = solvents
if skip_rdf:
assert set(self.radii.keys()) >= set(
self.solvents.keys()
), "To skip RDF generation, all solvent radii must be specified."
self.skip_rdf = skip_rdf
# instantiate the res_name_map
self.res_name_map = pd.Series(["none"] * len(self.u.residues))
self.res_name_map[solute_atoms.residues.ix] = self.solute_name
for name, solvent in solvents.items():
self.res_name_map[solvent.residues.ix] = name
# instantiate analysis classes.
if analysis_classes is None:
self.analysis_classes = ["pairing", "coordination", "speciation"]
elif analysis_classes == "all":
self.analysis_classes = [
"pairing",
"coordination",
"speciation",
"residence",
"networking",
]
else:
self.analysis_classes = [cls.lower() for cls in analysis_classes]
if "networking" in self.analysis_classes and networking_solvents is None:
raise ValueError(
"networking analysis requires networking_solvents to be provided."
)
else:
self.networking_solvents = networking_solvents
@staticmethod
def from_atoms(
solute_atoms: mda.AtomGroup,
solvents: dict[str, mda.AtomGroup],
rename_solutes: Optional[dict[str, str]] = None,
**kwargs: Any,
) -> "Solute":
"""
Create a Solute from a single AtomGroup. The solute_atoms AtomGroup must
should contain identical residues and identical atoms on each residue.
For example, if one wanted to look at the O and H atoms of an ethanol molecule,
solute_atoms should contain the O and H atoms of all ethanol molecules. It
could not contain C atoms from some ethanol or include residues
that were not ethanol.
Parameters
----------
solute_atoms : AtomGroup
an AtomGroup or ResidueGroup containing one atom per residue.
solvents: dict of {str: MDAnalysis.AtomGroup}
a dictionary of solvent names and associated MDAnalysis.AtomGroups.
e.g. {"name_1": solvent_group_1,"name_2": solvent_group_2, ...}
rename_solutes : dict, optional
a dictionary of solute names to rename the solutes. Keys are the
default solute_name, and values are the new names.
For example, from_atoms might return a solute with solute_name
"solute_0", but you want to rename it to "functional_group_X".
In this case, rename_solutes={"solute_0": "functional_group_X"}.
If None, solutes will be numbered in the order their atoms appear
on the residue.
kwargs: dict
All kwargs listed in the Parameters section of the Solute class.
Returns
-------
Solute
"""
rename_solutes = rename_solutes or {}
# a dict with keys as integers and values as AtomGroups
# this will get called again later on with the same input, but for now, it's fine
solute_atom_group_dict = verify_solute_atoms(solute_atoms)
solute_atom_group_dict_renamed = {
rename_solutes.get(f"solute_{i}") or f"solute_{i}": atom_group
for i, atom_group in solute_atom_group_dict.items()
}
return Solute.from_atoms_dict(
solute_atom_group_dict_renamed, solvents, **kwargs
)
@staticmethod
def from_atoms_dict(
solute_atoms_dict: dict[str, mda.AtomGroup],
solvents: dict[str, mda.AtomGroup],
**kwargs: Any,
) -> "Solute":
"""
Create a Solute object from a dictionary of solute atoms.
Parameters
----------
solute_atoms_dict: dict of {str: MDAnalysis.AtomGroup}
a dictionary of solute atoms, e.g. {"name_1": solute_1, "name_2": solute_2}.
Each AtomGroup should contain one type of atom, and only one atom per
residue.
solvents: dict of {str: MDAnalysis.AtomGroup}
a dictionary of solvent names and associated MDAnalysis.AtomGroups.
e.g. {"name_1": solvent_group_1,"name_2": solvent_group_2, ...} kwargs
kwargs: dict
All kwargs listed in the Parameters section of the Solute class.
Returns
-------
Solute
"""
# all solute AtomGroups in one AtomGroup + verification
assert isinstance(solute_atoms_dict, dict), "Solute_atoms_dict must be a dict."
solute_atom_group = verify_solute_atoms_dict(solute_atoms_dict)
# create the solutes for each atom
atom_solutes = {}
for solute_name, atoms in solute_atoms_dict.items():
atom_solutes[solute_name] = Solute(
atoms,
solvents,
internal_call=True,
**{**kwargs, "solute_name": solute_name},
)
# create the solute for the whole solute
solute = Solute(
solute_atom_group,
solvents,
atom_solutes=atom_solutes,
internal_call=True,
**kwargs,
)
if len(atom_solutes) > 1:
solute.run = solute._run_solute_atoms
return solute
@staticmethod
def from_solute_list(
solutes: list["Solute"], solvents: dict[str, mda.AtomGroup], **kwargs: Any
) -> "Solute":
"""
Create a Solute from a list of Solutes. All Solutes must have only a
single solute atom on each solute residue. Essentially, from_solute_list
allows you to preconstruct the ``atom_solutes`` dictionary. This is useful
if you want to individually specify initialization parameters for each
atom Solute, otherwise, a different constructor is recommended.
Parameters
----------
solutes: list of Solute
A list of Solutes. All Solutes must have only a single solute atom.
on each solute residue.
solvents: dict of {str: MDAnalysis.AtomGroup}
a dictionary of solvent names and associated MDAnalysis.AtomGroups.
e.g. {"name_1": solvent_group_1,"name_2": solvent_group_2, ...}
kwargs: dict
All kwargs listed in the Parameters section of the Solute class.
Returns
-------
Solute
"""
# check types and name uniqueness
for solute in solutes:
assert type(solute) == Solute, "solutes must be a list of Solute objects."
assert len(solute.solute_atoms.atoms) == len(
solute.solute_atoms.atoms.residues
), "Each Solute in solutes must have only a single atom per residue."
solute_names = [solute.solute_name for solute in solutes]
assert len(np.unique(solute_names)) == len(
solute_names
), "The solute_name for each solute must be unique."
solute_atom_group = reduce(
lambda x, y: x | y, [solute.solute_atoms for solute in solutes]
)
verify_solute_atoms(solute_atom_group)
atom_solutes = {solute.solute_name: solute for solute in solutes}
solute = Solute(
solute_atom_group,
solvents,
atom_solutes=atom_solutes,
internal_call=True,
**kwargs,
)
if len(atom_solutes) > 1:
solute.run = solute._run_solute_atoms
return solute
def _run_solute_atoms(
self,
start: Optional[int] = None,
stop: Optional[int] = None,
step: Optional[int] = None,
verbose: Optional[bool] = None,
):
# like prepare
atom_solutes = {}
rdf_data = {}
solvation_datas = []
solvation_data_dups = []
start, stop, step = self.u.trajectory.check_slice_indices(start, stop, step)
self.start, self.stop, self.step = start, stop, step
self.n_frames = len(range(start, stop, step))
# like run
for solute in self.atom_solutes.values():
if not solute.has_run:
solute.run(start=start, stop=stop, step=step, verbose=verbose)
if (start, stop, step) != (solute.start, solute.stop, solute.step):
warnings.warn(
f"The start, stop, or step for {solute.solute_name} do not"
f"match the start, stop, or step for the run command so it "
f"is being re-run."
)
solute.run(start=start, stop=stop, step=step, verbose=verbose)
if self.solvents != solute.solvents:
warnings.warn(
f"The solvents for {solute.solute_name} do not match the "
f"solvents for the run command so it is being re-run."
)
solute.run(start=start, stop=stop, step=step, verbose=verbose)
atom_solutes[solute.solute_name] = solute
rdf_data[solute.solute_name] = solute.rdf_data[solute.solute_name]
solvation_datas.append(solute.solvation_data)
solvation_data_dups.append(solute.solvation_data_duplicates)
self.rdf_data = rdf_data
self.solvation_data = pd.concat(solvation_datas).sort_index()
self.solvation_data_duplicates = pd.concat(solvation_data_dups)
self.has_run = True
# like conclude
analysis_classes = {
"speciation": Speciation,
"pairing": Pairing,
"coordination": Coordination,
"residence": Residence,
"networking": Networking,
}
for analysis_class in self.analysis_classes:
if analysis_class == "networking":
setattr(
self,
"networking",
Networking.from_solute(self, self.networking_solvents),
)
else:
setattr(
self,
analysis_class,
analysis_classes[analysis_class].from_solute(self),
)
def _prepare(self):
"""
This function identifies the solvation radii and saves the associated RDF data.
"""
self.rdf_data = defaultdict(dict)
self.solvation_data = None
self.solvation_data_duplicates = None
self._solvation_frames = []
for name, solvent in self.solvents.items():
if self.skip_rdf:
self.rdf_data = None
break
# set kwargs with defaults
self.rdf_init_kwargs["range"] = self.rdf_init_kwargs.get("range") or (
0,
7.5,
)
self.rdf_init_kwargs["norm"] = self.rdf_init_kwargs.get("norm") or "density"
self.rdf_run_kwargs["stop"] = self.rdf_run_kwargs.get("stop") or self.stop
self.rdf_run_kwargs["step"] = self.rdf_run_kwargs.get("step") or self.step
self.rdf_run_kwargs["start"] = (
self.rdf_run_kwargs.get("start") or self.start
)
# generate and save RDFs
rdf = InterRDF(
self.solute_atoms,
solvent,
**self.rdf_init_kwargs,
exclude_same="residue",
)
rdf.run(**self.rdf_run_kwargs)
bins, data = rdf.results.bins, rdf.results.rdf
self.rdf_data[self.solute_name][name] = (bins, data)
# generate and save plots
if name not in self.radii.keys():
self.radii[name] = self.kernel(bins, data, **self.kernel_kwargs)
calculated_radii = set(
[name for name, radius in self.radii.items() if not np.isnan(radius)]
)
missing_solvents = set(self.solvents.keys()) - calculated_radii
missing_solvents_str = " ".join([str(i) for i in missing_solvents])
assert len(missing_solvents) == 0, (
f"Solute could not identify a solvation radius for "
f"{missing_solvents_str}. Please manually enter missing radii "
f"by editing the radii dict and rerun the analysis."
)
def _single_frame(self):
"""
This function finds the solvation shells of each solute at a given frame.
"""
# initialize empty lists
pairs_list = []
dist_list = []
tags_list = []
# loop to find solvated atoms of each type
for name, solvent in self.solvents.items():
pairs, dist = capped_distance(
self.solute_atoms.positions,
solvent.positions,
self.radii[name],
box=self.u.dimensions,
)
# make sure pairs don't include intra-molecular interactions
filter = (
self.solute_atoms.resindices[pairs[:, 0]]
== solvent.resindices[pairs[:, 1]]
)
pairs = pairs[~filter]
dist = dist[~filter]
# replace local ids with absolute ids
pairs[:, 0] = self.solute_atoms.atoms.ix[tuple([pairs[:, 0]])]
pairs[:, 1] = solvent.ix[tuple([pairs[:, 1]])]
# extend
pairs_list.append(pairs)
dist_list.append(dist)
tags_list.append(np.full(len(dist), name)) # creating a name array
# create full length features arrays
pairs_array = np.concatenate(pairs_list, dtype=int)
frame_number_array = np.full(len(pairs_array), self._ts.frame)
solute_res_ix_array = self.u.atoms[pairs_array[:, 0]].resindices
solvent_res_ix_array = self.u.atoms[pairs_array[:, 1]].resindices
dist_array = np.concatenate(dist_list)
solute_res_name_array = np.full(len(pairs_array), self.solute_name)
solvent_res_name_array = np.concatenate(tags_list)
# stack the data into one large array
solvation_data_np = np.column_stack(
(
frame_number_array,
solute_res_ix_array,
pairs_array[:, 0],
pairs_array[:, 1],
dist_array,
solute_res_name_array,
solvent_res_name_array,
solvent_res_ix_array,
)
)
# add the current frame to the growing list of solvation arrays
self._solvation_frames.append(solvation_data_np)
def _conclude(self):
"""
Creates a clean solvation_data pandas.DataFrame and instantiates several analysis classes.
"""
# stack all solvation frames into a single data structure
solvation_data_np = np.vstack(self._solvation_frames)
solvation_data_df = pd.DataFrame(
solvation_data_np,
columns=[
FRAME,
SOLUTE_IX,
SOLUTE_ATOM_IX,
SOLVENT_ATOM_IX,
DISTANCE,
SOLUTE,
SOLVENT,
SOLVENT_IX,
],
)
# clean up solvation_data df
for column in [
FRAME,
SOLUTE_IX,
SOLUTE_ATOM_IX,
SOLVENT_ATOM_IX,
DISTANCE,
SOLVENT_IX,
]:
solvation_data_df[column] = pd.to_numeric(solvation_data_df[column])
solvation_data_df = solvation_data_df.sort_values(
[FRAME, SOLUTE_ATOM_IX, DISTANCE]
)
solvation_data_duplicates = solvation_data_df.duplicated(
subset=[FRAME, SOLUTE_ATOM_IX, SOLVENT_IX]
)
solvation_data = solvation_data_df[~solvation_data_duplicates]
self.solvation_data = solvation_data.set_index(
[FRAME, SOLUTE_IX, SOLUTE_ATOM_IX, SOLVENT_ATOM_IX]
)
duplicates = solvation_data_df[solvation_data_duplicates]
self.solvation_data_duplicates = duplicates.set_index(
[FRAME, SOLUTE_IX, SOLUTE_ATOM_IX, SOLVENT_ATOM_IX]
)
# instantiate analysis classes
self.has_run = True
analysis_classes = {
"speciation": Speciation,
"pairing": Pairing,
"coordination": Coordination,
"residence": Residence,
"networking": Networking,
}
for analysis_class in self.analysis_classes:
if analysis_class == "networking":
setattr(
self,
"networking",
Networking.from_solute(self, self.networking_solvents),
)
else:
setattr(
self,
analysis_class,
analysis_classes[analysis_class].from_solute(self),
)
@staticmethod
def _plot_solvation_radius(
bins: np.ndarray, data: np.ndarray, radius: float
) -> tuple[plt.Figure, plt.Axes]:
"""
Plot a solvation radius on an RDF.
Includes a vertical line at the radius of interest.
Parameters
----------
bins : np.array
the RDF bins
data : np.array
the RDF data
radius : float
the cutoff radius to draw on the plot
Returns
-------
fig : matplotlib.Figure
ax : matplotlib.Axes
"""
fig, ax = plt.subplots()
ax.plot(bins, data, "b-", label="rdf")
ax.axvline(radius, color="r", label="solvation radius")
ax.set_xlabel("Radial Distance (A)")
ax.set_ylabel("Probability Density")
ax.legend()
return fig, ax
def plot_solvation_radius(
self, solute_name: str, solvent_name: str
) -> tuple[plt.Figure, plt.Axes]:
"""
Plot the RDF of a solvent molecule
Specified by the residue name in the solvents dict. Includes a vertical
line at the radius of interest.
Parameters
----------
solvent_name : str
the name of the residue of interest, as written in the solvents dict
Returns
-------
fig : matplotlib.Figure
ax : matplotlib.Axes
"""
if len(self.atom_solutes) == 1:
solute_name = self.solute_name
assert self.has_run, "Solute.run() must be called first."
assert not self.skip_rdf, "RDFs were skipped, so no RDFs are available."
bins, data = self.rdf_data[solute_name][solvent_name]
radius = self.atom_solutes[solute_name].radii[solvent_name]
fig, ax = self._plot_solvation_radius(bins, data, radius)
ax.set_title(f"{self.solute_name} solvation distance for {solvent_name}")
return fig, ax
def draw_molecule(
self,
residue: Union[str, mda.core.groups.Residue],
filename: Optional[str] = None,
) -> "rdkit.Chem.rdchem.Mol":
"""
Returns
Parameters
----------
residue: the str name of the solute or any electrolytes or a MDAnalysis.Residue
filename: If true, writes a file using the RDKit backend.
Returns
-------
RDKit.Chem.rdchem.Mol
"""
if rdkit is None:
raise ImportError(
"The RDKit package is required to use this function. "
"Please install RDKit with `conda install -c conda-forge rdkit`."
)
DrawingOptions.atomLabelFontSize = 100
if isinstance(residue, str):
if residue in [self.solute_name, "solute"]:
mol = self.solute_atoms.residues[0].atoms.convert_to("RDKIT")
mol_mda_ix = self.solute_atoms.residues[0].atoms.ix
solute_atoms_ix0 = {
solute.solute_atoms.atoms.ix[0]: solute_name
for solute_name, solute in self.atom_solutes.items()
}
for i, atom in enumerate(mol.GetAtoms()):
atom_name = solute_atoms_ix0.get(mol_mda_ix[i])
label = f"{i}, " + atom_name if atom_name else str(i)
atom.SetProp("atomNote", label)
elif residue in self.solvents.keys():
mol = self.solvents[residue].residues[0].atoms.convert_to("RDKIT")
for i, atom in enumerate(mol.GetAtoms()):
atom.SetProp("atomNote", str(i))
else:
raise ValueError(
"If the residue is a string, it must be the name of a solute, "
"the name of a solvent, or 'solute'."
)
else:
assert isinstance(residue, mda.core.groups.Residue)
mol = residue.atoms.convert_to("RDKIT")
for i, atom in enumerate(mol.GetAtoms()):
atom.SetProp("atomNote", str(i))
if filename:
Draw.MolToFile(mol, filename=filename)
rdCoordGen.AddCoords(mol)
return mol
def get_shell(
self,
solute_index: int,
frame: int,
as_df: bool = False,
remove_mols: Optional[dict[str, int]] = None,
closest_n_only: Optional[int] = None,
) -> Union[mda.AtomGroup, pd.DataFrame]:
"""
Select the solvation shell of the solute.
The solvation shell can be returned either as an
AtomGroup, to be visualized or passed to other routines,
or as a pandas.DataFrame for convenient inspection.
The solvation shell can be truncated before being returned,
either by removing specific residue types with remove_mols
or by introducing a hard cutoff with closest_n_only.
Parameters
----------
solute_index : int
The index of the solute of interest
frame : int
the frame in the trajectory to perform selection at. Defaults to the
current trajectory frame.
as_df : bool, default False
if true, this function will return a DataFrame representing the shell
instead of a AtomGroup.
remove_mols : dict of {str: int}, optional
remove_dict lets you remove specific residues from the final shell.
It should be a dict of molnames and ints e.g. ``{'mol1': n, 'mol2', m}``.
It will remove up to n of mol1 and up to m of mol2. So if the dict is
``{'mol1': 1, 'mol2', 1}`` and the shell has 4 mol1 and 0 mol2,
get_shell will return a shell with 3 mol1 and 0 mol2.
closest_n_only : int, optional
if given, only the closest n residues will be included
Returns
-------
MDAnalysis.AtomGroup or pandas.DataFrame
"""
assert self.has_run, "Solute.run() must be called first."
assert frame in self.frames, (
"The requested frame must be one " "of an analyzed frames in self.frames."
)
remove_mols = {} if remove_mols is None else remove_mols
# select shell of interest
shell = self.solvation_data.xs((frame, solute_index), level=(FRAME, SOLUTE_IX))
# remove mols
for mol_name, n_remove in remove_mols.items():
# first, filter for only mols of type mol_name
is_mol = shell[SOLVENT] == mol_name
res_ix = shell[is_mol][SOLVENT_IX]
mol_count = len(res_ix)
n_remove = min(mol_count, n_remove)
# then truncate resnames to remove mols
remove_ix = res_ix[(mol_count - n_remove) :]
# then apply to original shell
remove = shell[SOLVENT_IX].isin(remove_ix)
shell = shell[np.invert(remove)]
# filter based on length
if closest_n_only:
assert closest_n_only > 0, "closest_n_only must be at least 1"
closest_n_only = min(len(shell), closest_n_only)
shell = shell[0:closest_n_only]
if as_df:
return shell
else:
return self._df_to_atom_group(shell, solute_index=solute_index)
def get_closest_n_mol(
self,
solute_atom_ix: int,
n_mol: int,
guess_radius: Union[float, int] = 3,
return_ordered_resix: bool = False,
return_radii: bool = False,
) -> Union[
mda.AtomGroup,
tuple[mda.AtomGroup, np.ndarray],
tuple[mda.AtomGroup, np.ndarray, np.ndarray],
]:
"""
Select the n closest mols to the solute.
The solute is specified by its index within solvation_data.
n is specified with the n_mol argument. Optionally returns
an array of their resids and an array of the distance of
the closest atom in each molecule.
Parameters
----------
solute_atom_ix : int
The index of the solute of interest
n_mol : int
The number of molecules to return
guess_radius : float or int
an initial search radius to look for closest n mol
return_ordered_resix : bool, default False
whether to return the resix of the closest n
molecules, ordered by radius
return_radii : bool, default False
whether to return the distance of the closest atom of each
of the n molecules
Returns
-------
full shell : MDAnalysis.AtomGroup
the atoms in the shell
ordered_resids : numpy.array of int, optional
the residue id of the n_mol closest atoms
radii : numpy.array of float, optional
the distance of each atom from the center
"""
return get_closest_n_mol(
self.u.atoms[solute_atom_ix],
n_mol,
guess_radius,
return_ordered_resix,
return_radii,
)
def radial_shell(
self, solute_atom_ix: int, radius: Union[float, int]
) -> mda.AtomGroup:
"""
Select all residues with atoms within r of the solute.
The solute is specified by it's index within solvation_data. r is
specified with the radius argument. Thin wrapper around
solvation.get_radial_shell.
Parameters
----------
solute_atom_ix : int
the index of the solute of interest
radius : float or int
radius used for atom selection
Returns
-------
MDAnalysis.AtomGroup
"""
return get_radial_shell(self.solute_atoms[solute_atom_ix], radius)
def _df_to_atom_group(
self, df: pd.DataFrame, solute_index: Optional[int] = None
) -> mda.AtomGroup:
"""
Selects an MDAnalysis.AtomGroup from a pandas.DataFrame with solvent.