-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MemoryReader facelift #2080
MemoryReader facelift #2080
Conversation
I'm a little confused what the expectations are for the MemoryReader wrt modifying attributes of Timesteps. From what I can tell, it tries to give views not copies of the underlying arrays. This means that if you edit the positions, change frame, your changes are saved (ie go back to that frame and you've still got the modified positions rather than the original). Is this correct? Therefore, does it make sense to make |
Yes, I'd say the expectation is that everything is permanent. In this regard, MemoryReader is a very different beast from all the other Readers. But I think it's required so that MemoryReader can become a swiss-army knife for trajectory manipulation. I think that the new transformations are also permanent for MemoryReader (@jbarnoud @davidercruz ?) |
The transformations are indeed permanent on the memory reader. In practice, the transformation is applied on all frames in one go, and then purged from the list of transformation to not apply the same transformation again when changing frame. The expectation with the memory reader is that changes done to the coordinates are kept, on the contrary to the other readers. Having the box following the same pattern is what I would expect. |
from .base import SingleFrameReaderBase | ||
|
||
|
||
class DummyReader(SingleFrameReaderBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be deprecated first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was only used internally, and I think it's so buggy (it doesn't iterate?) I doubt anyone found any good use for it.
29f904f
to
91e59d9
Compare
dt=self.ts.dt, | ||
filename=self.filename, | ||
) | ||
new[self.ts.frame] | ||
new.ts = self.ts.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line was problematic as base.Timestep.copy()
isn't respectful of MemoryReader behaviour, see #2081. I don't think removing it causes any problems, as new[self.ts.frame]
should populate the Timestep with everything needed
@MDAnalysis/coredevs ok this could do with some eyes, it's got some nice MR fixes and improvements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor comments/questions; someone else should probably also have a look.
@@ -315,8 +359,18 @@ def __init__(self, coordinate_array, order='fac', | |||
|
|||
self.ts = self._Timestep(self.n_atoms, **kwargs) | |||
self.ts.dt = dt | |||
if dimensions is not None: | |||
self.ts.dimensions = dimensions | |||
if dimensions is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could do
if dimensions is None:
pass
elif dimensions.shape == (6,):
dimensions = np.tile(dimensions, (self.n_frames, 1))
elif dimensions.shape != (self.n_frames, 6):
raise Hell
self.dimensions_array = dimensions
package/MDAnalysis/core/universe.py
Outdated
n_atoms=n_atoms, | ||
velocities=velocities, forces=forces) | ||
coords = np.zeros((n_atoms, 3), dtype=np.float32) | ||
if velocities: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ternary?
vels = np.zeros((n_atoms, 3), dtype=np.float32) if velocities else None
forces = np.zeros((n_atoms, 3), dtype=np.float32) if forces else None
for compactness?
forces = np.zeros((n_atoms, 3), dtype=np.float32) | ||
else: | ||
forces = None | ||
u.trajectory = get_reader_for(coords)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does an numpy array trigger MemoryReader selection? (I think so... can't remember the discussion a while back.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that this creates a MemoryReader.
# This is significantly faster, but only implemented for certain | ||
# trajectory file formats | ||
try: | ||
coordinates = self.trajectory.timeseries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that we do not have dedicated C-level timeseries anymore and thus the self.trajectory.timeseries
was essentially just doing what happened in the except
block? Or are there still cases where the timeseries
approach would be beneficial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only DCDReader ever had a fast timeseries (and still does). It didn't return dimensions information though, which is needed. We could change DCD.timeseries
to have DCD.timeseries(...., dimensions=True)
to then return both coordinates and dimensions.
Then maybe the base.Reader
could implement the entire else:
branch below (I started this in #1400 once...)
So yeah maybe I should try and use the fast DCD routine here still since it still exists, then manually grab the dimensions..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that you improve performance by cycling through the trajectory twice. Maybe leave as you have, add a note, and raise an issue so that it is not forgotten. A before/after benchmark (or Benchmark) would be good, just so that we have an idea how much slower this gets. analysis.encore uses DCD timeseries reading quite a bit, IIRC.
package/MDAnalysis/core/universe.py
Outdated
pm = ProgressMeter(n_frames, interval=1, | ||
verbose=verbose, format=pm_format) | ||
coordinates = np.zeros((n_frames, n_atoms, 3), dtype=np.float32) | ||
# TODO: This assumes constant... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment.
package/MDAnalysis/core/universe.py
Outdated
has_fors = ts.has_forces | ||
has_dims = ts.dimensions is not None | ||
|
||
if has_vels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ternary operator for readability?
velocities = np.zeros_like(coordinates) if has_vels or None
Does Python short-circuit, i.e., is the creation of np.zeros_like(coordinates)
skipped if not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python (at least 3.5.3) properly short-circuits and does not execute the function in the skipped part:
In [1]: def dome():
...: print("did it")
...: return 10
...:
In [2]: x = dome() if True else 0
did it
In [3]: x = dome() if False else 0
mr2 = mr_reader.copy() | ||
|
||
ts = mr2.ts | ||
setattr(ts, attr, 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me how this works: Are you setting e.g. ts.positions
to 7? All positions in the array or are you replacing the array with the integer "7"?
If it is the latter, shouldn't we doing the former?
ts = mr2[1] | ||
ts = mr2[0] | ||
|
||
assert_equal(getattr(ts, attr), 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array comparison? But the arrays should be float32 so you really ought to use assert_almost_equal
.
Codecov Report
@@ Coverage Diff @@
## develop #2080 +/- ##
===========================================
- Coverage 89.41% 89.41% -0.01%
===========================================
Files 159 157 -2
Lines 18731 18759 +28
Branches 2696 2709 +13
===========================================
+ Hits 16749 16774 +25
- Misses 1381 1382 +1
- Partials 601 603 +2
Continue to review full report at Codecov.
|
# single box, tile this to trajectory length | ||
# allows modifying the box of some frames | ||
dimensions = np.tile(dimensions, (self.n_frames, 1)) | ||
elif dimensions.shape != (self.n_frames, 6): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be protected against AttributeError
if anything is passed through that is not a numpy array?
package/MDAnalysis/core/universe.py
Outdated
mapping of residues to segments | ||
trajectory : bool, optional | ||
if True, attaches a dummy reader to the Universe, therefore | ||
if True, attaches a MemoryReader to the Universe, therefore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this (and others) a link
:class:`~MDAnalysis.coordinates.memory.MemoryReader`
"This must be a array of shape (6,) or " | ||
"(n_frames, 6)".format(dimensions.shape)) | ||
self.dimensions_array = dimensions | ||
dimensions = np.zeros((self.n_frames, 6), dtype=np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you decide to always have dimensions as an array, even if it contains zeros? (I assume this is more consistent with how ts.dimensions
works, and it allows setting the box if required – more flexibility!) 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of type checking, I'd prefer calling numpy.asarray(attr)
if attr
is supposed to be a numpy array (eventually). That allows greater type flexibility and will hopefully throw an appropriate error message if converting to a numpy array fails.
try: | ||
if velocities.ndim == 2: | ||
velocities = velocities[np.newaxis, :, :] | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just checks whether the velocities
object has an attribute ndim
. For type checking, better use isinstance()
. But since that's not really "pythonic" anyway, I'd prefer calling numpy.asarray(velocities)
before doing attribute checks.
try: | ||
if forces.ndim == 2: | ||
forces = forces[np.newaxis, :, :] | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above,I'd prefer using numpy.asarray(forces)
.
raise ValueError("Provided dimensions array has shape {}. " | ||
"This must be a array of shape (6,) or " | ||
"(n_frames, 6)".format(dimensions.shape)) | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, better use numpy.asarray(dimensions)
68900cc
to
d86dc42
Compare
Ok changed to use |
@MDAnalysis/coredevs if this gets reviewed I can finish up 0.19.0 |
@zemanj could you please have a look when you can spare a moment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me – but I am not an expert on the MemoryReader. If anyone else (@zemanj @jbarnoud @wouterboomsma ) might want to have a quick look then that would be great.
(We would like to merge this soon in order to get the 0.19.0 release out...)
@richardjgowers perhaps open an issue noting that we might want to revisit the MemoryReader once we have timeseries support with dimensions for all readers (so that fast timeseries-slurpers such as the one for DCD and possibly XTC) can be used.
Thanks for notifying me, @orbeckst. I haven't followed recent development in MDAnalysis very closely, and therefore don't have anything intelligent to add to what has already been discussed. Good to see that MemoryReader is still being used. Thanks for putting in the effort in bringing this up to date @richardjgowers. |
Fixes #2076 #2077 #1041 #2081
Replaces #1537
Changes made in this Pull Request:
MemoryReader
now has velocities and forces (which hopefully work identically to positions)MemoryReader
now has variable dimensions (ie NPT conditions)Universe.transfer_to_memory
takes advantage of all of the aboveUniverse.empty
now uses a MemoryReader (now it does the above)PR Checklist