Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neighbour analysis for the entire trajectory #251

Merged
merged 38 commits into from
Jul 19, 2021

Conversation

sudarsan-surendralal
Copy link
Member

The idea is to easily perform a neighbor analysis over the trajectory of an atomistic simulation. This is something I've been doing in my notebooks for some time which could be useful to others as well. There are also ways to make this more efficient. So please feel free to suggest improvements!

@sudarsan-surendralal sudarsan-surendralal added the enhancement New feature or request label Jun 27, 2021
@sudarsan-surendralal sudarsan-surendralal marked this pull request as draft June 27, 2021 07:53
@coveralls
Copy link

coveralls commented Jun 27, 2021

Pull Request Test Coverage Report for Build 1045104153

  • 65 of 67 (97.01%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 68.031%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pyiron_atomistics/atomistics/job/atomistic.py 26 27 96.3%
pyiron_atomistics/atomistics/structure/neighbors.py 39 40 97.5%
Totals Coverage Status
Change from base Build 1037555151: 0.1%
Covered Lines: 10889
Relevant Lines: 16006

💛 - Coveralls

@samwaseda
Copy link
Member

I see that the results are stored in private variables. How are they supposed to be used?

@sudarsan-surendralal
Copy link
Member Author

I see that the results are stored in private variables. How are they supposed to be used?

I've updated the structure of the class! This can now be accessed by

neighbors = job.get_neighbors()
neighbors.neighbor_vectors

@sudarsan-surendralal sudarsan-surendralal marked this pull request as ready for review June 28, 2021 08:14
@samwaseda
Copy link
Member

Currently it's mostly equivalent to [job.get_structure(i).get_neighbors() for i in range(given_range)], but I presume that this class is going to be extended to do more physics in the future, right? Can you tell (roughly, no detail needed) how you use it (i.e. what kind of physical quantities do you usually extract in your work)? Depending on the purpose I might have different feedback.

@sudarsan-surendralal
Copy link
Member Author

Currently it's mostly equivalent to [job.get_structure(i).get_neighbors() for i in range(given_range)],

In principle, yes. Such code is trivial but I have to repeat this in pretty much all my notebooks. Options to do this only for certain parts of the trajectory is also pretty useful for me

but I presume that this class is going to be extended to do more physics in the future, right? Can you tell (roughly, no detail needed) how you use it (i.e. what kind of physical quantities do you usually extract in your work)? Depending on the purpose I might have different feedback.

I'll mostly use this to analyze bond breaking/formation for molecules (reactivity), species sensitive radial distribution functions and so on!


"""

def __init__(self, init_structure=None, positions=None, cells=None, num_neighbors=12, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be extremely useful for this to take any object that implements HasStructure instead of passing all of these things separately. Calling get_structure every step might be a bit slower, but since you're doing essentially what AtomisticGenericJob.get_structure does anyway, it shouldn't be that much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it turns out much slower, then I'd say we add a method roughly like this

@classmethod
def from_structures(cls, has_structure):
  structs = list(has_structure.iter_structures())
  return cls(structs[0], [s.positions for s in structs], [s.cell for s in structs])

to NeighborTraj to achieve the same generality, but offer a small escape hatch for cases where we know get_structure is too slow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it turns out much slower, then I'd say we add a method roughly like this

@classmethod
def from_structures(cls, has_structure):
  structs = list(has_structure.iter_structures())
  return cls(structs[0], [s.positions for s in structs], [s.cell for s in structs])

to NeighborTraj to achieve the same generality, but offer a small escape hatch for cases where we know get_structure is too slow.

I think this is a great way to generalize this for any derivative of HasStructure. However, I don't think building the positions and cells from the individual structures is not that efficient.

So how about something like this:

def __init__(self, has_structure=None, init_struct=None, positions=None, cells=None):
    if has_structure is None:
        if positions is None or init_struct is None:
            raise ValueError()

and then

def _get_neighbors_hs(has_struct, num_neighbors=20, **kwargs):
    [struct.get_neighbors() for struct in has_struct.iter_structures()]

which gets called instead of _get_neighbors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should simply give it a try to use get_structure() everywhere and see if it really is slower. Checking Trajectory._get_structure and your code in _get_neighbors() looks essentially the same to me, so I don't expect a big hit.

If you don't want to do it, I can give it a go tonight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also __init__ needs to accept an init argument as the first argument and pass it to super().__init__(), otherwise recursive loading from HDF will not work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should simply give it a try to use get_structure() everywhere and see if it really is slower. Checking Trajectory._get_structure and your code in _get_neighbors() looks essentially the same to me, so I don't expect a big hit.

If you don't want to do it, I can give it a go tonight.

Sure I'll merge your PR into this one and see what happens!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also __init__ needs to accept an init argument as the first argument and pass it to super().__init__(), otherwise recursive loading from HDF will not work.

Not sure what you mean by this. Could you clarify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also __init__ needs to accept an init argument as the first argument and pass it to super().__init__(), otherwise recursive loading from HDF will not work.

Not sure what you mean by this. Could you clarify?

When you derive from DataContainer the new __init__ method must be compatible to the original one. One of the ways DataContainer is instantiated is by passing a collection to initialize it, like so

d = DataContainer({'a': 1, 'b': 3, 2: 42})
assert d[1] == 3

Because this is used by the DataContainer internally when loading from hdf, subclasses also need to offer this. There's a little bit more explanation here in the attention box.

# Conflicts:
#	pyiron_atomistics/atomistics/job/atomistic.py
Comment on lines 1060 to 1078
def __init__(self, init_structure=None, positions=None, cells=None, num_neighbors=12, **kwargs):
"""

Args:
init_structure (pyiron_atomistics.atomistics.structure.atoms.Atoms): Any given structure of the trajectory
positions (numpy.ndarray): The cartesian positions of the trajectories
cells (numpy.ndarray/None): The varying cell shapes
num_neighbors (int): The cutoff for the number of neighbors
**kwargs (dict): Additional arguments to be passed to the `get_neighbors()` routine
(eg. cutoff_radius, norm_order , etc.)
"""
self._init_structure = init_structure
self._neighbor_indices = None
self._neighbor_distances = None
self._neighbor_vectors = None
self._positions = positions
self._cells = cells
self._num_neighbors = num_neighbors
self._get_neighbors_kwargs = kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For __init__ to be compatible a change like this would be necessary. When init is given the other parameters are not set, because presumably they are set in the init read from HDF5. If you want to be complete, you can check that init really contains all the attributes that __init__ normally sets and set those manually that are not in `init.

Suggested change
def __init__(self, init_structure=None, positions=None, cells=None, num_neighbors=12, **kwargs):
"""
Args:
init_structure (pyiron_atomistics.atomistics.structure.atoms.Atoms): Any given structure of the trajectory
positions (numpy.ndarray): The cartesian positions of the trajectories
cells (numpy.ndarray/None): The varying cell shapes
num_neighbors (int): The cutoff for the number of neighbors
**kwargs (dict): Additional arguments to be passed to the `get_neighbors()` routine
(eg. cutoff_radius, norm_order , etc.)
"""
self._init_structure = init_structure
self._neighbor_indices = None
self._neighbor_distances = None
self._neighbor_vectors = None
self._positions = positions
self._cells = cells
self._num_neighbors = num_neighbors
self._get_neighbors_kwargs = kwargs
def __init__(self, init=None, init_structure=None, positions=None, cells=None, num_neighbors=12, **kwargs):
"""
Args:
init_structure (pyiron_atomistics.atomistics.structure.atoms.Atoms): Any given structure of the trajectory
positions (numpy.ndarray): The cartesian positions of the trajectories
cells (numpy.ndarray/None): The varying cell shapes
num_neighbors (int): The cutoff for the number of neighbors
**kwargs (dict): Additional arguments to be passed to the `get_neighbors()` routine
(eg. cutoff_radius, norm_order , etc.)
"""
super().__init__(init=init)
if init is None:
self._init_structure = init_structure
self._neighbor_indices = None
self._neighbor_distances = None
self._neighbor_vectors = None
self._positions = positions
self._cells = cells
self._num_neighbors = num_neighbors
self._get_neighbors_kwargs = kwargs

@pmrv pmrv self-requested a review July 14, 2021 12:44
@sudarsan-surendralal
Copy link
Member Author

Since a HasStructure instance is now an attribute of the neighbor class, calling to_hdf now tries to write the entire trajectory (or structure container) again in the hdf5 file. I think this is a bit redundant!

@pmrv
Copy link
Contributor

pmrv commented Jul 16, 2021

@samwaseda & @sudarsan-surendralal I just saw that the attribute names between Neighbors and NeighborsTraj are not consistent. I don't have a preference either way, but I think they should be the same, otherwise it'll be confusing to use.

@sudarsan-surendralal sudarsan-surendralal merged commit e954f09 into master Jul 19, 2021
@delete-merged-branch delete-merged-branch bot deleted the neighbors_traj branch July 19, 2021 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants