Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strict attribute setting issue (#1092) #1186

Merged
merged 23 commits into from
Dec 7, 2017

Conversation

richardjgowers
Copy link
Member

This PR will hopefully not allow "random" attributes to be set to mda objects. Eg:

u = mda.Universe()

ag = u.atoms[:10]

# set an attribute not present in the topology
ag.colours = 'red'  # this should raise an AttributeError

This is to avoid confusion like in #1092

@richardjgowers richardjgowers added this to the 0.16.0 milestone Jan 23, 2017
@orbeckst orbeckst changed the title WIP: Start of issue #1092 WIP: strict attribute setting (Start of issue #1092) Jan 30, 2017
Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we generally disallow setting attributes like u.atoms.attr = blub and instead add a method that makes this explicit u.atoms.add_attr('name'). That would mean no one can set and attribute by accident and we would might get around the whitelist we need to maintain across topologies. Or at least allow a u.atoms.set_attr('name', force=True)

@@ -149,6 +149,9 @@ def make_classes():
# The 'GBase' middle man is needed so that a single topologyattr
# patching applies automatically to all groups.
GBase = bases[GroupBase] = _TopologyAttrContainer._subclass()
GBase._SETATTR_WHITELIST = {'positions', 'velocities', 'forces',
'atoms', 'segments', 'residues', 'is_uptodate'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about charges?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is updated as attributes are added

@richardjgowers
Copy link
Member Author

@kain88-de yeah I could add an explicit method, and then maybe it's clearer that setting things isn't normal, but then we lose that u.atoms.masses = 6.0 just works. I don't want a weird API like matplotlib where we have to use set_X methods.

The currently failing test is because mock is trying to do a setattr to mock something, so that's not a real problem.

I think I might have to make the WHITELIST level dependent, currently this works when it shouldn't:

u = mda.Universe()

ag = u.atoms[:10]
ag.segid = 'this'

The problem is that segid exists, but isn't relevant to AtomGroup. I should probably try and make it fall into a wrong level error so we can get the nice error message that tells users how to do this correctly.

@kain88-de
Copy link
Member

I meant that we add a new attr with the method. Changing existing attributes should be still allowed.

@kain88-de
Copy link
Member

So code should work like this if possible.

u.atoms.add_attr('foo')
u.atoms.foo = 5
# this fails
u.atoms.bar = 5

@richardjgowers
Copy link
Member Author

Ah right, yeah so we could add a shortcut around our current restrictions. We'll still need the WHITELIST stuff here to make existing attributes work though.

Would foo in your example be a fully fledged topologyattr or just an attribute on that instance of AG?

@kain88-de
Copy link
Member

Would foo in your example be a fully fledged topologyattr or just an attribute on that instance of AG?

Whats the difference again?

@richardjgowers
Copy link
Member Author

If the attribute is just added to the a1 instance, then it doesn't touch anything like u._topology, or you could make it propagate down to topology so that other Groups also know about foo.

a1 = u.atoms[:10]
a2 = u.atoms[:10]

a1.add_attr('foo')
a1.foo  # returns something as default?
a1.foo = 10.0

a2.foo # does this even exist?

a3 = u.atoms[10:]
a3.foo  # does this have foo?

Our AtomGroups are essentially a window onto a database (populated with trj and top), so does setting an arbitrary attribute affect the database or just the window?

@kain88-de
Copy link
Member

Oh how should something like this then work

a1 = u.atoms[:10]
a2 = u.atoms[5:15]
a3 = u.atoms[:30]

a1.add_attr('foo')

# Does this work? What would be values of foo?
a2.foo

The overlap values should be NaN in theory. So maybe don't let it propagate to the topology until we have a proposal about the expected behavior of a topology with added attributes that specifies the corner cases we can think of.

Also ideally I would like something like this to work. This is the most common use case for that feature

# say u doesn't have charges
u = mda.Universe('some.pdb')
a1 = u.atoms[:50]
a1.add_attr('charges')
a1.charges = np.ones(50)
a1.write('test.pqr')

@richardjgowers
Copy link
Member Author

@kain88-de should be possible, currently it works like this

In [2]: import MDAnalysis as mda

In [3]: u = mda.Universe('2r9r-1b.xyz')

In [4]: u.atoms.charges
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-6c4cd27709aa> in <module>()
----> 1 u.atoms.charges

/home/richard/miniconda2/envs/mdadev/lib/python2.7/site-packages/MDAnalysis/core/groups.pyc in __getattr__(self, attr)
   1002                 pass
   1003         raise AttributeError("{cls} has no attribute {attr}".format(
-> 1004             cls=self.__class__.__name__, attr=attr))
   1005 
   1006     def __reduce__(self):

AttributeError: AtomGroup has no attribute charges

In [5]: from MDAnalysis.core.topologyattrs import Charges

In [6]: import numpy as np

In [7]: u.add_TopologyAttr(Charges(np.repeat(np.nan, len(u.atoms))))

In [8]: u.atoms.charges = 1.0

In [9]: u.atoms.charges
Out[9]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])

What you're proposing is just a convenience layer that matches the string 'charges' to the correct class.

@kain88-de
Copy link
Member

@richardjgowers OK if we can add those then I'm OK with the proposed changes. None of my additions are needed. Making the adding of existing topology attribute classes easier can be done in 0.17.0

@richardjgowers richardjgowers force-pushed the issue-1092-restrict_setattr branch from 2c337f0 to 6a49887 Compare February 17, 2017 21:25
@richardjgowers richardjgowers changed the title WIP: strict attribute setting (Start of issue #1092) strict attribute setting issue #1092) Feb 17, 2017
@@ -148,15 +148,21 @@ def make_classes():

# The 'GBase' middle man is needed so that a single topologyattr
# patching applies automatically to all groups.
GBase = bases[GroupBase] = _TopologyAttrContainer._subclass()
GBase = bases[GroupBase] = _TopologyAttrContainer._subclass(singular=False)
GBase._SETATTR_WHITELIST = { # list of Group attributes we can set
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we create a new version of the GroupBase, we also create a fresh list of 'allowed' attributes

# `ag.this = 42` calls setattr(ag, 'this', 42)
# we scan 'this' to see if it is either 'private'
# or a known attribute (WHITELIST)
if (not attr.startswith('_') and
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private attributes aren't scrutinised

# Don't change this to isinstance,
# it is literally meant, if this class is the same
# Subclasses are free to setattr as they wish
type(self) in (self.universe._classes[AtomGroup],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only apply this strict attribute setting rule to the 3 canonical groups (Atom, Residue and SegmentGroup). This means that if someone makes a subclass of AtomGroup they can do whatever with their attribute space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I subclass AtomGroup, I loose the whitelist mechanism? Isn't that going to create surprises as it is a rather non-standard behaviour for python?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was more thinking the mechanism itself is non standard so it would avoid surprises my removing it. It's not a real feature that provides functionality, more of a safety mechanism

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather remain canonical Python and propagate the whitelist mechanisms through any derived classes.

# I want to create a subclass of AtomGroup
# that has a new attribute, should be possible..
def __init__(self, ix, u, thing):
super(AwesomeGroup, self).__init__(ix, u)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically testing that someone could subclass AtomGroup and not notice our weird __setattr__ hack. This does pass, but is reliant on super being called first. If someone calls super second in their init then it won't work, because when doing the setattr for thing it will try and check the Universe which doesn't exist in the namespace yet. I'm thinking of how to circumvent this, maybe just pass it along if ._u isn't set yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I try to subclass a group and set a class attribute?

class GreaterGroup(groups.AtomGroup):
    random_attribute = 'Jonathan is awesome'

and what about:

class GreaterGroup(groups.AtomGroup):
    _SETATTR_WHITELIST = {'toto', 'plop', 'thing'}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random attribute should work. We could try and make the whitelist extensible, but I'd rather just remove the safety rail in subclasses

@richardjgowers
Copy link
Member Author

Ok this could do with some eyes on it from @MDAnalysis/coredevs

@jbarnoud
Copy link
Contributor

@richardjgowers Your in-line comment on github could be actual comments in the code. They really help to understand what you are doing here.

bases[cls] = GBase._subclass(singular=False)
# CBase for patching all components
CBase = bases[ComponentBase] = _TopologyAttrContainer._subclass(singular=True)
CBase._SETATTR_WHITELIST = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have instead CBase._SETATTR_WHITELIST = copy.copy(GBAse._SETATTR_WHITELIST)? The two sets are identical at that point, and it would avoid code duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CBase has singular versions of the white list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh indeed. I read to fast.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions here and there.

# Don't change this to isinstance,
# it is literally meant, if this class is the same
# Subclasses are free to setattr as they wish
type(self) in (self.universe._classes[AtomGroup],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I subclass AtomGroup, I loose the whitelist mechanism? Isn't that going to create surprises as it is a rather non-standard behaviour for python?

@@ -1809,9 +1831,20 @@ def __init__(self, ix, u):
self._ix = ix
self._u = u

def __setattr__(self, attr, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method and the equivalent one in GroupBase are almost identical, couldn't their code be factorized in an external function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably

# I want to create a subclass of AtomGroup
# that has a new attribute, should be possible..
def __init__(self, ix, u, thing):
super(AwesomeGroup, self).__init__(ix, u)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I try to subclass a group and set a class attribute?

class GreaterGroup(groups.AtomGroup):
    random_attribute = 'Jonathan is awesome'

and what about:

class GreaterGroup(groups.AtomGroup):
    _SETATTR_WHITELIST = {'toto', 'plop', 'thing'}

@orbeckst
Copy link
Member

orbeckst commented Apr 4, 2017

Being able to set attributes like occupancy and charge even if they were not defined in the reader would be very good (ie., solving the problem in #1092). Can we get this PR into develop before 0.16.0?

@richardjgowers
Copy link
Member Author

@orbeckst I'd rather get 0.16 rather than delay it any more with nice-to-haves like this

@richardjgowers richardjgowers modified the milestones: 0.16.x, 0.16.0 Apr 4, 2017
@richardjgowers richardjgowers force-pushed the issue-1092-restrict_setattr branch from f8184db to 3c4d607 Compare December 6, 2017 11:50
@richardjgowers
Copy link
Member Author

Cool, good to go hopefully!

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments (see inline) but overall looks good.

@@ -965,7 +969,7 @@ Fixes

04/01/14 orbeckst, jandom, zhuyi.xue, xdeupi, tyler.je.reddy,
manuel.nuno.melo, danny.parton, sebastien.buchoux, denniej0,
rmcgibbo, richardjgowers, lennardvanderfeltz, bernardin.alejandro
rmcgibbo, richardjgowers, lennardvanderfeltz, bernardin.alejandro
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indentation

@@ -1480,7 +1484,7 @@ Fixes
alternative atoms and insertion codes that are not needed for
handling MD simulation data).
- The default behaviour of MDAnalysis can be set through the flag
MDAnalysis.core.flag['permissive_pdb_reader']. The default is True.
MDAnalysis.core.flag['permissive_pdb_reader']. The default is True.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indentation

return type(cls.__name__, (cls,), {})
newcls._SETATTR_WHITELIST = {
'position', 'velocity', 'force', 'dimensions',
'atoms', 'residue', 'residues', 'segment',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that I could add atoms or residues to an Atom or are there additional checks for the different levels or does it not matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you've found a loophole in everything :)

Erm, so currently there's two whitelists, one for Groups and one for Components. The way to fix this would be to instead have a whitelist for each class (Atom, Res, Seg, AG, RG, SG).

Not sure I'll find the time to fix this soonish, so this might have to get spun out into a low priority issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO comment to the code and raise an issue once this is merged...

>>> u.add_TopologyAttr('bfactors')
>>> u.atoms.bfactors
array([ 0., 0., 0., ..., 0., 0., 0.])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a

.. versionchanged:: 0.17.0
   Can now also accept the *name* of an existing TopologyAttr to be added; new `values` optional
   keyword to fill the attribute.

@orbeckst orbeckst modified the milestones: 0.16.x, 0.17.0 Dec 6, 2017
@orbeckst
Copy link
Member

orbeckst commented Dec 7, 2017

Ready to merge?

P.S.: Since dabbling in asv I have become a fan of merges again because we only benchmark merges by default in order to ensure non-broken states of the branch.

-------
For example to add bfactors to a Universe:

>>> u.add_TopologyAttr('bfactors')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid question: does bfactors exist?

A while back I said #1359 (comment) that we renamed it to tempfactors. Was I wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we have both as independent Attributes. These are strictly equivalent right?

So currently MMTF returns "Bfactors" with CRD/PDB/PDBQT returning tempfactors. So should probably just add an alias to the tempfactors Attribute which allows ".bfactors" as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are the same thing.

Copy link
Member

@orbeckst orbeckst Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is .bfactors native to MMTF? I like Python's "There is one way of doing it" but then we'd need to decide if what we want to call it – bfactors or tempfactors. For users it would be easier to access the same TopolAttr with two different names, though. Let's discuss in a separate issue.

@richardjgowers richardjgowers merged commit 54d5bab into develop Dec 7, 2017
@richardjgowers richardjgowers deleted the issue-1092-restrict_setattr branch December 7, 2017 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants