-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support reading/writing atom props from SD files #2297
Conversation
still needs more tests
boost::token_compress_on); | ||
if (tokens.size() < mol.getNumAtoms()) { | ||
BOOST_LOG(rdWarningLog) | ||
<< "Property list " << pn << " too short, only " << tokens.size() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a bit clearer
Atom Property List ... found, require properties for X atoms. Ignoring it.
<< " elements found. Ignoring it." << std::endl; | ||
continue; | ||
} | ||
std::string mv = missingValueMarker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the standard missingValueMarker n/a?
So with this code, one can't write: [n/a] 1 2 since the token can only be one character. Also, what if the form is:
[] 1 2
This would make the missing token "]"
I would take everything between [ and ] for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was incorrect about what substr was doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add regression test for atom prop with multiple chars
[n/a] 1 2 [n/a]
<< std::endl; | ||
} | ||
unsigned int atomid = i - first_token; | ||
mol.getAtomWithIdx(atomid)->setProp(atompn, apv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make it clear in the spec that "missing values" don't assign properties at all. This will be important in the docs.
} | ||
inline void processMolPropertyLists( | ||
ROMol &mol, const std::string &missingValueMarker = "n/a") { | ||
applyMolListPropsToAtoms<std::string>(mol, "atom.prop.", missingValueMarker); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might think about moving the loop through getPropList out of the apply:
for(auto prop:mol.getPropList()) {
if (prop.find("atom.prop.") == 0 ...)
applyMolListPropsToAtoms<std::string>(mol, prop, missingValueMarker);
}
This would remove some work (3 extra loops and checks) and perhaps make the code a bit clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this would make the code less flexible.
It would save a bit of work, but this is not going to end up being a measurable fraction of the time required to construct a molecule, so I think that's ok
@@ -106,13 +106,21 @@ class RDKIT_FILEPARSERS_EXPORT ForwardSDMolSupplier : public MolSupplier { | |||
virtual ROMol *next(); | |||
virtual bool atEnd(); | |||
|
|||
void setProcessPropertyLists(bool val) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for this not to be the default (true)?
self.assertTrue(m.HasProp("atom.prop.AtomLabel")) | ||
self.assertFalse(m.GetAtomWithIdx(0).HasProp("AtomLabel")) | ||
|
||
sio = BytesIO(self.sdf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the python api, I feel that exposing the property processing to python outside of the forward mol supplier might not be such a bad idea.
@greglandrum I added my comments, I think a few things need to be resolved, but it mostly looks god. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the parser should accept
[n/a] 1 2 n/a
which it currently doesn't (see comments).
We can get rid of some loops in the atom prop handling by removing the loops out of the lowest function.
<< " elements found. Ignoring it." << std::endl; | ||
continue; | ||
} | ||
std::string mv = missingValueMarker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add regression test for atom prop with multiple chars
[n/a] 1 2 [n/a]
first_token = 1; | ||
} | ||
if(mv.empty()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
This still needs documentation in the "RDKit Book", but the basic idea is to allow automatic extraction of atomic property values from SDF data fields.
Here's a sample SDF showing the format:
This does not happen by default when processing SDFs; you have to call
supp.SetProcessPropertyLists(True)
in order to have it go.