support reading/writing atom props from SD files #2297

greglandrum · 2019-02-27T17:44:13Z

This still needs documentation in the "RDKit Book", but the basic idea is to allow automatic extraction of atomic property values from SDF data fields.
Here's a sample SDF showing the format:

test-mol
     RDKit  2D

  3  3  0  0  0  0  0  0  0  0999 V2000
    0.8660    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4330    0.7500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4330   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  1  1  0
M  END
>  <atom.dprop.PartialCharge>  (1) 
0.008 -0.314 0.008

>  <atom.iprop.NumHeavyNeighbors>  (1) 
2 2 2

>  <atom.prop.AtomLabel>  (1) 
C1 N2 C3

>  <atom.bprop.IsCarbon>  (1) 
1 0 1

>  <atom.prop.PartiallyMissing>  (1) 
one n/a three

>  <atom.iprop.PartiallyMissingInt>  (1) 
[?] 2 2 ?
$$$$

This is not done by default when processing SDFs; you have to call supp.SetProcessPropertyLists(True) to enable it.
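To make the naming scheme in the sample concrete, here is a minimal pure-Python sketch of how a data-field name could be mapped to a value type and an atom property name. It is not the PR's C++ implementation: the helper names `CONVERTERS`, `split_field_name` and `parse_values` are hypothetical, the 0/1 encoding of booleans is inferred from the sample above, and missing values are not handled here.

```python
# Hypothetical helpers for illustration only; just the four field-name
# prefixes are taken from the sample SDF in this PR.
CONVERTERS = {
    "atom.prop.": str,                           # string properties
    "atom.iprop.": int,                          # integer properties
    "atom.dprop.": float,                        # double properties
    "atom.bprop.": lambda tok: bool(int(tok)),   # booleans, assumed written as 0/1
}


def split_field_name(field_name):
    """Return (converter, atom_property_name), or None for an ordinary SD field."""
    for prefix, convert in CONVERTERS.items():
        if field_name.startswith(prefix):
            return convert, field_name[len(prefix):]
    return None


def parse_values(field_name, field_text):
    """Split a whitespace-separated property list and convert every token."""
    parsed = split_field_name(field_name)
    if parsed is None:
        return None
    convert, atom_prop = parsed
    return atom_prop, [convert(tok) for tok in field_text.split()]


name, charges = parse_values("atom.dprop.PartialCharge", "0.008 -0.314 0.008")
```

With the sample data, `name` is the atom-level property name and `charges` holds one float per atom, in atom order.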

still needs more tests

bp-kelley · 2019-03-01T18:24:25Z

Code/GraphMol/FileParsers/FileParserUtils.h

+                   boost::token_compress_on);
+      if (tokens.size() < mol.getNumAtoms()) {
+        BOOST_LOG(rdWarningLog)
+            << "Property list " << pn << " too short, only " << tokens.size()


I would make this a bit clearer

Atom Property List ... found, require properties for X atoms. Ignoring it.

bp-kelley · 2019-03-01T18:27:31Z

Code/GraphMol/FileParsers/FileParserUtils.h

+            << " elements found. Ignoring it." << std::endl;
+        continue;
+      }
+      std::string mv = missingValueMarker;


Isn't the standard missingValueMarker n/a?

So with this code, one can't write: [n/a] 1 2 since the token can only be one character. Also, what if the form is:

[] 1 2

This would make the missing token "]"

I would take everything between [ and ] for consistency.

I was incorrect about what substr was doing.

Add regression test for atom prop with multiple chars

[n/a] 1 2 [n/a]
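The "take everything between [ and ]" suggestion can be sketched in a few lines of pure Python. This is only an illustration of the idea, not the code in FileParserUtils.h: `split_marker` is a hypothetical name, and rejecting an empty `[]` marker with an exception is a choice made for this sketch; the thread does not show how the PR itself treats that case.

```python
DEFAULT_MISSING = "n/a"  # matches the default argument shown for processMolPropertyLists


def split_marker(field_text, default_marker=DEFAULT_MISSING):
    """Return (missing_marker, value_tokens) for one property-list field.

    If the first token has the form [xyz], everything between the brackets
    becomes the missing-value marker and the token is dropped from the values.
    """
    tokens = field_text.split()
    if tokens and tokens[0].startswith("[") and tokens[0].endswith("]"):
        marker = tokens[0][1:-1]  # may be longer than one character
        if not marker:
            # Assumption of this sketch: an empty marker is treated as an error.
            raise ValueError("empty missing-value marker")
        return marker, tokens[1:]
    return default_marker, tokens


marker, values = split_marker("[?] 2 2 ?")
```

A multi-character marker such as `[n/a]` goes through the same path, and a field without a leading bracketed token falls back to the default marker.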

bp-kelley · 2019-03-01T18:29:53Z

Code/GraphMol/FileParsers/FileParserUtils.h

+                << std::endl;
+          }
+          unsigned int atomid = i - first_token;
+          mol.getAtomWithIdx(atomid)->setProp(atompn, apv);


We need to make it clear in the spec that "missing values" don't assign properties at all. This will be important in the docs.
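As a sketch of what "missing values don't assign properties at all" means in practice, the pure-Python snippet below models each atom as a plain dict. The function name `apply_list_to_atoms` and the dict-based atoms are hypothetical stand-ins for the C++ code; the "too short, ignore the list" check mirrors the diff hunk quoted earlier in this review.

```python
def apply_list_to_atoms(atoms, prop_name, tokens, marker, convert=str):
    """Set prop_name on each atom from tokens, skipping missing values.

    atoms is a list of dicts standing in for per-atom property storage.
    Returns False (and changes nothing) when the list is too short.
    """
    if len(tokens) < len(atoms):
        return False  # the real code logs a warning and ignores the list
    for atom_props, tok in zip(atoms, tokens):
        if tok == marker:
            continue  # missing value: the property is simply not set
        atom_props[prop_name] = convert(tok)
    return True


atoms = [{}, {}, {}]
ok = apply_list_to_atoms(atoms, "PartiallyMissingInt", ["2", "2", "?"], "?", int)
```

After the call, the first two atoms carry the property and the third has no entry at all, so a later "has property" check on that atom would come back false rather than returning a placeholder value.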

bp-kelley · 2019-03-01T18:36:39Z

Code/GraphMol/FileParsers/FileParserUtils.h

+}
+inline void processMolPropertyLists(
+    ROMol &mol, const std::string &missingValueMarker = "n/a") {
+  applyMolListPropsToAtoms<std::string>(mol, "atom.prop.", missingValueMarker);


I might think about moving the loop through getPropList out of the apply:

for (auto prop : mol.getPropList()) {
  if (prop.find("atom.prop.") == 0 ...)
    applyMolListPropsToAtoms<std::string>(mol, prop, missingValueMarker);
}

This would remove some work (3 extra loops and checks) and perhaps make the code a bit clearer.

I actually think this would make the code less flexible.
It would save a bit of work, but that work is not going to be a measurable fraction of the time required to construct a molecule, so I think the current structure is OK.

bp-kelley · 2019-03-01T18:43:34Z

Code/GraphMol/FileParsers/MolSupplier.h

@@ -106,13 +106,21 @@ class RDKIT_FILEPARSERS_EXPORT ForwardSDMolSupplier : public MolSupplier {
  virtual ROMol *next();
  virtual bool atEnd();

+  void setProcessPropertyLists(bool val) {


Any reason for this not to be the default (true)?

bp-kelley · 2019-03-01T18:45:12Z

Code/GraphMol/Wrap/testPropertyLists.py

+        self.assertTrue(m.HasProp("atom.prop.AtomLabel"))
+        self.assertFalse(m.GetAtomWithIdx(0).HasProp("AtomLabel"))
+
+        sio = BytesIO(self.sdf)


Looking at the python api, I feel that exposing the property processing to python outside of the forward mol supplier might not be such a bad idea.

bp-kelley · 2019-03-02T13:57:01Z

@greglandrum I added my comments. I think a few things need to be resolved, but it mostly looks good.

bp-kelley

I think the parser should accept

[n/a] 1 2 n/a

which it currently doesn't (see comments).

We can get rid of some loops in the atom prop handling by moving them out of the lowest-level function.

bp-kelley · 2019-03-02T14:45:10Z

Code/GraphMol/FileParsers/FileParserUtils.h

+            << " elements found. Ignoring it." << std::endl;
+        continue;
+      }
+      std::string mv = missingValueMarker;


Add regression test for atom prop with multiple chars

[n/a] 1 2 [n/a]

bp-kelley · 2019-03-03T18:16:36Z

Code/GraphMol/FileParsers/FileParserUtils.h

        first_token = 1;
      }
+      if(mv.empty()){


good catch!

greglandrum added 20 commits February 26, 2019 03:23

first crude pass

77ed691

fix a deprecation

ff4ce92

change naming scheme, support bools

d0c52d0

add standalone function

87c2caf

add a default value for missings

ddac611

support long lines

9aebc7c

stupid typo

f4177cf

make operator[] work

574589a

revisit missing value handling

130389c

modify missing value handling

c71873c

switch to an alternate scheme for specifying missing values

8a54f43

clang-format

de4d1d3

First pass at property list parser

4c18b00

still needs more tests

add test for processMolPropertyLists

2a14a88

get this working as part of the ForwardSDMolSupplier

31ef937

first pass at python wrappers and tests

a0ed639

clang-format run

f6b136d

add creation of property lists at the mol level

8217239

wrap long lines on output

4e1a4c2

remove PoC implementation

c143df9

greglandrum added the enhancement label Feb 27, 2019

greglandrum added this to the 2019_03_1 milestone Feb 27, 2019

greglandrum added 2 commits February 28, 2019 04:32

fix python wrappers

a9b579d

remove out-of-date reference to the Python PoC

29e6007

bp-kelley reviewed Mar 1, 2019

bp-kelley reviewed Mar 1, 2019

bp-kelley requested changes Mar 2, 2019

changes in response to review

1bfdc60

bp-kelley reviewed Mar 3, 2019

bp-kelley merged commit 180c15f into rdkit:master Mar 3, 2019

greglandrum deleted the feat/atom_props_from_sdf branch March 8, 2019 15:49

j-wags mentioned this pull request Apr 19, 2019

Add support for loading partial charges from SDF tags using new standard openforcefield/openff-toolkit#250

Closed

proteneer mentioned this pull request Aug 17, 2023

Which procedures require OpenEye? proteneer/timemachine#1111

Closed

diogomart mentioned this pull request Dec 4, 2024

Read Partial Charges from mol2 and SDF forlilab/Meeko#258

Merged
