-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read Partial Charges from mol2 and SDF #258
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks!
Minor requests:
- If mol2 files are the only source of partial charges in rdkit atom properties (as far as I know they are, but I could be wrong), set default of
charge_propname
to "_TriposPartialCharge" instead ofNone
inMoleculePreparation.__init__
and remove the warning in lines 168-173 of preparation.py. - in molsetup.py, keep the warning and default of
read_charges_from_prop
toNone
exactly as they are. Maybe some user is building a molsetup manually and these warnings can be helpful. - suggestion: rename
charge_propname
in MoleculePreparation tocharge_atom_prop
, andread_charges_from_prop
also tocharge_atom_prop
. Makes it easier to find connections in the code withgrep
.
Another thing is to add "read" choice to mk_prepare_ligand.py
|
If any other property name was to be used, it would use the MOL/SDF format, as MOL2 charges would always be parsed into the |
Hi @diogomart
The
I did this on purpose because they have different dominance over other options.
Tomorrow I will work on:
|
Ok, all of that sounds good to me. SDF is already supported, it's just a different atom property name. |
(If you happen to have one) could you provide an SDF with partial charges? Do people write partial charges in property field or somewhere else? The SDF on PubChem usually doesn’t come with partial charges. But other software / packages might be able to write them in custom fields. The charges will be parsed as molecule property but not atom property. This PR currently needs charges in atom property no worry though, I can look that up from some other databases tomorrow. |
They can actually be parsed to an atom property (I learned that today), see my post above, the link to an rdkit PR 2297 has an example SDF with charges. |
Ahhhh!!! I see, thanks! I will look into that |
…g and warning msg
preparator = MoleculePreparation.from_config(config); minor edits to help msg
… MoleculePreparation.from_config(config); minor edits to help msg
Summary of added commits:
Expected CLI usage:
The following will not work:
Because the default charge_model in I tested with a few Python and command-line script examples. @diogomart thanks again for your suggestions, and please let me know what you think! |
meeko/cli/mk_prepare_ligand.py
Outdated
@@ -78,6 +78,11 @@ def cmd_lineparser(): | |||
"--name_from_prop", | |||
help="set molecule name from RDKit/SDF property", | |||
) | |||
io_group.add_argument( | |||
"--charge_from_prop", | |||
nargs="*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for nargs
meeko/cli/mk_prepare_ligand.py
Outdated
|
||
preparator.charge_model = "read" | ||
if args.charge_from_prop: | ||
preparator.charge_atom_prop = args.charge_from_prop[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for [0]
after nargs above is removed
meeko/cli/mk_prepare_ligand.py
Outdated
preparator.charge_atom_prop = args.charge_from_prop[0] | ||
else: | ||
if ext=="mol2": | ||
preparator.charge_atom_prop = "_TriposPartialCharge" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is inside the if args.charge_from_prop is not None
so i think this never gets executed
Hi @diogomart if args.charge_from_prop is not None:
if args.charge_model != "read":
print(f'--charge_from_prop must be used with --charge_model "read", but the current charge_model is "{args.charge_model}". ')
sys.exit(1)
config["charge_atom_prop"] = args.charge_from_prop if args.charge_from_prop else "_TriposPartialCharge" if ext == "mol2" else None
else:
if args.charge_model == "read":
if ext=="mol2":
config["charge_atom_prop"] = "_TriposPartialCharge" Here I wanted to override config["charge_atom_prop"] according to arguments and |
minor fix in error msg
0568941 should fix the current issues that are specific to the added arguments. There might still be a small problem with how command line arguments override config, specifically those with a default value for all other command-line arguments, like Something like the following might work: # command line arguments override config
explicitly_provided_args = {
arg.lstrip('-').replace('-', '_') for arg in remaining_argv if arg.startswith('--')
}
for key in config:
if key in explicitly_provided_args:
config[key] = args.__dict__[key] To handle arguments that are provided in the short-form, a mapping can be created, like: # Mapping arguments short-form to long-form
short_to_long = {}
for action in parser._actions:
option_strings = action.option_strings
short_option = next((opt for opt in option_strings if opt.startswith('-') and not opt.startswith('--')), None)
long_option = next((opt for opt in option_strings if opt.startswith('--')), None)
if short_option and long_option:
short_to_long[short_option.lstrip('-')] = long_option.lstrip('--')
explicitly_provided_args = explicitly_provided_args | {
short_to_long[arg.strip('-')] for arg in remaining_argv if arg.startswith('-') and not arg.startswith('--')
} But maybe it's all unnecessary. @diogomart Could you please confirm that we want to keep the config option, and should make the changes to only allow argument overriding config for arguments that are explicitly provided? Does the current type of overriding align with what we want, and it's not necessary to change? I will come back and take a closer look later this evening. Thank you for your time and kind advice in advance. |
fix the special handling of some agrs that unexpectly end up with default (without referring to config)
The charge parsing should be working now, hopefully.. Also fixed some small problems with command line arguments defaults unexpectedly overriding config options. The changes are, specifically:
config["load_atom_params"] = args.load_atom_params Because config should already be synced after config[key] = args.__dict__[key]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Thank you for all the clean up. Great catch with the add_atom_types option.
We do want to keep the option to pass a JSON config file, and have any command line arguments override those in the config file, but only if they are explicitly passed. This is what is implemented, and based on some testing it is working as expected.
I have only minor requests, which are in-lined. Thanks again!
meeko/cli/mk_prepare_ligand.py
Outdated
@@ -52,7 +51,11 @@ def cmd_lineparser(): | |||
if confargs.config_file is not None: | |||
with open(confargs.config_file) as f: | |||
c = json.load(f) | |||
config.update(c) | |||
if any(key not in config for key in c): | |||
print(f"Error: Got unsupported keyword ({key}) for MoleculePreparation from the config file ({confargs.config_file}). ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key is not defined in the f string (that's why we iterate over the keys in MoleculePreparation.from_config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure we need to test the keys here, MoleculePreparation.from_config
does a check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right on both. I made the unnecessary changes but I forgot to revert.
meeko/cli/mk_prepare_ligand.py
Outdated
@@ -78,6 +81,10 @@ def cmd_lineparser(): | |||
"--name_from_prop", | |||
help="set molecule name from RDKit/SDF property", | |||
) | |||
io_group.add_argument( | |||
"--charge_atom_prop", | |||
help="set atom partial charges from an RDKit atom property based on the input file. The default is 'PartialCharge' for SDF and '_TriposPartialCharge' for MOL2 unless overriden by a user defined property name. ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this next to --charge_model
in the Molecule Preparation group. While it is certainly related to I/O, being next to charge_model is more important, and charge_model belongs in molecule preparation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
meeko/cli/mk_prepare_ligand.py
Outdated
# command line arguments override config | ||
for key in config: | ||
if key in args.__dict__: | ||
for key in args.__dict__: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would like the "command line arguments override config" comment back :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary, I would revert that
Hi @diogomart thanks again for the comments. Sorry this took so many commits; I hope things are ok now. I did not understand the overriding at first, so I did too much rewrite that was unnecessary. |
Thanks for the edits. The overriding is done in a complicated way... couldn't come up with a better one. I just pushed a commit to add a couple comments and moved a block of code to make it more understandable. Sorry for the lack of comments! |
Hi @diogomart Thanks so much for the comments and for your kind support, as always, merging! |
This is for #224.
Chem.MolFromMol2File
in RDKit parses partial charges into atom property name_TriposPartialCharge
. This PR simply reads the charges from atom property at the point init_atom.An additional attribute,
charge_propname
, is added toMoleculePreparation
, and it needs to be used with optioncharge_model = "read"
. The default is_TriposPartialCharge
.The corresponding argument
read_charges_from_prop
is also added to thefrom_mol
method for RDKitMoleculeSetup to be accessed at a lower level. It must not be conflicting with argumentcompute_gasteiger_charges
(formerlyassign_charges
).Some error messages for conflict handling are added.
Notes:
_TriposPartialCharge
, any other named RDKit Atom attribute can be used as the source of charges.Python usage example:
212163792.mol2.txt