Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QChem: add CMIRS solvent model support #2215

Merged
merged 40 commits into from
Nov 22, 2022

Conversation

rkingsbury
Copy link
Contributor

@rkingsbury rkingsbury commented Aug 11, 2021

Summary

Additions to QCInput and QChemDictSet needed to support the new CMIRS implicit solvent model and the isodensity implementation of SS(V)PE.

See relevant Q-Chem manual pages:

Specific changes include:

  • Support $svp and $pcm_nonels sections in QCInput
  • Add isosvp_dielectric and cmirs_solvent kwargs and documentation to QChemDictSetand all inherited InputSet classes
  • Add warnings or errors as appropriate to alert users to combinations of inputs with non-obvious results
  • Small bugfix to remedy the fact that Q-Chem does not recognize 'dimethyl sulfoxide' as a valid solvent even though that is the keyword given in the manual. DMSO works.
  • Noteworthy change - modify lower_and_check_unique utility such that it converts all values (including numeric values) to strings. This was necessary to facilitate reading/writing numerical parameters to and from files. the already-existing pcm_defaults dictionary used str to store its numerical parameters, so this change appears to be consistent with the original design intent and it appears not to break anything. However, as a result of that change, the dielectric attribute is now a str and not a float.
  • Fix a bug regarding duplicate key removal and add tests for lower_and_check_unique
  • Update header info (authors, copyright date, etc.) in all files. The __version__=0.1 didn't seem meaningful and it didn't make sense to have Brandon's stale email in there
  • Add output parsing. CMIRS and ISOSVP solvent data are nested under solvent_data.cmirs and solvent_data.isosvp, respectively. Per discussion with @samblau , this is in preparation for nesting PCM and SMD data in a similar way, which we plan to do in a separate PR.
  • Add tests for all new functionality, including units tests and multi-job output files.

TODO

  • Manually test within atomate workflow

@coveralls
Copy link

coveralls commented Sep 21, 2022

Coverage Status

Coverage remained the same at 0.0% when pulling df3a879 on rkingsbury:cmirs into 03691a8 on materialsproject:master.

@rkingsbury
Copy link
Contributor Author

@samblau @espottesmith I want to flag this for you guys to start reviewing as it's 95% done. I'm leaving in WIP status b/c I want to add output parsing before merging, but the input setup parts are solid. I particularly want to call attention to the change in lower_and_check_unique that guarantees all numerical values get stored as str. It appears innocuous but you might know of situations where that would cause problems.

Copy link
Contributor

@espottesmith espottesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seems fine to me. A few small style things, a naming issue, but the bulk of the work looks solid. Looking forward to seeing your output parser and getting this integrated into atomate/emmet!

__copyright__ = "Copyright 2018, The Materials Project"
__version__ = "0.1"
__email__ = "[email protected]"
__credits__ = "Xiaohui Qu"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably Xiaohui should still get credit. His original work inspired what's now in pymatgen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm not clear on the distinction between __credits__ and __authors__ (I might just be ignorant). Those seemed equivalent to me. Happy to revert though if __credits__ implies origination or some other special status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did make sure to add Xiaohui to the __authors__ list for any file where he was listed in __credits__, in case that wasn't obvious

svp={"RHOISO": 0.001, "DIELST": 78.39, "NPTLEB": 110}
Note that the "DIELST" setting has no effect on non-isodensity SS(V)PE calculations (i.e., when
solvent_method = PCM).
pcm_nonels (dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name feels misleading and possibly just wrong. These aren't PCM parameters. They're nonelectrostatic implicit solvent parameters for CMIRS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair; however $pcm_nonels is the name Q-Chem chose for the relevant input file section. It seems like throughout our infrastructure we use the exact section names as variables (e.g. rem, solvent, etc.) to avoid ambiguity, so I thought it best to do so here as well.

pymatgen/io/qchem/inputs.py Show resolved Hide resolved
pymatgen/io/qchem/inputs.py Show resolved Hide resolved
pymatgen/io/qchem/inputs.py Outdated Show resolved Hide resolved
pymatgen/io/qchem/inputs.py Show resolved Hide resolved
@rkingsbury
Copy link
Contributor Author

OK, I have the basics of output parsing in place now. However I'm having difficulty understanding how test_outputs.py works. I'd also like some guidance on how to structure the output keys. I'll await feedback from @espottesmith and @samblau before moving ahead.

Specifically:

  1. I added 4 test files (currently commented out) to multi_job_out_names in test_outputs.py, and then in a terminal ran TestQCOutput.generate_multi_job_dict() to update the .json file. But even after doing this I got a KeyError on the new filenames when I tried to run the tests. Note that I have not added any of the new properties to the test yet; I just wanted to make sure they ran cleanly after adding the new files.

  2. Currently I only parse the energies (and dielectric constant) that come out of ISOSVP / CMIRS calculations. This appears to be the approach taken for SMD as well. However unlike SMD, in CMIRS the solvents are not selected by name, but rather by specifying parameters in the $pcm_nonels section. Moreover, the only place these parameters show up in the output file is in the $pcm_nonels section of the input block at the beginning; they aren't listed elsewhere. So there is no obvious way to determine what the solvent is. I could try construct a QCInput from the input block in the output file, but that seems a bit convoluted considering that ultimately we will have a TaskDocument with a calcs_reversed[0] that will do that for us, and perhaps we can use a builder to map the solvent parameters back to named solvent?

  3. data field names. From what I can tell, PCM and SMD populate completely different keys in the solvent_data block, with no overlap. I've taken the same approach here; although there's an argument that we should populate a single dielectric field whether we're using PCM, CMIRS, or ISOSVP. I made the field names similar to what is listed in the output file, but feedback there is welcome. For example, it might make sense to prefix the field names with isosvp or cmirs to make clear what's what in the final task doc, because there are going to be a lot of fields with similar names now that we have four possible solvent models *e.g. solute_internal_energy for SMD vs. change_solute_internal_e for ISOSVP

@janosh janosh added the stale Abandoned or conflicting PRs and outdated issues label Oct 23, 2022
Copy link
Contributor

@samblau samblau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very minor things, but over all looks great. Output testing still needed before PR is finalized.

pymatgen/io/qchem/inputs.py Outdated Show resolved Hide resolved
pymatgen/io/qchem/outputs.py Outdated Show resolved Hide resolved
pymatgen/io/qchem/outputs.py Outdated Show resolved Hide resolved
pymatgen/io/qchem/utils.py Outdated Show resolved Hide resolved
@rkingsbury rkingsbury marked this pull request as ready for review November 2, 2022 04:42
@rkingsbury
Copy link
Contributor Author

rkingsbury commented Nov 2, 2022

@samblau ready for another review. Per our discussion, I went ahead and nested the solvent data under cmirs andisosvp keys in the solvent_data dict. I held off doing so for PCM and SMD b/c we think that should be in a separate PR.

I will test this with atomate soon and report back if I encounter any problems.

@rkingsbury
Copy link
Contributor Author

rkingsbury commented Nov 5, 2022

UPDATE: I've done some testing of this with atomate and can confirm the following:

  • Task docs are still populated with float (e.g. solvent_data.pcm_dielectric is still a float) despite the change in lower_and_check_unique.
  • PCM and SMD calculations parse correctly and now include None for keys associated with other solvents (i.e., all solvent_data keys are always present for any solvent; those that aren't relevant are None).
  • ISOSVP and CMIRS Calculations set up as intended and run successfully

However
I've also found one or more "fun" undocumented behaviors / bugs in Q-Chem that cause some calculations to completely skip the solvation phase! So although it looks like an ISOSVP calculation has completed successfully, the solvent_data.isosvp information will be blank because that phase of the calculation was actually skipped, and the associated information isn't in the output file.

I'm working on pinning down what's going on here, and that might warrant some small modifications to the IO. See forum post

@rkingsbury
Copy link
Contributor Author

Alright, I think all issues have been resolved and this is working as intended. I learned (the hard way) that you have to disable gen_scfman in order for ISOSVP or CMIRS to work, and add a variable IPNRF to the $SVP section in order for CMIRS to work (neither of which is documented right now in the Q-Chem manual - see forum post).

After adding those changes to our IO, I have successfully set up and run single point calcs using all four solvent models (PCM, SMD, ISOSVP, and CMIRS) and inspected the solvent_data key of the resulting task docs. See an example below for a CMIRS calculation. All keys are present always, and assigned values of None if they don't apply.

Unless @samblau or @espottesmith spot anything problematic, I think this is ready to merge.

image

@mkhorton mkhorton removed the stale Abandoned or conflicting PRs and outdated issues label Nov 10, 2022
@mkhorton
Copy link
Member

I'd like to go ahead and merge this -- @samblau, @espottesmith unless you have any objections, will go ahead.

@samblau
Copy link
Contributor

samblau commented Nov 11, 2022

@mkhorton looks good to me! merge away

@mkhorton mkhorton merged commit 7a51c9b into materialsproject:master Nov 22, 2022
@mkhorton
Copy link
Member

Thanks @rkingsbury, merged :)

@rkingsbury rkingsbury deleted the cmirs branch October 10, 2024 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants