-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend GIST to mixtures and non-water solvents #961
Conversation
Removes * the "excludeions" option * the Gist::includeIons_ variable * the Gist::A_idxs_ array, which tracked the solute+solvent atoms, and simply contained all atom indices unless excludeions was specified.
* Two new methods Action_GIST::setSolventProperties and Action_GIST::checkSolventProperties
* Removes checks for water is a solvent, but there is still a warning, and it is checked that all solvents are the same molecule. * Molecules are currently binned by their center of mass (COM), but it is planned to add a "com" or "nocom" option to let the user choose. * The user can choose 3 atoms that form a rigid subunit of the molecule for the quaternion construction * There are some further refactorings: datasets are not cast to DataSet_GridFlt or ...Dbl, instead the polymorphic DataSet_3D is kept. I changed this after debugging for a while because I had accidently changed a dataset from double to float. The polymorphic option is more robust, but the downside is that DataSet_3D.operator[] is not writable, so we need to use the SetGrid method. Currenty unfinished * density output (g_... columns) for non-water solvents!!! * order calculation (not planned for non-water solvents) * neighbor calculation (not sure if I will implement this, it probably conflicts with the "com" option
* Remove individual dataset pointers from Action_GIST class. * Instead, manage an std::map to call them by their name. For fast access (e.g., within DoAction), the pointer is re-used. * Dynamically add g_H, g_H, g_C etc. datasets for each element in the solvent. * Add a struct SolventInfo (instance name solventInfo_) that tracks the elements in the solvent, and a method "getDensityDatasets" that returns a vector of pointers to the respective density datasets. solventInfo_ is filled by the analyzeSolventElements method during Action Setup. * Add a class DataFilePrinter (private to Action_GIST), that prints whitespace separated values to a CpptrajFile. This replaces the long format string in Action_GIST::Print().
* implement some dataset math that will replace the CalcAvgVoxelEnergy and CalcAvgVoxelEnergy_PME functions * fix a bug when parsing rigidAtomIndices * implement a nocom option to recover the old GIST behavior, and a calcMolCenter method that computes the position of a molecule in the appropriate manner (COM or first "rigid" atom)
* make individual pointers for Esw_, Eww_ etc. datasets, instead of the std::map. * Refactor some of the dataset math in the Print method
* remove solute positions from Esw in PME-GIST. * divide Eww by two when printing the total Eww on the grid. * Add -a flag to DoTest because energy output is off in the last printed digit * Fix GPU code after the last refactoring
rigidAtomIndices_ specifies which atoms of the solvent define a rigid subunit. This is needed for the entropy calc, and also (if "nocom" is specified) to define the molecule's position. Each number is an offset from the molecule's start index (i.e., 0 means the first atom of the moleucle). The first number is the central atom, and the second and third number are other atoms that are bound to it. It can be set in 2 ways: * The user specifies rigidatoms. In that case, the atom names are matched to set the indices (offset from molecule start) * If the user doesn't specify, the solvent atom with the most bonds is searched. If it has less than 2 bonds to heavy atoms, hydrogens are also included. The central atom is the one with the most bonds. The two others are chosen from the atoms that are bonded to it. Molecular densities are tracked in separate datasets. They are defined relative to the solvent bulk density, and are computed by binning the solvent centers of each specified solvent species (either the COM or the first of "rigidAtomIndices_"). The user has to choose which molecules are tracked by specifying "solventmols", followed by a comma-separated list of molecule names. No selection masks are allowed, firstly because this would mess with the comma-separated list, and also because we anyway assume that we select whole molecules. For each specified molecule, a dataset is generated and written out as dx file, e.g., gist-g-mol-WAT.dx
* add entropy tests for Gist4 test and Gist2 test * run PME tests on GPU builds as well. * change test save files to output v4, but don't change any other numbers. * Use DoTest ... -a to allow for minor differences.
* Parallel entropy calculations (dTSorient and dTSsix+dTStrans) for GIST, using OpenMP. The parallelization is very easy because most of the calculation can be done independently for each voxel, with only read access the voxel_xyz_ and voxel_Q_ arrays. The parallelization is implemented using the loop over all voxels, and there is a critical section when writing the data to arrays and incrementing the count of finished voxels and the count of water molecules. In a test system of 10000 frames of benzene in water, the results are identical to the previous version. * If skipS_ is specified, don't fill the voxel_xyz_ and voxel_Q_ arrays, since they are only needed for the entropy. Those are arrays of arrays. The outer array is still allocated, but each voxel array now stays empty with skipS_. For large GIST calculations (say, 200x200x200 voxels and >10000 frames), those arrays need large amounts of memory (~8GB in this example, assuming mostly water in the grid) This allows for an efficient strategy for very large GIST calculations: do the energy once in total (for GPU or PME implementations, this is more efficient than splitting the grid) and do the entropy calculation for small sub-grids, or on a cluster with high RAM (now also in parallel).
The PGI build in Jenkins failed. |
This pull request introduces 3 alerts when merging 298f245 into f1783be - view on LGTM.com new alerts:
|
* Tests for GIST entropy were added previously, but the save files were not yet committed, leading to failing tests.
The PGI build in Jenkins failed. |
1 similar comment
The PGI build in Jenkins failed. |
The PGI build in Jenkins failed. |
Should we update the tests to use "DoTest ... -a $TEST_TOLERANCE" also for the full output file? Currently, there is a 0.0001 difference (the last printed digit) in one line of Gist4-Eww-dens.dx and in one line of Gist3-output.dat. Otherwise, I can also update the .save files. |
Why remove the
It depends on whether you think the error is due something innocuous like floating-point round-off error or not. If the difference is due to the fact that the underlying algorithms have changed (i.e., it will always have the new value), then it's best to update the test values (assuming they have been properly validated). If the difference is because at some point the calculation accumulates round-off error (because numbers become denormalized or something; this happens a lot in curve-fitting for example) then it's appropriate to add a tolerance. Let me know if that makes sense. |
The reason I removed it was that I changed a lot of stuff regarding the assignment of atoms for the energy / entropy calculations, and it would have made everything more difficult. However, now that everything else is working, I assume that we could add it again without a large amount of work. Basically, ions are already omitted in the entropy calculation, but we would have to also omit them in the energy calculation.
I think that the difference is due to some floating point error, because it is very small and only occurs in one voxel of the test calculations. However, it will probably stay the same as long as we don't change the implementation again, so we could change the .save files. Adding a tolerance would make the tests more robust to floating point errors in the future, but could also mask small systematic errors. |
I added some comments and updated the documentation. I also added another regression test for benzene as solvent. Furthermore, I added our papers regarding the extension to multiple solvents and to salt-water mixtures to the GIST citation list. Is this ok with you? |
The PGI build in Jenkins failed. |
1 similar comment
The PGI build in Jenkins failed. |
This pull request introduces 1 alert when merging 37cbf98 into b65fc44 - view on LGTM.com new alerts:
|
src/Action_GIST.cpp
Outdated
infofile_->Printf("Ensemble total solute energy on the grid: %9.5f Kcal/mol \n", SumDataSet(*U_PME_) / NFRAME_); | ||
} | ||
if (n_linear_solvents_ > 0) { | ||
mprintf("GIST warning: %d almost-linear solvent molecules occurred. Maybe choose other \"rigidatoms\".\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This printf is missing an argument for the %d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I meant to print n_linear_solvents.
There is a test failure in the PGI build (this is non-fatal by default since PGI compilers are a bit out of the mainstream these days):
Seems like it might be round-off to me, but it's worth double-checking. Do the diffs look reasonable to you? If so, you may want to add/increase the test tolerance. If you're not comfortable with that we could potentially ignore it for now in favor of getting the code merged. |
The PGI build in Jenkins failed. |
The error in the PGI build seems a bit too much for round-off, since we are not adding a lot of numbers here (it is 10 frames of one molecule). I'll try to get a PGI build locally to debug it. |
I did some testing and it seems that the failing PGI test is actually a round-off problem.
With It is certainly rare to find low angles in a GIST calculation (because they are sampled from a 3D space), but on the other hand, 0.8 degree does not seem that small. How should we proceed? |
A more accurate version could be implemented easily by re-normalizing the quaternions at double precision before calculating the angle. However, I would not like to store the quaternions at double precision because the RAM requirement of GIST can sometimes be problematic, and the quaternion array is already one of the main offenders. |
I unfortunately don't have ready access to a modern PGI to try to reproduce the problem; this issue doesnt appear with v17 PGI. Is this the code (in Action_GIST::Print()) that leads to the round-off error? : double rR = fabs( voxel_Q_[gr_pt][q1 ] * voxel_Q_[gr_pt][q0 ]
+ voxel_Q_[gr_pt][q1+1] * voxel_Q_[gr_pt][q0+1]
+ voxel_Q_[gr_pt][q1+2] * voxel_Q_[gr_pt][q0+2]
+ voxel_Q_[gr_pt][q1+3] * voxel_Q_[gr_pt][q0+3] ); If so, can't we just cast all the voxel values to double during the calculation? double rR = fabs( (double)voxel_Q_[gr_pt][q1 ] *(double) voxel_Q_[gr_pt][q0 ]
+ (double)voxel_Q_[gr_pt][q1+1] * (double)voxel_Q_[gr_pt][q0+1] etc. This can be done anywhere there's potential for round-off error. |
I think this should give us consistent results between compilers. But the angle would not be much more accurate, because the quaternions are not normalized in double precision. A small error in the absolute value of the quaternion leads to a bigger error in the angle because arccos is very steep close to 1. But, as I said, I expect that this is only relevant very rarely. |
I think I see now. It's the values being put into the
If your feeling is that the code is good as-is that's fine. I think we do either (or both) of the following:
Doing both might provide the best coverage, but if you just want to do |
I quickly checked, and both compilers give the different result when casting to double. I.e., it seems that GCC is calculating the dot product in float, while PGI is casting to double by default. |
Using the following snippet, both compilers print out 0.0152023, which is consistent with the quaternion library, even though the input variables are still in float.
|
We could also avoid the sqrt by assuming that the quaternions are close to normalized. There is an approximate equation as follows:
This still gives the correct result on both compilers. I would have to test how much slowdown this would be compared to the existing implementation. Edit: |
I am getting a slowdown of 22% (Action_Post) with the more exact quaternion distance. The six-dimensional distance is different by 0.16 kcal/mol over the entire grid. This is surprisingly high (and I will double-check this next week). However, I am not sure whether it is worth the slowdown, because we also have other error sources of similar magnitude (even the output file precision might be in that range). |
Yeah. I'd rather be slower and keep the accuracy. Those errors will be
magnified with bigger systems.
Tom
…On Fri, Jun 3, 2022 at 10:49 AM Franz Waibl ***@***.***> wrote:
I am getting a slowdown of 22% (Action_Post) with the more exact
quaternion distance. The six-dimensional distance is different by 0.16
kcal/mol over the entire grid. This is surprisingly high (and I will
double-check this next week). However, I am not sure whether it is worth
the slowdown, because we also have other error sources of similar magnitude
(even the output file precision might be in that range).
—
Reply to this email directly, view it on GitHub
<#961 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGYKFQZAYX654YUO6T2T7N3VNILO3ANCNFSM5SQA4VBQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
************************************************
Tom Kurtzman, Ph.D.
Professor
Department of Chemistry
Lehman College, CUNY
250 Bedford Park Blvd. West
Bronx, New York 10468
718-960-8832
http://www.lehman.edu/faculty/tkurtzman/
<http://www.lehman.edu/faculty/tkurtzman/index.html>
************************************************
|
Agreed. @fwaibl let me know when you think the code is final and I can merge it. |
* add functions to compute the distance between almost normalized quaternions, such as those cast from float to double. * update unitTests with new functions
Now everything seems consistent. If you agree with the new code and there is nothing else to correct, I think we could merge. |
Great!
On Tue, Jun 7, 2022 at 8:21 AM Daniel R. Roe ***@***.***> wrote:
Merged #961 <#961> into master.
—
Reply to this email directly, view it on GitHub
<#961 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGYKFQ3SOTY4HNMGHYJIA7LVN45DTANCNFSM5SQA4VBQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Sent from My phone
|
This pull request extends GIST towards solvents other than water, and, to some extent, towards solvent mixtures (especially salt-water mixtures). Furthermore, it parallelizes the GIST entropy calculation using OpenMP.
Summary of changes:
Changes in cpptraj tests
To-Do
Several things can/should still be updated.
Other problems
@drroe @tkurtzman what do you think about it? I decided to send a pull request even though it is not completely finished, so I can get some feedback already. Do you have suggestions regarding the changed user interface options, or should I revert some of the changes?
Should we add separate Eww output for components of a solvent mixture? This would make the post-processing a lot easier, but I did not implement it yet because I also did not have it for my publication on salt-water mixtures in GIST, so I am used to work with the single energy output. (The problem is that we need a separate reference energy for each component)
The entropy is currently only calculated for the main solvent. This is no big problem because the 1st order entropy is independent for multiple solvents, so it is cheap to run multiple calculations for multiple solvents. For the energy, this is not an option (at least, it requires many GIST runs)