-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pt): train with energy Hessian #4169
Conversation
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
source/tests/pt/test_change_bias.py (2)
126-129
: Consider refactoring common verification logicThe verification logic is duplicated across test methods. Consider extracting the common model initialization and verification code into a helper method.
+ def _verify_bias_change(self, state_dict, expected_bias): + model_params = state_dict["model"]["_extra_state"]["model_params"] + model_for_wrapper = get_model_for_wrapper( + model_params, + _loss_params=self.loss_params, + ) + wrapper = ModelWrapper(model_for_wrapper) + wrapper.load_state_dict(state_dict["model"]) + updated_bias = wrapper.model["Default"].get_out_bias() + torch.testing.assert_close(updated_bias, expected_bias)
150-153
: Consider parameterizing the test with more test casesThe test could be enhanced by using
@pytest.mark.parametrize
to test various user-defined bias values, including edge cases and invalid inputs.Example parameterization:
@pytest.mark.parametrize("user_bias", [ [0.1, 3.2, -0.5], # current case [0.0, 0.0, 0.0], # zero bias [-1e6, 1e6, 0], # extreme values ]) def test_change_bias_with_user_defined(self, user_bias): # existing test code
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
source/tests/pt/test_change_bias.py
(4 hunks)
🔇 Additional comments (2)
source/tests/pt/test_change_bias.py (2)
90-90
: LGTM: Loss parameters initialization in setUp
The initialization of loss parameters from the config is correctly placed in the setUp method, ensuring fresh configuration for each test case.
100-103
: Verify test compatibility with Hessian mode
The test now correctly passes loss parameters to the model wrapper. However, we should verify that this test works correctly when Hessian mode is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
deepmd/pt/train/training.py (1)
1228-1231
: Consider improving function name and simplifying logicWhile the function logic is correct, consider these improvements for better readability:
- Rename to
is_hessian_required
orshould_use_hessian
to better reflect its purpose- Simplify to a single return statement
-def whether_hessian(loss_params): - loss_type = loss_params.get("type", "ener") - return loss_type == "ener" and loss_params.get("start_pref_h", 0.0) > 0.0 +def is_hessian_required(loss_params): + return loss_params.get("type", "ener") == "ener" and loss_params.get("start_pref_h", 0.0) > 0.0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deepmd/pt/train/training.py
(5 hunks)deepmd/utils/argcheck.py
(2 hunks)
🧰 Additional context used
📓 Learnings (1)
deepmd/utils/argcheck.py (1)
Learnt from: 1azyking
PR: deepmodeling/deepmd-kit#4169
File: deepmd/utils/argcheck.py:1982-2117
Timestamp: 2024-11-12T05:47:21.643Z
Learning: The `loss_ener_hess` and `loss_ener` functions should remain separate to avoid confusion, despite code duplication.
🔇 Additional comments (4)
deepmd/utils/argcheck.py (2)
2179-2180
: LGTM! Documentation strings are clear and consistent.
The documentation strings for the new Hessian prefactor parameters follow the established pattern and clearly explain their purpose.
2235-2248
: LGTM! Implementation of Hessian prefactor parameters is correct.
The implementation:
- Uses consistent types ([float, int]) with other prefactor parameters
- Sets appropriate default values (0.0) to make Hessian loss opt-in
- Follows the same argument structure as other loss prefactors
deepmd/pt/train/training.py (2)
28-28
: LGTM!
The import of EnergyHessianStdLoss
is correctly placed with other loss imports.
1235-1238
: LGTM!
The Hessian loss integration follows the established pattern and correctly:
- Uses whether_hessian for conditional activation
- Sets required parameters
- Returns appropriate loss type
Signed-off-by: Anchor Yu <[email protected]>
Signed-off-by: Anchor Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
doc/model/train-energy-hessian.md (4)
44-50
: Enhance data format explanation.Consider adding more details about the Hessian matrix structure:
- Example of how the matrix elements are ordered
- Relationship between atomic positions and matrix indices
52-92
: Add practical training examples.Consider adding:
- Example of training from scratch with a complete input.json
- Example of fine-tuning from a pretrained model with specific parameter adjustments
Would you like me to generate these example configurations?
🧰 Tools
🪛 Markdownlint (0.37.0)
84-84: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
100-100
: Fix grammar in conditional statement.Change "If one do freeze" to "If one does freeze"
🧰 Tools
🪛 LanguageTool
[grammar] ~100-~100: Possible subject-verb agreement error.
Context: ...output Hessian predictions. ::: If one do freeze and test a Hessian model using t...(ONE_AGREEMENT)
139-139
: Fix subject-verb agreement.Change "the predicted Hessian for each frame are output" to "the predicted Hessian for each frame is output"
🧰 Tools
🪛 LanguageTool
[uncategorized] ~139-~139: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...d, the predicted Hessian for each frame are output in an additional file in the wor...(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
doc/model/train-energy-hessian.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
doc/model/train-energy-hessian.md
[grammar] ~100-~100: Possible subject-verb agreement error.
Context: ...output Hessian predictions. ::: If one do freeze and test a Hessian model using t...
(ONE_AGREEMENT)
[uncategorized] ~139-~139: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...d, the predicted Hessian for each frame are output in an additional file in the wor...
(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)
🪛 Markdownlint (0.37.0)
doc/model/train-energy-hessian.md
35-35: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
84-84: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
119-119: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
141-141: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (4)
doc/model/train-energy-hessian.md (4)
1-5
: Add frontmatter and TOC entry.
Based on the project's documentation standards, this document needs:
- A frontmatter section
- An entry in the table of contents
Apply this change:
+---
+title: Train with energy Hessian
+---
+
# Fit energy Hessian {{ pytorch_icon }}
14-25
: Update loss type to match implementation.
The loss type should be "ener_hess" for training with Hessian calculations.
Apply this change:
"loss": {
- "type": "ener",
+ "type": "ener_hess",
"start_pref_e": 0.02,
35-42
: Specify language for file structure code block.
Add the text
language identifier to the code block for better formatting.
Apply this change:
-```
+```text
type.raw
🧰 Tools
🪛 Markdownlint (0.37.0)
35-35: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
119-122
: Add language identifiers to code blocks.
Add appropriate language identifiers to the code blocks:
Apply these changes:
-```
+```text
${output_prefix}.e.out ${output_prefix}.e_peratom.out ${output_prefix}.f.out
-```
+```text
${output_prefix}.h.out
-```
+```text
# frame - 0: data_h pred_h (3Na*3Na matrix in row-major order)
Also applies to: 141-143, 147-158
🧰 Tools
🪛 Markdownlint (0.37.0)
119-119: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
deepmd/entrypoints/test.py (3)
306-307
: Consider revising the Hessian data dimension.The Hessian matrix has dimensions 3N×3N (where N is the number of atoms) as it represents second derivatives with respect to atomic coordinates. Setting
ndof=1
might not be the most intuitive choice.Consider one of these alternatives:
- Set
ndof=9
to represent the 9 components per atom pair- Add a comment explaining why
ndof=1
was chosen
357-359
: Consider using named indices for better maintainability.Using a hardcoded index (3) for retrieving the Hessian makes the code fragile to future changes. Consider using named constants or an enum to make the code more maintainable.
+ # Define constants for return value indices + ENERGY_IDX, FORCE_IDX, VIRIAL_IDX, HESSIAN_IDX = 0, 1, 2, 3 if dp.has_hessian: - hessian = ret[3] + hessian = ret[HESSIAN_IDX] hessian = hessian.reshape([numb_test, -1])
545-560
: Improve the Hessian output header description.The header could be more descriptive to help users understand the structure of the Hessian matrix data.
save_txt_file( detail_path.with_suffix(".h.out"), h, - header=f"{system}: data_h pred_h (3Na*3Na matrix in row-major order)", + header=f"{system}: data_h pred_h (3N×3N Hessian matrix in row-major order, where N is the number of atoms)", append=append_detail, )deepmd/pt/train/training.py (3)
1228-1231
: Consider improving the function name and implementation.The function correctly determines if Hessian calculations are required, but could be improved:
- Rename to
is_hessian_required
for better clarity- Add docstring explaining the function's purpose
-def whether_hessian(loss_params): +def is_hessian_required(loss_params): + """Check if Hessian calculations are required based on loss parameters. + + Args: + loss_params: Dictionary containing loss configuration + + Returns: + bool: True if Hessian calculations should be performed + """ loss_type = loss_params.get("type", "ener") return loss_type == "ener" and loss_params.get("start_pref_h", 0.0) > 0.0
1235-1237
: LGTM: Hessian loss handling is correctly implemented.The code properly checks for Hessian requirements and returns the appropriate loss function. Consider adding a comment explaining when Hessian loss is used.
+ # Use Hessian loss when energy type model has Hessian preference enabled if whether_hessian(loss_params): loss_params["starter_learning_rate"] = start_lr return EnergyHessianStdLoss(**loss_params)
Line range hint
1287-1304
: Consider improving the architectural design.The current implementation determines the model type (Hessian vs. non-Hessian) based on loss parameters, which creates tight coupling between model and loss configurations. This could make the system harder to maintain and extend.
Consider these alternatives:
- Introduce a dedicated model type (e.g., "ener_hess")
- Move Hessian configuration to model parameters
- Create a separate configuration section for Hessian settings
Example:
model_params = { "type": "ener_hess", # Explicit model type # ... other parameters }This would:
- Improve separation of concerns
- Make configuration more explicit
- Reduce coupling between components
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
deepmd/entrypoints/test.py
(8 hunks)deepmd/infer/deep_eval.py
(3 hunks)deepmd/pt/infer/deep_eval.py
(5 hunks)deepmd/pt/train/training.py
(5 hunks)deepmd/utils/argcheck.py
(2 hunks)doc/model/index.rst
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- doc/model/index.rst
- deepmd/pt/infer/deep_eval.py
🧰 Additional context used
📓 Learnings (1)
deepmd/utils/argcheck.py (1)
Learnt from: 1azyking
PR: deepmodeling/deepmd-kit#4169
File: deepmd/utils/argcheck.py:1982-2117
Timestamp: 2024-11-12T05:47:21.643Z
Learning: The `loss_ener_hess` and `loss_ener` functions should remain separate to avoid confusion, despite code duplication.
🪛 Ruff (0.8.2)
deepmd/entrypoints/test.py
619-619: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (7)
deepmd/utils/argcheck.py (2)
2182-2183
: Documentation looks good!
The documentation for the new Hessian parameters follows the established pattern and clearly describes their purpose.
2238-2251
: Implementation looks good, verify usage in loss calculation.
The new Hessian parameters are properly implemented with appropriate defaults and optional flags. The implementation maintains backward compatibility and follows the project's conventions.
Run the following script to verify the usage of these parameters in the loss calculation code:
✅ Verification successful
Parameters are properly integrated into loss calculation
The verification shows that the new Hessian parameters are well integrated into the codebase:
-
In
deepmd/pt/loss/ener.py
:- Parameters are properly documented with clear descriptions
- Used to control Hessian loss computation via
has_h
flag - Correctly interpolated between start and limit values using
pref_h
- Applied to the Hessian loss calculation when Hessian data is available
-
In
deepmd/pt/train/training.py
:whether_hessian()
function properly checks for Hessian activation usingstart_pref_h
-
Test coverage exists in
source/tests/pt/test_loss.py
validating the parameters' usage
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the new Hessian parameters are properly used in loss calculation code
# Test: Search for usage of start_pref_h and limit_pref_h in loss calculation files
# Look for Hessian loss calculation in Python files
rg -A 5 "start_pref_h|limit_pref_h" --type py
# Look for Hessian tensor calculation
ast-grep --pattern 'hessian($_, $_)'
Length of output: 4500
deepmd/infer/deep_eval.py (3)
78-78
: LGTM: Backend mapping for Hessian output is correctly defined.
The addition of "energy_derv_r_derv_r": "hessian"
follows the established pattern and correctly maps the energy second derivative to the Hessian backend implementation.
278-280
: LGTM: Base method for Hessian support check is properly implemented.
The method follows the established pattern of other feature check methods in the class, providing a default implementation that can be overridden by derived classes.
551-554
: LGTM: Property for Hessian support check is properly implemented.
The property correctly delegates to the backend's get_has_hessian()
method, following the established pattern of other similar properties in the class.
deepmd/entrypoints/test.py (1)
589-591
: LGTM: Hessian metrics are correctly added to the return dictionary.
The Hessian MAE and RMSE metrics are properly included in the return dictionary with their corresponding sizes.
deepmd/pt/train/training.py (1)
268-277
: LGTM: Loss parameter initialization correctly handles Hessian activation.
The code properly initializes loss parameters for both single-task and multi-task scenarios, ensuring Hessian activation is correctly configured.
Signed-off-by: Anchor Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
doc/model/train-energy-hessian.md (4)
84-92
: Add language specification to the loss output code block.Specify 'text' as the language for the fenced code block to improve documentation consistency.
🧰 Tools
🪛 Markdownlint (0.37.0)
84-84: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
100-100
: Fix grammar in conditional statement.Change "If one do freeze" to "If one does freeze" to correct the subject-verb agreement.
-If one do freeze and test a Hessian model using the commands: +If one does freeze and test a Hessian model using the commands:🧰 Tools
🪛 LanguageTool
[grammar] ~100-~100: Possible subject-verb agreement error.
Context: ...output Hessian predictions. ::: If one do freeze and test a Hessian model using t...(ONE_AGREEMENT)
119-122
: Add language specification to output format code blocks.Specify 'text' as the language for these fenced code blocks to maintain consistency throughout the documentation.
Also applies to: 141-143, 147-158
🧰 Tools
🪛 Markdownlint (0.37.0)
119-119: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
96-98
: Enhance warning about model freezing limitations.Consider expanding the warning to explain:
- Why freezing is not supported for Hessian models
- What functionality is lost when forcing freezing
- Alternative approaches if freezing is required
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
doc/model/train-energy-hessian.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
doc/model/train-energy-hessian.md
[grammar] ~100-~100: Possible subject-verb agreement error.
Context: ...output Hessian predictions. ::: If one do freeze and test a Hessian model using t...
(ONE_AGREEMENT)
🪛 Markdownlint (0.37.0)
doc/model/train-energy-hessian.md
35-35: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
84-84: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
119-119: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
141-141: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
doc/model/train-energy-hessian.md (3)
1-5
: Add standard frontmatter and TOC entry.
Following the project's documentation standards, please add the frontmatter section and include this document in the table of contents.
14-25
: Update loss type to match implementation.
The loss type should be "ener_hess" instead of "ener" when training with Hessian calculations.
35-42
: Specify language for fenced code block and add example data.
-
Add language specification to the code block.
-
Consider adding a small example snippet showing the actual content format of
hessian.npy
to help users understand the data structure better.
🧰 Tools
🪛 Markdownlint (0.37.0)
35-35: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Signed-off-by: Anchor Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/hessian/single_task/input.json (1)
79-90
:⚠️ Potential issueAdd Hessian-specific loss configuration.
The current loss configuration uses type "ener" without any Hessian-related terms, which is inconsistent with the PR's objective of training with energy Hessian.
Apply this diff to add Hessian-specific parameters:
"loss": { - "type": "ener", + "type": "ener_hess", "start_pref_e": 0.02, "limit_pref_e": 1, "start_pref_f": 1000, "limit_pref_f": 1, "start_pref_v": 0, "limit_pref_v": 0, "start_pref_h": 10, "limit_pref_h": 1, + "start_pref_hess": 0.1, + "limit_pref_hess": 1, "_comment": " that's all" }
🧹 Nitpick comments (2)
examples/hessian/single_task/input.json (2)
97-97
: Consider increasing the batch size.A batch size of 1 might lead to:
- Unstable training due to high variance in gradients
- Inefficient GPU utilization
- Slower convergence
Consider increasing the batch size to improve training stability and efficiency.
Also applies to: 104-104
94-96
: Use absolute paths or environment variables for data paths.The relative paths to training and validation data might cause issues if the working directory changes. Consider:
- Using absolute paths
- Using environment variables
- Adding a base path configuration option
Also applies to: 101-103
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/hessian/single_task/input.json
(1 hunks)
🔇 Additional comments (3)
examples/hessian/single_task/input.json (3)
11-58
: Update DPA2 configuration as per latest example.
As noted in previous comments, the DPA2 configuration appears to be outdated. Please update it according to example/water/dpa2
, preferably using the small
configuration for faster testing.
Let's verify the latest DPA2 configuration:
#!/bin/bash
# Description: Compare with the latest DPA2 configuration
# Check if the water example exists and compare configurations
fd -t f "input.json" -p "example/water/dpa2"
72-78
: LGTM: Learning rate configuration is well-structured.
The exponential decay setup with appropriate start and stop learning rates should work well for training.
92-92
: Verify Hessian data in statistics file.
Please ensure that the HDF5 statistics file contains the necessary Hessian-related statistics for proper scaling during training.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests