feat: Support stripped type embedding in DPA1 of PT/DP #3712

iProzd · 2024-04-25T13:59:14Z

This PR supports stripped type embedding in DPA1 of PT/DP:

- Remove the stripped_type_embedding parameter from all classes; use tebd_input_mode == "strip" instead.
- Add a stripped type embedding implementation for DPA1 in PT/DP.
- Add serialization and deserialization for stripped type embedding.
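To make the difference between the two modes concrete, here is a conceptual sketch in plain Python. It is not deepmd-kit code. It assumes the type_one_side case, replaces each embedding network with a single tanh layer, and assumes strip mode recombines the radial and type parts as G = G_s * G_t + G_s. All function names are hypothetical.

```python
import math
import random

# Hypothetical sketch: one tanh layer stands in for each embedding network,
# and the strip-mode combination G = G_s * G_t + G_s is an assumption.

def dense_tanh(x, w, b):
    """Apply one tanh layer; w is a len(x) x len(b) matrix given as a list of rows."""
    return [math.tanh(sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j])
            for j in range(len(b))]

def rand_layer(rng, n_in, n_out):
    """Random weights and biases for a dense_tanh layer."""
    w = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return w, b

def g_concat(s, nei_type, type_embedding, layer):
    # concat: s(r) and the neighbor type embedding share a single network.
    return dense_tanh([s] + type_embedding[nei_type], *layer)

def g_strip(s, nei_type, g_t_table, layer_s):
    # strip: the radial network sees only s(r); the type factor is looked up
    # from a per-type table and the two parts are recombined.
    g_s = dense_tanh([s], *layer_s)
    g_t = g_t_table[nei_type]
    return [gs * gt + gs for gs, gt in zip(g_s, g_t)]

rng = random.Random(0)
ntypes, tebd_dim, ng = 2, 4, 8
type_embedding = [[rng.uniform(-1, 1) for _ in range(tebd_dim)] for _ in range(ntypes)]
layer_concat = rand_layer(rng, 1 + tebd_dim, ng)
layer_s = rand_layer(rng, 1, ng)
layer_t = rand_layer(rng, tebd_dim, ng)

# The type network depends only on the neighbor type, so in strip mode it is
# evaluated once per type instead of once per neighbor.
g_t_table = [dense_tanh(type_embedding[t], *layer_t) for t in range(ntypes)]

neighbors = [(0.3, 0), (0.7, 1), (0.9, 1)]  # made-up (s(r), neighbor type) pairs
G_concat = [g_concat(s, t, type_embedding, layer_concat) for s, t in neighbors]
G_strip = [g_strip(s, t, g_t_table, layer_s) for s, t in neighbors]
```

Splitting out the type factor means it only has to be computed once per type, which is plausibly why model compression is tied to strip mode in this PR.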

Note:

- The old TF implementation behaves inconsistently when type_one_side == True and tebd_input_mode == "strip": it always uses the two-side stripped type embedding input, which is also inconsistent with DescrptSeAEbdV2 in TF. Training still works, and only serialization currently raises NotImplementedError. This may need support from @nahso.
- In the old TF implementation, init_variables did not initialize the idt weights from the graph for two_side_embeeding_net_variables (now fixed). I'm surprised that no unit test failed before (perhaps all tests use resnet_dt == False).
- The TF implementation of DescrptSeAtten does not support serialization when tebd_input_mode == "strip", because the shape of type_embedding is decided at runtime and cannot be determined after init. The consistent version, DescrptDPA1Compat, does support this configuration.
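The type_one_side inconsistency in the first note is easier to see by comparing the input tables each variant would feed to the strip-mode type network. A sketch, assuming neighbor-then-center concatenation order; the order and all names here are assumptions for illustration, not deepmd-kit code.

```python
def one_side_type_input(type_embedding):
    # type_one_side: the type network sees only the neighbor's embedding,
    # so there is one input row per neighbor type.
    return [list(row) for row in type_embedding]

def two_side_type_input(type_embedding):
    # two-side: the type network sees the neighbor and center embeddings
    # concatenated, so there is one input row per ordered (center, neighbor) pair.
    ntypes = len(type_embedding)
    return [type_embedding[nei] + type_embedding[center]
            for center in range(ntypes) for nei in range(ntypes)]

def pair_row(center, nei, ntypes):
    # Row of two_side_type_input(...) holding the (center, neighbor) pair.
    return center * ntypes + nei

type_embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # ntypes = 3, tebd_dim = 2
one_side = one_side_type_input(type_embedding)
two_side = two_side_type_input(type_embedding)
```

The two layouts differ in both row count (ntypes vs ntypes²) and width (tebd_dim vs 2·tebd_dim), so weights built for one cannot be loaded into a network expecting the other.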

Summary by CodeRabbit

New Features
- Enhanced model flexibility with new type embedding input modes: concat and strip.
Bug Fixes
- Improved model compression logic alignment with new type embedding modes for more efficient operations.
Documentation
- Updated documentation to explain the impact of new type embedding input modes on model descriptors.
Tests
- Adjusted test cases to reflect changes in type embedding input modes for robust testing.

coderabbitai · 2024-04-25T13:59:38Z

Walkthrough

The recent changes focus on updating the tebd_input_mode attribute to support new modes concat and strip across various files. This enhancement impacts how type embeddings are processed and integrated within the models, offering greater flexibility and potentially improving model performance. The logic for model compression has also been revised to depend on this new attribute, simplifying the configuration and utilization of type embeddings.

Changes

File Path	Change Summary
`deepmd/dpmodel/descriptor/dpa1.py`	Updated `tebd_input_mode` to support new modes `concat` and `strip`, adjusting input dimensions and embedding networks. Added new attributes and methods related to handling `tebd_input_mode`.
`deepmd/pt/model/descriptor/se_atten.py` `deepmd/utils/argcheck.py`	Introduced `tebd_input_mode` with modes `concat` and `strip`, replacing `stripped_type_embedding`. Updated model compression logic to depend on `tebd_input_mode`. Added detailed documentation for `tebd_input_mode`.
`deepmd/tf/descriptor/se_a.py`	Replaced `stripped_type_embedding` with `tebd_input_mode` set to "concat" in the `__init__` method. Adjusted logic to handle the new `tebd_input_mode` parameter.
`deepmd/utils/argcheck.py`	Changed logic related to model compression by updating the condition for enabling compression based on `tebd_input_mode` instead of `attn_layer`. Updated documentation to reflect changes.
`doc/model/train-se-atten.md`	Modified the parameter `stripped_type_embedding` to `tebd_input_mode` with a value change from `true` to `"strip"` in the attention-based descriptor `"se_atten_v2"`.
`source/tests/tf/test_model_compression_se_atten.py`	Replaced `stripped_type_embedding` with `tebd_input_mode` set to `"strip"` in the `jdata["model"]["descriptor"]` dictionary for both `_init_models()` and `_init_models_exclude_types()` functions.

Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3a98a65 and b32c4f3.

Files selected for processing (4)

deepmd/dpmodel/descriptor/dpa1.py (9 hunks)
deepmd/pt/model/descriptor/dpa1.py (8 hunks)
deepmd/tf/descriptor/se_atten.py (19 hunks)
deepmd/utils/argcheck.py (4 hunks)

Files skipped from review as they are similar to previous changes (2)

deepmd/pt/model/descriptor/dpa1.py
deepmd/utils/argcheck.py

Additional comments not posted (7)

deepmd/dpmodel/descriptor/dpa1.py (2)

138-140: The documentation for tebd_input_mode is clear and aligns with the PR objectives to handle type embedding configurations effectively.

432-442: The method cal_g_strip is well-implemented and aligns with the new functionality for stripped type embedding. Ensure that it is properly integrated and tested within the system.
deepmd/tf/descriptor/se_atten.py (5)
147-151: Ensure default handling for tebd_input_mode is appropriate.

1833-1837: Handle deserialization limitations for tebd_input_mode='strip'.

1867-1874: Clarify serialization support for tebd_input_mode='strip'.

261-261: Verify the initialization of tebd_input_mode.

Verification successful

The review comment requested verification of the initialization of tebd_input_mode in the codebase. The script output shows numerous instances where tebd_input_mode is initialized or used across Python source and test files (the search was restricted to Python with --type py). The usage includes setting the variable directly, checking its value, and using it in conditional statements.

Observations:

Multiple Initializations: The variable tebd_input_mode is initialized in various ways across the codebase, often set to either "strip" or "concat".

Consistent Usage: The usage of tebd_input_mode seems consistent in terms of the values it is set to ("strip" or "concat").

Documentation and Checks: There are comments and checks related to tebd_input_mode that suggest its critical role in the functionality of the models, particularly in how different modes ("strip" or "concat") affect the model's behavior.

Conclusion:

The initialization and usage of tebd_input_mode are consistent across the codebase, with it being set and checked in a manner that aligns with its intended functionality. The variable is used extensively in both setting parameters and in conditional logic that alters the behavior of the code depending on its value.

Given the extensive and consistent references to tebd_input_mode throughout the codebase, as shown in the script output, the initialization of this variable as seen in the snippet from deepmd/tf/descriptor/se_atten.py is verified to be in line with its usage elsewhere in the codebase.

Final Response:

The initialization and usage of tebd_input_mode are consistent and correctly implemented across the codebase as per the findings from the script output.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the initialization of `tebd_input_mode` across the codebase.

# Test: Search for the initialization of `tebd_input_mode`. Expect: Consistent initialization across the codebase.
rg --type py 'tebd_input_mode'
Length of output: 10998
1759-1804: Ensure robust deserialization for stripped type embedding.

deepmd/tf/utils/graph.py

coderabbitai

Actionable comments posted: 10

deepmd/tf/utils/graph.py

deepmd/utils/argcheck.py

deepmd/pt/model/descriptor/se_atten.py

deepmd/tf/descriptor/se_a.py

deepmd/tf/descriptor/se_atten.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

codecov · 2024-04-25T14:33:24Z

Codecov Report

Attention: Patch coverage is 94.28571% with 8 lines in your changes missing coverage. Please review.

Project coverage is 82.23%. Comparing base (2be5f0f) to head (b32c4f3).

Files	Patch %	Lines
deepmd/tf/descriptor/se_atten.py	89.65%	6 Missing ⚠️
deepmd/dpmodel/descriptor/dpa1.py	97.22%	1 Missing ⚠️
deepmd/pt/model/descriptor/dpa1.py	91.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #3712      +/-   ##
==========================================
+ Coverage   82.19%   82.23%   +0.03%     
==========================================
  Files         513      513              
  Lines       47642    47745     +103     
  Branches     2979     2979              
==========================================
+ Hits        39159    39261     +102     
- Misses       7572     7573       +1     
  Partials      911      911
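The figures above can be cross-checked with a few lines of integer arithmetic. In this sketch, truncating to the displayed precision reproduces every figure, including the +0.03% delta: the delta between the exact ratios is about 0.0363 percentage points (rounding would give 0.04), whereas subtracting the displayed 82.19 from 82.23 would also suggest 0.04. The 140-line patch size is back-solved from "8 lines missing at 94.28571%" and is not stated in the report.

```python
from fractions import Fraction

def truncated_percent(covered, total, digits=2):
    """covered/total as a percentage string, truncated to `digits` decimals."""
    scaled = covered * 100 * 10**digits // total
    return f"{scaled // 10**digits}.{scaled % 10**digits:0{digits}d}"

# Totals copied from the coverage diff above.
base = {"hits": 39159, "misses": 7572, "partials": 911, "lines": 47642}
head = {"hits": 39261, "misses": 7573, "partials": 911, "lines": 47745}
totals_consistent = all(d["hits"] + d["misses"] + d["partials"] == d["lines"]
                        for d in (base, head))

base_pct = truncated_percent(base["hits"], base["lines"])
head_pct = truncated_percent(head["hits"], head["lines"])

# The delta is taken between exact ratios, not between the displayed values.
delta = Fraction(head["hits"], head["lines"]) - Fraction(base["hits"], base["lines"])
delta_pct = truncated_percent(delta.numerator, delta.denominator)

# Patch: 6 + 1 + 1 = 8 missing lines; 140 total patch lines is inferred, not reported.
missing = 6 + 1 + 1
patch_pct = truncated_percent(140 - missing, 140, digits=5)
```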

coderabbitai

Actionable comments posted: 6

deepmd/tf/descriptor/se_atten.py

deepmd/utils/argcheck.py

examples/water/se_atten_dpa1_compat/input.json

deepmd/dpmodel/descriptor/dpa1.py

Signed-off-by: Duo <[email protected]>

coderabbitai

Actionable comments posted: 7

Out of diff range and nitpick comments (4)

doc/model/train-se-atten.md (3)

Line range hint 91-101: Replace hard tabs with spaces for consistency in JSON formatting.

-	"descriptor" :{
-          "type":		"se_atten",
-          "rcut_smth":	0.50,
-          "rcut":		6.00,
-          "sel":		120,
-          "neuron":		[25, 50, 100],
-          "axis_neuron":	16,
-          "resnet_dt":	false,
-          "attn":	128,
-          "attn_layer":	2,
-          "attn_mask":	false,
-          "attn_dotr":	true,
-          "seed":	1
+  "descriptor" :{
+    "type": "se_atten",
+    "rcut_smth": 0.50,
+    "rcut": 6.00,
+    "sel": 120,
+    "neuron": [25, 50, 100],
+    "axis_neuron": 16,
+    "resnet_dt": false,
+    "attn": 128,
+    "attn_layer": 2,
+    "attn_mask": false,
+    "attn_dotr": true,
+    "seed": 1

Line range hint 124-136: Replace hard tabs with spaces for consistency in JSON formatting.

-	"descriptor" :{
-          "type":		"dpa1",
-          "rcut_smth":	0.50,
-          "rcut":		6.00,
-          "sel":		120,
-          "neuron":		[25, 50, 100],
-          "tebd_dim": 8,
-          "axis_neuron":	16,
-          "attn":	128,
-          "attn_layer":	2,
-          "attn_mask": false,
-          "attn_dotr": true,
+  "descriptor" :{
+    "type": "dpa1",
+    "rcut_smth": 0.50,
+    "rcut": 6.00,
+    "sel": 120,
+    "neuron": [25, 50, 100],
+    "tebd_dim": 8,
+    "axis_neuron": 16,
+    "attn": 128,
+    "attn_layer": 2,
+    "attn_mask": false,
+    "attn_dotr": true,

Line range hint 224-224: Correct the spelling of "placeholder".

- Atom type indexes (place holder)
+ Atom type indexes (placeholder)

deepmd/pt/model/descriptor/dpa1.py (1)

Line range hint 371-416: Ensure serialization handles all configurations.

The serialization method currently handles the 'strip' mode specifically. Ensure that all configurations, especially new or modified ones, are correctly serialized and deserialized to maintain state consistency across sessions.
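One way to act on this is a round-trip check over every mode. A minimal sketch with a hypothetical ToyDescriptor whose "networks" are plain lists; only the key name embeddings_strip follows another review comment on this PR, and everything else is illustrative.

```python
class ToyDescriptor:
    """Hypothetical descriptor whose "networks" are plain lists of weights."""

    def __init__(self, tebd_input_mode, embeddings, embeddings_strip=None):
        if tebd_input_mode not in ("concat", "strip"):
            raise ValueError(f"unknown tebd_input_mode: {tebd_input_mode!r}")
        # The separate type network exists exactly when the mode is "strip".
        if (tebd_input_mode == "strip") != (embeddings_strip is not None):
            raise ValueError("embeddings_strip is required in strip mode and not allowed otherwise")
        self.tebd_input_mode = tebd_input_mode
        self.embeddings = list(embeddings)
        self.embeddings_strip = None if embeddings_strip is None else list(embeddings_strip)

    def serialize(self):
        data = {"tebd_input_mode": self.tebd_input_mode, "embeddings": list(self.embeddings)}
        if self.tebd_input_mode == "strip":
            data["embeddings_strip"] = list(self.embeddings_strip)
        return data

    @classmethod
    def deserialize(cls, data):
        # The constructor rejects a "strip" payload that lacks embeddings_strip.
        return cls(data["tebd_input_mode"], data["embeddings"], data.get("embeddings_strip"))

concat_desc = ToyDescriptor("concat", [0.1, 0.2])
strip_desc = ToyDescriptor("strip", [0.1], [0.5, 0.6])
```

Asserting that deserialize(serialize(d)) serializes back to the same dict for each mode catches a missing or extra key in either direction.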

deepmd/tf/descriptor/se_a.py

deepmd/dpmodel/descriptor/dpa1.py

doc/model/train-se-atten.md

deepmd/pt/model/descriptor/dpa1.py

coderabbitai

Actionable comments posted: 3

Out of diff range and nitpick comments (3)

deepmd/dpmodel/descriptor/dpa1.py (1)

297-301: Clarify the logic for setting embd_input_dim based on tebd_input_mode. The current implementation might be confusing.

Consider adding comments or refactoring for clarity.
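A small helper that names each case can make the branch easier to follow. The sketch assumes that concat mode feeds s(r) plus one (type_one_side) or two (two-side) type embeddings into a single network, while strip mode gives s(r) alone to the radial network and the embedding(s) to a separate type network. The function name and return shape are hypothetical.

```python
def embedding_input_dims(tebd_input_mode, tebd_dim, type_one_side):
    """Return (radial_net_in, type_net_in) input widths under the assumed layout.

    type_net_in is None when there is no separate type network.
    """
    n_tebd = 1 if type_one_side else 2  # neighbor only, or neighbor + center
    if tebd_input_mode == "concat":
        # s(r) and the type embedding(s) share one network.
        return 1 + n_tebd * tebd_dim, None
    if tebd_input_mode == "strip":
        # s(r) alone goes to the radial network; types go to their own network.
        return 1, n_tebd * tebd_dim
    raise ValueError(f"unknown tebd_input_mode: {tebd_input_mode!r}")
```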

deepmd/utils/argcheck.py (2)

Line range hint 427-509: The documentation for doc_attn_layer in the descrpt_se_atten_common_args function may be misleading. It states that model compression is enabled only when attn_layer is 0 and tebd_input_mode is 'strip'; verify that this matches the actual behavior, or clarify the wording.

507-509: The parameter stripped_type_embedding is marked as deprecated. It's good practice to provide a timeline or conditions for its removal to avoid maintaining outdated code. Consider adding a deprecation warning in the code if it's still being used.
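Both points can be illustrated with a sketch of a hypothetical argument-normalization helper. The key names follow this PR, but neither function exists in deepmd-kit, "concat" as the default mode is an assumption, and the compression condition is copied from the docstring under review, which the comment above flags as possibly incomplete.

```python
import warnings

def normalize_descriptor_args(args):
    """Map the deprecated stripped_type_embedding flag onto tebd_input_mode (hypothetical)."""
    args = dict(args)
    if "stripped_type_embedding" in args:
        legacy = args.pop("stripped_type_embedding")
        warnings.warn(
            "stripped_type_embedding is deprecated; use tebd_input_mode instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if legacy:
            # Refuse silently overriding an explicit, contradictory setting.
            if args.get("tebd_input_mode", "strip") != "strip":
                raise ValueError("stripped_type_embedding=True conflicts with tebd_input_mode")
            args["tebd_input_mode"] = "strip"
    args.setdefault("tebd_input_mode", "concat")  # assumed default
    return args

def compression_supported(args):
    # Condition as stated in the docstring under review.
    return args.get("attn_layer") == 0 and args.get("tebd_input_mode") == "strip"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    migrated = normalize_descriptor_args(
        {"type": "se_atten", "attn_layer": 0, "stripped_type_embedding": True}
    )
```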

deepmd/dpmodel/descriptor/dpa1.py

deepmd/utils/argcheck.py

coderabbitai

Actionable comments posted: 6

Out of diff range and nitpick comments (3)

deepmd/dpmodel/descriptor/dpa1.py (1)
[!TIP]
Codebase Verification

The review comment's concern about ensuring the serialization format's compatibility with existing systems, particularly with the handling of tebd_input_mode, is justified based on the findings from the executed script. The script output reveals several instances across the codebase where serialization is explicitly unsupported or restricted when tebd_input_mode is set to "strip". This indicates a potential compatibility issue that needs to be addressed to ensure robust serialization and deserialization processes.

Action Points:

The development team should review the restrictions and unsupported scenarios related to tebd_input_mode being "strip" to ensure broader compatibility or provide clear documentation on these limitations.

It may be beneficial to implement a more flexible or compatible serialization method for cases where tebd_input_mode is "strip".

Locations to review:

deepmd/tf/descriptor/se_a.py

deepmd/tf/descriptor/se_atten.py

deepmd/pt/model/descriptor/dpa1.py

deepmd/dpmodel/descriptor/dpa1.py

Analysis chain

Line range hint 583-628: The serialization method correctly handles the new tebd_input_mode and conditionally includes embeddings_strip. Ensure that the serialization format is compatible with existing systems and that deserialization is properly handled.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the serialization format compatibility.

# Test: Search for the usage of serialization related to `tebd_input_mode`. Expect: Compatibility with existing systems.
rg --type py 'serialize' -C 10 | rg 'tebd_input_mode'
Length of output: 2103
deepmd/tf/descriptor/se_atten.py (2)

Line range hint 193-202: Consider refactoring the constructor to simplify parameter handling.

The constructor of DescrptSeAtten is quite complex with many parameters and conditional logic. Simplifying this by breaking down into smaller methods or using a configuration object might improve readability and maintainability.

1945-1965: Review the extended serialization logic for DescrptDPA1Compat.

The method serialize in DescrptDPA1Compat extends the serialization logic specifically for the 'strip' mode. While this is functional, the method is quite complex and could benefit from further comments or refactoring to improve clarity.

deepmd/dpmodel/descriptor/dpa1.py

deepmd/tf/descriptor/se_atten.py

iProzd added 2 commits April 25, 2024 21:56

feat: Support stripped_type_embedding in PT/DP

7d82945

Update train-se-atten.md

a230198

github-actions bot added Python Docs Examples labels Apr 25, 2024

iProzd changed the title ~~feat: Support stripped_type_embedding in PT/DP~~ feat: Support stripped type embedding in PT/DP Apr 25, 2024

iProzd changed the title ~~feat: Support stripped type embedding in PT/DP~~ feat: Support stripped type embedding in DPA1 of PT/DP Apr 25, 2024

github-advanced-security bot found potential problems Apr 25, 2024

deepmd/tf/utils/graph.py (fixed)

coderabbitai bot reviewed Apr 25, 2024

iProzd requested review from njzjz, nahso and wanghan-iapcm April 25, 2024 14:08

iProzd and others added 8 commits April 25, 2024 22:13

Update graph.py

5157781

Update deepmd/utils/argcheck.py

f780d58

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Update deepmd/pt/model/descriptor/se_atten.py

3b3d25e

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Update deepmd/tf/descriptor/se_a.py

cf841f2

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Update deepmd/tf/descriptor/se_a.py

a9e24d9

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Update deepmd/tf/descriptor/se_atten.py

764cab7

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Update deepmd/tf/descriptor/se_atten.py

0b9cea1

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Duo <[email protected]>

Merge branch 'devel' into add_strip_dpa1

30a594a

Update docs

1e86b75

coderabbitai bot reviewed Apr 25, 2024

njzjz requested changes Apr 25, 2024

deepmd/utils/argcheck.py (outdated, resolved)

examples/water/se_atten_dpa1_compat/input.json (outdated, resolved)

wanghan-iapcm reviewed Apr 26, 2024

deepmd/dpmodel/descriptor/dpa1.py (outdated, resolved)

resolve conversations

f3056ee

nahso approved these changes Apr 26, 2024

wanghan-iapcm approved these changes Apr 27, 2024

Update argcheck.py

fd8a744

iProzd added 2 commits April 29, 2024 00:21

Merge branch 'devel' into add_strip_dpa1

92f42c4

Update argcheck.py

1da8936

njzjz linked an issue May 2, 2024 that may be closed by this pull request

Implement additional tebd_input_mode options in se_atten #3703

Closed

Merge branch 'devel' into add_strip_dpa1

9334c0a

Signed-off-by: Duo <[email protected]>

coderabbitai bot reviewed May 6, 2024

move stripped_type_embedding argcheck into class

3a98a65

coderabbitai bot reviewed May 6, 2024

deepmd/dpmodel/descriptor/dpa1.py (resolved)

deepmd/dpmodel/descriptor/dpa1.py (outdated, resolved)

deepmodeling deleted a comment from coderabbitai bot May 6, 2024

iProzd requested a review from njzjz May 6, 2024 06:33

njzjz reviewed May 6, 2024

deepmd/utils/argcheck.py (outdated, resolved)

update the docstr

b32c4f3

iProzd requested a review from njzjz May 6, 2024 06:46

coderabbitai bot reviewed May 6, 2024

njzjz approved these changes May 6, 2024

iProzd added this pull request to the merge queue May 6, 2024

Merged via the queue into deepmodeling:devel with commit d48f84a May 6, 2024
60 checks passed

iProzd deleted the add_strip_dpa1 branch May 6, 2024 08:42
