Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breaking: pt: remove data stat from model init #3233

Closed
wants to merge 2 commits into from

Conversation

iProzd
Copy link
Collaborator

@iProzd iProzd commented Feb 5, 2024

This PR:

  • clean up the data stat process from model init.
  • fix import error and skipped uts.

Please note that this code PR is just an initial cleanup and refinement of the data stat, and a more detailed design of the data stat will be completed in the next PR.

@github-actions github-actions bot added the Python label Feb 5, 2024
@amcadmus amcadmus changed the title Remove data stat from model init breaking: pt: remove data stat from model init Feb 5, 2024
Copy link

codecov bot commented Feb 5, 2024

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (f5bb131) 0.00% compared to head (8b943c6) 74.92%.

Files Patch % Lines
deepmd/pt/model/model/model.py 60.37% 21 Missing ⚠️
deepmd/pt/model/descriptor/descriptor.py 50.00% 2 Missing ⚠️
deepmd/pt/model/descriptor/dpa1.py 66.66% 1 Missing ⚠️
deepmd/pt/model/descriptor/dpa2.py 66.66% 1 Missing ⚠️
deepmd/pt/model/descriptor/se_a.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           devel    #3233       +/-   ##
==========================================
+ Coverage       0   74.92%   +74.92%     
==========================================
  Files          0      332      +332     
  Lines          0    28006    +28006     
  Branches       0      908      +908     
==========================================
+ Hits           0    20983    +20983     
- Misses         0     6486     +6486     
- Partials       0      537      +537     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 58 to 74
@classmethod
def get_stat_name(cls, config):
"""Get the name for the statistic file of the descriptor."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_stat_name(config)

@classmethod
def get_data_stat_key(cls, config):
"""Get the keys for the data statistic of the descriptor."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_stat_key(config)

@classmethod
def get_data_process_key(cls, config):
"""Get the keys for the data preprocess."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_process_key(config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a subclass doesn't implement these subclasses, the program will stuck!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding the error message for this case:

Suggested change
@classmethod
def get_stat_name(cls, config):
"""Get the name for the statistic file of the descriptor."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_stat_name(config)
@classmethod
def get_data_stat_key(cls, config):
"""Get the keys for the data statistic of the descriptor."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_stat_key(config)
@classmethod
def get_data_process_key(cls, config):
"""Get the keys for the data preprocess."""
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_process_key(config)
@classmethod
def get_stat_name(cls, config):
"""Get the name for the statistic file of the descriptor."""
if cls is not Descriptor:
raise NotImplementedError("get_stat_name is not implemented!")
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_stat_name(config)
@classmethod
def get_data_stat_key(cls, config):
"""Get the keys for the data statistic of the descriptor."""
if cls is not Descriptor:
raise NotImplementedError("get_data_stat_key is not implemented!")
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_stat_key(config)
@classmethod
def get_data_process_key(cls, config):
"""Get the keys for the data preprocess."""
if cls is not Descriptor:
raise NotImplementedError("get_data_process_key is not implemented!")
descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_data_process_key(config)

Comment on lines +37 to +42
"EnvMat",
"NativeLayer",
"NativeNet",
"EmbeddingNet",
"FittingNet",
"DescrptSeA",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason of providing all the classes here?

descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_stat_name(config)

@classmethod
def get_data_stat_key(cls, config):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a good idea to pass a dict at interface. clearly write what does the method need.

@classmethod
def get_data_process_key(cls, config):
"""Get the keys for the data preprocess."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one would not understand the method from such a doc str.

descrpt_type = config["type"]
return Descriptor.__plugins.plugins[descrpt_type].get_stat_name(config)

@classmethod
def get_data_stat_key(cls, config):
"""Get the keys for the data statistic of the descriptor."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one would not understand the method from such a doc str.

if sampled is not None: # compute stat
for sys in sampled:
dict_to_device(sys)
sumr, suma, sumn, sumr2, suma2 = self.descriptor.compute_input_stats(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not all the models have the data member self.descriptor, e.g. the pair_table atomic model.
we should mv the method out of the base model.

The same is for fitting. but not that the atomic_model may need the output stat.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have a lot of changes on the file. please tell what have you done

iProzd added a commit to iProzd/deepmd-kit that referenced this pull request Feb 7, 2024
Restore deepmodeling#3233 with resolved conflicts and conversations.
@iProzd iProzd closed this Feb 7, 2024
@iProzd
Copy link
Collaborator Author

iProzd commented Feb 7, 2024

See #3245 for further information.

wanghan-iapcm pushed a commit that referenced this pull request Feb 10, 2024
Restore #3233 with resolved conflicts and conversations.

This PR clean up the data stat process from model init.

Please note that this code PR is just an initial cleanup and refinement
of the data stat, and a more detailed design of the data stat will be
completed in the next PR:

- independent data stat from dataloader
- data stat support for hybrid descriptors

---------

Signed-off-by: Duo <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
njzjz pushed a commit to njzjz/deepmd-kit that referenced this pull request Feb 10, 2024
Restore deepmodeling#3233 with resolved conflicts and conversations.

This PR clean up the data stat process from model init.

Please note that this code PR is just an initial cleanup and refinement
of the data stat, and a more detailed design of the data stat will be
completed in the next PR:

- independent data stat from dataloader
- data stat support for hybrid descriptors

---------

Signed-off-by: Duo <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@iProzd iProzd deleted the data_stat_reformat branch April 24, 2024 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants