delete test and add common-errors
HuangJiameng committed Aug 9, 2022
1 parent 3fed2bb commit ad564d9
Showing 2 changed files with 31 additions and 2 deletions.
31 changes: 31 additions & 0 deletions doc/troubleshooting/common-errors.md
@@ -0,0 +1,31 @@
# Common Errors
(Errors are sorted alphabetically)

## FileNotFoundError: [Errno 2] No such file or directory: '.../01.model_devi/graph.xxx.pb'
If this error occurs, check your initial data. The model (`graph.xxx.pb`) is not generated if the initial data is incorrect.

## json.decoder.JSONDecodeError
Your `.json` file is not valid JSON. The usual cause is a syntax mistake, such as a missing comma.
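
To locate the mistake, one option is to load the file with Python's standard `json` module, which reports the line and column of the first syntax error. The following is a minimal sketch; the helper name `check_json` and the file name `param.json` are only illustrations, not part of dpgen.

```python
import json


def check_json(path):
    """Return None if the file parses as JSON, else a message locating the first error."""
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as err:
        # lineno and colno are the 1-based position of the first syntax error
        return f"{path}: line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

For example, `print(check_json("param.json"))` prints `None` for a valid file and the position of the first syntax error otherwise. For a missing comma, the reported position is usually the start of the entry that follows the missing comma.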

## RuntimeError: job:xxxxxxx failed 3 times
```
RuntimeError: job:xxxxxxx failed 3 times
......
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==xxxxxx
Debug information: submission_hash==xxxxxx
Please check the dirs and scripts in remote_root. The job information mentioned above may help.
```
This error means that your job has failed 3 times, but it does not show the reason. Check the files on the remote server to find out why.

To find the reason, check the logs under the remote root. For example, `train.log`, which is generated by DeePMD-kit, gives more details.
If that doesn't help, manually run the `.sub` script, whose path is shown in `Debug information: remote_root==xxxxxx`.
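
When the remote root contains many task directories, it can help to print the last lines of every log at once. The following is a minimal sketch, assuming it is run on the remote server (or on a local copy of the remote root); the helper name `tail_logs` is hypothetical and the default file name `train.log` is taken from the example above.

```python
from pathlib import Path


def tail_logs(root, name="train.log", n_lines=20):
    """Map each file called `name` under `root` to its last `n_lines` lines."""
    tails = {}
    for log in sorted(Path(root).rglob(name)):
        # errors="replace" keeps going even if the log contains odd bytes
        lines = log.read_text(errors="replace").splitlines()
        tails[str(log)] = lines[-n_lines:]
    return tails
```

Call it with the path printed after `remote_root==` and print each entry of the returned dictionary; the end of a log is usually where the failure is reported.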

Some common reasons are as follows:
1. Two or more jobs are submitted manually or automatically at the same time, and their hash values collide. This bug will be fixed in dpdispatcher.
2. You may have something wrong in your input files, which causes the process to fail.

## RuntimeError: find too many unsuccessfully terminated jobs.
The ratio of failed jobs is larger than `ratio_failure`. You can set a higher value for `ratio_failure` or check whether there is something wrong with your input files.
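
The condition can be illustrated with a few lines of Python. This is only a sketch of the comparison described above, not the actual dpgen implementation; the function name `too_many_failures` and the numbers in the usage note are made up for the example.

```python
def too_many_failures(n_failed, n_total, ratio_failure):
    """Return True when the fraction of failed jobs exceeds ratio_failure."""
    if n_total == 0:
        # No jobs were run, so there is nothing to compare
        return False
    return n_failed / n_total > ratio_failure
```

With 3 failed jobs out of 40, the failure ratio is 0.075, so `too_many_failures(3, 40, 0.05)` is `True`, while `too_many_failures(3, 40, 0.10)` is `False`.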
2 changes: 0 additions & 2 deletions doc/troubleshooting/test.md

This file was deleted.
