# Common Errors
(Errors are sorted alphabetically.)

## FileNotFoundError: [Errno 2] No such file or directory: '.../01.model_devi/graph.xxx.pb'
If this error occurs, check your initial data first. If the initial data is incorrect, no model is generated, so the `graph.xxx.pb` file that `01.model_devi` looks for does not exist.

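Before rerunning, a small script can confirm that each initial data system contains the expected files. The sketch below assumes your initial data is in DeePMD-kit's `npy` format (a `type.raw` file plus `set.*` directories holding `box.npy`, `coord.npy`, `energy.npy` and `force.npy`); the helper name `check_init_data` is hypothetical, and it only checks that the files exist, not that their contents are correct.

```python
import sys
from pathlib import Path

# Assumption: initial data is stored in DeePMD-kit's npy format.
# Adjust these lists if your data uses a different layout or loss.
REQUIRED_TOP = ["type.raw"]
REQUIRED_IN_SET = ["box.npy", "coord.npy", "energy.npy", "force.npy"]


def check_init_data(system_dir):
    """Hypothetical helper: list missing files in one data system (existence only)."""
    system = Path(system_dir)
    if not system.is_dir():
        return [f"{system}: directory does not exist"]
    problems = []
    for name in REQUIRED_TOP:
        if not (system / name).is_file():
            problems.append(f"{system}: missing {name}")
    sets = sorted(p for p in system.glob("set.*") if p.is_dir())
    if not sets:
        problems.append(f"{system}: no set.* directories")
    for set_dir in sets:
        for name in REQUIRED_IN_SET:
            if not (set_dir / name).is_file():
                problems.append(f"{set_dir}: missing {name}")
    return problems


if __name__ == "__main__":
    # Usage: python check_init_data.py SYSTEM_DIR [SYSTEM_DIR ...]
    for path in sys.argv[1:]:
        for line in check_init_data(path) or [f"{path}: OK"]:
            print(line)
```

Pass the paths of your initial data systems as arguments; any line other than `OK` points at a missing directory or file.
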
## json.decoder.JSONDecodeError
Your `.json` file is not valid JSON, usually because of a syntax mistake such as a missing comma.

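Python's standard `json` module reports the line and column where parsing failed, which usually lands close to the mistake. A minimal sketch for locating it (the function name `find_json_error` is ours, not part of DP-GEN):

```python
import json
import sys


def find_json_error(path):
    """Return None if the file is valid JSON, else a message locating the first error."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno are 1-based and mark where the parser gave up.
        lines = text.splitlines()
        bad_line = lines[err.lineno - 1] if err.lineno <= len(lines) else ""
        return f"{path}:{err.lineno}:{err.colno}: {err.msg}\n    {bad_line}"
    return None


if __name__ == "__main__":
    # Usage: python find_json_error.py param.json machine.json
    for path in sys.argv[1:]:
        print(find_json_error(path) or f"{path}: valid JSON")
```

A missing comma is typically reported at the start of the item that follows it, so check the line above the reported one as well. Standard JSON also rejects trailing commas and comments, which raise the same error.
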
## RuntimeError: job:xxxxxxx failed 3 times
```
RuntimeError: job:xxxxxxx failed 3 times
......
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==xxxxxx
Debug information: submission_hash==xxxxxx
Please check the dirs and scripts in remote_root. The job information mentioned above may help.
```
This message shows that your job has failed 3 times, but not why. To find the cause, check the files on the remote server.

Start with the logs in the remote root. For example, `train.log`, which is generated by DeePMD-kit, can give more details. If the logs don't help, run the `.sub` script manually; its location is given by `Debug information: remote_root==xxxxxx`.

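If you can log in to the machine that holds the remote root, a short script can collect the `.sub` scripts and the last lines of each log in one pass. This is a sketch assuming it runs on that machine and that the logs end in `.log` (as `train.log` does); `summarize_remote_root` is a hypothetical name.

```python
import sys
from pathlib import Path


def summarize_remote_root(remote_root, n_lines=20):
    """Collect *.sub scripts and the last n_lines (> 0) of every *.log file."""
    root = Path(remote_root)
    scripts = sorted(root.rglob("*.sub"))
    tails = {}
    for log in sorted(root.rglob("*.log")):
        # Assumption: logs are text; undecodable bytes are replaced, not fatal.
        lines = log.read_text(errors="replace").splitlines()
        tails[log] = lines[-n_lines:]
    return scripts, tails


if __name__ == "__main__":
    # Usage: python summarize_remote_root.py REMOTE_ROOT
    scripts, tails = summarize_remote_root(sys.argv[1])
    for script in scripts:
        print(f"[sub] {script}")
    for log, lines in tails.items():
        print(f"[log] {log}")
        for line in lines:
            print("    " + line)
```

Once you have found the failing task, run its `.sub` script by hand (for example with `bash`, or by submitting it through your scheduler) to see the error directly.
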
Some common causes are:
1. Two or more jobs are submitted, manually or automatically, at the same time, and their hash values collide. This bug will be fixed in dpdispatcher.
2. Something is wrong in your input files, which causes the process to fail.

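One way two simultaneous submissions can end up with the same hash is if the hash is derived only from the submission's content. The sketch below is a hypothetical model for illustration, not dpdispatcher's actual hashing code (`submission_hash`, `remote_dir` and the directory layout are all assumptions): identical submissions produce the same hash and therefore the same remote directory, so each job can read or overwrite the other's files.

```python
import hashlib
import json


def submission_hash(submission):
    """Hypothetical content-only hash; NOT dpdispatcher's real implementation."""
    # sort_keys=True makes the serialization (and hence the hash) deterministic.
    payload = json.dumps(submission, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def remote_dir(remote_root, submission):
    # Hypothetical layout: one remote directory per submission hash.
    return f"{remote_root}/{submission_hash(submission)}"


job_a = {"command": "dp train input.json", "tasks": ["task.000", "task.001"]}
job_b = {"tasks": ["task.000", "task.001"], "command": "dp train input.json"}
job_c = {"command": "dp train input.json", "tasks": ["task.002"]}

# Same content -> same hash -> same directory: the two jobs would clash.
print(remote_dir("/remote/work", job_a) == remote_dir("/remote/work", job_b))  # True
# Different content -> different directory.
print(remote_dir("/remote/work", job_a) == remote_dir("/remote/work", job_c))  # False
```
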
## RuntimeError: find too many unsuccessfully terminated jobs.
The ratio of failed jobs is larger than `ratio_failure`. You can either set a higher value for `ratio_failure` or check whether something is wrong with your input files.
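
The check described above amounts to comparing the fraction of failed jobs with `ratio_failure`. The sketch below is an illustrative reconstruction, not DP-GEN's source: the function `check_failure_ratio` is hypothetical, and whether the real comparison uses `>` or `>=` is an assumption.

```python
def check_failure_ratio(n_failed, n_total, ratio_failure):
    """Illustrative sketch: raise if failed/total exceeds ratio_failure."""
    if n_total == 0:
        return 0.0  # nothing ran, so nothing failed
    ratio = n_failed / n_total
    if ratio > ratio_failure:
        raise RuntimeError(
            "find too many unsuccessfully terminated jobs "
            f"({n_failed}/{n_total} = {ratio:.2f} > ratio_failure = {ratio_failure})"
        )
    return ratio


print(check_failure_ratio(1, 20, 0.10))  # 0.05: within the tolerance
print(check_failure_ratio(3, 20, 0.20))  # 0.15: passes only because ratio_failure was raised
try:
    check_failure_ratio(3, 20, 0.10)     # 0.15 > 0.10
except RuntimeError as err:
    print(err)
```

Note that raising `ratio_failure` does not fix the failing jobs; it only lets the run tolerate more of them.
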