Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#Minor Fixes final checkpoint save for wsj0-mix DeepClustering example. #398

Merged
merged 2 commits into from
Feb 6, 2021

Conversation

ilyakava
Copy link
Contributor

@ilyakava ilyakava commented Dec 28, 2020

Makes wsj0-mix/DeepClustering final save like wham/DPTNet.

PyTorch Lightning's pytorch_lightning.callbacks.ModelCheckpoint assumes dirpath + filename not a dirpath as its first argument, but line 83 assumes a checkpoints folder exists.

So when this example runs this error occurs:

Epoch 132: 100%|█████████▉| 666/667 [04:11<00:00,  2.65it/s, Epoch 132, step 70888: val_loss was not in top 5
Epoch 132: 100%|██████████| 667/667 [04:11<00:00,  2.65it/s, loss=2.07e+03, v_num=0]
Traceback (most recent call last):
  File "train.py", line 202, in <module>
    main(arg_dic)
  File "train.py", line 83, in main
    torch.save(system.model.state_dict(), os.path.join(exp_dir, "checkpoints/final.pth"))
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 369, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'exp/train_chimera_2sep_8kmin_be1bb623/checkpoints/final.pth'

since the directory structure looks like:

# ls exp/train_chimera_2sep_8kmin_be1bb623
best_k_models.json   checkpoints-v1.ckpt  checkpoints-v3.ckpt  conf.yml        run_uuid.txt
checkpoints-v0.ckpt  checkpoints-v2.ckpt  checkpoints.ckpt     lightning_logs

I'm using pytorch_lightning version '1.1.2' with pytorch 1.8.0a0+1606899.

@ilyakava ilyakava force-pushed the wsj0-dpcl-checkpoint branch from 98c298b to 38db9f9 Compare December 28, 2020 16:29
@jonashaag
Copy link
Collaborator

Does this apply to other recipes as well?

@ilyakava ilyakava force-pushed the wsj0-dpcl-checkpoint branch from 38db9f9 to 4b0068d Compare December 28, 2020 17:19
@ilyakava
Copy link
Contributor Author

Does this apply to other recipes as well?

@jonashaag I compared to wham/DPTNet and made the behavior here like there, which is that a checkpoints folder is not assumed to exist.

But the lines:

checkpoint_dir = os.path.join(exp_dir, "checkpoints/")
    checkpoint = ModelCheckpoint(
        checkpoint_dir, monitor="val_loss", mode="min", save_top_k=5, verbose=True
    )

are present in other egs, and what happens (with pytorch_lightning version '1.1.2' anyway) is that checkpoint_dir is treated like dirpath + filename and the slash is ignored. So a little surprising, but no crash.

@@ -80,7 +80,7 @@ def main(conf):
with open(os.path.join(exp_dir, "best_k_models.json"), "w") as f:
json.dump(best_k, f, indent=0)
# Save last model for convenience
torch.save(system.model.state_dict(), os.path.join(exp_dir, "checkpoints/final.pth"))
torch.save(system.model.state_dict(), os.path.join(exp_dir, "best_model.pth"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change in the kinect eg as well?

@jonashaag
Copy link
Collaborator

Yeah that behaviour changed in PL 1. Would you mind sending a fix for all the egs to use dirpath instead of filepath?

@mpariente
Copy link
Collaborator

Thanks a lot for the PR @ilyakava, let's fix this!

Do we have to fix the model loading in the eval file as well?

@mpariente
Copy link
Collaborator

Hey @ilyakava, do you have updates about this?

@mpariente mpariente merged commit 6ca575b into asteroid-team:master Feb 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants