No transcriptions to clean up #28

Open
zx1292982431 opened this issue Dec 12, 2023 · 3 comments

Comments

@zx1292982431

Hello!
When I run make WSJ_DIR=/path/to/wsj SMS_WSJ_DIR=/path/to/write/db/to, I get the error AssertionError: No transcriptions to clean up. from sms_wsj/database/wsj/create_json.py.
How can I fix it?

@boeddeker
Member

Hello, could you provide the command that you executed and the full terminal output?

Have you changed /path/to/wsj to the path of WSJ on your system?

@zx1292982431
Author

Sorry for the late reply!
After setting the Kaldi path, I ran make WSJ_DIR=/data/lzx/wsj0 SMS_WSJ_DIR=/data/lzx/Datasets/SMS_WSJ to generate the SMS-WSJ dataset, but I got an error:

creating /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json
python -m sms_wsj.database.wsj.create_json \
with json_path=/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json database_dir=/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean as_wav=True
WARNING - Create wsj json - No observers have been added to this run
INFO - Create wsj json - Running command 'create_database'
INFO - Create wsj json - Started
ERROR - Create wsj json - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 293, in create_database
    transcriptions = get_transcriptions(database_dir, database_dir)
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 170, in get_transcriptions
    data_dict["clean word"] = normalize_transcription(word, wsj_root)
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 186, in normalize_transcription
    assert len(transcriptions) > 0, 'No transcriptions to clean up.'
AssertionError: No transcriptions to clean up.

make: *** [Makefile:32: /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json] Error 1

Have you encountered a similar problem, and do you know how to fix it?

@boeddeker
Member

This error means that the code could not find the transcriptions.
My guess is that it could not find the *.dot and *.ptx files in /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.

Could you execute the following commands and report the output:

find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.dot" | wc -l
find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.ptx" | wc -l
find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.wav" | wc -l

I got the following output:

/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.dot" | wc -l
3585
/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.ptx" | wc -l
3547
/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.wav" | wc -l
129106
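
If find is not available, the same check can be sketched in Python. This is only a rough equivalent of the three commands above, not part of sms_wsj; the function name count_by_suffix is made up for illustration. It walks the folder recursively and counts regular files per suffix, ignoring case like -iname does.

```python
from pathlib import Path


def count_by_suffix(root, suffixes=(".dot", ".ptx", ".wav")):
    """Count regular files below `root` per suffix, ignoring case.

    Hypothetical helper for this thread, not part of sms_wsj.
    """
    counts = {suffix: 0 for suffix in suffixes}
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if suffix in counts and path.is_file():
            counts[suffix] += 1
    return counts
```

Calling count_by_suffix("/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean") returns a dict with one count per suffix. If the .dot and .ptx counts are zero, the transcription files are missing, which would be consistent with the assertion error above.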

Maybe something went wrong when creating the wsj_8k_zeromean folder.

I guess the /data/lzx/wsj0 folder contains only the WSJ0 files. If that is correct, you have to delete all generated files and change the call to specify both the WSJ0 and the WSJ1 folder, e.g. make WSJ0_DIR=/data/lzx/wsj0 WSJ1_DIR=/data/lzx/wsj1 SMS_WSJ_DIR=/data/lzx/Datasets/SMS_WSJ (assuming the WSJ1 files are in /data/lzx/wsj1).
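
Before rerunning, it may help to confirm that both input folders exist, so a typo in a path is caught before make starts. The following is a minimal sketch under that assumption; build_make_command and missing_dirs are hypothetical helper names, and only the variable names WSJ0_DIR, WSJ1_DIR and SMS_WSJ_DIR come from the call above.

```python
from pathlib import Path


def build_make_command(wsj0_dir, wsj1_dir, sms_wsj_dir):
    """Assemble the make call with separate WSJ0 and WSJ1 folders."""
    return [
        "make",
        f"WSJ0_DIR={wsj0_dir}",
        f"WSJ1_DIR={wsj1_dir}",
        f"SMS_WSJ_DIR={sms_wsj_dir}",
    ]


def missing_dirs(*dirs):
    """Return the given paths that are not existing directories."""
    return [str(d) for d in dirs if not Path(d).is_dir()]
```

If missing_dirs("/data/lzx/wsj0", "/data/lzx/wsj1") returns an empty list, the result of build_make_command can be passed to subprocess.run from the repository root. The generated files from the failed run still have to be deleted first.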
