Refactor asr_datamodule. #15

Merged: 3 commits merged on Aug 21, 2021.


@csukuangfj (Collaborator) commented on Aug 20, 2021

The current version throws the following error (full log below).

@pzelasko, could you please take a look? Thanks.

(I am using the latest lhotse, at commit d24e6faa6f26a5034cebf1d97dc1bd933f285a03.)

The refactoring is based on the ASR datamodule from the GigaSpeech recipe in snowfall.

2021-08-21 00:02:42,414 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-21 00:02:42,552 INFO [decode.py:336] device: cuda:0
2021-08-21 00:02:55,867 INFO [checkpoint.py:75] Loading checkpoint from tdnn_lstm_ctc/exp/epoch-19.pt
/ceph-fj/open-source/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames or max_cuts constraints - we'll return it anyway. Consider increasing max_frames/max_cuts.
  warnings.warn(
ERROR:root:Error while extracting the features for cut with ID 8224-274384-0008-1657-0 -- details:
MonoCut(id='8224-274384-0008-1657-0', start=0, duration=13.42, channel=0, supervisions=[SupervisionSegment(id='8224-274384-0008', recording_id='8224-274384-0008', start=0.0, duration=13.42, channel=0, text='THE GOOD NATURED AUDIENCE IN PITY TO FALLEN MAJESTY SHOWED FOR ONCE GREATER DEFERENCE TO THE KING THAN TO THE MINISTER AND SUNG THE PSALM WHICH THE FORMER HAD CALLED FOR', language='English', speaker='8224', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=1342, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=13.42, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='a83019d1-3639-47a3-8790-ae2d82cde42e', recording_id=None, channels=0), recording=Recording(id='8224-274384-0008', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/8224/274384/8224-274384-0008.flac')], sampling_rate=16000, num_samples=214720, duration=13.42, transforms=None))
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 102, in _get_item
    return _GLOBAL_DATASET_CACHE[cut_ids]
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/speech_recognition.py", line 105, in __getitem__
    inputs, _ = self.input_strategy(cuts)
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/input_strategies.py", line 244, in __call__
    features = self.extractor.extract(samples, cuts[idx].sampling_rate)
AttributeError: 'PrecomputedFeatures' object has no attribute 'extract'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./tdnn_lstm_ctc/decode.py", line 428, in <module>
    main()
  File "/ceph-fj/fangjun/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "./tdnn_lstm_ctc/decode.py", line 411, in main
    results_dict = decode_dataset(
  File "./tdnn_lstm_ctc/decode.py", line 246, in decode_dataset
    for batch_idx, batch in enumerate(dl):
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 85, in __next__
    return self._retrieve_one()
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 79, in _retrieve_one
    return self._futures.popleft().result()
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
AttributeError: 'PrecomputedFeatures' object has no attribute 'extract'
ERROR:root:Error while extracting the features for cut with ID 61-70970-0038-2373-0 -- details:
MonoCut(id='61-70970-0038-2373-0', start=0, duration=10.4, channel=0, supervisions=[SupervisionSegment(id='61-70970-0038', recording_id='61-70970-0038', start=0.0, duration=10.4, channel=0, text='THE OLD SERVANT TOLD HIM QUIETLY AS THEY CREPT BACK TO GAMEWELL THAT THIS PASSAGE WAY LED FROM THE HUT IN THE PLEASANCE TO SHERWOOD AND THAT GEOFFREY FOR THE TIME WAS HIDING WITH THE OUTLAWS IN THE FOREST', language='English', speaker='61', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=1040, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=10.4, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='f6d871ec-b1e5-4a5b-bd1f-0b95f7972946', recording_id=None, channels=0), recording=Recording(id='61-70970-0038', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/61/70970/61-70970-0038.flac')], sampling_rate=16000, num_samples=166400, duration=10.4, transforms=None))
ERROR:root:Error while extracting the features for cut with ID 1995-1837-0024-1526-0 -- details:
MonoCut(id='1995-1837-0024-1526-0', start=0, duration=5.385, channel=0, supervisions=[SupervisionSegment(id='1995-1837-0024', recording_id='1995-1837-0024', start=0.0, duration=5.385, channel=0, text='FOR A WHILE SHE LAY IN HER CHAIR IN HAPPY DREAMY PLEASURE AT SUN AND BIRD AND TREE', language='English', speaker='1995', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=539, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=5.385, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='0811ff5e-48ed-467d-873c-6a9f742472e0', recording_id=None, channels=0), recording=Recording(id='1995-1837-0024', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/1995/1837/1995-1837-0024.flac')], sampling_rate=16000, num_samples=86160, duration=5.385, transforms=None))
ERROR:root:Error while extracting the features for cut with ID 121-127105-0026-2191-0 -- details:

... ... 

Fbank(FbankConfig(num_mel_bins=80))
Fbank(FbankConfig(num_mel_bins=80), num_workers=4)
if self.args.on_the_fly_feats
else PrecomputedFeatures()
),
Collaborator

I think this closing parenthesis has to be moved 3 lines up, so the code looks like:

input_strategy=OnTheFlyFeatures(
    # num_workers belongs to OnTheFlyFeatures, not to Fbank
    Fbank(FbankConfig(num_mel_bins=80)), num_workers=4
) if self.args.on_the_fly_feats
  else PrecomputedFeatures(),
return_cuts=...

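Currently, when args.on_the_fly_feats is False, the conditional expression is evaluated inside the OnTheFlyFeatures(...) call, so the code builds OnTheFlyFeatures(PrecomputedFeatures()). That is an error: OnTheFlyFeatures calls .extract() on its extractor, and PrecomputedFeatures has no such method, hence the AttributeError in the log above.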
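To make the parenthesis issue concrete, here is a stdlib-only sketch. The classes are hypothetical stand-ins that only borrow the names from the traceback; they are not lhotse's classes and do not reproduce lhotse's signatures. The only thing that matters is where the conditional expression sits.

```python
class Fbank:
    """Hypothetical stand-in for an on-the-fly feature extractor."""

    def extract(self, samples, sampling_rate):
        return f"fbank@{sampling_rate}"


class PrecomputedFeatures:
    """Hypothetical stand-in: reads stored features and has no .extract()."""


class OnTheFlyFeatures:
    """Hypothetical stand-in that wraps an extractor, as in the traceback."""

    def __init__(self, extractor, num_workers=0):
        self.extractor = extractor
        self.num_workers = num_workers

    def __call__(self, samples=None, sampling_rate=16000):
        # Same kind of call that fails in input_strategies.py in the log.
        return self.extractor.extract(samples, sampling_rate)


def buggy_strategy(on_the_fly_feats):
    # The conditional is an argument *inside* OnTheFlyFeatures(...), so
    # False produces OnTheFlyFeatures(PrecomputedFeatures()).
    return OnTheFlyFeatures(
        Fbank()
        if on_the_fly_feats
        else PrecomputedFeatures()
    )


def fixed_strategy(on_the_fly_feats):
    # The conditional chooses between two complete input strategies.
    return (
        OnTheFlyFeatures(Fbank())
        if on_the_fly_feats
        else PrecomputedFeatures()
    )
```

Calling buggy_strategy(False)() raises AttributeError: 'PrecomputedFeatures' object has no attribute 'extract', while fixed_strategy(False) simply returns the PrecomputedFeatures instance.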

Collaborator Author

Thanks! In that case, I think we should also fix snowfall, since this block of code comes from there. See
https://github.com/k2-fsa/snowfall/blob/1f79957e9716c3f980c151df5b1d77bc4bb7ce78/egs/gigaspeech/asr/simple_v1/asr_datamodule.py#L337-L344

            test = K2SpeechRecognitionDataset(
                input_strategy=(
                    OnTheFlyFeatures(Fbank(FbankConfig(num_mel_bins=80)), num_workers=8)
                    if self.args.on_the_fly_feats
                    else PrecomputedFeatures()
                ),
                return_cuts=self.args.return_cuts,
            )

Collaborator

yes, you’re right

# persistent_workers=False,
# )

train_dl = LhotseDataLoader(
Collaborator

Be careful with LhotseDataLoader: it is experimental, and I'm hoping we won't need it in the future. It works around some I/O issues with GigaSpeech, but for LibriSpeech you shouldn't see any performance difference compared with a regular PyTorch DataLoader.

The downside of LhotseDataLoader is that it lacks the elaborate shutdown mechanisms of the PyTorch DataLoader, so it might leave your script running after training has finished (i.e., everything runs fine, but the script doesn't exit by itself).
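The traceback above shows that LhotseDataLoader keeps a deque of futures from a concurrent.futures process pool (self._futures.popleft().result()). The sketch below is hypothetical and is not lhotse's code: it only illustrates, with the standard library, the kind of explicit shutdown such a prefetching loader needs so that no workers outlive the loop. The name prefetch_batches and its parameters are assumptions. It defaults to ThreadPoolExecutor so it runs anywhere; with ProcessPoolExecutor, load_batch must be a picklable top-level function.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch_batches(load_batch, batches, prefetch=2,
                     executor_cls=ThreadPoolExecutor, max_workers=2):
    """Yield load_batch(b) for each b in batches, keeping up to `prefetch`
    jobs in flight. Hypothetical sketch, not lhotse's implementation."""
    # The `with` block guarantees executor.shutdown() runs even if the
    # consumer stops early or an exception propagates.
    with executor_cls(max_workers=max_workers) as executor:
        futures = deque()
        try:
            for batch in batches:
                futures.append(executor.submit(load_batch, batch))
                if len(futures) >= prefetch:
                    # .result() re-raises a worker's exception in the caller,
                    # like the AttributeError surfacing in the main process.
                    yield futures.popleft().result()
            while futures:
                yield futures.popleft().result()
        finally:
            # Drop queued work that never started before shutting down.
            for fut in futures:
                fut.cancel()
```

Results come back in submission order, and if the consumer breaks out of the loop, closing the generator runs the finally clause and then the executor's shutdown.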

input_strategy=OnTheFlyFeatures(
Fbank(FbankConfig(num_mel_bins=80))
Fbank(FbankConfig(num_mel_bins=80), num_workers=4)
Collaborator

For LibriSpeech, remove the num_workers argument from OnTheFlyFeatures: it spawns extra processes that LibriSpeech does not need (they help with GigaSpeech, which has long OPUS recordings).

@@ -0,0 +1,362 @@
import argparse
Collaborator

Not sure if this makes sense, but maybe it's sufficient to keep a single copy of this script one directory level up; any recipe that requires non-standard processing could then make its own copy at the "current" directory level?

Collaborator Author

How about putting symlinks to this file in the other model directories?
I'd like each model directory to be as self-contained as possible.
If someone wants to modify this file, they can replace the symlink with a copy of it.
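The symlink layout can be sketched with pathlib. This is a minimal illustration under assumptions: the helper names link_shared_file and detach are hypothetical, and only tdnn_lstm_ctc comes from the log above; any second model directory name is made up. detach() shows the "replace the symlink with a copy" step.

```python
import os
import shutil
from pathlib import Path


def link_shared_file(shared: Path, model_dir: Path) -> Path:
    """Create model_dir/<shared.name> as a relative symlink to `shared`."""
    model_dir.mkdir(parents=True, exist_ok=True)
    link = model_dir / shared.name
    # A relative target keeps the link valid if the whole recipe is moved or cloned.
    link.symlink_to(os.path.relpath(shared, start=model_dir))
    return link


def detach(link: Path) -> Path:
    """Replace a symlink with a private copy of the file it points to."""
    target = link.resolve()
    link.unlink()
    shutil.copy2(target, link)
    return link
```

For example, link_shared_file(Path("tdnn_lstm_ctc/asr_datamodule.py"), Path("other_model")) would create other_model/asr_datamodule.py pointing at ../tdnn_lstm_ctc/asr_datamodule.py; edits to the shared file then apply to every model until one of them calls detach().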

Collaborator

yeah that makes sense to me

try:
    num_batches = len(dl)
except TypeError:
    num_batches = None
Collaborator

Use num_batches = '?' instead; it will display more nicely.

Collaborator

Then batch_str below won't need an extra if-statement either.
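A minimal sketch of the suggestion. The original batch_str line isn't shown here, so the helper name and the "idx/total" format are assumptions; the point is that a string placeholder formats like any other value, so no None check is needed.

```python
def batch_str(batch_idx, dl):
    """Hypothetical helper: format progress as "idx/total", or "idx/?"
    when the loader has no length."""
    try:
        num_batches = len(dl)
    except TypeError:
        # Iterable-style loaders without __len__ raise TypeError here.
        num_batches = "?"
    # No extra `if` needed: "?" formats just like an int.
    return f"{batch_idx}/{num_batches}"
```

For a sized loader this yields strings like "5/100"; for an unsized one it yields "5/?".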

@csukuangfj changed the title from "WIP: Refactor asr_datamodule." to "Refactor asr_datamodule." on Aug 21, 2021
@csukuangfj
Collaborator Author

I've addressed all the comments. @pzelasko, thanks! Please accept the invitation to this repo.


Ready to merge.
