Commit
Tarred audio support in ASR data layer (#602)
* Initial draft of WebDataset integration for reading tarred audio datasets
  Signed-off-by: Jocelyn Huang <[email protected]>
* WebDataset integration bugfixes
  Signed-off-by: Jocelyn Huang <[email protected]>
* WebDataset integration: add batch_size and num_workers options, fix collate_fn for non-distributed
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add wider collate_fn support in actions.py for DataLayers w/ Datasets
  Signed-off-by: Jocelyn Huang <[email protected]>
* Don't create distributed sampler if provided dataset is an IterableDataset
  Signed-off-by: Jocelyn Huang <[email protected]>
* Adding torch.distributed multiprocessing support to TarredAudioToTextDataLayer (prevent duplicate samples)
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add filter (pipe) for when WebDataset tries to retrieve the entry for an already-filtered-out sample.
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add script to convert non-tarred ASR datasets to tarred ones compatible with TarredAudioToTextDataLayer.
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add leftover files to last shard in dataset conversion script
  Signed-off-by: Jocelyn Huang <[email protected]>
* Fix for docstring of TarredAudioToTextDataLayer
  Signed-off-by: Jocelyn Huang <[email protected]>
* Added changelog entry and fixed imports
  Signed-off-by: Jocelyn Huang <[email protected]>
* Removed unused imports
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add unit tests for tarred data loading
  Signed-off-by: Jocelyn Huang <[email protected]>
* Add more arguments to docstring, add tarfile requirement for conversion script
  Signed-off-by: Jocelyn Huang <[email protected]>
* Remove tarfile from requirements--already in standard library.
  Signed-off-by: Jocelyn Huang <[email protected]>
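For readers unfamiliar with tarred datasets, the sharding ideas mentioned in the commit message (leftover files going into the last shard, and each distributed worker reading a distinct set of shards to avoid duplicate samples) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this commit: the function names `plan_shards`, `write_shard`, and `shards_for_rank` are hypothetical, and the round-robin split by rank is an assumption about one reasonable way to partition shards.

```python
import io
import tarfile
from typing import Iterable, List, Sequence, Tuple


def plan_shards(files: Sequence[str], num_shards: int) -> List[List[str]]:
    """Split `files` into `num_shards` groups (hypothetical helper).

    Files that do not divide evenly are appended to the last shard.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    per_shard = len(files) // num_shards
    shards = [
        list(files[i * per_shard:(i + 1) * per_shard]) for i in range(num_shards)
    ]
    # Leftover files go into the final shard.
    shards[-1].extend(files[num_shards * per_shard:])
    return shards


def write_shard(path: str, members: Iterable[Tuple[str, bytes]]) -> None:
    """Write (name, payload) pairs into an uncompressed tar file at `path`."""
    with tarfile.open(path, "w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def shards_for_rank(shard_paths: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Assumed round-robin split: each rank reads a disjoint subset of shards."""
    return list(shard_paths[rank::world_size])
```

With 10 files and 3 shards, `plan_shards` yields shards of 3, 3, and 4 files, so no file is dropped. Giving each rank a disjoint slice of the shard list is one simple way to keep workers from seeing the same sample twice.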
1 parent 8025d3d, commit d219483
Showing 10 changed files with 515 additions and 61 deletions.