Update dependency torchaudio to v0.13.1 #45
This PR contains the following updates:
torchaudio: `==0.11.0` -> `==0.13.1`
Release Notes
pytorch/audio (torchaudio)
v0.13.1: TorchAudio 0.13.1 Release Note
This is a minor release, which is compatible with PyTorch 1.13.1 and includes bug fixes, improvements, and documentation updates. No new features are added.
Bug Fix
IO
Model
Recipe
v0.13.0: torchaudio 0.13.0 Release Note
Highlights
TorchAudio 0.13.0 release includes:
[Beta] Source Separation Models and Bundles
Hybrid Demucs is a music source separation model that uses both spectrogram and time domain features. It has demonstrated state-of-the-art performance in the Sony Music DeMixing Challenge. (citation: https://arxiv.org/abs/2111.03600)
The TorchAudio v0.13 release includes the following features:
SDR Results of pre-trained pipelines on MUSDB-HQ test set
* Trained on the training data of MUSDB-HQ dataset.
** Trained on both training and test sets of MUSDB-HQ and 150 extra songs from an internal database that were specifically produced for Meta.
Special thanks to @adefossez for the guidance.
ConvTasNet model architecture was added in TorchAudio 0.7.0. It is the first source separation model that outperforms the oracle ideal ratio mask. In this release, TorchAudio adds the pre-trained pipeline that is trained within TorchAudio on the Libri2Mix dataset. The pipeline achieves 15.6dB SDR improvement and 15.3dB Si-SNR improvement on the Libri2Mix test set.
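As a rough illustration of how these bundles are used, the sketch below loads the Hybrid Demucs pipeline and separates a dummy mixture. The bundle name `HDEMUCS_HIGH_MUSDB_PLUS` (and the `CONVTASNET_BASE_LIBRI2MIX` alternative) reflects my reading of the 0.13 `torchaudio.pipelines` module, and the input is random audio rather than real music.

```python
# Minimal sketch, assuming torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS is
# available in this release (CONVTASNET_BASE_LIBRI2MIX works analogously).
import torch
from torchaudio.pipelines import HDEMUCS_HIGH_MUSDB_PLUS

bundle = HDEMUCS_HIGH_MUSDB_PLUS
model = bundle.get_model()                           # downloads pre-trained weights

mixture = torch.randn(1, 2, bundle.sample_rate * 5)  # (batch, channel, time): 5 s of dummy stereo
with torch.inference_mode():
    sources = model(mixture)                         # (batch, num_sources, channel, time)
print(sources.shape)
```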
[Beta] Datasets and Metadata Mode for SUPERB Benchmarks
With the addition of four new audio-related datasets, there is now support for all downstream tasks in version 1 of the SUPERB benchmark. Furthermore, these datasets support metadata mode through a `get_metadata` function, which enables faster dataset iteration or preprocessing without the need to load or store waveforms.
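A minimal sketch of metadata mode, assuming LIBRISPEECH is among the datasets exposing `get_metadata` and that the root path below is a hypothetical local directory:

```python
# Hedged sketch: read per-item metadata without decoding waveforms.
from torchaudio.datasets import LIBRISPEECH

dataset = LIBRISPEECH(root="./data", url="test-clean", download=True)

# __getitem__ would decode the audio; get_metadata returns the same fields
# with the relative path to the audio file in place of the waveform tensor.
metadata = dataset.get_metadata(0)
print(metadata)
```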
[Beta] Custom Language Model support in CTC Beam Search Decoding
In release 0.12, TorchAudio released a CTC beam search decoder with KenLM language model support. This release adds functionality for creating custom Python language models that are compatible with the decoder, using the `torchaudio.models.decoder.CTCDecoderLM` wrapper.
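A minimal sketch of a custom language model, following the `CTCDecoderLM`/`CTCDecoderLMState` interface (`start`/`score`/`finish`) as I understand it from the documentation; the uniform scoring below is purely illustrative.

```python
import math

from torchaudio.models.decoder import CTCDecoderLM, CTCDecoderLMState


class UniformLM(CTCDecoderLM):
    """Toy LM that assigns the same log-probability to every token."""

    def __init__(self, vocab_size: int):
        CTCDecoderLM.__init__(self)
        self.log_prob = -math.log(vocab_size)

    def start(self, start_with_nothing: bool):
        # Return a fresh root state at the beginning of decoding.
        return CTCDecoderLMState()

    def score(self, state: CTCDecoderLMState, usr_token_idx: int):
        # Advance the state by one token and return (new_state, lm_score).
        return state.child(usr_token_idx), self.log_prob

    def finish(self, state: CTCDecoderLMState):
        # Score contribution for ending a hypothesis in this state.
        return state, self.log_prob
```

An instance built this way should be usable as the `lm` argument of the decoder factory (e.g., `ctc_decoder(..., lm=UniformLM(len(tokens)))`), per my reading of the 0.13 API.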
[Beta] StreamWriter
`torchaudio.io.StreamWriter` is a class for encoding media, including audio and video. It can handle a wide variety of codecs, chunk-by-chunk encoding, and GPU encoding.
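A minimal sketch of audio encoding with `StreamWriter`, assuming the `add_audio_stream`/`write_audio_chunk` API and a hypothetical output path:

```python
import torch
from torchaudio.io import StreamWriter

sample_rate = 16000
waveform = torch.randn(sample_rate, 1)  # one second of mono dummy audio, shape (frames, channels)

writer = StreamWriter(dst="output.wav")
writer.add_audio_stream(sample_rate=sample_rate, num_channels=1)
with writer.open():
    writer.write_audio_chunk(0, waveform)  # stream index 0; chunk-by-chunk writes are allowed
```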
Backward-incompatible changes
The `GriffinLim` implementations in `transforms` and `functional` used the `momentum` parameter differently, resulting in inconsistent results between the two implementations. The `transforms.GriffinLim` usage of `momentum` is updated to resolve this discrepancy.
`torchaudio.info` now decodes audio to compute `num_frames` if it is not found in metadata (#2740). In such cases, `torchaudio.info` may now return non-zero values for `num_frames`.
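A minimal sketch of the affected call, assuming a local MP3 file whose header lacks a frame count (the file name is hypothetical):

```python
import torchaudio

# With this release, num_frames may be computed by decoding when the
# container metadata does not provide it.
metadata = torchaudio.info("example.mp3")
print(metadata.sample_rate, metadata.num_channels, metadata.num_frames)
```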
Bug Fixes
`torchaudio.compliance.kaldi.fbank` with the dither option produced a different output from Kaldi because it used a skewed, rather than Gaussian, distribution for dither. This is updated in this release to correctly use a random Gaussian instead.
The previous download link for SpeechCommands v2 did not include data for the valid and test sets, resulting in errors when trying to use those subsets. The download link has been updated to correctly download the whole dataset.
New Features
IO
Ops
Models
Pipelines
Datasets
Improvements
IO
`runtime_error` exceptions were replaced with `TORCH_CHECK` (#2550, #2551, #2592).
Ops
The kernel generation for resampling is optimized in this release. The following table illustrates the performance improvements from the previous release for the `torchaudio.functional.resample` function using the sinc resampling method, on a `float32` tensor with two channels and one second duration, benchmarked on both CPU and CUDA.
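A minimal sketch of the call being benchmarked, using dummy data (the default resampling method is sinc-based):

```python
import torch
import torchaudio.functional as F

waveform = torch.randn(2, 16000)  # two channels, one second at 16 kHz (dummy data)
resampled = F.resample(waveform, orig_freq=16000, new_freq=8000)
print(resampled.shape)            # torch.Size([2, 8000])
```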
Models
Datasets
Tutorials
Recipes
WER improvement on LibriSpeech dev and test sets
Documentation
Examples
Other
Use `:autosummary:` in torchaudio docs (#2664, #2681, #2683, #2684, #2693, #2689, #2690, #2692).
Build/CI
v0.12.1: torchaudio 0.12.1 Release Note
This is a minor release, which is compatible with PyTorch 1.12.1 and includes small bug fixes, improvements, and documentation updates. No new features are added.
Bug Fix
Improvement
For the full feature of v0.12, please refer to the v0.12.0 release note.
v0.12.0
TorchAudio 0.12.0 Release Notes
Highlights
TorchAudio 0.12.0 includes the following:
[Beta] CTC beam search decoder
To support inference-time decoding, the release adds the wav2letter CTC beam search decoder, ported over from Flashlight (GitHub). Both lexicon and lexicon-free decoding are supported, and decoding can be done without a language model or with a KenLM n-gram language model. Compatible token, lexicon, and certain pretrained KenLM files for the LibriSpeech dataset are also available for download.
For usage details, please check out the documentation and ASR inference tutorial.
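A minimal sketch of constructing the decoder with the pretrained LibriSpeech files; the hyperparameters below are illustrative placeholders rather than recommended values.

```python
from torchaudio.models.decoder import ctc_decoder, download_pretrained_files

files = download_pretrained_files("librispeech-4-gram")
decoder = ctc_decoder(
    lexicon=files.lexicon,
    tokens=files.tokens,
    lm=files.lm,   # lm is optional; pass None (the default) to decode without a language model
    nbest=3,
    beam_size=50,
)

# `emission` would be the (batch, frame, num_tokens) log-probability output of
# an acoustic model; the decoder then returns n-best hypotheses per utterance.
# hypotheses = decoder(emission)
```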
[Beta] New beamforming modules and methods
To improve flexibility in usage, the release adds two new beamforming modules under `torchaudio.transforms`: SoudenMVDR and RTFMVDR. They differ from MVDR mainly in that they add `reference_channel` as an input argument in the forward method, allowing users to select the reference channel in model training or dynamically change the reference channel in inference.
Besides the two modules, the release adds new function-level beamforming methods under `torchaudio.functional`. For usage details, please check out the documentation at torchaudio.transforms and torchaudio.functional and the Speech Enhancement with MVDR Beamforming tutorial.
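A minimal sketch of SoudenMVDR, assuming the tensor shapes described in the documentation (complex spectrogram of shape `(..., channel, freq, time)`); the masks below are random stand-ins for what a neural network would normally predict.

```python
import torch
import torchaudio.transforms as T

n_fft = 400
waveform = torch.randn(4, 16000)               # 4-channel dummy recording
stft = T.Spectrogram(n_fft=n_fft, power=None)  # power=None keeps complex values
specgram = stft(waveform)                      # (channel, freq, time), complex

mask_speech = torch.rand(specgram.shape[-2:])  # random stand-in masks
mask_noise = torch.rand(specgram.shape[-2:])

psd = T.PSD()
psd_speech = psd(specgram, mask_speech)        # (freq, channel, channel)
psd_noise = psd(specgram, mask_noise)

mvdr = T.SoudenMVDR()
enhanced = mvdr(specgram, psd_speech, psd_noise, reference_channel=0)
print(enhanced.shape)                          # (freq, time), complex
```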
[Beta] Streaming API
`StreamReader` is TorchAudio’s new I/O API. It is backed by FFmpeg† and allows users to read and decode media in a streaming, chunk-by-chunk fashion. For usage details, please check out the documentation and tutorials.
† To use `StreamReader`, FFmpeg libraries are required. Please install FFmpeg. The coverage of codecs depends on how these libraries are configured. TorchAudio official binaries are compiled to work with FFmpeg 4 libraries; FFmpeg 5 can be used if TorchAudio is built from source.
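A minimal sketch of chunked audio decoding with `StreamReader`, assuming FFmpeg 4 libraries are installed and that the input file name below exists:

```python
from torchaudio.io import StreamReader

streamer = StreamReader(src="input.wav")
# Decode the first audio stream in chunks of 8000 frames, resampled to 16 kHz.
streamer.add_basic_audio_stream(frames_per_chunk=8000, sample_rate=16000)

for (chunk,) in streamer.stream():
    # Each audio chunk is a Tensor of shape (frames, channels).
    print(chunk.shape)
```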
Backwards-incompatible changes
I/O
To load MP3 audio with `torchaudio.load`, please install a compatible version of FFmpeg (Version 4 when using an official binary distribution). `torchaudio.info` now returns `num_frames=0` for MP3.
Models
In previous releases, `Hypothesis` subclassed `namedtuple`. Containers of `namedtuple` instances, however, are incompatible with the PyTorch Lite Interpreter. To achieve compatibility, `Hypothesis` has been modified in release 0.12 to instead alias `tuple`. This affects `RNNTBeamSearch`, as it accepts and returns a list of `Hypothesis` instances.
Bug Fixes
Ops
Certain ops internally convert their inputs to `complex128` to improve the precision and robustness of downstream matrix computations. The output dtype, however, was not correctly converted back to the original dtype. In release 0.12, the output dtype is fixed to be consistent with the original input dtype.
Build
New Features
I/O
Ops
Datasets
Improvements
I/O
Ops
Models
Datasets
Performance
The following table illustrates the performance improvement over the previous release by comparing the time in milliseconds it takes `torchaudio.transforms.PitchShift`, after its first call, to perform the operation on a `float32` tensor with two channels and 8000 frames, resampled to 44.1 kHz across various shift steps.
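A minimal sketch of the benchmarked operation, using dummy data:

```python
import torch
import torchaudio.transforms as T

waveform = torch.randn(2, 8000)                           # two channels, 8000 frames (dummy data)
pitch_shift = T.PitchShift(sample_rate=44100, n_steps=4)  # shift up by four semitones
shifted = pitch_shift(waveform)
print(shifted.shape)                                      # same shape as the input
```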
Tests
Build
Other
Use `__getattr__` to implement delayed initialization (#2377).
Examples
Ops
Pipelines
Tests
Training recipes
Prototypes
Models
Pipelines
Documentation
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.