This PR contains the following updates:
torchaudio ==0.11.0 -> ==2.5.1
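Because the pin moves from 0.11.0 to 2.5.1, it may help to flag the size of the jump programmatically before merging. The following is a minimal sketch using only the standard library; the helper names `parse_pin` and `is_major_bump` are hypothetical and are not part of Renovate or torchaudio, and only exact `==` pins with numeric components are handled.

```python
def parse_pin(pin: str) -> tuple[int, ...]:
    """Parse an exact pin such as '==2.5.1' into a tuple of integers."""
    version = pin.strip().removeprefix("==")
    return tuple(int(part) for part in version.split("."))


def is_major_bump(old_pin: str, new_pin: str) -> bool:
    """Return True when the major version increases between two pins."""
    return parse_pin(new_pin)[0] > parse_pin(old_pin)[0]
```

For the pins in this PR, `is_major_bump("==0.11.0", "==2.5.1")` evaluates to `True`, which is a hint to read the backward-incompatible changes listed in the release notes.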
Release Notes
pytorch/audio (torchaudio)
v2.5.1
Compare Source
v2.5.0: TorchAudio 2.5.0 Release
Compare Source
This release is compatible with PyTorch 2.5. There are no new features added.
This release contains one improvement:
v2.4.1: TorchAudio 2.4.1 Release
Compare Source
This release is compatible with PyTorch 2.4.1 patch release. There are no new features added.
v2.4.0: TorchAudio 2.4.0 Release
Compare Source
This release is compatible with PyTorch 2.4. There are no new features added.
This release contains 2 fixes:
v2.3.1: TorchAudio 2.3.1 Release
Compare Source
This release is compatible with PyTorch 2.3.1 patch release. There are no new features added.
v2.3.0: TorchAudio 2.3.0 Release
Compare Source
This release is compatible with PyTorch 2.3.0 patch release. There are no new features added.
This release contains minor documentation and code quality improvements (#3734, #3748, #3757, #3759).
v2.2.2: TorchAudio 2.2.2 Release
Compare Source
This release is compatible with PyTorch 2.2.2 patch release. There are no new features added.
v2.2.1: TorchAudio 2.2.1 Release
Compare Source
This release is compatible with PyTorch 2.2.1 patch release. There are no new features added.
v2.2.0: TorchAudio 2.2.0 Release
Compare Source
New Features
torio top-level module, dedicated to core I/O operations (https://github.com/pytorch/audio/pull/3676, https://github.com/pytorch/audio/pull/3680, https://github.com/pytorch/audio/pull/3681, https://github.com/pytorch/audio/pull/3682). Please refer to https://pytorch.org/audio/2.2.0/torio.html for the details.
Bug Fixes
Recipe Updates
v2.1.2: TorchAudio 2.1.2 Release
Compare Source
This is a patch release, which is compatible with PyTorch 2.1.2. There are no new features added.
v2.1.1
Compare Source
This is a minor release, which is compatible with PyTorch 2.1.1 and includes bug fixes, improvements and documentation updates.
Bug Fixes
v2.1.0: Torchaudio 2.1 Release Note
Compare Source
Highlights
TorchAudio v2.1 introduces new features and backward-incompatible changes:
torchaudio.io.AudioEffector can apply filters, effects and encodings to waveforms in an online/offline fashion. You can use it as a form of augmentation.
Please refer to https://pytorch.org/audio/2.1/tutorials/effector_tutorial.html for examples.
New functions and a pre-trained model for forced alignment were added.
torchaudio.functional.forced_align computes alignment from an emission, and torchaudio.pipelines.MMS_FA provides access to the model trained for multilingual forced alignment in the MMS: Scaling Speech Technology to 1000+ languages project.
Please refer to https://pytorch.org/audio/2.1/tutorials/ctc_forced_alignment_api_tutorial.html for the usage of the forced_align function, and https://pytorch.org/audio/2.1/tutorials/forced_alignment_for_multilingual_data_tutorial.html for how to use MMS_FA to align transcripts in multiple languages.
Model architectures and pre-trained models from the paper TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio were added.
You can use the torchaudio.pipelines.SQUIM_SUBJECTIVE and torchaudio.pipelines.SQUIM_OBJECTIVE models to estimate various speech quality and intelligibility metrics. This is helpful when evaluating the quality of speech generation models, such as TTS.
Please refer to https://pytorch.org/audio/2.1/tutorials/squim_tutorial.html for details.
torchaudio.models.decoder.CUCTCDecoder takes an emission stored in CUDA memory and performs CTC beam search on it on the CUDA device. The beam search is fast. It eliminates the need to move data from the CUDA device to the CPU when performing automatic speech recognition. With PyTorch's CUDA support, it is now possible to perform the entire speech recognition pipeline in CUDA.
Please refer to https://pytorch.org/audio/2.1/tutorials/asr_inference_with_cuda_ctc_decoder_tutorial.html for details.
We are working to add utilities that are relevant to music AI. Since the last release, the following APIs were added to the prototype.
Please refer to the respective documentation for usage.
Recipes for audio-visual ASR, multi-channel DNN beamforming and TCPGen context-biasing were added.
Please refer to the recipes.
The version of supported FFmpeg libraries was updated.
TorchAudio v2.1 works with FFmpeg 6, 5 and 4.4. Support for 4.3, 4.2 and 4.1 is dropped.
Please refer to https://pytorch.org/audio/2.1/installation.html#optional-dependencies for details of the new FFmpeg integration mechanism.
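To check whether an installed FFmpeg falls inside the supported window, a small version check can be sketched as follows. It assumes the range ">=4.4, <7" given in the backward-incompatible changes of these notes; the helper names are hypothetical, and only plain numeric version strings such as "6.0" or "4.4.2" are handled (vendor-prefixed build strings are not).

```python
import re

# Supported range as stated in the release notes: FFmpeg >= 4.4 and < 7.
MIN_SUPPORTED = (4, 4)
MAX_EXCLUSIVE = (7, 0)


def ffmpeg_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a numeric version string such as '4.4.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized FFmpeg version: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def is_supported_ffmpeg(version: str) -> bool:
    """Return True if the version lies in the half-open range [4.4, 7.0)."""
    return MIN_SUPPORTED <= ffmpeg_major_minor(version) < MAX_EXCLUSIVE
```

With this sketch, "4.3.2" and "7.0" are rejected, while "4.4", "5.1.4" and "6.0" are accepted.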
TorchAudio now depends on libsox installed separately from torchaudio. The SoX I/O backend no longer supports file-like objects. (These are supported by the FFmpeg backend and soundfile.)
Please refer to https://pytorch.org/audio/2.1/installation.html#optional-dependencies for details.
New Features
I/O
torchaudio.io.StreamWriter (#3135)
torchaudio.io.StreamReader.get_out_stream_info (#3155)
torchaudio.io.StreamReader filter graph (#3183, #3479)
torchaudio.io.StreamWriter (#3194)
torchaudio.io.StreamReader (#3216)
torchaudio.io.StreamWriter (#3207)
420p10le support to torchaudio.io.StreamReader CPU decoder (#3332)
Ops
torchaudio.io.AudioEffector (#3163, #3372, #3374)
torchaudio.transforms.SpecAugment (#3309, #3314)
torchaudio.functional.forced_align (#3348, #3355, #3533, #3536, #3354, #3365, #3433, #3357)
torchaudio.functional.merge_tokens (#3535, #3614)
torchaudio.functional.frechet_distance (#3545)
Models
torchaudio.models.SquimObjective for speech enhancement (#3042, #3087, #3512)
torchaudio.models.SquimSubjective for speech enhancement (#3189)
torchaudio.models.decoder.CUCTCDecoder (#3096)
Pipelines
torchaudio.pipelines.SquimObjectiveBundle for speech enhancement (#3103)
torchaudio.pipelines.SquimSubjectiveBundle for speech enhancement (#3197)
torchaudio.pipelines.MMS_FA Bundle for forced alignment (#3521, #3538)
Tutorials
torchaudio.io.AudioEffector (#3226)
torchaudio.models.decoder.CUCTCDecoder (#3297)
Recipe
Backward-incompatible changes
Third-party libraries
In this release, the following third-party libraries are removed from TorchAudio binary distributions. TorchAudio now searches for and links these libraries at runtime. Please install them to use the corresponding APIs.
SoX
libsox is used for various audio I/O and filtering operations.
Pre-built binaries are available via package managers, such as conda, apt and brew. Please refer to the respective documentation.
The APIs affected include:
torchaudio.load ("sox" backend)
torchaudio.info ("sox" backend)
torchaudio.save ("sox" backend)
torchaudio.sox_effects.apply_effects_tensor
torchaudio.sox_effects.apply_effects_file
torchaudio.functional.apply_codec (also deprecated, see below)
Changes related to the removal: #3232, #3246, #3497, #3035
Flashlight Text
flashlight-text is the core of the CTC decoder.
Pre-built packages are available on PyPI. Please refer to https://github.com/flashlight/text for details.
The APIs affected include:
torchaudio.models.decoder.CTCDecoder
Changes related to the removal: #3232, #3246, #3236, #3339
Kaldi
A custom-built libkaldi was used to implement torchaudio.functional.compute_kaldi_pitch. This function, along with the libkaldi integration, is removed in this release. There is no replacement.
Changes related to the removal: #3368, #3403
I/O
To make I/O operations more flexible, TorchAudio introduced the backend dispatcher in v2.0, and users could opt in to use the dispatcher.
In this release, the backend dispatcher becomes the default mechanism for selecting the I/O backend.
You can pass the backend argument to the torchaudio.info, torchaudio.load and torchaudio.save functions to select the I/O backend library on a per-call basis. (If it is omitted, an available backend is automatically selected.)
If you want to use the global backend mechanism, you can set the environment variable TORCHAUDIO_USE_BACKEND_DISPATCHER=0.
Please note, however, that the global backend mechanism is deprecated and is going to be removed in the next release.
Please see #2950 for details of the migration work.
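A per-call backend selection might look like the sketch below. The wrapper name `load_with_backend` is hypothetical; it only forwards the `backend` keyword described above to `torchaudio.load`, and torchaudio is imported lazily so the helper can be defined on machines where the library is not installed.

```python
from typing import Any, Optional


def load_with_backend(path: str, backend: Optional[str] = None) -> Any:
    """Load audio, optionally forcing a specific I/O backend for this call."""
    # Imported lazily so this helper can be defined without torchaudio installed.
    import torchaudio

    # backend=None lets the dispatcher pick any available backend;
    # the notes name FFmpeg, SoX and SoundFile as the selectable backends.
    return torchaudio.load(path, backend=backend)
```

For example, `load_with_backend("speech.wav", backend="soundfile")` would request the SoundFile backend for that call only, assuming the file exists and the backend is installed.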
torchaudio.io.StreamReader accepted a byte string wrapped in a 1D torch.Tensor object. This is no longer supported.
Please wrap the underlying data with io.BytesIO instead.
The optional arguments of the add_[audio|video]_stream methods of torchaudio.io.StreamReader and torchaudio.io.StreamWriter are now keyword-only arguments.
Previously TorchAudio supported FFmpeg 4 (>=4.1, <=4.4). In this release, TorchAudio supports FFmpeg 4, 5 and 6 (>=4.4, <7). With this change, support for FFmpeg 4.1, 4.2 and 4.3 is dropped.
Ops
torchaudio.functional.apply_codec (#3397)
In previous versions, TorchAudio shipped a custom-built libsox, so that it could perform in-memory decoding and encoding.
Now, in-memory decoding and encoding are handled by the FFmpeg binding, and with the switch to dynamic libsox linking, torchaudio.functional.apply_codec no longer processes audio in memory. Instead, it writes to a temporary file.
For in-memory processing, please use torchaudio.io.AudioEffector.
lstsq when solving InverseMelScale (#3280)
Previously, torchaudio.transforms.InverseMelScale ran an SGD optimizer to find the inverse of the mel-scale transform. This approach has a number of issues, as listed in #2643.
This release switches to torch.linalg.lstsq.
Models
The infer method of torchaudio.models.RNNTBeamSearch has been updated to accept a series of previous hypotheses.
Deprecations
Ops
torchaudio.functional.apply_codec function (#3386)
Due to the removal of the custom libsox binding, torchaudio.functional.apply_codec no longer supports in-memory processing. Please migrate to torchaudio.io.AudioEffector.
Please refer to the torchaudio.io.AudioEffector documentation for detailed usage.
Bug Fixes
Models
Tutorials
get_trellis in forced alignment tutorial (#3172)
Build
I/O
torchaudio.io.StreamWriter (#3373)
Ops
lfilter (#3432)
Improvements
I/O
torchaudio.io.StreamWriter is not opened (#3152)
torchaudio.io.StreamReader (#3157, #3170, #3186, #3184, #3188, #3320, #3296, #3328, #3419, #3209)
torchaudio.io.StreamWriter (#3205, #3319, #3296, #3328, #3426, #3428)
Ops
Documentation
Tutorials
n_fft (#3442)
Build
Recipe
Other
torch.norm to torch.linalg.vector_norm (#3522)
torch.nn.utils.weight_norm to nn.utils.parametrizations.weight_norm (#3523)
v2.0.2
Compare Source
TorchAudio 2.0.2 Release Note
This is a minor release, which is compatible with PyTorch 2.0.1 and includes bug fixes, improvements and documentation updates. There is no new feature added.
Bug fix
Full Changelog: pytorch/audio@v2.0.1...v2.0.2
v2.0.1: Torchaudio 2.0 Release Note
Highlights
TorchAudio 2.0 release includes:
info, load, save functions
[Beta] Data augmentation operators
The release adds several data augmentation operators under torchaudio.functional and torchaudio.transforms:
torchaudio.functional.add_noise
torchaudio.functional.convolve
torchaudio.functional.deemphasis
torchaudio.functional.fftconvolve
torchaudio.functional.preemphasis
torchaudio.functional.speed
torchaudio.transforms.AddNoise
torchaudio.transforms.Convolve
torchaudio.transforms.Deemphasis
torchaudio.transforms.FFTConvolve
torchaudio.transforms.Preemphasis
torchaudio.transforms.Speed
torchaudio.transforms.SpeedPerturbation
The operators can be used to synthetically diversify training data to improve the generalizability of downstream models.
For usage details, please refer to the documentation for torchaudio.functional and torchaudio.transforms, and the tutorial “Audio Data Augmentation”.
[Beta] WavLM and XLS-R models and pre-trained pipelines
The release adds two self-supervised learning models for speech and audio.
Besides the model architectures, torchaudio also supports corresponding pre-trained pipelines:
torchaudio.pipelines.WAVLM_BASE
torchaudio.pipelines.WAVLM_BASE_PLUS
torchaudio.pipelines.WAVLM_LARGE
torchaudio.pipelines.WAV2VEC_XLSR_300M
torchaudio.pipelines.WAV2VEC_XLSR_1B
torchaudio.pipelines.WAV2VEC_XLSR_2B
For usage details, please refer to the factory function and pre-trained pipelines documentation.
Backend dispatcher
Release 2.0 introduces new versions of the I/O functions torchaudio.info, torchaudio.load and torchaudio.save, backed by a dispatcher that allows selecting one of the backends FFmpeg, SoX and SoundFile, subject to library availability. Users can enable the new logic in Release 2.0 by setting the environment variable TORCHAUDIO_USE_BACKEND_DISPATCHER=1; the new logic will be enabled by default in Release 2.1.
Please see the documentation for torchaudio for more details.
Backward-incompatible changes
Dropped Python 3.7 support (#3020)
Following upstream PyTorch (https://github.com/pytorch/pytorch/pull/93155), support for Python 3.7 has been dropped.
Default to "precise" seek in torchaudio.io.StreamReader.seek (#2737, #2841, #2915, #2916, #2970)
Previously, the StreamReader.seek method sought to the key frame closest to the given timestamp. A new option, mode, has been added, which can switch the behavior to seeking to any type of frame, including non-key frames, closest to the given timestamp, and this behavior is now the default.
Removed deprecated/unused/undocumented functions from datasets.utils (#2926, #2927)
The following functions are removed from datasets.utils:
stream_url
download_url
validate_file
extract_archive
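Code that relied on the removed helpers can often fall back to the standard library. The sketch below shows one possible stand-in for checksum validation and archive extraction; the function names `file_sha256_matches` and `unpack` are hypothetical, their signatures are not those of the removed torchaudio functions, and `shutil.unpack_archive` should only be used on archives from a trusted source.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256_matches(path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the SHA-256 digest of the file equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large dataset archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


def unpack(archive, dest) -> Path:
    """Extract a zip/tar archive into dest, creating dest if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(str(archive), str(dest))
    return dest
```

Downloading itself can be handled by any HTTP client already used in the project; that part is left out here on purpose.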
Deprecations
Ops
Deprecated 'onesided' init param for MelSpectrogram (#2797, #2799)
torchaudio.transforms.MelSpectrogram assumes the onesided argument to always be True. The forward path fails if its value is False. Therefore this argument is deprecated. Users specifying this argument should stop specifying it.
Deprecated "sinc_interpolation" and "kaiser_window" option values in favor of "sinc_interp_hann" and "sinc_interp_kaiser" (#2922)
The valid values of the resampling_method argument of resampling operations (torchaudio.transforms.Resample and torchaudio.functional.resample) are changed. "kaiser_window" is now "sinc_interp_kaiser" and "sinc_interpolation" is "sinc_interp_hann". The old values will continue to work, but users are encouraged to update their code.
For the reason behind this change, please refer to #2891.
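When resampling method names come from configuration files, the rename can be applied in one place. This is a minimal sketch; the helper `normalize_resampling_method` is hypothetical and not part of torchaudio, and the mapping contains only the two renames stated above.

```python
import warnings

# Old option value -> new option value, as listed in the release notes.
_RENAMED_METHODS = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name: str) -> str:
    """Map a deprecated resampling_method value to its new name."""
    new_name = _RENAMED_METHODS.get(name)
    if new_name is None:
        # Already a current name (or unknown); pass it through unchanged.
        return name
    warnings.warn(
        f"resampling_method={name!r} is deprecated; use {new_name!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

The returned string can then be passed as the resampling_method argument of the resampling operations.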
Deprecated sox initialization/shutdown public API functions (#3010)
torchaudio.sox_effects.init_sox_effects and torchaudio.sox_effects.shutdown_sox_effects are deprecated. They were required to use libsox-related features, but have been called automatically since v0.6, and the initialization/shutdown mechanism has been moved elsewhere. These functions are now no-ops. Users can simply remove the calls to these functions.
Models
Since v0.12, TorchAudio binary distributions have included the CTC decoder based on the flashlight-text project. In a future release, TorchAudio will switch to dynamic binding of the underlying CTC decoder implementation, and stop shipping the core CTC decoder implementations. Users who would like to use the CTC decoder need to separately install the CTC decoder from the upstream flashlight-text project. Other functionalities of TorchAudio will continue to work without flashlight-text.
Note: The API and numerical behavior do not change.
For more detail, please refer to #3088.
I/O
In preparation for switching to dynamically bound libsox, file-like object support in the sox_io backend has been deprecated. It will be removed in the 2.1 release in favor of the dispatcher. This deprecation affects the following functionalities:
torchaudio.load, torchaudio.info and torchaudio.save.
torchaudio.sox_effects.apply_effects_file and torchaudio.functional.apply_codec.
For I/O, to continue using file-like objects, please use the new dispatcher mechanism.
For effects, replacement functions will be added in the next release.
torchaudio.io.StreamReader supports decoding media from byte strings contained in 1D tensors of torch.uint8 type. Using the torch.Tensor type as a container for byte strings is now deprecated. To pass byte strings, please wrap the string with io.BytesIO.
Deprecated:
data = b"..."
src = torch.frombuffer(data, dtype=torch.uint8)
StreamReader(src)
Recommended:
data = b"..."
src = io.BytesIO(data)
StreamReader(src)
Bug Fixes
Ops
torchaudio.functional.lfilter (#3080)
Pipelines
In self-supervised learning models such as Wav2Vec 2.0, HuBERT, or WavLM, layer normalization should be applied to waveforms if the convolutional feature extraction module uses layer normalization and is trained on a large-scale dataset. After adding layer normalization to those affected models, the Word Error Rate is significantly reduced.
Without the change in #2873, the WER results are:
After applying layer normalization, the updated WER results are:
Recipe
If shuffle is set to True in BucketizeBatchSampler, the seed is only the same for the first epoch. In later epochs, each BucketizeBatchSampler object will generate a different shuffled iteration list, which may cause DDP training to hang forever if the lengths of iteration lists are different across nodes. In the 2.0.0 release, the issue is fixed by using the same seed for RNG in all nodes.
IO
_fail_info_fileobj (#3032)
This fixes the memory leak reported in torchaudio.io.StreamReader.
New Features
Ops
torchaudio.functional.lfilter (#3018)
Introduces AddNoise, Convolve, FFTConvolve, Speed, SpeedPerturbation, Deemphasis, and Preemphasis in torchaudio.transforms, and add_noise, fftconvolve, convolve, speed, preemphasis, and deemphasis in torchaudio.functional.
Models
Pipelines
I/O
fill_buffer method to torchaudio.io.StreamReader (#2954, #2971)
buffer_chunk_size=-1 option to torchaudio.io.StreamReader (#2969)
When buffer_chunk_size=-1, StreamReader does not drop any buffered frame. Together with the fill_buffer method, this is a recommended way to load the entire media.
torchaudio.io.StreamReader (#2975)
torchaudio.io.StreamReader now gives the PTS (presentation time stamp) of the media chunk it is returning. To maintain backward compatibility, the timestamp information is attached to the returned media chunk.
Fetch timestamp
Chunks behave the same as torch.Tensor.
torchaudio.io.play_audio (#3026, #3051)
You can play audio with the torchaudio.io.play_audio function. (macOS only)
Other
The following functions are added to torchaudio.utils.ffmpeg_utils, which can be used to query the dynamically linked FFmpeg libraries:
get_demuxers()
get_muxers()
get_audio_decoders()
get_audio_encoders()
get_video_decoders()
get_video_encoders()
get_input_devices()
get_output_devices()
get_input_protocols()
get_output_protocols()
get_build_config()
Recipes
Improvements
I/O
Refactor StreamReader/Writer implementation
torchaudio::ffmpeg namespace with torchaudio::io (#3013)
pop_chunks implementations (#3002)
Added logging to torchaudio.io.StreamReader/Writer (#2878)
Fixed the #threads used by FilterGraph to 1 (#2985)
Fixed the default #threads used by decoder to 1 in torchaudio.io.StreamReader (#2949)
Moved libsox integration from libtorchaudio to libtorchaudio_sox (#2929)
Added query methods to FilterGraph (#2976)
Ops
cuda_version (#2952)
Models
Datasets
Documentation
Recipes
Tutorials
Builds
USE_CUDA detection (#3005)
USE_ROCM detection (#3008)
Tests
Style
v0.13.1: TorchAudio 0.13.1 Release Note
Compare Source
This is a minor release, which is compatible with PyTorch 1.13.1 and includes bug fixes, improvements and documentation updates. There is no new feature added.
Bug Fix
IO
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.