AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate' #722

Closed

arabcoders opened this issue Feb 26, 2024 · 12 comments

@arabcoders

Hello, I have a simple project testing out whisperx. The test script:

import json
import logging
import whisperx

model_opts = {
    "whisper_arch": "large-v2",
    "device": "cuda",
    "compute_type": "float16",
    "download_root": "/home/user/.config/whisper-models",
    "language": "ja"
}

trans_opts = {
    "temperatures": [
        0.0,
        0.2,
        0.4,
        0.6000000000000001,
        0.8,
        1.0
    ],
    "best_of": 5,
    "beam_size": 5,
    "patience": 2,
    "initial_prompt": None,
    "condition_on_previous_text": True,
    "compression_ratio_threshold": 2.4,
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "word_timestamps": False,
    "prepend_punctuations": "\"'“¿([{-",
    "append_punctuations": "\"'.。,,!!??::”)]}、",
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None
}

filename = '/mnt/media/test.mkv'

model = whisperx.load_model(**model_opts, asr_options=trans_opts)
audio = whisperx.load_audio(filename)

results = model.transcribe(audio, batch_size=16)

device = 'cuda'

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(
    language_code=results["language"],
    device=device,
)

results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)

logging.debug(json.dumps(results, indent=2, ensure_ascii=False))

leads to

/home/user/test/.venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.0.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.1+cu121. Bad things might happen unless you revert torch to 1.x.
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/user/test/test.py", line 54, in <module>
    results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/test/.venv/lib/python3.11/site-packages/whisperx/alignment.py", line 232, in align
    inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
                                                                 ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate'

I am unable to get it working at all. Testing just faster-whisper works fine, so it seems the problem is with the Wav2Vec2 alignment model.
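
For anyone who wants to reproduce the comparison, the faster-whisper-only check was roughly this (a minimal sketch, not my exact script):

import faster_whisper

# load the same large-v2 model directly through faster-whisper
model = faster_whisper.WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcription alone completes without errors; only the whisperx alignment step fails
segments, info = model.transcribe("/mnt/media/test.mkv", language="ja", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")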

@frodo821

frodo821 commented Mar 3, 2024

I have the same issue; has anyone found a way around it? It might be caused by a breaking change in transformers, so I'll try downgrading transformers.

@frodo821

frodo821 commented Mar 3, 2024

I finally solved this error by rewriting alignment.py like this:

-                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt").to(device)
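
For anyone wondering why this works: Wav2Vec2Processor is just a wrapper around a feature extractor and a tokenizer, and in recent transformers releases the sampling rate is only exposed on the feature extractor, not on the processor itself. A quick check (the model name is the one from the traceback above; this is only a sketch):

from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-japanese")

print(hasattr(processor, "sampling_rate"))        # False on recent transformers, hence the AttributeError
print(processor.feature_extractor.sampling_rate)  # 16000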

@arabcoders
Author

I finally solved this error by rewriting alignment.py like this:

-                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt").to(device)

Thanks, I've made a small patch file that makes it backwards compatible:

--- .venv/lib/python3.11/site-packages/whisperx/alignment.py	2024-03-03 17:22:05.042130573 +0300
+++ .venv/lib/python3.11/site-packages/whisperx/alignment.py	2024-03-03 17:25:20.760972944 +0300
@@ -229,7 +229,13 @@
                 emissions, _ = model(waveform_segment.to(device), lengths=lengths)
             elif model_type == "huggingface":
                 if preprocess:
-                    inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                    sample_rate = None
+                    if 'sampling_rate' in processor.__dict__:
+                        sample_rate = processor.sampling_rate
+                    if 'feature_extractor' in processor.__dict__ and 'sampling_rate' in processor.feature_extractor.__dict__:
+                        sample_rate = processor.feature_extractor.sampling_rate
+
+                    inputs = processor(waveform_segment.squeeze(), sampling_rate=sample_rate, return_tensors="pt").to(device)
                     emissions = model(**inputs).logits
                 else:
                     emissions = model(waveform_segment.to(device)).logits
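
The same fallback can also be written more compactly with getattr. This is only a sketch of the idea (the helper name is made up, not part of whisperx); whisperx loads audio at 16 kHz, so that is used here as the last-resort default:

def resolve_sampling_rate(processor, default=16000):
    # prefer the attribute on the processor itself (older transformers),
    # then fall back to the feature extractor (newer transformers),
    # and finally to 16 kHz, which matches whisperx's loaded audio
    sample_rate = getattr(processor, "sampling_rate", None)
    if sample_rate is None:
        feature_extractor = getattr(processor, "feature_extractor", None)
        sample_rate = getattr(feature_extractor, "sampling_rate", None)
    return sample_rate if sample_rate is not None else default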

@alfahadgm

How did you solve it? I tried to find the code you mentioned, but it doesn't exist.

@melanie-rosenberg

I also don't see the code referenced above.

@arabcoders
Author

@alfahadgm @melanie-rosenberg, I am unsure why, but this fix was intended for v3.1.2, which seems to have been removed from the repo for some reason.

Maybe @m-bain can shed some light on why.

@melanie-rosenberg

melanie-rosenberg commented Mar 15, 2024

Thank you @arabcoders -- applying the patch worked while using v3.1.2.

@melanie-rosenberg

melanie-rosenberg commented Mar 15, 2024

FYI @alfahadgm, running this also worked:
pip install -U git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
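
If you'd rather pin that commit in a requirements file than install it ad hoc, the usual direct-reference syntax should work (same commit as above, just a sketch):

whisperx @ git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560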

@HHousen

HHousen commented Mar 25, 2024

Here's some info about the PyPI release vs this repo in case anyone else is confused like I was: It seems like the PyPI releases are created by someone other than the maintainer of this repo according to #700 (comment). The above patch works on top of this PR #625.

HHousen added a commit to HHousen/whisperX that referenced this issue Mar 25, 2024
@eschmidbauer

eschmidbauer commented May 9, 2024

@HHousen Any chance you could submit a PR to get that change merged?

@Barabazs
Collaborator

Barabazs commented Jan 2, 2025

@alfahadgm @melanie-rosenberg, I am unsure why, but this fix was intended for v3.1.2, which seems to have been removed from the repo for some reason.

Maybe @m-bain can shed some light on why.

That version on PyPi was from a fork and not this repo.

Note

As of January 1st, 2025, the whisperX project on PyPI will be maintained by the author of this project. Previous, unofficial versions have been removed to prevent potential issues.

@Barabazs
Collaborator

Barabazs commented Jan 2, 2025

@HHousen You're very welcome to create a PR for your fix, if it also applies to this repo.

@Barabazs closed this as completed on Jan 2, 2025