AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate' #722

Closed

arabcoders opened this issue Feb 26, 2024 · 12 comments

@arabcoders

Hello, I have a simple project testing out whisperx. The test script:

import json
import logging
import whisperx

model_opts = {
    "whisper_arch": "large-v2",
    "device": "cuda",
    "compute_type": "float16",
    "download_root": "/home/user/.config/whisper-models",
    "language": "ja"
}

trans_opts = {
    "temperatures": [
        0.0,
        0.2,
        0.4,
        0.6000000000000001,
        0.8,
        1.0
    ],
    "best_of": 5,
    "beam_size": 5,
    "patience": 2,
    "initial_prompt": None,
    "condition_on_previous_text": True,
    "compression_ratio_threshold": 2.4,
    "log_prob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "word_timestamps": False,
    "prepend_punctuations": "\"'“¿([{-",
    "append_punctuations": "\"'.。,,!!??::”)]}、",
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None
}

filename = '/mnt/media/test.mkv'

model = whisperx.load_model(**model_opts, asr_options=trans_opts)
audio = whisperx.load_audio(filename)

results = model.transcribe(audio, batch_size=16)

device = 'cuda'

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(
    language_code=results["language"],
    device=device,
)

results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)

logging.debug(json.dumps(results, indent=2, ensure_ascii=False))

leads to

/home/user/test/.venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.0.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.1+cu121. Bad things might happen unless you revert torch to 1.x.
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-japanese and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/user/test/test.py", line 54, in <module>
    results = whisperx.align(results["segments"], model_a, metadata, audio, device, return_char_alignments=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/test/.venv/lib/python3.11/site-packages/whisperx/alignment.py", line 232, in align
    inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
                                                                 ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Wav2Vec2Processor' object has no attribute 'sampling_rate'

I am unable to get it working at all. Testing just faster-whisper works fine, so it seems the problem is with the Wav2Vec2 alignment model.
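
For anyone who wants to reproduce the comparison, the faster-whisper-only check was roughly this (a minimal sketch, not my exact script):

import faster_whisper

# load the same large-v2 model directly through faster-whisper
model = faster_whisper.WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcription alone completes without errors; only the whisperx alignment step fails
segments, info = model.transcribe("/mnt/media/test.mkv", language="ja", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")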

@frodo821

frodo821 commented Mar 3, 2024

I have the same issue; has anyone found a way around it? It might be caused by a breaking change in transformers, so I'll try downgrading transformers.

@frodo821

frodo821 commented Mar 3, 2024

I finally solved this error by rewriting alignment.py like this:

-                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt").to(device)
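
For anyone wondering why this works: Wav2Vec2Processor is just a wrapper around a feature extractor and a tokenizer, and in recent transformers releases the sampling rate is only exposed on the feature extractor, not on the processor itself. A quick check (the model name is the one from the traceback above; this is only a sketch):

from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-japanese")

print(hasattr(processor, "sampling_rate"))        # False on recent transformers, hence the AttributeError
print(processor.feature_extractor.sampling_rate)  # 16000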

@arabcoders
Author

I finally solved this error by rewriting alignment.py like this:

-                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                     inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt").to(device)

Thanks, I've made a small patch file that makes it backwards compatible:

--- .venv/lib/python3.11/site-packages/whisperx/alignment.py	2024-03-03 17:22:05.042130573 +0300
+++ .venv/lib/python3.11/site-packages/whisperx/alignment.py	2024-03-03 17:25:20.760972944 +0300
@@ -229,7 +229,13 @@
                 emissions, _ = model(waveform_segment.to(device), lengths=lengths)
             elif model_type == "huggingface":
                 if preprocess:
-                    inputs = processor(waveform_segment.squeeze(), sampling_rate=processor.sampling_rate, return_tensors="pt").to(device)
+                    sample_rate = None
+                    if 'sampling_rate' in processor.__dict__:
+                        sample_rate = processor.sampling_rate
+                    if 'feature_extractor' in processor.__dict__ and 'sampling_rate' in processor.feature_extractor.__dict__:
+                        sample_rate = processor.feature_extractor.sampling_rate
+
+                    inputs = processor(waveform_segment.squeeze(), sampling_rate=sample_rate, return_tensors="pt").to(device)
                     emissions = model(**inputs).logits
                 else:
                     emissions = model(waveform_segment.to(device)).logits
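
The same fallback can also be written more compactly with getattr. This is only a sketch of the idea (the helper name is made up, not part of whisperx); whisperx loads audio at 16 kHz, so that is used here as the last-resort default:

def resolve_sampling_rate(processor, default=16000):
    # prefer the attribute on the processor itself (older transformers),
    # then fall back to the feature extractor (newer transformers),
    # and finally to 16 kHz, which matches whisperx's loaded audio
    sample_rate = getattr(processor, "sampling_rate", None)
    if sample_rate is None:
        feature_extractor = getattr(processor, "feature_extractor", None)
        sample_rate = getattr(feature_extractor, "sampling_rate", None)
    return sample_rate if sample_rate is not None else default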

@alfahadgm

How did you solve it? I tried to find the code you mentioned, but it doesn't exist.

@melanie-rosenberg

I also don't see the code referenced above.

@arabcoders
Author

@alfahadgm @melanie-rosenberg, I am unsure why, but this fix was intended for v3.1.2, which seems to have been removed from the repo for some reason.

Maybe @m-bain can shed some light on why.

@melanie-rosenberg

melanie-rosenberg commented Mar 15, 2024

Thank you @arabcoders -- applying the patch worked while using v3.1.2.

@melanie-rosenberg

melanie-rosenberg commented Mar 15, 2024

FYI @alfahadgm, running this also worked:
pip install -U git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
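
If you'd rather pin that commit in a requirements file than install it ad hoc, the usual direct-reference syntax should work (same commit as above, just a sketch):

whisperx @ git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560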

@HHousen

HHousen commented Mar 25, 2024

Here's some info about the PyPI release vs this repo in case anyone else is confused like I was: It seems like the PyPI releases are created by someone other than the maintainer of this repo according to #700 (comment). The above patch works on top of this PR #625.

HHousen added a commit to HHousen/whisperX that referenced this issue Mar 25, 2024
@eschmidbauer

eschmidbauer commented May 9, 2024

@HHousen Any chance you could submit a PR to get that change merged?

@Barabazs
Collaborator

Barabazs commented Jan 2, 2025

@alfahadgm @melanie-rosenberg, I am unsure why, but this fix was intended for v3.1.2, which seems to have been removed from the repo for some reason.

Maybe @m-bain can shed some light on why.

That version on PyPi was from a fork and not this repo.

Note

As of January 1st, 2025, the whisperX project on PyPI will be maintained by the author of this project. Previous, unofficial versions have been removed to prevent potential issues.

@Barabazs
Collaborator

Barabazs commented Jan 2, 2025

@HHousen You're very welcome to create a PR for your fix, if it also applies to this repo.

@Barabazs closed this as completed on Jan 2, 2025