
Whisper v3 dependency issue #28156

Closed
1 of 4 tasks
lionsheep0724 opened this issue Dec 20, 2023 · 15 comments

@lionsheep0724

System Info

  • transformers version: 4.37.0.dev0 (installed via pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio], as instructed here)
  • Platform: Windows 10, WSL
  • Python version: 3.10

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_path = "./models/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_path)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

Expected behavior

  • I'm trying to load the pretrained whisper-large-v3 model, but I suspect there is a dependency issue in transformers (4.37.0.dev0).
  • I got the following error: ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.0.
  • I guess transformers (4.37.0.dev0) and whisper-v3 depend on tokenizers below 0.14, but the one installed through the pip command on the official HF Whisper page is 0.15.
  • When I install a lower version of tokenizers, I get ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary.
  • I'm confused about which tokenizers version I need to install.
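The version pin from the ImportError can be sketched as a plain check (a minimal stdlib-only sketch; the real logic lives in transformers/utils/versions.py, and the helper names below are made up for illustration):

```python
def parse(version: str) -> tuple:
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

def satisfies(got: str) -> bool:
    """Mimic the pin tokenizers>=0.11.1,!=0.11.3,<0.14 with tuple comparison."""
    g = parse(got)
    return parse("0.11.1") <= g < parse("0.14") and got != "0.11.3"

print(satisfies("0.15.0"))  # False: the installed tokenizers violates the pin
print(satisfies("0.13.3"))  # True: a version inside the pinned range
```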
@amyeroberts
Collaborator

Hi @lionsheep0724, thanks for raising this issue!

The most recent version of transformers is compatible with tokenizers==0.15.

Could you try reinstalling transformers?

pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git

For the error message, could you share the full traceback?

cc @sanchit-gandhi

@lionsheep0724
Author

Hi @amyeroberts, sorry for the late response; I was on year-end vacation.
I created conda env with python 3.10, and followed your comment, as below.

pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git

But I got the same result as before (full traceback):

Traceback (most recent call last):
  File "C:\Users\Kakaobank\Documents\stt-benchmark\whisper_v3.py", line 2, in <module>
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\dependency_versions_check.py", line 57, in <module>
    require_version_core(deps[pkg])
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 117, in require_version_core
    return require_version(requirement, hint)
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 111, in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 44, in _compare_versions
    raise ImportError(
ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.0.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git main

And for reference, here is my test code:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_path = "./models/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_path)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

@amyeroberts
Collaborator

@lionsheep0724 Could you confirm the versions of transformers and tokenizers in your environment?

pip list | grep tokenizers
pip list | grep transformers

And in the python environment:

python -c "import tokenizers; import transformers; print(tokenizers.__version__); print(transformers.__version__)"

@lionsheep0724
Author

@amyeroberts
0.15.0 for tokenizers, 4.37.0.dev0 for transformers.

@lionsheep0724
Author

Let me share my troubleshooting result.
The problem was Windows.
I installed transformers as you mentioned above in a Docker container (Linux), and there was no dependency issue.
But I'm confused about why transformers 4.37.0.dev0 behaves differently on Linux and Windows, even though the printed version was the same on both systems.

@lionsheep0724
Author

lionsheep0724 commented Jan 4, 2024

Another finding: Ubuntu 18.04 (the pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime image) also has the same issue.
I guess 4.37.0.dev0 works differently depending on the platform.

@amyeroberts
Collaborator

Thanks for updating @lionsheep0724!

Across different platforms - when working and not working - do you see the same versions of tokenizers and transformers installed in the python environment? Are you using the same method to install the libraries e.g. pip?

@lionsheep0724
Author

Yes, I installed the libraries using the same method, and the versions were the same.

@amyeroberts
Collaborator

@lionsheep0724 Hmmmm - I honestly have no idea what's happening here. I'm able to run without issue on my Ubuntu machine and my Mac.

My best guess is that the version of transformers being run in the python environment isn't the same as the one being installed by pip. The version restrictions seen in the warning message were changed with #23909 and have been part of the library since v4.34.

You can check which version is being run using the python command I posted above. If you're running in an ipython environment, you'll need to make sure you're using the same libraries installed by pip. Running:

import x
print(x.__version__)

in the python environment should confirm if this is what's happening.
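A stdlib-only way to confirm this, sketched here with a hypothetical helper: importlib can report where a package would be imported from, which shows whether a local checkout (like the ./transformers/ directory visible in the traceback paths) is shadowing the pip-installed copy.

```python
import importlib.util

def package_origin(pkg: str):
    """Return the file a package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(pkg)
    return spec.origin if spec else None

# "json" is a stdlib stand-in; swap in "transformers" to see whether the
# import resolves to site-packages or to a local directory that shadows it.
print(package_origin("json"))
```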

@lionsheep0724
Author

@amyeroberts
After a lot of trials, the problem has somehow been solved. I just repeated the methods I explained above.
I'm not sure about the root cause; I assume it's caused by our security software.
Thank you very much for your replies.

@amyeroberts
Collaborator

@lionsheep0724 Thanks for the update!

@Leejilin

I tried uninstalling both transformers and tokenizers.

Then I ran pip install transformers==4.27.0; during this installation, the matching tokenizers was installed automatically.

Finally, it worked!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@mohammadr8za

mohammadr8za commented Apr 29, 2024


Hi, I faced the same error "Wrong index found for <|0.02|>: should be None but found 50366"

Tried this:
pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git
and now it works.

@502dxceit
Copy link

I solved the same problem by upgrading transformers to the latest version: transformers==4.41.2.
