
Whisper v3 dependency issue #28156

Closed
1 of 4 tasks
lionsheep0724 opened this issue Dec 20, 2023 · 15 comments

@lionsheep0724

System Info

  • transformers version: 4.37.0.dev0 (installed via pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio], as instructed here)
  • Platform: Windows 10, WSL
  • Python version: 3.10

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_path = "./models/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_path)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

Expected behavior

  • I'm trying to load the pretrained whisper-large-v3 model, but I suspect there is a dependency issue in transformers (4.37.0.dev0).
  • I got the following error: ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.0.
  • I guess transformers (4.37.0.dev0) and whisper-v3 depend on tokenizers below 0.14, but the one installed through the pip command on the official HF Whisper page is 0.15.
  • When I install a lower version of tokenizers, I get ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary.
  • I'm confused about which tokenizers version I need to install.
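The version pin from the ImportError can be sketched as a plain check (a minimal stdlib-only sketch; the real logic lives in transformers/utils/versions.py, and the helper names below are made up for illustration):

```python
def parse(version: str) -> tuple:
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

def satisfies(got: str) -> bool:
    """Mimic the pin tokenizers>=0.11.1,!=0.11.3,<0.14 with tuple comparison."""
    g = parse(got)
    return parse("0.11.1") <= g < parse("0.14") and got != "0.11.3"

print(satisfies("0.15.0"))  # False: the installed tokenizers violates the pin
print(satisfies("0.13.3"))  # True: a version inside the pinned range
```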
@amyeroberts
Collaborator

Hi @lionsheep0724, thanks for raising this issue!

The most recent version of transformers is compatible with tokenizers==0.15.

Could you try reinstalling transformers?

pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git

For the error message, could you share the full traceback?

cc @sanchit-gandhi

@lionsheep0724
Author

Hi @amyeroberts, sorry for the late response; I was on year-end vacation.
I created conda env with python 3.10, and followed your comment, as below.

pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git

But I got the same result as before (full traceback):

Traceback (most recent call last):
  File "C:\Users\Kakaobank\Documents\stt-benchmark\whisper_v3.py", line 2, in <module>
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\dependency_versions_check.py", line 57, in <module>
    require_version_core(deps[pkg])
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 117, in require_version_core
    return require_version(requirement, hint)
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 111, in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
  File "C:\Users\Kakaobank\Documents\stt-benchmark\transformers\utils\versions.py", line 44, in _compare_versions
    raise ImportError(
ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.0.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git main

And for reference, here is my test code:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_path = "./models/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_path)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

@amyeroberts
Collaborator

@lionsheep0724 Could you confirm the versions of transformers and tokenizers in your environment?

pip list | grep tokenizers
pip list | grep transformers

And in the python environment:

python -c "import tokenizers; import transformers; print(tokenizers.__version__); print(transformers.__version__)"

@lionsheep0724
Author

@amyeroberts
0.15.0 for tokenizers, 4.37.0.dev0 for transformers.

@lionsheep0724
Author

Let me share my troubleshooting result.
The problem was Windows.
I installed transformers as you mentioned above in a Docker container (Linux), and there was no dependency issue.
But I'm confused about why transformers 4.37.0.dev0 behaves differently on Linux and Windows, even though the printed version was the same on both systems.

@lionsheep0724
Author

lionsheep0724 commented Jan 4, 2024

Another finding: Ubuntu 18.04 (the pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime image) also has the same issue.
I guess 4.37.0.dev0 works differently depending on the platform.

@amyeroberts
Collaborator

Thanks for updating @lionsheep0724!

Across different platforms - when working and not working - do you see the same versions of tokenizers and transformers installed in the python environment? Are you using the same method to install the libraries e.g. pip?

@lionsheep0724
Author

Yes, I installed the libraries using the same method, and the versions were the same.

@amyeroberts
Collaborator

@lionsheep0724 Hmmmm - I honestly have no idea what's happening here. I'm able to run without issue on my Ubuntu machine and my Mac.

My best guess is that the version of transformers being run in the python environment isn't the same as the one being installed by pip. The version restrictions seen in the warning message were changed with #23909 and have been part of the library since v4.34.

You can check which version is being run using the python command I posted above. If you're running in an ipython environment, you'll need to make sure you're using the same libraries installed by pip. Running:

import x
print(x.__version__)

in the python environment should confirm if this is what's happening.
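A stdlib-only way to confirm this, sketched here with a hypothetical helper: importlib can report where a package would be imported from, which shows whether a local checkout (like the ./transformers/ directory visible in the traceback paths) is shadowing the pip-installed copy.

```python
import importlib.util

def package_origin(pkg: str):
    """Return the file a package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(pkg)
    return spec.origin if spec else None

# "json" is a stdlib stand-in; swap in "transformers" to see whether the
# import resolves to site-packages or to a local directory that shadows it.
print(package_origin("json"))
```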

@lionsheep0724
Author

@amyeroberts
After a lot of trials, the problem has somehow been solved. I just repeated the methods I explained above.
I'm not sure about the root cause; I assume it's caused by our security software.
Thank you very much for your replies.

@amyeroberts
Collaborator

@lionsheep0724 Thanks for the update!

@Leejilin

I tried uninstalling both transformers and tokenizers.

Then I ran pip install transformers==4.27.0; during this installation, the matching tokenizers was installed automatically.

Finally, it worked!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@mohammadr8za

mohammadr8za commented Apr 29, 2024


Hi, I faced the same error "Wrong index found for <|0.02|>: should be None but found 50366"

Tried this:
pip uninstall transformers
pip install --upgrade git+https://github.com/huggingface/transformers.git
and now it works.

@502dxceit
Copy link

I solved the same problem by upgrading transformers to the latest version: transformers==4.41.2.
