Unable to run Speech T5 on XPU #10025
The SpeechT5 model can be successfully loaded with BigDL using:

```python
from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq
...
model = AutoModelForSpeechSeq2Seq.from_pretrained("microsoft/speecht5_tts", load_in_4bit=True)
model = model.to("xpu")
...
``` |
I updated my bigdl version, but now I am getting a segfault. Here is the backtrace:
According to the backtrace, it seems to be an issue with finding the GPU. |
Yes, the default BigDL-LLM has been upgraded to PyTorch 2.1/oneAPI 2024.0, so you will need to upgrade your oneAPI installation. Alternatively, you may continue to install the PyTorch 2.0 version of BigDL-LLM, which is compatible with oneAPI 2023.2 (see https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux). |
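The version pairing described above can be summarized in a small sketch. `required_oneapi` is a hypothetical helper (not part of BigDL-LLM), assuming only the pairing stated in this thread: PyTorch 2.1 builds target oneAPI 2024.0, PyTorch 2.0 builds target oneAPI 2023.2.

```python
def required_oneapi(torch_version: str) -> str:
    """Hypothetical helper: map the PyTorch version bundled with a
    BigDL-LLM build to the oneAPI release it expects, per the pairing
    described in this thread."""
    if torch_version.startswith("2.1"):
        return "2024.0"
    if torch_version.startswith("2.0"):
        return "2023.2"
    raise ValueError(f"untested PyTorch build for BigDL-LLM: {torch_version}")
```

In practice you would pass `torch.__version__` and compare the result against the oneAPI release actually installed on the machine.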
But with oneAPI 2023.2 this does not work; it segfaults as mentioned in the previous comment. With oneAPI 2024 I still did not get anything working, since I am getting an error message: |
Did you correctly configure the oneAPI env variables (refer to the instructions here)? Also pay attention to the runtime configuration instructions here, which may prevent many runtime issues. |
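For reference, the runtime configuration for Intel GPUs on Linux typically looks like the following. This is a sketch based on the commonly documented BigDL-LLM settings; the `setvars.sh` path depends on where oneAPI was installed:

```shell
# Load the oneAPI environment (default installation prefix assumed)
source /opt/intel/oneapi/setvars.sh

# Commonly recommended runtime settings for BigDL-LLM on Intel GPUs
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```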
Yes and yes, but the issue is still there. |
Could you provide the os, kernel and python version? |
To resolve this problem and use oneAPI 2024.0, it is recommended to create a new conda env:

```shell
conda create -n new-llm-env python=3.9
conda activate new-llm-env
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```

Or, if you would like to use BigDL-LLM with oneAPI 2024.0 in your old conda environment, you could:

```shell
pip uninstall bigdl-core-xe
pip uninstall bigdl-core-xe-21
pip uninstall bigdl-core-xe-esimd
pip uninstall bigdl-core-xe-esimd-21
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```

Note that |
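To check whether any of the old binary wheels listed in the uninstall steps are still present, a small sketch like the following can help. `stale_bigdl_packages` is a hypothetical helper, not part of BigDL; the package names are exactly the four from the `pip uninstall` commands above:

```python
from importlib import metadata

# The old bigdl-core wheels that the instructions above remove.
STALE = {
    "bigdl-core-xe",
    "bigdl-core-xe-21",
    "bigdl-core-xe-esimd",
    "bigdl-core-xe-esimd-21",
}

def stale_bigdl_packages(installed=None):
    """Hypothetical helper: return which of the old bigdl-core wheels
    are installed, i.e. what `pip uninstall` still needs to remove."""
    if installed is None:
        installed = [d.metadata["Name"] for d in metadata.distributions()]
    return sorted(set(installed) & STALE)
```

Running `stale_bigdl_packages()` in the old environment lists any leftovers; an empty list means the environment is clean.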
OS: Ubuntu 22.04 |
Hi @nedo99,

Env (PyTorch 2.1 with oneAPI 2024.0):

```shell
conda create -n speecht5-test python=3.9
conda activate speecht5-test
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install datasets soundfile
```

Runtime Configuration: following here

Code:

```python
import torch
from transformers import SpeechT5Processor, SpeechT5HifiGan, SpeechT5ForTextToSpeech
from datasets import load_dataset
import soundfile as sf
import time

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

from bigdl.llm import optimize_model
model = optimize_model(model, modules_to_not_convert=["speech_decoder_postnet.feat_out",
                                                      "speech_decoder_postnet.prob_out"])
model = model.to('xpu')
vocoder = vocoder.to('xpu')

text = "On a cold winter night, a lonely traveler found a shimmering stone in the snow, unaware that it would lead him to a world full of wonders."
inputs = processor(text=text, return_tensors="pt").to('xpu')

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to('xpu')

with torch.inference_mode():
    # warmup
    st = time.perf_counter()
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    print(f'Warmup time: {time.perf_counter() - st}')

    st1 = time.perf_counter()
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    torch.xpu.synchronize()
    st2 = time.perf_counter()
    print(f"Inference time: {st2 - st1}")

sf.write("speech_bigdl_llm.wav", speech.to('cpu').numpy(), samplerate=16000)
```

Please let us know for any further problems :) |
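As a quick sanity check on the generated waveform, the `samplerate=16000` passed to `sf.write` implies a simple duration formula. A minimal sketch in plain Python (no audio libraries; `audio_duration_seconds` is an illustrative helper, not part of any of the libraries above):

```python
def audio_duration_seconds(num_samples: int, samplerate: int = 16000) -> float:
    """Duration of a mono waveform: number of samples divided by
    samples-per-second. 16000 matches the SpeechT5 output rate used above."""
    return num_samples / samplerate

# e.g. a 48000-sample clip at 16 kHz lasts 3 seconds
```

With the SpeechT5 output above you would call `audio_duration_seconds(len(speech))` after moving `speech` to the CPU, and check the result is plausible for the input sentence.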
If you would also be interested in other TTS models we support, you can run Bark with BigDL-LLM optimization as follows :)

Env (PyTorch 2.1 with oneAPI 2024.0):

```shell
conda create -n bark-test python=3.9
conda activate bark-test
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install scipy
```

Runtime Configuration: following here

Code:

```python
from transformers import AutoProcessor, BarkModel
import torch
import time

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

from bigdl.llm import optimize_model
model = optimize_model(model).to('xpu')

voice_preset = "v2/en_speaker_6"
text = "Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."
inputs = processor(text, voice_preset=voice_preset).to('xpu')

# warmup
st = time.time()
with torch.inference_mode():
    model.generate(**inputs)
    torch.xpu.synchronize()
print(f"Warmup time: {time.time() - st}")

st = time.time()
with torch.inference_mode():
    audio_array = model.generate(**inputs)
    torch.xpu.synchronize()
print(f"Inference time: {time.time() - st}")

audio_array = audio_array.cpu().numpy().squeeze()

from scipy.io.wavfile import write as write_wav
sample_rate = model.generation_config.sample_rate
write_wav("output/bark_generation_bigdl_llm.wav", sample_rate, audio_array)
```
 |
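One small pitfall in the snippet above: `write_wav` targets `output/bark_generation_bigdl_llm.wav`, and scipy will raise `FileNotFoundError` if the `output/` directory does not already exist. A minimal standard-library guard to run before the write:

```python
import os

out_path = "output/bark_generation_bigdl_llm.wav"
# Create the parent directory first; exist_ok=True avoids an error on reruns.
os.makedirs(os.path.dirname(out_path), exist_ok=True)
```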
Speech T5 sample works. Bark does not work. It segfaults and has the same backtrace as posted in one of the previous comments. |
Hi @nedo99, could you let me know your test env for Bark?
What shows here does not seem to be a correct PyTorch 2.1 env to me :) You could try the steps here for a correct PyTorch 2.1 + oneAPI 2024.0 env for |
Here is the updated environment:
|
Hello,
I am trying to run Speech T5 on XPU but am unable to. It is this model: https://huggingface.co/microsoft/speecht5_tts and here is my code:
and I am getting the following error:
Is there support for text-to-speech by BigDL? Or am I missing something?
Regards,
Nedim