Pyannote 3.1.0 still on CPU only? #1563
Thank you for your issue. You might want to check the FAQ if you haven't done so already. Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed. If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users). Companies relying on
You are using the wrong pretrained pipeline.
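(The code snippet in this reply was not captured here; presumably it pointed at the new pipeline identifier, something along these lines, with YOUR_HF_TOKEN as a placeholder:)

```python
from pyannote.audio import Pipeline

# "pyannote/speaker-diarization-3.1" replaces the older "pyannote/speaker-diarization" id
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
```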
Thank you, I tried that and got this error:
Do I need to remove that line? Is that no longer needed?
Looks like you forgot to request access to this new pipeline on the HuggingFace model hub.
Hi @hbredin, I also tried the latest 3.1.0 version with the 3.1 model. However, it's also extremely slow for me.
I am having the same problem here. It is extremely slow.
Tagging this issue as
You can also upload your audio file here to get an idea of the expected processing speed on a T4 GPU.
It seems that the problem was in my installation. I used this as my requirements.txt:
And this for the Dockerfile:

```dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
git \
git-lfs \
wget \
curl \
# python build dependencies \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libxml2-dev \
libxmlsec1-dev \
libffi-dev \
liblzma-dev \
# gradio dependencies \
ffmpeg \
ca-certificates \
# fairseq2 dependencies \
libsndfile-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
PATH=/home/user/.local/bin:${PATH}
WORKDIR ${HOME}
RUN git clone https://github.com/yyuu/pyenv.git .pyenv
ENV PATH=${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${PATH}
ARG PYTHON_VERSION=3.10
RUN pyenv install ${PYTHON_VERSION} && \
pyenv global ${PYTHON_VERSION} && \
pyenv rehash && \
pip install --no-cache-dir -U pip setuptools wheel
COPY --chown=1000 ./requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt
COPY --chown=1000 . ${HOME}/app
ENV PYTHONPATH=${HOME}/app \
PYTHONUNBUFFERED=1 \
GRADIO_ALLOW_FLAGGING=never \
GRADIO_NUM_PORTS=1 \
GRADIO_SERVER_NAME=0.0.0.0 \
GRADIO_THEME=huggingface \
SYSTEM=spaces \
GRADIO_SERVER_PORT=7860
EXPOSE 7860
WORKDIR ${HOME}/app
CMD ["python", "app.py"] I do not know if it is using GPU or not. But without this, It took around 90 minutes to process a 110 minute file. Now, It takes around 1~2 minutes. |
@pourmand1376 thank you for providing your Docker code, could you also please provide the Python code you used for the diarization using Pyannote?
How to do that?
The same way you already did for the old pipeline: by visiting hf.co/pyannote/speaker-diarization-3.1 and agreeing to the terms.
Thanks, I fixed the error I posted above by simply re-accepting the terms at the link below, so that my authorization token worked again: https://hf.co/pyannote/segmentation-3.0

I am still investigating the missing GPU usage... I'll be back as soon as I find out more.
Yes! It looks like the requirements @pourmand1376 posted above fixed the problem! Now I see the GPU being used ;) My guess is that, in particular, the following line did it: `--extra-index-url https://download.pytorch.org/whl/cu113`, because I tried the other ones individually and they didn't do the trick.
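(For reference, a requirements.txt along these lines should reproduce the fix. Only the `--extra-index-url` line is quoted from the thread; the package names below are my assumption of what the rest of the file contained, not the original:)

```
--extra-index-url https://download.pytorch.org/whl/cu113
torch
torchaudio
pyannote.audio
```

The point of the extra index is that plain `pip install torch` can resolve to a CPU-only wheel, while this index serves builds compiled against CUDA 11.3.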
Here it is (this is not a Minimal Example; rather, it splits the file and creates a zip file for the user):

```python
import gradio as gr
import os
from dotenv import load_dotenv
from pydub import AudioSegment
from pathlib import Path
import torch
from pyannote.audio import Pipeline
load_dotenv()
HF_API = os.getenv("HF_API")
print(f"HF API Length: {len(HF_API)}")
DESCRIPTION = """
# Speaker Diarization v3.1.0
"""
pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-3.1", use_auth_token=HF_API
)
pipeline.to(torch.device("cuda"))  # move the whole pipeline to the GPU; without this it runs on CPU
import zipfile
# Zip every file under folder_path into "<folder_name>.zip" and return the zip path.
def zip_folder(folder_path):
folder_name = os.path.basename(folder_path)
zip_path = f"{folder_name}.zip"
zip_file = zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED)
for root, dirs, files in os.walk(folder_path):
for file in files:
zip_file.write(os.path.join(root, file))
zip_file.close()
return zip_path
import shutil
# Remove a path whether it is a file or a directory tree (like rm -rf).
def rmrf(path):
if os.path.isfile(path):
os.remove(path)
elif os.path.isdir(path):
shutil.rmtree(path)
# Run diarization on the selected audio source; return the turn list as text plus a zip of per-turn clips.
def predict(number_of_speakers, audio_source, input_audio_mic, input_audio_file):
if audio_source == "microphone":
input_data = input_audio_mic
else:
input_data = input_audio_file
print(input_data)
if number_of_speakers == 0:
diarization = pipeline(input_data)
else:
diarization = pipeline(input_data, num_speakers=number_of_speakers)
text_output = ""
for turn, _, speaker in diarization.itertracks(yield_label=True):
print(f"start={turn.start}s stop={turn.end}s speaker_{speaker}")
text_output = (
text_output
+ f"start={turn.start}s stop={turn.end}s speaker_{speaker}"
+ "\n"
)
song = AudioSegment.from_wav(input_data)
rmrf("files")
print(Path("files").absolute)
Path("files").mkdir(exist_ok=True, parents=True)
for i, (turn, _, speaker) in enumerate(diarization.itertracks(yield_label=True)):
try:
clipped = song[turn.start * 1000 : turn.end * 1000]
clipped.export(f"files/{i:03}.wav", format="wav", bitrate=16000)
except Exception as e:
print(e)
output_path = zip_folder("files")
return (text_output, output_path)
def update_audio_ui(audio_source: str) -> tuple[dict, dict]:
mic = audio_source == "microphone"
return (
gr.update(visible=mic, value=None), # input_audio_mic
gr.update(visible=not mic, value=None), # input_audio_file
)
with gr.Blocks(css="style.css") as demo:
gr.Markdown(DESCRIPTION)
with gr.Group():
with gr.Row():
number_of_speakers = gr.Number(
label="Number of Speakers",
info="Keep it zero, if you want the model to automatically detect the number of speakers",
)
with gr.Row() as audio_box:
audio_source = gr.Radio(
choices=["file", "microphone"], value="file", interactive=True
)
input_audio_mic = gr.Audio(
label="Input speech",
type="filepath",
source="microphone",
visible=False,
)
input_audio_file = gr.Audio(
label="Input speech",
type="filepath",
source="upload",
visible=True,
)
final_audio = gr.Audio(label="Output", visible=False)
audio_source.change(
fn=update_audio_ui,
inputs=audio_source,
outputs=[input_audio_mic, input_audio_file],
queue=False,
api_name=False,
)
input_audio_mic.change(lambda x: x, input_audio_mic, final_audio)
input_audio_file.change(lambda x: x, input_audio_file, final_audio)
submit = gr.Button("Submit")
text_output = gr.Textbox(
label="Transcribed Text",
value="",
interactive=False,
lines=10,
scale=10,
max_lines=10,
)
file_output = gr.File(label="output")
submit.click(
fn=predict,
inputs=[
number_of_speakers,
audio_source,
input_audio_mic,
input_audio_file,
],
outputs=[text_output, file_output],
api_name="predict",
)
demo.queue(max_size=50).launch()
```
This worked for me too. Specifically, what I did was create a requirements.txt with the contents:
Then install it with pip. Now, I can run some simple code:
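(The snippet itself was not captured here; presumably it was something minimal like the following, reconstructed from the pipeline code earlier in the thread. The token and file name are placeholders:)

```python
import torch
from pyannote.audio import Pipeline

HF_TOKEN = "hf_..."  # placeholder: your HuggingFace access token

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=HF_TOKEN
)
pipeline.to(torch.device("cuda"))  # without this line, inference stays on the CPU

diarization = pipeline("audio.wav")  # placeholder file name
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```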
It took me quite a while to find this solution. Should it be added to the README? Why is this version of
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
What is weirder on my side is that the 3.1 model sometimes runs on GPU and sometimes on CPU, while the 3.0 model always runs on GPU. So I specifically wrote a bit of code to choose between models. I always start with 3.1 because it does the segmentation faster, but if I see within 5 seconds that it is using the CPU instead of the GPU, I cancel it and re-run with 3.0. Who knows...
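(A rough sketch of automating that fallback idea; this is my own reconstruction, not the commenter's actual code, and the probe file, model identifiers, and 5-second budget are assumptions:)

```python
import time
import torch
from pyannote.audio import Pipeline

def load_pipeline_with_fallback(token, probe_file="short_clip.wav", budget_s=5.0):
    """Try the 3.1 pipeline first; fall back to 3.0 if a short probe runs suspiciously slowly."""
    for model in ("pyannote/speaker-diarization-3.1", "pyannote/speaker-diarization-3.0"):
        pipeline = Pipeline.from_pretrained(model, use_auth_token=token)
        pipeline.to(torch.device("cuda"))
        start = time.monotonic()
        pipeline(probe_file)  # a few seconds of audio; a CPU-bound run blows past the budget
        if time.monotonic() - start <= budget_s:
            return pipeline
    return pipeline  # last resort: keep 3.0 even if it was slow
```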
I am sorry to open this issue again, but I am still experiencing Pyannote version 3.1.0 running on CPU only.
I just installed the latest version with:
And I can confirm I have the latest version installed with:
And yet, I see my program using just the CPU. I am testing it with an RTX A5000.
Here is my code:
Is there anything wrong with my code? Or any other steps I might have missed?
I am using the latest version of torch on Linux.
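(Since the code above was not captured here, the usual first check in this situation, as a generic sketch rather than a diagnosis of this specific report, is to confirm that the installed torch build actually ships CUDA support; a CPU-only wheel silently falls back to CPU no matter what the pipeline code does:)

```python
import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only build
print(torch.cuda.is_available())  # must be True for pipeline.to(torch.device("cuda")) to help
```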