
Pyannote 3.1.0 still on CPU only? #1563

Closed
fablau opened this issue Nov 25, 2023 · 18 comments

@fablau

fablau commented Nov 25, 2023

I am sorry to open this issue again, but I am still seeing pyannote version 3.1.0 run on the CPU only.

I just installed the latest version with:

pip3 install pyannote.audio

And I can confirm I have the latest version installed with:

pip list

And yet, I see my program using just the CPU. I am testing it with an RTX A5000.

Here is my code:

import sys
from pyannote.audio import Pipeline
import torch

# command-line arguments: input WAV file, number of speakers, output file
fileOutWav = sys.argv[1]
spkrsNo = int(sys.argv[2])
fileDiary = sys.argv[3]

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token="xxxxxxxxxxxxx")

pipeline.to(torch.device("cuda"))

# 4. apply pretrained pipeline
diarization = pipeline(fileOutWav, num_speakers=spkrsNo)

# 5. print the result

with open(fileDiary, mode='w') as file_object:
	for turn, _, speaker in diarization.itertracks(yield_label=True):
		#print(f"start={turn.start:.2f}s stop={turn.end:.2f}s speaker_{speaker}")
		print(f"start={turn.start:.2f}s stop={turn.end:.2f}s speaker_{speaker}", file=file_object)

Is there anything wrong with my code? Or any other steps I might have missed?

I am using the latest version of torch on Linux.


Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

@hbredin
Member

hbredin commented Nov 26, 2023

You are using the wrong pretrained pipeline.
Switch from pyannote/speaker-diarization to pyannote/speaker-diarization-3.1.
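For reference, the change amounts to a single line in the script posted above (same placeholder token as in the original report):

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="xxxxxxxxxxxxx")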

@fablau
Author

fablau commented Nov 27, 2023

Thank you, I tried that and got this error:

pipeline.to(torch.device("cuda"))
2023-11-27T06:25:21.370318815Z AttributeError: 'NoneType' object has no attribute 'to'

Do I need to remove that line? Is that no longer needed?

@hbredin
Member

hbredin commented Nov 27, 2023

Looks like you forgot to request access to this new pipeline on HuggingFace model hub.
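When the gated model's terms have not been accepted, Pipeline.from_pretrained comes back with None, which is exactly what the AttributeError above points to. A minimal guard, as a sketch (the error message wording is illustrative):

import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="xxxxxxxxxxxxx"
)
if pipeline is None:
    # from_pretrained returned nothing: access to the gated repo was likely not granted
    raise RuntimeError(
        "Pipeline failed to load; accept the terms at hf.co/pyannote/speaker-diarization-3.1"
    )
pipeline.to(torch.device("cuda"))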

@arnavmehta7

arnavmehta7 commented Nov 27, 2023

Hi @hbredin, I also tried the latest 3.1.0 version with the 3.1 model. However, it is also extremely slow for me:
5 minutes of audio takes around 5 minutes just to diarize.

@pourmand1376

I am having the same problem here. It is extremely slow.

@hbredin
Copy link
Member

hbredin commented Nov 27, 2023

Tagging this issue as cannot reproduce.
Please provide a minimal reproducible example on Google Colab.

@hbredin
Member

hbredin commented Nov 27, 2023

You can also upload your audio file here to get an idea of the expected processing speed on a T4 GPU.

@pourmand1376

pourmand1376 commented Nov 27, 2023

It seems that the problem was in my installation.

I used this as requirements.txt (found here):

gradio==3.38.0
--extra-index-url https://download.pytorch.org/whl/cu113
torch==2.0.1
pyannote-audio==3.1.0

And this for the Dockerfile.

FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    git \
    git-lfs \
    wget \
    curl \
    # python build dependencies \
    build-essential \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libreadline-dev \
    libsqlite3-dev \
    libncursesw5-dev \
    xz-utils \
    tk-dev \
    libxml2-dev \
    libxmlsec1-dev \
    libffi-dev \
    liblzma-dev \
    # gradio dependencies \
    ffmpeg \
    ca-certificates \
    # fairseq2 dependencies \
    libsndfile-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:${PATH}

WORKDIR ${HOME}

RUN git clone https://github.com/yyuu/pyenv.git .pyenv

ENV PATH=${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${PATH}

ARG PYTHON_VERSION=3.10
RUN pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION} && \
    pyenv rehash && \
    pip install --no-cache-dir -U pip setuptools wheel

COPY --chown=1000 ./requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt

COPY --chown=1000 . ${HOME}/app
ENV PYTHONPATH=${HOME}/app \
    PYTHONUNBUFFERED=1 \
    GRADIO_ALLOW_FLAGGING=never \
    GRADIO_NUM_PORTS=1 \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_THEME=huggingface \
    SYSTEM=spaces \
    GRADIO_SERVER_PORT=7860
EXPOSE 7860
WORKDIR ${HOME}/app
CMD ["python", "app.py"]

I do not know for sure whether it is using the GPU or not, but without this it took around 90 minutes to process a 110-minute file. Now it takes around 1~2 minutes.
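A quick way to check whether the installed torch build can actually see a GPU (a generic torch check, independent of pyannote):

import torch

print(torch.__version__)          # CUDA wheels usually carry a +cuXXX suffix, e.g. 2.0.1+cu117
print(torch.cuda.is_available())  # True only when a CUDA-enabled build finds a usable GPU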

@fablau
Author

fablau commented Nov 27, 2023

@pourmand1376 thank you for providing your Docker code. Could you also please provide the Python code you used for the diarization with pyannote?

@fablau
Author

fablau commented Nov 27, 2023

Looks like you forgot to request access to this new pipeline on HuggingFace model hub.

How do I do that?

@hbredin
Member

hbredin commented Nov 27, 2023

The same way you already did for the old pipeline: by visiting hf.co/pyannote/speaker-diarization-3.1 and agreeing to the terms.

@fablau
Author

fablau commented Nov 27, 2023

Thanks, I fixed the error I posted above by simply re-accepting the terms at the links below, so that my authorization token worked again:

https://hf.co/pyannote/segmentation-3.0
https://hf.co/pyannote/speaker-diarization-3.1

I am still investigating the missing GPU usage... I'll be back as soon as I find out more.

@fablau
Author

fablau commented Nov 27, 2023

Yes! It looks like the requirements @pourmand1376 posted above fixed the problem! Now I see the GPU being used ;)

My guess is that the key line is this one:

--extra-index-url https://download.pytorch.org/whl/cu113

because I tried the other lines individually and they didn't do the trick.
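One way to confirm that guess (an illustrative check, not part of the original comment): a CPU-only torch wheel reports no CUDA version at all.

import torch

# None on a CPU-only build; a version string such as "11.3" on a CUDA build
print(torch.version.cuda)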

@pourmand1376

pourmand1376 commented Nov 28, 2023

@pourmand1376 thank you for providing your Docker code. Could you also please provide the Python code you used for the diarization with pyannote?

Here it is (this is not a minimal example; it also splits the file and creates a zip file for the user):

import os
import shutil
import zipfile
from pathlib import Path

import gradio as gr
import torch
from dotenv import load_dotenv
from pydub import AudioSegment
from pyannote.audio import Pipeline

load_dotenv()

HF_API = os.getenv("HF_API")

print(f"HF API Length: {len(HF_API)}")
DESCRIPTION = """
# Speaker Diarization v3.1.0
"""


pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=HF_API
)
pipeline.to(torch.device("cuda"))


def zip_folder(folder_path):
    folder_name = os.path.basename(folder_path)
    zip_path = f"{folder_name}.zip"
    zip_file = zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            zip_file.write(os.path.join(root, file))
    zip_file.close()
    return zip_path


def rmrf(path):
    if os.path.isfile(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


def predict(number_of_speakers, audio_source, input_audio_mic, input_audio_file):
    if audio_source == "microphone":
        input_data = input_audio_mic
    else:
        input_data = input_audio_file

    print(input_data)

    if number_of_speakers == 0:
        diarization = pipeline(input_data)
    else:
        diarization = pipeline(input_data, num_speakers=number_of_speakers)

    text_output = ""
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"start={turn.start}s stop={turn.end}s speaker_{speaker}")
        text_output = (
            text_output
            + f"start={turn.start}s stop={turn.end}s speaker_{speaker}"
            + "\n"
        )

    song = AudioSegment.from_wav(input_data)
    rmrf("files")
    print(Path("files").absolute())  # .absolute must be called, otherwise this prints a bound method
    Path("files").mkdir(exist_ok=True, parents=True)
    for i, (turn, _, speaker) in enumerate(diarization.itertracks(yield_label=True)):
        try:
            clipped = song[turn.start * 1000 : turn.end * 1000]
            clipped.export(f"files/{i:03}.wav", format="wav", bitrate=16000)

        except Exception as e:
            print(e)

    output_path = zip_folder("files")
    return (text_output, output_path)


def update_audio_ui(audio_source: str) -> tuple[dict, dict]:
    mic = audio_source == "microphone"
    return (
        gr.update(visible=mic, value=None),  # input_audio_mic
        gr.update(visible=not mic, value=None),  # input_audio_file
    )


with gr.Blocks(css="style.css") as demo:
    gr.Markdown(DESCRIPTION)
    with gr.Group():
        with gr.Row():
            number_of_speakers = gr.Number(
                label="Number of Speakers",
                info="Keep it zero, if you want the model to automatically detect the number of speakers",
            )
        with gr.Row() as audio_box:
            audio_source = gr.Radio(
                choices=["file", "microphone"], value="file", interactive=True
            )
            input_audio_mic = gr.Audio(
                label="Input speech",
                type="filepath",
                source="microphone",
                visible=False,
            )
            input_audio_file = gr.Audio(
                label="Input speech",
                type="filepath",
                source="upload",
                visible=True,
            )
            final_audio = gr.Audio(label="Output", visible=False)
        audio_source.change(
            fn=update_audio_ui,
            inputs=audio_source,
            outputs=[input_audio_mic, input_audio_file],
            queue=False,
            api_name=False,
        )
        input_audio_mic.change(lambda x: x, input_audio_mic, final_audio)
        input_audio_file.change(lambda x: x, input_audio_file, final_audio)
        submit = gr.Button("Submit")
        text_output = gr.Textbox(
            label="Transcribed Text",
            value="",
            interactive=False,
            lines=10,
            scale=10,
            max_lines=10,
        )
        file_output = gr.File(label="output")

        submit.click(
            fn=predict,
            inputs=[
                number_of_speakers,
                audio_source,
                input_audio_mic,
                input_audio_file,
            ],
            outputs=[text_output, file_output],
            api_name="predict",
        )


demo.queue(max_size=50).launch()

@EarningsCall

(quoting @pourmand1376's comment above in full, including the same requirements.txt and Dockerfile)

This worked for me too.

Specifically, what I did was create a requirements.txt file with the contents:

gradio==3.38.0
--extra-index-url https://download.pytorch.org/whl/cu113
torch==2.0.1
pyannote-audio==3.1.0

Then I installed it with pip install -r requirements.txt.

Now, I can run some simple code:

In [1]: from pyannote.audio import Pipeline

In [2]: import torch

In [3]: pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
torchvision is not available - cannot save figures

In [4]: pipeline.to(torch.device("cuda"))
Out[4]: <pyannote.audio.pipelines.speaker_diarization.SpeakerDiarization at 0x7f2ce8f143d0>

In [5]: diarization = pipeline("/tmp/tmphgpfklya.wav")

And $ nvidia-smi -l 1 shows:

[screenshot: nvidia-smi output showing the GPU being utilized]

It took me quite a while to find this solution. Should it be added to the README? Why is this specific torch build required for the GPU to be properly utilized?


stale bot commented Jun 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 1, 2024
@stale stale bot closed this as completed Jul 1, 2024
@helLf1nGer

What is weirder on my side is that the 3.1 model sometimes runs on the GPU and sometimes on the CPU, while the 3.0 model always runs on the GPU. So I wrote a bit of code to choose between the models (roughly as in the sketch below). I always start with 3.1 because it does the segmentation faster, but if I see within 5 seconds that it is using the CPU instead of the GPU, I cancel that run and re-run it with 3.0. Who knows...
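A rough sketch of that kind of fallback logic (hypothetical: the helper name is mine, and it falls back on load failure or missing CUDA rather than on the manual cancel-and-rerun described above):

import torch
from pyannote.audio import Pipeline

def load_diarization_pipeline(token):
    # prefer 3.1 (faster segmentation); fall back to 3.0 if 3.1 cannot be loaded
    for model_id in ("pyannote/speaker-diarization-3.1",
                     "pyannote/speaker-diarization-3.0"):
        pipeline = Pipeline.from_pretrained(model_id, use_auth_token=token)
        if pipeline is None:
            continue
        if torch.cuda.is_available():
            pipeline.to(torch.device("cuda"))
        return pipeline
    raise RuntimeError("no diarization pipeline could be loaded")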
