
Can I use pyannote speaker diarization without a Hugging Face token, or is there any way to download the model and use it for diarization? #1410

Closed
shanky100 opened this issue Jun 20, 2023 · 22 comments

@shanky100

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context about the feature request here.

@github-actions

Thank you for your issue.
We found the following entries in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

We also offer paid scientific consulting services around speaker diarization (and speech processing in general).

This is an automated reply, generated by FAQtory

@sheetalmathur

I would also like to know the same: can I use pyannote without the Hugging Face model?

@hbredin
Member

hbredin commented Jun 20, 2023

Please read the FAQ. The FAQtory bot did a good job pointing to the answer...

@nlp03

nlp03 commented Jun 20, 2023

I am also interested in understanding whether it is possible to utilize Pyannote independently, without relying on Hugging Face model keys.

@shanky100
Author

I see that the model is gated. Is it mandatory to have a Hugging Face token in order to use these models, or is the model developed and hosted by Hugging Face?
Also, what information does it collect in order to provide access?

It would be helpful to get a few answers, given data privacy issues and some access-related issues we face in reaching Hugging Face.

@LouieBHLu

In the FAQ, it only refers to the offline usage of segmentation, I guess. What about diarization, then?

@ExtReMLapin

ExtReMLapin commented Aug 7, 2023

The FAQ didn't really help me. What I did on my side: I used the token and specified a custom download path, which later works without the token. For example:

os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ['TRANSFORMERS_CACHE'] = diarization_path
os.environ['TORCH_HOME'] = diarization_path
os.environ['HF_HOME'] = diarization_path
os.environ['PYANNOTE_CACHE'] = diarization_path

Then turn all the symlinks into real files and it should work. The offline tutorial was unreadable, so I did this instead, and it works like a charm.

@mllife

mllife commented Aug 10, 2023

os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ['TRANSFORMERS_CACHE'] = diarization_path
os.environ['TORCH_HOME'] = diarization_path
os.environ['HF_HOME'] = diarization_path
os.environ['PYANNOTE_CACHE'] = diarization_path

Can you share some more details on the code changes and file placement you did?

@ExtReMLapin, how do you convert symlinks into files?

@ExtReMLapin

ExtReMLapin commented Aug 10, 2023

What will happen is: the first time you load it using your token, it downloads the model to diarization_path; after that, you can unplug your Ethernet cable and it will load the model from the cache folder. This way:

diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25")

From memory, I got 2c6a571d14c3794623b098a065ff95fa22da7f25 by manually exploring the diarization_path folder (which ends up just being an absolute path inside my code folder); the hash was simply the folder name.
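If you'd rather not read the hash off the folder name by hand, it can be listed programmatically. A small sketch (the `models--pyannote--speaker-diarization` folder name follows the Hugging Face cache layout; the demo builds a throwaway folder that mimics it):

```python
import os
import tempfile

def find_snapshot_hashes(cache_dir, repo_folder="models--pyannote--speaker-diarization"):
    """List snapshot hashes inside a Hugging Face-style cache folder."""
    snapshots = os.path.join(cache_dir, repo_folder, "snapshots")
    if not os.path.isdir(snapshots):
        return []
    return sorted(os.listdir(snapshots))

# Demo against a throwaway folder that mimics the cache layout.
with tempfile.TemporaryDirectory() as cache:
    os.makedirs(os.path.join(cache, "models--pyannote--speaker-diarization",
                             "snapshots", "2c6a571d14c3794623b098a065ff95fa22da7f25"))
    print(find_snapshot_hashes(cache))
    # prints ['2c6a571d14c3794623b098a065ff95fa22da7f25']
```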

I'm leaving my house in a few minutes to go to the office; once I get there, I'll give you more details.


To convert symlinks into files on Windows, just copy/paste: it automatically turns them into real files.
I had to do this because of code redistribution.
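The symlink conversion can also be scripted instead of copy/pasting by hand. A minimal sketch (assuming the cache contains only plain file symlinks, as in the Hugging Face hub layout) that replaces each link with a real copy of its target:

```python
import os
import shutil
import tempfile

def materialize_symlinks(root):
    # Walk the tree and replace each symlink with a real copy of its target.
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                os.unlink(path)
                shutil.copy2(target, path)

# Demo on a throwaway folder: one real file, one symlink pointing at it.
root = tempfile.mkdtemp()
real = os.path.join(root, "pytorch_model.bin")
link = os.path.join(root, "link.bin")
with open(real, "w") as f:
    f.write("weights")
os.symlink(real, link)
materialize_symlinks(root)
print(os.path.islink(link))  # prints False: the link is now a real file
```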

@ExtReMLapin

This is (one of) the folders I end up with after my changes, and I load it with the code posted above.

[screenshot: folder path showing the snapshot id]

Please note the snapshot id in the path, which I used in the function call in my previous message.

@mllife

mllife commented Aug 10, 2023

@ExtReMLapin,
I downloaded all the files and converted them from symlinks to actual files. But still,

diarization_model = Pipeline.from_pretrained(
    checkpoint_path="./diarization_model/models--pyannote--speaker-diarization/snapshots/2c6a571d14c3794623b098a065ff95fa22da7f25/"
)

the call gives this error:

Traceback (most recent call last):
  File "/Users/xXOL/Projects/annotations/check_pyannote_offline.py", line 18, in <module>
    diarization_model = Pipeline.from_pretrained(
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './diarization_model/models--pyannote--speaker-diarization/snapshots/2c6a571d14c3794623b098a065ff95fa22da7f25/'. Use `repo_type` argument if needed.
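Judging from that traceback, `Pipeline.from_pretrained` in pyannote.audio 2.x only loads locally when its argument is an existing *file*; a directory path falls through to `hf_hub_download`, which rejects it as an invalid repo id. An untested sketch of a possible workaround: point it at the `config.yaml` inside the snapshot folder rather than at the folder itself (the path shown is the one from the error message above; adjust as needed):

```python
from pyannote.audio import Pipeline

# Assumption: the snapshot folder contains a config.yaml; passing the file
# itself (not the directory) should skip the repo-id validation code path.
diarization_model = Pipeline.from_pretrained(
    "./diarization_model/models--pyannote--speaker-diarization/"
    "snapshots/2c6a571d14c3794623b098a065ff95fa22da7f25/config.yaml"
)
```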

@mllife

mllife commented Aug 11, 2023

@ExtReMLapin, can you take a look at my previous comment?

@mllife

mllife commented Aug 11, 2023

@ExtReMLapin, the model files are now loading from the offline cache after I added the env vars, but I still have to pass use_auth_token in the code.
What changes do I make in the code?

device = "cuda" if torch.cuda.is_available() else "cpu"
# TODO - update this
diarization_model = Pipeline.from_pretrained(
    checkpoint_path="pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25",
    use_auth_token="hf_xxxxxxxxxxxxx",
)
diarization_model = diarization_model.to(torch.device(device))

@ExtReMLapin

I got out of bed, brain's booting.

Right now I don't see why it would not work on your end. Can you show the full error stack?

@mllife

mllife commented Aug 11, 2023

@ExtReMLapin, this is the full error:

Traceback (most recent call last):
  File "/Users/xXOL/Projects/annotations/check_pyannote_offline.py", line 18, in <module>
    diarization_model = Pipeline.from_pretrained(
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/pyannote/audio/core/pipeline.py", line 88, in from_pretrained
    config_yml = hf_hub_download(
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/Users/xXOL/Projects/annotations/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './diarization_model/models--pyannote--speaker-diarization/snapshots/2c6a571d14c3794623b098a065ff95fa22da7f25/'. Use `repo_type` argument if needed.

@ExtReMLapin

The error you posted shows your line of code isn't the one from this message #1410 (comment)

Use the one from my message

@mllife

mllife commented Aug 11, 2023

The model loaded is None if I don't add use_auth_token.

[screenshots: 2023-08-11, 4:55 PM and 4:58 PM]

@ExtReMLapin, can you share a full code snippet of how you process a sample.wav file?

@ExtReMLapin

The whole code and the models are at the office, and I have this day off, so I'll be able to help next week.

@mllife

mllife commented Aug 11, 2023

thanks, will wait for the update.

@ExtReMLapin

Alright, this is how my local folder looks:

[screenshots: local folder layout]

In this folder there used to be symbolic links; copy/paste turned them into real files.

My pip freeze:

transformers @ git+https://github.com/huggingface/transformers.git@460b844360131c99d3dd4dbd9c08545ea2e6ac9e

pyannote.algorithms==0.8
pyannote.audio==2.1.1
pyannote.core==5.0.0
pyannote.database==5.0.0
pyannote.metrics==3.2.1
pyannote.parser==0.8
pyannote.pipeline==2.3

As for the code:

import torch
from pyannote.audio import Pipeline

import numpy as np
diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25")

if torch.cuda.is_available():
    diarize_pipeline = diarize_pipeline.to("cuda")

def diarize(audio_path, num_spk=None):
    diarization = diarize_pipeline(audio_path, num_speakers=num_spk)
    diarization_list = []
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        diarization_list.append([turn.start, turn.end, speaker])
    print(diarization_list)
    return diarization_list

and in another file that is imported first:

import os
import json

# Load the config and, if the path is relative, convert it to an absolute path.
with open("config.json") as f:
    data_json = json.load(f)

diarization_path = data_json["diarization_path"]
if not os.path.isabs(diarization_path):
    diarization_path = os.path.abspath(diarization_path) + "/"

os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_CACHE"] = diarization_path
os.environ["TORCH_HOME"] = diarization_path
os.environ["HF_HOME"] = diarization_path
os.environ["PYANNOTE_CACHE"] = diarization_path
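As an aside, the list of [start, end, speaker] turns returned by diarize() above can be dumped as standard RTTM lines with plain Python. A sketch (uri is a hypothetical recording name, not something the pipeline provides):

```python
def turns_to_rttm(diarization_list, uri="sample"):
    # Each entry is [start, end, speaker]; RTTM rows store start + duration.
    return [
        f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
        for start, end, speaker in diarization_list
    ]

print(turns_to_rttm([[0.5, 2.0, "SPEAKER_00"]]))
# prints ['SPEAKER sample 1 0.500 1.500 <NA> <NA> SPEAKER_00 <NA> <NA>']
```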

@mllife

mllife commented Aug 24, 2023

@ExtReMLapin, thanks for your detailed response.
But I am unable to install the required modules due to conflicts.

The conflict is caused by:
    The user requested pyannote.core==5.0.0
    pyannote-algorithms 0.8 depends on pyannote.core>=1.3.1
    pyannote-audio 2.1.1 depends on pyannote.core<5.0 and >=4.4

If I change anything, then I get

  ERROR: Failed building wheel for hmmlearn
  Running setup.py clean for hmmlearn
Failed to build hmmlearn

Can you tell me which Python version you are using, or anything else I can do?

With that setup on Python 3.10 I got
pip list | grep 'pyannote'

pyannote.audio          2.1.1
pyannote.core           5.0.0
pyannote.database       5.0.1
pyannote.metrics        3.2.1
pyannote.pipeline       2.3
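The resolver conflict above comes from pinning pyannote.core==5.0.0 while pyannote.audio 2.1.1 declares pyannote.core<5.0. One way around it (a sketch, assuming a fresh virtual environment) is to pin only pyannote.audio and let pip resolve the rest:

```shell
# Fresh environment; pip will pick a pyannote.core in the >=4.4,<5.0 range
python3 -m venv venv
. venv/bin/activate
pip install "pyannote.audio==2.1.1"
```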


stale bot commented Feb 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 20, 2024
@stale stale bot closed this as completed Mar 22, 2024