Can I use pyannote speaker diarization without a Hugging Face token, or is there any way to download the model and use it to do diarization? #1410
Comments
Thank you for your issue.
Feel free to close this issue if you found an answer in the FAQ. If your issue is a feature request, please read this first and update your request accordingly, if needed. If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users). We also offer paid scientific consulting services around speaker diarization (and speech processing in general).
|
I would also like to know the same: can I use pyannote without the Hugging Face model hub? |
Please read the FAQ. The FAQtory bot did a good job pointing to the answer... |
I am also interested in understanding whether it is possible to utilize Pyannote independently, without relying on Hugging Face model keys. |
I see that the model is gated. Is it mandatory to have a Hugging Face token in order to use these models, or is this model developed and hosted by Hugging Face? A few answers would be helpful, as I am facing data privacy issues and some access-related issues with Hugging Face. |
The FAQ only covers offline usage of the segmentation model, I guess. What about diarization, then? |
The FAQ didn't really help me. What I did on my side is I used the token and specified a custom download path, which later works without the token, for example:
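(a rough sketch; the cache path and the token value below are placeholders, the full version of my code is further down in this thread)

import os

# point every cache location at a folder you control (path is illustrative)
diarization_path = "/path/to/diarization_models/"
os.environ["TRANSFORMERS_CACHE"] = diarization_path
os.environ["TORCH_HOME"] = diarization_path
os.environ["HF_HOME"] = diarization_path
os.environ["PYANNOTE_CACHE"] = diarization_path

from pyannote.audio import Pipeline

# first run (online): downloads the pipeline into diarization_path using your token
# later runs: the same call finds the files in that folder, no network needed
diarize_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder; only needed for the first, online run
)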
Then turn all the symlinks into real files and it should work. The offline tutorial was unreadable, so I did this instead and it works like a charm. |
Can you shed some more light on the code changes and file placement you did, @ExtReMLapin? How do I convert the symlinks into files? |
What will happen is: the first time you load it using your token, it will download everything into the diarization_path folder; the next time, you should be able to unplug your ethernet cable and it will still load, because it will find the files in the cache folder. This way:
diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25")
From memory I got the
I'm leaving my house in a few minutes to go to the office; once I get there I'll give you more details.
To convert symlinks into files on Windows, just copy/paste, lol, it automatically turns them into real files. |
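If you are not on Windows, here is a rough sketch of doing the same symlink-to-real-file conversion programmatically (the cache path is an assumption, point it at wherever the model was downloaded):

import shutil
from pathlib import Path

cache_dir = Path("/path/to/diarization_models")  # assumption: your custom cache folder

# replace every file symlink in the cache with a real copy of its target
for link in cache_dir.rglob("*"):
    if link.is_symlink() and link.resolve().is_file():
        target = link.resolve()      # the blob file the symlink points to
        link.unlink()                # remove the symlink itself
        shutil.copy2(target, link)   # put a real copy in its place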
@ExtReMLapin, it is giving an error.
|
@ExtReMLapin, can you take a look at my previous comment? |
@ExtReMLapin, the model files are now loading from the offline cache after I added the env vars, but I still have to use "use_auth_token" in the code.
|
I got out of bed, brain's booting. Right now I don't see why it would not work on your end. Can you show the full error stack? |
@ExtReMLapin , this is the full error
|
The error you posted shows your line of code isn't the one from this message: #1410 (comment). Use the one from my message. |
The model loaded is None if I don't add "use_auth_token". |
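For anyone else hitting this, a quick guard makes the silent failure explicit (a minimal sketch, assuming the same pretrained revision used above):

from pyannote.audio import Pipeline

diarize_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25"
)
if diarize_pipeline is None:
    # as reported above, from_pretrained can come back as None instead of raising
    raise RuntimeError("Pipeline was not loaded; check the cache env vars or pass use_auth_token")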
The whole code is at the office and the models are at the office, and I've got the day off, so I'll be able to help next week. |
thanks, will wait for the update. |
Alright, this is how my local model folder looks. In this folder there used to be symbolic links; copy/paste managed to turn them into real files. My pip freeze:
As for the code:

import torch
from pyannote.audio import Pipeline
import numpy as np

diarize_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2c6a571d14c3794623b098a065ff95fa22da7f25")
if torch.cuda.is_available():
    diarize_pipeline = diarize_pipeline.to("cuda")

def diarize(audio_path, num_spk=None):
    diarization = diarize_pipeline(audio_path, num_speakers=num_spk)
    diarization_list = list()
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        diarization_list.append([turn.start, turn.end, speaker])
    print(diarization_list)
    return diarization_list

and in another file, imported before the one above:

import os
import json

# load the config and, if the path is relative, convert it to an absolute path
with open("config.json") as f:
    data_json = json.load(f)
diarization_path = data_json["diarization_path"]
if not os.path.isabs(diarization_path):
    diarization_path = os.path.abspath(diarization_path) + "/"

# point every cache location at that folder so the models load offline
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_CACHE"] = diarization_path
os.environ["TORCH_HOME"] = diarization_path
os.environ["HF_HOME"] = diarization_path
os.environ["PYANNOTE_CACHE"] = diarization_path
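Calling diarize() then looks something like this (the audio file name and speaker count are placeholders, not from my actual setup):

segments = diarize("meeting.wav", num_spk=2)
# each entry is [start_seconds, end_seconds, speaker_label], e.g. [0.0, 3.2, "SPEAKER_00"]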
|
@ExtReMLapin, thanks for your detailed response.
If I change anything, then I am getting
Can you tell me which Python version is being used, or anything else I can try? With the setup on Python 3.10 I got
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |