This repository provides a pre-trained model for extracting x-vectors (speaker representation vectors). The model was trained on the JTubeSpeech corpus, a Japanese speech corpus collected from YouTube.
Instantiate the pre-trained model without an explicit install as follows:
import torch
model = torch.hub.load("sarulab-speech/xvector_jtubespeech", "xvector", trust_repo=True)
Then, follow the 'Usage / 使い方' section.
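As a quick sanity check of the hub-loaded model, the following minimal sketch feeds it a dummy MFCC tensor. It assumes that torch.hub.load returns the same XVector module used in the 'Usage / 使い方' section below, whose vectorize method maps a [1, T, 24] MFCC batch to a (1, 512) x-vector:

# dummy batch: one 200-frame, 24-dimensional MFCC sequence
# (24 is the feature dimensionality described in this README)
dummy_mfcc = torch.randn(1, 200, 24)

# vectorize is the extraction method shown in the Usage section;
# that it is exposed by the hub-loaded object is an assumption here
xvector = model.vectorize(dummy_mfcc)
print(xvector.shape)  # expected: torch.Size([1, 512])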
- The number of speakers: 1,233
- Sampling frequency: 16,000 Hz
- Speaker recognition accuracy: 91% (test data)
- Feature: 24-dimensional MFCC
- Dimensionality of x-vector: 512
- Other configurations follow the Kaldi ASV recipe for VoxCeleb.
- In the open-sourced model, the parameters of the recognition layers after the x-vector layer were randomized to protect data privacy.
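Because the model was trained on 16 kHz audio, recordings at other sampling rates should be resampled before feature extraction. Below is a minimal sketch using torchaudio (already a dependency of the Usage example); the input path is a placeholder:

import torchaudio

# load an arbitrary recording (path is a placeholder)
wav, sr = torchaudio.load("input.wav")  # wav: [channels, T]

# downmix to mono and resample to the 16 kHz rate the model was trained on
wav = wav.mean(dim=0, keepdim=True)
if sr != 16000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)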
pip install xvector-jtubespeech
import numpy as np
from scipy.io import wavfile
import torch
from torchaudio.compliance import kaldi
from xvector_jtubespeech import XVector
def extract_xvector(
    model,  # x-vector model
    wav,    # 16 kHz mono waveform as a 1-D numpy array
):
    # extract 24-dimensional MFCCs (Kaldi-compatible)
    wav = torch.from_numpy(wav.astype(np.float32)).unsqueeze(0)  # [1, T]
    mfcc = kaldi.mfcc(wav, num_ceps=24, num_mel_bins=24)  # [T', 24]
    mfcc = mfcc.unsqueeze(0)  # [1, T', 24]

    # extract the x-vector
    xvector = model.vectorize(mfcc)  # (1, 512)
    xvector = xvector.to("cpu").detach().numpy().copy()[0]
    return xvector
_, wav = wavfile.read("sample.wav") # 16kHz mono
model = XVector("xvector.pth")
xvector = extract_xvector(model, wav)  # (512,)
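A common downstream use of x-vectors is comparing two recordings by cosine similarity. The sketch below reuses extract_xvector and the loaded model from above; the file names are placeholders, and any decision threshold would need to be tuned on held-out data:

# compare two recordings by the cosine similarity of their x-vectors
# (file names are placeholders; both must be 16 kHz mono)
_, wav_a = wavfile.read("speaker_a.wav")
_, wav_b = wavfile.read("speaker_b.wav")

xvec_a = extract_xvector(model, wav_a)
xvec_b = extract_xvector(model, wav_b)

# higher values suggest the same speaker
similarity = np.dot(xvec_a, xvec_b) / (np.linalg.norm(xvec_a) * np.linalg.norm(xvec_b))
print(f"cosine similarity: {similarity:.3f}")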
- Takaki Hamada / 濱田 誉輝 (The University of Tokyo / 東京大学)
- Shinnosuke Takamichi / 高道 慎之介 (The University of Tokyo / 東京大学)
MIT
- The audio sample `sample.wav` was copied from the PJS corpus.