Merge branch 'release/2.1'

pyannote · Oct 27, 2022 · 2cf1490
2 parents 25462d5 + 6d9d98c

Showing 23 changed files with 2,853 additions and 1,746 deletions.
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -37,4 +37,4 @@ jobs:
         file: ./coverage.xml
         env_vars: PYTHON
         name: codecov-pyannote-audio
-        fail_ci_if_error: true
+        fail_ci_if_error: false
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,75 @@
+# Changelog
+
+## Version 2.1 (2022-11-xx)
+
+  - BREAKING(pipeline): rewrite speaker diarization pipeline
+  - feat(pipeline): add option to optimize for DER variant
+  - feat(clustering): add support for NeMo speaker embedding
+  - feat(clustering): add FINCH clustering
+  - feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
+  - feat(hub): add support for private/gated models
+  - setup(hub): switch to latest huggingface_hub API
+  - fix(pipeline): fix support for missing reference in Resegmentation pipeline
+  - fix(clustering): fix corner case where HMM.fit finds too few states
+
+## Version 2.0.1 (2022-07-20)
+
+  - BREAKING: complete rewrite
+  - feat: much better performance
+  - feat: Python-first API
+  - feat: pretrained pipelines (and models) on the Hugging Face model hub
+  - feat: multi-GPU training with pytorch-lightning
+  - feat: data augmentation with torch-audiomentations
+  - feat: Prodigy recipe for model-assisted audio annotation
+
+## Version 1.1.2 (2021-01-28)
+
+  - fix: make sure master branch is used to load pretrained models (#599)
+
+## Version 1.1 (2020-11-08)
+
+  - last release before the complete rewrite
+
+## Version 1.0.1 (2018-07-19)
+
+  - fix: fix regression in Precomputed.__call__ (#110, #105)
+
+## Version 1.0 (2018-07-03)
+
+  - chore: switch from keras to pytorch (with tensorboard support)
+  - improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
+  - feat: add tunable speaker diarization pipeline (with its own tutorial)
+  - chore: drop support for Python 2 (use Python 3.6 or later)
+
+## Version 0.3.1 (2017-07-06)
+
+  - feat: add Python 3 support
+  - chore: rewrite neural speaker embedding using autograd
+  - feat: add new embedding architectures
+  - feat: add new embedding losses
+  - chore: switch to Keras 2
+  - doc: add tutorial for (MFCC) feature extraction
+  - doc: add tutorial for (LSTM-based) speech activity detection
+  - doc: add tutorial for (LSTM-based) speaker change detection
+  - doc: add tutorial for (TristouNet) neural speaker embedding
+
+## Version 0.2.1 (2017-03-28)
+
+  - feat: add LSTM-based speech activity detection
+  - feat: add LSTM-based speaker change detection
+  - improve: refactor LSTM-based speaker embedding
+  - feat: add librosa basic support
+  - feat: add SMORMS3 optimizer
+
+## Version 0.1.4 (2016-09-26)
+
+  - feat: add 'covariance_type' option to BIC segmentation
+
+## Version 0.1.3 (2016-09-23)
+
+  - chore: rename sequence generator in preparation for the release of
+    TristouNet reproducible research package.
+
+## Version 0.1.2 (2016-09-22)
+
+  - first public version
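The 2.1 changelog lists FINCH clustering without describing it. FINCH (First Integer Neighbor Clustering Hierarchy) links samples i and j when one is the other's first (nearest) neighbour or when both share the same first neighbour; the connected components of that graph form its first partition. The sketch below computes only that first partition, using cosine distance in plain Python. It is an illustration of the technique, not pyannote's implementation, and `finch_first_partition` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def finch_first_partition(embeddings: List[Sequence[float]]) -> List[int]:
    """Return one cluster label per embedding (first FINCH partition only)."""
    n = len(embeddings)
    if n < 2:
        return [0] * n

    # kappa[i] = index of the first (nearest) neighbour of sample i
    kappa = []
    for i in range(n):
        distances = [
            (cosine_distance(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        ]
        kappa.append(min(distances)[1])

    # Union-find: joining each i with kappa[i] gives the same connected
    # components as the full FINCH adjacency, because two samples sharing
    # a first neighbour both end up joined to that neighbour.
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(kappa):
        parent[find(i)] = find(j)

    # Relabel component roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


labels = finch_first_partition([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(labels)  # [0, 0, 1, 1]
```

The full algorithm then repeats the same step on the mean embedding of each cluster to obtain a hierarchy of coarser partitions; that recursion is omitted here.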
diff --git a/README.md b/README.md
@@ -11,23 +11,26 @@
 
 
 ```python
-# instantiate pretrained speaker diarization pipeline
+# 1. visit hf.co/pyannote/speaker-diarization and accept user conditions (only if requested)
+# 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)
+# 3. instantiate pretrained speaker diarization pipeline
 from pyannote.audio import Pipeline
-pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
+pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
+                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
 
-# apply pretrained pipeline
+# 4. apply pretrained pipeline
 diarization = pipeline("audio.wav")
 
-# print the result
+# 5. print the result
 for turn, _, speaker in diarization.itertracks(yield_label=True):
     print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
-# start=0.2s stop=1.5s speaker_A
-# start=1.8s stop=3.9s speaker_B
-# start=4.2s stop=5.7s speaker_A
+# start=0.2s stop=1.5s speaker_0
+# start=1.8s stop=3.9s speaker_1
+# start=4.2s stop=5.7s speaker_0
 # ...
 ```
 
-## What's new in `pyannote.audio` 2.0
+## What's new in `pyannote.audio` 2.x?
 
 For version 2.x of `pyannote.audio`, [I](https://herve.niderb.fr) decided to rewrite almost everything from scratch.
 Highlights of this release are:
@@ -51,11 +54,12 @@ conda activate pyannote
 # (see https://pytorch.org/get-started/previous-versions/#v1110)
 conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
 
-pip install pyannote.audio
+pip install -qq https://github.com/pyannote/pyannote-audio/archive/develop.zip
 ```
 
 ## Documentation
 
+- [Changelog](CHANGELOG.md)
 - Models
     - Available tasks explained
     - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
@@ -69,6 +73,9 @@ pip install pyannote.audio
     - [Adding a new task](tutorials/add_your_own_task.ipynb)
     - Adding a new pipeline
     - Sharing pretrained models and pipelines
+- Blog
+    - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
+    - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
 - Miscellaneous
     - [Training with `pyannote-audio-train` command line tool](tutorials/training_with_cli.md)
     - [Annotating your own data with Prodigy](tutorials/prodigy.md)
@@ -94,15 +101,19 @@ pip install pyannote.audio
 
 ## Benchmark
 
-Out of the box, `pyannote.audio` default speaker diarization pipeline is expected to be much better (and faster) in v2.0 than in v1.1.:
-
-| Dataset     | DER% with v1.1 | DER% with v2.0 | Relative improvement |
-| ----------- | -------------- | -------------- | -------------------- |
-| AMI         | 29.7%          | 18.2%          | 38%                  |
-| DIHARD      | 29.2%          | 21.0%          | 28%                  |
-| VoxConverse | 21.5%          | 12.8%          | 40%                  |
-
-A more detailed benchmark is available [here](https://hf.co/pyannote/speaker-diarization).
+Out of the box, the default `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization) is expected to be much better (and faster) in v2.x than in v1.1. These numbers are diarization error rates (in %):
+
+| Dataset \ Version      | v1.1 | v2.0 | v2.1 (finetuned) |
+| ---------------------- | ---- | ---- | ---------------- |
+| AISHELL-4              | -    | 14.6 | 14.1 (14.5)      |
+| AliMeeting (channel 1) | -    | -    | 27.4 (23.8)      |
+| AMI (IHM)              | 29.7 | 18.2 | 18.9 (18.5)      |
+| AMI (SDM)              | -    | 29.0 | 27.1 (22.2)      |
+| CALLHOME (part2)       | -    | 30.2 | 32.4 (29.3)      |
+| DIHARD 3 (full)        | 29.2 | 21.0 | 26.9 (21.9)      |
+| VoxConverse (v0.3)     | 21.5 | 12.6 | 11.2 (10.7)      |
+| REPERE (phase2)        | -    | 12.6 | 8.2 ( 8.3)       |
+| This American Life     | -    | -    | 20.8 (15.2)      |
 
 ## Citations
 

diff --git a/doc/source/changelog.rst b/doc/source/changelog.rst
@@ -2,9 +2,22 @@
 Changelog
 #########
 
-Version 2.0.1 (2022-07-20)
+Version 2.1 (2022-11-xx)
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
+  - BREAKING(pipeline): rewrite speaker diarization pipeline
+  - feat(pipeline): add option to optimize for DER variant
+  - feat(clustering): add support for NeMo speaker embedding
+  - feat(clustering): add FINCH clustering
+  - feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
+  - feat(hub): add support for private/gated models
+  - setup(hub): switch to latest huggingface_hub API
+  - fix(pipeline): fix support for missing reference in Resegmentation pipeline
+  - fix(clustering): fix corner case where HMM.fit finds too few states
+
+Version 2.0.1 (2022-07-20)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
   - BREAKING: complete rewrite
   - feat: much better performance
   - feat: Python-first API
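The "optimize for DER variant" entry does not say which variants are meant. Whatever the variant, diarization error rate is the sum of false alarm, missed detection and speaker confusion durations divided by the total duration of reference speech; variants (for instance the `collar` and `skip_overlap` options of `DiarizationErrorRate` in `pyannote.metrics`) change which regions are scored, not this ratio. A minimal sketch with made-up durations follows; `DiarizationErrors` and `diarization_error_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    """Error durations in seconds, accumulated over the scored regions."""

    false_alarm: float
    missed_detection: float
    confusion: float
    total_reference: float  # total duration of reference speech


def diarization_error_rate(e: DiarizationErrors) -> float:
    """DER = (false alarm + missed detection + confusion) / total reference speech."""
    if e.total_reference <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (e.false_alarm + e.missed_detection + e.confusion) / e.total_reference


# Made-up example: 100 s of reference speech, 4 s of false alarm,
# 6 s of missed speech and 8 s attributed to the wrong speaker.
errors = DiarizationErrors(false_alarm=4.0, missed_detection=6.0,
                           confusion=8.0, total_reference=100.0)
print(f"DER = {100 * diarization_error_rate(errors):.1f}%")  # DER = 18.0%
```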

diff --git a/pyannote/audio/cli/train_config/optimizer/Adan.yaml b/pyannote/audio/cli/train_config/optimizer/Adan.yaml
@@ -0,0 +1,5 @@
+# @package _group_
+_target_: adan_pytorch.Adan
+lr: 1e-3
+betas: [0.1, 0.1, 0.001]
+weight_decay: 0.0
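The new `Adan.yaml` is a Hydra config group entry: `_target_` names the class to build and the remaining keys become its keyword arguments. Hydra's `hydra.utils.instantiate` does this (along with recursion and interpolation); the stdlib-only sketch below mimics just the resolution step and is not Hydra's code. `instantiate_from_config` is a hypothetical helper, and passing the model parameters positionally is an assumption about how the optimizer receives them.

```python
import importlib
from typing import Any, Dict


def instantiate_from_config(config: Dict[str, Any], *args: Any) -> Any:
    """Build the object named by config["_target_"] (hypothetical helper).

    Keys other than "_target_" are forwarded as keyword arguments; positional
    arguments (e.g. model parameters for an optimizer) come from the caller.
    """
    kwargs = dict(config)  # copy so the caller's config is not mutated
    module_path, _, attr_name = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(*args, **kwargs)


# Runnable with the standard library only. With adan-pytorch installed, the
# Adan.yaml values would presumably be used along the lines of
#   instantiate_from_config({"_target_": "adan_pytorch.Adan", "lr": 1e-3,
#                            "betas": [0.1, 0.1, 0.001], "weight_decay": 0.0},
#                           model.parameters())
half = instantiate_from_config(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
print(half)  # 1/2
```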
diff --git a/pyannote/audio/core/model.py b/pyannote/audio/core/model.py
@@ -33,7 +33,8 @@
 import torch
 import torch.nn as nn
 import torch.optim
-from huggingface_hub import cached_download, hf_hub_url
+from huggingface_hub import hf_hub_download
+from huggingface_hub.utils import RepositoryNotFoundError
 from pyannote.core import SlidingWindow
 from pytorch_lightning.utilities.cloud_io import load as pl_load
 from pytorch_lightning.utilities.model_summary import ModelSummary
@@ -415,6 +416,10 @@ def on_save_checkpoint(self, checkpoint):
 
     @staticmethod
     def check_version(library: Text, theirs: Text, mine: Text):
+
+        theirs = ".".join(theirs.split(".")[:3])
+        mine = ".".join(mine.split(".")[:3])
+
         theirs = VersionInfo.parse(theirs)
         mine = VersionInfo.parse(mine)
         if theirs.major != mine.major:
@@ -777,32 +782,62 @@ def from_pretrained(
                 model_id = checkpoint
                 revision = None
 
-            url = hf_hub_url(
-                model_id, filename=HF_PYTORCH_WEIGHTS_NAME, revision=revision
-            )
-            path_for_pl = cached_download(
-                url=url,
-                library_name="pyannote",
-                library_version=__version__,
-                cache_dir=cache_dir,
-                use_auth_token=use_auth_token,
-            )
+            try:
+                path_for_pl = hf_hub_download(
+                    model_id,
+                    HF_PYTORCH_WEIGHTS_NAME,
+                    repo_type="model",
+                    revision=revision,
+                    library_name="pyannote",
+                    library_version=__version__,
+                    cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
+                    use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
+                )
+            except RepositoryNotFoundError:
+                print(
+                    f"""
+Could not download '{model_id}' model.
+It might be because the model is private or gated so make
+sure to authenticate. Visit https://hf.co/settings/tokens to
+create your access token and retry with:
+
+   >>> Model.from_pretrained('{model_id}',
+   ...                       use_auth_token=YOUR_AUTH_TOKEN)
+
+If this still does not work, it might be because the model is gated:
+visit https://hf.co/{model_id} to accept the user conditions."""
+                )
+                return None
 
             # HACK Huggingface download counters rely on config.yaml
             # HACK Therefore we download config.yaml even though we
             # HACK do not use it. Fails silently in case model does not
             # HACK have a config.yaml file.
             try:
-                config_url = hf_hub_url(
-                    model_id, filename=HF_LIGHTNING_CONFIG_NAME, revision=revision
-                )
-                _ = cached_download(
-                    url=config_url,
+
+                _ = hf_hub_download(
+                    model_id,
+                    HF_LIGHTNING_CONFIG_NAME,
+                    repo_type="model",
+                    revision=revision,
                     library_name="pyannote",
                     library_version=__version__,
                     cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
                     use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
                 )
+
             except Exception:
                 pass
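The lines added to `check_version` keep at most the first three dot-separated components before calling `VersionInfo.parse`. Semver strings are MAJOR.MINOR.PATCH with optional `-prerelease` and `+build` parts, so a PEP 440 style suffix such as `.dev0` would make parsing fail; truncation presumably avoids that. The sketch below uses a simplified semver regex as a stand-in for the `semver` package; `parse_semver` and `truncate_version` are hypothetical names.

```python
import re
from typing import Tuple

# Simplified semver grammar: MAJOR.MINOR.PATCH, optional -prerelease, optional +build
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Stand-in for VersionInfo.parse: return (major, minor, patch) or raise."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"{version!r} is not a valid semver string")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def truncate_version(version: str) -> str:
    """Mirror of the new lines: keep at most three dot-separated components."""
    return ".".join(version.split(".")[:3])


try:
    parse_semver("2.1.0.dev0")
except ValueError as error:
    print(error)  # '2.1.0.dev0' is not a valid semver string

print(parse_semver(truncate_version("2.1.0.dev0")))  # (2, 1, 0)
```

Truncation only removes extra components: a two-component string such as `2.1` is left as-is and would still be rejected by a strict semver parser.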
 

diff --git a/pyannote/audio/core/pipeline.py b/pyannote/audio/core/pipeline.py
@@ -28,14 +28,15 @@
 from typing import Callable, List, Optional, Text, Union
 
 import yaml
-from huggingface_hub import cached_download, hf_hub_url
+from huggingface_hub import hf_hub_download
+from huggingface_hub.utils import RepositoryNotFoundError
+from pyannote.core.utils.helper import get_class_by_name
+from pyannote.database import FileFinder, ProtocolFile
+from pyannote.pipeline import Pipeline as _Pipeline
 
 from pyannote.audio import Audio, __version__
 from pyannote.audio.core.io import AudioFile
 from pyannote.audio.core.model import CACHE_DIR
-from pyannote.core.utils.helper import get_class_by_name
-from pyannote.database import FileFinder, ProtocolFile
-from pyannote.pipeline import Pipeline as _Pipeline
 
 PIPELINE_PARAMS_NAME = "config.yaml"
 
@@ -77,15 +78,40 @@ def from_pretrained(
             else:
                 model_id = checkpoint_path
                 revision = None
-            url = hf_hub_url(model_id, filename=PIPELINE_PARAMS_NAME, revision=revision)
-
-            config_yml = cached_download(
-                url=url,
-                library_name="pyannote",
-                library_version=__version__,
-                cache_dir=cache_dir,
-                use_auth_token=use_auth_token,
-            )
+
+            try:
+                config_yml = hf_hub_download(
+                    model_id,
+                    PIPELINE_PARAMS_NAME,
+                    repo_type="model",
+                    revision=revision,
+                    library_name="pyannote",
+                    library_version=__version__,
+                    cache_dir=cache_dir,
+                    # force_download=False,
+                    # proxies=None,
+                    # etag_timeout=10,
+                    # resume_download=False,
+                    use_auth_token=use_auth_token,
+                    # local_files_only=False,
+                    # legacy_cache_layout=False,
+                )
+
+            except RepositoryNotFoundError:
+                print(
+                    f"""
+Could not download '{model_id}' pipeline.
+It might be because the pipeline is private or gated so make
+sure to authenticate. Visit https://hf.co/settings/tokens to
+create your access token and retry with:
+
+   >>> Pipeline.from_pretrained('{model_id}',
+   ...                          use_auth_token=YOUR_AUTH_TOKEN)
+
+If this still does not work, it might be because the pipeline is gated:
+visit https://hf.co/{model_id} to accept the user conditions."""
+                )
+                return None
 
         with open(config_yml, "r") as fp:
             config = yaml.load(fp, Loader=yaml.SafeLoader)
@@ -95,7 +121,9 @@ def from_pretrained(
         Klass = get_class_by_name(
             pipeline_name, default_module_name="pyannote.pipeline.blocks"
         )
-        pipeline = Klass(**config["pipeline"].get("params", {}))
+        params = config["pipeline"].get("params", {})
+        params.setdefault("use_auth_token", use_auth_token)
+        pipeline = Klass(**params)
 
         # freeze  parameters
         if "freeze" in config:
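The new lines forward the caller's `use_auth_token` to the pipeline constructor via `dict.setdefault`, so a value already present in `config.yaml` takes precedence, and any pipeline class named in a config must accept that keyword. The sketch below reproduces only this merging step with a stand-in class; `FakePipeline` and `build_pipeline` are hypothetical, and unlike the diff it copies `params` instead of updating the loaded config in place.

```python
from typing import Any, Dict, Union

Token = Union[str, bool, None]


class FakePipeline:
    """Stand-in for a pipeline class whose constructor accepts use_auth_token."""

    def __init__(self, segmentation: str = "", use_auth_token: Token = None):
        self.segmentation = segmentation
        self.use_auth_token = use_auth_token


def build_pipeline(config: Dict[str, Any], use_auth_token: Token = None) -> FakePipeline:
    # Copy the params so that the loaded config itself is left untouched
    params = dict(config["pipeline"].get("params", {}))
    # setdefault only fills the key when config.yaml did not already set it
    params.setdefault("use_auth_token", use_auth_token)
    return FakePipeline(**params)


config = {"pipeline": {"params": {"segmentation": "pyannote/segmentation"}}}
print(build_pipeline(config, use_auth_token="hf_placeholder").use_auth_token)  # hf_placeholder
```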

diff --git a/pyannote/audio/interactive/pipeline/recipe.py b/pyannote/audio/interactive/pipeline/recipe.py
@@ -175,7 +175,7 @@ def pipeline(
     beep: bool = False,
 ) -> Dict[str, Any]:
 
-    pipeline = Pipeline.from_pretrained(pipeline)
+    pipeline = Pipeline.from_pretrained(pipeline, use_auth_token=True)
     classes = pipeline.classes()
 
     if isinstance(classes, Iterator):