[NeMo I 2023-07-27 03:27:14 clustering_diarizer:157] Loading pretrained titanet_large model from NGC
[NeMo I 2023-07-27 03:27:14 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.19.0rc0/titanet-l/11ba0924fdf87c049e339adbf6899d48/titanet-l.nemo.
[NeMo I 2023-07-27 03:27:14 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.19.0rc0/titanet-l/11ba0924fdf87c049e339adbf6899d48/titanet-l.nemo
[NeMo I 2023-07-27 03:27:14 common:913] Instantiating model from pre-trained checkpoint
[NeMo I 2023-07-27 03:27:14 features:289] PADDING: 16
[NeMo I 2023-07-27 03:27:15 save_restore_connector:249] Model EncDecSpeakerLabelModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.19.0rc0/titanet-l/11ba0924fdf87c049e339adbf6899d48/titanet-l.nemo.
[NeMo I 2023-07-27 03:27:15 speaker_utils:93] Number of files to diarize: 0
[NeMo I 2023-07-27 03:27:15 vad_utils_own:1216] Using local VAD model from /workspace/model/vad_multilingual_frame_marblenet.nemo
[NeMo I 2023-07-27 03:27:15 features:289] PADDING: 2
[NeMo I 2023-07-27 03:27:15 classification_models:871] Using cross-entropy with weights: [1.0, 1.0]
[NeMo I 2023-07-27 03:27:15 cross_entropy:55] Weighted Cross Entropy loss with weight tensor([1., 1.])
[NeMo I 2023-07-27 03:27:15 save_restore_connector:249] Model EncDecFrameClassificationModel was successfully restored from /workspace/model/vad_multilingual_frame_marblenet.nemo.
waveform.size() torch.Size([1, 960000]) wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[0.0000, 2.0000, 2.0100]]) [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 829.93ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 830.56ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 12.01ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 12.14ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 10.79ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 10.88ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 11.92ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 12.01ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 17.51ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:16 online_diarizer:56] 17.60ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 203.53ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 1089.25ms 'diarize_step' step: 0, diar_hyp: ['0.0 2.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[2.0000, 3.9200, 3.9300]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.02ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.67ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.87ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.94ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.12ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.18ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.32ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.13ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 11.26ms 
'_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 263.53ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 312.53ms 'diarize_step' step: 1, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[4.0000, 6.0000, 6.0100]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.57ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.77ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.58ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.71ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.96ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.09ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.99ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.12ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 14.91ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 63.24ms 'diarize_step' step: 2, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 6.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[6.0400, 8.0000, 7.9700]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.12ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.33ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.07ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.21ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.61ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.74ms 
'_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.83ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.96ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 14.63ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 61.68ms 'diarize_step' step: 3, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 8.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[ 8.0000, 10.0000, 10.0100]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.16ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.37ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.17ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.31ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.62ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.76ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.85ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.99ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 23.16ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 70.33ms 'diarize_step' step: 4, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 10.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[10.0000, 12.0000, 12.0100]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.12ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 13.33ms '_extract_online_embeddings' [NeMo I 
2023-07-27 03:27:17 online_diarizer:56] 10.17ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 10.32ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.61ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.74ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.81ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 9.94ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 22.53ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 69.64ms 'diarize_step' step: 5, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 12.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[12.0000, 13.5200, 13.5300]]) [NeMo I 2023-07-27 03:27:17 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 26.08ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 26.22ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.37ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.51ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.56ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.69ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.70ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.83ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 24.39ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 85.43ms 'diarize_step' step: 6, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 
speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[14.0000, 16.0000, 16.0100]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.36ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.57ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.34ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.47ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.91ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.04ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.95ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.09ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 25.79ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 73.89ms 'diarize_step' step: 7, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 16.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[16.0000, 18.0000, 18.0100]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.14ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.35ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.22ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.35ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.61ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.75ms '_extract_online_embeddings' 
[NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.81ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.94ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 28.34ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 75.71ms 'diarize_step' step: 8, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 18.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[18.0000, 20.0000, 20.0100]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.14ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 13.34ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.15ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.28ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.63ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.76ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.85ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.98ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 30.48ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 77.83ms 'diarize_step' step: 9, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 20.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[20.0400, 22.0000, 21.9700]]) [NeMo I 
2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 11.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.05ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.79ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.92ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.65ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.78ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 14.30ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 14.49ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 36.55ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 87.96ms 'diarize_step' step: 10, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 22.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[22.0800, 24.0000, 23.9300]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.11ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.32ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.58ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.71ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.64ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.78ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.87ms '_run_embedding_extractor' 
[NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 34.63ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 81.55ms 'diarize_step' step: 11, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 24.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[24.0200, 26.0000, 25.9900]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 11.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.05ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.59ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.74ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.62ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.75ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.82ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.96ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 35.15ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 81.81ms 'diarize_step' step: 12, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', 
'22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 26.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[26.0000, 27.9400, 27.9500]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 11.83ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.03ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.70ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.84ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.61ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.74ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.82ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.95ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 36.12ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 82.92ms 'diarize_step' step: 13, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[28.0400, 30.0000, 29.9700]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.05ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.26ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.73ms '_run_embedding_extractor' 
[NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.87ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.95ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.08ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.99ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.12ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 37.70ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 85.36ms 'diarize_step' step: 14, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 30.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[30.0000, 32.0000, 32.0100]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 11.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.05ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.74ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.87ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.61ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.74ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.87ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 
42.13ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 89.10ms 'diarize_step' step: 15, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 29.790000915527344 speaker_0', '30.0 32.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[32.0000, 33.9600, 33.9700]]) [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 11.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 12.05ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.69ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 10.83ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.60ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.74ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:18 online_diarizer:56] 9.97ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 44.99ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 92.22ms 'diarize_step' step: 16, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 
21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 29.790000915527344 speaker_0', '30.0 31.75 speaker_0', '32.0 33.959999084472656 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[34.3200, 36.0000, 35.6900]]) [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 11.85ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 12.06ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.71ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.84ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.65ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.78ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.85ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.98ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 46.47ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 93.84ms 'diarize_step' step: 17, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 29.790000915527344 speaker_0', '30.0 31.75 speaker_0', '32.0 33.959999084472656 speaker_0', '34.31999969482422 36.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[36.0000, 38.0000, 
38.0100]]) [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 11.84ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 12.05ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.69ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.82ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.66ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.79ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.81ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 9.95ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 45.96ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 93.28ms 'diarize_step' step: 18, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 29.790000915527344 speaker_0', '30.0 31.75 speaker_0', '32.0 33.959999084472656 speaker_0', '34.31999969482422 35.81999969482422 speaker_0', '36.0 38.0 speaker_0'] wav_data torch.Size([1, 32000]) vad_timestamps: tensor([[38.0200, 40.0000, 39.9900]]) [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 0.00ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 12.08ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 12.29ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.71ms 
'_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.85ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.02ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.15ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.25ms '_run_embedding_extractor' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 10.43ms '_extract_online_embeddings' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 47.60ms '_perform_online_clustering' [NeMo I 2023-07-27 03:27:19 online_diarizer:56] 96.30ms 'diarize_step' step: 19, diar_hyp: ['0.0 1.75 speaker_0', '2.0 3.9200000762939453 speaker_0', '4.0 5.75 speaker_0', '6.039999961853027 7.789999961853027 speaker_0', '8.0 9.75 speaker_0', '10.0 11.75 speaker_0', '12.0 13.520000457763672 speaker_0', '14.0 15.75 speaker_0', '16.0 17.75 speaker_0', '18.0 19.75 speaker_0', '20.040000915527344 21.790000915527344 speaker_0', '22.079999923706055 23.829999923706055 speaker_0', '24.020000457763672 25.770000457763672 speaker_0', '26.0 27.940000534057617 speaker_0', '28.040000915527344 29.790000915527344 speaker_0', '30.0 31.75 speaker_0', '32.0 33.959999084472656 speaker_0', '34.31999969482422 35.81999969482422 speaker_0', '36.0 37.75 speaker_0', '38.02000045776367 40.0 speaker_0']