Multi-latency look-ahead for streaming speaker segmentation - Structuration, Analyse et Modélisation de documents Vidéo et Audio
Conference paper, Year: 2024

Abstract

We address the task of streaming speaker diarization and propose several contributions to achieve a better trade-off between latency and accuracy. First, computational latency is reduced to its bare minimum by switching to a causal frame-wise speaker segmentation architecture. Then, a multi-latency look-ahead mechanism is used during training to support adaptive latency during inference at no additional computational cost. Finally, we detail the method used during inference to achieve the final frame-wise segmentation. We evaluate the impact of these contributions on the AMI meeting dataset with a focus on the speaker segmentation step, seen through the prism of voice activity detection, overlapped speech detection and speaker change detection.

Dates and versions

hal-04734819, version 1 (14-10-2024)

Cite

Bilal Rahou, Hervé Bredin. Multi-latency look-ahead for streaming speaker segmentation. Interspeech 2024, Sep 2024, Kos, Greece. pp.1610-1614, ⟨10.21437/Interspeech.2024-923⟩. ⟨hal-04734819⟩