On the calibration of powerset speaker diarization models - Structuration, Analyse et Modélisation de documents Vidéo et Audio
Conference paper, Year: 2024

Abstract

End-to-end neural diarization models have usually relied on a multilabel classification formulation of the speaker diarization problem. Recently, we proposed a powerset multiclass formulation that outperformed the state of the art on multiple datasets. In this paper, we study the calibration of a powerset speaker diarization model and explore some of its uses. We analyze calibration both in-domain and out-of-domain, and examine the data in low-confidence regions. We then test the reliability of model confidence in practice: we use the confidence of the pretrained model to selectively create training and validation subsets from unannotated data, and compare this to random selection. We find that top-label confidence can reliably predict high-error regions. Moreover, training on low-confidence regions yields a better-calibrated model, and validating on low-confidence regions can be more annotation-efficient than validating on random regions.
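
The abstract combines three ideas: a powerset encoding of speaker activity, top-label confidence (the probability of the predicted class), and confidence-driven selection of regions to annotate. The following is a minimal sketch of each, and it should not be read as the authors' implementation. It assumes, for illustration only, 3 speakers with at most 2 active per frame. It summarizes calibration with a 10-bin equal-width expected calibration error (one common metric, not necessarily the one used in the paper). It also simplifies "selective subset creation" to ranking fixed-length windows by mean confidence. All function names are hypothetical.

```python
import itertools
import math
import random


def powerset_classes(num_speakers, max_active):
    """Enumerate the powerset classes: every speaker subset of size <= max_active.

    Class 0 is the empty set (non-speech). With 3 speakers and at most 2
    active at once this gives 1 + 3 + 3 = 7 classes.
    """
    classes = []
    for size in range(max_active + 1):
        for subset in itertools.combinations(range(num_speakers), size):
            classes.append(frozenset(subset))
    return classes


def to_powerset(active_speakers, classes):
    """Map one frame's multilabel target (set of active speakers) to a class index."""
    return classes.index(frozenset(active_speakers))


def softmax(logits):
    """Numerically stable softmax over one frame's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(probs):
    """Return (predicted class index, top-label confidence) for one frame."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best, probs[best]


def expected_calibration_error(confidences, correct, num_bins=10):
    """Equal-width binned ECE: size-weighted mean |accuracy - confidence| per bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # conf == 1.0 would fall past the last bin, so clamp the index.
        bins[min(int(conf * num_bins), num_bins - 1)].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / len(confidences) * abs(accuracy - avg_conf)
    return ece


def select_windows(confidences, window, budget, strategy="low", seed=0):
    """Return start frames of `budget` non-overlapping windows of `window` frames.

    "low" keeps the windows with the lowest mean top-label confidence;
    "random" samples windows uniformly, as a baseline.
    """
    starts = list(range(0, len(confidences) - window + 1, window))
    if strategy == "random":
        return sorted(random.Random(seed).sample(starts, budget))
    mean_conf = {s: sum(confidences[s:s + window]) / window for s in starts}
    return sorted(sorted(starts, key=lambda s: mean_conf[s])[:budget])


if __name__ == "__main__":
    classes = powerset_classes(num_speakers=3, max_active=2)
    print(len(classes), to_powerset({0, 1}, classes))  # 7 4
    frame_conf = [0.9] * 4 + [0.3] * 4 + [0.8] * 4 + [0.5] * 4
    print(select_windows(frame_conf, window=4, budget=2))  # [4, 12]
```

Because a powerset model outputs one softmax distribution per frame, its top-label confidence is a single well-defined number per frame. This is what makes a per-frame calibration analysis and a confidence ranking of regions straightforward. The `"random"` strategy mirrors the random-selection baseline the abstract compares against.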

Dates and versions

hal-04696316, version 1 (24-09-2024)

Cite

Alexis Plaquet, Hervé Bredin. On the calibration of powerset speaker diarization models. 25th Interspeech Conference (INTERSPEECH 2024), ISCA: International Speech Communication Association, Sep 2024, Kos, Greece. pp. 3764-3768, ⟨10.21437/Interspeech.2024-1060⟩. ⟨hal-04696316⟩