On the calibration of powerset speaker diarization models

Alexis Plaquet; Hervé Bredin

doi:10.21437/Interspeech.2024-1060

Conference paper, Year: 2024


Alexis Plaquet

Role: Author
PersonId: 1291841
IdHAL: alexis-plaquet
ORCID: 0009-0004-5123-6731

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Université Toulouse III - Paul Sabatier

Hervé Bredin

Role: Author
PersonId: 15856
IdHAL: hbredin
ORCID: 0000-0002-3739-925X
IdRef: 121165779

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Centre National de la Recherche Scientifique

Abstract

End-to-end neural diarization models have usually relied on a multilabel classification formulation of the speaker diarization problem. Recently, we proposed a powerset multiclass formulation that has outperformed the state of the art on multiple datasets. In this paper, we study the calibration of a powerset speaker diarization model and explore some of its uses. We study calibration both in-domain and out-of-domain, and explore the data in low-confidence regions. The reliability of model confidence is then tested in practice: we use the confidence of the pretrained model to selectively create training and validation subsets out of unannotated data, and compare this to random selection. We find that top-label confidence can be used to reliably predict high-error regions. Moreover, training on low-confidence regions provides a better-calibrated model, and validating on low-confidence regions can be more annotation-efficient than validating on random regions.
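As an illustration of the quantities named in the abstract, the following is a minimal sketch of top-label confidence, a binned expected calibration error (ECE), and a low-confidence selection step. It is not the authors' implementation: the abstract does not state which calibration metric or selection rule the paper uses, so ECE with equal-width bins, the function names, the four-class powerset layout, and the toy probabilities are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def top_label_confidence(probs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (predicted class, confidence) per frame; confidence is the max probability."""
    out = []
    for frame in probs:
        conf = max(frame)
        out.append((list(frame).index(conf), conf))
    return out


def expected_calibration_error(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean over equal-width confidence bins of |accuracy - mean confidence|."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for (pred, conf), label in zip(top_label_confidence(probs), labels):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, pred == label))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / len(labels) * abs(accuracy - mean_conf)
    return ece


def lowest_confidence_frames(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k frames with the lowest top-label confidence."""
    confs = [max(frame) for frame in probs]
    return sorted(range(len(confs)), key=lambda i: confs[i])[:k]


# Hypothetical powerset classes for at most two speakers:
# 0 = no speech, 1 = speaker A, 2 = speaker B, 3 = A and B overlapping.
probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.30, 0.40, 0.20, 0.10],
    [0.05, 0.15, 0.20, 0.60],
]
labels = [0, 1, 2, 3]  # toy reference labels, not real data

ece = expected_calibration_error(probs, labels)
selected = lowest_confidence_frames(probs, k=2)  # -> [2, 3]
```

In this toy example the only misclassified frame (index 2) is also the least confident one, which is the kind of relationship between confidence and error that the abstract reports. A real pipeline would presumably aggregate frame-level confidence over longer regions before selecting audio to annotate.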

Keywords

Speaker diarization; Calibration; Powerset classification; Confidence estimation

Domains

Artificial Intelligence [cs.AI]

Main file

template.pdf (259.1 KB)

Origin	Files produced by the author(s)
License	Attribution


https://hal.science/hal-04696316

Submitted on: Tuesday, September 24, 2024, 02:40:03

Last modified on: Wednesday, September 25, 2024, 03:10:54

Dates and versions

hal-04696316, version 1 (24-09-2024)

License

Attribution

Identifiers

HAL Id: hal-04696316, version 1
DOI: 10.21437/Interspeech.2024-1060

Cite

Alexis Plaquet, Hervé Bredin. On the calibration of powerset speaker diarization models. 25th Interspeech Conference (INTERSPEECH 2024), ISCA: International Speech Communication Association, Sep 2024, Kos, Greece. pp. 3764-3768, ⟨10.21437/Interspeech.2024-1060⟩. ⟨hal-04696316⟩


Collections

UNIV-TLSE2 CNRS UT1-CAPITOLE GENCI IRIT IRIT-SAMOVA IRIT-SI IRIT-CNRS IRIT-UT3 TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

284 views

19 downloads
