Liang He, Xianhong Chen, Can Xu and Jia Liu
Inspired by P. Kenny's variational Bayes (VB) method, we derive a latent class model (LCM) for single-channel speaker diarization. Like the VB method, the LCM uses soft information and avoids premature hard decisions during its iterations. Unlike the VB method, the LCM provides an iterative framework for multi-objective optimization and allows a more flexible way to compute the probability of a segment given a speaker. Based on this model, we propose a latent class model-i-vector-probabilistic linear discriminant analysis (LCM-Ivec-PLDA) system. In addition, because the segments produced by segmentation are very short, their neighboring segments are also taken into account. To reduce sensitivity to initialization, we use agglomerative hierarchical clustering (AHC) for initialization and present hard and soft priors. Experiments on the NIST RT09 speaker diarization database and our own collected database show that the proposed systems outperform the traditional VB system.
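The iteration described above (soft segment-to-speaker posteriors, neighbor context, and AHC-based hard or soft priors) can be illustrated with a minimal sketch. It is a generic latent-class (mixture-model) EM loop, not the authors' LCM-Ivec-PLDA system: an isotropic Gaussian on toy segment vectors is an assumed stand-in for the i-vector/PLDA scoring that supplies p(segment | speaker), averaging log-likelihoods over a window of neighbors is an assumed way of using context, and `lcm_diarize`, `prior_weight`, and `context` are hypothetical names introduced here for illustration.

```python
import math
from typing import List, Sequence, Tuple


def log_gauss(x: Sequence[float], mu: Sequence[float], var: float) -> float:
    """Isotropic Gaussian log-density (assumed stand-in for PLDA-based log p(x | speaker))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mu))
    return -0.5 * (len(x) * math.log(2.0 * math.pi * var) + sq / var)


def logsumexp(vals: Sequence[float]) -> float:
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def lcm_diarize(
    segs: List[List[float]],
    init_labels: List[int],
    n_spk: int,  # must be >= 2
    n_iter: int = 10,
    var: float = 1.0,
    context: int = 1,
    prior_weight: float = 0.9,
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical soft-clustering loop; returns hard labels and posteriors q(s | m)."""
    n, dim = len(segs), len(segs[0])
    # Prior from an initial hard clustering (e.g. AHC): prior_weight = 1.0 acts as a
    # hard prior; values below 1.0 spread the remaining mass over the other speakers.
    rest = (1.0 - prior_weight) / (n_spk - 1)
    q = [[prior_weight if s == lab else rest for s in range(n_spk)] for lab in init_labels]

    for _ in range(n_iter):
        # M-step: class priors pi_s and posterior-weighted speaker means.
        weights = [sum(q[m][s] for m in range(n)) for s in range(n_spk)]
        pi = [w / n for w in weights]
        mus = [
            [sum(q[m][s] * segs[m][k] for m in range(n)) / max(weights[s], 1e-12)
             for k in range(dim)]
            for s in range(n_spk)
        ]
        # log p(segment m | speaker s) under the current speaker models.
        ll = [[log_gauss(segs[m], mus[s], var) for s in range(n_spk)] for m in range(n)]
        for m in range(n):
            # Neighbor context: average log-likelihoods over a +/- `context` window,
            # since a single short segment carries little speaker information.
            lo, hi = max(0, m - context), min(n, m + context + 1)
            smoothed = [sum(ll[j][s] for j in range(lo, hi)) / (hi - lo) for s in range(n_spk)]
            # E-step: q(s | m) proportional to pi_s * exp(smoothed log-likelihood).
            logs = [math.log(pi[s]) + smoothed[s] if pi[s] > 0 else -math.inf
                    for s in range(n_spk)]
            z = logsumexp(logs)
            q[m] = [math.exp(v - z) for v in logs]

    # Only the final decision is hard; every iteration above keeps soft posteriors.
    labels = [row.index(max(row)) for row in q]
    return labels, q


if __name__ == "__main__":
    toy = [[0.0], [0.1], [-0.1], [0.05], [5.0], [5.1], [4.9], [5.05]]
    ahc = [0, 0, 1, 0, 1, 1, 1, 1]  # toy initial labels with one error at index 2
    labels, _ = lcm_diarize(toy, ahc, n_spk=2)
    print(labels)
```

Keeping the posteriors soft until the last line is the point of the sketch: a segment mislabeled by the initialization can still move to the right speaker, because no iteration commits to a hard assignment.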
Cite as: He, L., Chen, X., Xu, C., Liu, J. (2018) Latent Class Model for Single Channel Speaker Diarization. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 128-133, DOI: 10.21437/Odyssey.2018-18.