Odyssey 2018 - Supervector Compression Strategies to Speed up I-Vector System Development June 29, 2018

Speaker Recognition

Ville Vestman and Tomi Kinnunen


Front-end factor analysis (FEFA), an extension of probabilistic principal component analysis (PPCA) tailored for use with Gaussian mixture models (GMMs), is currently the prevalent approach for extracting compact utterance-level features (i-vectors) for automatic speaker verification (ASV) systems. Little research has compared FEFA to conventional PPCA applied to maximum a posteriori (MAP) adapted GMM supervectors. We study several alternative methods for compressing MAP-adapted GMM supervectors: PPCA, factor analysis (FA), and two supervised approaches, supervised PPCA (SPPCA) and the recently proposed probabilistic partial least squares (PPLS). The resulting i-vectors are used in ASV tasks with a probabilistic linear discriminant analysis (PLDA) back-end. We experiment on two datasets: the telephone condition of NIST SRE 2010 and the recent VoxCeleb corpus, which was collected from YouTube videos of celebrity interviews recorded in varied acoustic and technical conditions. The results suggest that, in terms of ASV accuracy, the supervector compression approaches are on a par with FEFA. The supervised approaches did not improve performance. Compared to FEFA, the PPCA and FA supervector compression approaches gave more than a hundred-fold (100x) speedup in total variability model (TVM) training.
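To make the supervector compression idea concrete, the following is a minimal sketch of closed-form maximum-likelihood PPCA applied to a matrix whose rows stand in for MAP-adapted GMM supervectors. It is not the authors' implementation: the function names `fit_ppca` and `extract_ivectors`, the synthetic random data, and all dimensions are assumptions chosen for illustration, and real supervectors would be far higher-dimensional.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA on the rows of X (hypothetical helper)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # A thin SVD avoids forming the d x d covariance matrix explicitly.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / n  # nonzero eigenvalues of the sample covariance
    # Noise variance: average of the d - q discarded eigenvalues
    # (eigenvalues beyond min(n, d) are zero and add nothing to the sum).
    sigma2 = eigvals[q:].sum() / (d - q)
    # Loading matrix: top-q principal directions, scaled column-wise.
    W = vt[:q].T * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def extract_ivectors(X, mu, W, sigma2):
    """Posterior mean of the latent variable for each row of X (hypothetical helper)."""
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)
    return np.linalg.solve(M, W.T @ (X - mu).T).T


# Synthetic stand-in for supervectors: 200 "utterances", 60 dimensions,
# generated from a 5-dimensional latent space plus isotropic noise.
rng = np.random.default_rng(0)
n_utts, dim, latent_dim = 200, 60, 5
W_true = rng.normal(size=(dim, latent_dim))
Z = rng.normal(size=(n_utts, latent_dim))
X = Z @ W_true.T + 3.0 + 0.1 * rng.normal(size=(n_utts, dim))

mu, W, sigma2 = fit_ppca(X, latent_dim)
ivectors = extract_ivectors(X, mu, W, sigma2)
print(ivectors.shape, float(sigma2))
```

Because the solution is obtained from a single decomposition of the centered data rather than from iterating over per-utterance sufficient statistics, a sketch like this also hints at why compressing supervectors directly can be much cheaper to train than FEFA.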


Cite as: Vestman, V., Kinnunen, T. (2018) Supervector Compression Strategies to Speed up I-Vector System Development. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 357-364, DOI: 10.21437/Odyssey.2018-50.
