UMotion - Informatique - Odyssey 2018 - Boosting The Performance Of S…

Computer Science

Text-dependent Speaker Recognition

Md Jahangir Alam, Gautam Bhattacharya and Patrick Kenny

Feature normalization strategies help to compensate for the effects of environmental mismatch and are normally incorporated into the feature extraction framework after a logarithmic or power-function nonlinearity is applied. For spoofing detection systems facing voice conversion and speech synthesis-based spoofing attacks, feature normalization has been found to be harmful. However, for spoofing detection on replay attacks, feature normalization reduces equal error rates significantly. In this work, we use discrete Fourier transform (DFT)-based spectral and product spectral features with feature normalization applied in the q-log domain. The q-log function provides an intermediate domain between the linear and log domains in which the features are normalized. The final features are then extracted by applying principal component analysis to the log DFT and product power spectra. Experimental results on version 2 of the evaluation data of the second ASVspoof challenge (ASVspoof 2017) show that normalizing features in the q-log domain yields a relative reduction in equal error rate of approximately 5%. Across all four baseline systems, the DFT spectral features normalized in the q-log domain provide an average relative improvement of 28%.
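As a rough illustration of normalizing in the q-log domain, the sketch below uses the standard q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which gives x - 1 (linear) at q = 0 and tends to ln(x) as q approaches 1. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the framing parameters, the value q = 0.5, and the choice of per-bin mean and variance normalization. The product spectrum and the PCA step are left out.

```python
import numpy as np

def q_log(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q); equals ln(x) in the limit q -> 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame the signal, apply a Hamming window, return |DFT|^2 per frame.
    Frame length, hop and FFT size are illustrative assumptions."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def q_log_normalize(power_spec, q=0.5, floor=1e-10):
    """Mean and variance normalization per frequency bin in the q-log domain.
    q=0.5 is a hypothetical setting, not a value reported in the abstract."""
    z = q_log(np.maximum(power_spec, floor), q)
    mean = z.mean(axis=0, keepdims=True)
    std = z.std(axis=0, keepdims=True)
    return (z - mean) / np.maximum(std, 1e-12)

# Synthetic stand-in for speech: one second of noise at 16 kHz.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
feats = q_log_normalize(power_spectrogram(signal), q=0.5)
print(feats.shape)  # (frames, frequency bins)
```

Sweeping q between 0 and 1 in such a sketch moves the normalization domain smoothly from linear toward logarithmic, which is the intermediate behavior the abstract attributes to the q-log function.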

Cite as: Alam, M.J., Bhattacharya, G., Kenny, P. (2018) Boosting the Performance of Spoofing Detection Systems on Replay Attacks Using q-Logarithm Domain Feature Normalization. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 393-398, DOI: 10.21437/Odyssey.2018-55.

Added by: Emmanuelle Billard
Updated on: 17 October 2018 00:00
Channel:
- Computer Science
Type: Conference
Main language: French
Discipline(s):
- Computer Science
- STIC
