Odyssey 2018 - An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification June 27, 2018
Chunlei Zhang, Shivesh Ranjan and John Hansen
In this paper, we present transfer learning for deep neural network based text-independent speaker verification in the presence of a severe mismatch between the enrollment and the test data. Given a pre-trained speaker embedding network developed with out-of-domain data, we explore and analyze how this pre-trained model can benefit the in-domain speaker verification task. Two alternative strategies are investigated to perform transfer learning: vanilla transfer learning (V-TL) and curriculum learning based transfer learning (CL-TL). The proposed methods are validated on the UT-SCOPE-physical speech corpus, where we create a setup that introduces mismatched evaluation conditions using neutral speech and physical task stressed speech. Experimental results confirm the effectiveness of both the V-TL and CL-TL techniques. Employing transfer learning based on the pre-trained model, we achieve a +47.7% relative improvement over a conventional i-vector/PLDA system and a +30.6% relative improvement over a recently proposed end-to-end system.
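As a reading aid for the relative improvement figures quoted above, the following is a minimal sketch of how such a figure is commonly computed. It assumes an error-style metric where lower is better (for example, equal error rate); the abstract does not name the metric, and the function name and the numbers used here are hypothetical illustrations, not results from the paper.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Return the relative reduction in error, in percent.

    Assumes a lower-is-better metric (e.g. an error rate). This is an
    illustrative helper, not code from the paper.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates, chosen only to show the arithmetic.
example = relative_improvement(baseline_error=10.0, system_error=8.0)
print(f"{example:.1f}% relative improvement")
```

Under this definition, a relative improvement is measured against the baseline's own error, so the same absolute gain counts for more when the baseline is already strong.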
Cite as: Zhang, C., Ranjan, S., Hansen, J. (2018) An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 181-186, DOI: 10.21437/Speaker Odyssey.2018-26.