UMotion - Odyssey 2018 - Deepmine Speech Processing Da…

Text-dependent Speaker Recognition

Hossein Zeinali, Hossein Sameti and Themos Stafylakis

In this paper, we introduce a new database for text-dependent, text-prompted and text-independent speaker recognition, as well as for speech recognition. DeepMine is a large-scale database in Persian and English, with its current version containing more than 1300 speakers and 360 thousand recordings overall. DeepMine has several appealing characteristics that make it one of a kind. First, it is the first large-scale speaker recognition database in Persian, enabling the development of voice biometrics applications in the native language of about 110 million people. Second, it is the largest text-dependent and text-prompted speaker recognition database in English, facilitating research on deep learning and other data-demanding approaches. Third, its unique combination of Persian and English makes it suitable for exploring domain adaptation and transfer learning approaches, which constitute some of the emerging tasks in speech and speaker recognition. Finally, the extensive annotation with respect to age, gender, province, and educational level, combined with the inherent accent variability of the Persian language, makes it ideal for exploring the use of attribute information in utterance and speaker modeling.

The presentation of the database is accompanied by several experiments using state-of-the-art algorithms. More specifically, we conduct experiments using HMM-based i-vectors, and we reaffirm their effectiveness in text-dependent speaker recognition. Furthermore, we conduct speech recognition experiments using the annotated text-independent part of the database for training and testing, and we demonstrate that the database can also be used to train robust speech recognition models in Persian.

Cite as: Zeinali, H., Sameti, H., Stafylakis, T. (2018) DeepMine Speech Processing Database: Text-Dependent and Independent Speaker Verification and Speech Recognition in Persian and English. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 386-392, DOI: 10.21437/Speaker Odyssey.2018-54.

Added by: Emmanuelle Billard
Updated: 17 October 2018 00:00
Channel:
- Computer science
Type: Conference
Main language: French
Discipline(s):
- Computer science
- STIC (information and communication sciences and technologies)

Odyssey 2018 - DeepMine Speech Processing Database: Text-Dependent and Independent Speaker Verification and Speech Recognition in Persian and English
