Computer Science

Speaker identification

18 July 2023
Duration: 01:10:43

Lectures given by Petr Schwarz, PHONEXIA, and Themos Stafylakis, OMILIA

Introduction to speaker identification and deep fake context

Abstract: Petr will present the design of a speaker identification system based on the ResNet neural network architecture. He will also cover the basic principles of speech synthesis, voice morphing, and speech codecs, and explain how speaker identification, speech synthesis, and speech codecs can affect each other in the real world.

Biography: Petr Schwarz [PhD, Brno University of Technology, 2009] is a senior researcher in BUT Speech@FIT at the Faculty of Information Technology (FIT) of BUT. He has broad experience in speech technologies, ranging from voice biometry, speech transcription, and keyword spotting to language identification. At BUT, Petr has worked on many national, EU, and US research projects and on many international technology evaluation campaigns, such as those organized by the U.S. National Institute of Standards and Technology (NIST). In 2006, Petr co-founded Phonexia and served for several years as its CEO and CTO. Phonexia sells speech technologies in more than 60 countries. Currently, he is working on conversational AI technologies and security/defense applications of voice biometry.

 

Extracting speaker and emotion information from self-supervised speech models

Abstract: Themos will present hot topics in speaker identification research, with an emphasis on self-supervised models.

Biography: Themos Stafylakis received the B.Eng. degree from the National Technical University of Athens, Greece, in 2004, the M.Sc. degree in communication and signal processing from Imperial College London, London, U.K., in 2005, and the Ph.D. degree in speaker diarization for broadcast news from the National Technical University of Athens, Athens, Greece, in 2011. In 2011, he joined the Centre de Recherche Informatique de Montréal (Montréal, QC, Canada) as a postdoctoral researcher in speaker recognition. In 2016, he joined the Computer Vision Laboratory, University of Nottingham, Nottingham, U.K., as a Marie Curie Research Fellow. His main research interests are audiovisual speech and speaker recognition, machine learning, and deep neural networks.

 

Keywords: AI, neural network, self-supervised models, speaker identification, speech synthesis, voice morphing
