Odyssey 2018 - Keynote - Speaking naturally? It depends who is listening. June 27, 2018

Simon King


Pitting one technology against another can lead to intriguing developments. Using speech synthesis to ‘spoof’ speaker verification systems was initially found to be very successful, but immediately triggered the development of effective countermeasures. The next step in the arms race is synthetic speech that cannot be detected by those countermeasures. It doesn’t even have to sound natural or like the target speaker to a human listener, only to the machine. Other forms of such an adversarial attack have been demonstrated against image classifiers (with images that look like one thing to a human but something entirely different to the machine) and automatic speech recognition systems (where signals that sound like noise to a human are recognised as words by the machine). This highlights the enormous differences between human and machine perception. Does that matter? Do generative models and adversarial techniques tell us anything about human speech, or is there no connection? I’m not promising any answers though; I’m likely to raise more questions.


Cite as: King, S. (2018) Speaking naturally? It depends who is listening. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop.
