Odyssey 2018 - sprocket: Open-Source Voice Conversion Software June 27, 2018
Kazuhiro Kobayashi and Tomoki Toda
Statistical voice conversion (VC) is a technique for converting specific non-linguistic or para-linguistic information while keeping linguistic information unchanged, and speaker conversion has been studied as one of the typical VC applications for a few decades. To better understand various VC techniques using a freely available common dataset, the Voice Conversion Challenge (VCC) was launched in 2016, and the second challenge was held in 2018. As one of the baseline systems for the VCC 2018, we have developed open-source VC software called "sprocket", which implements not only traditional techniques, such as a trajectory-based conversion method using a Gaussian mixture model (GMM) and a vocoder-based conversion framework, but also recently developed techniques, such as a vocoder-free VC framework. The use of sprocket makes it possible to 1) easily reproduce the converted voices using the VCC datasets, and 2) develop VC systems using other parallel speech datasets with fundamental VC functions, such as acoustic feature extraction, time-alignment between the source and target features, GMM training, feature conversion, and waveform generation. In this paper, we describe 1) the technical details and usage of sprocket, 2) the development of the baseline systems for the HUB and SPOKE tasks of the VCC 2018 using sprocket, and 3) the performance of sprocket as a VC system, demonstrated by the results of our baseline systems in the VCC 2018.
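To make one of the fundamental VC functions listed above more concrete, the following is a minimal sketch of time-alignment between source and target feature sequences using generic dynamic time warping (DTW). It is an illustration under stated assumptions, not sprocket's own code: the function name `dtw_align`, the Euclidean frame distance, and the three-way step pattern are all hypothetical choices made here, and sprocket's actual alignment procedure may differ.

```python
import math


def dtw_align(src, tgt):
    """Align two feature sequences with plain DTW (illustrative sketch only).

    src, tgt: lists of equal-dimension feature vectors (lists of floats).
    Returns a list of (i, j) frame-index pairs, from first to last frame.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean frame distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the final cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance source only
            (cost[i][j - 1], i, j - 1),          # advance target only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path
```

In a parallel-data setting, the returned index pairs would typically be used to build joint source-target feature vectors for model training; how sprocket itself constructs them is described in the paper rather than here.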
Cite as: Kobayashi, K., Toda, T. (2018) sprocket: Open-Source Voice Conversion Software. Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 203-210, DOI: 10.21437/Odyssey.2018-29.