Demo page for ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion

T. Okamoto, Y. Ohtani, T. Toda and H. Kawai, "ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion," in Proc. ICASSP, Apr. 2024. (accepted, to appear) [Preprint (PDF)]

Source code

The PyTorch source code used in the experiments, based on ESPnet2-TTS, is available here.



Demo samples for Japanese

End-to-end sequence-to-sequence text-to-speech (E2E-S2S-TTS) condition (Japanese female)
Ground truth
JETS JETS-WN CN-JETS (proposed) ConvNeXt-TTS (proposed)

End-to-end sequence-to-sequence text-to-speech (E2E-S2S-TTS) condition (Japanese male)
Ground truth
JETS JETS-WN CN-JETS (proposed) ConvNeXt-TTS (proposed)

End-to-end sequence-to-sequence voice conversion (E2E-S2S-VC) condition (Japanese male to female)
Ground truth (source) Ground truth (target)
JETS-VC JETS-WN-VC CN-JETS-VC (proposed) ConvNeXt-VC (proposed)

End-to-end sequence-to-sequence voice conversion (E2E-S2S-VC) condition (Japanese female to male)
Ground truth (source) Ground truth (target)
JETS-VC JETS-WN-VC CN-JETS-VC (proposed) ConvNeXt-VC (proposed)