Demo page for ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion

T. Okamoto, Y. Ohtani, T. Toda and H. Kawai, "ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion," in Proc. ICASSP, Apr. 2024. (accepted, to appear) [Preprint (PDF)]

Source code

The PyTorch source code used in the experiments, which is based on ESPnet2-TTS, is available here.

Demo samples for Japanese

End-to-end sequence-to-sequence text-to-speech (E2E-S2S-TTS) condition (Japanese female)

Ground truth

JETS	JETS-WN	CN-JETS (proposed)	ConvNeXt-TTS (proposed)

End-to-end sequence-to-sequence text-to-speech (E2E-S2S-TTS) condition (Japanese male)

Ground truth

JETS	JETS-WN	CN-JETS (proposed)	ConvNeXt-TTS (proposed)

End-to-end sequence-to-sequence voice conversion (E2E-S2S-VC) condition (Japanese male to female)

Ground truth (source)	Ground truth (target)

JETS-VC	JETS-WN-VC	CN-JETS-VC (proposed)	ConvNeXt-VC (proposed)

End-to-end sequence-to-sequence voice conversion (E2E-S2S-VC) condition (Japanese female to male)

Ground truth (source)	Ground truth (target)

JETS-VC	JETS-WN-VC	CN-JETS-VC (proposed)	ConvNeXt-VC (proposed)