Noise level limited sub-modeling for diffusion probabilistic vocoders

T. Okamoto, T. Toda, Y. Shiga and H. Kawai,
"Noise level limited sub-modeling for diffusion probabilistic vocoders,"
in Proc. ICASSP, June 2021, pp. 6029–6033. [IEEE Xplore]




Audio Samples

Analysis-synthesis condition

Original, WaveGlow, Parallel WaveGAN (PWG)

WaveGrad (50 iterations), DiffWave (50 iterations)

WaveGrad (25 iterations), Sub-WaveGrad (25 iterations), DiffWave (25 iterations), Sub-DiffWave (25 iterations)

Sub-WaveGrad (6 iterations), DiffWave (6 iterations), Sub-DiffWave (6 iterations)


Text-to-speech condition

WaveGlow, Parallel WaveGAN (PWG)

DiffWave (25 iterations), Sub-DiffWave (25 iterations), DiffWave (6 iterations), Sub-DiffWave (6 iterations)


DiffWaveGrad (10 iterations: 3 Sub-DiffWave + 7 Sub-WaveGrad): NOT included in the paper

Analysis-synthesis, Text-to-speech