Harmonic-Net: Fundamental frequency and speech rate controllable fast neural vocoder


K. Matsubara, T. Okamoto, R. Takashima, T. Takiguchi, T. Toda, and H. Kawai, "Harmonic-Net: Fundamental frequency and speech-rate controllable fast neural vocoder," IEEE/ACM Trans. Audio, Speech, Lang. Process. (accepted, in press)

Unseen speaker synthesis with multi-speaker model trained using JVS corpus

Normal condition

Male (jvs001)
Original WORLD uSFGAN WaveNet
HiFi-GAN HiFi-GAN (melspc) Harmonic-Net Harmonic-Net+
Female (jvs004)
Original WORLD uSFGAN WaveNet
HiFi-GAN HiFi-GAN (melspc) Harmonic-Net Harmonic-Net+

0.5 x fo condition

Male (jvs001)
WORLD uSFGAN HiFi-GAN
Harmonic-Net Harmonic-Net+
Female (jvs004)
WORLD uSFGAN HiFi-GAN
Harmonic-Net Harmonic-Net+

1.5 x fo condition

Male (jvs001)
WORLD uSFGAN HiFi-GAN
Harmonic-Net Harmonic-Net+
Female (jvs004)
WORLD uSFGAN HiFi-GAN
Harmonic-Net Harmonic-Net+

0.8 x T condition

Male (jvs001)
WORLD WaveNet HiFi-GAN HiFi-GAN (melspc)
Harmonic-Net Harmonic-Net+
Female (jvs004)
WORLD WaveNet HiFi-GAN HiFi-GAN (melspc)
Harmonic-Net Harmonic-Net+

1.5 x T condition

Male (jvs001)
WORLD WaveNet HiFi-GAN HiFi-GAN (melspc)
Harmonic-Net Harmonic-Net+
Female (jvs004)
WORLD WaveNet HiFi-GAN HiFi-GAN (melspc)
Harmonic-Net Harmonic-Net+

Full-band singing voice synthesis using Tohoku Kiritan corpus

Normal condition

Original WORLD HiFi-GAN HiFi-GAN (melspc)
PeriodNet Harmonic-Net Harmonic-Net+

0.5 x fo condition

WORLD HiFi-GAN PeriodNet Harmonic-Net
Harmonic-Net+

1.5 x fo condition

WORLD HiFi-GAN PeriodNet Harmonic-Net
Harmonic-Net+

0.8 x T condition

WORLD HiFi-GAN HiFi-GAN (melspc) PeriodNet
Harmonic-Net Harmonic-Net+

1.5 x T condition

WORLD HiFi-GAN HiFi-GAN (melspc) PeriodNet
Harmonic-Net Harmonic-Net+

Text-to-speech using JSUT corpus

Normal condition

Original HiFi-GAN HiFi-GAN (melspc) Harmonic-Net+

0.5 x fo condition

HiFi-GAN Harmonic-Net+

1.5 x fo condition

HiFi-GAN Harmonic-Net+