Hi-Fi-CAPTAIN: High-fidelity and high-capacity conversational speech synthesis corpus developed by NICT
What's new
- Aug. 10, 2023 ver1.0.0 has been released.
We at NICT have open-sourced the Hi-Fi-CAPTAIN (High-fidelity and high-capacity conversational speech synthesis corpus developed by NICT) corpus to further accelerate speech synthesis research. The corpus is recorded in a conversational style. For American English (en-US), it includes 14,000 utterances each from one female speaker and one male speaker (12,988 of which are parallel). For Japanese (ja-JP), it includes 19,056 utterances from one female speaker and 19,058 utterances from one male speaker (18,855 of which are parallel). For American English, read-style utterances were also recorded using the TIMIT corpus. All speakers are professionals, and all speech waveforms (24-bit linear PCM, 48 kHz sampling frequency) were recorded in soundproof rooms.
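After downloading, it can be worth confirming that a file matches the stated format (24-bit linear PCM, 48 kHz). The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `matches_corpus_spec` are hypothetical, and the channel count is reported rather than assumed because it is not stated on this page. Note that `wave` only reads plain PCM headers before Python 3.12; if the corpus files use a WAVE_FORMAT_EXTENSIBLE header, an older interpreter will raise `wave.Error`, and a third-party reader such as `soundfile` is an alternative.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic format information for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample (3 for 24-bit)
        rate = wf.getframerate()
        frames = wf.getnframes()
    return {
        "channels": channels,
        "bit_depth": sample_width * 8,
        "sample_rate": rate,
        "duration_s": frames / rate,
        # Raw data rate in bytes per second: rate * bytes/sample * channels.
        "byte_rate": rate * sample_width * channels,
    }


def matches_corpus_spec(info: dict) -> bool:
    """Check for 24-bit linear PCM at 48 kHz, as stated for this corpus."""
    return info["bit_depth"] == 24 and info["sample_rate"] == 48_000


if __name__ == "__main__":
    import os
    import tempfile

    # Write a synthetic 1-second, 24-bit, 48 kHz mono file to try the helpers.
    path = os.path.join(tempfile.mkdtemp(), "example.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(3)
        wf.setframerate(48_000)
        wf.writeframes(b"\x00\x00\x00" * 48_000)  # 48,000 frames of silence

    info = describe_wav(path)
    print(info)
    print(matches_corpus_spec(info))
```

For a mono 24-bit, 48 kHz file, the raw data rate is 48,000 × 3 = 144,000 bytes per second.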
Additionally, we have open-sourced an ESPnet recipe for training the end-to-end text-to-speech model JETS (D. Lim et al., Interspeech 2022) on the corpus.
Download
Hi-Fi-CAPTAIN en-US Female (7.07 GB)
Total 14,000 utts [15.1 h]: (Parallel 12,988 utts [13.8 h], Non-parallel 1,012 utts [1.3 h])
Hi-Fi-CAPTAIN en-US Male (6.91 GB)
Total 14,000 utts [15.0 h]: (Parallel 12,988 utts [14.0 h], Non-parallel 1,012 utts [1.0 h])
Hi-Fi-CAPTAIN ja-JP Female (10.94 GB)
Total 19,056 utts [23.3 h]: (Parallel 18,855 utts [23.0 h], Non-parallel 201 utts [0.3 h])
Hi-Fi-CAPTAIN ja-JP Male (10.46 GB)
Total 19,058 utts [22.3 h]: (Parallel 18,855 utts [22.0 h], Non-parallel 203 utts [0.3 h])
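For experiments that compare the two speakers of a language on identical text, it helps to separate the parallel utterances from the non-parallel ones. The sketch below does this with set operations, under the assumption that a parallel utterance carries the same utterance ID for both speakers; the ID strings and the `split_parallel` and `counts_consistent` helpers are hypothetical and do not reflect the corpus's actual file naming. It also checks that the figures listed above add up (parallel + non-parallel = total) for each subset.

```python
def split_parallel(female_ids, male_ids):
    """Return (parallel, female_only, male_only) as sorted lists.

    Hypothetical assumption: parallel utterances share the same ID across
    the two speakers. Adapt the key to the corpus's real naming scheme.
    """
    f, m = set(female_ids), set(male_ids)
    return sorted(f & m), sorted(f - m), sorted(m - f)


# Utterance counts as listed on this page: (total, parallel, non-parallel).
REPORTED = {
    "en-US Female": (14_000, 12_988, 1_012),
    "en-US Male": (14_000, 12_988, 1_012),
    "ja-JP Female": (19_056, 18_855, 201),
    "ja-JP Male": (19_058, 18_855, 203),
}


def counts_consistent(reported) -> bool:
    """True if every subset satisfies total == parallel + non-parallel."""
    return all(total == par + non for total, par, non in reported.values())


if __name__ == "__main__":
    female = ["utt0001", "utt0002", "utt0003"]  # hypothetical IDs
    male = ["utt0002", "utt0003", "utt0004"]
    parallel, female_only, male_only = split_parallel(female, male)
    print(parallel)      # ['utt0002', 'utt0003']
    print(female_only)   # ['utt0001']
    print(male_only)     # ['utt0004']
    print(counts_consistent(REPORTED))  # True
```

Using sets keeps the split independent of file order; sorting the results simply makes the output deterministic.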
This corpus is released under "Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)".
Please cite the following when using the corpus.
author = {Takuma Okamoto and Yoshinori Shiga and Hisashi Kawai},
title = {{Hi-Fi-CAPTAIN: High-fidelity and high-capacity conversational speech synthesis corpus developed by NICT}},
howpublished = {https://ast-astrec.nict.go.jp/en/release/hi-fi-captain/},
year = {2023},