SPREDS-P1: SPeech Recognition Evaluation Data Set - Presentation type 1 (ver1.0)
What's new
- 2023/12/25 ver1.0 has been released.
Overview
SPREDS-P1 is an evaluation data set for multilingual speech recognition, released by NICT under the Creative Commons Attribution 4.0 International License (CC BY 4.0). It consists of lectures in 15 languages: Japanese, English, Chinese, Korean, Thai, Vietnamese, Indonesian, Myanmar, Spanish, French, Brazilian Portuguese, Filipino, Khmer, Nepali, and Mongolian. The data set contains audio recorded under nearly identical conditions (domain, number of speakers, recording environment, etc.) together with transcriptions that include the tags used at NICT. For further details, please refer to '00README.txt' in each directory.
15 languages
Extracted directory
The files are compressed in 'xz' format. After extraction, the directory structure should look like the following. For further information about the LABEL and WAVE directories, please refer to '00README.txt'.
-------------------------------------------------------------------------------------------
$ver  = [version number]
$lang = {01_jpn,02_eng,03_zho,04_kor,05_tha,06_vie,07_ind,08_mya,09_spa,10_fra,11_por_BRA,14_fil,15_khm,16_nep,17_mon}

$ver/
    00_doc/
    $lang/
        unsegmented/
            LABEL/
            WAVE/
        segmented/
            LABEL/
            WAVE/
-------------------------------------------------------------------------------------------
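As a quick sanity check after extraction, the following Python sketch builds the list of directories that the layout above predicts and reports any that are missing. It assumes the archive has already been extracted and that the path passed in corresponds to $ver/. The function names (expected_dirs, missing_dirs) are hypothetical and not part of the distribution, and the script does not inspect any file contents; see '00README.txt' for the LABEL and WAVE formats.

```python
from pathlib import Path

# Language directory names ($lang), copied from the layout above.
LANGS = [
    "01_jpn", "02_eng", "03_zho", "04_kor", "05_tha", "06_vie", "07_ind",
    "08_mya", "09_spa", "10_fra", "11_por_BRA", "14_fil", "15_khm",
    "16_nep", "17_mon",
]
SPLITS = ["unsegmented", "segmented"]
SUBDIRS = ["LABEL", "WAVE"]


def expected_dirs(root):
    """Return every directory the layout predicts under root (assumed to be $ver/)."""
    root = Path(root)
    dirs = [root / "00_doc"]
    for lang in LANGS:
        for split in SPLITS:
            for sub in SUBDIRS:
                dirs.append(root / lang / split / sub)
    return dirs


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    return [d for d in expected_dirs(root) if not d.is_dir()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_layout.py /path/to/$ver
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_dirs(root)
    if missing:
        print(f"{len(missing)} expected directories are missing:")
        for d in missing:
            print("  ", d)
    else:
        print("Directory layout looks complete.")
```

With 15 languages, two splits, and two subdirectories each, plus 00_doc, the sketch expects 61 directories in total.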