ASTREC

NICT

Advanced Speech Technology Laboratory

【Journal Article】

<2017>
・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Deep neural network-based power spectrum reconstruction for quality improvement of vocoded speech with limited acoustic parameters," Acoustical Science and Technology (Special Issue on Speech Communication, 2018-03).

<2015>
・Y. Wu, C. Hori, H. Kashioka and H. Kawai, "Leveraging social Q&A collections for improving complex question answering," Comput. Speech Lang., vol. 29, no. 1, pp. 1–19, Jan. 2015.

<2014>
・T. Okamoto, S. Enomoto and R. Nishimura, "Least squares approach in wavenumber domain for sound field recording and reproduction using multiple parallel linear arrays," Appl. Acoust., vol. 86, pp. 95–103, Dec. 2014. (in press).

・Y. Tsao, X. Lu, P. Dixon, T. Hu, S. Matsuda and C. Hori, "Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation," Comput. Speech Lang., vol. 28, no. 3, pp. 709–726, May 2014.

・S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig and S. Nakamura, "Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis," IEEE J. Sel. Topics Signal Process., vol. 8, no. 2, pp. 239–250, Apr. 2014.

【International Conference】

<2017>
・T. Okamoto, "Angular spectrum decomposition-based 2.5D higher-order spherical harmonic sound field synthesis with a linear loudspeaker array," WASPAA 2017, New Paltz, New York, Oct. 15-18, 2017.

・M. Fujimoto, "Factored deep convolutional neural networks for noise robust speech recognition," Interspeech 2017, Stockholm, Sweden, August 20-24, 2017.

・P. Shen, X. Lu, S. Li, and H. Kawai, "Conditional Generative Adversarial Nets Classifier for Spoken Language Identification," in Proc. Interspeech, Stockholm, Sweden, Aug. 20-24, 2017.

・J. Ni, Y. Shiga, and H. Kawai, "Global Syllable Vectors for Building TTS Front-End with Deep Learning," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

・K. Sugiura and H. Kawai, "Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Subband WaveNet with overlapped single-sideband filterbanks," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・S. Li, X. Lu, P. Shen, R. Takashima, T. Kawahara and H. Kawai, "Incremental training and constructing the very deep convolutional residual network acoustic models," in Proc. ASRU, Okinawa, Japan, Dec. 16-20, 2017.

・M. Fujimoto and H. Kawai, "Comparative evaluations of various factored deep convolutional RNN architectures for noise robust speech recognition," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "CTC loss function with a unit-level ambiguity penalty," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "An Investigation of a Knowledge Distillation Method for CTC Acoustic Models," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

<2015>
・J. Ni, Y. Shiga and C. Hori, "Extraction of pitch register from expressive speech in Japanese," Proc. ICASSP 2015, Apr. 2015. (accepted, to appear).

・T. Ochiai, S. Matsuda, H. Watanabe, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training for deep neural networks embedding linear transformation networks," Proc. ICASSP 2015, Apr. 2015. (accepted, to appear).

・T. Okamoto, "Near-field sound propagation based on a circular and linear array combination," Proc. ICASSP 2015, Apr. 2015. (accepted, to appear).

<2014>
・J. Ni, Y. Shiga and C. Hori, "Tuning intonation with pitch accent decomposition for HMM-based expressive speech synthesis," Proc. APSIPA 2014, Dec. 2014.

・X. Hu, M. Saiko and C. Hori, "Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition," Proc. APSIPA 2014, Dec. 2014.

・M. Saiko, H. Yamamoto, R. Isotani and C. Hori, "Efficient multi-lingual unsupervised acoustic model training under mismatch conditions," Proc. SLT 2014, pp. 24–29, Dec. 2014.

・P. Shen, X. Lu, X. Hu, N. Kanda, M. Saiko and C. Hori, "The NICT ASR system for IWSLT 2014," Proc. IWSLT 2014, pp. 113–118, Dec. 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Ensemble modeling of denoising autoencoder for speech spectrum restoration," Proc. INTERSPEECH 2014, pp. 885–889, Sept. 2014.

・X. Lu, Y. Tsao, P. Shen and C. Hori, "Spectral patch based sparse coding for acoustic event detection," Proc. ISCSLP 2014, pp. 317–320, Sept. 2014.

・J. Ni, Y. Shiga and C. Hori, "Superpositional HMM-based intonation synthesis using a functional F0 model," Proc. ISCSLP 2014, pp. 270–274, Sept. 2014.

・X. Hu, X. Lu and C. Hori, "Mandarin speech recognition using convolution neural network with augmented tone features," Proc. ISCSLP 2014, pp. 15–18, Sept. 2014.

・Y. Wu, X. Hu, and C. Hori, "Translating TED speeches by recurrent neural network based translation model," Proc. ICASSP 2014, pp. 7098–7102, May 2014.

・T. Ochiai, S. Matsuda, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training using deep neural networks," Proc. ICASSP 2014, pp. 6349–6353, May 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Sparse representation based on a bag of spectral exemplars for acoustic event detection," Proc. ICASSP 2014, pp. 6255–6259, May 2014.

・T. Okamoto, "Generation of multiple sound zones by spatial filtering in wavenumber domain using a linear array of loudspeakers," Proc. ICASSP 2014, pp. 4733–4737, May 2014.

・H.-T. Fang, J. Huang, X. Lu, S. Wang and Y. Tsao, "Speech enhancement using segmental nonnegative matrix factorization," Proc. ICASSP 2014, pp. 4483–4487, May 2014.

・C.-L. Huang and C. Hori, "Semantic context inference for spoken document retrieval using term association matrices," Proc. ICASSP 2014, pp. 4116–4120, May 2014.