
A list of all papers published by NICT. See Google Scholar for further details.

Journals

2018

  • T. Okamoto, "Mode-matching-based sound field recording and synthesis with circular double-layer arrays," Appl. Sci., vol. 8, no. 7, 1048, Jul. 2018.
  • T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Deep neural network-based power spectrum reconstruction for quality improvement of vocoded speech with limited acoustic parameters," Acoust. Sci. & Tech., vol. 39, no. 2, pp. 163–166, Mar. 2018.
  • K. Sugiura, "SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2963–2970, 2018.
  • A. Magassouba, K. Sugiura, and H. Kawai, "A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3113–3120, 2018.

2015

  • Y. Wu, C. Hori, H. Kashioka and H. Kawai, "Leveraging social Q&A collections for improving complex question answering," Comput. Speech Lang., vol. 29, no. 1, pp. 1–19, Jan. 2015.
  • K. Sugiura, Y. Shiga, H. Kawai, T. Misu and C. Hori, "A Cloud Robotics Approach towards Dialogue-Oriented Robot Speech," Advanced Robotics, vol. 29, no. 7, pp. 449–456, 2015.

2014

  • T. Okamoto, S. Enomoto and R. Nishimura, "Least squares approach in wavenumber domain for sound field recording and reproduction using multiple parallel linear arrays," Appl. Acoust., vol. 86, pp. 95–103, Dec. 2014.
  • Y. Tsao, X. Lu, P. Dixon, T. Hu, S. Matsuda and C. Hori, "Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation," Comput. Speech Lang., vol. 28, no. 3, pp. 709–726, May 2014.
  • S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig and S. Nakamura, "Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis," IEEE J. Sel. Topic Signal Process., vol. 8, no. 2, pp. 239–250, Apr. 2014.

International Conferences

2019

  • R. Takashima, S. Li, and H. Kawai, "An investigation of sequence-level knowledge distillation methods for CTC acoustic models," ICASSP 2019, Brighton, UK, May 12-17, 2019.
  • T. Okamoto, T. Toda, Y. Shiga, and H. Kawai, "Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features," ICASSP 2019, Brighton, UK, May 12-17, 2019.
  • T. Okamoto, "Horizontal 3D sound field recording and 2.5D synthesis with omni-directional circular arrays," ICASSP 2019, Brighton, UK, May 12-17, 2019.
  • P. Shen, X. Lu, S. Li, and H. Kawai, "Interactive learning of teacher-student model for short utterance spoken language identification," ICASSP 2019, Brighton, UK, May 12-17, 2019.

2018

  • S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai, "Improving Very Deep Time-delay Neural Network with Vertical-attention for Effectively Training CTC-based ASR Systems," IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece, Dec. 18-21, 2018.
  • T. Okamoto, T. Toda, Y. Shiga, and H. Kawai, "Improving FFTNet vocoder with noise shaping and subband approaches," IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece, Dec. 18-21, 2018.
  • Y. Hirata and H. Kato, "Acoustic and perceptual evaluation of Japanese geminates produced by L2 learners," 5th NINJAL International Conference on Phonetics and Phonology, Tachikawa, Japan, Oct. 26-28, 2018.
  • A. Magassouba, K. Sugiura, and H. Kawai, "A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions," IEEE Robotics and Automation Letters, presented at IEEE/RSJ IROS 2018, Madrid, Spain, Oct. 1-5, 2018.
  • K. Sugiura, "SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones," IEEE Robotics and Automation Letters, presented at IEEE/RSJ IROS 2018, Madrid, Spain, Oct. 1-5, 2018.
  • S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai, "Improving CTC acoustic model with very deep residual time-delay neural networks," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.
  • J. Ni, Y. Shiga, and H. Kawai, "Multilingual grapheme-to-phoneme conversion with global character vectors," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.
  • P. Shen, X. Lu, S. Li, and H. Kawai, "Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.
  • T. Okamoto, "2.5D localized sound zone generation with a circular array of fixed-directivity loudspeakers," IWAENC 2018, Tokyo, Japan, Sept. 17-20, 2018.
  • M. Fujimoto and H. Kawai, "Comparative evaluations of various factored deep convolutional RNN architectures for noise robust speech recognition," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.
  • R. Takashima, S. Li and H. Kawai, "CTC loss function with a unit-level ambiguity penalty," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.
  • R. Takashima, S. Li and H. Kawai, "An Investigation of a Knowledge Distillation Method for CTC Acoustic Models," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.
  • T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.

2017

  • K. Sugiura and H. Kawai, "Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.
  • T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Subband WaveNet with overlapped single-sideband filterbanks," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.
  • S. Li, X. Lu, P. Shen, R. Takashima, T. Kawahara and H. Kawai, "Incremental training and constructing the very deep convolutional residual network acoustic models," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.
  • T. Okamoto, "Angular spectrum decomposition-based 2.5D higher-order spherical harmonic sound field synthesis with a linear loudspeaker array," WASPAA 2017, New Paltz, New York, Oct. 15-18, 2017.
  • M. Fujimoto, "Factored deep convolutional neural networks for noise robust speech recognition," Interspeech 2017, Stockholm, Sweden, August 20-24, 2017.
  • P. Shen, X. Lu, S. Li, and H. Kawai, "Conditional Generative Adversarial Nets Classifier for Spoken Language Identification," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.
  • J. Ni, Y. Shiga, and H. Kawai, “Global Syllable Vectors for Building TTS Front-End with Deep Learning,” Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

2015

  • J. Ni, Y. Shiga and C. Hori, "Extraction of pitch register from expressive speech in Japanese," in Proc. ICASSP 2015, Apr. 2015.
  • T. Ochiai, S. Matsuda, H. Watanabe, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training for deep neural networks embedding linear transformation networks," in Proc. ICASSP 2015, Apr. 2015.
  • T. Okamoto, "Near-field sound propagation based on a circular and linear array combination," in Proc. ICASSP 2015, Apr. 2015.

2014

  • J. Ni, Y. Shiga and C. Hori, "Tuning intonation with pitch accent decomposition for HMM-based expressive speech synthesis," in Proc. APSIPA 2014, Dec. 2014.
  • X. Hu, M. Saiko and C. Hori, "Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition," in Proc. APSIPA 2014, Dec. 2014.
  • M. Saiko, H. Yamamoto, R. Isotani and C. Hori, "Efficient multi-lingual unsupervised acoustic model training under mismatch conditions," in Proc. SLT 2014, pp. 24–29, Dec. 2014.
  • P. Shen, X. Lu, X. Hu, N. Kanda, M. Saiko and C. Hori, "The NICT ASR system for IWSLT 2014," in Proc. IWSLT 2014, pp. 113–118, Dec. 2014.
  • X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Ensemble modeling of denoising autoencoder for speech spectrum restoration," in Proc. Interspeech 2014, pp. 885–889, Sept. 2014.
  • X. Lu, Y. Tsao, P. Shen and C. Hori, "Spectral patch based sparse coding for acoustic event detection," in Proc. ISCSLP 2014, pp. 317–320, Sept. 2014.
  • J. Ni, Y. Shiga and C. Hori, "Superpositional HMM-based intonation synthesis using a functional F0 model," in Proc. ISCSLP 2014, pp. 270–274, Sept. 2014.
  • X. Hu, X. Lu and C. Hori, "Mandarin speech recognition using convolution neural network with augmented tone features," in Proc. ISCSLP 2014, pp. 15–18, Sept. 2014.
  • Y. Wu, X. Hu, and C. Hori, "Translating TED speeches by recurrent neural network based translation model," in Proc. ICASSP 2014, pp. 7098–7102, May 2014.
  • T. Ochiai, S. Matsuda, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training using deep neural networks," in Proc. ICASSP 2014, pp. 6349–6353, May 2014.
  • X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Sparse representation based on a bag of spectral exemplars for acoustic event detection," in Proc. ICASSP 2014, pp. 6255–6259, May 2014.
  • T. Okamoto, "Generation of multiple sound zones by spatial filtering in wavenumber domain using a linear array of loudspeakers," in Proc. ICASSP 2014, pp. 4733–4737, May 2014.
  • H.-T. Fang, J. Huang, X. Lu, S. Wang and Y. Tsao, "Speech enhancement using segmental nonnegative matrix factorization," in Proc. ICASSP 2014, pp. 4483–4487, May 2014.
  • C.-L. Huang and C. Hori, "Semantic context inference for spoken document retrieval using term association matrices," in Proc. ICASSP 2014, pp. 4116–4120, May 2014.