ASTREC

NICT

Advanced Speech Technology Laboratory

【Journal Papers】

<2017>
・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Deep neural network-based power spectrum reconstruction for quality improvement of vocoded speech with limited acoustic parameters," Acoustical Science and Technology (Special Issue on Speech Communication), Mar. 2018.

<2015>
・Y. Wu, C. Hori, H. Kashioka and H. Kawai, "Leveraging social Q&A collections for improving complex question answering," Comput. Speech Lang., vol. 29, no. 1, pp. 1–19, Jan. 2015.

<2014>
・T. Okamoto, S. Enomoto and R. Nishimura, "Least squares approach in wavenumber domain for sound field recording and reproduction using multiple parallel linear arrays," Appl. Acoust., vol. 86, pp. 95–103, Dec. 2014.

・Y. Tsao, X. Lu, P. Dixon, T. Hu, S. Matsuda and C. Hori, "Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation," Comput. Speech Lang., vol. 28, no. 3, pp. 709–726, May 2014.

・S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig and S. Nakamura, "Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis," IEEE J. Sel. Topics Signal Process., vol. 8, no. 2, pp. 239–250, Apr. 2014.

【International Conferences】

<2017>
・T. Okamoto, "Angular spectrum decomposition-based 2.5D higher-order spherical harmonic sound field synthesis with a linear loudspeaker array," WASPAA 2017, New Platz, New York, Oct. 15-18, 2017.

・M. Fujimoto, "Factored deep convolutional neural networks for noise robust speech recognition," Interspeech 2017, Stockholm, Sweden, August 20-24, 2017.

・P. Shen, X. Lu, S. Li, and H. Kawai, "Conditional Generative Adversarial Nets Classifier for Spoken Language Identification," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

・J. Ni, Y. Shiga, and H. Kawai, "Global Syllable Vectors for Building TTS Front-End with Deep Learning," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

・K. Sugiura and H. Kawai, "Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Subband WaveNet with overlapped single-sideband filterbanks," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・S. Li, X. Lu, P. Shen, R. Takashima, T. Kawahara and H. Kawai, "Incremental training and constructing the very deep convolutional residual network acoustic models," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・M. Fujimoto and H. Kawai, "Comparative evaluations of various factored deep convolutional RNN architectures for noise robust speech recognition," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "CTC loss function with a unit-level ambiguity penalty," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "An Investigation of a Knowledge Distillation Method for CTC Acoustic Models," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features," ICASSP 2018, Calgary, Alberta, Canada, April 15-20, 2018.

<2015>
・J. Ni, Y. Shiga and C. Hori, "Extraction of pitch register from expressive speech in Japanese," Proc.  ICASSP 2015, Apr. 2015. (accepted, to appear).

・T. Ochiai, S. Matsuda, H. Watanabe, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training for deep neural networks embedding linear transformation networks," Proc. ICASSP 2015, Apr. 2015.

・T. Okamoto, "Near-field sound propagation based on a circular and linear array combination," Proc.  ICASSP 2015, Apr. 2015. (accepted, to appear).

<2014>
・J. Ni, Y. Shiga and C. Hori, "Tuning intonation with pitch accent decomposition for HMM-based 1158  expressive speech synthesis," Proc. APSIPA 2014, Dec. 2014.

・X. Hu, M. Saiko and C. Hori, "Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition," Proc. APSIPA 2014, Dec. 2014.

・M. Saiko, H. Yamamoto, R. Isotani and C. Hori, "Efficient multi-lingual unsupervised acoustic model training under mismatch conditions," Proc. SLT 2014, pp. 24–29, Dec. 2014.

・P. Shen, X. Lu, X. Hu, N. Kanda, M. Saiko and C. Hori, "The NICT ASR system for IWSLT 2014," Proc. IWSLT 2014, pp. 113–118, Dec. 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Ensemble modeling of denoising autoencoder for speech  spectrum restoration," Proc. INTERSPEECH 2014, pp. 885–889, Sept. 2014.

・X. Lu, Y. Tsao, P. Shen and C. Hori, "Spectral patch based sparse coding for acoustic event detection," Proc. ISCSLP 2014, pp. 317–320, Sept. 2014.

・J. Ni, Y. Shiga and C. Hori, "Superpositional HMM-based intonation synthesis using a functional F0  model," Proc. ISCSLP 2014, pp. 270–274, Sept. 2014.

・X. Hu, X. Lu and C. Hori, "Mandarin speech recognition using convolution neural network with augmented tone features," Proc. ISCSLP 2014, pp. 15–18, Sept. 2014.

・Y. Wu, X. Hu and C. Hori, "Translating TED speeches by recurrent neural network based translation model," Proc. ICASSP 2014, pp. 7098–7102, May 2014.

・T. Ochiai, S. Matsuda, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training using deep neural networks," Proc. ICASSP 2014, pp. 6349–6353, May 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Sparse representation based on a bag of spectral exemplars  for acoustic event detection," Proc. ICASSP 2014, pp. 6255–6259, May 2014.

・T. Okamoto, "Generation of multiple sound zones by spatial filtering in wavenumber domain using a  linear array of loudspeakers," Proc. ICASSP 2014, pp. 4733–4737, May 2014.

・H.-T. Fang, J. Huang, X. Lu, S. Wang and Y. Tsao, "Speech enhancement using segmental nonnegative matrix factorization," Proc. ICASSP 2014, pp. 4483–4487, May 2014.

・C.-L. Huang and C. Hori, "Semantic context inference for spoken document retrieval using term association matrices," Proc. ICASSP 2014, pp. 4116–4120, May 2014.

 

【Domestic Conferences and Workshops】

<2017>
・浅見太一,大谷大和,岡本拓磨,小川哲司,落合翼,亀岡弘和,駒谷和範,高道慎之介,俵直弘,南條浩輝,橋本佳,福田隆,増村亮,松田繁樹,渡部晋治, "A report on the international conference ICASSP 2017," IPSJ SIG Technical Report, July 2017.

・藤本雅清, "Noise robust speech recognition using factored deep convolutional neural networks," IPSJ SIG Technical Report, July 2017.

・岡本拓磨,橘健太郎,戸田智基,志賀芳則,河井恒, "Speeding up WaveNet based on subband processing," Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

    ・岡本拓磨,"球面調和展開から角度スペクトルへの解析的音場変換",日本音響学会秋季研究発表会,2017年9月

・杉浦孔明,河井恒, "Understanding verb-less command sentences using Latent Classifier Generative Adversarial Nets," 35th Annual Conference of the Robotics Society of Japan, Sept. 2017.

・S. Li, X. Lu, P. Shen and H. Kawai, "Very deep convolutional residual network acoustic models for Japanese lecture transcription," Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

・高島遼一,河井恒, "Introducing a subword-level ambiguity-based penalty term into the connectionist temporal classification loss function," Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

・P. Shen, X. Lu, S. Li, and H. Kawai, "cGAN-classifier: Conditional Generative Adversarial Nets for Classification," Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

・岡本拓磨,橘健太郎,戸田智基,志賀芳則,河井恒, "An investigation of full audible-band speech synthesis with a subband WaveNet vocoder," Spring Meeting of the Acoustical Society of Japan, Mar. 2018.

    ・岡本拓磨,"剛球バッフルを用いた超接話アレイ処理",日本音響学会春季研究発表会,2018年3月