ASTREC

NICT

Advanced Speech Technology Laboratory

【Journal Papers】

<H30/2018>

・T. Okamoto, "Mode-matching-based sound field recording and synthesis with circular double-layer arrays," Appl. Sci., vol. 8, no. 7, 1048, Jul. 2018.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Deep neural network-based power spectrum reconstruction for quality improvement of vocoded speech with limited acoustic parameters," Acoust. Sci. & Tech., vol. 39, no. 2, pp. 163–166, Mar. 2018.

・K. Sugiura, "SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones", IEEE Robotics and Automation Letters, Vol. 3, Issue 4, pp. 2963-2970, 2018.

・A. Magassouba, K. Sugiura, and H. Kawai, "A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3113–3120, 2018.

<H27/2015>

・Y. Wu, C. Hori, H. Kashioka and H. Kawai, "Leveraging social Q&A collections for improving complex question answering," Comput. Speech Lang., vol. 29, no. 1, pp. 1–19, Jan. 2015.

・K. Sugiura, Y. Shiga, H. Kawai, T. Misu and C. Hori, "A Cloud Robotics Approach towards Dialogue-Oriented Robot Speech," Advanced Robotics, vol. 29, no. 7, pp. 449–456, 2015.

<H26/2014>

・T. Okamoto, S. Enomoto and R. Nishimura, "Least squares approach in wavenumber domain for sound field recording and reproduction using multiple parallel linear arrays," Appl. Acoust., vol. 86, pp. 95–103, Dec. 2014.

・Y. Tsao, X. Lu, P. Dixon, T. Hu, S. Matsuda and C. Hori, "Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation," Comput. Speech Lang., vol. 28, no. 3, pp. 709–726, May 2014.

・S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig and S. Nakamura, "Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis," IEEE J. Sel. Topic Signal Process., vol. 8, no. 2, pp. 239–250, Apr. 2014.

【International Conferences】

<R1/2019>

・R. Takashima, S. Li, and H. Kawai, "An investigation of sequence-level knowledge distillation methods for CTC acoustic models," ICASSP 2019, Brighton, UK, May 12-17, 2019.

・T. Okamoto, T. Toda, Y. Shiga, and H. Kawai, "Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features," ICASSP 2019, Brighton, UK, May 12-17, 2019.

・T. Okamoto, "Horizontal 3D sound field recording and 2.5D synthesis with omni-directional circular arrays," ICASSP 2019, Brighton, UK, May 12-17, 2019.

・P. Shen, X. Lu, S. Li, and H. Kawai, "Interactive learning of teacher-student model for short utterance spoken language identification," ICASSP 2019, Brighton, UK, May 12-17, 2019.

<H30/2018>

・S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai, "Improving Very Deep Time-delay Neural Network with Vertical-attention for Effectively Training CTC-based ASR Systems," IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece, Dec. 18-21, 2018.

・T. Okamoto, T. Toda, Y. Shiga, and H. Kawai, "Improving FFTNet vocoder with noise shaping and subband approaches," IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece, Dec. 18-21, 2018.

・Y. Hirata and H. Kato, "Acoustic and perceptual evaluation of Japanese geminates produced by L2 learners", 5th NINJAL International Conference on Phonetics and Phonology, Tachikawa, Oct. 26-28, 2018.

・A. Magassouba, K. Sugiura, and H. Kawai, "A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions," IEEE Robotics and Automation Letters paper presented at IEEE/RSJ IROS 2018, Madrid, Spain, Oct. 1-5, 2018.

・K. Sugiura, "SuMo-SS: Submodular Optimization Sensor Scattering for Deploying Sensor Networks by Drones", IEEE Robotics and Automation Letters presented at IEEE/RSJ IROS, Madrid, Spain, Oct. 1-5, 2018.

・S. Li, X. Lu, R. Takashima, P. Shen, T. Kawahara, and H. Kawai, "Improving CTC acoustic model with very deep residual time-delay neural networks," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.

・J. Ni, Y. Shiga, and H. Kawai, "Multilingual grapheme-to-phoneme conversion with global character vectors," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.

・P. Shen, X. Lu, S. Li, and H. Kawai, "Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification," Interspeech 2018, Hyderabad, India, Sept. 2-6, 2018.

・T. Okamoto, "2.5D localized sound zone generation with a circular array of fixed-directivity loudspeakers," IWAENC 2018, Hitotsubashi, Japan, Sept. 17-20, 2018.

・M. Fujimoto and H. Kawai, "Comparative evaluations of various factored deep convolutional RNN architectures for noise robust speech recognition," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "CTC loss function with a unit-level ambiguity penalty," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.

・R. Takashima, S. Li and H. Kawai, "An Investigation of a Knowledge Distillation Method for CTC Acoustic Models," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features," ICASSP 2018, Calgary, Alberta, Canada, Apr. 15-20, 2018.

<H29/2017年>

・K. Sugiura and H. Kawai, "Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Subband WaveNet with overlapped single-sideband filterbanks," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・S. Li, X. Lu, P. Shen, R. Takashima, T. Kawahara and H. Kawai, "Incremental training and constructing the very deep convolutional residual network acoustic models," ASRU 2017, Okinawa, Japan, Dec. 16-20, 2017.

・T. Okamoto, "Angular spectrum decomposition-based 2.5D higher-order spherical harmonic sound field synthesis with a linear loudspeaker array," WASPAA 2017, New Platz, New York, Oct. 15-18, 2017.

・M. Fujimoto, "Factored deep convolutional neural networks for noise robust speech recognition," Interspeech 2017, Stockholm, Sweden, August 20-24, 2017.

・P. Shen, X. Lu, S. Li, and H. Kawai, "Conditional Generative Adversarial Nets Classifier for Spoken Language Identification," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

・J. Ni, Y. Shiga, and H. Kawai, "Global Syllable Vectors for Building TTS Front-End with Deep Learning," Interspeech 2017, Stockholm, Sweden, Aug. 20-24, 2017.

<H27/2015>

・J. Ni, Y. Shiga and C. Hori, "Extraction of pitch register from expressive speech in Japanese," in Proc. ICASSP 2015, Apr. 2015.

・T. Ochiai, S. Matsuda, H. Watanabe, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training for deep neural networks embedding linear transformation networks," in Proc. ICASSP 2015, Apr. 2015.

・T. Okamoto, "Near-field sound propagation based on a circular and linear array combination," in Proc. ICASSP 2015, Apr. 2015.

<H26/2014>

・J. Ni, Y. Shiga and C. Hori, "Tuning intonation with pitch accent decomposition for HMM-based 1158 expressive speech synthesis," in Proc. APSIPA 2014, Dec. 2014.

・X. Hu, M. Saiko and C. Hori, "Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition," in Proc. APSIPA 2014, Dec. 2014.

・M. Saiko, H. Yamamoto, R. Isotani and C. Hori, "Efficient multi-lingual unsupervised acoustic model training under mismatch conditions," in Proc. SLT 2014, pp. 24–29, Dec. 2014.

・P. Shen, X. Lu, X. Hu, N. Kanda, M. Saiko and C. Hori, "The NICT ASR system for IWSLT 2014," in Proc. IWSLT 2014, pp. 113–118, Dec. 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Ensemble modeling of denoising autoencoder for speech spectrum restoration," in Proc. Interspeech 2014, pp. 885–889, Sept. 2014.

・X. Lu, Y. Tsao, P. Shen and C. Hori, "Spectral patch based sparse coding for acoustic event detection," in Proc. ISCSLP 2014, pp. 317–320, Sept. 2014.

・J. Ni, Y. Shiga and C. Hori, "Superpositional HMM-based intonation synthesis using a functional F0 model," in Proc. ISCSLP 2014, pp. 270–274, Sept. 2014.

・X. Hu, X. Lu and C. Hori, "Mandarin speech recognition using convolution neural network with augmented tone features," in Proc. ISCSLP 2014, pp. 15–18, Sept. 2014.

・Y. Wu, X. Hu, and C. Hori, "Translating TED speeches by recurrent neural network based translation model," in Proc. ICASSP 2014, pp. 7098–7102, May 2014.

・T. Ochiai, S. Matsuda, X. Lu, C. Hori and S. Katagiri, "Speaker adaptive training using deep neural networks," in Proc. ICASSP 2014, pp. 6349–6353, May 2014.

・X. Lu, Y. Tsao, S. Matsuda and C. Hori, "Sparse representation based on a bag of spectral exemplars for acoustic event detection," in Proc. ICASSP 2014, pp. 6255–6259, May 2014.

・T. Okamoto, "Generation of multiple sound zones by spatial filtering in wavenumber domain using a linear array of loudspeakers," in Proc. ICASSP 2014, pp. 4733–4737, May 2014.

・H.-T. Fang, J. Huang, X. Lu, S. Wang and Y. Tsao, "Speech enhancement using segmental nonnegative matrix factorization," in Proc. ICASSP 2014, pp. 4483–4487, May 2014.

・C.-L. Huang and C. Hori, "Semantic context inference for spoken document retrieval using term association matrices," in Proc. ICASSP 2014, pp. 4116–4120, May 2014.

【Domestic Workshops】

<R1/2019>

    ・岡本, "スピーカアレイを用いた空間フーリエ変換に基づく局所再生",電子情報通信学会応用音響研究会, 京都, 2019年1月.

・T. Okamoto, T. Toda, Y. Shiga, and H. Kawai, "Investigation of real-time neural vocoders using fundamental frequency and mel-cepstrum" (in Japanese), 2019 Spring Meeting of the Acoustical Society of Japan, Tokyo, Mar. 2019.

    ・岡本, "円形アレイを用いた水平面3次元音場の収録と再現", 日本音響学会2019年春季研究発表会, 東京, 2019年3月.

    ・P. Shen, X. Lu, S. Li, and H. Kawai, "Investigation of multi-domain training for speech recognition", 日本音響学会2019年春季研究発表会, 東京, 2019年3月.

・A. Magassouba, K. Sugiura, and H. Kawai, "A Multi-modal Target-source Classifier Model for Object Picking from Natural Language Instructions," 2019 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), Niigata, June 2019.

<H30/2018>

    ・高島, 李, 河井, "CTC音響モデルのためのシーケンスレベル知識蒸留法の検討," 情報処理学会第124回音声言語情報処理研究発表会(SIG-SLP), 東京, 2018年10月.

    ・岡本, 戸田, 志賀, 河井, "FFTNetボコーダの高品質化に関する検討," 日本音響学会2018年秋季研究発表会, 大分, 2018年9月.

    ・岡本, "内部外部混合音場の収録と再現, "日本音響学会2018年秋季研究発表会, 大分, 2018年9月.

    ・李, 盧, 高島, 沈, 河井, "An empirical comparison of sequence training methods for the very deep residual time-delay neural network," 日本音響学会2018年秋季研究発表会, 大分, 2018年9月.

・Sonu, Kato, and Tajima, "Production characteristics of geminate and non-geminate consonants by learners of Japanese: the relationship between perceived unnaturalness in rhythm and intensity and objective measures" (in Japanese), 2018 Autumn Meeting of the Acoustical Society of Japan, Oita, Sept. 2018.

・M. Fujimoto and H. Kawai, "Single-channel noise-robust speech recognition by combined use of noisy and enhanced speech" (in Japanese), IEICE Technical Committee on Speech (SP), SP2018-19, pp. 15–20, July 2018.

・Akita, Ando, Okamoto, Ogawa, Kanda, Kurata, Koriyama, Shinozaki, Takashima, Tachioka, Fujimoto, and Masumura, "Report on the international conference ICASSP 2018" (in Japanese), 123rd IPSJ SIG-SLP Meeting, Shizuoka, July 2018.

・K. Sugiura, A. Magassouba, and H. Kawai, "Understanding ambiguous instructions using Generative Adversarial Nets for domestic service robots" (in Japanese), 2018 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), Kagoshima, June 2018.

    ・高島, 李, 河井, "CTC 音響モデルのための knowledge distillation 方式の検討", 日本音響学会春季研究発表会, 2018年3月.

    ・李, 盧, 高島, 沈, 河井, "IMPROVING CTC-BASED ACOUSTIC MODEL WITH VERY DEEP RESIDUAL NEURAL NETWORKS", 日本音響学会春季研究発表会, 2018年3月.

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Investigation of full audible-band speech synthesis with a subband WaveNet vocoder" (in Japanese), 2018 Spring Meeting of the Acoustical Society of Japan, Mar. 2018.

    ・岡本拓磨, "剛球バッフルを用いた超接話アレイ処理", 日本音響学会春季研究発表会, 2018年3月.

<H29/2017>

・T. Okamoto, K. Tachibana, T. Toda, Y. Shiga, and H. Kawai, "Accelerating WaveNet based on subband processing" (in Japanese), 2017 Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

    ・岡本拓磨, "球面調和展開から角度スペクトルへの解析的音場変換", 日本音響学会秋季研究発表会, 2017年9月.

・K. Sugiura and H. Kawai, "Understanding instructions without verbs using Latent Classifier Generative Adversarial Nets" (in Japanese), 35th Annual Conference of the Robotics Society of Japan (RSJ), Sept. 2017.

    ・S. Li, X. Lu, P. Shen, and H. Kawai,"Very deep convolutional residual network acoustic models for Japanese lecture transcription",日本音響学会秋季研究発表会,2017年9月.

・R. Takashima and H. Kawai, "Introducing a subword-level ambiguity-based penalty term into the connectionist temporal classification loss function" (in Japanese), 2017 Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

・P. Shen, X. Lu, S. Li, and H. Kawai, "cGAN-classifier: Conditional Generative Adversarial Nets for Classification," 2017 Autumn Meeting of the Acoustical Society of Japan, Sept. 2017.

・T. Asami, Y. Ohtani, T. Okamoto, T. Ogawa, T. Ochiai, H. Kameoka, K. Komatani, S. Takamichi, N. Tawara, H. Nanjo, K. Hashimoto, T. Fukuda, R. Masumura, S. Matsuda, and S. Watanabe, "Report on the international conference ICASSP 2017" (in Japanese), IPSJ SIG Technical Report, July 2017.

・M. Fujimoto, "Noise-robust speech recognition with factored deep convolutional neural networks" (in Japanese), IPSJ SIG Technical Report, July 2017.