Doctoral thesis

DNN-based Acoustic Modeling for Robust Automatic Speech Recognition

이강현, 2019
Thesis details
Paper impact by subject for 'DNN-based Acoustic Modeling for Robust Automatic Speech Recognition'
Paper impact summary
Subject
  • Applied physics
  • Robust speech recognition, feature enhancement, feature compensation, acoustic modeling, deep neural network (DNN), variational autoencoder (VAE), variational inference (VIF), uncertainty decoding (UD)
Total papers on the same subject: 4,650
Total citations of this thesis: 0
Average paper impact by subject: 0.0%

References for 'DNN-based Acoustic Modeling for Robust Automatic Speech Recognition'

  • D. T. Tran, E. Vincent, and D. Jouvet, "Nonparametric uncertainty estimation and propagation for noise robust ASR," IEEE Trans. Audio, Speech, Language Process., vol. 23, no. 11, pp. 1835–1846, Nov. 2015.
  • Z. Wang and D. Wang, "A joint training framework for robust automatic speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 4, pp. 796–806, Jan. 2016.
  • Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process., vol. 32, no. 6, pp. 1109–1121, Dec. 1984.
  • X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323.
  • X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," in Proc. ICASSP, 2014, pp. 1759–1763.
  • W. Li, Y. Z. L. Wang, J. Dines, M. Magimai-Doss, H. Bourlard, and Q. Liao, "Feature mapping of multiple beamformed sources for robust overlapping speech recognition using a microphone array," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 12, pp. 2244–2255, Dec. 2014.
  • V. Ion and R. Haeb-Umbach, "A novel uncertainty decoding rule with applications to transmission error robust speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 5, pp. 1047–1060, Jul. 2008.
  • T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition," in Proc. ICASSP, 1995, pp. 81–84.
  • T. Kowaliw, N. Bredeche, and R. Doursat, Growing adaptive machines: combining development and learning in artificial neural networks. Springer, 2014.
  • T. Gao, J. Du, L.-R. Dai, and C.-H. Lee, "Joint training of front-end and back-end deep neural networks for robust speech recognition," in Proc. ICASSP, 2015, pp. 4375–4379.
  • S. Young, "The HTK book," Tech. Rep., 2006.
  • S. Tan and K. C. Sim, "Learning utterance-level normalisation using variational autoencoders for robust automatic speech recognition," in SLT, 2016, pp. 3556–3560.
  • R. Salakhutdinov, "Learning deep generative models," Annual Review of Statistics and Its Application, vol. 2, no. 1, pp. 361–385, Jan. 2015.
  • R. F. Astudillo, A. Abad, and I. Trancoso, "Accounting for the residual uncertainty of multi-layer perceptron based features," in Proc. ICASSP, 2014, pp. 6859–6863.
  • R. F. Astudillo and R. Orglmeister, "Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 5, pp. 1023–1034, May 2013.
  • R. F. Astudillo and J. P. da Silva Neto, "Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition," in Proc. Interspeech, 2011, pp. 461–464.
  • Q. Huo and C.-H. Lee, "A Bayesian predictive classification approach to robust speech recognition," IEEE Trans. Speech Audio Process., vol. 8, no. 2, pp. 200–204, Mar. 2003.
  • O. L. Frost, "An algorithm for linearly constrained adaptive array processing," in Proc. IEEE, vol. 60, no. 8, Aug. 1972, pp. 926–935.
  • N. Srivastava, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929–1958, Jun. 2014.
  • N. S. Kim, "IMM-based estimation for slowly evolving environments," IEEE Signal Process. Lett., vol. 5, no. 6, pp. 146–149, Jun. 1998.
  • N. B. Yoma and M. Villar, "Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm," IEEE Trans. Speech Audio Process., vol. 10, no. 3, pp. 158–166, May 2002.
  • M. Seltzer, D. Yu, and Y. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, 2013, pp. 7398–7402.
  • M. Mimura, S. Sakai, and T. Kawahara, "Exploring deep neural networks and deep autoencoders in reverberant speech recognition," in HSCMA, 2014, pp. 197–201.
  • M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, "The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments," in Proc. ASRU, 2005, pp. 357–362.
  • M. Delcroix, T. Nakatani, and S. Watanabe, "Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 2, pp. 324–334, Jul. 2013.
  • M. Delcroix, S. Watanabe, T. Nakatani, and A. Nakamura, "Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer," Comput. Speech Lang., vol. 27, no. 1, pp. 350–368, Jul. 2013.
  • M. Delcroix, K. Kinoshita, T. Hori, and T. Nakatani, "Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions," in Proc. ICASSP, 2016, pp. 5270–5274.
  • M. D. Zeiler, "Adadelta: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
  • L. Lu, K. Chin, A. Ghoshal, and S. Renals, "Joint uncertainty decoding for noise robust subspace Gaussian mixture models," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 9, pp. 1791–1804, Jul. 2013.
  • L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas Propag., vol. AP-30, no. 1, pp. 27–34, Jan. 1982.
  • L. Deng, J. Droppo, and A. Acero, "Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 568–580, Nov. 2003.
  • L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, May 2006.
  • K. Nathwani, E. Vincent, and I. Illina, "Consistent DNN uncertainty training and decoding for robust ASR," in Proc. ASRU, 2017, pp. 185–192.
  • K. H. Lee, W. H. Kang, T. G. Kang, and N. S. Kim, "Integrated DNN-based model adaptation technique for noise-robust speech recognition," in Proc. ICASSP, 2017, pp. 5245–5249.
  • J. Benesty, J. Chen, and Y. Huang, Microphone array signal processing. Springer, 2008.
  • J. A. Arrowood and M. A. Clements, "Using observation uncertainty in HMM decoding," in Proc. Interspeech, 2002, pp. 1561–1564.
  • H. Nishizaki, "Data augmentation and feature extraction using variational autoencoder for acoustic modeling," in APSIPA, 2017, pp. 1222–1227.
  • H. Liao and M. Gales, "Joint uncertainty decoding for noise robust speech recognition," in Proc. Interspeech, 2005, pp. 3129–3132.
  • K. Sohn, H. Lee, and X. Yan, "Learning structured output representation using deep conditional generative models," in Proc. NIPS, 2015.
  • G. Saon, H. Nahamoo, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, 2013, pp. 55–59.
  • G. Hu. (2004) 100 nonspeech environmental sounds. [Online]. Available: http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html
  • G. Hirsch, "Experimental framework for the performance evaluation of speech recognition front-ends on a large vocabulary task, version 2.0," Tech. Rep., 2002.
  • G. Hirsch, "AURORA-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments," Tech. Rep., 2007.
  • G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012.
  • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 30–42, Jan. 2012.
  • F. Nesta, M. Matassoni, and R. F. Astudillo, "A flexible spatial blind source extraction framework for robust speech recognition in noisy environments," 2013, pp. 33–40.
  • F. Chollet. (2015) Keras. [Online]. Available: https://github.com/fchollet/keras
  • E. Vincent, S. Watanabe, A. A. Nugraha, J. Barker, and R. Marxer, "HMM adaptation using vector Taylor series for noisy speech recognition," Comput. Speech Lang., vol. 46, no. 11, pp. 537–557, Nov. 2000.
  • D. Yu, M. L. Seltzer, J. Li, J. Huang, and F. Seide, "Feature learning in deep neural networks - A study on speech recognition tasks," CoRR, vol. abs/1301.3605, 2013.
  • D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 5, pp. 1061–1070, Jul. 2008.
  • D. T. Tran, E. Vincent, and D. Jouvet, "Fusion of multiple uncertainty estimators and propagators for noise robust ASR," in Proc. ICASSP, 2014, pp. 5512–5516.
  • D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
  • D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. ICLR, 2014.
  • D. Kolossa, R. F. Astudillo, E. Hoffmann, and R. Orglmeister, "Independent component analysis and time frequency masking for speech recognition in multitalker condition," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2010, no. 1, pp. 1–13, Jul. 2010.
  • D. Kolossa and R. Haeb-Umbach, Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications. Berlin: Springer-Verlag, 2011.
  • C. Huemmer, R. Maas, A. Schwarz, R. F. Astudillo, and W. Kellermann, "Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling," in Proc. Interspeech, 2015, pp. 3556–3560.
  • C. Huemmer, A. Schwarz, R. Maas, H. Barfuss, R. F. Astudillo, and W. Kellermann, "A new uncertainty decoding scheme for DNN-HMM hybrid systems with multichannel speech enhancement," in Proc. Interspeech, 2016, pp. 5760–5764.
  • C. Doersch, "Tutorial on variational autoencoders," arXiv:1606.05908, 2016.
  • B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Proc. Interspeech, 2012, pp. 10–13.
  • A. Tjandra, S. Sakti, S. Nakamura, and M. Adriani, "Stochastic gradient variational Bayes for deep learning-based ASR," in Proc. ASRU, 2015, pp. 175–180.
  • A. Ozerov, M. Lagrange, and E. Vincent, "Uncertainty-based learning of acoustic models from noisy data," Comput. Speech Lang., vol. 27, no. 3, pp. 874–894, Mar. 2013.
  • A. Narayanan and D. Wang, "Investigation of speech separation as a front-end for noise robust speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 826–835, Apr. 2014.
  • A. Narayanan and D. Wang, "Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 23, no. 1, pp. 92–101, Jan. 2015.
  • A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14–22, Jan. 2012.
  • A. H. Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, and R. Haeb-Umbach, "GMM-based significance decoding," in Proc. ICASSP, 2013, pp. 6827–6831.
  • A. H. Abdelaziz, S. Watanabe, J. R. Hershey, E. Vincent, and D. Kolossa, "Uncertainty propagation through deep neural networks," in Proc. Interspeech, 2015, pp. 3561–3565.
  • A. Graves, Supervised Sequence Labeling with Recurrent Neural Networks, ser. Studies in Computational Intelligence. Springer, 2012, vol. 385.
  • A. Ghoshal and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. Interspeech, 2013, pp. 2345–2349.