DNN-based Acoustic Modeling for Robust Automatic Speech Recognition

이강현, 2019


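Thesis details

- Author: 이강현
- Alternative title: 강인한 음성인식을 위한 DNN 기반 음향 모델링
- Physical description: illustrations, tables; x, 100 p.; 26 cm
- General note: includes bibliographical references
- Thesis note: doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2019
- DDC: 621.3, 22
- Place of publication: Seoul
- Language: eng
- Publication year: 2019
- Publisher: Seoul National University Graduate School
- Keywords: Robust speech recognition, feature enhancement, feature compensation, acoustic modeling, deep neural network (DNN), variational autoencoder (VAE), variational inference (VIF), uncertainty decoding (UD)
- References (68)
Similar-topic papers (4,648)
- Applied physics: 4,648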


Influence by topic of 'DNN-based Acoustic Modeling for Robust Automatic Speech Recognition'

Influence summary
Topic: Applied physics; Robust speech recognition, feature enhancement, feature compensation, acoustic modeling, deep neural network (DNN), variational autoencoder (VAE), variational inference (VIF), uncertainty decoding (UD)
Total papers on the same topic	Total citations received	Average influence by topic
4,650	0	0.0%

Influence by topic
Topic		Papers per topic	Influence per topic
Subject classification (KDC/DDC)	Applied physics	4,649	0.0%
Keywords	Robust speech recognition, fe ...	1	0.0%
Total		4,650	0.0%
* Number of citations received from papers with other keywords

References of 'DNN-based Acoustic Modeling for Robust Automatic Speech Recognition'

D. T. Tran, E. Vincent, and D. Jouvet, "Nonparametric uncertainty estimation and propagation for noise robust ASR," IEEE Trans. Audio, Speech, Language Process., vol. 23, no. 11, pp. 1835-1846, Nov. 2015.
Z. Wang and D. Wang, "A joint training framework for robust automatic speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 4, pp. 796-806, Jan. 2016.
Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process., vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Int. Conf. Artif. Intell. Statist., 2011, pp. 315-323.
X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," in Proc. ICASSP, 2014, pp. 1759-1763.
W. Li, Y. Z. L. Wang, J. Dines, M. Magimai.-Doss, H. Bourlard, and Q. Liao, "Feature mapping of multiple beamformed sources for robust overlapping speech recognition using a microphone array," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 12, pp. 2244-2255, Dec. 2014.
V. Ion and R. Haeb-Umbach, "A novel uncertainty decoding rule with applications to transmission error robust speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 5, pp. 1047-1060, Jul. 2008.
T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition," in Proc. ICASSP, 1995, pp. 81-84.
T. Kowaliw, N. Bredeche, and R. Doursat, Growing adaptive machines: combining development and learning in artificial neural networks. Springer, 2014.
T. Gao, J. Du, L.-R. Dai, and C.-H. Lee, "Joint training of front-end and back-end deep neural networks for robust speech recognition," in Proc. ICASSP, 2015, pp. 4375-4379.
S. Young, "The HTK book," Tech. Rep., 2006.
S. Tan and K. C. Sim, "Learning utterance-level normalisation using variational autoencoders for robust automatic speech recognition," in SLT, 2016, pp. 3556-3560.
R. Salakhutdinov, "Learning deep generative models," Annual Review of Statistics and Its Application, vol. 2, no. 1, pp. 361-385, Jan. 2015.
R. F. Astudillo, A. Abad, and I. Trancoso, "Accounting for the residual uncertainty of multi-layer perceptron based features," in Proc. ICASSP, 2014, pp. 6859-6863.
R. F. Astudillo and R. Orglmeister, "Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 5, pp. 1023-1034, May 2013.
R. F. Astudillo and J. P. da Silva Neto, "Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition," in Proc. Interspeech, 2011, pp. 461-464.
Q. Huo and C.-H. Lee, "A Bayesian predictive classification approach to robust speech recognition," IEEE Speech Audio Process., vol. 8, no. 2, pp. 200-204, Mar. 2003.
O. L. Frost, "An algorithm for linearly constrained adaptive array processing," in Proc. IEEE, vol. 60, no. 8, Aug. 1972, pp. 926-935.
N. Srivastava, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929-1958, Jun. 2014.
N. S. Kim, "IMM-based estimation for slowly evolving environments," IEEE Signal Process. Lett., vol. 5, no. 6, pp. 146-149, Jun. 1998.
N. B. Yoma and M. Villar, "Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm," IEEE Speech Audio Process., vol. 10, no. 3, pp. 158-166, May 2002.
M. Seltzer, D. Yu, and Y. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, 2013, pp. 7398-7402.
M. Mimura, S. Sakai, and T. Kawahara, "Exploring deep neural networks and deep autoencoders in reverberant speech recognition," in HSCMA, 2014, pp. 197-201.
M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, "The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments," in Proc. ASRU, 2005, pp. 357-262.
M. Delcroix, T. Nakatani, and S. Watanabe, "Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 2, pp. 324-334, Jul. 2013.
M. Delcroix, S. Watanabe, T. Nakatani, and A. Nakamura, "Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer," Comput. Speech Lang., vol. 27, no. 1, pp. 350-368, Jul. 2013.
M. Delcroix, K. Kinoshita, T. Hori, and T. Nakatani, "Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions," in Proc. ICASSP, 2016, pp. 5270-5274.
M. D. Zeiler, "Adadelta: An adaptive learning rate method," in arXiv preprint arXiv:1212.5701, 2012.
L. Lu, K. Chin, A. Ghoshal, and S. Renals, "Joint uncertainty decoding for noise robust subspace Gaussian mixture models," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 9, pp. 1791-1804, Jul. 2013.
L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas Propag., vol. AP-30, no. 1, pp. 27-34, Jan. 1982.
L. Deng, J. Droppo, and A. Acero, "Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition," IEEE Speech Audio Process., vol. 11, no. 6, pp. 568-580, Nov. 2003.
L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Speech Audio Process., vol. 13, May 2006.
K. Nathwani, E. Vincent, and I. Illina, "Consistent DNN uncertainty training and decoding for robust ASR," in Proc. ASRU, 2017, pp. 185-192.
K. H. Lee, W. H. Kang, T. G. Kang, and N. S. Kim, "Integrated DNN-based model adaptation technique for noise-robust speech recognition," in Proc. ICASSP, 2017, pp. 5245-5249.
J. Benesty, J. Chen, and Y. Huang, Microphone array signal processing. Springer, 2008.
J. A. Arrowood and M. A. Clements, "Using observation uncertainty in HMM decoding," in Proc. Interspeech, 2002, pp. 1561-1564.
H. Nishizaki, "Data augmentation and feature extraction using variational autoencoder for acoustic modeling," in APSIPA, 2017, pp. 1222-1227.
H. Liao and M. Gales, "Joint uncertainty decoding for noise robust speech recognition," in Proc. Interspeech, 2005, pp. 3129-3132.
H. L. K. Sohn and X. Yan, "Learning structured output representation using deep conditional generative models," in Proc. NIPS, 2015.
G. Saon, H. Nahamoo, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, 2013, pp. 55-59.
G. Hu. (2004) 100 nonspeech environmental sounds. [Online]. Available: http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html
G. Hirsch, "Experimental framework for the performance evaluation of speech recognition front-ends on a large vocabulary task, version 2.0," Tech. Rep., 2002.
G. Hirsch, "AURORA-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments," Tech. Rep., 2007.
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012.
G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
F. Nesta, M. Matassoni, and R. F. Astudillo, "A flexible spatial blind source extraction framework for robust speech recognition in noisy environments," 2013, pp. 33-40.
F. Chollet. (2015) Keras. [Online]. Available: https://github.com/fchollet/keras
E. Vincent, S. Watanabe, A. A. Nugraha, J. Barker, and R. Marxer, "HMM adaptation using vector Taylor series for noisy speech recognition," Comput. Speech Lang., vol. 46, no. 11, pp. 537-557, Nov. 2000.
D. Yu, M. L. Seltzer, J. Li, J. Huang, and F. Seide, "Feature learning in deep neural networks - A study on speech recognition tasks," CoRR, vol. abs/1301.3605, 2013.
D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 5, pp. 1061-1070, Jul. 2008.
D. T. Tran, E. Vincent, and D. Jouvet, "Fusion of multiple uncertainty estimators and propagators for noise robust ASR," in Proc. ICASSP, 2014, pp. 5512-5516.
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in Proc. ICLR, 2014.
D. Kolossa, R. F. Astudillo, E. Hoffmann, and R. Orglmeister, "Independent component analysis and time frequency masking for speech recognition in multitalker condition," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2010, no. 1, pp. 1-13, Jul. 2010.
D. Kolossa and R. Haeb-Umbach, in Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications. Berlin: Springer-Verlag, 2011.
C. Huemmer, R. Mass, A. Schwarz, R. F. Astudillo, and W. Kellermann, "Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling," in Proc. Interspeech, 2015, pp. 3556-3560.
C. Huemmer, A. Schwarz, R. Mass, H. Barfuss, R. F. Astudillo, and W. Kellermann, "A new uncertainty decoding scheme for DNN-HMM hybrid systems with multichannel speech enhancement," in Proc. Interspeech, 2016, pp. 5760-5764.
C. Doersch, "Tutorial on variational autoencoders," in arXiv:1606.05908, 2016.
B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in Proc. Interspeech, 2012, pp. 10-13.
A. Tjandra, S. Sakti, S. Nakamura, and M. Adriani, "Stochastic gradient variational Bayes for deep learning-based ASR," in Proc. ASRU, 2015, pp. 175-180.
A. Ozerov, M. Lagrange, and E. Vincent, "Uncertainty-based learning of acoustic models from noisy data," Comput. Speech Lang., vol. 27, no. 3, pp. 874-894, Mar. 2013.
A. Narayanan and D. Wang, "Investigation of speech separation as a front-end for noise robust speech recognition," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 22, no. 4, pp. 826-835, Apr. 2014.
A. Narayanan and D. Wang, "Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 23, no. 1, pp. 92-101, Jan. 2015.
A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
A. H. Abdelaziz, S. Zeiler, D. Kolossa, V. Leutnant, and R. Haeb-Umbach, "GMM-based significance decoding," in Proc. ICASSP, 2013, pp. 6827-6831.
A. H. Abdelaziz, S. Watanabe, J. R. Hershey, E. Vincent, and D. Kolossa, "Uncertainty propagation through deep neural networks," in Proc. Interspeech, 2015, pp. 3561-3565.
A. Graves, Supervised Sequence Labeling with Recurrent Neural Networks, ser. Studies in Computational Intelligence. Springer, 2012, vol. 385.
A. Ghoshal and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. Interspeech, 2013, pp. 2345-2349.
