그래프 어텐션 심층신경망을 이용한 화자 인증 및 위변조 음성 검출 통합 시스템 = Graph attention deep neural networks for speaker verification and audio anti-spoofing integrated system

심혜진 (Hye-jin Shim), 2022
Thesis details
Subject-based paper impact summary
Subjects
  • Graph attention deep neural networks
  • Deep learning
  • Audio spoofing detection
  • Speaker verification
  • Integrated speaker verification and audio anti-spoofing system
Total papers on the same subjects: 1,948
Total citations: 0
Average subject-level impact: 0.0%

References

  • [9] D. Snyder, P. Ghahremani, D. Povey, D. Garcia-Romero, Y. Carmiel, and S. Khudanpur, “Deep neural network-based speaker embeddings for end-to-end speaker verification,” 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings, pp. 165–170, Feb. 2017, doi: 10.1109/SLT.2016.7846260.
  • [5] S. Gao, M. Cheng, K. Zhao et al., “Res2Net: A new multi-scale backbone architecture,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • [53] D. Kwasny and D. Hemmerling, “Gender and Age Estimation Methods Based on Speech Using Deep Neural Networks,” Sensors 2021, Vol. 21, Page 4785, vol. 21, no. 14, p. 4785, Jul. 2021, doi: 10.3390/S21144785.
  • [52] X. Chen, Z. Li, S. Setlur, and W. Xu, “Exploring racial and gender disparities in voice biometrics,” Scientific Reports, vol. 12, p. 3723, 2022, doi: 10.1038/s41598-022-06673-y.
  • [51] D. Byrd, “Preliminary results on speaker-dependent variation in the TIMIT database,” The Journal of the Acoustical Society of America, vol. 92, p. 593, 1992, doi: 10.1121/1.404271.
  • [50] S. Shon, H. Tang, and J. Glass, “Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model,” 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings, pp. 1007–1013, Feb. 2019, doi: 10.1109/SLT.2018.8639622.
  • [49] J. P. Eatock and J. S. Mason, “A quantitative assessment of the relative speaker discriminating properties of phonemes,” in Proc. ICASSP, vol. 1, pp. 133–136, Apr. 1994, doi: 10.1109/ICASSP.1994.389337.
  • [48] P. Scanlon, D. P. W. Ellis, and R. B. Reilly, “Using Broad Phonetic Group Experts for Improved Speech Recognition”.
  • [47] G. Zhu, F. Jiang, and Z. Duan, “Y-Vector: Multiscale Waveform Encoder for Speaker Embedding,” in Proc. INTERSPEECH, vol. 1, pp. 626–630, 2021, doi: 10.21437/Interspeech.2021-1707.
  • [45] S. Kye, Y. Kwon, and J. Chung, “Cross attentive pooling for speaker verification,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2021.
  • [44] Y. Jung, S. Kye, Y. Choi, M. Jung, and H. Kim, “Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances,” in Proc. INTERSPEECH, pp. 1501–1505, 2020, doi: 10.21437/Interspeech.2020-1025.
  • [43] Y. Liu, Y. Song, I. McLoughlin, and L. Liu, “An effective deep embedding learning method based on dense-residual networks for speaker verification,” in Proc. ICASSP, 2021.
  • [42] Y. Yu, L. Fan, and W. Li, “Ensemble additive margin softmax for speaker verification,” in Proc. ICASSP, 2019.
  • [41] J. Chung et al., “In defence of metric learning for speaker recognition,” in Proc. ICASSP, 2020, Accessed: Apr. 01, 2022. [Online]. Available: https://arxiv.org/abs/2003.11982
  • [40] H. Delgado et al., “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements,” 2018. doi: 10.7488/ds/298.
  • [3] S. Prince and J. Elder, “Probabilistic linear discriminant analysis for inferences about identity,” in Proc. ICCV, 2007.
  • [39] C. Lopes and F. Perdigão, “TIMIT Acoustic-Phonetic Continuous Speech Corpus.”
  • [37] J. Jung et al., “AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks,” in Proc. ICASSP, 2022.
  • [35] G. Lavrentyeva, S. Novoselov, A. Tseren, M. Volkova, A. Gorlanov, and A. Kozlov, “STC Antispoofing Systems for the ASVspoof2019 Challenge,” in Proc. INTERSPEECH, pp. 1033–1037, 2019, doi: 10.48550/arxiv.1904.05576.
  • [33] R. Caruana, “Multitask Learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997, doi: 10.1023/A:1007379606734.
  • [32] H. Gao and S. Ji, “Graph U-Nets,” in Proc. ICML, 2019.
  • [31] J. Jung, H. Heo, I. Yang, and H. Shim, “A complete end-to-end speaker verification system using deep neural networks: From raw signals to verification result,” in Proc. ICASSP, 2018.
  • [2] W. M. Campbell, D. E. Sturim, and D. A. Reynolds, “Support vector machines using GMM supervectors for speaker verification,” IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308–311, May 2006, doi: 10.1109/LSP.2006.870086.
  • [28] A. Nagrani, J. Chung, and A. Zisserman, “VoxCeleb: a large-scale speaker identification dataset,” in Proc. INTERSPEECH, 2017.
  • [27] H. Heo, B. Lee, J. Huh, and J. Chung, “Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020,” arXiv preprint, Sep. 2020.
  • [26] J. Atwood and D. Towsley, “Diffusion-Convolutional Neural Networks,” in Proc. NeurIPS, 2016. [Online]. Available: https://proceedings.neurips.cc/paper/2016/hash/390e982518a50e280d8e2b535462ec1f-Abstract.html
  • [25] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in Proc. ICML, 2016.
  • [24] A. Micheli, “Neural network for graphs: A contextual constructive approach,” IEEE Transactions on Neural Networks, pp. 498–511, 2009. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/4773279/
  • [23] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and deep locally connected networks on graphs,” in Proc. ICLR, 2014.
  • [22] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph Attention Networks,” in Proc. ICLR, 2018, doi: 10.48550/arxiv.1710.10903.
  • [21] T. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. ICLR, 2017.
  • [20] A. Maas, A. Hannun, and A. Ng, “Rectifier nonlinearities improve neural network acoustic models,” 2013.
  • [19] A. Sizov, E. Khoury, T. Kinnunen, Z. Wu, and S. Marcel, “Joint Speaker Verification and Anti-Spoofing in the i-Vector Space,” IEEE Transactions on Information Forensics and Security, 2015.
  • [18] J. Li, M. Sun, and X. Zhang, “Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection,” in Proc. APSIPA, pp. 1517–1522, Nov. 2019, doi: 10.1109/APSIPAASC47483.2019.9023289.
  • [16] M. Sahidullah et al., “Integrated spoofing countermeasures and automatic speaker verification: An evaluation on ASVspoof 2015,” 2016. doi: 10.21437/Interspeech.2016-1280.
  • [15] T. Kinnunen et al., “Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2195–2210, 2020, doi: 10.1109/TASLP.2020.3009494.
  • [14] J. Yamagishi, X. Wang, M. Todisco, and M. Sahidullah, “ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection,” 2021.
  • [12] T. Kinnunen, M. Sahidullah, H. Delgado, and M. Todisco, “The ASVspoof 2017 challenge: Assessing the limits of replay spoofing attack detection,” 2017.
  • [11] Z. Wu et al., “ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge,” 2015.
  • [10] Y. Zhu, T. Ko, D. Snyder, B. Mak, and D. Povey, “Self-attentive speaker embeddings for text-independent speaker verification,” in Proc. INTERSPEECH, pp. 3573–3577, 2018.
  • D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-Vectors: Robust DNN Embeddings for Speaker Recognition,” in Proc. ICASSP, 2018.
  • W. Lin and M. Mak, “Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms,” in Proc. INTERSPEECH, 2020, doi: 10.21437/Interspeech.2020-1287.
  • J. Chung, A. Nagrani, and A. Zisserman, “VoxCeleb2: Deep Speaker Recognition,” in Proc. INTERSPEECH, pp. 1086–1090, 2018, doi: 10.21437/Interspeech.2018-1929.
  • J. Hu, L. Shen, and G. Sun, “Squeeze-and-Excitation Networks,” in Proc. CVPR, pp. 7132–7141, 2018.
  • J. Lee, I. Lee, and J. Kang, “Self-Attention Graph Pooling,” in Proc. ICML, pp. 3734–3743, 2019.
  • R. Vergin and D. O’Shaughnessy, “Pre-emphasis and speech recognition,” pp. 1062–1065, 1995.
  • M. Todisco et al., “Integrated presentation attack detection and automatic speaker verification: Common features and Gaussian back-end fusion,” in Proc. INTERSPEECH, 2018, doi: 10.21437/Interspeech.2018-2289.
  • J. Jung, S. Kim, H. Shim, J. Kim, and H. Yu, “Improved RawNet with feature map scaling for text-independent speaker verification using raw waveforms,” in Proc. INTERSPEECH, pp. 1496–1500, 2020, doi: 10.21437/Interspeech.2020-1011.
  • B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification,” in Proc. INTERSPEECH, pp. 3830–3834, 2020, doi: 10.21437/Interspeech.2020-2650.
  • K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. CVPR, 2016.
  • G. Lavrentyeva et al., “Audio Replay Attack Detection with Deep Learning Frameworks,” in Proc. INTERSPEECH, 2017, doi: 10.21437/Interspeech.2017-360.
  • K. Okabe, T. Koshinaka, and K. Shinoda, “Attentive statistics pooling for deep speaker embedding,” in Proc. INTERSPEECH, pp. 2252–2256, 2018, doi: 10.21437/Interspeech.2018-993.
  • M. Todisco et al., “ASVspoof 2019: Future horizons in spoofed and fake audio detection,” in Proc. INTERSPEECH, 2019.
  • X. Wu, R. He, Z. Sun, and T. Tan, “A Light CNN for Deep Face Representation with Noisy Labels,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2884–2896, Nov. 2018.