Ph.D. Dissertation

Attentive Visual-Linguistic Story Representation Learning for Video Story Question Answering (Korean title: 비디오 스토리 질의응답을 위한 주의 깊은 시각-언어 스토리 표현 학습)

Kyung-Min Kim, 2018
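Thesis details
    • Author: Kyung-Min Kim
    • Physical description: 26 cm
    • General note: Advisor: Byoung-Tak Zhang
    • Degree information: Ph.D. dissertation, Department of Electrical and Computer Engineering, Seoul National University Graduate School, August 2018
    • DDC: 621.3
    • Place of publication: Seoul
    • Language: English
    • Year of publication: 2018
    • Publisher: Seoul National University Graduate School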

References

  • Zhang, B.-T., Ha, J.-W., and Kang, M. (2012) Sparse Population Code Models of Word Learning in Concept Drift. In Proceedings of the 34th Annual Conference of the Cognitive Science Society (CogSci 2012). 1221-1226.
  • Zhang, B.-T. (2013) Information-Theoretic Objective Functions for Lifelong Learning. AAAI 2013 Spring Symposium on Lifelong Machine Learning. 62-69.
  • Zhang, B.-T. (2008) Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory. IEEE Computational Intelligence Magazine, 3(3):49-63.
  • Zeng, K. H., Chen, T. H., Chuang, C. Y., Liao, Y. H., Niebles, J. C., and Sun, M. (2017) Leveraging video descriptions to learn video question answering. In Proceedings of Association for the Advancement of Artificial Intelligence (AAAI 2017).
  • Yu, L., Hermann, K. M., Blunsom, P., and Pulman, S. (2014) Deep Learning for Answer Sentence Selection. arXiv preprint arXiv:1412.1632.
  • Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., and Courville, A. (2015) Describing Videos by Exploiting Temporal Structure. In Proceedings of International Conference on Computer Vision (ICCV 2015).
  • Xu, J., Mei, T., Yao, T., and Rui, Y. (2016) MSR-VTT: A large video description dataset for bridging video and language. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
  • Weston, J., Chopra, S., and Bordes, A. (2015) Memory Networks. In Proceedings of International Conference on Learning Representations (ICLR 2015).
  • Weston, J., Bordes, A., Chopra, S., Rush, A. M., van Merriënboer, B., Joulin, A., and Mikolov, T. (2015) Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv preprint arXiv:1502.05698.
  • Weston, J., Bengio, S., and Usunier, N. (2010) Large Scale Image Annotation: Learning to Rank with Joint Word-image Embeddings. Machine Learning. 81(1):21-35.
  • Wang, D., and Nyberg, E. (2015) A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering. In Proceedings of Association for Computational Linguistics (ACL 2015).
  • Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015) Show and Tell: A Neural Image Caption Generator. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).
  • Venugopalan, S., Rohrbach, M., Donahue, J., Darrell, T., Mooney, R., and Saenko, K. (2015) Sequence to Sequence - Video to Text. In Proceedings of International Conference on Computer Vision (ICCV 2015).
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017) Attention is all you need. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2017).
  • Tran, D., Bourdev, L. D., Fergus, R., Torresani, L., and Paluri, M. (2015) Learning spatiotemporal features with 3D convolutional networks. In Proceedings of International Conference on Computer Vision (ICCV 2015).
  • Torabi, A., Pal, C., Larochelle, H., Courville, A. (2015) Using descriptive video services to create a large data source for video annotation research. In arXiv preprint arXiv:1503.01070v1.
  • Tapaswi, M., Zhu, Y., Stiefelhagen, R., Torralba, A., Urtasun, R., and Fidler, S. (2016) MovieQA: Understanding Stories in Movies through Question-Answering. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
  • Tan, M., Santos, C., Xiang, B., and Zhou, B. (2016) LSTM-based Deep Learning Models for Non-factoid Answer Selection. International Conference on Learning Representations Workshop.
  • Sukhbaatar, S., Szlam, A., Weston, J., and Fergus, R. (2015) End-To-End Memory Networks. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2015).
  • Schuster, M., and Paliwal, K. K. (1997) Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673-2681.
  • Saerbeck, M., Schut, T., Bartneck, C., and Janse, M. D. (2010) Expressive robots in education: varying the degree of social supportive behavior of a robotic tutor. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1613–1622.
  • Rohrbach, A., Rohrbach, M., Tandon, N., and Schiele, B. (2015) A dataset for movie description. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), 3202–3212.
  • Richardson, M. (2013) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).
  • Ren, M., Kiros, R., and Zemel, R. (2015) Image question answering: A visual semantic embedding model and a new dataset. ICML 2015 Deep Learning Workshop.
  • Reimers, N., and Gurevych, I. (2017) Optimal hyperparameters for deep LSTM-networks for sequence labeling tasks. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
  • Pennington, J., Socher, R., and Manning, C. D. (2014) GloVe: Global vectors for word representation. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).
  • Nair, V., and Hinton, G. E. (2010) Rectified linear units improve restricted Boltzmann machines. In Proceedings of International Conference on Machine Learning (ICML 2010).
  • Na, S. I., Lee, S. H., Kim, J. S., and Kim, G. H. (2017) A read-write memory network for movie story understanding. In Proceedings of International Conference on Computer Vision (ICCV 2017).
  • Mun, J., Seo, P. H., Jung, I., and Han, B. (2017) MarioQA: Answering Questions by Watching Gameplay Videos. In Proceedings of International Conference on Computer Vision (ICCV 2017).
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013) Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2013).
  • Malinowski, M., Rohrbach, M., and Fritz, M. (2015) Ask your neurons: A neural-based approach to answering questions about images. In Proceedings of International Conference on Computer Vision (ICCV 2015).
  • Lu, J., Yang, J., Batra, D., Parikh, D. (2016) Hierarchical question-image coattention for visual question answering. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2016).
  • Li, Y., Song, Y., Cao, L., Tetreault, J., Goldberg, L., Jaimes, A., and Luo, J. (2016) TGIF: A New Dataset and Benchmark on Animated GIF Description. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
  • Leyzberg, D., Spaulding, S., and Scassellati, B. (2014) Personalizing robot tutors to individuals’ learning differences. In Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction. 423–430.
  • Kory, J. M., and Breazeal, C. L. (2014) Storytelling with Robots: Learning Companions for Preschool Children’s Language Development. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).
  • Kory, J. M., Jeong, S., and Breazeal, C. L. (2013) Robotic learning companions for early language development. In Proceedings of the 15th ACM on International conference on multimodal interaction (ICMI 2013). 71–72.
  • Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Torralba, A., Urtasun, R., and Fidler, S. (2015a) Skip-Thought Vectors. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2015).
  • Kiros, R., Salakhutdinov, R., and Zemel, R. S. (2015b) Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. In Proceedings of Transactions of the Association for Computational Linguistics (ACL 2015).
  • Kingma, D. P., and Ba, J. (2015) Adam: A method for stochastic optimization. In Proceedings of International Conference on Learning Representations (ICLR 2015).
  • Kim, Y. (2014) Convolutional neural networks for sentence classification. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).
  • Kim, K.-M., Nan, C.-J., Heo, M.-O., and Zhang, B.-T. (2016b) Pororobot: Child Tutoring Robot for English Education. International Symposium on Perception, Action, and Cognitive Systems (PACS).
  • Kim, K.-M., Nan, C.-J., Heo, M.-O., Choi, S.-H., and Zhang, B.-T. (2016a) PororoQA: A Cartoon Video Series Dataset for Story Understanding. NIPS 2016 Workshop on Large Scale Computer Vision System.
  • Kim, K.-M., Nan, C.-J., Ha, J.-W., Heo, Y.-J., and Zhang, B.-T. (2015) Pororobot: A Deep Learning Robot That Plays Video Q&A Games. AAAI 2015 Fall Symposium on AI for Human-Robot Interaction (AI-HRI 2015).
  • Kim, K.-M., Choi, S.-H., Choi, S.-J., Kim, S.-H., and Zhang, B.-T. (2017b) MuSM: Multimodal Sequence Memory for Video Story Question Answering. ICCV 2017 Workshop on The Joint Video and Language Understanding.
  • Kim, K.-M., Heo, M.-O., Choi, S.-H., and Zhang, B.-T. (2017a) DeepStory: Video Story QA by Deep Embedded Memory Networks. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI 2017).
  • Kim, J. H., On, K. W., Lim, W. S., Kim, J. H., Ha, J. W., and Zhang, B. T. (2017b) Hadamard product for low-rank bilinear pooling. In Proceedings of International Conference on Learning Representations (ICLR 2017).
  • Kim, J. H., Lee, S. W., Kwak, D. H., Heo, M. O., Kim, J., Ha, J. W., and Zhang, B. T. (2016b) Multimodal residual learning for visual QA. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2016).
  • Karpathy, A., and Fei-Fei, L. (2015) Deep Visual-Semantic Alignments for Generating Image Description. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015). 3128-3137.
  • Jang, Y. S., Song, Y., Yu, Y. J., Kim, Y. J., and Kim, G. H. (2017) TGIF-QA: Toward spatiotemporal reasoning in visual question answering. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).
  • Jabri, A., Joulin, A., van der Maaten, L. (2016) Revisiting visual question answering baselines. In Proceedings of European Conference on Computer Vision (ECCV 2016).
  • Hovy, E., Gerber, L., Hermjakob, U., Lin, C. Y., and Ravichandran, D. (2001) Toward Semantics-based Answer Pinpointing. In Proceedings of Human Language Technology Conference, 339-345.
  • Hochreiter, S., and Schmidhuber, J. (1997) Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • He, K., Zhang, X., Ren, S., Sun, J. (2016) Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
  • Ha, J.-W., Kim, K.-M., and Zhang, B.-T. (2015) Automated Construction of Visual-Linguistic Knowledge via Concept Learning from Cartoon Videos. In Proceedings of Association for the Advancement of Artificial Intelligence (AAAI 2015).
  • Glorot, X., and Bengio, Y. (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS 2010).
  • Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. (2017) Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122.
  • Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., and Xu, W. (2015) Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2015).
  • Fukui, A., Park, D. H., Yang, D., Rohrbach, A., Darrell, T., and Rohrbach, M. (2016) Multimodal compact bilinear pooling for visual question answering and visual grounding. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2016).
  • Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M. A., and Mikolov, T. (2013) DeViSE: A Deep Visual-Semantic Embedding Model. In Proceedings of Advances in Neural Information Processing Systems (NIPS 2013), 2121-2129.
  • Fridin, M. (2014) Storytelling by a kindergarten social assistive robot: A tool for constructive learning in preschool education. Computers & Education, 70:53–64.
  • Fasola, J., and Mataric, M. (2013) A socially assistive robot exercise coach for the elderly. Journal of Human-Robot Interaction. 2(2):3-32.
  • Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J. C., Zitnick, C. L., and Zweig, G. (2015) From Captions to Visual Concepts and Back. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015). 1473-1482.
  • Chen, D., Bolton, J., Manning, C. D. (2016) A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. In Proceedings of Association for Computational Linguistics (ACL 2016).
  • Bordes, A., Usunier, N., Chopra, S., and Weston, J. (2015) Large-scale Simple Question Answering with Memory Networks. arXiv preprint arXiv:1506.02075.
  • Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009) Curriculum Learning. In Proceedings of International Conference on Machine Learning (ICML 2009).
  • Bahdanau, D., Cho, K., and Bengio, Y. (2015) Neural machine translation by jointly learning to align and translate. In Proceedings of International Conference on Learning Representations (ICLR 2015).
  • Agrawal, A., Lu, J., Antol, S., Mitchell, M., Zitnick, C. L., Batra, D., and Parikh, D. (2015) VQA: Visual Question Answering. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).
  • Abney, S., Collins, M., and Singhal, A. (2000) Answer Extraction. In Proceedings of the 6th Applied Natural Language Processing Conference, 296-301.