Beyond 2D image space: exploring 3D geometric variation for visual object recognition

Sunghun Joung, 2022
Thesis details
Per-topic impact of 'Beyond 2D image space: exploring 3D geometric variation for visual object recognition'
Paper impact summary
Topics
  • camera viewpoint variation
  • convolutional neural networks
  • geometric variation
  • object detection
  • viewpoint estimation
  • visual object recognition
Papers on the same topics: 247
Total citations of this paper: 0
Average per-topic impact: 0.0%
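
The topics above center on viewpoint estimation with convolutional neural networks, which several of the cited works (e.g., [16, 17]) cast as classification over discretized viewpoint angles. As context only, here is a minimal sketch of that formulation in PyTorch (the framework cited in [93]); the class name, the ResNet-18 backbone [5], and the 24 azimuth bins are illustrative assumptions, not the thesis's actual model.

```python
import torch
import torch.nn as nn
from torchvision import models


class ViewpointClassifier(nn.Module):
    """Illustrative sketch: joint category and discretized-azimuth prediction."""

    def __init__(self, num_classes: int = 12, num_azimuth_bins: int = 24):
        super().__init__()
        backbone = models.resnet18(weights=None)  # ResNet backbone, cf. [5]
        # Keep everything up to (and including) global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.category_head = nn.Linear(512, num_classes)
        # One azimuth distribution per category, roughly in the spirit of the
        # per-class viewpoint heads studied in [16, 57].
        self.viewpoint_head = nn.Linear(512, num_classes * num_azimuth_bins)
        self.num_azimuth_bins = num_azimuth_bins

    def forward(self, x):
        f = self.features(x).flatten(1)        # (B, 512)
        cat_logits = self.category_head(f)     # (B, num_classes)
        vp_logits = self.viewpoint_head(f).view(
            -1, cat_logits.size(1), self.num_azimuth_bins
        )                                      # (B, num_classes, bins)
        return cat_logits, vp_logits


model = ViewpointClassifier()
images = torch.randn(2, 3, 224, 224)
cat_logits, vp_logits = model(images)
print(cat_logits.shape, vp_logits.shape)  # torch.Size([2, 12]) torch.Size([2, 12, 24])
```

Both heads can then be trained with cross-entropy losses; finer angle discretizations, as used in [17], follow the same pattern with a larger bin count.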

References of 'Beyond 2D image space: exploring 3D geometric variation for visual object recognition'

  • [9] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in ICCV, 2017, pp. 764–773.
    [2017]
  • [99] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne,” JMLR, vol. 9, no. 11, pp. 2579–2605, 2008.
    [2008]
  • [98] B. Pepik, M. Stark, P. Gehler, and B. Schiele, “Teaching 3d geometry to deformable part models,” in CVPR, 2012, pp. 3362–3369.
    [2012]
  • [97] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” IJCV, vol. 111, no. 1, pp. 98–136, 2015.
    [2015]
  • [96] P. Poirson, P. Ammirato, C.-Y. Fu, W. Liu, J. Kosecka, and A. C. Berg, “Fast single shot detection and pose estimation,” in 3DV, 2016, pp. 676–684.
    [2016]
  • [95] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in CVPR, 2012, pp. 3354–3361.
    [2012]
  • [94] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in ICCV, 2015, pp. 1026–1034.
    [2015]
  • [93] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
    [2017]
  • [92] F. Massa and R. Girshick, “maskrcnn-benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch,” https://github.com/facebookresearch/maskrcnn-benchmark, 2018.
    [2018]
  • [91] S. Wu, C. Rupprecht, and A. Vedaldi, “Unsupervised learning of probably symmetric deformable 3d objects from images in the wild,” in CVPR, 2020, pp. 1–10.
    [2020]
  • [90] J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, “Learning roi transformer for oriented object detection in aerial images,” in CVPR, 2019, pp. 2849–2858.
    [2019]
  • [8] M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” in NeurIPS, 2015, pp. 2017–2025.
    [2015]
  • [89] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” in CVPR, 2015, pp. 1912–1920.
    [2015]
  • [88] X. Li, S. Liu, S. De Mello, K. Kim, X. Wang, M.-H. Yang, and J. Kautz, “Online adaptation for consistent mesh reconstruction in the wild,” in NeurIPS, 2020, pp. 15009–15019.
    [2020]
  • [87] X. Li, S. Liu, K. Kim, S. De Mello, V. Jampani, M.-H. Yang, and J. Kautz, “Self-supervised single-view 3d reconstruction via semantic consistency,” in ECCV, 2020, pp. 677–693.
    [2020]
  • [86] S. Goel, A. Kanazawa, and J. Malik, “Shape and viewpoint without keypoints,” in ECCV, 2020, pp. 88–104.
    [2020]
  • [85] Y. Xiang, W. Kim, W. Chen, J. Ji, C. Choy, H. Su, R. Mottaghi, L. Guibas, and S. Savarese, “Objectnet3d: A large scale database for 3d object recognition,” in ECCV, 2016, pp. 160–176.
    [2016]
  • [84] C. Wen, Y. Zhang, Z. Li, and Y. Fu, “Pixel2mesh++: Multi-view 3d mesh generation via deformation,” in ICCV, 2019, pp. 1042–1051.
    [2019]
  • [83] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, “Pixel2mesh: Generating 3d mesh models from single rgb images,” in ECCV, 2018, pp. 52–67.
    [2018]
  • [82] X. Wei, R. Yu, and J. Sun, “View-gcn: View-based graph convolutional network for 3d shape analysis,” in CVPR, 2020, pp. 1850–1859.
    [2020]
  • [81] C. Esteves, Y. Xu, C. Allen-Blanchette, and K. Daniilidis, “Equivariant multi-view networks,” in ICCV, 2019, pp. 1568–1577.
    [2019]
  • [80] T. Yu, J. Meng, and J. Yuan, “Multi-view harmonized bilinear network for 3d object recognition,” in CVPR, 2018, pp. 186–194.
    [2018]
  • [7] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE TPAMI, vol. 32, no. 9, pp. 1627–1645, 2009.
    [2009]
  • [79] O. Mariotti and H. Bilen, “Semi-supervised viewpoint estimation with geometry-aware conditional generation,” in ECCV, 2020, pp. 631–647.
    [2020]
  • [78] S. K. Mustikovela, V. Jampani, S. De Mello, S. Liu, U. Iqbal, C. Rother, and J. Kautz, “Self-supervised viewpoint learning from image collections,” in CVPR, 2020, pp. 3971–3981.
    [2020]
  • [77] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al., “Shapenet: An information-rich 3d model repository,” arXiv preprint arXiv:1512.03012, 2015.
    [2015]
  • [76] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in ECCV, 2014, pp. 740–755.
    [2014]
  • [75] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009, pp. 248–255.
    [2009]
  • [74] P. Khorramshahi, A. Kumar, N. Peri, S. S. Rambhatla, J.-C. Chen, and R. Chellappa, “A dual-path model with adaptive attention for vehicle re-identification,” in ICCV, 2019, pp. 6132–6141.
    [2019]
  • [73] Y. Zhou and L. Shao, “Viewpoint-aware attentive multi-view inference for vehicle re-identification,” in CVPR, 2018, pp. 6489–6498.
    [2018]
  • [72] T.-S. Chen, C.-T. Liu, C.-W. Wu, and S.-Y. Chien, “Orientation-aware vehicle re-identification with semantics-guided part attention network,” in ECCV, 2020, pp. 330–346.
    [2020]
  • [71] D. Meng, L. Li, X. Liu, Y. Li, S. Yang, Z.-J. Zha, X. Gao, S. Wang, and Q. Huang, “Parsing-based view-aware embedding network for vehicle re-identification,” in CVPR, 2020, pp. 7103–7112.
    [2020]
  • [70] B. He, J. Li, Y. Zhao, and Y. Tian, “Part-regularized near-duplicate vehicle re-identification,” in CVPR, 2019, pp. 3997–4005.
    [2019]
  • [6] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in NeurIPS, 2017, pp. 3856–3866.
    [2017]
  • [69] Z. Wang, L. Tang, X. Liu, Z. Yao, S. Yi, J. Shao, J. Yan, S. Wang, H. Li, and X. Wang, “Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification,” in ICCV, 2017, pp. 379–387.
    [2017]
  • [68] H. Liu, Y. Tian, Y. Yang, L. Pang, and T. Huang, “Deep relative distance learning: Tell the difference between similar vehicles,” in CVPR, 2016, pp. 2167–2175.
    [2016]
  • [67] X. Liu, W. Liu, T. Mei, and H. Ma, “A deep learning-based approach to progressive vehicle re-identification for urban surveillance,” in ECCV, 2016, pp. 869–884.
    [2016]
  • [66] Y. Zhao, K. Yan, F. Huang, and J. Li, “Graph-based high-order relation discovery for fine-grained recognition,” in CVPR, 2021, pp. 15079–15088.
    [2021]
  • [65] R. Du, D. Chang, A. K. Bhunia, J. Xie, Z. Ma, Y.-Z. Song, and J. Guo, “Fine-grained visual classification via progressive multi-granularity training of jigsaw patches,” in ECCV, 2020, pp. 153–168.
    [2020]
  • [64] Y. Ding, Y. Zhou, Y. Zhu, Q. Ye, and J. Jiao, “Selective sparse sampling for fine-grained image recognition,” in ICCV, 2019, pp. 6599–6608.
    [2019]
  • [63] W. Luo, X. Yang, X. Mo, Y. Lu, L. S. Davis, J. Li, J. Yang, and S.-N. Lim, “Cross-x learning for fine-grained visual categorization,” in ICCV, 2019, pp. 8242–8251.
    [2019]
  • [62] W. Ge, X. Lin, and Y. Yu, “Weakly supervised complementary parts models for fine-grained image classification from the bottom up,” in CVPR, 2019, pp. 3034–3043.
    [2019]
  • [61] H. Zheng, J. Fu, Z.-J. Zha, and J. Luo, “Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition,” in CVPR, 2019, pp. 5012–5021.
    [2019]
  • [60] M. Sun, Y. Yuan, F. Zhou, and E. Ding, “Multi-attention multi-class constraint for fine-grained image recognition,” in ECCV, 2018, pp. 805–821.
    [2018]
  • [5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
    [2016]
  • [59] T.-Y. Lin, A. RoyChowdhury, and S. Maji, “Bilinear cnn models for fine-grained visual recognition,” in ICCV, 2015, pp. 1449–1457.
    [2015]
  • [58] R. Farrell, O. Oza, N. Zhang, V. I. Morariu, T. Darrell, and L. S. Davis, “Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance,” in ICCV, 2011, pp. 161–168.
    [2011]
  • [57] D. Onoro-Rubio, R. J. Lopez-Sastre, C. Redondo-Cabrera, and P. Gil-Jiménez, “The challenge of simultaneous object detection and pose estimation: A comparative study,” IVC, vol. 79, pp. 109–122, 2018.
    [2018]
  • [56] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3d bounding box estimation using deep learning and geometry,” in CVPR, 2017, pp. 7074–7082.
    [2017]
  • [55] S. Mahendran, H. Ali, and R. Vidal, “3d pose regression using convolutional neural networks,” in ICCVW, 2017, pp. 2174–2182.
    [2017]
  • [54] Y. Wang, S. Li, M. Jia, and W. Liang, “Viewpoint estimation for objects with convolutional neural network trained on synthetic images,” in PCM, 2016, pp. 169–179.
    [2016]
  • [53] X. Zhou, A. Karpur, L. Luo, and Q. Huang, “Starmap for category-agnostic keypoint and viewpoint estimation,” in ECCV, 2018, pp. 318–334.
    [2018]
  • [52] A. Grabner, P. M. Roth, and V. Lepetit, “3d pose estimation and 3d model retrieval for objects in the wild,” in CVPR, 2018, pp. 3022–3031.
    [2018]
  • [51] G. Pavlakos, X. Zhou, A. Chan, K. G. Derpanis, and K. Daniilidis, “6-dof object pose from semantic keypoints,” in ICRA, 2017, pp. 2011–2018.
    [2017]
  • [50] J. Thewlis, H. Bilen, and A. Vedaldi, “Modelling and unsupervised learning of symmetric deformable object categories,” in NeurIPS, 2018, pp. 8178–8189.
    [2018]
  • [4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NeurIPS, 2012, pp. 1097–1105.
    [2012]
  • [49] J. Thewlis, H. Bilen, and A. Vedaldi, “Unsupervised learning of object frames by dense equivariant image labelling,” in NeurIPS, 2017, pp. 844–855.
    [2017]
  • [48] A. Bas, P. Huber, W. A. Smith, M. Awais, and J. Kittler, “3d morphable models as spatial transformer networks,” in CVPRW, 2017, pp. 904–912.
    [2017]
  • [47] S. Kim, S. Lin, S. R. Jeon, D. Min, and K. Sohn, “Recurrent transformer networks for semantic correspondence,” in NeurIPS, 2018, pp. 6126–6136.
    [2018]
  • [46] K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, “Lift: Learned invariant feature transform,” in ECCV, 2016, pp. 467–483.
    [2016]
  • [45] S. Kim, S. Süsstrunk, and M. Salzmann, “Volumetric transformer networks,” in ECCV, 2020, pp. 561–578.
    [2020]
  • [44] C. Esteves, C. Allen-Blanchette, X. Zhou, and K. Daniilidis, “Polar transformer networks,” in ICLR, 2018.
    [2018]
  • [43] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, “Interpretable transformations with encoder-decoder networks,” in ICCV, 2017, pp. 5726–5735.
    [2017]
  • [42] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
    [2015]
  • [41] K. Navaneet, A. Mathew, S. Kashyap, W.-C. Hung, V. Jampani, and R. V. Babu, “From image collections to point clouds with self-supervised shape and pose networks,” in CVPR, 2020, pp. 1132–1140.
    [2020]
  • [40] M. Tatarchenko, S. R. Richter, R. Ranftl, Z. Li, V. Koltun, and T. Brox, “What do single-view 3d reconstruction networks learn?,” in CVPR, 2019, pp. 3405–3414.
    [2019]
  • [3] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in ICCV, 2017, pp. 2961–2969.
    [2017]
  • [39] E. Insafutdinov and A. Dosovitskiy, “Unsupervised learning of shape and pose with differentiable point clouds,” in NeurIPS, 2018, pp. 2802–2812.
    [2018]
  • [38] A. Kanazawa, S. Tulsiani, A. A. Efros, and J. Malik, “Learning category-specific mesh reconstruction from image collections,” in ECCV, 2018, pp. 371–386.
    [2018]
  • [37] S. Liu, T. Li, W. Chen, and H. Li, “Soft rasterizer: A differentiable renderer for image-based 3d reasoning,” in ICCV, 2019, pp. 7708–7717.
    [2019]
  • [36] H. Kato, Y. Ushiku, and T. Harada, “Neural 3d mesh renderer,” in CVPR, 2018, pp. 3907–3916.
    [2018]
  • [35] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in CVPR, 2017, pp. 2117–2125.
    [2017]
  • [34] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in ECCV, 2016, pp. 21–37.
    [2016]
  • [33] H. Zheng, J. Fu, T. Mei, and J. Luo, “Learning multi-attention convolutional neural network for fine-grained image recognition,” in ICCV, 2017, pp. 5209–5217.
    [2017]
  • [32] J. Fu, H. Zheng, and T. Mei, “Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition,” in CVPR, 2017, pp. 4438–4446.
    [2017]
  • [31] J. Krause, H. Jin, J. Yang, and L. Fei-Fei, “Fine-grained recognition without part annotations,” in CVPR, 2015, pp. 5546–5555.
    [2015]
  • [30] S. Branson, G. Van Horn, S. Belongie, and P. Perona, “Bird species categorization using pose normalized deep convolutional nets,” in BMVC, 2014.
    [2014]
  • [2] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014, pp. 580–587.
    [2014]
  • [29] P. Guo and R. Farrell, “Aligned to the object, not to the image: A unified pose-aligned representation for fine-grained recognition,” in WACV, 2019, pp. 1876–1885.
    [2019]
  • [28] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, “3d object representations for fine-grained categorization,” in ICCVW, 2013, pp. 554–561.
    [2013]
  • [27] Y. Xiang, R. Mottaghi, and S. Savarese, “Beyond pascal: A benchmark for 3d object detection in the wild,” in WACV, 2014, pp. 75–82.
    [2014]
  • [26] A. Kanezaki, Y. Matsushita, and Y. Nishida, “Rotationnet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints,” in CVPR, 2018, pp. 5010–5019.
    [2018]
  • [25] C. Wang, M. Pelillo, and K. Siddiqi, “Dominant set clustering and pooling for multi-view 3d object recognition,” in BMVC, 2017, pp. 61.4–61.12.
    [2017]
  • [24] S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. Jan Latecki, “Gift: A real-time and scalable 3d shape search engine,” in CVPR, 2016, pp. 5023–5032.
    [2016]
  • [23] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” in ICCV, 2015, pp. 945–953.
    [2015]
  • [22] M. Elhoseiny, T. El-Gaaly, A. Bakry, and A. Elgammal, “A comparative analysis and study of multiview cnn models for joint object categorization and pose estimation,” in ICML, 2016, pp. 888–897.
    [2016]
  • [21] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in NeurIPS, 2015, pp. 91–99.
    [2015]
  • [1] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
    [2015]
  • [19] G. Divon and A. Tal, “Viewpoint estimation—insights & model,” in ECCV, 2018, pp. 252–268.
    [2018]
  • [18] F. Massa, R. Marlet, and M. Aubry, “Crafting a multi-task cnn for viewpoint estimation,” in BMVC, 2016, pp. 91.1–91.12.
    [2016]
  • [17] H. Su, C. R. Qi, Y. Li, and L. J. Guibas, “Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views,” in ICCV, 2015, pp. 2686–2694.
    [2015]
  • [16] S. Tulsiani and J. Malik, “Viewpoints and keypoints,” in CVPR, 2015, pp. 1510–1519.
    [2015]
  • [15] Z. Shu, M. Sahasrabudhe, R. Alp Guler, D. Samaras, N. Paragios, and I. Kokkinos, “Deforming autoencoders: Unsupervised disentangling of shape and appearance,” in ECCV, 2018, pp. 650–665.
    [2018]
  • [14] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-based fully convolutional networks,” in NeurIPS, 2016, pp. 379–387.
    [2016]
  • [13] C.-H. Lin and S. Lucey, “Inverse compositional spatial transformer networks,” in CVPR, 2017, pp. 2568–2576.
    [2017]
  • [130] S. Wu, T. Jakab, C. Rupprecht, and A. Vedaldi, “Dove: Learning deformable 3d objects by watching videos,” arXiv preprint arXiv:2107.10844, 2021.
    [2021]
  • [12] C. B. Choy, J. Gwak, S. Savarese, and M. Chandraker, “Universal correspondence network,” in NeurIPS, 2016, pp. 2414–2422.
    [2016]
  • [129] F. Kokkinos and I. Kokkinos, “Learning monocular 3d reconstruction of articulated categories from motion,” in CVPR, 2021, pp. 1737–1746.
    [2021]
  • [128] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, “Performance measures and a data set for multi-target, multi-camera tracking,” in ECCV, 2016, pp. 17–35.
    [2016]
  • [127] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, “Scalable person re-identification: A benchmark,” in ICCV, 2015, pp. 1116–1124.
    [2015]
  • [126] W. Li, R. Zhao, T. Xiao, and X. Wang, “Deepreid: Deep filter pairing neural network for person re-identification,” in CVPR, 2014, pp. 152–159.
    [2014]
  • [125] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie, “The inaturalist species classification and detection dataset,” in CVPR, 2018, pp. 8769–8778.
    [2018]
  • [124] A. Khosla, N. Jayadevaprakash, B. Yao, and F.-F. Li, “Novel dataset for fine-grained image categorization: Stanford dogs,” in CVPRW, 2011, pp. 1–2.
    [2011]
  • [123] X. Liu, S. Zhang, Q. Huang, and W. Gao, “Ram: a region-aware deep model for vehicle re-identification,” in ICME, 2018, pp. 1–6.
    [2018]
  • [122] M. Zhou, Y. Bai, W. Zhang, T. Zhao, and T. Mei, “Look-into-object: Self-supervised structure modeling for object recognition,” in CVPR, 2020, pp. 11774–11783.
    [2020]
  • [121] R. Ji, L. Wen, L. Zhang, D. Du, Y. Wu, C. Zhao, X. Liu, and F. Huang, “Attention convolutional binary neural tree for fine-grained visual categorization,” in CVPR, 2020, pp. 10468–10477.
    [2020]
  • [120] Y. Chen, Y. Bai, W. Zhang, and T. Mei, “Destruction and construction learning for fine-grained image recognition,” in CVPR, 2019, pp. 5157–5166.
    [2019]
  • [11] S. Joung, S. Kim, M. Kim, I.-J. Kim, and K. Sohn, “Learning canonical 3d object representation for fine-grained recognition,” in ICCV, 2021.
    [2021]
  • [119] Z. Yang, T. Luo, D. Wang, Z. Hu, J. Gao, and L. Wang, “Learning to navigate for fine-grained classification,” in ECCV, 2018, pp. 420–435.
    [2018]
  • [118] Y. Wang, V. I. Morariu, and L. S. Davis, “Learning a discriminative filter bank within a cnn for fine-grained recognition,” in CVPR, 2018, pp. 4148–4157.
    [2018]
  • [117] Z. Li, Y. Yang, X. Liu, F. Zhou, S. Wen, and W. Xu, “Dynamic computational time for visual attention,” in ICCVW, 2017, pp. 1199–1209.
    [2017]
  • [116] X. Liu, T. Xia, J. Wang, Y. Yang, F. Zhou, and Y. Lin, “Fully convolutional attention networks for fine-grained recognition,” arXiv preprint arXiv:1603.06765, 2016.
    [2016]
  • [115] D. Wang, Z. Shen, J. Shao, W. Zhang, X. Xue, and Z. Zhang, “Multiple granularity descriptors for fine-grained categorization,” in ICCV, 2015, pp. 2399–2406.
    [2015]
  • [114] H. Zhang, T. Xu, M. Elhoseiny, X. Huang, S. Zhang, A. Elgammal, and D. Metaxas, “Spda-cnn: Unifying semantic part detection and abstraction for fine-grained recognition,” in CVPR, 2016, pp. 1143–1152.
    [2016]
  • [113] A. Recasens, P. Kellnhofer, S. Stent, W. Matusik, and A. Torralba, “Learning to zoom: a saliency-based sampling layer for neural networks,” in ECCV, 2018, pp. 51–66.
    [2018]
  • [112] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The caltech-ucsd birds-200-2011 dataset,” tech. rep., California Institute of Technology, 2011.
    [2011]
  • [111] U. Pinkall and K. Polthier, “Computing discrete minimal surfaces and their conjugates,” Experimental mathematics, vol. 2, no. 1, pp. 15–36, 1993.
    [1993]
  • [110] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018, pp. 586–595.
    [2018]
  • [10] S. Joung, S. Kim, H. Kim, M. Kim, I.-J. Kim, J. Cho, and K. Sohn, “Cylindrical convolutional networks for joint object detection and viewpoint estimation,” in CVPR, 2020, pp. 14163–14172.
    [2020]
  • [109] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in ECCV, 2020, pp. 213–229.
    [2020]
  • [108] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, “An intriguing failing of convolutional neural networks and the coordconv solution,” in NeurIPS, 2018, pp. 9605–9616.
    [2018]
  • [107] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in NAACL-HLT (1), 2019, pp. 4171–4186.
    [2019]
  • [106] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017, pp. 5998–6008.
    [2017]
  • [105] N. Kulkarni, A. Gupta, and S. Tulsiani, “Canonical surface mapping via geometric cycle consistency,” in ICCV, 2019, pp. 2202–2211.
    [2019]
  • [104] S. Tulsiani, T. Zhou, A. A. Efros, and J. Malik, “Multi-view supervision for single-view reconstruction via differentiable ray consistency,” in CVPR, 2017, pp. 2626–2634.
    [2017]
  • [103] H. Fan, H. Su, and L. J. Guibas, “A point set generation network for 3d object reconstruction from a single image,” in CVPR, 2017, pp. 605–613.
    [2017]
  • [102] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, “3d-r2n2: A unified approach for single and multi-view 3d object reconstruction,” in ECCV, 2016, pp. 628–644.
    [2016]
  • [101] Y. Xiang, W. Choi, Y. Lin, and S. Savarese, “Subcategory-aware convolutional neural networks for object proposals and detection,” in WACV, 2017, pp. 924–933.
    [2017]
  • [100] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in ICCV, 2017, pp. 618–626.
    [2017]
  • R. Girshick, “Fast r-cnn,” in ICCV, 2015, pp. 1440–1448.
    [2015]