Resource management of multi-channel processing-in-memory

김광수, 2022
Paper details
Paper impact by topic for 'Resource management of multi-channel processing-in-memory'
How paper impact is determined
Paper impact summary
Topics
  • hardware accelerator
  • main memory
  • memory management
  • near-data processing
  • processing-in-memory
  • system architecture
Total papers on the same topics: 27
Total citations of this paper: 0
Average paper impact by topic: 0.0%

References of 'Resource management of multi-channel processing-in-memory'

  • [9] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, “Enabling practical processing in and near memory for data-intensive computing,” in Proceedings of the 56th Annual Design Automation Conference 2019, 2019, pp. 1–4.
    [2019]
  • [84] M. Tirmazi, A. Barker, N. Deng, M. E. Haque, Z. G. Qin, S. Hand, M. Harchol-Balter, and J. Wilkes, “Borg: the Next Generation,” in Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys’20). Heraklion, Greece: ACM, 2020. [Online]. Available: https://doi.org/10.1145/3342195.3387517
  • [82] M. Arlitt and T. Jin, “A workload characterization study of the 1998 world cup web site,” IEEE network, vol. 14, no. 3, pp. 30–37, 2000.
    [2000]
  • [81] J. V. Kistowski, N. Herbst, S. Kounev, H. Groenda, C. Stier, and S. Lehrig, “Modeling and extracting load intensity profiles,” ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 11, no. 4, pp. 1–28, 2017.
    [2017]
  • [80] E. Van Eyk, J. Grohmann, S. Eismann, A. Bauer, L. Versluis, L. Toader, N. Schmitt, N. Herbst, C. L. Abad, and A. Iosup, “The spec-rg reference architecture for faas: From microservices and containers to serverless platforms,” IEEE Internet Computing, vol. 23, no. 6, pp. 7–18, 2019.
  • [7] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, “A modern primer on processing in memory,” arXiv preprint arXiv:2012.03112, 2020.
    [2020]
  • [79] E. Van Eyk, A. Iosup, C. L. Abad, J. Grohmann, and S. Eismann, “A spec rg cloud group’s vision on the performance challenges of faas cloud architectures,” in Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, 2018, pp. 21–24.
    [2018]
  • [78] L. Liu, S. Yang, L. Peng, and X. Li, “Hierarchical hybrid memory management in os for tiered memory systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 10, pp. 2223–2236, 2019.
    [2019]
  • [77] A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K. T. Malladi, H. Zheng, and O. Mutlu, “Lazypim: An efficient cache coherence mechanism for processing-in-memory,” IEEE Computer Architecture Letters, vol. 16, no. 1, pp. 46–50, 2017.
    [2017]
  • [76] X. Pan, Y. J. Gownivaripalli, and F. Mueller, “Tintmalloc: Reducing memory access divergence via controller-aware coloring,” in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2016, pp. 363–372.
    [2016]
  • [75] S. Kannan, A. Gavrilovska, V. Gupta, and K. Schwan, “Heteroos—os design for heterogeneous memory management in datacenter,” in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2017, pp. 521–534.
    [2017]
  • [74] L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, and C. Wu, “A software memory partition approach for eliminating bank-level interference in multicore systems,” in 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 2012, pp. 367–375.
    [2012]
  • [73] V. Chandru and F. Mueller, “Reducing noc and memory contention for manycores,” in International Conference on Architecture of Computing Systems. Springer, 2016, pp. 293–305.
    [2016]
  • [72] H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni, “Palloc: Dram bank-aware memory allocator for performance isolation on multicore platforms,” in 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 2014, pp. 155–166.
    [2014]
  • [71] S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda, “Reducing memory interference in multicore systems via application-aware memory channel partitioning,” in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2011, pp. 374–385.
    [2011]
  • [70] S. Haria, M. D. Hill, and M. M. Swift, “Devirtualizing memory in heterogeneous systems,” in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2018, pp. 637–650.
    [2018]
  • [6] D. S. Cali, G. S. Kalsi, Z. Bingöl, C. Firtina, L. Subramanian, J. S. Kim, R. Ausavarungnirun, M. Alser, J. Gómez-Luna, A. Boroumand et al., “Genasm: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis,” in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2020, pp. 951–966.
  • [69] Z. Xue and D. B. Thomas, “Sysalloc: A hardware manager for dynamic memory allocation in heterogeneous systems,” in 2015 25th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2015, pp. 1–7.
    [2015]
  • [68] L. Ke, U. Gupta, B. Y. Cho, D. Brooks, V. Chandra, U. Diril, A. Firoozshahian, K. Hazelwood, B. Jia, H.-H. S. Lee et al., “Recnmp: Accelerating personalized recommendation with near-memory processing,” in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2020, pp. 790–803.
  • [67] T. Jeong, D. Choi, S. Han, and E.-Y. Chung, “A study of data layout in multi-channel processing-in-memory architecture,” in Proceedings of the 2018 7th International Conference on Software and Computer Applications, 2018, pp. 134–138.
    [2018]
  • [66] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti et al., “The gem5 simulator,” ACM SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1–7, 2011.
    [2011]
  • [65] P. C. Santos, B. E. Forlin, and L. Carro, “Sim2pim: A fast method for simulating host independent & pim agnostic designs,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021, pp. 226–231.
  • [64] C. Yu, S. Liu, and S. Khan, “Multipim: A detailed and configurable multistack processing-in-memory simulator,” IEEE Computer Architecture Letters, vol. 20, no. 1, pp. 54–57, 2021.
  • [63] S. Xu, X. Chen, Y. Wang, Y. Han, X. Qian, and X. Li, “Pimsim: A flexible and detailed processing-in-memory simulator,” IEEE Computer Architecture Letters, vol. 18, no. 1, pp. 6–9, 2018.
    [2018]
  • [62] G. F. Oliveira, P. C. Santos, M. A. Alves, and L. Carro, “A generic processing in memory cycle accurate simulator under hybrid memory cube architecture,” in 2017 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). IEEE, 2017, pp. 54–61.
    [2017]
  • [61] J. D. Leidel and Y. Chen, “Hmc-sim-2.0: A simulation platform for exploring custom memory cube operations,” in 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016, pp. 621–630.
    [2016]
  • [60] W. Lloyd, S. Ramesh, S. Chinthalapati, L. Ly, and S. Pallickara, “Serverless computing: An investigation of factors influencing microservice performance,” in 2018 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 2018, pp. 159–169.
    [2018]
  • [5] A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan et al., “Google workloads for consumer devices: Mitigating data movement bottlenecks,” in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018, pp. 316–331.
  • [59] R. Pellegrini, I. Ivkic, and M. Tauber, “Function-as-a-service benchmarking framework,” arXiv preprint arXiv:1905.11707, 2019.
    [2019]
  • [58] T. Back and V. Andrikopoulos, “Using a microbenchmark to compare function as a service solutions,” in European Conference on Service-Oriented and Cloud Computing. Springer, 2018, pp. 146–160.
    [2018]
  • [57] S. K. Moore. (2021) Samsung speeds ai with processing in memory. [Online]. Available: https://spectrum.ieee.org/samsung-ai-memory-chips
  • [56] S. Rheindt, A. Fried, O. Lenke, L. Nolte, T. Wild, and A. Herkersdorf, “Nemesys: near-memory graph copy enhanced system-software,” in Proceedings of the International Symposium on Memory Systems, 2019, pp. 3–18.
    [2019]
  • [55] M. Zhang, Y. Zhuo, C. Wang, M. Gao, Y. Wu, K. Chen, C. Kozyrakis, and X. Qian, “Graphp: Reducing communication for pim-based graph processing with efficient data partition,” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2018, pp. 544–557.
    [2018]
  • [54] M. Gao, G. Ayers, and C. Kozyrakis, “Practical near-data processing for in-memory analytics frameworks,” in 2015 International Conference on Parallel Architecture and Compilation (PACT). IEEE, 2015, pp. 113–124.
    [2015]
  • [53] K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O’Connor, N. Vijaykumar, O. Mutlu, and S. W. Keckler, “Transparent offloading and mapping (tom): Enabling programmer-transparent near-data processing in gpu systems,” in ACM SIGARCH Computer Architecture News, vol. 44, no. 3. IEEE Press, 2016, pp. 204–216.
  • [52] G. Kim, N. Chatterjee, M. O’Connor, and K. Hsieh, “Toward standardized near-data processing with unrestricted data placement for gpus,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 2017, p. 24.
    [2017]
  • [51] Z. Liu, I. Calciu, M. Herlihy, and O. Mutlu, “Concurrent data structures for near-memory computing,” in Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017, pp. 235–245.
    [2017]
  • [50] Z. Sura, A. Jacob, T. Chen, B. Rosenburg, O. Sallenave, C. Bertolli, S. Antao, J. Brunheroto, Y. Park, K. O’Brien et al., “Data access optimization in a processing-in-memory system,” in Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015, pp. 1–8.
    [2015]
  • [4] D. Milojicic, P. Faraboschi, N. Dube, and D. Roweth, “Future of hpc: Diversifying heterogeneity,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021, pp. 276–281.
  • [49] Q. Zhu, T. Graf, H. E. Sumbul, L. Pileggi, and F. Franchetti, “Accelerating sparse matrix-matrix multiplication with 3d-stacked logic-in-memory hardware,” in 2013 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2013, pp. 1–6.
    [2013]
  • [48] J. Zhan, I. Akgun, J. Zhao, A. Davis, P. Faraboschi, Y. Wang, and Y. Xie, “A unified memory network architecture for in-memory computing in commodity servers,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016, pp. 1–14.
    [2016]
  • [47] W. Sun, Z. Li, S. Yin, S. Wei, and L. Liu, “Abc-dimm: Alleviating the bottleneck of communication in dimm-based near-memory processing with inter-dimm broadcast,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2021, pp. 237–250.
  • [46] B. Gopireddy and J. Torrellas, “Designing vertical processors in monolithic 3d,” in 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2019, pp. 643–656.
    [2019]
  • [45] M. M. S. Aly, M. Gao, G. Hills, C.-S. Lee, G. Pitner, M. M. Shulaker, T. F. Wu, M. Asheghi, J. Bokor, F. Franchetti et al., “Energy-efficient abundant-data computing: The n3xt 1,000x,” Computer, vol. 48, no. 12, pp. 24–33, 2015.
    [2015]
  • [44] P. Batude, B. Sklenard, C. Fenouillet-Beranger, B. Previtali, C. Tabone, O. Rozeau, O. Billoint, O. Turkyilmaz, H. Sarhan, S. Thuries et al., “3d sequential integration opportunities and technology optimization,” in IEEE International Interconnect Technology Conference. IEEE, 2014, pp. 373–376.
    [2014]
  • [43] Q. Zhu, B. Akin, H. E. Sumbul, F. Sadi, J. C. Hoe, L. Pileggi, and F. Franchetti, “A 3d-stacked logic-in-memory accelerator for application-specific data intensive computing,” in 2013 IEEE international 3D systems integration conference (3DIC). IEEE, 2013, pp. 1–7.
    [2013]
  • [42] Y. Tang, Y. Wang, H. Li, and X. Li, “Approxpim: Exploiting realistic 3d-stacked dram for energy-efficient processing in-memory,” in 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2017, pp. 396–401.
    [2017]
  • [41] A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, R. Ausavarungnirun, K. Hsieh, N. Hajinazar, K. T. Malladi, H. Zheng et al., “Conda: Efficient cache coherence support for near-data accelerators,” in Proceedings of the 46th International Symposium on Computer Architecture, 2019, pp. 629–642.
    [2019]
  • [40] M. Drumond, A. Daglis, N. Mirzadeh, D. Ustiugov, J. Picorel, B. Falsafi, B. Grot, and D. Pnevmatikatos, “The mondrian data engine,” ACM SIGARCH Computer Architecture News, vol. 45, no. 2, pp. 639–651, 2017.
    [2017]
  • [3] M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, “Clearing the clouds: a study of emerging scale-out workloads on modern hardware,” Acm sigplan notices, vol. 47, no. 4, pp. 37–48, 2012.
  • [39] JEDEC Standard, “High bandwidth memory (hbm) dram,” JESD235, 2013.
    [2013]
  • [37] C. Sudarshan, J. Lappas, M. M. Ghaffar, V. Rybalkin, C. Weis, M. Jung, and N. Wehn, “An in-dram neural network processing engine,” in 2019 IEEE international symposium on circuits and systems (ISCAS). IEEE, 2019, pp. 1–5.
    [2019]
  • [36] J. D. Ferreira, G. Falcao, J. Gómez-Luna, M. Alser, L. Orosa, M. Sadrosadati, J. S. Kim, G. F. Oliveira, T. Shahroodi, A. Nori et al., “pluto: In-dram lookup tables to enable massively parallel general-purpose computation,” arXiv preprint arXiv:2104.07699, 2021.
  • [35] V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch et al., “Rowclone: Fast and energy-efficient in-dram bulk data copy and initialization,” in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013, pp. 185–197.
  • [34] S. Angizi and D. Fan, “Graphide: A graph processing accelerator leveraging in-dram-computing,” in Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019, pp. 45–50.
    [2019]
  • [33] V. Seshadri and O. Mutlu, “The processing using memory paradigm: In-dram bulk copy, initialization, bitwise and and or,” arXiv preprint arXiv:1610.09603, 2016.
    [2016]
  • [32] G. Dai, T. Huang, Y. Chi, J. Zhao, G. Sun, Y. Liu, Y. Wang, Y. Xie, and H. Yang, “Graphh: A processing-in-memory architecture for large-scale graph processing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 4, pp. 640–653, 2018.
    [2018]
  • [31] S. Angizi, Z. He, A. S. Rakin, and D. Fan, “Cmp-pim: an energy-efficient comparator-based processing-in-memory neural network accelerator,” in Proceedings of the 55th Annual Design Automation Conference, 2018, pp. 1–6.
    [2018]
  • [30] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 27–39, 2016.
    [2016]
  • [2] J. Ouyang, M. Noh, Y. Wang, W. Qi, Y. Ma, C. Gu, S. Kim, K.-i. Hong, W.-K. Bae, Z. Zhao et al., “Baidu kunlun an ai processor for diversified workloads,” in 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 2020, pp. 1–18.
    [2020]
  • [29] S. Li, C. Xu, Q. Zou, J. Zhao, Y. Lu, and Y. Xie, “Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging nonvolatile memories,” in 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 2016, pp. 1–6.
    [2016]
  • [28] A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K. T. Malladi, H. Zheng, and O. Mutlu, “Lazypim: An efficient cache coherence mechanism for processing-in-memory,” IEEE Computer Architecture Letters, vol. 16, no. 1, pp. 46–50, 2016.
    [2016]
  • [27] J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi, “A scalable processing-in-memory accelerator for parallel graph processing,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015, pp. 105–117.
    [2015]
  • [26] J. Ahn, S. Yoo, O. Mutlu, and K. Choi, “Pim-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture,” in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). IEEE, 2015, pp. 336–348.
    [2015]
  • [25] D. Zhang, N. Jayasena, A. Lyashevsky, J. L. Greathouse, L. Xu, and M. Ignatowski, “Top-pim: Throughput-oriented programmable processing in memory,” in Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, 2014, pp. 85–98.
    [2014]
  • [24] J. Draper, J. Chame, M. Hall, C. Steele, T. Barrett, J. LaCoss, J. Granacki, J. Shin, C. Chen, C. W. Kang et al., “The architecture of the diva processing-in-memory chip,” in Proceedings of the 16th international conference on Supercomputing, 2002, pp. 14–25.
    [2002]
  • [23] M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava et al., “Mapping irregular applications to diva, a pim-based data-intensive architecture,” in SC’99: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing. IEEE, 1999, pp. 57–57.
  • [22] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, “A case for intelligent ram,” IEEE micro, vol. 17, no. 2, pp. 34–44, 1997.
    [1997]
  • [20] D. G. Elliott, W. M. Snelgrove, and M. Stumm, “Computational ram: A memory-simd hybrid and its application to dsp,” in Custom Integrated Circuits Conference, vol. 30. Citeseer, 1992, pp. 1–30.
    [1992]
  • [1] A. Rashid and A. Chaturvedi, “Cloud computing characteristics and services: a brief review,” International Journal of Computer Sciences and Engineering, vol. 7, no. 2, pp. 421–426, 2019.
    [2019]
  • [19] D. E. Shaw, S. J. Stolfo, H. Ibrahim, B. Hillyer, G. Wiederhold, and J. Andrews, “The non-von database machine: A brief overview,” IEEE Database Eng. Bull., vol. 4, no. 2, pp. 41–52, 1981.
    [1981]
  • [18] H. S. Stone, “A logic-in-memory computer,” IEEE Transactions on Computers, vol. 100, no. 1, pp. 73–78, 1970.
    [1970]
  • [17] M. Gokhale, B. Holmes, and K. Iobst, “Processing in memory: The terasys massively parallel pim array,” Computer, vol. 28, no. 4, pp. 23–31, 1995.
    [1995]
  • [16] Intel. (2019) A milestone in moving data. [Online]. Available: https://newsroom.intel.com/editorials/milestone-moving-data/
    [2019]
  • [15] J. Stuecheli, W. Starke, J. Irish, L. Arimilli, D. Dreps, B. Blaner, C. Wollbrink, and B. Allison, “Ibm power9 opens up a new era of acceleration enablement: Opencapi,” IBM Journal of Research and Development, vol. 62, no. 4/5, pp. 8–1, 2018.
    [2018]
  • [14] B. Benton, “Ccix, gen-z, opencapi: Overview and comparison,” in OpenFabrics Alliance 13th Workshop, 2017.
    [2017]
  • [13] M. Larabel. (2017) Cache coherent device memory for hmm. [Online]. Available: https://www.phoronix.com/scan.php?page=news_item&px=CDM-HMMMemory
    [2017]
  • [11] The kernel development community. Heterogeneous memory management (hmm). [Online]. Available: https://www.kernel.org/doc/html/latest/vm/hmm.html
  • J. Glisse, “Re: [patch 0/6] cache coherent device memory (cdm) with hmm v5.” [Online]. Available: http://lkml.iu.edu/hypermail/linux/kernel/1711.1/06313.html
    [2017]
  • O. Foundation, “Opencapi technology.” [Online]. Available: https://openpowerfoundation.org/wp-content/uploads/2018/04/Myron-Slota.pdf
    [2018]
  • O. Mutlu, “Memory scaling: A systems architecture perspective,” pp. 21–25.
    [2013]
  • Hybrid memory cube specification 2.1
  • J. Wilkes, “Google cluster-usage traces v3.” [Online]. Available: https://drive.google.com/file/d/10r6cnJ5cJ89fPWCgj7j4LtLBqYN9RiI9
    [2020]
  • P. M. Kogge, “Execube-a new architecture for scaleable mpps,” vol. 1, pp. 77–84.
    [1994]