Doctoral thesis

Energy-efficient instruction scheduling mechanisms for out-of-order superscalar processors

정이품, 2020
Thesis details
Impact by topic of 'Energy-efficient instruction scheduling mechanisms for out-of-order superscalar processors'
Impact summary
Topics
  • Coarse-Grained Instruction Commit
  • Dynamic Instruction Scheduling
  • Instruction Window
  • Memory Disambiguation
  • Performance
  • Precise Exception
  • Register Renaming
  • energy efficiency
  • speculation
Total papers on the same topics: 1,623
Total citations of this thesis: 0
Average impact by topic: 0.0%

References of 'Energy-efficient instruction scheduling mechanisms for out-of-order superscalar processors'

  • in Proceedings of the 5th international Conference on Software engineering
    pp. 439–449 [1981]
  • in Performance Analysis of Systems and Software
    pp. 68–77 [2004]
  • in 1998 IEEE 4th International Symposium on High-Performance Computer Architecture (HPCA)
    pp. 175–184 [1998]
  • “Use of selective precharge for low-power content-addressable memories,” in Circuits and Systems
    vol. 3. IEEE, pp. 1788–1791 [1997]
  • “Two techniques to enhance the performance of memory consistency models”
    pp. 355–364 [1991]
  • “Trace cache: a low latency approach to high bandwidth instruction fetching”
    pp. 24–35 [1996]
  • “The load slice core microarchitecture,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture
    pp. 272–284 [2015]
  • “The heterogeneous block architecture,” in Computer Design (ICCD)
    pp. 386–393 [2014]
  • “The alpha 21264 microprocessor”
    vol. 19, no. 2, pp. 24–36 [1999]
  • “Store vulnerability window (svw): Re-execution filtering for enhanced load optimization,” in Proceedings of the 32nd Annual International Symposium on Computer Architecture
    pp. 458–468 [2005]
  • “Speculative precomputation: Long-range prefetching of delinquent loads,”
    pp. 14–25 [2001]
  • “Speculation techniques for improving load related instruction scheduling”
    pp. 42–53 [1999]
  • “Software-hardware cooperative memory disambiguation,” in 2006 IEEE 12th International Symposium on High-Performance Computer Architecture (HPCA)
    pp. 244–253 [2006]
  • “Scalable store-load forwarding via store queue index prediction,” in Proceedings of the 38th Annual ACM/IEEE International Symposium on Microarchitecture
    pp. 159–170 [2005]
  • “Scalable hardware memory disambiguation for high ilp processors,”
    p. 399 [2003]
  • “Runahead execution: An alternative to very large instruction windows for out-of-order processors”
    pp. 129–140 [2003]
  • “Revisiting ilp designs for throughput-oriented gpgpu architecture,” in Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
    pp. 121–130 [2015]
  • “Reno: a rename-based instruction optimizer,” in Proceedings of the 32nd Annual International Symposium on Computer Architecture
    pp. 98–109 [2005]
  • “Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources”
    pp. 90–101 [2001]
  • “Reducing design complexity of the load/store queue”
    p. 411 [2003]
  • “Ramulator: A fast and extensible dram simulator”
    vol. 15, no. 1, pp. 45–49 [2016]
  • “Quantifying sources of error in mcpat and potential impacts on architectural studies,” in 2015 IEEE 21st International Symposium on High-Performance Computer Architecture (HPCA)
    pp. 577–589 [2015]
  • “Putting the fill unit to work: Dynamic optimizations for trace cache microprocessors,”
    pp. 173–181 [1998]
  • “Performance improvement by prioritizing the issue of the instructions in unconfident branch slices”
    pp. 82–94 [2018]
  • “Parrot: power awareness through selective dynamically optimized traces”
    pp. 196–214 [2003]
  • “Overcome: Coarse-grained instruction commit with handover register renaming,”
    vol. 68, no. 12, pp. 1802–1816 [2019]
  • “Nosq: Store-load communication without a store queue,” in Proceedings of the 39th Annual ACM/IEEE International Symposium on Microarchitecture
    pp. 285–296 [2006]
  • “Morphcore: An energy-efficient microarchitecture for high performance ilp and high throughput tlp,”
    pp. 305–316 [2012]
  • “Mlp-aware dynamic instruction window resizing for adaptively exploiting both ilp and mlp,”
    pp. 37–48 [2013]
  • “Large virtual robs by processor checkpointing,” Technical Report UPC-DAC-2002-39
    [2002]
  • “Kilo-instruction processors: Overcoming the memory wall”
    vol. 25, no. 3, pp. 48–57 [2005]
  • “Investigating the implementation of a block structured processor architecture in an early design stage,” in EUROMICRO Conference
    vol. 1. IEEE, pp. 186–193 [1999]
  • “Inside 6th-generation intel core: New microarchitecture code-named skylake”
    vol. 37, no. 2, pp. 52–62 [2017]
  • “Increasing the size of atomic instruction blocks using control flow assertions,” in Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture
    pp. 303–313 [2000]
  • “Increasing the instruction fetch rate via multiple branch prediction and a branch address cache”
    pp. 67–76 [1993]
  • “Increasing processor performance through early register release,”
    pp. 480–487
  • “Improving trace cache effectiveness with branch promotion and trace packing,”
    [1998]
  • “Hardware schemes for early register release,” in Parallel Processing, 2002
    pp. 5–13 [2002]
  • “Freeway: Maximizing mlp for slice-out-of-order execution,” in 2019 IEEE 24th International Symposium on High-Performance Computer Architecture (HPCA)
    pp. 558–569 [2019]
  • “Fiforder microarchitecture: Ready-aware instruction scheduling for ooo processors”
    pp. 716–721 [2019]
  • “Exploring the performance limits of out-of-order commit”
    pp. 211–220 [2017]
  • “Exploiting instruction level parallelism in processors by caching scheduled groups”
    pp. 13–25 [1997]
  • “Evaluation of issue queue delay: Banking tag ram and identifying correct critical path,” in 2011 IEEE 29th International Conference on Computer Design (ICCD)
    pp. 313–319 [2011]
  • “Energy and performance improvements in microprocessor design using a loop cache,”
    pp. 378–383 [1999]
  • “Energy: efficient instruction dispatch buffer design for superscalar processors”
    pp. 237–242 [2001]
  • “Dynamos: dynamic schedule migration for heterogeneous cores,”
    pp. 322–333 [2015]
  • “Dynamic speculation and synchronization of data dependences,” in Proceedings of the 24th Annual International Symposium on Computer Architecture
    pp. 181–193 [1997]
  • “Design and evaluation of a hierarchical decoupled architecture,”
    vol. 38, no. 3, pp. 237–259 [2006]
  • “Delaying physical register allocation through virtual-physical registers,”
    pp. 186–192 [1999]
  • “Crob: implementing a large instruction window through compression,” in Transactions on high-performance embedded architectures and compilers III
    pp. 115–134 [2011]
  • “Cprob: Checkpoint processing with opportunistic minimal recovery,” in Parallel Architectures and Compilation Techniques
    pp. 159–168 [2009]
  • “Compiler directed early register release,” in 14th International Conference on Parallel Architectures and Compilation Techniques (PACT’05)
    pp. 110–119 [2005]
  • “Cherry: Checkpointed early resource recycling in out-of-order microprocessors,”
    pp. 3–14 [2002]
  • “Checkpoint processing and recovery: Towards scalable large instruction window processors,” in Proceedings of the 36th Annual ACM/IEEE International Symposium on Microarchitecture
    pp. 423–434 [2003]
  • “CASINO core microarchitecture: Generating out-of-order schedules using cascaded in-order scheduling windows,”
    [2020]
  • “Beating in-order stalls with flea-flicker two-pass pipelining,”
    36 [2003]
  • “Automatically characterizing large scale program behavior,” in Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems
    pp. 45–57 [2002]
  • “Assigning confidence to conditional branch predictions”
    pp. 142–152 [1996]
  • “Address-value decoupling for early register deallocation,” in Parallel Processing
    pp. 337–346 [2006]
  • “A complexity-effective out-of-order retirement microarchitecture”
    vol. 58, no. 12, pp. 1626–1639 [2009]
  • “A comparative performance evaluation of various state maintenance mechanisms”
    pp. 70–79 [1993]
  • “A large, fast instruction window for tolerating cache misses,”
    pp. 59–70 [2002]
  • “A high-speed dynamic instruction scheduling scheme for superscalar processors”
    pp. 225–236 [2001]
  • “A group-commit mechanism for rob-based processors implementing the x86 isa,” in 2013 IEEE 19th International Symposium on High-Performance Computer Architecture (HPCA)
    pp. 47–58 [2013]
  • “A front-end execution architecture for high energy efficiency”
    pp. 419–431 [2014]
  • “40-entry unified out-of-order scheduler and integer execution unit for the amd bulldozer x86–
    pp. 80–82 [2011]
  • “Multi2Sim: A Simulation Framework for CPU-GPU Computing”
    [2012]
  • criticality-aware resource allocation in ooo processors,” in Proceedings of the 48th Annual ACM/IEEE International Symposium on Microarchitecture
    pp. 334–346 [2015]
  • [9] D. Folegnani and A. Gonzalez, “Energy-effective issue logic,” in Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001, pp. 230–239.
  • [99] W. W. Hwu and Y. N. Patt, “Checkpoint repair for out-of-order execution machines,” in Proceedings of the 14th Annual International Symposium on Computer Architecture, 1987, pp. 18–26.
  • [98] A. Cristal, D. Ortega, J. Llosa, and M. Valero, “Out-of-order commit processors,” in 2004 IEEE 10th International Symposium on High-Performance Computer Architecture (HPCA), 2004, pp. 48–59.
  • [94] J. E. Smith and A. R. Pleszkun, “Implementation of precise interrupts in pipelined processors,” in Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 36–44.
  • [92] H. W. Cain and M. H. Lipasti, “Memory ordering: A value-based approach,” in Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004, pp. 90–.
  • [8] C. Isci and M. Martonosi, “Runtime power monitoring in high-end processors: Methodology and empirical data,” in Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003, p. 93.
  • [86] G. Z. Chrysos and J. S. Emer, “Memory dependence prediction using store sets,” in Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, pp. 142–153.
  • [85] A. Moshovos and G. S. Sohi, “Streamlining inter-operation memory communication via data dependence prediction,” in Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture, 1997, pp. 235–245.
  • [7] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, and A. R. LeBlanc, “Design of ion-implanted mosfet’s with very small physical dimensions,” IEEE Journal of Solid-State Circuits, vol. 9, no. 5, pp. 256–268, 1974.
  • [79] W. W. Ro and J.-L. Gaudiot, “Spear: A hybrid model for speculative preexecution,” in 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. IEEE, 2004, p. 75.
  • [76] J. Dundas and T. Mudge, “Improving data cache performance by pre-executing instructions under a cache miss,” in Proceedings of the 11th international conference on Supercomputing. ACM, 1997, pp. 68–75.
  • [75] C. Ozturk and R. Sendag, “An analysis of hard to predict branches,” in Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on. IEEE, 2010, pp. 213–222.
  • [74] J. Casazza, “First the tick, now the tock: Intel microarchitecture (nehalem),” Intel Corporation, 2009.
  • [73] J. P. Shen and M. H. Lipasti, Modern processor design: fundamentals of superscalar processors. Waveland Press, 2013.
  • [6] S. Borkar and A. A. Chien, “The future of microprocessors,” Communications of the ACM, vol. 54, no. 5, pp. 67–77, 2011.
  • [62] S. J. Patel and S. S. Lumetta, “replay: A hardware framework for dynamic optimization,” IEEE transactions on computers, vol. 50, no. 6, pp. 590–608, 2001.
  • [5] J. Bolaria, “Cortex-a57 extends ARM’s reach,” Microprocessor Report, vol. 11, no. 5, pp. 12–1, 2012.
  • [57] P. Greenhalgh, “Big. little processing with arm cortex-a15 & cortex-a7,” ARM White paper, vol. 17, 2011.
  • [56] K. Krewell, “Cortex-a53 is ARM’s next little thing,” Microprocessor Report, vol. 11, no. 5, pp. 12–2, 2012.
  • [55] A. Cortex, “A9 processor,” https://www.arm.com/products/processors/cortexa/cortex-a9.php, 2011.
  • [53] P. Shivakumar and N. P. Jouppi, “Cacti 3.0: An integrated cache timing, power, and area model,” Technical Report 2001/2, Compaq Computer Corporation, Tech. Rep., 2001.
  • [4] D. Schor, “Intel reveals 10nm sunny cove core, a new core roadmap, and teases ice lake chips.” https://bit.ly/2NLEbTg, 2018.
  • [49] C. D. Spradling, “SPEC CPU2006 Benchmark Tools,” SIGARCH Computer Architecture News, vol. 35, March 2007.
  • [48] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (cam) circuits and architectures: A tutorial and survey,” IEEE journal of solid-state circuits, vol. 41, no. 3, pp. 712–727, 2006.
  • [42] E. Safi, A. Moshovos, and A. Veneris, “Two-stage, pipelined register renaming,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 10, pp. 1926–1931, 2010.
  • [39] P. Salverda and C. Zilles, “Dependence-based scheduling revisited: A tale of two baselines,” in 6th Annual Workshop on Duplicating, Deconstructing, and Debunking, 2007.
  • [37] E. Talpes and D. Marculescu, “Execution cache-based microarchitecture for power-efficient superscalar processors,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 1, pp. 14–26, 2005.
  • [33] D. S. McFarlin, C. Tucker, and C. Zilles, “Discerning the dominant out-of-order performance advantage: Is it speculation or dynamism?” in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’13. New York, NY, USA: ACM, 2013, pp. 241–252. [Online]. Available: http://doi.acm.org/10.1145/2451116.2451143
  • [32] F. M. Sleiman and T. F. Wenisch, “Efficiently scaling out-of-order cores for simultaneous multithreading,” in Proceedings of the 43rd Annual International Symposium on Computer Architecture, 2016, pp. 431–443.
  • [26] A. Buyuktosunoglu, A. El-Moursy, and D. H. Albonesi, “An oldest-first selection logic implementation for non-compacting issue queues,” in 15th International ASIC/SOC Conference. Citeseer, 2002, pp. 31–35.
  • [24] P. G. Sassone, J. Rupley, II, E. Brekelbaum, G. H. Loh, and B. Black, “Matrix scheduler reloaded,” in Proceedings of the 34th Annual International Symposium on Computer Architecture, 2007, pp. 335–346.
  • [23] S. Palacharla, N. P. Jouppi, and J. E. Smith, “Complexity-effective superscalar processors,” in Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997, pp. 206–218.
  • [19] J. Stark, M. D. Brown, and Y. N. Patt, “On pipelining dynamic instruction scheduling logic,” in Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000, pp. 57–66.
  • [18] S. Palacharla, N. P. Jouppi, and J. E. Smith, Quantifying the complexity of superscalar processors. University of Wisconsin-Madison, Computer Sciences Department, 1996.
  • [17] J. L. Hennessy and D. A. Patterson, Computer architecture: a quantitative approach. Elsevier, 2011.
  • [13] A. Ros and S. Kaxiras, “The superfluous load queue,” in Proceedings of the 51st Annual ACM/IEEE International Symposium on Microarchitecture, 2018, pp. 95–107.
  • [11] N. P. Jouppi and J. Smith, “Complexity-effective superscalar processors,” in Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997.
  • [10] N. H. Weste and D. Harris, CMOS VLSI design: a circuits and systems perspective. Pearson Education India, 2015.
  • [109] E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. Smith, “Trace processors,” in Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture, 1997, pp. 138–148.
  • [107] E. Sprangle and Y. Patt, “Facilitating superscalar processing via a combined static/dynamic register renaming scheme,” in Proceedings of the 27th Annual IEEE/ACM International Symposium on Microarchitecture, 1994, pp. 143–147.
  • [105] W. W. Hwu and Y. N. Patt, “Checkpoint repair for high-performance out-of-order execution machines,” IEEE Transactions on Computers, vol. 100, no. 12, pp. 1496–1514, 1987.
  • [103] A. Cristal, O. J. Santana, M. Valero, and J. F. Martinez, “Toward kilo-instruction processors,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 1, no. 4, pp. 389–417, 2004.
  • [100] K. Jothi and H. Akkary, “Tuning the continual flow pipeline architecture,” in Proceedings of the 27th international ACM conference on International conference on supercomputing. ACM, 2013, pp. 243–252.
  • D. M. Tullsen, and N. P. Jouppi, “Mcpat: an integrated power, area, and timing modeling framework for multicore and manycore architectures,” in Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
    pp. 469–480 [2009]