Energy-efficient instruction scheduling mechanisms for out-of-order superscalar processors

정이품, 2020

Thesis details

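- Author: 정이품
- Alternative title: 고성능 비순차 실행 프로세서를 위한 에너지 효율적인 명령어 스케줄링 기법 (Energy-efficient instruction scheduling techniques for high-performance out-of-order processors)
- Physical description: xii, 148 leaves; 26 cm; illustrations
- General note: Advisor: Won Woo Ro
- Dissertation note: Doctoral dissertation, February 2020, Department of Electrical and Electronic Engineering, Graduate School, Yonsei University
- Place of publication: [Seoul]
- Language: eng
- Publication year: 2020
- Publisher: Graduate School, Yonsei University
- Keywords: Coarse-Grained Instruction Commit; Dynamic Instruction Scheduling; Instruction Window; Memory Disambiguation; Performance; Precise Exception; Register Renaming; Energy Efficiency; Speculation; Korean equivalents: 동적 명령어 스케줄링; 레지스터 리네이밍; 메모리 명확성; 명령어 윈도; 성능; 스페큘레이션; 에너지 효율성; 정확한 예외 처리; 코스 그레인드 커밋
- References (112)
- Similar-topic papers (1,605)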

Paper influence by topic for 'Energy-efficient instruction scheduling mechanisms for out-of-order superscalar processors'

Paper influence summary
Topics: Coarse-Grained Instruction Commit; Dynamic Instruction Scheduling; Instruction Window; Memory Disambiguation; Performance; Precise Exception; Register Renaming; energy efficiency; speculation; Korean equivalents: 동적 명령어 스케줄링; 레지스터 리네이밍; 메모리 명확성; 명령어 윈도; 성능; 스페큘레이션; 에너지 효율성; 정확한 예외 처리; 코스 그레인드 커밋
Total papers on the same topics	Total citations received	Average influence per topic
1,623	0	0.0%

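Paper influence
Topic		Papers per topic	Influence per topic
Keyword	Coarse-Grained Instruction ...	1	0.0%
	Dynamic Instruction Schedul ...	1	0.0%
	Instruction Window	1	0.0%
	Memory Disambiguation	1	0.0%
	Performance	1,279	0.0%
	Precise Exception	1	0.0%
	Register Renaming	1	0.0%
	energy efficiency	175	0.0%
	speculation	26	0.0%
	동적 명령어 스케줄링 (dynamic instruction scheduling)	1	0.0%
	레지스터 리네이밍 (register renaming)	1	0.0%
	메모리 명확성 (memory disambiguation)	1	0.0%
	명령어 윈도 (instruction window)	1	0.0%
	성능 (performance)	86	0.0%
	스페큘레이션 (speculation)	1	0.0%
	에너지 효율성 (energy efficiency)	44	0.0%
	정확한 예외 처리 (precise exception handling)	1	0.0%
	코스 그레인드 커밋 (coarse-grained commit)	1	0.0%
Total		1,623	0.0%
* Number of times cited by papers that carry other keywords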
