Doctoral dissertation

High level synthesis of OpenCL kernels for FPGAs

조강원, 2020
Thesis details
Paper influence by topic for 'High level synthesis of OpenCL kernels for FPGAs'
Paper influence summary
Topics
  • Applied physics
  • Datapath
  • High Level Synthesis
  • Memory Access Pattern
  • Work-Item Pipelining
  • FPGA
  • OpenCL
Total papers on the same topics: 4,949
Total citations received: 0
Average paper influence by topic: 0.0%

References of 'High level synthesis of OpenCL kernels for FPGAs'

  • Exploiting memory access patterns to improve memory performance in data-parallel architectures. 22(1):105–118, 2011.
  • [8] M. Bauer, H. Cook, and B. Khailany. CudaDMA. http://lightsighter.github.io/CudaDMA/.
  • [80] Xilinx. Virtex UltraScale+. https://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus.html.
  • [78] M. Weinhardt and W. Luk. Pipeline vectorization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(2):234–248, 2001.
  • [77] P. Tu and D. Padua. Gated SSA-based demand-driven symbolic analysis for parallelizing compilers. In Proceedings of the 9th International Conference on Supercomputing, pages 414–423, 1995.
  • [6] AMD. AMD APP SDK OpenCL optimization guide. http://amddev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf, 2015.
  • [69] K. Shagrithaya, K. Kepa, and P. Athanas. Enabling development of OpenCL applications on FPGA platforms. In Proceedings of the 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, pages 26–30, 2013.
  • [68] Seoul National University. SnuCL suite: OpenCL frameworks and tools for heterogeneous clusters. http://snucl.snu.ac.kr.
  • [65] A. Putnam, A. M. Caulfield, E. S. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. P. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Y. Xiao, and D. Burger. A reconfigurable fabric for accelerating large-scale datacenter services. In Proceedings of the 41st Annual International Symposium on Computer Architecture, pages 13–24, 2014.
  • [60] K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung. Accelerating deep convolutional neural networks using specialized hardware. Technical report, Microsoft Research, 2015. https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specialized-hardware/.
  • [5] Amazon. Amazon EC2 F1 instances. https://aws.amazon.com/ec2/instance-types/f1/.
  • [58] NVIDIA. CUDA C best practices guide. http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/, 2015.
  • [56] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997.
  • [51] Khronos OpenCL Working Group. The OpenCL specification, version 2.1. https://www.khronos.org/registry/OpenCL/specs/opencl-2.1.pdf, 2015.
  • [50] Khronos OpenCL Working Group. The OpenCL specification, version 1.2. https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf, 2012.
  • [4] OpenACC directives for accelerators. https://www.openacc.org/.
  • [49] Khronos Group. SPIR generator/Clang. https://github.com/KhronosGroup/SPIR.
  • [45] H. M. Jacobson, P. N. Kudva, P. Bose, P. W. Cook, S. E. Schuster, E. G. Mercer, and C. J. Myers. Synchronous interlocked pipelines. In Proceedings of the Eighth International Symposium on Asynchronous Circuits and Systems, pages 3–12, 2002.
  • [44] Intel. Avalon interface specifications. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf, 2019.
  • [43] Intel. Open programmable acceleration engine documentation. https://opae.github.io/.
  • [41] Intel. Intel Quartus Prime. https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/overview.html.
  • [3] MIOpen. https://github.com/ROCmSoftwarePlatform/MIOpen.
  • [39] M. R. Haghighat and C. D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems, 18:477–518, 1996.
  • [35] T. Grosser, A. Groesslinger, and C. Lengauer. Polly – performing polyhedral optimizations on a low-level intermediate representation. Parallel Processing Letters, 22(4), 2012.
  • [34] S. Grauer-Gray and L.-N. Pouchet. PolyBench/GPU. http://web.cse.ohio-state.edu/~pouchet.2/software/polybench/GPU/.
  • [2] clTorch. https://github.com/hughperkins/cltorch.
  • [22] Standard Performance Evaluation Corporation. SPEC ACCEL. https://www.spec.org/accel/.
  • [1] clFFT. https://clmathlibraries.github.io/clFFT/.
  • [19] S. Che, J. W. Sheaffer, and K. Skadron. Dymaxion: Optimizing memory access patterns for heterogeneous systems. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011.
  • [16] J. M. P. Cardoso, P. C. Diniz, and M. Weinhardt. Compiling for reconfigurable computing: A survey. ACM Computing Surveys, 42(4):13:1–13:65, 2010.
  • [14] T. J. Callahan and J. Wawrzynek. Adapting software pipelining for reconfigurable computing. In Proceedings of the 2000 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 57–64, 2000.
  • [13] T. J. Callahan and J. Wawrzynek. Instruction-level parallelism for reconfigurable computing. In Proceedings of the 8th International Workshop on Field-Programmable Logic and Applications, From FPGAs to Computing Paradigm, pages 248–257, 1998.
  • [12] T. J. Callahan, J. R. Hauser, and J. Wawrzynek. The Garp architecture and C compiler. IEEE Computer, 33(4):62–69, 2000.
  • [11] M. Budiu, G. Venkataramani, T. Chelcea, and S. C. Goldstein. Spatial computation. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 14–26, 2004.
  • Vivado design suite
  • Very long instruction word architectures and the ELI-512, pages 140–150, 1983.
  • Trace scheduling: A technique for global microcode compaction. C-30(7):478–490, 1981.
  • Theano: A Python framework for fast computation of mathematical expressions.
  • The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages, pages 257–271, 1990.
  • The OpenMP API specification for parallel programming
  • Synthesis of synchronous elastic architectures, pages 657–662, 2006.
  • Synthesis of platform architectures from OpenCL programs, pages 186–193, 2011.
  • Structural analysis: A new approach to flow analysis in optimizing compilers. 5(3–4):141–153, 1980.
  • SPARK: A high-level synthesis framework for applying parallelizing compiler transformations, pages 461–466, 2003.
  • SDAccel development environment
  • Rodinia: A benchmark suite for heterogeneous computing, pages 44–54, 2009.
  • Points-to analysis in almost linear time, pages 32–41, 1996.
  • Platform-based behavior-level and system-level synthesis, pages 199–202, 2006.
  • Performance and power of cache-based reconfigurable computing, pages 395–405, 2009.
  • Parboil: A revised benchmark suite for scientific and commercial throughput computing, 2012.
  • PIPSEA: A practical IPsec gateway on embedded APUs, pages 1255–1267, 2016.
  • Optimized generation of data-path from C codes for FPGAs, pages 112–117, 2005.
  • Optimization and architecture effects on GPU computing workload performance, 2012.
  • OpenRCL: Low-power high-performance computing with reconfigurable devices, pages 458–463, 2010.
  • OpenCL overview - the open standard for parallel programming of heterogeneous systems
  • OpenCL for FPGAs: Prototyping a compiler, 2012.
  • Memory access patterns: The missing piece of the multi-GPU puzzle, 2015.
  • MASA-OpenCL: Parallel pruned comparison of long DNA sequences with OpenCL. Concurrency and Computation: Practice and Experience, 31(11):e5039, 2019.
  • LegUp: High-level synthesis for FPGA-based processor/accelerator systems, pages 33–36, 2011.
  • J. D. Poznanovic, and M. B. Gokhale. Trident: An FPGA compiler framework for floating-point algorithms, pages 317–322, 2005.
  • Intel Stratix 10 FPGAs overview
  • Intel FPGA SDK for OpenCL
  • Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, pages 364–373, 1990.
  • Impact of cache architecture and interface on performance and area of FPGA-based processor/parallel-accelerator systems, pages 17–24, 2012.
  • Impact of FPGA architecture on resource sharing in high-level synthesis, pages 111–114, 2012.
  • High-level Synthesis: Introduction to Chip and System Design, 1992.
  • GRAPHITE: Polyhedral analyses and optimizations for GCC, 2006.
  • From OpenCL to high-performance hardware on FPGAs, pages 531–534, 2012.
  • FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs, pages 35–42, 2009.
  • Efficiently computing static single assignment form and the control dependence graph. 13(4):451–490, 1991.
  • CudaDMA: Optimizing GPU memory bandwidth via warp specialization, 2011.
  • Creating HW/SW codesigned MPSoPC's from high level programming models, pages 554–560, 2011.
  • CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. 2, 2009.
  • Automatic OpenCL work-group size selection for multicore CPUs, pages 387–397, 2013.
  • An introduction to high-level synthesis. 26(4):8–17, 2009.
  • Achieving a single compute device image in OpenCL for multiple GPUs, pages 277–288, 2011.
  • APUNet: Revitalizing GPU as packet processing accelerator, pages 83–96, 2017.
  • A scalable high-bandwidth architecture for lossless compression on FPGAs, pages 52–59, 2015.
  • A practical automatic polyhedral parallelizer and locality optimizer, pages 101–113, 2008.
  • [57] K. G. Murty, 1983.
  • 3D finite difference computation on GPUs using CUDA, pages 79–84, 2009.