A Flexible and General-Purpose Platform for Heterogeneous Computing

Garcia-Hernandez, Jose Juan and Morales-Sandoval, Miguel and Elizondo-Rodríguez, Erick (2023) A Flexible and General-Purpose Platform for Heterogeneous Computing. Computation, 11 (5). p. 97. ISSN 2079-3197

computation-11-00097.pdf - Accepted Version

Abstract

In the big data era, processing large amounts of data imposes several challenges, mainly in terms of performance. Complex operations in data science, such as deep learning, large-scale simulations, and visualization applications, can consume a significant amount of computing time. Heterogeneous computing is an attractive alternative for algorithm acceleration: it uses not one but several different kinds of computing devices (CPUs, GPUs, or FPGAs) simultaneously. Accelerating an algorithm for a specific device under a specific framework, e.g., CUDA on GPUs, yields the highest possible performance at the cost of a loss in generality and requires an experienced programmer. In contrast, heterogeneous computing hides the details of the simultaneous use of different technologies in order to accelerate computation. However, an effective heterogeneous computing implementation still requires mastering the underlying design flow. Aiming to fill this gap, in this paper we present a heterogeneous computing platform (HCP). This platform allows non-experts in heterogeneous computing to deploy, run, and evaluate computationally demanding algorithms following a semi-automatic design flow. Given an implementation of an algorithm in C with minimal format requirements, the platform automatically generates the parallel code using a code analyzer and adapts it to the set of available computing devices. Thus, no experienced heterogeneous computing programmer is required, and the process can run over whatever computing devices are available on the platform, as it is not an ad hoc solution for a specific computing device. The proposed HCP relies on the OpenCL specification for interoperability and generality. The platform was validated and evaluated in terms of generality and efficiency through a set of experiments using the algorithms of the Polybench/C suite (version 3.2) as the input.
Different platform configurations were used: CPUs only, GPUs only, and a combination of both. The results revealed that the proposed HCP achieved accelerations of up to 270× for specific classes of algorithms, i.e., parallel-friendly algorithms, while requiring almost no expertise in either OpenCL or heterogeneous computing from the programmer/end-user.

Published in Computation, 2023, 11(5), article 97. DOI: 10.3390/computation11050097. Funding: PRODEP. License: https://creativecommons.org/licenses/by/4.0/. Available online: https://www.mdpi.com/2079-3197/11/5/97 (PDF: https://www.mdpi.com/2079-3197/11/5/97/pdf).

Item Type: Article
Subjects: Academic Digital Library > Computer Science
Depositing User: Unnamed user with email info@academicdigitallibrary.org
Date Deposited: 30 May 2023 11:32
Last Modified: 27 Dec 2023 07:27
URI: http://publications.article4sub.com/id/eprint/1644
