Hardware Software Co Design of Deep Learning Accelerated Digital Signal Processing Cores for Low Latency Multimedia Applications

Authors

  • Taufiq Dwi Cahyono, Universitas Semarang
  • Abdul Muchlis, Universitas Gunadarma
  • Sandy Suryady, Universitas Gunadarma

Keywords:

Hardware-Software Co-Design, Deep Learning, Multimedia Applications, DSP Systems, Latency Reduction

Abstract

The increasing demand for low-latency, high-throughput multimedia applications has spurred significant advances in hardware-software co-design. This study explores the integration of custom digital signal processing (DSP) hardware accelerators with optimized software frameworks to speed up deep-learning-accelerated DSP tasks. The proposed co-design approach reduces latency and improves throughput compared with traditional software-only DSP implementations. With custom hardware accelerators built on FPGA technology, the system achieves up to a 1.85x reduction in latency and a 1.5x improvement in throughput for real-time multimedia tasks such as image recognition, video decoding, and audio processing. Combining hardware and software optimizations improves resource utilization: the accelerators process computationally intensive tasks in parallel while the software framework handles less demanding operations. The co-design system also demonstrated improved energy efficiency, making it well suited to embedded systems. These results position hardware-software co-design as a viable solution for real-time multimedia applications, with implications for domains that require fast data processing, such as autonomous driving, healthcare, and disaster management. Future research could explore alternative hardware accelerators, advanced software optimizations, and AI-based resource management to further improve the system’s efficiency and scalability for more complex multimedia tasks.

Published

2026-01-20