Hardware Software Co Design of Deep Learning Accelerated Digital Signal Processing Cores for Low Latency Multimedia Applications

Authors

  • Taufiq Dwi Cahyono Universitas Semarang
  • Abdul Muchlis Universitas Gunadarma
  • Sandy Suryady Universitas Gunadarma

DOI:

https://doi.org/10.66472/casp.v1i1.36

Keywords:

Hardware Software Co Design, Deep Learning, Multimedia Applications, DSP Systems, Latency Reduction

Abstract

Growing demand for low-latency, high-throughput multimedia applications has driven significant advances in hardware-software co-design. This study integrates custom digital signal processing (DSP) hardware accelerators with an optimized software framework to speed up deep-learning-accelerated DSP tasks. The proposed co-design approach reduces latency and improves throughput compared with traditional software-only DSP implementations. With custom accelerators implemented on FPGAs, the system achieves up to a 1.85x reduction in latency and a 1.5x improvement in throughput on real-time multimedia tasks such as image recognition, video decoding, and audio processing. Combining hardware and software optimizations improves resource utilization: the hardware processes computationally intensive tasks in parallel while the software framework handles less demanding operations. The co-designed system also demonstrated improved energy efficiency, which makes it well suited to embedded systems. These results position hardware-software co-design as a viable solution for real-time multimedia applications, with implications for domains that require fast data processing, such as autonomous driving, healthcare, and disaster management. Future research could explore alternative hardware accelerators, advanced software optimizations, and AI-based resource management to further improve the system’s efficiency and scalability for more complex multimedia tasks.
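To make the headline factors concrete, the sketch below converts an "Nx" factor into absolute figures. It assumes that a 1.85x latency reduction means the baseline latency divided by 1.85 (roughly 45.9% lower) and that a 1.5x throughput improvement means the baseline rate multiplied by 1.5; the abstract does not spell out either definition. The function names and the baseline values (37 ms, 30 frames per second) are hypothetical and are not measurements from the study.

```python
def percent_reduction(factor: float) -> float:
    """Percentage decrease implied by an 'Nx reduction' factor (assumed: new = old / factor)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


def scaled_latency(baseline_ms: float, factor: float) -> float:
    """Latency after an 'Nx reduction', under the same assumption."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_ms / factor


def scaled_throughput(baseline_rate: float, factor: float) -> float:
    """Throughput after an 'Nx improvement' (assumed: new = old * factor)."""
    return baseline_rate * factor


if __name__ == "__main__":
    # Hypothetical software-only baseline, chosen only for illustration.
    baseline_latency_ms = 37.0
    baseline_fps = 30.0

    print(f"latency cut: {percent_reduction(1.85):.1f}%")
    print(f"latency:     {scaled_latency(baseline_latency_ms, 1.85):.1f} ms")
    print(f"throughput:  {scaled_throughput(baseline_fps, 1.5):.1f} frames/s")
```

Because both figures are reported as "up to", they are best read as upper bounds across the evaluated tasks rather than typical values.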

Published

2026-01-20