Memory Hierarchy Optimization and Cache-Aware Signal Processing Pipelines for Next-Generation High-Throughput Computing Architectures

Authors

  • Hari Imbrani, Universitas Al Ghifari
  • Achmad Subagdja, STIE Gema Widya Bangsa

Keywords

Signal Processing, Memory Hierarchies, Throughput Improvement, Pipeline Design, Cache-Aware Optimizations

Abstract

This research explores the impact of cache-aware optimizations on signal processing pipelines in high-throughput computing systems. The growing demand for efficient memory management in modern computing systems, especially for data-intensive applications such as artificial intelligence (AI) and multimedia processing, necessitates optimized memory hierarchies. Traditional memory systems often suffer from memory bottlenecks that significantly reduce performance. This study investigates how memory hierarchy optimizations, particularly cache-line-aware optimization, dependency-aware caching, and adaptive cache replacement algorithms, can mitigate these challenges and improve system performance. Through analytical modeling and experimental benchmarking, the work evaluates various memory hierarchy configurations, including processing-in-memory (PIM) and three-dimensional integrated circuits (3D ICs), and compares them to conventional systems. The results show that cache-aware optimizations reduce memory access latency by up to 30% and improve throughput by up to 40%; cache hit rates increase by 25%, and energy consumption falls by up to 20%, highlighting the effectiveness of optimized memory management. The research contributes valuable insights into the design and implementation of efficient signal processing pipelines. It also identifies key challenges, including the need for dynamic occupancy mechanisms and DAG-aware scheduling algorithms, and suggests areas for future research, such as collaborative caching approaches and further optimization of cache-adaptive algorithms. This work lays the foundation for more efficient, high-performance computing systems capable of handling large datasets and complex tasks in real-time applications.


Published

2026-01-20