A Deep Learning Based Approach to Real Time Video Content Analysis and Visualization for Intelligent Human Computer Interaction in Multimedia Systems

Arsito Ari Kuncoro; Siswanto Siswanto; Siti Kholifah; Ratma Dewi

doi:10.66472/dimuvi.v1i1.44

Authors

Arsito Ari Kuncoro Universitas Sains dan Teknologi Komputer
Siswanto Siswanto Universitas Sains dan Teknologi Komputer
Siti Kholifah Universitas Sains dan Teknologi Komputer
Ratma Dewi Universitas Gajah Putih Acah

DOI:

https://doi.org/10.66472/dimuvi.v1i1.44

Keywords:

Deep learning, Video analysis, Convolutional networks, Human-computer interaction, Real-time processing

Abstract

This study explores the integration of deep learning based approaches in real time video content analysis for intelligent human computer interaction (HCI) in multimedia systems. Traditional video analysis techniques, such as rule-based methods and offline processing, struggle with real time performance and adaptability to complex video data. In contrast, the deep learning model used in this research, particularly Convolutional Neural Networks (CNNs), provides high accuracy in object detection, feature extraction, and real time processing. The integration of CNNs with interactive visualization modules enables dynamic adjustments to video content based on user interactions, ensuring a seamless and engaging user experience. The system was benchmarked in terms of its processing speed, accuracy, and responsiveness, showing significant improvements over traditional approaches in real time video analysis. Moreover, the study demonstrates that combining deep learning with real time visualization enhances the efficiency of interactive multimedia applications, making it suitable for dynamic environments such as surveillance, security monitoring, and interactive media. Despite the system's strong performance, challenges such as computational demands in high-resolution video processing were identified, highlighting the need for further optimization. Future work will focus on optimizing the system for different hardware platforms, incorporating multimodal inputs, and refining deep learning models to address computational bottlenecks. This research contributes to advancing HCI by providing insights into the integration of deep learning for real time video content analysis, which is pivotal for enhancing the interactivity and adaptability of intelligent multimedia systems.

References

[1] Z. Yang, X. He, J. Wu, X. Wang, and Y. Zhao, “Edge computing technologies for streaming video analytics; [面向实时视频流分析的边缘计算技术],” Sci. Sin. Informationis, vol. 52, no. 1, pp. 1 – 53, 2022, doi: 10.1360/SSI-2021-0133.

[2] K. Randive and M. Sridevi, “Fast feature extraction on graphic processing unit for a video sequence,” Adv. Intell. Syst. Comput., vol. 709, pp. 481 – 488, 2018, doi: 10.1007/978-981-10-8633-5_47.

[3] S.-C. Chen, “Multimedia Data Analysis with Edge Computing,” IEEE Multimed., vol. 28, no. 4, pp. 5 – 7, 2021, doi: 10.1109/MMUL.2021.3124292.

[4] N. Venkatesvara Rao, D. Venkatavara Prasad, and M. Sugumaran, “Real-time video object detection and classification using hybrid texture feature extraction,” Int. J. Comput. Appl., vol. 43, no. 2, pp. 119–126, 2021, doi: 10.1080/1206212X.2018.1525929.

[5] G. Sonugür and B. Gökçe, “A NEW GRADIENT-BASED FEATURE EXTRACTION METHOD FOR REAL-TIME DETECTION OF MOVING OBJECTS USING STEREO CAMERAS,” Comptes Rendus L’Academie Bulg. des Sci., vol. 75, no. 3, pp. 414 – 421, 2022, doi: 10.7546/CRABS.2022.03.11.

[6] S. Sang, Z. Huang, and Z. Kang, “A human activity recognition method using the maximum optical flow based feature bounding box,” in ACM International Conference Proceeding Series, 2018, pp. 214 – 219. doi: 10.1145/3195106.3195141.

[7] S. Yang and X. Chong, “Study on feature extraction technology of real-time video acquisition based on deep CNN,” Multimed. Tools Appl., vol. 80, no. 25, pp. 33937 – 33950, 2021, doi: 10.1007/s11042-021-11417-7.

[8] P. Jain, V. K. Gupta, H. Tiwari, A. Shukla, P. Pandey, and A. Gupta, “Human-Computer Interaction: A Systematic Review,” in Proceedings - 2023 International Conference on Advanced Computing and Communication Technologies, ICACCTech 2023, 2023, pp. 31 – 36. doi: 10.1109/ICACCTech61146.2023.00015.

[9] R. Pushpakumar et al., “Human-Computer Interaction: Enhancing User Experience in Interactive Systems,” in E3S Web of Conferences, 2023. doi: 10.1051/e3sconf/202339904037.

[10] J. Song, “Application of Deep Learning in Visual Communication Content Optimization and User Perception Analysis,” Adv. Transdiscipl. Eng., vol. 74, pp. 555 – 564, 2025, doi: 10.3233/ATDE250640.

[11] S. V Sheela, P. Abhinand, and K. R. Radhika, Practical case studies on human-computer interaction. 2023. doi: 10.1016/B978-0-323-99891-8.00007-3.

[12] U. A. Bhatti, J. Li, M. Huang, S. U. Bazai, and M. Aamir, Deep Learning for Multimedia Processing Applications: Volume Two: Signal Processing and Pattern Recognition. 2024. doi: 10.1201/9781032646268.

[13] S. Sudharsan, M. Manoj, V. Jeevan Raj, and T. Sivasakthi, “AI-Enabled Video Frame Segmentation for Specific Person Identification,” in 2025 International Conference on Computing and Communication Technologies, ICCCT 2025, 2025. doi: 10.1109/ICCCT63501.2025.11020271.

[14] H. Mohammedqasim, R. Mohammedqasem, B. A. Ozturk, H. R. Hamedy, and A. bin Asghar, “Human-Centric Video Analysis in Industrial Environments,” Lect. Notes Networks Syst., vol. 1292 LNNS, pp. 319 – 332, 2025, doi: 10.1007/978-981-96-3250-3_26.

[15] X. Zhang, W. Wu, J. Guo, Y. Sun, Y. Li, and M. Li, “Collaborative Hand-eye Virtual Interaction Visualization Method and Technologies,” in 2022 28th International Conference on Mechatronics and Machine Vision in Practice, M2VIP 2022, 2022. doi: 10.1109/M2VIP55626.2022.10041073.

[16] L. Qiao, X. Zhang, and S. He, “Visual Defect Detection and Analysis of Digital Robot Based on Virtual Artificial Intelligence Algorithm,” in Procedia Computer Science, 2024, pp. 601 – 609. doi: 10.1016/j.procs.2024.09.073.

[17] Y. Li, W. Ren, T. Zhu, Y. Ren, Y. Qin, and W. Jie, “RIMS: A Real-time and Intelligent Monitoring System for live-broadcasting platforms,” Futur. Gener. Comput. Syst., vol. 87, pp. 259 – 266, 2018, doi: 10.1016/j.future.2018.04.012.

[18] G. Suchetha, N. Bhaskar, A. Chirag, A. S. Pereira, D. Kishore, and V. Joshi, “An Automated Approach for the Detection of Synthetic and Deepfake Media Using Deep Learning,” Lect. Notes Electr. Eng., vol. 1420 LNEE, pp. 543 – 554, 2025, doi: 10.1007/978-981-96-6406-1_42.

[19] H. Singh, R. Kumar, M. Gupta, and V. S. Babu Chilluri, “Detecting Digital Deception: A CNN-RNN hybrid Approach of Deepfake Detection,” in 2025 International Conference on Pervasive Computational Technologies, ICPCT 2025, 2025, pp. 667 – 672. doi: 10.1109/ICPCT64145.2025.10940830.

[20] X. Du et al., “Classifying cutting volume at shale shakers in real-time via video streaming using deep-learning techniques,” SPE Drill. Complet., vol. 35, no. 3, pp. 317 – 328, 2020, doi: 10.2118/194084-PA.

[21] X. Liu et al., “Mariclip: Real-Time Maritime Video Segment Extraction Via Edge-Optimized Detection and Tracking,” in 2025 IEEE 14th International Conference on Communications, Circuits, and Systems, ICCCAS 2025, 2025, pp. 438 – 442. doi: 10.1109/ICCCAS65806.2025.11102155.

[22] E. H. C. Isles and F. F. Balahadia, “BeAwareOfYourAct: A Framework for Behavioural Action Detection in Workplace through Deep Learning Analysis and Augmented Action Pattern Recognition,” in Proceedings - 2022 2nd International Conference in Information and Computing Research, iCORE 2022, 2022, pp. 89 – 93. doi: 10.1109/iCORE58172.2022.00036.

[23] C. Ceccarini, “HCI methodologies and data visualization to foster user awareness,” in CEUR Workshop Proceedings, 2021, pp. 28 – 35.

[24] H.-T. Lee et al., “A review of hybrid EEG-based multimodal human–computer interfaces using deep learning: applications, advances, and challenges,” Biomed. Eng. Lett., vol. 15, no. 4, pp. 587 – 618, 2025, doi: 10.1007/s13534-025-00469-5.

[25] C. Troussas, A. Krouska, and C. Sgouropoulou, “Human-Computer Interaction and Augmented Intelligence: The Paradigm of Interactive Machine Learning in Educational Software,” Cogn. Syst. Monogr., vol. 34, pp. 1 – 431, 2025, doi: 10.1007/978-3-031-84453-9.

[26] Z. Lv, F. Poiesi, Q. Dong, J. Lloret, and H. Song, “Deep Learning for Intelligent Human–Computer Interaction,” Appl. Sci., vol. 12, no. 22, 2022, doi: 10.3390/app122211457.

[27] P. Gong, C. Wang, and L. Zhang, “MMG-HCI: A Non-contact Non-intrusive Real-Time Intelligent Human-Computer Interaction System,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 13069 LNAI, pp. 158 – 167, 2021, doi: 10.1007/978-3-030-93046-2_14.

[28] D. Danang, M. U. Dewi, and G. Widhiati, “Federated Hybrid CNN GRU and COBCO Optimized Elman Neural Network for Real Time DDoS Detection in Cloud Edge Environments,” Int. J. Electr. Eng. Math. Comput. Sci., vol. 2, no. 2, pp. 28–35, 2025, doi: 10.62951/ijeemcs.v2i2.293.

[29] D. Danang, S. Siswanto, W. Aryani, and P. Wibowo, “Hybrid Federated Ensemble Learning Approach for Real-Time Distributed DDoS Detection in IIoT Edge Computing Environment,” J. Eng. Electr. Informatics, vol. 5, no. 1, pp. 9–17, 2025, doi: 10.55606/jeei.v5i1.5099.

[30] H. R. Putranti, R. Retnowati, A. A. Sihombing, and D. Danang, “Performance assessment through work gamification: Investigating engagement,” South African J. Bus. Manag., vol. 55, no. 1, pp. 1–12, 2024.

[31] D. Danang, A. B. Santoso, and M. U. Dewi, “CICA Framework: Harnessing CSR, AI, and Blockchain for Sustainable Digital Culture,” Int. J. Adv. Comput. Sci. & Appl., vol. 16, no. 11, 2025.

[32] D. Danang, E. Siswanto, N. D. Setiawan, and P. Wibowo, “Hybrid Zero Trust Container Based Model for Proactive Service Continuity under Intelligent DDoS Attacks in Cloud Environment,” Int. J. Comput. Technol. Sci., vol. 2, no. 3, pp. 41–49, 2025, doi: 10.62951/ijcts.v2i3.291.

[33] D. Danang, H. Haryani, Q. Aini, F. A. Ramahdan, and J. Edwards, “Empowering digital literacy through blockchain based alphasign for secure and sustainable e-governance,” 2025.

A Deep Learning Based Approach to Real Time Video Content Analysis and Visualization for Intelligent Human Computer Interaction in Multimedia Systems

Authors

DOI:

Keywords:

Abstract

References

Downloads

Published

Issue

Section

menu

Current Issue

Information