A Framework for Scalable Big Data Analytics and Workflow Orchestration in Heterogeneous Cloud Native Software Platforms for Smart Cities
Keywords:
smart city, data processing, workflow orchestration, fault tolerance, cloud native architecture
Abstract
Smart cities increasingly rely on the Internet of Things (IoT), Artificial Intelligence (AI), and Big Data Analytics to optimize urban management and improve citizens' quality of life. However, managing large and diverse datasets from many sources in real time remains difficult. This research proposes a modular framework that integrates distributed data processing engines with container-based workflow orchestration to address scalability, latency, adaptability, and fault tolerance in smart city data analytics. The framework uses cloud native technologies, including Apache Spark and Kubernetes, to manage resources efficiently and maintain high availability. Experiments tested the framework under dynamic data loads and demonstrated scalability through real-time resource allocation and low-latency processing. The framework also proved adaptable: it integrated data sources such as environmental sensors and traffic management systems, which require different processing methods. In addition, its modularity provided fault tolerance, allowing operation to continue when individual components failed, a crucial property for mission-critical smart city applications. Compared with traditional monolithic systems, the proposed framework offered greater flexibility, scalability, and performance, particularly in handling real-time data streams. Challenges remain, especially in integrating heterogeneous data formats and optimizing real-time processing for high-priority applications. The research highlights the importance of scalable data analytics and efficient workflow orchestration for future smart city platforms and offers a foundation for more resilient, adaptable, and efficient cloud native infrastructures.
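The abstract attributes scalability to real-time resource allocation on Kubernetes but does not state the scaling rule used. As a purely illustrative sketch, the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler can be written as below. The function name, the replica bounds, and the example metric values are assumptions made for illustration, not details taken from the framework, and the sketch omits the autoscaler's tolerance band and stabilization window.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA.

    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to [min_replicas, max_replicas]. The bounds are hypothetical
    defaults chosen for this example.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical load: 4 workers averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
```

Under this rule, a workload running above its target metric gains workers in proportion to the overshoot, and one running below it sheds them, which is one plausible way to obtain the elastic behaviour the abstract describes.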