Designing Robust Data Quality Governance Strategies for Distributed Software Systems: Integrating Real Time Monitoring and Automated Anomaly Detection
Keywords:
Data Quality, Distributed Systems, Anomaly Detection, Machine Learning, Real-Time Monitoring
Abstract
Distributed software systems face significant data quality challenges because of their complex, decentralized architecture. Data is processed and stored across multiple nodes, which makes it difficult to maintain consistency and accuracy across the network; data inconsistency, latency, and data fragmentation are particularly prevalent. To address these challenges, this study proposes an integrated data quality governance strategy that combines real-time monitoring with automated anomaly detection using machine learning models. The strategy aims to improve data consistency, enhance anomaly detection, and reduce the need for manual intervention. Real-time monitoring identifies data issues as they occur, while machine learning models automate anomaly detection: autoencoders flag records with high reconstruction error, and Isolation Forests flag records that are easily isolated from the rest of the data. The strategy is evaluated in real-world distributed system scenarios and compared with traditional approaches such as periodic audits and manual validation. Results indicate that the integrated approach yields faster anomaly detection, fewer data inconsistencies, and improved overall system performance, helping maintain high data quality standards across distributed nodes. The strategy has implications for industries that rely on distributed systems, such as finance, healthcare, and IoT, where data integrity is essential for operational success.
Future research can focus on integrating more advanced machine learning techniques and optimizing the real-time monitoring framework to handle larger and more complex systems.
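The isolation idea mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of an Isolation Forest. This is not the study's implementation: the function names (`fit_forest`, `anomaly_score`), the parameter defaults, and the two node metrics (latency in ms, error rate) are assumptions chosen for illustration. Records that random splits separate from the rest in few steps receive scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of data."""
    if len(data) < 2:
        raise ValueError("need at least two records to build a forest")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, x):
    """Score in (0, 1); higher means x is isolated in fewer splits."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))


if __name__ == "__main__":
    # Synthetic (latency_ms, error_rate) readings; values are illustrative only.
    rng = random.Random(42)
    readings = [(rng.gauss(100, 5), rng.gauss(0.02, 0.005)) for _ in range(200)]
    forest = fit_forest(readings)
    print("typical reading:", anomaly_score(forest, (100.0, 0.02)))
    print("suspect reading:", anomaly_score(forest, (400.0, 0.5)))
```

In a monitoring loop, each incoming record would be scored as it arrives and flagged when its score exceeds a threshold; that threshold is a tuning choice the abstract does not specify.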