Telemetry Driven Cost Governance for Enterprise Data and AI Platforms
DOI: https://doi.org/10.15662/IJEETR.2025.0701006
Keywords: Enterprise Artificial Intelligence, Traceability, Data Lineage, Reproducibility, Data Governance, AI Reliability, Model Versioning
Abstract
Modern organizations rely on enterprise data and AI platforms, which often incur high and unpredictable operating costs. Traditional cost management relies on manual reviews and backward-looking reports, which are ineffective in dynamic cloud environments. This paper proposes a Telemetry-Driven Cost Governance model that uses real-time platform telemetry, automated policy evaluation, and controlled execution to govern costs continuously. The model consolidates compute, storage, and data pipeline telemetry into a single analytics layer and evaluates costs against predefined thresholds. The resulting recommendations are executed automatically, subject to approval gates and safety checks that maintain system stability. A quantitative evaluation shows positive results. Total platform cost fell by 24 percent, compute cost by 28 percent, and storage cost by 19 percent. Manual cost-analysis effort fell by 90 percent. Policy compliance rose from 76 to 96 percent, and the job success rate rose from 97.8 percent to 98.6 percent. The findings show that telemetry-driven governance can provide proactive, safe, and measurable cost optimization for enterprise-scale data and AI platforms.
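The governance loop described in the abstract (unified telemetry, threshold-based policy evaluation, then gated execution) can be illustrated with a minimal Python sketch. This is not the paper's implementation: every name, field, threshold value, and the rule that costly resources require approval are hypothetical assumptions chosen only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryRecord:
    resource: str       # hypothetical resource identifier
    category: str       # "compute", "storage", or "pipeline"
    daily_cost: float   # cost in arbitrary currency units
    utilization: float  # fraction of provisioned capacity in use (0.0 to 1.0)


@dataclass(frozen=True)
class Policy:
    category: str
    max_daily_cost: float   # assumed cost threshold
    min_utilization: float  # assumed utilization threshold


@dataclass(frozen=True)
class Recommendation:
    resource: str
    action: str
    needs_approval: bool


def evaluate(records, policies, approval_cost_limit=100.0):
    """Compare each record with its category policy and emit recommendations."""
    by_category = {p.category: p for p in policies}
    recs = []
    for r in records:
        policy = by_category.get(r.category)
        if policy is None:
            continue  # no policy defined for this category
        over_budget = r.daily_cost > policy.max_daily_cost
        under_used = r.utilization < policy.min_utilization
        if not (over_budget or under_used):
            continue  # compliant, nothing to recommend
        action = "downsize" if under_used else "review"
        # Assumption: changes to costly resources must pass a human approval gate.
        recs.append(Recommendation(r.resource, action, r.daily_cost >= approval_cost_limit))
    return recs


def execute(recs, approved, is_safe):
    """Apply only recommendations that pass the approval gate and the safety check."""
    applied = []
    for rec in recs:
        if rec.needs_approval and rec.resource not in approved:
            continue  # blocked at the approval gate
        if not is_safe(rec):
            continue  # blocked by the safety check
        applied.append(rec.resource)
    return applied


# Illustrative data only; these values do not come from the paper.
records = [
    TelemetryRecord("cluster-a", "compute", 250.0, 0.20),
    TelemetryRecord("bucket-b", "storage", 40.0, 0.90),
    TelemetryRecord("etl-c", "pipeline", 10.0, 0.80),
]
policies = [
    Policy("compute", 200.0, 0.40),
    Policy("storage", 30.0, 0.10),
    Policy("pipeline", 50.0, 0.30),
]
recs = evaluate(records, policies)
```

Calling `execute(recs, approved=set(), is_safe=lambda rec: True)` in this sketch applies only the low-cost storage recommendation, while the compute recommendation waits until `"cluster-a"` appears in the approved set. Keeping evaluation separate from execution is one way to let the gates be tested and audited independently of the policy logic.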