This blog was written by Puneet Gupta.
As we continue to pursue better observability for our systems at Palo Alto Networks, I’m excited to share the details of a significant development: a unified API layer for Garuda. If you haven’t already, I invite you to read the first part of this series, where we introduced the Garuda platform and the Garuda Operator.
In this latest instalment, we dive deeper into the architecture, features, and impact of an API layer that covers logs, metrics, rules, and everything else Garuda needs. Notably, we’ve added cardinality insights APIs, which enrich our observability capabilities. That’s not all: we have also extended this work to the front end, building a visualization interface that puts Garuda’s insights to use.
As we built the Garuda observability platform, we faced challenges that demanded a cohesive solution. The Unified API Layer emerged as the key element for addressing them, so that Garuda’s observability capabilities could keep evolving. Let’s look at the reasons why we created it.
In summary, the Unified API Layer plays a pivotal role in the Garuda observability platform. By addressing cardinality, optimizing rules, ensuring health, simplifying migrations, and enhancing security, it helps enable a robust, scalable, and secure user experience. Now, let’s explore the Garuda API architecture for insights.
Architecture of Garuda API:
Cardinality Insights:
Cardinality, in the context of our platform, is the count of unique combinations of label values, that is, the number of series. A cardinality explosion occurs when this count becomes excessively high, which can disrupt the platform through longer processing times and more complex queries. Retrieving and processing all of these series at query time can be time-consuming. A robust platform must therefore manage high-cardinality metrics proactively to keep performance efficient and prevent breakdowns.
Our API offers two essential functionalities: one reveals the cardinality of every metric, while the other provides insights into the cardinality of labels and label values, aiding in the identification and management of high cardinality metrics within our platform.
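To make the two views concrete, here is a minimal sketch in Python. It is not Garuda's implementation: the function names and the in-memory list of label sets are assumptions for illustration, and a real backend would read this from its index rather than from a list. The only convention borrowed from Prometheus-style systems is that the metric name lives in the `__name__` label.

```python
from collections import defaultdict


def metric_cardinality(series):
    """Count unique label-value combinations (series) per metric name."""
    unique = defaultdict(set)
    for labels in series:
        # A series is identified by its full, order-independent label set.
        unique[labels["__name__"]].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in unique.items()}


def label_cardinality(series, metric):
    """Count distinct values per label for a single metric."""
    values = defaultdict(set)
    for labels in series:
        if labels.get("__name__") != metric:
            continue
        for key, value in labels.items():
            if key != "__name__":
                values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


# Hypothetical sample data.
series = [
    {"__name__": "http_requests_total", "pod": "a", "code": "200"},
    {"__name__": "http_requests_total", "pod": "a", "code": "500"},
    {"__name__": "http_requests_total", "pod": "b", "code": "200"},
    {"__name__": "up", "pod": "a"},
]

print(metric_cardinality(series))
print(label_cardinality(series, "http_requests_total"))
```

The first function answers "which metrics have the most series", and the second answers "which labels of this metric contribute the most distinct values".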
Our system determines per-tenant metric usage by extracting data from users’ Grafana dashboards, alert and recording rules, and queries. Used metrics are those that tenants actively rely on for monitoring and analytics. Unused metrics, by contrast, are ingested but remain dormant. This categorization lets users analyze and prioritize the cardinality of used and unused metrics separately, so that resources go where they are actually needed.
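One rough way to picture this classification is shown below in Python. This is an illustrative sketch, not the method Garuda uses: it assumes the query expressions have already been collected from dashboards and rules, and it swaps a real PromQL parser for a simple identifier scan. The scan can overestimate usage (a label named like a metric counts as a reference), which errs on the side of not flagging a metric as unused.

```python
import re

# Matches identifier-shaped tokens in a query expression.
IDENTIFIER = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def classify_metrics(ingested, queries):
    """Split ingested metric names into (used, unused) based on query text."""
    referenced = set()
    for query in queries:
        referenced.update(IDENTIFIER.findall(query))
    used = sorted(m for m in ingested if m in referenced)
    unused = sorted(m for m in ingested if m not in referenced)
    return used, unused


# Hypothetical inputs: metric names seen at ingestion, and expressions
# collected from dashboards, alert rules, and recording rules.
ingested = {"http_requests_total", "up", "node_cpu_seconds_total"}
queries = [
    'sum(rate(http_requests_total{code="500"}[5m])) by (pod)',
    "up == 0",
]

used, unused = classify_metrics(ingested, queries)
print("used:", used)
print("unused:", unused)
```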
Our API efficiently identifies labels with zero used metrics and provides cardinality insights, detailing the extent of metric consumption for each of these inactive labels. This empowers users to streamline resource allocation by addressing and optimizing unused metric labels.
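As a sketch of the idea, the Python below finds labels that appear only on unused metrics and reports how many series carry each one. The function name, the input shape, and the choice of "series carrying the label" as the consumption figure are all assumptions made for illustration; the input is assumed to contain each series once.

```python
from collections import defaultdict


def labels_without_used_metrics(series, used_metrics):
    """Return {label: series count} for labels found only on unused metrics."""
    metrics_by_label = defaultdict(set)
    series_by_label = defaultdict(int)
    for labels in series:
        name = labels["__name__"]
        for key in labels:
            if key == "__name__":
                continue
            metrics_by_label[key].add(name)
            series_by_label[key] += 1
    return {
        key: series_by_label[key]
        for key, metrics in metrics_by_label.items()
        if not metrics & used_metrics  # no used metric carries this label
    }


# Hypothetical sample data.
series = [
    {"__name__": "http_requests_total", "pod": "a"},
    {"__name__": "debug_cache_hits", "shard": "1"},
    {"__name__": "debug_cache_hits", "shard": "2"},
]

result = labels_without_used_metrics(series, {"http_requests_total"})
print(result)
```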
Our system detects cases where a metric is ingested more than once, which often happens when the same metric is sent from different exporters running on the customer’s end. Flagging these duplicates helps keep monitoring accurate and avoids redundant ingestion.
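A simple way to think about duplicate detection is to compare series after ignoring the label that identifies the sender. The Python sketch below does that under a stated assumption: the `job` label is taken to identify the exporter, which is a common Prometheus convention but may not be how Garuda distinguishes sources. The function name and sample data are hypothetical.

```python
from collections import defaultdict

# Assumption: this label identifies which exporter sent the series.
SOURCE_LABEL = "job"


def find_duplicate_metrics(series):
    """Return {metric: sources} for series sent by more than one source."""
    sources_by_identity = defaultdict(set)
    for labels in series:
        # Identity of a series once the source label is ignored.
        identity = frozenset(
            (k, v) for k, v in labels.items() if k != SOURCE_LABEL
        )
        sources_by_identity[identity].add(labels.get(SOURCE_LABEL))

    duplicates = defaultdict(set)
    for identity, sources in sources_by_identity.items():
        if len(sources) > 1:
            duplicates[dict(identity)["__name__"]].update(sources)
    return dict(duplicates)


# Hypothetical sample data: the same CPU series arrives from two exporters.
series = [
    {"__name__": "node_cpu_seconds_total", "instance": "n1", "cpu": "0",
     "job": "node-exporter"},
    {"__name__": "node_cpu_seconds_total", "instance": "n1", "cpu": "0",
     "job": "custom-agent"},
    {"__name__": "node_load1", "instance": "n1", "job": "node-exporter"},
]

duplicates = find_duplicate_metrics(series)
print(duplicates)
```

Keeping `instance` in the identity matters here: the same metric reported by two different hosts is expected, while the same host and label set reported by two jobs is the redundancy worth flagging.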
Our API provides insights into the total log volume sent by the customer to the platform, breaking it down further to reveal the contribution of each service, such as clusters, namespaces, and apps. This granular analysis enables customers to pinpoint noisy services, facilitating informed actions for efficient log management and resource optimization.
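The breakdown can be sketched as a plain aggregation. In the Python below, the record fields (`cluster`, `namespace`, `app`, `bytes`) and the function name are assumptions chosen for illustration; the real API's response shape is not described in this post.

```python
from collections import Counter


def volume_by(records, dimension):
    """Return (value, bytes, percent of total) per dimension, largest first."""
    totals = Counter()
    for record in records:
        totals[record[dimension]] += record["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [
        (value, size, round(100 * size / grand_total, 1))
        for value, size in totals.most_common()
    ]


# Hypothetical per-stream log volumes.
records = [
    {"cluster": "prod-1", "namespace": "payments", "app": "api", "bytes": 600},
    {"cluster": "prod-1", "namespace": "payments", "app": "worker", "bytes": 300},
    {"cluster": "prod-2", "namespace": "search", "app": "indexer", "bytes": 100},
]

for row in volume_by(records, "namespace"):
    print(row)
```

Running the same function with `"cluster"` or `"app"` as the dimension gives the other two views, which is usually enough to spot the noisiest service.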
Report Service and Auto-Remediation:
Impact of Cardinality Insights APIs:
The impact of our Cardinality Insights APIs has been significant; one customer, leveraging these APIs, efficiently dropped top unused metrics, reducing active series by nearly 80% and realizing substantial cost savings. This practice is consistently applied across all customers, leading to frequent drops of unused metrics, resulting in significant cost reductions and enhanced platform performance.
Our system analyzes recording rules and displays their recording intervals to users. This helps users see how many recording rules run at each interval. Increasing the interval, especially for rules that run every second, can significantly improve overall system performance by reducing resource usage and improving responsiveness.
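Counting rules per interval can be sketched as follows in Python. The input mirrors the Prometheus-style rule group layout (groups with an optional `interval` and a list of rules, where recording rules have a `record` key), already parsed into dictionaries. The function name and the `default_interval` fallback are assumptions; the actual default depends on how the ruler is configured.

```python
from collections import Counter


def recording_rules_per_interval(groups, default_interval="1m"):
    """Count recording rules by the interval of the group they belong to."""
    counts = Counter()
    for group in groups:
        interval = group.get("interval", default_interval)
        for rule in group.get("rules", []):
            if "record" in rule:  # skip alerting rules
                counts[interval] += 1
    return dict(counts)


# Hypothetical rule groups.
groups = [
    {
        "name": "fast",
        "interval": "1s",
        "rules": [
            {"record": "job:http_requests:rate1m",
             "expr": "sum by (job) (rate(http_requests_total[1m]))"},
            {"record": "job:http_errors:rate1m",
             "expr": 'sum by (job) (rate(http_requests_total{code="500"}[1m]))'},
        ],
    },
    {
        "name": "default",
        "rules": [
            {"record": "job:up:count", "expr": "count by (job) (up)"},
            {"alert": "TargetDown", "expr": "up == 0"},
        ],
    },
]

counts = recording_rules_per_interval(groups)
print(counts)
```

A report like this makes the short-interval groups easy to find, which are the ones where raising the interval is most likely to help.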
The current version of Garuda is [insert version number]. Garuda’s open-source components, including Grafana, Mimir, and Loki, are running on their respective versions, with comprehensive health check information readily available for users.
In conclusion, Garuda stands as a robust observability platform, seamlessly integrating powerful features like Cardinality Insights APIs to optimize resource usage and enhance performance. With proactive identification of unused metrics, customers experience significant cost savings and improved efficiency. The platform’s versioned components and comprehensive health checks ensure a stable and cutting-edge environment, making Garuda a reliable ally in the dynamic landscape of observability.
Achieving a seamless and unified API layer for Garuda was no small feat, but now, with this comprehensive solution in place, consistent and reliable monitoring has become a reality. The Unified API Layer, intricately designed, empowers users to navigate through diverse metrics effortlessly. If you’ve encountered similar challenges or have insights to share, feel free to drop your thoughts in the comments section or connect with us on LinkedIn.
From the Observability Platform Team at Palo Alto Networks:
Thanks for reading!