This blog was written in collaboration with Dhanasekar Kandasamy, Pradeepkumar Vijaya Kumar, and Shubham Ranjan.
In the realm of platform engineering and infrastructure, the challenges often seem universal. Our journey through the tech landscape, equipped with an engineer's attitude, has been about crafting solutions—whether by harnessing open-source tools or developing new services.
Working at a large company has illuminated a common pattern: many of us face similar hurdles and resort to analogous strategies for resolution. This realization sparked our initiative to build a more unified, effective approach.
Our initial solution involved creating an Internal Developer Platform (IDP) using Backstage, a platform designed for building developer portals. We integrated a variety of plugins—both standard ones from Backstage and custom-built ones by our team—to meet specific needs. Our goal was to cultivate a community where contribution and shared benefits were the norms.
However, we encountered significant challenges. The primary tech stacks of our SRE, DevOps, and Infrastructure teams are centered around Python and Golang, with less emphasis on TypeScript. This diversity led to our platform being used selectively, similar to other team-specific tools, which ultimately diluted the intended community-driven approach.
Additionally, the limitation of needing to write plugins for every use case prevented us from onboarding services built by other teams. We also faced issues with limited database support and the lack of built-in monitoring. Furthermore, as the number of plugins and users grew, maintaining performance and ensuring scalability became challenging, requiring significant customization and optimization. Customizing Backstage to fit our specific organizational workflows and requirements also proved to be time-consuming, involving substantial development work to tailor the platform to our needs.
Upon reassessing, we refined our architectural goals to better align with our needs and those of our teams:
Technically, the platform includes built-in observability, workflow management tools, an audit framework, notification systems, and comprehensive support for role-based access control (RBAC) and a diverse range of databases.
Our platform serves as a foundational toolset that is used daily by our teams, offering essential services like:
These tools are modular, allowing teams to focus on core business problems while leveraging a vast, ever-growing library of modules for tailored needs. The entire framework is configuration-driven, simplifying the integration and customization process.
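To make the configuration-driven idea concrete, here is a minimal sketch of how a service might enable modules purely from configuration. The registry, the module names (`notifications`, `audit`), and the config shape are hypothetical and chosen for illustration; they are not the platform's actual schema.

```python
from typing import Callable, Dict, List

# Hypothetical registry: module name -> factory that takes its settings.
MODULE_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that adds a module factory to the registry under `name`."""
    def decorator(factory: Callable[[dict], str]):
        MODULE_REGISTRY[name] = factory
        return factory
    return decorator


@register("notifications")
def notifications(settings: dict) -> str:
    return f"notifications via {settings.get('channel', 'email')}"


@register("audit")
def audit(settings: dict) -> str:
    return f"audit log retained {settings.get('retention_days', 30)} days"


def build_service(config: dict) -> List[str]:
    """Enable only the modules listed in the config, in the order given."""
    enabled = []
    for entry in config.get("modules", []):
        name = entry["name"]
        if name not in MODULE_REGISTRY:
            raise KeyError(f"unknown module: {name}")
        enabled.append(MODULE_REGISTRY[name](entry.get("settings", {})))
    return enabled


# Illustrative config; in practice this would likely be loaded from a file.
EXAMPLE_CONFIG = {
    "modules": [
        {"name": "notifications", "settings": {"channel": "slack"}},
        {"name": "audit"},
    ]
}

if __name__ == "__main__":
    for description in build_service(EXAMPLE_CONFIG):
        print(description)
```

With this shape, a team adds a capability by editing the config rather than writing integration code, which is the property the modular design aims for.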
Our platform also features user-centric capabilities such as Just-In-Time Access (JITA) for production, and a unified portal that simplifies access and enhances productivity.
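The core idea behind Just-In-Time Access is that a grant is time-bound and expires on its own. The sketch below illustrates that idea only; the `AccessGrant` type, the `grant_access` helper, and the 60-minute default are assumptions for the example, not the platform's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """A hypothetical time-bound grant for one user on one resource."""
    user: str
    resource: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # The grant is valid strictly before its expiry time.
        return now < self.expires_at


def grant_access(user: str, resource: str, now: datetime,
                 ttl_minutes: int = 60) -> AccessGrant:
    """Issue a grant that expires `ttl_minutes` after `now`."""
    if ttl_minutes <= 0:
        raise ValueError("ttl_minutes must be positive")
    return AccessGrant(user, resource, now + timedelta(minutes=ttl_minutes))


if __name__ == "__main__":
    start = datetime.now(timezone.utc)
    grant = grant_access("alice", "prod-db", start, ttl_minutes=30)
    print(grant.is_active(start + timedelta(minutes=10)))
    print(grant.is_active(start + timedelta(minutes=45)))
```

Passing `now` explicitly keeps the expiry check deterministic and easy to test.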
In this section, we will discuss some of the custom core tools developed to enhance the core functionalities of our framework, focusing on Role-Based Access Control (RBAC), observability, and the UI framework.
After thorough analysis of various open-source tools, we decided to leverage OPA Policy and Istio Authorization to support RBAC in our framework. This decision was driven by the need for a robust and flexible access control mechanism that integrates seamlessly with our existing infrastructure.
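As a rough illustration of the kind of decision such a policy layer makes, the sketch below evaluates a default-deny RBAC rule in plain Python. It is not OPA or Istio code (OPA policies are written in Rego, and Istio authorization is configured declaratively); the role names, permission table, and request shape are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical role table: role -> set of allowed (HTTP method, path) pairs.
ROLE_PERMISSIONS: Dict[str, Set[Tuple[str, str]]] = {
    "viewer": {("GET", "/services")},
    "operator": {("GET", "/services"), ("POST", "/deployments")},
}


def is_allowed(request: dict) -> bool:
    """Default deny: allow only if one of the caller's roles grants the action."""
    wanted = (request.get("method"), request.get("path"))
    return any(
        wanted in ROLE_PERMISSIONS.get(role, set())
        for role in request.get("roles", [])
    )


if __name__ == "__main__":
    print(is_allowed({"roles": ["viewer"], "method": "GET", "path": "/services"}))
    print(is_allowed({"roles": ["viewer"], "method": "POST", "path": "/deployments"}))
```

Unknown roles and missing fields fall through to a denial, which mirrors the default-deny stance usually preferred for access control.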
Key Components:
Use Cases:
Garuda is an efficient, reliable, and flexible observability platform that turns telemetry data at scale (metrics, logs, and distributed traces) into actionable insights, with auto-remediation capabilities. The platform enhances the visibility of our framework, ensuring that we can monitor and respond to issues in real time.
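One common way to implement auto-remediation is to trigger an action only after a metric stays above a threshold for several consecutive samples. The sketch below shows that pattern under stated assumptions; `RemediationRule`, its fields, and the example action are hypothetical and do not describe Garuda's actual rule format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RemediationRule:
    """Hypothetical rule: run `action` after a sustained threshold breach."""
    metric: str
    threshold: float
    consecutive: int              # number of trailing samples that must breach
    action: Callable[[], str]     # remediation to run; returns a status string


def breached(samples: Sequence[float], threshold: float, consecutive: int) -> bool:
    """True if the last `consecutive` samples are all above `threshold`."""
    if consecutive <= 0 or len(samples) < consecutive:
        return False
    return all(value > threshold for value in samples[-consecutive:])


def evaluate(rule: RemediationRule, samples: Sequence[float]) -> Optional[str]:
    """Run the rule's action on a sustained breach; otherwise do nothing."""
    if breached(samples, rule.threshold, rule.consecutive):
        return rule.action()
    return None


if __name__ == "__main__":
    rule = RemediationRule("cpu_utilization", 0.9, 3, lambda: "restarted pod")
    print(evaluate(rule, [0.50, 0.95, 0.96, 0.97]))
```

Requiring consecutive breaches is a simple guard against reacting to a single noisy sample.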
Key Components:
Add-On Services:
Backstage serves as the core of our UI framework, providing a centralized portal for managing our services and infrastructure. By leveraging Backstage, we can offer a seamless and integrated user experience, making it easier for our teams to navigate and manage the various components of our framework.
Key Features:
Understanding the relations between various services has been a lot simpler since we onboarded our resources in Backstage.
In the upcoming sections, I will explore how tools like Backstage, the Grafana stack, and Helm have been instrumental in building this config-driven framework. Stay tuned for an in-depth exploration of how we leverage these technologies to foster a robust, scalable infrastructure platform.