Building an IDP with Backstage and Kubernetes: A Comprehensive Guide (Part 1)



 

This blog was written in collaboration with Dhanasekar Kandasamy, Pradeepkumar Vijaya Kumar, and Shubham Ranjan.

 

 

Introduction

 

In the realm of platform engineering and infrastructure, the challenges often seem universal. Our journey through the tech landscape, equipped with an engineer's attitude, has been about crafting solutions—whether by harnessing open-source tools or developing new services.


Working at a large company has illuminated a common pattern: many of us face similar hurdles and resort to analogous strategies for resolution. This realization sparked our initiative to build a more unified, effective approach.

 

The Challenge of Diverse Technology Stacks

 

Our initial solution involved creating an Internal Developer Platform (IDP) using Backstage, a platform designed for building developer portals. We integrated a variety of plugins—both standard ones from Backstage and custom-built ones by our team—to meet specific needs. Our goal was to cultivate a community where contribution and shared benefits were the norms.

 

However, we encountered significant challenges. The primary tech stacks of our SRE, DevOps, and Infrastructure teams are centered around Python and Golang, with less emphasis on TypeScript. This diversity led to our platform being used selectively, similar to other team-specific tools, which ultimately diluted the intended community-driven approach.

 

Additionally, the limitation of needing to write plugins for every use case prevented us from onboarding services built by other teams. We also faced issues with limited database support and the lack of built-in monitoring. Furthermore, as the number of plugins and users grew, maintaining performance and ensuring scalability became challenging, requiring significant customization and optimization. Customizing Backstage to fit our specific organizational workflows and requirements also proved to be time-consuming, involving substantial development work to tailor the platform to our needs.

 

Revising Our Strategy

 

Upon reassessing, we refined our architectural goals to better align with our needs and those of our teams:

 

  • Build once, deploy anywhere: Ensuring that our solutions are universally applicable and easy to deploy.
  • Enable contributions through SDKs and templates: Facilitating a broader range of contributions by supporting multiple languages and frameworks.
  • Foundation for functionality: Establishing a robust framework that emphasizes clarity and safety in adding new features.
  • Empowerment without borders: Allowing engineers to safely develop significant functionality without needing extensive software engineering expertise.
  • Extensibility and multi-tenancy: Creating a platform that is adaptable to product-specific needs and shared across all teams.
  • High Availability and Disaster Recovery: Implementing fail-safes like break-glass modes to ensure continuity.

 

Technically, the platform includes built-in observability, workflow management tools, an audit framework, notification systems, and comprehensive support for role-based access control (RBAC) and a diverse range of databases.


Why a Unified Platform?

 

Our platform serves as a foundational toolset that is used daily by our teams, offering essential services like:

 

  • Database management
  • Caching
  • RBAC
  • Audit trails
  • Certificate management
  • Monitoring
  • Infrastructure management
  • Deployment management

 

These tools are modular, allowing teams to focus on core business problems while leveraging a vast, ever-growing library of modules for tailored needs. The entire framework is configuration-driven, simplifying the integration and customization process.
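As a rough illustration of what "configuration-driven" can mean in practice, the sketch below validates a team's declarative request against a catalog of available modules. It is a minimal Python sketch, not the platform's actual implementation: `AVAILABLE_MODULES`, `TenantConfig`, and `resolve_modules` are hypothetical names, and the module names only loosely follow the services listed above.

```python
from dataclasses import dataclass, field

# Hypothetical module catalog; names loosely follow the services listed above.
AVAILABLE_MODULES = {"database", "cache", "rbac", "audit", "certificates", "monitoring"}


@dataclass
class TenantConfig:
    """Declarative description of what one team wants from the platform."""
    team: str
    modules: dict = field(default_factory=dict)  # module name -> module settings


def resolve_modules(config: TenantConfig) -> list:
    """Reject unknown modules and return the requested ones in a stable order."""
    unknown = set(config.modules) - AVAILABLE_MODULES
    if unknown:
        raise ValueError(f"unknown modules for {config.team}: {sorted(unknown)}")
    return sorted(config.modules)


config = TenantConfig(
    team="payments",
    modules={"database": {"engine": "postgresql"}, "cache": {"engine": "redis"}},
)
print(resolve_modules(config))  # ['cache', 'database']
```

Keeping the request as plain data means a team can add a module by changing configuration rather than writing integration code.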

 

Platform Core Capabilities

  • Managed Databases: PostgreSQL, MySQL/MariaDB, MongoDB
  • Cache as a Service: Redis
  • Message Brokers as a Service: RabbitMQ, NATS
  • Pipeline as a Service: Deploy on-demand ETL pipelines for data operations.
  • Automation & Orchestration: StackStorm, Argo
  • Observability: Loki, Mimir, Prometheus, Tempo, Grafana, DevLake

 

Our platform also features user-centric capabilities such as Just-In-Time Access (JITA) for production, and a unified portal that simplifies access and enhances productivity.
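The core idea behind Just-In-Time Access is that a grant carries its own expiry, so access lapses without anyone having to revoke it. The following is a minimal Python sketch of that idea only. `AccessGrant` and `grant_access` are hypothetical names, and the user and resource values are invented; the real JITA workflow (approvals, auditing, enforcement) is not described in this post and is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """A hypothetical time-bound grant; field names are illustrative."""
    user: str
    resource: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # The grant is valid only strictly before its expiry time.
        return now < self.expires_at


def grant_access(user: str, resource: str, now: datetime, ttl: timedelta) -> AccessGrant:
    """Issue a grant that expires automatically after `ttl`."""
    return AccessGrant(user=user, resource=resource, expires_at=now + ttl)


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = grant_access("alice", "prod/orders-db", start, ttl=timedelta(hours=1))

print(grant.is_active(start + timedelta(minutes=30)))  # True: inside the window
print(grant.is_active(start + timedelta(hours=2)))     # False: window has elapsed
```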

 

Architecture

 

Fig 1_Building-IDP-with-Backstage-and-Kubernetes_palo-alto-networks.png

 

In this section, we discuss some of the custom tools developed to enhance the core functionality of our framework, focusing on Role-Based Access Control (RBAC), observability, and the UI framework.

 

Role-Based Access Control (RBAC)

After thorough analysis of various open-source tools, we decided to leverage OPA Policy and Istio Authorization to support RBAC in our framework. This decision was driven by the need for a robust and flexible access control mechanism that integrates seamlessly with our existing infrastructure.

 

Key Components:

 

  • OPA Policy: Provides a policy-based control mechanism, allowing us to define fine-grained access controls.
  • Istio Authorization: Simplifies the implementation of RBAC by offloading the authorization logic from the application to the infrastructure layer.

 

Use Cases:

 

  • Service Account/Token Support: Ensures that service accounts and tokens are managed securely and can be used to control access to resources.
  • Real-Time User Updates: Allows for dynamic updates to user roles and permissions, ensuring that access controls are always up to date.
  • LDAP Group and Token Resource Mapping: Integrates with our LDAP system to map user groups and tokens to specific resources, providing a seamless authentication and authorization experience.
  • Istio Authorization Support using RBAC: Enables applications to delegate authorization logic to Istio, eliminating the need for applications to implement complex access control mechanisms.
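To make the group-to-resource mapping idea concrete, here is a minimal Python sketch of a default-deny authorization check. It is plain Python, not Rego and not an Istio AuthorizationPolicy, and it only illustrates the kind of decision such a policy encodes. The group names, resource prefixes, and the `is_allowed` helper are all hypothetical.

```python
# Hypothetical mapping from LDAP groups to the actions they may perform on
# resource prefixes. Group and resource names are invented for illustration.
GROUP_PERMISSIONS = {
    "sre-team": {"databases/": {"read", "write"}, "pipelines/": {"read"}},
    "dev-team": {"databases/": {"read"}},
}


def is_allowed(groups, resource, action):
    """Allow if any of the caller's groups grants `action` on a matching prefix."""
    for group in groups:
        for prefix, actions in GROUP_PERMISSIONS.get(group, {}).items():
            if resource.startswith(prefix) and action in actions:
                return True
    return False  # default deny: nothing matched


print(is_allowed(["dev-team"], "databases/orders", "read"))               # True
print(is_allowed(["dev-team"], "databases/orders", "write"))              # False
print(is_allowed(["sre-team", "dev-team"], "databases/orders", "write"))  # True
```

Because the check depends only on the caller's groups and the requested resource, it can live outside the application, which is the property that makes delegating authorization to the infrastructure layer possible.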

 

Garuda Observability

Garuda is an efficient, reliable, and flexible observability platform that turns telemetry data at scale (metrics, logs, and distributed traces) into actionable insights, with auto-remediation capabilities. The platform enhances the visibility of our framework, ensuring that we can monitor and respond to issues in real time.

 

Key Components:

 

  • Loki: A highly efficient log aggregation system that allows us to collect and query logs with ease.
  • Mimir: Extends the capabilities of Prometheus, providing a highly available and scalable metrics storage and querying solution.
  • Prometheus: The cornerstone of our monitoring stack, responsible for collecting and storing metrics.
  • Grafana: A powerful visualization tool that allows us to create dashboards and alerts based on metrics, logs, and traces.

 

Add-On Services:

 

  • StackStorm: Acts as a robust workflow engine, enabling complex workflows and automations to be defined and executed efficiently.
  • SMAAS (Secrets Management): Ensures that sensitive information, such as API keys and passwords, is stored securely and accessed only by authorized services.

 

UI Framework & Portal

Backstage serves as the core of our UI framework, providing a centralized portal for managing our services and infrastructure. By leveraging Backstage, we can offer a seamless and integrated user experience, making it easier for our teams to navigate and manage the various components of our framework.


Key Features:

 

  • Service Catalog: Backstage's catalog feature allows us to maintain a comprehensive inventory of all our services, providing visibility and easy access to service documentation, ownership information, and status.

 

Understanding the relations between our services has been much simpler since we onboarded our resources into Backstage.

 

Fig 2_Building-IDP-with-Backstage-and-Kubernetes_palo-alto-networks.png
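The relations mentioned above come from entities declaring what they depend on. The sketch below represents a few entities as Python dicts that mirror the general shape of Backstage catalog descriptors (`apiVersion`, `kind`, `metadata`, `spec`, with `spec.dependsOn` holding entity references) and answers the question "what depends on this resource?". All entity names are hypothetical, and `dependents_of` is an illustrative helper, not part of Backstage.

```python
# Entities as Python dicts mirroring the shape of Backstage catalog
# descriptors. All names and owners are hypothetical.
ENTITIES = [
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "orders-api"},
        "spec": {
            "type": "service",
            "lifecycle": "production",
            "owner": "team-payments",
            "dependsOn": ["resource:default/orders-db", "component:default/auth-service"],
        },
    },
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "billing-worker"},
        "spec": {
            "type": "service",
            "lifecycle": "production",
            "owner": "team-payments",
            "dependsOn": ["resource:default/orders-db"],
        },
    },
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Resource",
        "metadata": {"name": "orders-db"},
        "spec": {"type": "database", "owner": "team-payments"},
    },
]


def dependents_of(entity_ref, entities):
    """Return the names of entities whose spec.dependsOn lists `entity_ref`."""
    return sorted(
        e["metadata"]["name"]
        for e in entities
        if entity_ref in e["spec"].get("dependsOn", [])
    )


print(dependents_of("resource:default/orders-db", ENTITIES))
# ['billing-worker', 'orders-api']
```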

 

  • Plugin Architecture: Backstage's flexible plugin architecture enables us to integrate various tools and services directly into the portal, offering a unified interface for our users. The extensive set of open-source plugins available, such as those for Jira, Argo CD, and GitLab, makes it much easier to understand and update CI/CD processes across the organization. This integration improves efficiency and lets us manage and monitor our development and deployment pipelines from one place.
  • Developer Portal: Provides a centralized location for developers to access all the necessary resources, including service templates, CI/CD pipelines, and deployment tools.
  • Templates: Backstage templates streamline the process of creating new services by providing standardized, reusable templates that ensure consistency, reduce setup times, and promote best practices across teams, thus enhancing overall productivity and quality of our deployments.

 

Fig 3_Building-IDP-with-Backstage-and-Kubernetes_palo-alto-networks.png

 

  • Extensibility: The platform's extensibility allows us to develop custom plugins tailored to our specific needs, enhancing the overall functionality of the portal.
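The Templates feature described above boils down to filling a standardized skeleton with a few user-supplied values. The sketch below shows that idea using Python's standard-library `string.Template` as a stand-in; Backstage's own template engine and syntax are different and are not reproduced here. The skeleton contents and the `render_service` helper are hypothetical.

```python
from string import Template

# Hypothetical skeleton for a new service's descriptor; `$name` and `$owner`
# are placeholders filled in from user input.
SKELETON = Template(
    "name: $name\n"
    "owner: $owner\n"
    "lifecycle: experimental\n"
)


def render_service(name: str, owner: str) -> str:
    """Fill the skeleton with the supplied values and return the result."""
    return SKELETON.substitute(name=name, owner=owner)


print(render_service("orders-api", "team-payments"))
```

Because every new service starts from the same skeleton, defaults such as the initial lifecycle stage stay consistent across teams.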

 

Next Steps

 

In upcoming parts of this series, we will explore how tools like Backstage, the Grafana stack, and Helm have been instrumental in building this config-driven framework. Stay tuned for an in-depth look at how we use these technologies to build a robust, scalable infrastructure platform.

 
