Source URL: https://cloud.google.com/blog/products/containers-kubernetes/infrastructure-as-code-at-waze-using-config-connector/
Source: Cloud Blog
Title: Waze’s journey to Infrastructure as Code with Google Cloud’s KCC
Feedly Summary: In 2023, the Waze platform engineering team transitioned to Infrastructure as Code (IaC) using Google Cloud’s Config Connector (KCC) — and we haven’t looked back since. We embraced Config Connector, an open-source Kubernetes add-on, to manage Google Cloud resources through Kubernetes. To streamline management, we also leverage Config Controller, a hosted version of Config Connector on Google Kubernetes Engine (GKE), incorporating Policy Controller and Config Sync. This shift has significantly improved our infrastructure management and is shaping our future infrastructure.
The shift to Config Connector
Previously, Waze relied on Terraform to manage resources, particularly during our dual-cloud, VM-based phase. However, maintaining state and ensuring reconciliation proved challenging, leading to inconsistent configurations and increased management overhead.
In 2023, we adopted Config Connector, representing our Google Cloud infrastructure as Kubernetes Resource Model (KRM) objects within a GKE cluster. This approach addresses the reconciliation issues we encountered with Terraform. Config Sync, paired with Config Connector, automatically synchronizes these KRM objects from source repositories to our live GKE cluster. This managed solution eliminates the need for us to build and maintain custom reconciliation systems.
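To make the idea of "infrastructure as KRM" concrete, here is a minimal sketch of a Cloud Storage bucket expressed as a Config Connector object. It is built as a Python dict and printed as JSON, which Kubernetes accepts alongside YAML. The `apiVersion`, `kind`, and project annotation follow Config Connector's `StorageBucket` resource; the bucket name, namespace, and project ID are hypothetical placeholders, not values from Waze's setup.

```python
import json


def storage_bucket_manifest(name: str, namespace: str, project_id: str, location: str) -> dict:
    """Build a Config Connector StorageBucket object as a plain dict."""
    return {
        "apiVersion": "storage.cnrm.cloud.google.com/v1beta1",
        "kind": "StorageBucket",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Tells Config Connector which Google Cloud project owns the bucket.
            "annotations": {"cnrm.cloud.google.com/project-id": project_id},
        },
        "spec": {"location": location, "uniformBucketLevelAccess": True},
    }


# All argument values below are hypothetical.
manifest = storage_bucket_manifest("example-ddl-bucket", "example-team", "example-project", "US")
print(json.dumps(manifest, indent=2))
```

Once an object like this is applied to the cluster, Config Connector is responsible for creating the bucket and keeping it aligned with the declared spec.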
The shift helped us meet the needs of three key roles within Waze’s infrastructure team:
Infrastructure consumers: Application developers who want to easily deploy infrastructure without worrying about the maintenance and complexity of underlying resources.
Infrastructure owners: Experts in specific resource types (e.g., Spanner, Google Cloud Storage, Load Balancers, etc.), who want to define and standardize best practices in how resources are created across Waze on Google Cloud.
Platform engineers: Engineers who build the system that enables infrastructure owners to codify and define best practices, while also providing a seamless API for infrastructure consumers.
First stop: Config Connector
It may seem circular to define all of our Google Cloud infrastructure as KRM objects within a Google Cloud service. However, KRM is actually a better representation of our infrastructure than existing IaC tooling.
Terraform’s reconciliation issues – state drift, version management, out-of-band changes – are a significant pain. Config Connector, through Config Sync, offers out-of-the-box reconciliation, a managed solution we prefer. Both KRM and Terraform offer templating, but KCC’s managed nature aligns with our shift to Google Cloud-native solutions and reduces our maintenance burden.
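The reconciliation idea can be illustrated with a toy, in-memory sketch: compare the declared (desired) state with the live (actual) state and compute the actions that close the gap. This is only a conceptual model, not how Config Connector or Config Sync is implemented; the `reconcile` function and the resource names are hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Bring `actual` in line with `desired` and return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            # Covers drift, e.g. an out-of-band change made by hand.
            actions.append(("update", name))
        else:
            continue
        actual[name] = dict(spec)
    # Anything live but no longer declared gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("delete", name))
        del actual[name]
    return actions


desired = {"bucket-a": {"versioning": True}, "bucket-b": {"versioning": False}}
actual = {"bucket-a": {"versioning": False}, "bucket-c": {"versioning": True}}

first_pass = reconcile(desired, actual)
second_pass = reconcile(desired, actual)
print(first_pass)
print(second_pass)
```

The second pass returns no actions because the first pass already converged the live state. A managed reconciler runs this kind of comparison continuously, so drift is corrected without anyone having to trigger a plan or apply step.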
Infrastructure complexity requires generalization regardless of the tool. We can see this when we look at the Spanner requirements at Waze:
Consistent backups for all Spanner databases.
Each Spanner database utilizes a dedicated Cloud Storage bucket and Service Account to automate the execution of DDL jobs.
All IAM policies for Spanner instances, databases, and Cloud Storage buckets are defined in code to ensure consistent and auditable access control.
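A rough sketch of how such requirements can be bundled: one function takes a few simplified inputs and returns the related Config Connector objects for a Spanner database, its dedicated bucket, its service account, and an IAM binding between them. This is an illustration only, not Waze's actual Chart. The naming scheme, the chosen role, and all input values are assumptions, and backups are left out because no particular resource kind is assumed for them here.

```python
def render_spanner_bundle(db_name: str, instance: str, project_id: str) -> list:
    """Render the KRM objects that accompany one Spanner database (hypothetical layout)."""
    sa_name = f"{db_name}-ddl"          # hypothetical naming convention
    bucket_name = f"{project_id}-{db_name}-ddl"
    annotations = {"cnrm.cloud.google.com/project-id": project_id}

    def obj(api_group: str, kind: str, name: str, spec: dict) -> dict:
        return {
            "apiVersion": f"{api_group}.cnrm.cloud.google.com/v1beta1",
            "kind": kind,
            "metadata": {"name": name, "annotations": annotations},
            "spec": spec,
        }

    return [
        obj("spanner", "SpannerDatabase", db_name, {"instanceRef": {"name": instance}}),
        obj("storage", "StorageBucket", bucket_name, {"location": "US"}),
        obj("iam", "IAMServiceAccount", sa_name, {"displayName": f"DDL jobs for {db_name}"}),
        # Grants the DDL service account access to its dedicated bucket.
        obj("iam", "IAMPolicyMember", f"{sa_name}-bucket-access", {
            "member": f"serviceAccount:{sa_name}@{project_id}.iam.gserviceaccount.com",
            "role": "roles/storage.objectAdmin",
            "resourceRef": {"kind": "StorageBucket", "name": bucket_name},
        }),
    ]


# Hypothetical inputs an infrastructure consumer might supply.
bundle = render_spanner_bundle("orders", "example-instance", "example-project")
for item in bundle:
    print(item["kind"], item["metadata"]["name"])
```

The point of the sketch is the shape of the abstraction: the consumer supplies three values, and the access-control and naming decisions live in one reviewed place.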
To define these resources, we evaluated various templating and rendering tools and selected Helm, a robust CNCF package manager for Kubernetes. Its strong open-source community, rich templating capabilities, and native rendering features made it a natural fit. We can now refer to our bundled infrastructure configurations as ‘Charts.’ While KRO (Kube Resource Orchestrator) has since emerged to serve a similar purpose, our selection process predated its availability.
Under the hood
Let’s open the hood and dive into how the system works and is driving value for Waze.
Waze infrastructure owners generically define Waze-flavored infrastructure in Helm Charts.
Infrastructure consumers use these Charts with simplified inputs to generate infrastructure (demo).
Infrastructure code is stored in repositories, enabling validation and presubmit checks.
Code is uploaded to Artifact Registry, where Config Sync and Config Connector align Google Cloud infrastructure with the code definitions.
This diagram represents a single “data domain,” a collection of bounded services, databases, networks, and data. Many tech orgs today have several such domains, such as Prod, QA, Staging, and Development.
Approaching our destination
So why does all of this matter? Adopting this approach allowed us to move from Infrastructure as Code to Infrastructure as Software. By treating each Chart as a software component, our infrastructure management goes beyond simple code declaration. Now, versioned Charts and configurations enable us to leverage a rich ecosystem of software practices, including sophisticated release management, automated rollbacks, and granular change tracking.
Here’s where we apply this in practice: our configuration inheritance model minimizes redundancy. Resource Charts inherit settings from Projects, which inherit from Bootstraps. All three are defined as Charts. Consequently, Bootstrap configurations apply to all Projects, and Project configurations apply to all Resources.
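The inheritance chain can be sketched as a layered merge of settings, where each level starts from its parent's values. The example below is a minimal illustration; the `deep_merge` helper, the setting names, and the rule that the more specific level wins on conflict are assumptions, since the text only states that settings are inherited.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` is layered on top of `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested maps
        else:
            merged[key] = value  # assumed: the more specific level wins
    return merged


# Hypothetical settings at each level of the chain.
bootstrap = {"labels": {"org": "example"}, "auditLogging": True}
project = {"labels": {"team": "maps"}, "region": "us-central1"}
resource = {"labels": {"app": "orders"}, "region": "europe-west1"}

effective = deep_merge(deep_merge(bootstrap, project), resource)
print(effective)
```

Here the resource ends up with the audit setting from the Bootstrap level and the labels of all three levels, without repeating any of them in its own definition.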
Every change to our infrastructure – from changes on existing infrastructure to rolling out new resource types – can be treated like a software rollout.
Now that all of our infrastructure is treated like software, we can see what this does for us system-wide:
Reaching our destination
In summary, Config Connector and Config Controller have enabled Waze to achieve true Infrastructure as Software, providing a robust and scalable platform for our infrastructure needs, along with many other benefits including:
Infrastructure consumers receive the latest best practices through versioned updates.
Infrastructure owners can iterate and improve infrastructure safely.
Platform engineers and security teams are confident our resources are auditable and compliant.
Config Connector leverages Google’s managed services, reducing operational overhead.
AI Summary and Description: Yes
Summary: The transition of Waze to Infrastructure as Code (IaC) utilizing Google Cloud’s Config Connector represents a significant advancement in their infrastructure management. This shift not only enhances operational efficiency but also aligns with best practices in security and governance, addressing previous inconsistencies experienced with Terraform.
Detailed Description:
Waze’s engineering team shifted toward Infrastructure as Code (IaC) using Google Cloud’s Config Connector to transform their resource management practices. This transition has yielded several advantages in terms of operational simplicity, governance, and security compliance.
Key points include:
– **Background Context**: Waze initially relied on Terraform for managing dual-cloud, VM-based resources. However, challenges such as state drift and configuration inconsistencies led to heavier management overhead.
– **Adoption of Config Connector**: By leveraging Config Connector within Google Kubernetes Engine (GKE), Waze represented its Google Cloud infrastructure as Kubernetes Resource Model (KRM) objects. This addresses the reconciliation issues Waze previously faced with Terraform.
– **Automation and Standardization**: The use of Config Sync automates synchronization between code repositories and live GKE clusters, thus minimizing the need for custom reconciliation systems.
– **Impact on Roles**:
  – **Infrastructure Consumers**: Can deploy infrastructure easily, without dealing with the complexity of the underlying resources.
  – **Infrastructure Owners**: Focus on defining and standardizing best practices across resources.
  – **Platform Engineers**: Build the system that lets infrastructure owners codify best practices, and provide a seamless API for infrastructure consumers.
– **Advantages of Config Connector**:
  – **Reduction of Management Overhead**: The managed solution means less maintenance and more focus on policy enforcement.
  – **Enhanced Governance**: By defining IAM policies in code, Waze ensures consistency in access control, thus improving auditability.
– **Choice of Helm for Templating**: Helm was selected for its robust capabilities, facilitating the bundling of infrastructure configurations into ‘Charts’, which streamline deployment processes.
– **Infrastructure as Software**: The transformation enables treating infrastructure changes similarly to software rollouts, enhancing version control, release management, and tracking of changes.
– **Overall Benefits**:
  – Continuous incorporation of best practices through versioning
  – Safer iteration and improvement of infrastructure
  – Assurance to security teams of compliant and auditable resources
  – Leverage of Google-managed services to further minimize operational complexities
As a result of this transition, Waze not only advances its technical capabilities but also enhances its compliance posture and governance frameworks, which are vital considerations in today’s infrastructure and security landscape.