An Elegant, Automated Infrastructure Factory
Every platform engineer has a dream: a well-oiled machine where developers can provision the infrastructure they need on their own, without creating trouble or urgent work for the platform team. That infrastructure should comply with security policies, be consistent, and be observable.
My initial goal was to build that machine. I wanted to create a Proof of Concept (PoC) of such a platform, document my findings in an article, and share it publicly. I also wanted to learn Backstage, a modern open-source project that seemed well documented and straightforward to try.
The premise was an elegant architecture to solve the “WET” (Write Everything Twice) problem. As teams grow, many organizations see their Infrastructure as Code (IaC) degrade into a mess of duplicated Terraform modules, where even a simple change means copying and pasting the same edit into many places, with a high chance of error.
My solution was a marriage of Developer Experience and Platform Engineering, combining two key parts:
- The Portal: A simple UI for developers to request infrastructure, like S3 buckets or RDS instances.
- The IaC Orchestration: An automated backend driven by Terragrunt and managed by the Portal.
Here is what it was supposed to look like:
Then, I tried to build it. And that’s when the article I intended to write died, and this one was born.
Backstage is a Framework, Not a Product
I scoped the plan to three goals:
- Get a basic Backstage instance running on my local machine.
- Connect it to a Postgres database.
- Build a simple Scaffolder template.
But this “simple” plan turned out to be incredibly complicated. Every small step hid a dozen other tricky steps inside it. Even before writing a single line of a Scaffolder template, I had to debug the Node.js backend, wrestle with authentication providers, build a Docker container, and get Backstage itself running.
I stopped. The article I wanted to write was about the synergy between a developer portal and Terragrunt. Instead, I found myself on a long, frustrating detour toward becoming a full-time Backstage administrator.
Backstage has a lot of promise. It can be a single, central hub for all of a company’s engineering tools, and you can customize it to do almost anything. But implementing it is a full software project in its own right, one that requires a dedicated team of engineers skilled in React, TypeScript, and Node.js. Its Total Cost of Ownership (TCO) is not the license fee (it’s free), but the significant, ongoing investment in dedicated engineering talent.
Why Terragrunt Still Wins
Even though the portal half of the plan stalled, the automated backend design is still the right approach, and it matters more than ever. The complexity of the portal does not invalidate the need for a DRY orchestration layer.
Regardless of whether requests come from a homegrown Backstage instance, a managed internal developer portal (IDP), or even a CI/CD job, Terragrunt is the key to solving the WET problem and ensuring long-term maintainability.
The Key Terragrunt Patterns:
- DRY Configuration: Define common settings (like backend state, provider versions, and default tags) once in a root terragrunt.hcl file. Child configurations inherit these settings through an include block, eliminating code duplication.
- Automated State Management: Terragrunt can automatically create the remote state backend (e.g., an S3 bucket and a DynamoDB lock table) if it does not exist yet, and it gives every module its own state key, so state files stay isolated and correctly managed without manual intervention.
- Immutable, Versioned Modules: The factory model separates reusable Terraform modules (the “blueprints”) from the live Terragrunt configurations (the “instantiations”). This lets you version your modules and roll out updates progressively by simply changing a ref tag.
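To make the first two patterns concrete, here is a minimal sketch of what a root terragrunt.hcl might contain. It assumes a single AWS region and modules that do not declare their own provider block. The bucket name, lock table name, and region are hypothetical placeholders. Newer Terragrunt releases recommend naming this root file root.hcl and including it with find_in_parent_folders("root.hcl"). The sketch keeps the terragrunt.hcl name used in this article.

```hcl
# Root terragrunt.hcl: a minimal sketch, not a production configuration.
# Bucket, table, and region values are hypothetical placeholders.
locals {
  aws_region = "eu-west-1" # assumption: a single AWS region
}

# Backend state, defined once. Terragrunt writes backend.tf into every child
# module and can create the bucket and lock table if they do not exist yet.
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "your-org-terraform-state" # hypothetical bucket name
    key            = "${path_relative_to_include()}/terraform.tfstate" # one state key per module path
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "your-org-terraform-locks" # hypothetical lock table name
  }
}

# Provider and default tags, defined once and generated into every module.
# Assumes the modules themselves do not declare a provider "aws" block.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"

  default_tags {
    tags = {
      ManagedBy = "terragrunt"
    }
  }
}
EOF
}
```

With this root in place, a child configuration only needs an include "root" block to inherit the backend and provider settings, so none of it is copied per service.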
Here’s a glimpse of the elegance. Instead of a 100-line .tf file, the output of the automation is this small, declarative terragrunt.hcl file:
```hcl
# terragrunt.hcl for a specific service's RDS database
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::ssh://git@github.com/your-org/terraform-modules.git//aws/rds?ref=v1.2.1"
}

inputs = {
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  db_name           = "my_app_database" # RDS rejects hyphens in database names
  environment       = "staging"
}
```
This is the fundamental trade-off: a template that generates raw .tf files is faster on Day 1, but it creates a nightmare of maintenance debt on Day 2. The Terragrunt approach creates a contract, not a copy. Once the consuming configurations point at v1.2.2 of the central rds module, the upgrade can be rolled out across hundreds of services in a controlled, automated fashion with terragrunt run --all apply. That is the power of a true factory.
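One way to make that rollout progressive, instead of editing the ref in every service, is to pin the module version once per environment. The sketch below assumes a hypothetical env.hcl file in each environment directory (live/staging/env.hcl and live/prod/env.hcl) and a hypothetical local named rds_module_version. The repository URL is a placeholder. The three files are shown in one block, separated by comments.

```hcl
# live/staging/env.hcl (hypothetical): staging receives the new release first
locals {
  rds_module_version = "v1.2.2"
}

# live/prod/env.hcl (hypothetical): prod stays on the previous release
# until the staging rollout has been verified
locals {
  rds_module_version = "v1.2.1"
}

# live/<env>/my-app/rds/terragrunt.hcl (hypothetical path): the service reads
# the version from the nearest env.hcl instead of hard-coding the ref
include "root" {
  path = find_in_parent_folders()
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}

terraform {
  source = "git::ssh://git@github.com/your-org/terraform-modules.git//aws/rds?ref=${local.env_vars.locals.rds_module_version}"
}

# inputs omitted for brevity
```

Under this layout, an upgrade becomes a one-line change per environment: bump the version in live/staging/env.hcl, run terragrunt run --all apply from live/staging, and repeat for live/prod once staging is healthy. The trade-off is coarser control; a service that must lag behind can still hard-code its own ref.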
Conclusion
I set out to build a factory and instead discovered the high cost of building my own tooling. The experience wasn’t a failure, but a refinement of the blueprint. The core principles of DRY IaC and central orchestration with Terragrunt are more critical than ever. However, the path to a developer-friendly frontend is not one-size-fits-all.
Before you commit to building your developer portal from the ground up with a powerful framework like Backstage, take a hard look at the total cost of ownership. The most elegant architecture is useless if it never gets built. Sometimes, the most pragmatic solution is to buy the storefront and focus your energy on perfecting the factory floor.
Resources
- Backstage: Official Documentation – The open-source framework for building developer portals.
- Terragrunt: Official Website – A thin wrapper for Terraform that provides extra tools for keeping your configurations DRY.
