Streamlining Terraform Refactoring: Harnessing AI Agents for Production-Ready IaC
Converting existing infrastructure into Terraform code, often a necessity for managing complex environments, can frequently lead to a chaotic and unmaintainable codebase. While tools like Terraformer excel at generating initial Terraform manifests from existing resources, the output is notoriously messy, laden with superfluous details, inconsistent naming conventions, and hard-coded values. Traditionally, developers would spend countless hours manually refactoring this autogenerated code to meet production standards. However, a novel approach leverages a team of AI agents to automate this tedious cleanup and refactoring process, transforming raw Terraformer output into clean, efficient, and maintainable Infrastructure as Code (IaC).
The Challenge of Autogenerated Terraform
The core problem lies in the nature of direct infrastructure exports. Terraformer, while powerful, produces Terraform configurations with several significant drawbacks:
* Inconsistent Naming: Resources often feature awkward names with double dashes, mixed cases, and embedded IDs, deviating from best practices like `snake_case`.
* Superfluous Attributes: Default or constructed values for attributes like `tags_all` and `region` are explicitly defined, bloating the code unnecessarily.
* Broken State Management: Generated `variables.tf`, `outputs.tf`, and `terraform.tfstate` files are often dysfunctional or reference remote states in peculiar ways.
* Static Dependencies: Resources frequently contain hard-coded IDs instead of implicit dependencies, making the code brittle and difficult to manage.
* Lack of Implicit Relationships: The generated code lacks the intelligent connections between resources that are essential for robust Terraform.
* Outdated Compatibility: State files might be generated for older Terraform versions, creating immediate upgrade hurdles.
These issues collectively render the autogenerated code unsuitable for direct use in production, necessitating extensive manual intervention.
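For illustration, a raw Terraformer export often looks something like the following. The resource and IDs here are hypothetical, invented to show the patterns described above, not taken from a real export:

```hcl
# Illustrative raw Terraformer output: awkward double-dash naming,
# a redundant tags_all block duplicating tags, and a hard-coded
# vpc_id instead of a reference to the generated VPC resource.
resource "aws_subnet" "tfer--subnet-0a1b2c3d" {
  vpc_id     = "vpc-0123456789abcdef0" # static ID, no implicit dependency
  cidr_block = "10.0.1.0/24"

  tags = {
    Name = "app-subnet"
  }

  tags_all = { # provider-computed; safe to remove
    Name = "app-subnet"
  }
}
```

Every one of these blemishes must be addressed before the code can be treated as a production source of truth.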
An AI-Driven Solution: Docker’s Cagent and Terraform MCP Server
To tackle this challenge, an innovative workflow employs a team of AI agents orchestrated by Docker’s `cagent` tool, working in tandem with the HashiCorp `terraform-mcp-server`. This combination offers a no-code/low-code solution for complex refactoring tasks, blending the deterministic precision of scripts with the non-deterministic reasoning of Large Language Models (LLMs).
The Multi-Agent Workflow:
The solution is structured as a sequential multi-agent pipeline, where each agent specializes in a distinct aspect of the cleanup process:
- Root Agent: The orchestrator, managing the flow between specialized sub-agents.
- Cleaner Agent: Performs initial sanitization tasks, including:
  - Removing `tags_all` and `region` attributes.
  - Replacing double dashes in resource names.
  - Deleting unnecessary `terraform.tfstate`, `outputs.tf.json`, and `variables.tf.json` files, and `.terraform` directories.
  - Updating references to remote state outputs with local equivalents.
  - Crucially, using the `terraform-mcp-server` to identify and remove resource attributes that are defined with default values, leveraging provider documentation for accuracy.
- Connecter Agent: Focuses on establishing implicit dependencies. It intelligently identifies static attribute values (e.g., `vpc_id`, `subnet_id`, `iam_role_arn`) that correspond to other generated resources and converts them into Terraform expressions (e.g., `${aws_vpc.tfer_vpc-id.id}`). This step is vital for creating truly interconnected and manageable IaC. The `terraform-mcp-server` is instrumental here for looking up resource output attributes.
- Importer Agent: Generates Terraform `import` blocks for all resources. It queries the `terraform-mcp-server` to determine if a resource supports import and, if so, constructs the appropriate import block within a new `imports.tf.json` file. Resources that cannot be imported are removed from the codebase.
- Finalizer Agent: Acts as the quality assurance layer. It reviews all changes, makes final adjustments based on Terraform best practices, and ensures the `imports.tf.json` file is correctly formatted. A key feature of this agent is its ability to convert the cleaned JSON Terraform into standard HCL format and even rename resources to be more descriptive based on its understanding of the environment and style guidelines.
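To make the pipeline's end state concrete, here is a hand-written sketch of what the subnet above might look like after the agents have run. The renamed identifiers and the subnet ID are assumptions for illustration; the workflow described here writes imports to `imports.tf.json`, while the equivalent HCL `import` block (Terraform 1.5+) is shown for readability:

```hcl
# After cleanup (illustrative): snake_case name, no tags_all, and the
# static vpc_id replaced with an implicit dependency on the VPC resource.
resource "aws_subnet" "app_subnet" {
  vpc_id     = aws_vpc.main.id # implicit dependency, not a raw ID
  cidr_block = "10.0.1.0/24"

  tags = {
    Name = "app-subnet"
  }
}

# Import block tying the refactored resource back to the real
# infrastructure, so `terraform plan` shows no spurious creations.
import {
  to = aws_subnet.app_subnet
  id = "subnet-0a1b2c3d4e5f67890"
}
```

With both pieces in place, running `terraform plan` against the cleaned code should report the resources as imports rather than new creations, which is the practical test that the refactoring preserved the environment.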
Practical Outcomes and Key Insights
Implementing this AI-driven refactoring against a substantial AWS network deployment yielded impressive results, achieving approximately 95% accuracy. While validation of the output remains critical (as LLM outputs are non-deterministic and can miss minor details), the significant reduction in manual effort is undeniable.
Several valuable lessons emerged from this endeavor:
- Frontier Models are Superior: The choice of LLM model profoundly impacts performance. High-quality “frontier models” demonstrate significantly better tool-calling capabilities and consistent results compared to less mature local models, making them a worthwhile investment.
- Embrace Mixed Solutions: The most effective solutions combine deterministic scripts for predictable tasks (e.g., removing specific attributes, replacing characters) with non-deterministic AI agents for complex, nuanced decisions (e.g., identifying implicit dependencies, refactoring based on style guidelines). This hybrid approach maximizes both efficiency and accuracy.
- The Power of MCP: The `terraform-mcp-server` proved indispensable. Its ability to provide schema information and look up resource details enabled agents to make informed, provider-aware decisions, a capability often lacking in generic LLM applications.
- Unexpected Agent Capabilities: Beyond their core instructions, agents, particularly the Finalizer, demonstrated surprising capabilities such as inferring more descriptive resource names and seamlessly converting JSON to HCL, adding significant value beyond initial expectations.
- Documentation Challenges with New Tools: While powerful, new open-source tools like `cagent` may have evolving or incomplete documentation, requiring developers to occasionally delve into examples or the codebase itself to uncover advanced features.
Conclusion
Automating the cleanup and refactoring of Terraformer output using AI agentic workflows represents a significant leap forward in managing infrastructure as code. By strategically deploying tools like Docker’s `cagent` and the `terraform-mcp-server`, organizations can transform a cumbersome, error-prone manual process into an efficient, largely automated pipeline. This not only saves considerable time and resources for DevOps teams but also leads to more robust, maintainable, and production-ready Terraform configurations, ultimately enhancing infrastructure agility and reliability. Always remember to validate the AI-generated output, but with the right approach, AI-powered refactoring can become an invaluable asset in your IaC toolkit.