Terraform Azure: secure a private state backend without breaking CI
Design an Azure Terraform backend based on a private Storage Account with CI identity, controlled network access, locking, separate bootstrap, and a diagnostic runbook when init or plan fails.
The Terraform state backend is a critical dependency that mostly gets noticed when it fails. As long as terraform init works, the topic feels settled. The day a CI runner can no longer reach the Storage Account, a lock stays stuck, or a network rule cuts off the team, the state becomes the single gate in front of the whole infrastructure.
The scenario here is an Azure platform managed by Terraform, with an azurerm backend stored in a Storage Account. The objective is to reduce state exposure without making deployments impossible: identity-based access, controlled network path, separate bootstrap, clear diagnosis, and recovery procedure.
Treat state as an operational resource
The state file contains resource identifiers, dependencies, and sometimes sensitive values depending on how modules are written. It should not be treated as a simple technical artifact. You need to know who can read it, who can modify it, who can remove a lock, and from which networks these actions are possible.
Questions to answer
Where is the state stored?
Which CI identity can read and write it?
Which people can intervene in an emergency?
From which networks is access allowed?
How is a stuck lock removed without breaking history?
How is a previous version restored if necessary?
This clarification avoids two extremes: a Storage Account opened too broadly so everything works, or a backend so locked down that every network change breaks deployments.
Separate bootstrap from managed infrastructure
The Terraform backend should not depend entirely on the code that uses it. If the state Storage Account, its container, network rules, and role assignments are all created from the same state they store, a single mistake can make the stack hard to recover. It is better to separate backend bootstrap from application stacks.
Backend bootstrap
State resource group
Storage Account
State container
Private Endpoint if used
Minimal roles for the CI identity
Documented recovery procedure
Application Terraform stacks
Networks, services, identities, applications
Consume the existing backend
Do not recreate the state foundation at every plan
The bootstrap can remain very simple. The important part is that it is understandable and recoverable. A team must be able to rebuild or verify the backend without launching the full application infrastructure.
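One way an application stack can consume an existing backend without redefining it is Terraform's partial backend configuration: the stack declares an empty azurerm backend block and the pipeline supplies the values with terraform init -backend-config=<file>. The following is a minimal sketch that renders such a file. The helper render_backend_config and the names rg-tfstate, sttfstateprod, tfstate, and network.tfstate are hypothetical; the keys are azurerm backend settings that should be checked against the Terraform version in use.

```python
from typing import Mapping, Union

Value = Union[str, bool]


def render_backend_config(settings: Mapping[str, Value]) -> str:
    """Render key/value pairs as text for `terraform init -backend-config=<file>`."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Booleans are written unquoted: true / false.
            rendered = "true" if value else "false"
        else:
            # Strings are double-quoted, with backslashes and quotes escaped.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines) + "\n"


# Hypothetical names: replace with the resources created by the bootstrap.
BACKEND_SETTINGS = {
    "resource_group_name": "rg-tfstate",
    "storage_account_name": "sttfstateprod",
    "container_name": "tfstate",
    "key": "network.tfstate",
    "use_azuread_auth": True,  # identity-based access instead of an account key
}

if __name__ == "__main__":
    print(render_backend_config(BACKEND_SETTINGS), end="")
```

Keeping these values in a small file owned by the bootstrap means application stacks only reference the backend and never recreate it.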
Use a dedicated CI identity
Access to the state should not rely on a shared secret or a personal account. A dedicated CI identity, with OIDC federation when possible, makes access easier to audit. It should have the rights it needs on the state container, and no more.
Terraform CI identity
Read the state
Write the state
Manage the lock
Read resources required for plan
Apply only in expected scopes
Avoid
Global administrator account
Long-lived unrotated secret
Same identity for every platform
Default Owner rights without justification
This separation also makes reviews easier. When a Terraform change writes to the state, the action should be tied to a pipeline and a readable identity, not to a key copied across several environments.
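A review of the CI identity's rights can be partly automated. The sketch below assumes the role assignments have already been exported into a simplified, hypothetical shape (a list of dictionaries with role and scope keys) and flags two situations from the list above: broad roles, and blob data roles granted at a wider scope than the state container. The set of roles treated as broad is an assumption to adapt to the team's policy.

```python
from typing import Dict, List

# Assumption: roles this team wants justified explicitly for a CI identity.
BROAD_ROLES = {"Owner", "User Access Administrator"}


def review_assignments(
    assignments: List[Dict[str, str]], state_container_scope: str
) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    target = state_container_scope.lower().rstrip("/")
    for item in assignments:
        role = item["role"]
        scope = item["scope"].lower().rstrip("/")
        if role in BROAD_ROLES:
            findings.append(f"{role} on {item['scope']}: needs explicit justification")
        elif role.startswith("Storage Blob Data") and scope != target:
            # Data-plane access to state should stop at the state container.
            findings.append(f"{role} on {item['scope']}: expected the state container scope")
    return findings
```

Roles used to apply changes in the expected scopes are deliberately left alone here; the check only concerns access to the state and obviously broad grants.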
Lock down the network without forgetting runners
Moving the Storage Account to private access is often desirable, but it changes the CI path. If runners are hosted outside the private network, terraform init will no longer reach the backend. Decide where Terraform runs: runner in Azure, runner connected to the private network, or tightly controlled network exception.
Option A: runner in an authorized Azure network
Access to the Storage Account through Private Endpoint
privatelink DNS validated from the runner
Robust model for sensitive environments
Option B: external runner with limited public access
Strict network rules
Strong identity mandatory
Simpler, but depends on stable source addresses
Option C: temporary maintenance runner
Used for recovery or bootstrap
Short access, traced and disabled after intervention
The critical point is DNS. With a Private Endpoint, the runner must resolve the Storage Account name to the expected private address. Otherwise, the problem will look like a Terraform error while it is actually a resolution or network path issue.
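The DNS point can be verified from the runner itself before Terraform is involved. The following sketch, using only the Python standard library, resolves the blob endpoint and reports whether every returned address is private. The account name sttfstateprod is hypothetical, and the check only says that the addresses are private, not that they match the expected Private Endpoint address.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve_addresses(hostname: str, port: int = 443) -> List[str]:
    """Return the distinct IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: Iterable[str]) -> bool:
    """True only if at least one address was given and every one is private."""
    parsed = [ipaddress.ip_address(address) for address in addresses]
    return bool(parsed) and all(ip.is_private for ip in parsed)


if __name__ == "__main__":
    host = "sttfstateprod.blob.core.windows.net"  # hypothetical account name
    try:
        found = resolve_addresses(host)
    except socket.gaierror as exc:
        print(f"DNS resolution failed for {host}: {exc}")
    else:
        verdict = "private" if all_private(found) else "public or mixed"
        print(f"{host} -> {found} ({verdict})")
```

Running this on the runner, rather than on a workstation, is what makes the result meaningful: the resolver and the network path are the runner's.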
Make Terraform errors readable
A terraform init or terraform plan failure can come from the backend, identity, network, a lock, or the Azure provider. The runbook should isolate the layer before changing rights or network rules.
Error: cannot reach the backend
Check Storage Account DNS from the runner
Check route, firewall, and Private Endpoint
Test HTTPS access to the blob endpoint
Error: authorization failed
Check the CI identity role on the expected scope
Check the OIDC federation or the secret in use
Check target subscription and tenant
Error: state lock
Identify the pipeline or user owning the lock
Check whether an apply is still running
Remove the lock only after validation
This grid avoids dangerous fixes, such as adding global rights for a DNS problem or removing a lock while an apply is still active.
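The grid above can be turned into a first-pass triage helper for pipeline logs. The sketch below maps a captured error output to one of the three layers. The substrings are illustrative assumptions, not an exhaustive or guaranteed list of messages, and should be adjusted to what the pipelines actually print.

```python
from typing import Dict, List, Tuple

# Illustrative substrings (assumptions), checked in order, lowercase.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("lock", ("error acquiring the state lock", "already locked")),
    ("network", ("no such host", "i/o timeout", "connection refused")),
    (
        "authorization",
        ("authorizationpermissionmismatch", "authorizationfailure", "aadsts", "status code 403"),
    ),
]

# First action per layer, taken from the runbook grid.
FIRST_CHECK: Dict[str, str] = {
    "lock": "Identify the pipeline or user owning the lock",
    "network": "Check Storage Account DNS from the runner",
    "authorization": "Check the CI identity role on the expected scope",
    "unknown": "Read the full output before changing rights or network rules",
}


def classify(output: str) -> str:
    """Return 'lock', 'network', 'authorization', or 'unknown' for an error output."""
    text = output.lower()
    for layer, needles in RULES:
        if any(needle in text for needle in needles):
            return layer
    return "unknown"
```

A result of authorization is a starting point rather than a verdict: a 403 can also be produced by a storage firewall rule, so it is worth confirming the network path before touching role assignments.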
Plan locking and recovery
The lock protects against concurrent writes. It should not be bypassed by habit. However, you need to know what to do after a pipeline interruption, a stopped runner, or a network error at the wrong time. The recovery procedure must be short and strict.
Stuck lock procedure
1. Identify the lock ID and creation time
2. Check the associated pipeline
3. Confirm that no apply is still running
4. Notify the team involved
5. Remove the lock with terraform force-unlock <LOCK_ID>
6. Run a plan before any apply
The last point matters. After intervening on the state, do not jump directly into a destructive change. A plan verifies that Terraform still understands the real state.
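The procedure above can be encoded as a guard that refuses to produce the unlock command until the earlier steps have been confirmed. This is a sketch under stated assumptions: the LockContext fields are hypothetical flags filled in by the operator, and the function only builds the terraform force-unlock command line without running it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LockContext:
    lock_id: str               # step 1: lock ID read from the error output
    pipeline_identified: bool  # step 2: associated pipeline checked
    apply_still_running: bool  # step 3: must be False before unlocking
    team_notified: bool        # step 4: team involved has been informed


def unlock_command(ctx: LockContext) -> List[str]:
    """Return the unlock command, or raise ValueError if a step is missing."""
    if not ctx.lock_id:
        raise ValueError("lock ID is required")
    if not ctx.pipeline_identified:
        raise ValueError("associated pipeline has not been checked")
    if ctx.apply_still_running:
        raise ValueError("an apply is still running; do not remove the lock")
    if not ctx.team_notified:
        raise ValueError("team has not been notified")
    return ["terraform", "force-unlock", ctx.lock_id]


# Step 6: always follow an unlock with a plan, never directly with an apply.
FOLLOW_UP_COMMAND = ["terraform", "plan"]
```

Returning the command instead of executing it keeps a human in the loop for the only irreversible step.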
Keep simple but regular controls
A secured state backend can drift: role removed, private DNS broken, runner moved, network rule changed, versioning disabled, forgotten maintenance access. A few periodic checks are enough to avoid surprises.
Periodic controls
terraform init from the main runner
Storage Account DNS resolution from the runner
Active roles on the Storage Account and container
Public network disabled or limited depending on the selected model
Versioning and delete protection verified
Recovery procedure tested by another person
These controls do not replace monitoring, but they provide usable proof. The backend is not only present: it remains reachable through the right paths and closed to unintended paths.
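Some of these controls can be evaluated from exported Storage Account properties. The sketch below assumes two dictionaries whose keys (publicNetworkAccess, networkRuleSet.defaultAction, isVersioningEnabled, deleteRetentionPolicy.enabled) mirror the JSON returned by Azure tooling for the account and its blob service. Those key names are an assumption to verify against real output before relying on the result.

```python
from typing import Any, Dict, List


def check_storage_controls(
    account: Dict[str, Any],
    blob_service: Dict[str, Any],
    allow_limited_public: bool = False,
) -> List[str]:
    """Return a list of problems; an empty list means the checked controls hold."""
    problems = []
    public = account.get("publicNetworkAccess", "Enabled")
    default_action = (account.get("networkRuleSet") or {}).get("defaultAction", "Allow")
    if public != "Disabled":
        if not allow_limited_public:
            problems.append("public network access is not disabled")
        elif default_action != "Deny":
            # Limited public model: only explicit network rules should allow access.
            problems.append("public access is enabled without a default Deny rule")
    if not blob_service.get("isVersioningEnabled", False):
        problems.append("blob versioning is disabled")
    if not (blob_service.get("deleteRetentionPolicy") or {}).get("enabled", False):
        problems.append("blob delete protection is disabled")
    return problems
```

The allow_limited_public flag corresponds to the external runner model with strict network rules; leave it at False for a backend reached only through a Private Endpoint.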
Conclusion
Securing an Azure Terraform backend is not just creating a private Storage Account. Think through the full cycle: bootstrap, CI identity, network path, DNS, locking, recovery, and periodic validation. The state is an operational resource, not a pipeline detail.
A healthy base is to isolate the backend foundation, limit CI identity rights, explicitly choose where runners execute, then document init, plan, and lock diagnostics. With this discipline, the backend remains protected without becoming the blocker for every infrastructure change.