Terraform 301 — The 360° view

Parts 1 and 2 taught the components: files, modules, environments, SOPs. This part shows how they connect. Five diagrams, in order: zoom out, then drill in. Read them top to bottom; each one assumes you've understood the one before.

Companion to Part 1 and Part 2.

01 System map — the seven actors

Question this answers: "When I make an infra change, who/what is involved, and what does each actor own?"

There are seven actors in this story. Each owns a different thing. Most "weird bugs" are really one of them disagreeing with another.

The seven actors that touch one Terraform change:

1. Engineer: writes code, runs plan locally, opens the PR. Owns: intent.
2. Git repo: main + branches, PRs, reviews, CODEOWNERS. Owns: the history of intent.
3. CI bot: fmt / validate, plan posted as a PR comment, tflint, tfsec. Owns: validation.
4. CD pipeline: terraform apply on merge to main, env by env. Owns: applying changes.
5. State (S3 + DynamoDB): terraform.tfstate plus a DynamoDB lock, one per env. Owns: recorded reality.
6. AWS API + reality: VPCs, EC2, RDS… runs your workloads; visible in the console. Owns: live infrastructure.
7. Customers: consume your service; never see Terraform but feel every outage. Own: the SLA you keep.

Two more pieces sit around the actors. The IAM trust boundary (SSO → engineer role, engineer role → TerraformDeploy, CD → OIDC role) enforces who can do what. The ticket / charter (JIRA / ServiceNow, approval + ID) starts every change and closes after verification.

Read this map as a story: a ticket arrives and is assigned. The engineer turns intent into code (1) and pushes a branch / PR to Git (2). A webhook fires CI (3), which checks the change and posts a plan comment back to the PR. After approval and merge to main, CD (4) applies: it reads, locks and writes state (5) and reconciles AWS (6), refreshing state from AWS along the way. Customers (7) are served by every step. Every arrow crosses an IAM boundary, which is why credential bugs are so common.
· Most "drift" is actor 6 (AWS) and actor 5 (state) disagreeing.
· Most "merge conflicts" are two engineers (1) writing into actor 2 (Git) at once.
· Most "permission denied" is the IAM boundary between two adjacent actors.
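These three failure classes each have a natural first command. A hedged sketch of that rule of thumb (the mapping is this article's heuristic, not a Terraform feature; the commands themselves are standard Terraform / git / AWS CLI, wrapped in a plain dispatch function so the logic can run without cloud access):

```shell
# Map a failure class to the first command worth running.
first_move() {
  case "$1" in
    drift)             echo "terraform plan -refresh-only" ;;  # does state still match AWS?
    merge-conflict)    echo "git rebase origin/main" ;;        # replay your intent on the latest history
    permission-denied) echo "aws sts get-caller-identity" ;;   # which principal am I, really?
    *)                 echo "start with: terraform plan" ;;
  esac
}
first_move permission-denied
```

`aws sts get-caller-identity` is worth running first because it tells you which side of the IAM boundary you are actually standing on.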

02 Trace one value through nine places

Question this answers: "I changed m6i.large to m6i.xlarge in one file. Where does that value live after the change?"

This is the diagram that makes everything click. The same value appears in nine places between your laptop and the running EC2 instance. Each place is owned by a different actor. Bugs hide in the gaps.

Tracing instance_type = "m6i.xlarge" through nine homes (actor / owner → where the value lives → exactly what it looks like):

1. Ticket / JIRA: the human-readable charter. "INFRA-2210: bump UAT app tier to m6i.xlarge for load test on 2026-05-12. Approver: anil.k. Rollback: revert PR."
2. envs/uat/uat.tfvars on the engineer's laptop: instance_type = "m6i.xlarge" # committed in the working directory; not yet shared.
3. Git branch on origin: infra-2210-uat-app-tier, commit 4f9a1c2 · "INFRA-2210: bump UAT app tier to m6i.xlarge". + instance_type = "m6i.xlarge" / - instance_type = "m6i.large".
4. Pull request: PR #491 (open). PR diff: ~ instance_type "m6i.large" -> "m6i.xlarge". Reviewers see exactly what you changed; nothing more.
5. CI plan comment, automated, on the PR: module.app.aws_launch_template.app will be updated in-place, ~ instance_type = "m6i.large" -> "m6i.xlarge" · Plan: 0 to add, 1 to change, 0 to destroy.
6. main on origin, after squash merge: commit a8b3f0d · "INFRA-2210: bump UAT app tier (#491)". git revert a8b3f0d is your rollback button forever.
7. CD apply log, in the CI/CD UI, the audit trail: module.app.aws_launch_template.app: Modifying... [id=lt-0a1b2c3d] … Modifications complete after 2s.
8. terraform.tfstate, in S3, KMS-encrypted: { "resources": [{ "type": "aws_launch_template", "instances": [{ "attributes": { "instance_type": "m6i.xlarge", ... } }] }] }
9. AWS Launch Template + EC2 instances, live: aws ec2 describe-launch-templates → InstanceType: m6i.xlarge. v$Latest plus an ASG instance refresh rolls 4 instances over 6-10 min; customers see no impact.

Ownership shifts as the value flows forward: first engineer-owned, then Git-owned (history), then automation-owned (CI/state), finally AWS-owned (reality).

Why this matters: bugs almost always live in the gap between two adjacent rows. Your local plan differs from the CI plan? You forgot to commit (gap 2 → 3). CI plan clean but apply fails? Provider/state mismatch (gap 7 → 8). State says xlarge but EC2 is still large? Drift (gap 8 → 9).
A drill for new engineers: when something breaks, ask "which row was last correct?" Walk down from row 1 until you find the first row with the wrong value. The gap between that row and the one before it is your bug.
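That drill is mechanical enough to sketch as a loop: walk the rows in order and report the first one whose value disagrees. The values below are mocked (in real life row 2 comes from your tfvars file, row 6 from git show, row 8 from terraform state show, row 9 from aws ec2 describe-launch-templates); only the row numbers match the table above.

```shell
# Walk the rows top to bottom; the first mismatch marks the buggy gap.
first_wrong_row() {
  expected="m6i.xlarge"
  # row:place:value triples, mocked; here reality (row 9) lags behind
  printf '%s\n' \
    '2:tfvars:m6i.xlarge' \
    '3:branch:m6i.xlarge' \
    '6:main:m6i.xlarge' \
    '8:state:m6i.xlarge' \
    '9:aws:m6i.large' |
  while IFS=: read -r num place value; do
    if [ "$value" != "$expected" ]; then
      echo "row $num ($place) is the first wrong row; the bug sits in the gap just above"
      break
    fi
  done
}
first_wrong_row
```

With these mock values the loop stops at row 9: state and code agree, reality does not, so the gap 8 → 9 (drift) is the bug.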

03 File cross-references — what feeds what

Question this answers: "When I edit one file, what other files am I implicitly affecting?"

Terraform files don't exist in isolation. They reference each other through specific identifiers. This graph shows the references; arrows point from consumer to producer — "I read from you."

"Who reads from whom" · arrows point from consumer to producer.

envs/uat/ · an environment root:
· variables.tf · declares inputs (no values): var.environment, var.vpc_cidr
· uat.tfvars · supplies values: vpc_cidr = "10.20.0.0/16" (TF_VAR_* env vars from CI/secrets can also supply them)
· main.tf · module calls + glue; refs var.x, local.x, data.x.y, module.z.w
· locals.tf · computed-internal: tags, naming, derived CIDRs from var.vpc_cidr
· data.tf · read-only AWS lookups: aws_caller_identity, AMIs
· outputs.tf · exports the surface: vpc_id, subnet_ids…
· providers.tf · region + assume_role + default_tags
· backend.tf · S3 bucket / key / DDB table; no variables allowed
· versions.tf · pins terraform & provider versions (read first)
· .terraform.lock.hcl · pins the exact provider versions used to apply. Commit this. Without it, two engineers can run with different providers.

modules/···/ · reusable components:
· modules/network · aws_vpc, aws_subnet, NAT
· modules/security · SG chain (alb → web → app → db)
· modules/iam · role + instance_profile
· modules/compute · launch_template + ASG
· modules/database · RDS Aurora cluster
Each module is itself a folder of variables.tf, main.tf, outputs.tf.

Outside the repo:
· terraform.tfstate · S3 bucket + DDB lock; recorded reality (JSON); written and locked on apply
· AWS API + reality · VPCs, EC2, RDS…; what's actually running; called via assume-role and refreshed into state each run

Cross-module reads flow through outputs: main.tf wires module.network.vpc_id into other modules, while providers.tf supplies the credential context for every AWS call.

Read this graph as four kinds of arrow

Arrow color | Means | Real example
orange | variable / value flow | main.tf reads var.vpc_cidr declared in variables.tf and supplied by uat.tfvars
purple | module composition | main.tf calls module "network" { source = "../../modules/network" ... }
green | output / cross-module read | module.network.vpc_id is consumed by module.security in the same root
red | runtime context (provider, backend, AWS) | providers.tf assumes the deploy role; the AWS API gets called on apply
Editing rules of thumb derived from this graph:
· Edit *.tfvars → affects only this env. Safe.
· Edit variables.tf → affects this env + every consumer. Mind defaults & validations.
· Edit a module → affects every env that uses it. Treat as breaking change.
· Edit backend.tf → you're moving state. Do not do this casually.
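These rules follow from "who reads from whom", and you can measure blast radius mechanically: before editing a variable or module, grep for its consumers. A hedged sketch against a throwaway mock layout (the paths are illustrative; run the same grep from your real repo root):

```shell
# Build a two-file mock of the graph above, then count consumers of vpc_cidr.
tmp=$(mktemp -d)
mkdir -p "$tmp/envs/uat" "$tmp/modules/network"
cat > "$tmp/envs/uat/main.tf" <<'EOF'
module "network" {
  source   = "../../modules/network"
  vpc_cidr = var.vpc_cidr
}
EOF
cat > "$tmp/modules/network/variables.tf" <<'EOF'
variable "vpc_cidr" {}
EOF
# Every file that mentions the name is affected by a change to it.
consumers=$(grep -rl 'vpc_cidr' "$tmp" | wc -l | tr -d ' ')
echo "files touched by a vpc_cidr change: $consumers"
rm -rf "$tmp"
```

Even in this tiny mock the count is 2: one env root plus one module, which is exactly why editing a module's variables.tf is never a single-file change.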

04 End-to-end timeline (swimlanes)

Question this answers: "What happens, in order, from ticket-assigned to apply-finished — and which actor does each step?"

Same actors as section 1, now arranged as horizontal lanes. Time flows left to right. Read left-to-right; cross from one lane into another every time the change moves to a new owner.

"INFRA-2210 · bump UAT app tier" · from ticket to running EC2 instance. Lanes: Engineer, Git (origin), CI, CD, State (S3+DDB), AWS. Rough clock: T+0 → T+15 min → T+30 min → T+50 min.

1. Engineer: read the ticket, confirm scope.
2. Engineer: run a baseline plan. No changes? OK.
3. Engineer: branch + edit; smallest change possible.
4. Engineer: fmt + plan; read every line.
5. Engineer: push + open the PR for review.
6. Git: PR opens; a webhook fires CI.
7. CI: fmt + validate, tflint, tfsec.
8. CI: plan; reads state in S3 (no lock taken for a plan).
9. CI: post the plan as a comment on the PR.
10. Git: CODEOWNERS approve; required reviewers.
11. Git: squash merge; new commit on main.
12. CD: detects the merge to main.
13. CD: terraform apply, env by env.
14. State: lock + write; only CD's apply writes state.
15. AWS: modify the launch template + ASG instance refresh. Customers see no impact.
Post-verify: the engineer runs plan again and sees it clean.

Each lane is one actor. Every cross-lane arrow (push, assume-role, OIDC, S3 PutObject) is an IAM-checked transition. If a step takes longer than expected, look at the cross-lane arrow right before it; that's where most stalls happen.

The "what's typical?" durations

Step span | Typical duration | What slows it down
Read ticket → baseline plan (1-2) | 5 min | Plan shows drift you didn't expect
Edit → local plan (3-4) | 5 min | You're touching modules instead of tfvars
Push → CI plan posted (5-9) | 3-8 min | Slow runner; large state file
PR open → approvals (6-10) | minutes to days | Reviewer availability; prod requires 2
Merge → CD apply (11-14) | 1-3 min trigger + N min apply | RDS / EBS / NAT changes are the long pole
Apply → verify clean (14-15) | 1-10 min | ASG instance refresh; eventual consistency

05 Inside one terraform plan run

Question this answers: "When I press Enter on terraform plan -var-file=uat.tfvars, what does Terraform actually do in what order?"

This is the data flow inside a single command. Knowing this order is what lets you debug "why did Terraform do X?" without guessing.

Anatomy of one plan run · the data flow inside the binary:

0. Backend read. backend.tf is parsed first; the S3 client and DynamoDB lock client are initialised. No variables are available yet.
1. Parse every *.tf file: main.tf, variables.tf, data.tf, locals.tf, outputs.tf, providers.tf. Order doesn't matter; HCL is declarative.
2. Resolve variables. In increasing precedence: defaults → TF_VAR_* env vars → terraform.tfvars / *.auto.tfvars → -var-file / -var (in the order given; later wins). validation blocks fire here.
3. Provider init & auth. Credential chain: env → profile → SSO; assume_role + an STS round-trip. Most "permission denied" errors come from here.
4. Resolve data sources: data.aws_caller_identity, AMIs, availability zones, Route 53. Live AWS reads, so values are freshest each run.
5. Build the resource graph. Expand modules, evaluate locals, resolve every reference (var, local, data, module). Cycles error here.
6. Lock & read state. DynamoDB conditional put (the lock), S3 GetObject (the state JSON). On plan: lock, then unlock fast.
7. Refresh (default). Describe each resource in AWS and update the in-memory state. Skip with -refresh=false.
8. Diff, the heart of Terraform. Compare desired (.tf + vars) against recorded (state); classify each resource as + create / ~ update / -/+ replace / - destroy / no-op. Honours lifecycle{} rules: prevent_destroy, ignore_changes, create_before_destroy.
9. Render plan output. Human-readable diff to stdout; machine-readable with -out=tfplan, then terraform show -json tfplan for tools.
10. Release lock & exit. DynamoDB delete (the lock). Exit code 0 / 1 / 2 (2 = changes); CI uses -detailed-exitcode for drift detection.

Inputs (you control): .tf files + tfvars + env vars, terraform.tfstate in S3, and the live AWS API via assume-role. Outputs: the plan (text + JSON) and the exit code 0/1/2. A plan makes no state changes.

Where each kind of error originates:
· Stage 0 · "Backend init failed": wrong bucket/key, no S3 access. Fix backend.tf or your assume-role.
· Stage 2 · "validation failed for variable": your tfvars value isn't in the allowed list (e.g. an unknown env name).
· Stage 3 · "AccessDenied / NoCredentialProviders": SSO expired, wrong AWS_PROFILE, role can't be assumed.
· Stage 6 · "Error acquiring the state lock": someone else is running plan/apply.
· Stage 8 · a surprise destroy = drift; see Part 1 sec 9.
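Stage 10's exit code is the hook CI hangs drift detection on. A hedged sketch of the dispatch logic a pipeline typically wraps around terraform plan -detailed-exitcode (the wrapper function is illustrative; the 0/1/2 meanings are Terraform's documented ones):

```shell
# Interpret the exit code of `terraform plan -detailed-exitcode`.
handle_plan_exit() {
  case "$1" in
    0) echo "clean: code, state and cloud agree" ;;
    2) echo "changes: post the plan for review, or raise a drift alert" ;;
    *) echo "error: the plan itself failed; read the log from stages 0-6" ;;
  esac
}
# In CI this would be:  terraform plan -detailed-exitcode; handle_plan_exit $?
handle_plan_exit 2
```

Note the asymmetry: a drift-detection job treats 2 as "alert a human", not "fail the build", because pending changes are information, not an error.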

06 "Where does X live?" — the lookup map

Question this answers: "Something is broken. Where do I look first?"

This single table connects everything in the previous five diagrams. When you don't know where to start debugging, find the symptom column and walk left to the actor and the file.

What you want to change / inspect | Lives in | Owned by which actor | How to inspect it | Symptom when wrong
VPC CIDR for an env | envs/<env>/<env>.tfvars | Engineer (in Git) | grep vpc_cidr envs/uat/uat.tfvars | Plan recreates the VPC; CIDRs overlap with another env
EC2 instance size | <env>.tfvars (override) or modules/compute/variables.tf (default) | Engineer | terraform plan diff or aws ec2 describe-launch-templates | Wrong tier in plan; oversized cost
What resources Terraform thinks exist | terraform.tfstate (S3) | State (S3+DDB) | terraform state list, terraform state show <addr> | Plan wants to destroy something you didn't expect
What's actually running in AWS | AWS API / console | AWS | aws ec2 describe-* / console | Drift: reality differs from state
Provider version | .terraform.lock.hcl | Engineer (committed) | cat .terraform.lock.hcl | "plan looks different in CI vs local" = lock file not committed
Where state is stored | envs/<env>/backend.tf | Engineer (immutable in normal life) | cat backend.tf | Plan shows hundreds of resources to create: you're pointed at the wrong env's state
Which AWS account | providers.tf + deploy_role_arn in tfvars | Engineer + IAM | aws sts get-caller-identity | "AccessDenied"; resources show up in the wrong account
Branch protection / required reviewers | CODEOWNERS + repo settings | Git (origin) | GitHub UI → Settings → Branches | PR can be merged without enough approvals
What CI ran on the PR | .github/workflows/*.yml | Engineer (committed) + CI | GitHub Actions tab on the PR | tflint/tfsec/plan didn't run; required check missing
What CD applied (and when) | CD pipeline run logs | CD | CD UI / gh run view | "Did the change actually apply?" Look at the most recent successful run
The actual instance type running now | EC2 instance attributes (AWS) | AWS | aws ec2 describe-instances --filters Name=tag:Environment,Values=uat | State says xlarge but instances are still large: the ASG didn't refresh
An IAM role's trust policy | modules/iam/main.tf → aws_iam_role.assume_role_policy | Engineer | aws iam get-role --role-name lf-uat-ec2-app | Service can't assume the role
A secret (DB password etc.) | AWS Secrets Manager, not tfvars | AWS + Secrets Manager | aws secretsmanager get-secret-value --secret-id ... | Found in tfvars or git history → rotate immediately
Why this change exists at all | Ticket / PR description | Engineer + ticket system | git log + ticket link | Resource exists but no one knows why
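The Secrets Manager row deserves teeth: "rotate immediately" is much cheaper if the leak never merges. A hedged sketch of a pre-commit-style check (the pattern and the mock file below are illustrative; real scanners such as gitleaks are far more thorough):

```shell
# Flag lines in a tfvars file that look like credentials.
scan_tfvars() {
  if grep -Eq '(password|secret|token)[[:space:]]*=' "$1"; then
    echo "FAIL: possible secret in $1; rotate and move to Secrets Manager"
  else
    echo "OK: no obvious secrets in $1"
  fi
}

# Demo against a mock file; in a hook you'd loop over the staged *.tfvars.
tmp=$(mktemp -d)
printf 'instance_type = "m6i.xlarge"\ndb_password = "hunter2"\n' > "$tmp/uat.tfvars"
scan_tfvars "$tmp/uat.tfvars"
rm -rf "$tmp"
```

Wire it into CI next to fmt/validate so the check runs on every PR, not just on laptops that installed the hook.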

Three questions to ask before any debug session

  1. Is the symptom in the code, the state, or the cloud? The answer tells you which actor owns the bug.
  2. What did the last terraform plan say? Plan is the bridge between code and state — it surfaces 90% of disagreements.
  3. Did it ever work? If yes, git log on main since the last good apply tells you what changed.
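Question 1 can be made concrete by comparing the same attribute at its three homes: what the code wants, what state recorded, what the cloud is running. A hedged sketch with mock values (in real life the three inputs come from your tfvars, terraform state show, and aws ec2 describe-instances):

```shell
# Compare code-desired vs state-recorded vs cloud-actual and name the owner.
owner_of_bug() {
  desired="$1"; recorded="$2"; actual="$3"
  if [ "$desired" != "$recorded" ]; then
    echo "code vs state disagree: an apply is pending (engineer or CD owns it)"
  elif [ "$recorded" != "$actual" ]; then
    echo "state vs cloud disagree: drift (AWS owns it; confirm with a refresh-only plan)"
  else
    echo "all three agree: look elsewhere (ticket, CI, reviewers)"
  fi
}
owner_of_bug m6i.xlarge m6i.xlarge m6i.large
```

With these mock inputs the verdict is drift: code and state already agree on xlarge, but reality still runs large.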
The 360° mental model in one sentence: a ticket becomes code (engineer), reviewed in Git, validated by CI, applied by CD, recorded in state, executed in AWS, and felt by customers — with IAM gating every transition. If you can name the actor at every step, you can debug anything.