Terraform 301 — The 360° view

Parts 1 and 2 taught the components: files, modules, environments, SOPs. This part shows how they connect. Five diagrams, in order: zoom out, then drill in. Read them top to bottom; each one assumes you've understood the one before.

Companion to Part 1 and Part 2.

01 System map — the seven actors

Question this answers: "When I make an infra change, who/what is involved, and what does each actor own?"

There are seven actors in this story. Each owns a different thing. Most "weird bugs" are really one of them disagreeing with another.

The seven actors that touch one Terraform change:

1. Engineer: writes code, runs plan locally, opens the PR. Owns: intent.
2. Git repo: main + branches, PRs, reviews, CODEOWNERS. Owns: the history of intent.
3. CI bot: fmt / validate, plan posted as a PR comment, tflint, tfsec. Owns: validation.
4. CD pipeline: terraform apply on merge to main, env by env. Owns: applying changes.
5. State (S3 + DynamoDB): terraform.tfstate plus a DynamoDB lock, one per env. Owns: recorded reality.
6. AWS API + reality: VPCs, EC2, RDS… runs your workloads; visible in the console. Owns: live infrastructure.
7. Customers: consume your service; never see Terraform but feel every outage. Own: the SLA you keep.

Two more pieces sit around the actors. The IAM trust boundary (SSO → engineer role, engineer role → TerraformDeploy, CD → OIDC role) enforces who can do what. The ticket / charter (JIRA / ServiceNow, approval + ID) starts every change and closes after verification.

Read this map as a story: a ticket arrives and is assigned. The engineer turns intent into code (1) and pushes a branch / PR to Git (2). A webhook fires CI (3), which checks the change and posts a plan comment back to the PR. After approval and merge to main, CD (4) applies: it reads, locks and writes state (5) and reconciles AWS (6), refreshing state from AWS along the way. Customers (7) are served by every step. Every arrow crosses an IAM boundary, which is why credential bugs are so common.
· Most "drift" is actor 6 (AWS) and actor 5 (state) disagreeing.
· Most "merge conflicts" are two engineers (1) writing into actor 2 (Git) at once.
· Most "permission denied" is the IAM boundary between two adjacent actors.
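These three failure classes each have a natural first command. A hedged sketch of that rule of thumb (the mapping is this article's heuristic, not a Terraform feature; the commands themselves are standard Terraform / git / AWS CLI, wrapped in a plain dispatch function so the logic can run without cloud access):

```shell
# Map a failure class to the first command worth running.
first_move() {
  case "$1" in
    drift)             echo "terraform plan -refresh-only" ;;  # does state still match AWS?
    merge-conflict)    echo "git rebase origin/main" ;;        # replay your intent on the latest history
    permission-denied) echo "aws sts get-caller-identity" ;;   # which principal am I, really?
    *)                 echo "start with: terraform plan" ;;
  esac
}
first_move permission-denied
```

`aws sts get-caller-identity` is worth running first because it tells you which side of the IAM boundary you are actually standing on.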

02 Trace one value through nine places

Question this answers: "I changed m6i.large to m6i.xlarge in one file. Where does that value live after the change?"

This is the diagram that makes everything click. The same value appears in nine places between your laptop and the running EC2 instance. Each place is owned by a different actor. Bugs hide in the gaps.

Tracing instance_type = "m6i.xlarge" through nine homes (actor / owner → where the value lives → exactly what it looks like):

1. Ticket / JIRA: the human-readable charter. "INFRA-2210: bump UAT app tier to m6i.xlarge for load test on 2026-05-12. Approver: anil.k. Rollback: revert PR."
2. envs/uat/uat.tfvars on the engineer's laptop: instance_type = "m6i.xlarge" # committed in the working directory; not yet shared.
3. Git branch on origin: infra-2210-uat-app-tier, commit 4f9a1c2 · "INFRA-2210: bump UAT app tier to m6i.xlarge". + instance_type = "m6i.xlarge" / - instance_type = "m6i.large".
4. Pull request: PR #491 (open). PR diff: ~ instance_type "m6i.large" -> "m6i.xlarge". Reviewers see exactly what you changed; nothing more.
5. CI plan comment, automated, on the PR: module.app.aws_launch_template.app will be updated in-place, ~ instance_type = "m6i.large" -> "m6i.xlarge" · Plan: 0 to add, 1 to change, 0 to destroy.
6. main on origin, after squash merge: commit a8b3f0d · "INFRA-2210: bump UAT app tier (#491)". git revert a8b3f0d is your rollback button forever.
7. CD apply log, in the CI/CD UI, the audit trail: module.app.aws_launch_template.app: Modifying... [id=lt-0a1b2c3d] … Modifications complete after 2s.
8. terraform.tfstate, in S3, KMS-encrypted: { "resources": [{ "type": "aws_launch_template", "instances": [{ "attributes": { "instance_type": "m6i.xlarge", ... } }] }] }
9. AWS Launch Template + EC2 instances, live: aws ec2 describe-launch-templates → InstanceType: m6i.xlarge. v$Latest plus an ASG instance refresh rolls 4 instances over 6-10 min; customers see no impact.

Ownership shifts as the value flows forward: first engineer-owned, then Git-owned (history), then automation-owned (CI/state), finally AWS-owned (reality).

Why this matters: bugs almost always live in the gap between two adjacent rows. Your local plan differs from the CI plan? You forgot to commit (gap 2 → 3). CI plan clean but apply fails? Provider/state mismatch (gap 7 → 8). State says xlarge but EC2 is still large? Drift (gap 8 → 9).
A drill for new engineers: when something breaks, ask "which row was last correct?" Walk down from row 1 until you find the first row with the wrong value. The gap between that row and the one before it is your bug.
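That drill is mechanical enough to sketch as a loop: walk the rows in order and report the first one whose value disagrees. The values below are mocked (in real life row 2 comes from your tfvars file, row 6 from git show, row 8 from terraform state show, row 9 from aws ec2 describe-launch-templates); only the row numbers match the table above.

```shell
# Walk the rows top to bottom; the first mismatch marks the buggy gap.
first_wrong_row() {
  expected="m6i.xlarge"
  # row:place:value triples, mocked; here reality (row 9) lags behind
  printf '%s\n' \
    '2:tfvars:m6i.xlarge' \
    '3:branch:m6i.xlarge' \
    '6:main:m6i.xlarge' \
    '8:state:m6i.xlarge' \
    '9:aws:m6i.large' |
  while IFS=: read -r num place value; do
    if [ "$value" != "$expected" ]; then
      echo "row $num ($place) is the first wrong row; the bug sits in the gap just above"
      break
    fi
  done
}
first_wrong_row
```

With these mock values the loop stops at row 9: state and code agree, reality does not, so the gap 8 → 9 (drift) is the bug.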

03 File cross-references — what feeds what

Question this answers: "When I edit one file, what other files am I implicitly affecting?"

Terraform files don't exist in isolation. They reference each other through specific identifiers. This graph shows the references; arrows point from consumer to producer — "I read from you."

"Who reads from whom" · arrows point from consumer to producer.

envs/uat/ · an environment root:
· variables.tf · declares inputs (no values): var.environment, var.vpc_cidr
· uat.tfvars · supplies values: vpc_cidr = "10.20.0.0/16" (TF_VAR_* env vars from CI/secrets can also supply them)
· main.tf · module calls + glue; refs var.x, local.x, data.x.y, module.z.w
· locals.tf · computed-internal: tags, naming, derived CIDRs from var.vpc_cidr
· data.tf · read-only AWS lookups: aws_caller_identity, AMIs
· outputs.tf · exports the surface: vpc_id, subnet_ids…
· providers.tf · region + assume_role + default_tags
· backend.tf · S3 bucket / key / DDB table; no variables allowed
· versions.tf · pins terraform & provider versions (read first)
· .terraform.lock.hcl · pins the exact provider versions used to apply. Commit this. Without it, two engineers can run with different providers.

modules/···/ · reusable components:
· modules/network · aws_vpc, aws_subnet, NAT
· modules/security · SG chain (alb → web → app → db)
· modules/iam · role + instance_profile
· modules/compute · launch_template + ASG
· modules/database · RDS Aurora cluster
Each module is itself a folder of variables.tf, main.tf, outputs.tf.

Outside the repo:
· terraform.tfstate · S3 bucket + DDB lock; recorded reality (JSON); written and locked on apply
· AWS API + reality · VPCs, EC2, RDS…; what's actually running; called via assume-role and refreshed into state each run

Cross-module reads flow through outputs: main.tf wires module.network.vpc_id into other modules, while providers.tf supplies the credential context for every AWS call.

Read this graph as four kinds of arrow

Arrow color | Means | Real example
orange | variable / value flow | main.tf reads var.vpc_cidr declared in variables.tf and supplied by uat.tfvars
purple | module composition | main.tf calls module "network" { source = "../../modules/network" ... }
green | output / cross-module read | module.network.vpc_id is consumed by module.security in the same root
red | runtime context (provider, backend, AWS) | providers.tf assumes the deploy role; the AWS API gets called on apply
Editing rules of thumb derived from this graph:
· Edit *.tfvars → affects only this env. Safe.
· Edit variables.tf → affects this env + every consumer. Mind defaults & validations.
· Edit a module → affects every env that uses it. Treat as breaking change.
· Edit backend.tf → you're moving state. Do not do this casually.
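These rules follow from "who reads from whom", and you can measure blast radius mechanically: before editing a variable or module, grep for its consumers. A hedged sketch against a throwaway mock layout (the paths are illustrative; run the same grep from your real repo root):

```shell
# Build a two-file mock of the graph above, then count consumers of vpc_cidr.
tmp=$(mktemp -d)
mkdir -p "$tmp/envs/uat" "$tmp/modules/network"
cat > "$tmp/envs/uat/main.tf" <<'EOF'
module "network" {
  source   = "../../modules/network"
  vpc_cidr = var.vpc_cidr
}
EOF
cat > "$tmp/modules/network/variables.tf" <<'EOF'
variable "vpc_cidr" {}
EOF
# Every file that mentions the name is affected by a change to it.
consumers=$(grep -rl 'vpc_cidr' "$tmp" | wc -l | tr -d ' ')
echo "files touched by a vpc_cidr change: $consumers"
rm -rf "$tmp"
```

Even in this tiny mock the count is 2: one env root plus one module, which is exactly why editing a module's variables.tf is never a single-file change.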

04 End-to-end timeline (swimlanes)

Question this answers: "What happens, in order, from ticket-assigned to apply-finished — and which actor does each step?"

Same actors as section 1, now arranged as horizontal lanes. Time flows left to right. Read left-to-right; cross from one lane into another every time the change moves to a new owner.

"INFRA-2210 · bump UAT app tier" · from ticket to running EC2 instance. Lanes: Engineer, Git (origin), CI, CD, State (S3+DDB), AWS. Rough clock: T+0 → T+15 min → T+30 min → T+50 min.

1. Engineer: read the ticket, confirm scope.
2. Engineer: run a baseline plan. No changes? OK.
3. Engineer: branch + edit; smallest change possible.
4. Engineer: fmt + plan; read every line.
5. Engineer: push + open the PR for review.
6. Git: PR opens; a webhook fires CI.
7. CI: fmt + validate, tflint, tfsec.
8. CI: plan; reads state in S3 (no lock taken for a plan).
9. CI: post the plan as a comment on the PR.
10. Git: CODEOWNERS approve; required reviewers.
11. Git: squash merge; new commit on main.
12. CD: detects the merge to main.
13. CD: terraform apply, env by env.
14. State: lock + write; only CD's apply writes state.
15. AWS: modify the launch template + ASG instance refresh. Customers see no impact.
Post-verify: the engineer runs plan again and sees it clean.

Each lane is one actor. Every cross-lane arrow (push, assume-role, OIDC, S3 PutObject) is an IAM-checked transition. If a step takes longer than expected, look at the cross-lane arrow right before it; that's where most stalls happen.

The "what's typical?" durations

Step span | Typical duration | What slows it down
Read ticket → baseline plan (1-2) | 5 min | Plan shows drift you didn't expect
Edit → local plan (3-4) | 5 min | You're touching modules instead of tfvars
Push → CI plan posted (5-9) | 3-8 min | Slow runner; large state file
PR open → approvals (6-10) | minutes to days | Reviewer availability; prod requires 2
Merge → CD apply (11-14) | 1-3 min trigger + N min apply | RDS / EBS / NAT changes are the long pole
Apply → verify clean (14-15) | 1-10 min | ASG instance refresh; eventual consistency

05 Inside one terraform plan run

Question this answers: "When I press Enter on terraform plan -var-file=uat.tfvars, what does Terraform actually do in what order?"

This is the data flow inside a single command. Knowing this order is what lets you debug "why did Terraform do X?" without guessing.

Anatomy of one plan run · the data flow inside the binary:

0. Backend read. backend.tf is parsed first; the S3 client and DynamoDB lock client are initialised. No variables are available yet.
1. Parse every *.tf file: main.tf, variables.tf, data.tf, locals.tf, outputs.tf, providers.tf. Order doesn't matter; HCL is declarative.
2. Resolve variables. In increasing precedence: defaults → TF_VAR_* env vars → terraform.tfvars / *.auto.tfvars → -var-file / -var (in the order given; later wins). validation blocks fire here.
3. Provider init & auth. Credential chain: env → profile → SSO; assume_role + an STS round-trip. Most "permission denied" errors come from here.
4. Resolve data sources: data.aws_caller_identity, AMIs, availability zones, Route 53. Live AWS reads, so values are freshest each run.
5. Build the resource graph. Expand modules, evaluate locals, resolve every reference (var, local, data, module). Cycles error here.
6. Lock & read state. DynamoDB conditional put (the lock), S3 GetObject (the state JSON). On plan: lock, then unlock fast.
7. Refresh (default). Describe each resource in AWS and update the in-memory state. Skip with -refresh=false.
8. Diff, the heart of Terraform. Compare desired (.tf + vars) against recorded (state); classify each resource as + create / ~ update / -/+ replace / - destroy / no-op. Honours lifecycle{} rules: prevent_destroy, ignore_changes, create_before_destroy.
9. Render plan output. Human-readable diff to stdout; machine-readable with -out=tfplan, then terraform show -json tfplan for tools.
10. Release lock & exit. DynamoDB delete (the lock). Exit code 0 / 1 / 2 (2 = changes); CI uses -detailed-exitcode for drift detection.

Inputs (you control): .tf files + tfvars + env vars, terraform.tfstate in S3, and the live AWS API via assume-role. Outputs: the plan (text + JSON) and the exit code 0/1/2. A plan makes no state changes.

Where each kind of error originates:
· Stage 0 · "Backend init failed": wrong bucket/key, no S3 access. Fix backend.tf or your assume-role.
· Stage 2 · "validation failed for variable": your tfvars value isn't in the allowed list (e.g. an unknown env name).
· Stage 3 · "AccessDenied / NoCredentialProviders": SSO expired, wrong AWS_PROFILE, role can't be assumed.
· Stage 6 · "Error acquiring the state lock": someone else is running plan/apply.
· Stage 8 · a surprise destroy = drift; see Part 1 sec 9.
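Stage 10's exit code is the hook CI hangs drift detection on. A hedged sketch of the dispatch logic a pipeline typically wraps around terraform plan -detailed-exitcode (the wrapper function is illustrative; the 0/1/2 meanings are Terraform's documented ones):

```shell
# Interpret the exit code of `terraform plan -detailed-exitcode`.
handle_plan_exit() {
  case "$1" in
    0) echo "clean: code, state and cloud agree" ;;
    2) echo "changes: post the plan for review, or raise a drift alert" ;;
    *) echo "error: the plan itself failed; read the log from stages 0-6" ;;
  esac
}
# In CI this would be:  terraform plan -detailed-exitcode; handle_plan_exit $?
handle_plan_exit 2
```

Note the asymmetry: a drift-detection job treats 2 as "alert a human", not "fail the build", because pending changes are information, not an error.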

06 "Where does X live?" — the lookup map

Question this answers: "Something is broken. Where do I look first?"

This single table connects everything in the previous five diagrams. When you don't know where to start debugging, find the symptom column and walk left to the actor and the file.

What you want to change / inspect | Lives in | Owned by which actor | How to inspect it | Symptom when wrong
VPC CIDR for an env | envs/<env>/<env>.tfvars | Engineer (in Git) | grep vpc_cidr envs/uat/uat.tfvars | Plan recreates the VPC; CIDRs overlap with another env
EC2 instance size | <env>.tfvars (override) or modules/compute/variables.tf (default) | Engineer | terraform plan diff or aws ec2 describe-launch-templates | Wrong tier in plan; oversized cost
What resources Terraform thinks exist | terraform.tfstate (S3) | State (S3+DDB) | terraform state list, terraform state show <addr> | Plan wants to destroy something you didn't expect
What's actually running in AWS | AWS API / console | AWS | aws ec2 describe-* / console | Drift: reality differs from state
Provider version | .terraform.lock.hcl | Engineer (committed) | cat .terraform.lock.hcl | "plan looks different in CI vs local" = lock file not committed
Where state is stored | envs/<env>/backend.tf | Engineer (immutable in normal life) | cat backend.tf | Plan shows hundreds of resources to create: you're pointed at the wrong env's state
Which AWS account | providers.tf + deploy_role_arn in tfvars | Engineer + IAM | aws sts get-caller-identity | "AccessDenied"; resources show up in the wrong account
Branch protection / required reviewers | CODEOWNERS + repo settings | Git (origin) | GitHub UI → Settings → Branches | PR can be merged without enough approvals
What CI ran on the PR | .github/workflows/*.yml | Engineer (committed) + CI | GitHub Actions tab on the PR | tflint/tfsec/plan didn't run; required check missing
What CD applied (and when) | CD pipeline run logs | CD | CD UI / gh run view | "Did the change actually apply?" Look at the most recent successful run
The actual instance type running now | EC2 instance attributes (AWS) | AWS | aws ec2 describe-instances --filters Name=tag:Environment,Values=uat | State says xlarge but instances are still large: the ASG didn't refresh
An IAM role's trust policy | modules/iam/main.tf → aws_iam_role.assume_role_policy | Engineer | aws iam get-role --role-name lf-uat-ec2-app | Service can't assume the role
A secret (DB password etc.) | AWS Secrets Manager, not tfvars | AWS + Secrets Manager | aws secretsmanager get-secret-value --secret-id ... | Found in tfvars or git history → rotate immediately
Why this change exists at all | Ticket / PR description | Engineer + ticket system | git log + ticket link | Resource exists but no one knows why
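The Secrets Manager row deserves teeth: "rotate immediately" is much cheaper if the leak never merges. A hedged sketch of a pre-commit-style check (the pattern and the mock file below are illustrative; real scanners such as gitleaks are far more thorough):

```shell
# Flag lines in a tfvars file that look like credentials.
scan_tfvars() {
  if grep -Eq '(password|secret|token)[[:space:]]*=' "$1"; then
    echo "FAIL: possible secret in $1; rotate and move to Secrets Manager"
  else
    echo "OK: no obvious secrets in $1"
  fi
}

# Demo against a mock file; in a hook you'd loop over the staged *.tfvars.
tmp=$(mktemp -d)
printf 'instance_type = "m6i.xlarge"\ndb_password = "hunter2"\n' > "$tmp/uat.tfvars"
scan_tfvars "$tmp/uat.tfvars"
rm -rf "$tmp"
```

Wire it into CI next to fmt/validate so the check runs on every PR, not just on laptops that installed the hook.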

Three questions to ask before any debug session

  1. Is the symptom in the code, the state, or the cloud? The answer tells you which actor owns the bug.
  2. What did the last terraform plan say? Plan is the bridge between code and state — it surfaces 90% of disagreements.
  3. Did it ever work? If yes, git log on main since the last good apply tells you what changed.
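Question 1 can be made concrete by comparing the same attribute at its three homes: what the code wants, what state recorded, what the cloud is running. A hedged sketch with mock values (in real life the three inputs come from your tfvars, terraform state show, and aws ec2 describe-instances):

```shell
# Compare code-desired vs state-recorded vs cloud-actual and name the owner.
owner_of_bug() {
  desired="$1"; recorded="$2"; actual="$3"
  if [ "$desired" != "$recorded" ]; then
    echo "code vs state disagree: an apply is pending (engineer or CD owns it)"
  elif [ "$recorded" != "$actual" ]; then
    echo "state vs cloud disagree: drift (AWS owns it; confirm with a refresh-only plan)"
  else
    echo "all three agree: look elsewhere (ticket, CI, reviewers)"
  fi
}
owner_of_bug m6i.xlarge m6i.xlarge m6i.large
```

With these mock inputs the verdict is drift: code and state already agree on xlarge, but reality still runs large.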
The 360° mental model in one sentence: a ticket becomes code (engineer), reviewed in Git, validated by CI, applied by CD, recorded in state, executed in AWS, and felt by customers — with IAM gating every transition. If you can name the actor at every step, you can debug anything.