IaC Part 1: From Manual Chaos to Infrastructure as Code: Designing a School Absence System
PUBLISHED ON FEB 15, 2026 / 11 MIN READ
Part 1 of 4: Building Production-Grade Infrastructure
The 7:45 AM Problem
It’s 7:45 AM on a Tuesday morning. The school secretary’s phone rings. It’s Mrs. Johnson calling in sick—again. The secretary scribbles a note on a sticky pad, crosses off Mrs. Johnson’s name on the coverage spreadsheet, and starts making calls to find a substitute for periods 3, 5, and 7.
By 8:00 AM, three more teachers have called. Two sent emails. One left a voicemail. The principal needs to approve Mr. Smith’s vacation request from next week, but that conversation happened in the hallway yesterday and nobody documented it. The front desk has no idea which periods need coverage because half the notifications came via different channels.
Welcome to absence tracking at a typical small school.
This article series documents how we transformed this chaos into an automated, enterprise-grade system using Infrastructure as Code.
The Business Problem: Death by Sticky Notes
Current State Pain Points
Our school, like many small institutions, relied on a patchwork of manual processes:
For Teachers:
Call or email the front desk to report absences
No confirmation that the request was received
No visibility into approval status
Repeat the same information to multiple people
For the Principal:
Approval requests come via hallway conversations
No audit trail of who approved what
Can’t distinguish retroactive sick leave from planned vacations
No way to track patterns or generate reports
For the Front Desk:
Manually track absences in a spreadsheet
Phone calls interrupt other work
No standardized information collection
Coverage requirements unclear (which periods? full day?)
Reporting is impossible (“How many sick days did Mrs. Johnson take this semester?”)
The Real Cost:
~10 hours per month of secretarial time on absence tracking
Missed coverage leading to classroom disruptions
Compliance risk (no audit trail for HR)
Principal approval bottlenecks
At $25/hour for administrative time, we’re spending $250/month on a process that could be automated.
Stakeholder Requirements
Through interviews with teachers, administrators, and front desk staff, we identified clear success criteria:
Teachers Need:
✅ Submit absence request in under 2 minutes
✅ Automatic confirmation email
✅ No phone calls during planning period
Principal Needs:
✅ Approve/deny planned absences with one click
✅ Automatic approval for retroactive sick leave (already happened)
✅ Audit trail for HR compliance
Front Desk Needs:
✅ Know exactly which periods need coverage
✅ Centralized tracking in Google Sheets (existing tool)
✅ Email notifications for coverage requirements
IT (Me) Needs:
✅ Secure, maintainable, cost-effective
✅ No secret credentials to manage
✅ Disaster recovery in under 30 minutes
✅ 99.9% uptime (no more than 43 minutes downtime per month)
Why Infrastructure as Code?
Before diving into architecture, let’s address the fundamental question: Why build this with Infrastructure as Code instead of just clicking around the AWS console?
The Traditional Approach (What We Avoided)
The “ClickOps” Method:
Log into AWS console
Click through wizards to create VPC, subnets, security groups
Launch EC2 instance, configure manually
Set up load balancer by clicking through forms
Screenshot settings for “documentation”
Pray nothing breaks
Problems with This Approach:
❌ “Works on my machine” syndrome: No guarantee staging matches production
❌ Knowledge loss: When I leave, how does the next IT person recreate this?
❌ No version control: Can’t track who changed what, when, or why
❌ No rollback: If a change breaks things, good luck remembering what you clicked
❌ Documentation drift: Screenshots go stale immediately
❌ No peer review: Changes go live without oversight
❌ Environment inconsistency: Staging is “kinda like” production
The IaC Approach (What We Built)
Infrastructure Defined in Code:
# This creates a VPC - same every time, reviewable, version controlled
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name = "absence-system-vpc"
  }
}
Benefits:
✅ Reproducible: terraform apply creates identical infrastructure every time
✅ Version controlled: Every change in Git with author, timestamp, reason
✅ Peer reviewed: Pull requests catch issues before production
✅ Self-documenting: Code IS the documentation (can’t drift)
✅ Disaster recovery: Lost infrastructure? terraform apply rebuilds in minutes
✅ Team collaboration: Multiple engineers can work without conflicts (state locking)
✅ Environment parity: Dev, staging, prod use same code with different variables
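As a minimal sketch of the environment parity point, the same VPC definition could be parameterized so that each environment supplies only its own values. The variable names `environment` and `vpc_cidr` and the `prod.tfvars` file mentioned below are assumptions for illustration, not taken from the actual project.

```hcl
# Hypothetical inputs: one set of values per environment.
variable "environment" {
  type        = string
  description = "Deployment environment name"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be dev, staging, or prod."
  }
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for this environment's VPC"
}

# Same resource definition for every environment; only the variables differ.
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true

  tags = {
    Name        = "absence-system-vpc-${var.environment}"
    Environment = var.environment
  }
}
```

Each environment would then get its own variable file, for example `terraform apply -var-file=prod.tfvars` with `environment = "prod"` and `vpc_cidr = "10.0.0.0/16"` set inside it.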
Real-World Example: The 3 PM Incident
Two weeks after launch, our EC2 instance crashed at 3 PM—right when teachers submit afternoon absences.
Without IaC:
Panic
Try to remember: What instance type was it? Which AMI? What IAM role?
Manually recreate from screenshots (45+ minutes)
Cross fingers that it matches original config
Hope nothing was forgotten
With IaC:
$ terraform apply
# 12 minutes later: identical infrastructure rebuilt
# Same configuration, same security, same monitoring
# Guaranteed.
Our actual downtime: 12 minutes. With the manual approach, it would have been 60+ minutes.
This incident alone justified the IaC investment.
Why This Matters for Schools
Small IT teams (often just 1-2 people) face unique challenges:
Limited Resources:
Can’t afford dedicated DevOps engineer
IaC multiplies effectiveness of small teams
Automation reduces toil
Budget Constraints:
Manual labor is expensive ($250/month in our case)
IaC enables self-service (teachers submit forms, no admin time)
Infrastructure costs are predictable and optimizable
Compliance Requirements:
Audit logs required for HR
No manual changes = no unauthorized modifications
Version control provides complete change history
Staff Turnover:
Knowledge captured in code, not in people’s heads
New IT staff can understand system by reading Terraform
Onboarding time reduced from weeks to days
Cost-Benefit Analysis:
Time investment in IaC: ~40 hours initial setup
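Time per infrastructure change: ~2 hours (manual) → ~10 minutes (code), saving roughly 1.8 hours each
Break-even point: 40 hours ÷ ~1.8 hours saved per change ≈ 22 changes (about 6 months)
ROI after 1 year: every change past break-even is nearly pure savings, because changes become trivial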
Architecture Design Decisions
Now that we understand why we chose Infrastructure as Code, let’s examine how we designed the system.
Every architecture decision involves tradeoffs. Here’s our decision-making process:
Decision 1: Cloud Provider - AWS
Chosen: Amazon Web Services (AWS)
Why:
School already uses Google Workspace → needed integration
Azure: Excellent for Microsoft-heavy environments, but we’re Google Workspace
Google Cloud: Considered seriously (native Google integration), but AWS had a better Terraform module ecosystem
On-premises: Rejected immediately (hardware costs, maintenance, no HA)
Tradeoff:
AWS complexity vs. mature ecosystem
We chose maturity and community support
Accepted: Steeper learning curve for some AWS-specific services
Decision 2: Networking - Private Subnet Isolation
Chosen: VPC with public/private subnet architecture
Design:
Internet
↓
Internet Gateway
↓
Public Subnets (ALB, NAT Gateway)
↓
Private Subnet (EC2 - no public IP!)
↓
NAT Gateway (for outbound only)
↓
Internet (Gmail API, package updates)
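The layout above could be expressed in Terraform roughly as follows. This is a sketch under stated assumptions: it reuses the `aws_vpc.main` resource shown earlier, while the subnet CIDR ranges, the availability zones, and all other resource names are illustrative rather than the project's real values. Two public subnets are used because an ALB needs subnets in at least two availability zones.

```hcl
# Assumed availability zones and CIDR ranges, for illustration only.
locals {
  public_subnets = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Public subnets: hold the ALB and the NAT Gateway.
resource "aws_subnet" "public" {
  for_each = local.public_subnets

  vpc_id                  = aws_vpc.main.id
  availability_zone       = each.key
  cidr_block              = each.value
  map_public_ip_on_launch = true
}

# Private subnet: holds the EC2 instance, which gets no public IP.
resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-east-1a"
  cidr_block        = "10.0.10.0/24"
}

# Elastic IP for the NAT Gateway (AWS provider v5+; older versions use vpc = true).
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public["us-east-1a"].id
  depends_on    = [aws_internet_gateway.main]
}

# Public route table: default route goes to the Internet Gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Private route table: default route goes out through the NAT Gateway only.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  for_each = aws_subnet.public

  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}
```

The private route table has no route to the Internet Gateway, so nothing on the internet can open a connection to the instance; it can only initiate outbound traffic through the NAT Gateway.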
Why:
Defense in depth: EC2 instance has no public IP and no direct inbound exposure to the internet
Compliance: Meets security best practices for sensitive data
Attack surface reduction: Can’t SSH from internet even if port opened accidentally
Audit trail: All access via AWS Systems Manager Session Manager (logged)
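One way to wire up the Session Manager access described above is sketched below. It assumes the `aws_vpc.main` resource shown earlier; the role, profile, and security group names are hypothetical. The instance role gets the AWS-managed `AmazonSSMManagedInstanceCore` policy, and the security group defines no inbound rules at all, so there is no SSH port to open by accident.

```hcl
# Trust policy: lets EC2 assume the role on behalf of the instance.
data "aws_iam_policy_document" "ec2_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Hypothetical role name, for illustration.
resource "aws_iam_role" "app" {
  name               = "absence-system-ec2"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume.json
}

# AWS-managed policy that allows the SSM agent to register and open sessions.
resource "aws_iam_role_policy_attachment" "ssm" {
  role       = aws_iam_role.app.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "app" {
  name = "absence-system-ec2"
  role = aws_iam_role.app.name
}

# No ingress blocks: nothing, including port 22, is reachable inbound.
# Outbound is allowed so the SSM agent can reach AWS through the NAT Gateway.
resource "aws_security_group" "app" {
  name   = "absence-system-app"
  vpc_id = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

In a real deployment this group would also need an ingress rule for the application port from the load balancer's security group; that rule is left out here to keep the focus on shell access.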
Alternatives Considered:
Public subnet with SSH: Simple, but violates security principles
VPN access only: Considered, but overhead not justified for small team
Single subnet (public only): Rejected immediately (unacceptable risk)
Tradeoff:
Added complexity (NAT Gateway, routing tables) vs. security
Added cost (~$32/month for NAT Gateway) vs. peace of mind
We chose security. Worth every penny.
Interview talking point: “I implemented private subnet isolation because treating EC2 instances as internet-accessible by default is an anti-pattern. Defense in depth means assuming breach at every layer.”
Decision 3: HTTPS Everywhere with Custom Domain
Chosen: Application Load Balancer with SSL/TLS termination + custom domain
Design:
User visits: https://absences.smaschool.org
↓
ALB terminates SSL (port 443)
↓
Forwards plain HTTP to EC2 (port 5678)
↓
n8n responds
↓
ALB encrypts response, sends to user
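The request flow above maps onto a handful of Terraform resources. The sketch below is illustrative only: the input variables, resource names, TLS policy choice, and the `/healthz` health check path are assumptions, and the DNS validation records for the certificate are omitted (in practice the listener would usually reference an `aws_acm_certificate_validation` resource so it waits for the certificate to be issued).

```hcl
# Hypothetical inputs supplied by the networking and compute layers.
variable "vpc_id" { type = string }
variable "public_subnet_ids" { type = list(string) }
variable "alb_security_group_id" { type = string }
variable "app_instance_id" { type = string }

# Certificate for the custom domain, issued and renewed by ACM.
resource "aws_acm_certificate" "site" {
  domain_name       = "absences.smaschool.org"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb" "app" {
  name               = "absence-system-alb"
  load_balancer_type = "application"
  internal           = false
  security_groups    = [var.alb_security_group_id]
  subnets            = var.public_subnet_ids
}

# Plain HTTP to n8n on port 5678 inside the VPC.
resource "aws_lb_target_group" "n8n" {
  name     = "absence-system-n8n"
  port     = 5678
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path    = "/healthz" # assumed health endpoint
    matcher = "200"
  }
}

resource "aws_lb_target_group_attachment" "n8n" {
  target_group_arn = aws_lb_target_group.n8n.arn
  target_id        = var.app_instance_id
  port             = 5678
}

# TLS terminates here; traffic is forwarded unencrypted to the target group.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.site.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.n8n.arn
  }
}

# Assumed addition: redirect any plain HTTP request to HTTPS.
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```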
Why:
Trust: Teachers trust absences.smaschool.org more than random AWS URLs
Security: HTTPS required for OAuth (Gmail login)
Professionalism: Green padlock = legitimate school system
Simplicity: ALB handles SSL certificates, EC2 doesn’t need to
Alternatives Considered:
Direct EC2 with SSL: Rejected (certificate management burden, no HA)
CloudFront only: Considered, but ALB provides better health checks for our use case
HTTP only: Rejected immediately (unacceptable for authentication)
Tradeoff:
Cost (~$16/month for ALB) vs. professional appearance + security
Certificate management complexity vs. AWS Certificate Manager (ACM) automation
We chose professionalism. The custom domain alone increased teacher adoption.
Pure code (Python/Node.js): Rejected (principal can’t modify workflow logic)
Tradeoff:
Running another service (Docker container) vs. development speed
n8n learning curve vs. Lambda familiarity
We chose speed. Workflow changes take minutes, not days.
Real-world benefit: When the principal requested adding “coverage period” tracking, I updated the n8n workflow in 15 minutes. With Lambda, this would have been a sprint story.
Decision 5: CI/CD - GitHub Actions with OIDC
Chosen: GitHub Actions with OpenID Connect (OIDC) authentication
Jenkins self-hosted: Rejected (more infrastructure to manage)
Tradeoff:
Setup complexity (OIDC, IAM trust policies) vs. long-term security
We chose security. OIDC is industry best practice.
Interview talking point: “I use OIDC instead of long-lived credentials because credentials that can’t leak are credentials that won’t leak. The upfront complexity pays dividends in security posture.”
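For readers who have not set up OIDC before, the AWS side typically consists of an identity provider plus a role whose trust policy only accepts tokens from one repository. The sketch below is a generic example, not the project's actual configuration: the role name and the `example-org/absence-system` repository are placeholders, and the thumbprint argument is included only because older AWS provider versions require it.

```hcl
# Registers GitHub's OIDC issuer with the AWS account.
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]

  # Required by older provider versions; optional in recent ones.
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

# Trust policy: only workflows from one repository and branch may assume the role.
data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      # Placeholder repository and branch.
      values = ["repo:example-org/absence-system:ref:refs/heads/main"]
    }
  }
}

# Hypothetical deploy role; permissions policies would be attached separately.
resource "aws_iam_role" "github_deploy" {
  name               = "absence-system-github-deploy"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

The workflow then exchanges its short-lived GitHub token for temporary AWS credentials at run time, so no access keys are stored in repository secrets.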
Decision 6: Monitoring - CloudWatch with SLO Tracking
Chosen: CloudWatch alarms with Service Level Objective (SLO) tracking