Blog
DevOps tutorials, Kubernetes guides, Terraform tips, cost optimization strategies, and cloud career advice from an instructor with 383K+ students.
5 Things I Wish I Knew Before Running EKS in Production
Running Amazon EKS in a tutorial and running it in production are two very different experiences. After deploying a 5-microservice retail store application with real AWS services, here are the five lessons that would have saved me time, money, and plenty of late-night debugging sessions.
1. Cluster Autoscaler Doesn’t Consolidate Nodes
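Cluster Autoscaler only removes a node when its requested resources fall below a utilization threshold (50% by default) and every pod on it can be rescheduled elsewhere. It never swaps workloads onto smaller or cheaper instances. If a node is running a single tiny pod at 10% utilization and that pod can't be moved (for example, because a PodDisruptionBudget blocks its eviction), the node stays, and you keep paying for it.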
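Cluster Autoscaler bases scale-down decisions on pod resource requests: a node becomes a removal candidate when its requested utilization drops below `--scale-down-utilization-threshold` (0.5 by default) and its pods can be rescheduled elsewhere. The minimal Python sketch below illustrates that rule. It is a simplification, not the autoscaler's actual code: it assumes utilization is the larger of the CPU and memory request ratios, and it collapses the autoscaler's "can these pods fit elsewhere" simulation into a hypothetical `movable` flag. The `Pod` and `Node` classes are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified models; the real autoscaler reads these from the Kubernetes API.
@dataclass
class Pod:
    name: str
    cpu_request_m: int      # CPU request in millicores
    mem_request_mi: int     # memory request in MiB
    movable: bool = True    # hypothetical: False if e.g. a PodDisruptionBudget blocks eviction

@dataclass
class Node:
    name: str
    cpu_allocatable_m: int
    mem_allocatable_mi: int
    pods: list[Pod] = field(default_factory=list)

def utilization(node: Node) -> float:
    """Requested resources divided by allocatable capacity, taking the larger of CPU and memory."""
    cpu = sum(p.cpu_request_m for p in node.pods) / node.cpu_allocatable_m
    mem = sum(p.mem_request_mi for p in node.pods) / node.mem_allocatable_mi
    return max(cpu, mem)

def is_scale_down_candidate(node: Node, threshold: float = 0.5) -> bool:
    """Removable only if the node is under the threshold AND every pod on it can move."""
    return utilization(node) < threshold and all(p.movable for p in node.pods)

# A node running one tiny pod at 10% utilization (200m of 2000m CPU).
node_a = Node("node-a", 2000, 8000, pods=[Pod("tiny", 200, 400)])
print(utilization(node_a))              # 0.1
print(is_scale_down_candidate(node_a))  # True: the pod can be rescheduled

# Same footprint, but the pod cannot be evicted.
node_b = Node("node-b", 2000, 8000, pods=[Pod("pinned", 200, 400, movable=False)])
print(is_scale_down_candidate(node_b))  # False: the node stays and keeps costing money
```

The second case is the expensive one: the node sits at 10% utilization, but because its only pod can't be evicted, nothing ever removes it.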
Building a Complete Observability Stack for EKS with OpenTelemetry and ADOT
Most Kubernetes observability setups are incomplete. Teams install Prometheus, wire up a few dashboards, and call it done. Then a production incident hits and they’re grepping through logs at 3 AM, trying to find a needle in a haystack.
The problem isn’t the tooling; it’s the approach. You need all three observability pillars working together: traces, logs, and metrics. Here’s how I built a complete stack on EKS using AWS Distro for OpenTelemetry (ADOT).
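One concrete way the pillars reinforce each other is correlation: if every log line carries the trace ID of the request that produced it, you can jump from a slow trace straight to its logs. The sketch below assumes requests arrive with a W3C Trace Context `traceparent` header (the propagation format OpenTelemetry uses by default). `parse_traceparent` and `log_with_trace` are hypothetical helpers built on the Python standard library only; they are not part of ADOT or the OpenTelemetry SDK.

```python
from __future__ import annotations

import json
import re

# W3C Trace Context: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header: str) -> tuple[str, str] | None:
    """Return (trace_id, span_id), or None if the header is malformed or invalid."""
    m = TRACEPARENT_RE.match(header.strip())
    if not m:
        return None
    version, trace_id, span_id, _flags = m.groups()
    # Version "ff" and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id

def log_with_trace(message: str, traceparent: str) -> str:
    """Hypothetical helper: emit a JSON log line that carries the request's trace context."""
    record = {"message": message}
    ids = parse_traceparent(traceparent)
    if ids:
        record["trace_id"], record["span_id"] = ids
    return json.dumps(record)

header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(log_with_trace("payment declined", header))
# {"message": "payment declined", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}
```

In a real stack the OpenTelemetry SDK injects this context for you; the sketch only shows why a shared `trace_id` field is what lets a log backend and a trace backend point at the same request.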
How to Handle Spot Instance Interruptions on EKS with Zero Downtime
“Spot instances are too risky for production.”
That’s the most common objection I hear from DevOps engineers. And it’s wrong. With the right architecture, you can run production workloads on Spot instances with 70% cost savings and zero downtime during interruptions. Here’s exactly how.
The Fear (and Why It’s Overblown)
The concern is legitimate on the surface: AWS can reclaim a Spot instance with just 2 minutes of notice. Without preparation, your pods get terminated, requests fail, and users see errors.
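When AWS schedules a Spot interruption, the instance metadata path `/latest/meta-data/spot/instance-action` starts returning a small JSON document with an `action` (`hibernate`, `stop`, or `terminate`) and the UTC `time` at which it will happen. The Python sketch below covers only the decision logic a node-level watcher might apply to that document: how many seconds are left, and whether a pod's graceful shutdown can finish in time. Fetching the document (including the IMDSv2 token) is deliberately left out, and `graceful_shutdown_fits` and its 15-second `drain_overhead_s` default are hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

INTERRUPTION_ACTIONS = {"hibernate", "stop", "terminate"}

def seconds_remaining(instance_action: str, now: datetime) -> float | None:
    """Parse the instance-action JSON; return seconds until the action, or None if absent."""
    doc = json.loads(instance_action)
    if doc.get("action") not in INTERRUPTION_ACTIONS:
        return None
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())

def graceful_shutdown_fits(remaining_s: float, grace_period_s: int, drain_overhead_s: int = 15) -> bool:
    """Hypothetical check: can cordon/drain plus terminationGracePeriodSeconds finish before reclaim?"""
    return drain_overhead_s + grace_period_s <= remaining_s

notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
left = seconds_remaining(notice, now)
print(left)                                              # 120.0
print(graceful_shutdown_fits(left, grace_period_s=30))   # True
print(graceful_shutdown_fits(left, grace_period_s=300))  # False
```

The second check is the one that bites in practice: if a pod's `terminationGracePeriodSeconds` is longer than the notice window, Kubernetes never gets to finish a graceful shutdown before the instance disappears.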
5 Terraform Mistakes That Cost You Money on AWS
If you’ve been running Terraform on AWS for any length of time, chances are your infrastructure has a few hidden cost leaks. I’ve seen these patterns across hundreds of student projects and enterprise environments. Here are the five most common Terraform mistakes that silently drain your AWS budget — and how to fix each one.
1. Not Setting instance_type Defaults Wisely
Many engineers copy-paste t3.large or m5.xlarge from tutorials without right-sizing. In Terraform, you should use variables with sensible defaults: