Cloud Refactoring · Long-form briefing + advisory excerpt · Production scaling

Refactoring Bloated EKS Estates: A 90-Day Cost Anatomy

A field-tested teardown of how a mid-size SaaS team trimmed 38% of their EKS spend without freezing roadmap delivery.


This briefing walks through the specific architectural inventory we performed on a Series B SaaS team running 14 EKS clusters across two regions. We cover request/limit calibration, the rightsizing of node groups, and why three of their managed services were quietly doubling effective compute. The narrative is paired with the exact dashboards we used during the engagement and the rollback safety net we set up before the first migration window. The advisory retainer behind this article also produced a runbook we now reuse with platform teams entering Q3 cost reviews.
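For readers new to request calibration, the core idea fits in a few lines. This is not the playbook from the engagement; it is a minimal sketch assuming you already export per-container usage samples (CPU in millicores, memory in MiB). The 90th percentile, 15% headroom, and rounding steps are illustrative defaults, and `recommend_requests` and its helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    cpu_request_m: int       # suggested CPU request, millicores
    memory_request_mib: int  # suggested memory request, MiB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, which avoids float rounding surprises."""
    return -(-a // b)


def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile (pct in 1..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]


def recommend_requests(
    cpu_samples_m: list[int],
    mem_samples_mib: list[int],
    cpu_pct: int = 90,       # illustrative percentile, not the engagement's value
    headroom_pct: int = 15,  # illustrative safety margin
    cpu_step_m: int = 10,    # round CPU up to 10m increments
    mem_step_mib: int = 16,  # round memory up to 16 MiB increments
) -> Recommendation:
    # CPU is compressible: usage above the request competes for spare CPU or
    # gets throttled rather than killed, so a high percentile plus headroom
    # is usually enough.
    cpu_scaled = percentile(cpu_samples_m, cpu_pct) * (100 + headroom_pct)
    cpu_request = ceil_div(cpu_scaled, 100 * cpu_step_m) * cpu_step_m

    # Memory is not compressible: a container that runs short can only be
    # OOM-killed or evicted, so size from the observed peak plus headroom.
    mem_scaled = max(mem_samples_mib) * (100 + headroom_pct)
    mem_request = ceil_div(mem_scaled, 100 * mem_step_mib) * mem_step_mib

    return Recommendation(cpu_request, mem_request)
```

With CPU samples of `[100, 120, 130, 150, 200, 180, 160, 140, 110, 400]` (a 400m spike but a 200m 90th percentile) and memory samples peaking at 350 MiB, this suggests a 230m CPU request and a 416 MiB memory request. The asymmetry between CPU and memory is the main design choice; the exact percentile and headroom should come from your own latency budgets.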

Inclusions

What this briefing actually contains

  • Detailed cluster-by-cluster spend audit template
  • Karpenter vs cluster-autoscaler decision worksheet
  • CPU/memory request calibration playbook used in the engagement
  • Reserved Instance vs Savings Plan decision tree for the workload mix
  • Postmortem of the two cost spikes that returned in week 6
  • Suggested observability hooks to put in place before any rightsizing pass
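The Reserved Instance vs Savings Plan question usually starts from a simple coverage calculation. The sketch below is not the decision tree from the briefing; it is a simplified model assuming a single flat discount rate (real Savings Plan and Reserved Instance rates vary by instance family, region, and term) and hourly usage expressed in on-demand-equivalent dollars. `cost_with_commitment` and `best_commitment` are hypothetical helpers.

```python
def cost_with_commitment(usage_od: list[float], commit_per_hour: float, discount: float) -> float:
    """Total cost over the given hours under an hourly spend commitment.

    usage_od:        on-demand-equivalent spend for each hour ($)
    commit_per_hour: committed spend per hour, billed whether used or not ($)
    discount:        assumed flat discount vs on-demand, e.g. 0.3 for 30%
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # On-demand dollars the commitment can absorb in one hour.
    covered_od = commit_per_hour / (1 - discount)
    total = 0.0
    for usage in usage_od:
        # The commitment is paid every hour; usage above coverage is billed on demand.
        total += commit_per_hour + max(0.0, usage - covered_od)
    return total


def best_commitment(usage_od: list[float], discount: float, candidates: list[float]) -> float:
    """Pick the cheapest hourly commitment among the candidate levels."""
    return min(candidates, key=lambda c: cost_with_commitment(usage_od, c, discount))
```

For a steady $10/hour workload at a 30% discount, a $7/hour commitment covers it fully and costs $28 over four hours instead of $40 on demand, while over-committing at $14/hour costs $56. Under this model a commitment only beats on-demand in hours where usage exceeds (1 - discount) of the covered amount, which is why spiky workloads tend to favor lower commitment levels.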

Outcomes

What you can take into your team

  1. A reproducible baseline of where compute money actually goes per service

  2. Confidence that latency budgets survived the rightsizing pass

  3. A 90-day calendar of follow-up checks the platform team can self-serve
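A per-service baseline ultimately means attributing node cost to the workloads scheduled on each node. One common approach, shown here as a minimal sketch rather than the engagement's audit template, splits each node's hourly price across pods by their share of requested CPU and memory and reports the unrequested remainder as idle. The 50/50 CPU/memory weighting, the `Node`/`Pod` shapes, and `cost_per_service` are assumptions for illustration; in practice the inputs would come from your billing export and the Kubernetes API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hourly_cost: float  # price of the instance per hour (assumed input)
    cpu_m: int          # allocatable CPU, millicores
    mem_mib: int        # allocatable memory, MiB


@dataclass
class Pod:
    service: str
    node: str           # name of the node the pod is scheduled on
    cpu_request_m: int
    mem_request_mib: int


def cost_per_service(nodes: list[Node], pods: list[Pod], cpu_weight: float = 0.5) -> dict[str, float]:
    """Attribute hourly node cost to services by requested CPU and memory.

    Cost not covered by any request is reported under "__idle__".
    """
    node_by_name = {n.name: n for n in nodes}
    costs: defaultdict[str, float] = defaultdict(float)
    allocated: defaultdict[str, float] = defaultdict(float)

    for pod in pods:
        node = node_by_name[pod.node]
        # Blend the pod's share of each resource using the assumed weighting.
        share = (cpu_weight * pod.cpu_request_m / node.cpu_m
                 + (1 - cpu_weight) * pod.mem_request_mib / node.mem_mib)
        cost = node.hourly_cost * share
        costs[pod.service] += cost
        allocated[node.name] += cost

    # Whatever was not requested on each node is idle capacity.
    for node in nodes:
        costs["__idle__"] += node.hourly_cost - allocated[node.name]
    return dict(costs)
```

Reporting idle capacity as its own line, instead of spreading it across services, keeps the cost of over-provisioned node groups visible in the baseline.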

Engagement

₩4,200,000

The fee covers full access to this briefing, the attached retainer notes, and one follow-up question to the responsible editor. The listed price is indicative only; engagements are confirmed in writing during the kickoff conversation.

Format: Long-form briefing + advisory excerpt
Read time: 20-30 min
For teams of: Mid-size (20-100)
FAQ

What we are most often asked about this briefing

If you run Kubernetes on GCP or Azure rather than AWS, about 70% of the calibration logic carries over. The rightsizing chapters are vendor-neutral, but the Reserved Instance and Savings Plan section is AWS-specific and would need to be reworked for committed use discounts on GCP or reservations on Azure.

Reader notes

Reviews, including reservations

The Karpenter decision tree was the part I bookmarked. It does not pretend the answer is always Karpenter, which most posts on this topic do.

Anonymous · via client survey

Honest about what they will not solve for you. We used the rollback safety net section verbatim and it caught a regression in week 2.

Daniel R. · Staff SRE · mid-size SaaS, Seoul · 4.7/5 · Trustpilot