Cluster Operations
Operations Under Load
Practice surge scheduling, node pressure, and calm communication during staged outages.
From 1,420,000 KRW — informational only, no checkout on this site.
Overview
You rotate through incident commander and note-taker roles while instructors inject node cordons and flaky endpoints. The goal is confident language during outages, not heroics.
What is included
- Rotating incident roles with timed injects
- Live traffic replay against sample microservices
- Post-incident template aligned to quality standards reviewers expect
- Capacity signal worksheet (latency, saturation, errors)
- Pair debugging on kubelet logs
- Warm handoff script for daytime crews
- Quiet-room option for reflection after heavy drills
Outcomes
- Facilitate a fifteen-minute stabilization huddle
- Choose between surge upgrade paths with tradeoffs spelled out
- Capture evidence an external reviewer can follow
Lead instructor for this track
SRE practice coach; collects retro formats from aviation and theater crews.
FAQ
Will we destroy shared infrastructure?
Faults are scoped to disposable namespaces. If a drill escapes, we snapshot state and rebuild—participants never owe infra repair hours.
Can we bring our runbooks?
Please do. We annotate them together so the language matches your internal quality standards.
Limitations?
We do not simulate multi-region control plane loss simultaneously; that is reserved for private workshops.
Recent learner notes
-
I liked that the injects felt petty-real—slow image pulls, not cartoonish fire drills.