Cluster Operations

Operations Under Load

Practice surge scheduling, node pressure, and calm communication during staged outages.

Duration: 4 days
Format: Hybrid classroom
Skill level: Intermediate
Certification focus: CKA scenario depth
Team size: Teams up to 8

From 1,420,000 KRW — informational only, no checkout on this site.

Overview

You rotate through incident commander and note-taker roles while instructors inject node cordons and flaky endpoints. The goal is confident language during outages, not heroics.

What is included

  • Rotating incident roles with timed injects
  • Live traffic replay against sample microservices
  • Post-incident template aligned to the quality standards that reviewers expect
  • Capacity signal worksheet (latency, saturation, errors)
  • Pair debugging on kubelet logs
  • Warm handoff script for daytime crews
  • Quiet-room option for reflection after heavy drills

Outcomes

  • Facilitate a fifteen-minute stabilization huddle
  • Choose between surge upgrade paths with tradeoffs spelled out
  • Capture evidence an external reviewer can follow

Lead instructor for this track

Marcus Webb

SRE practice coach; collects retro formats from aviation and theater crews.

FAQ

Will we destroy shared infrastructure?

Faults are scoped to disposable namespaces. If a fault escapes its namespace, we snapshot state and rebuild; participants never owe infra repair hours.

Can we bring our runbooks?

Please do. We annotate them together so the language matches your internal quality standards.

Limitations?

We do not simulate simultaneous control plane loss across multiple regions; that is reserved for private workshops.

Recent learner notes

  • I liked that the injects felt petty-real—slow image pulls, not cartoonish fire drills.