Monitoring and Troubleshooting
Signals, Traces, and Noise Control
Wire metrics and traces without turning dashboards into wallpaper.
From 1,350,000 KRW — informational only, no checkout on this site.
Overview
Monitoring and troubleshooting week teaches you to choose a small signal set, propagate trace context responsibly, and narrate graphs during incidents without drowning teammates in panels.
What is included
- Prometheus scrape hygiene exercises
- Trace sampling tradeoff worksheet
- Log volume budgeting with sidecar pitfalls called out
- Dashboard critique studio
- Live tail pairing on structured logs
- Runbook snippet library for common kube-state signals
- Optional night lab for night-shift engineers
Outcomes
- Propose three golden signals for a sample service
- Demonstrate trace correlation across two services
- Trim redundant alerts from a starter rules file
Lead instructor for this track
Ex-observability vendor educator; now allergic to chart sprawl.
FAQ
Which stacks are installed?
Prometheus, Grafana, and OpenTelemetry collectors—versions pinned per cohort announcement.
Can we bring proprietary agents?
Not into shared clusters; talk to us about a private lab build if you need that fidelity.
What is out of scope?
We do not tune enterprise appliance appliances; focus stays on Kubernetes-native telemetry.
Recent learner notes
-
Finally someone said aloud that half our dashboards were vanity.