15 — Production Patterns
Health Probes
```yaml
spec:
  containers:
    - name: api
      livenessProbe:            # Is the app alive? → Restart if failing
        httpGet:
          path: /health
          port: 3000
        initialDelaySeconds: 15
        periodSeconds: 20
        failureThreshold: 3
      readinessProbe:           # Ready for traffic? → Remove from Service if failing
        httpGet:
          path: /ready
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 10
      startupProbe:             # Has the app started? → Don't check liveness until it has
        httpGet:
          path: /health
          port: 3000
        failureThreshold: 30
        periodSeconds: 10       # 30 × 10s = 300s max startup time
```
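When a probe fails, the kubelet records an event on the pod, which is the quickest place to look when pods restart or drop out of a Service (the pod name below is hypothetical):

```bash
# Probe failures show up under Events in the pod description
kubectl describe pod api-7d4b9c6f5-x2k8q

# Look for lines like:
#   Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 500

# Or list unhealthy events across the namespace
kubectl get events --field-selector reason=Unhealthy
```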
Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
```bash
kubectl get hpa
# NAME      REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
# api-hpa   Deployment/api   45%/70%   2         10        3
```
> Requires the metrics-server add-on to be installed in the cluster.
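The HPA derives the desired replica count from the ratio of the current metric to the target; with multiple metrics it computes a count per metric and takes the highest:

```
desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

# Example: 3 replicas averaging 90% CPU against a 70% target:
#   ceil(3 × 90 / 70) = ceil(3.86) = 4 replicas
```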
Deployment Strategies
Rolling Update (Default)
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1   # Max pods down during the update
    maxSurge: 1         # Max extra pods during the update
```
Zero-downtime. Old pods replaced gradually.
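A rolling update can be watched and reverted with the built-in rollout commands:

```bash
kubectl rollout status deployment/api    # watch the update progress
kubectl rollout history deployment/api   # list previous revisions
kubectl rollout undo deployment/api      # roll back to the previous revision
```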
Blue-Green
Run two full environments. Switch traffic at once.
```bash
# Deploy green (new version)
kubectl apply -f deployment-green.yaml

# Test green, then switch the Service selector to green
kubectl patch svc api -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback: switch back to blue
kubectl patch svc api -p '{"spec":{"selector":{"version":"blue"}}}'
```
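These patches assume the blue and green Deployments label their pods `version: blue` / `version: green`, and a Service that selects on that label. A minimal sketch of that Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    version: blue   # the patch above flips this to "green"
  ports:
    - port: 80
      targetPort: 3000
```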
Canary
Route small percentage of traffic to new version.
```
# Stable: 9 replicas running v1
# Canary: 1 replica running v2
# → ~10% of traffic goes to v2
```

Or use Ingress annotations:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
```
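For context, a full canary Ingress using those annotations might look like this; the hostname and the `api-v2` Service are assumptions, and the canary Ingress must match the same host and path as the stable one:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com        # must match the stable Ingress host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-v2       # Service pointing at the v2 pods
                port:
                  number: 80
```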
| Strategy | Zero Downtime | Rollback Speed | Resource Cost |
|---|---|---|---|
| Rolling Update | ✅ | Slow (rollout undo) | Low |
| Blue-Green | ✅ | Instant (switch selector) | 2x resources |
| Canary | ✅ | Instant (remove canary) | Low-Medium |
Resource Management
```yaml
resources:
  requests:             # Guaranteed minimum (used for scheduling)
    memory: "256Mi"
    cpu: "250m"         # 0.25 CPU cores
  limits:               # Maximum allowed
    memory: "512Mi"
    cpu: "1"            # 1 CPU core
```
Best practice: Always set requests. Set memory limits. CPU limits are debatable (throttling vs burst).
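These values also determine the pod's QoS class: requests equal to limits for every container gives Guaranteed, requests below limits gives Burstable, and no values at all gives BestEffort. Guaranteed pods are the last to be evicted under node memory pressure:

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:             # identical to requests → Guaranteed QoS
    memory: "512Mi"
    cpu: "500m"
```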
Resource Quotas (per namespace)
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```
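Current usage against the quota can be checked per namespace (the Used values below are illustrative, and the output shape is approximate):

```bash
kubectl describe resourcequota team-quota -n team-a
# Name:            team-quota
# Resource         Used   Hard
# --------         ----   ----
# limits.cpu       4      20
# requests.memory  6Gi    20Gi
# pods             12     50
```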
Pod Disruption Budget
Prevents too many pods from going down during voluntary disruptions (node drains, cluster upgrades).
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # At least 2 pods must stay running
  # or: maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```
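PDBs are enforced through the eviction API, which is what `kubectl drain` uses, so a drain pauses rather than drop `app: api` below 2 ready pods (node name hypothetical):

```bash
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# evicting pods one by one; blocks if an eviction would violate the PDB
```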
GitOps with ArgoCD / Flux
```
Git Repository (source of truth)
        │
        ↓ watches for changes
  ArgoCD / Flux
        │
        ↓ applies manifests
Kubernetes Cluster
```
Workflow:
1. Developer pushes code → CI builds image → pushes to registry
2. CI updates image tag in Git manifests
3. ArgoCD detects change → deploys to cluster
4. ArgoCD ensures cluster matches Git (self-healing)
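With ArgoCD this workflow is configured declaratively through an Application resource; a sketch with a hypothetical repo URL, path, and namespace:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/manifests.git   # hypothetical repo
    targetRevision: main
    path: apps/api
  destination:
    server: https://kubernetes.default.svc   # the local cluster
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual changes in the cluster
```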
Production Checklist
□ Health probes (liveness + readiness + startup)
□ Resource requests and limits
□ HPA for auto-scaling
□ PodDisruptionBudget
□ NetworkPolicies (restrict traffic)
□ Secrets encrypted (SealedSecrets / Vault)
□ Non-root security context
□ Rolling update strategy
□ Monitoring + alerting (Prometheus + Grafana)
□ Centralized logging (EFK / Loki)
□ Ingress with TLS
□ GitOps deployment (ArgoCD / Flux)
□ RBAC configured
□ Image scanning in CI/CD
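The non-root item from the checklist translates to a container securityContext along these lines:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000                  # any non-zero UID
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```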
Key Takeaways
- Always configure liveness, readiness, and startup probes
- Use HPA to auto-scale based on CPU/memory
- Rolling updates for most deployments; blue-green for instant rollback; canary for gradual rollout
- Set resource requests (scheduling) and limits (protection)
- Use PodDisruptionBudgets to maintain availability during maintenance
- GitOps (ArgoCD/Flux) = Git as the single source of truth for cluster state