15 — Production Patterns
Health Probes
```yaml
spec:
  containers:
    - name: api
      livenessProbe:            # Is the app alive? → Restart if failing
        httpGet:
          path: /health
          port: 3000
        initialDelaySeconds: 15
        periodSeconds: 20
        failureThreshold: 3
      readinessProbe:           # Ready for traffic? → Remove from Service if failing
        httpGet:
          path: /ready
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 10
      startupProbe:             # Has the app started? → Don't check liveness until it has
        httpGet:
          path: /health
          port: 3000
        failureThreshold: 30
        periodSeconds: 10       # 30 × 10s = 300s max startup time
```
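When a probe fails, the kubelet records an event on the pod, which is the quickest place to look when pods restart or drop out of a Service (the pod name below is hypothetical):

```bash
# Probe failures show up under Events in the pod description
kubectl describe pod api-7d4b9c6f5-x2k8q

# Look for lines like:
#   Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 500

# Or list unhealthy events across the namespace
kubectl get events --field-selector reason=Unhealthy
```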
Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
```bash
kubectl get hpa
# NAME      REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
# api-hpa   Deployment/api   45%/70%   2         10        3
```
> Requires the metrics-server add-on to be installed in the cluster.
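The HPA derives the desired replica count from the ratio of the current metric to the target; with multiple metrics it computes a count per metric and takes the highest:

```
desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

# Example: 3 replicas averaging 90% CPU against a 70% target:
#   ceil(3 × 90 / 70) = ceil(3.86) = 4 replicas
```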
Deployment Strategies
Rolling Update (Default)
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1   # Max pods down during the update
    maxSurge: 1         # Max extra pods during the update
```
Zero-downtime. Old pods replaced gradually.
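A rolling update can be watched and reverted with the built-in rollout commands:

```bash
kubectl rollout status deployment/api    # watch the update progress
kubectl rollout history deployment/api   # list previous revisions
kubectl rollout undo deployment/api      # roll back to the previous revision
```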
Blue-Green
Run two full environments. Switch traffic at once.
```bash
# Deploy green (new version)
kubectl apply -f deployment-green.yaml

# Test green, then switch the Service selector to green
kubectl patch svc api -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback: switch back to blue
kubectl patch svc api -p '{"spec":{"selector":{"version":"blue"}}}'
```
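These patches assume the blue and green Deployments label their pods `version: blue` / `version: green`, and a Service that selects on that label. A minimal sketch of that Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    version: blue   # the patch above flips this to "green"
  ports:
    - port: 80
      targetPort: 3000
```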
Canary
Route small percentage of traffic to new version.
```
# Stable: 9 replicas running v1
# Canary: 1 replica running v2
# → ~10% of traffic goes to v2
```

Or use Ingress annotations:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
```
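For context, a full canary Ingress using those annotations might look like this; the hostname and the `api-v2` Service are assumptions, and the canary Ingress must match the same host and path as the stable one:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com        # must match the stable Ingress host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-v2       # Service pointing at the v2 pods
                port:
                  number: 80
```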
| Strategy | Zero Downtime | Rollback Speed | Resource Cost |
|---|---|---|---|
| Rolling Update | ✅ | Slow (rollout undo) | Low |
| Blue-Green | ✅ | Instant (switch selector) | 2x resources |
| Canary | ✅ | Instant (remove canary) | Low-Medium |
Resource Management
```yaml
resources:
  requests:             # Guaranteed minimum (used for scheduling)
    memory: "256Mi"
    cpu: "250m"         # 0.25 CPU cores
  limits:               # Maximum allowed
    memory: "512Mi"
    cpu: "1"            # 1 CPU core
```
Best practice: Always set requests. Set memory limits. CPU limits are debatable (throttling vs burst).
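These values also determine the pod's QoS class: requests equal to limits for every container gives Guaranteed, requests below limits gives Burstable, and no values at all gives BestEffort. Guaranteed pods are the last to be evicted under node memory pressure:

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:             # identical to requests → Guaranteed QoS
    memory: "512Mi"
    cpu: "500m"
```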
Resource Quotas (per namespace)
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```
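Current usage against the quota can be checked per namespace (the Used values below are illustrative, and the output shape is approximate):

```bash
kubectl describe resourcequota team-quota -n team-a
# Name:            team-quota
# Resource         Used   Hard
# --------         ----   ----
# limits.cpu       4      20
# requests.memory  6Gi    20Gi
# pods             12     50
```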
Pod Disruption Budget
Prevents too many pods from going down during voluntary disruptions (node drains, cluster upgrades).
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # At least 2 pods must stay running
  # or: maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```
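PDBs are enforced through the eviction API, which is what `kubectl drain` uses, so a drain pauses rather than drop `app: api` below 2 ready pods (node name hypothetical):

```bash
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# evicting pods one by one; blocks if an eviction would violate the PDB
```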
GitOps with ArgoCD / Flux
```
Git Repository (source of truth)
        │
        ↓ watches for changes
  ArgoCD / Flux
        │
        ↓ applies manifests
Kubernetes Cluster
```
Workflow:
1. Developer pushes code → CI builds image → pushes to registry
2. CI updates image tag in Git manifests
3. ArgoCD detects change → deploys to cluster
4. ArgoCD ensures cluster matches Git (self-healing)
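With ArgoCD this workflow is configured declaratively through an Application resource; a sketch with a hypothetical repo URL, path, and namespace:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/manifests.git   # hypothetical repo
    targetRevision: main
    path: apps/api
  destination:
    server: https://kubernetes.default.svc   # the local cluster
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual changes in the cluster
```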
Production Checklist
□ Health probes (liveness + readiness + startup)
□ Resource requests and limits
□ HPA for auto-scaling
□ PodDisruptionBudget
□ NetworkPolicies (restrict traffic)
□ Secrets encrypted (SealedSecrets / Vault)
□ Non-root security context
□ Rolling update strategy
□ Monitoring + alerting (Prometheus + Grafana)
□ Centralized logging (EFK / Loki)
□ Ingress with TLS
□ GitOps deployment (ArgoCD / Flux)
□ RBAC configured
□ Image scanning in CI/CD
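The non-root item from the checklist translates to a container securityContext along these lines:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000                  # any non-zero UID
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```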
Key Takeaways
- Always configure liveness, readiness, and startup probes
- Use HPA to auto-scale based on CPU/memory
- Rolling updates for most deployments; blue-green for instant rollback; canary for gradual rollout
- Set resource requests (scheduling) and limits (protection)
- Use PodDisruptionBudgets to maintain availability during maintenance
- GitOps (ArgoCD/Flux) = Git as the single source of truth for cluster state