14 — CloudWatch & Monitoring

CloudWatch Overview

Feature      Purpose
Metrics      Numerical data points (CPU, memory, request count)
Logs         Application and service log aggregation
Alarms       Alert when metrics cross thresholds
Dashboards   Visualize metrics in real-time
X-Ray        Distributed tracing (request flow across services)
CloudTrail   Audit log of all AWS API calls

Metrics

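# Built-in metrics (free; 1-min or 5-min intervals depending on the service)
# EC2: CPUUtilization, NetworkIn/NetworkOut, DiskReadOps/DiskWriteOps (5-min with basic monitoring)
#      Note: EC2 memory usage is not built in; it requires the CloudWatch agent
# RDS: DatabaseConnections, FreeStorageSpace, ReadLatency
# ALB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
# Lambda: Invocations, Duration, Errors, Throttles
# SQS: ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage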

# Custom metrics
aws cloudwatch put-metric-data \
  --namespace MyApp \
  --metric-name ActiveUsers \
  --value 42 \
  --unit Count

// Custom metrics from code (AWS SDK for JavaScript v3)
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

await cw.send(new PutMetricDataCommand({
  Namespace: 'MyApp',
  MetricData: [{
    MetricName: 'OrderProcessingTime',
    Value: 150,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'Service', Value: 'order-api' }],
  }],
}));
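
A timing metric like OrderProcessingTime is usually measured around an operation rather than hard-coded. The following is a minimal sketch of that idea, not part of the AWS SDK: toMetricDatum and timed are hypothetical helpers, and the object they return only mirrors the MetricData entry shape shown above. Sending it with PutMetricDataCommand is left to the caller.

```typescript
// Hypothetical helper types; they mirror the MetricData entry shape used above.
interface Dimension { Name: string; Value: string }
interface MetricDatum {
  MetricName: string;
  Value: number;
  Unit: 'Milliseconds' | 'Count';
  Dimensions: Dimension[];
}

// Build one MetricData entry for a duration measured in milliseconds.
function toMetricDatum(metricName: string, ms: number, service: string): MetricDatum {
  return {
    MetricName: metricName,
    Value: ms,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'Service', Value: service }],
  };
}

// Run an async operation and return its result together with a datum
// describing how long it took. The clock is injectable so it can be faked in tests.
async function timed<T>(
  metricName: string,
  service: string,
  op: () => Promise<T>,
  now: () => number = Date.now,
): Promise<{ result: T; datum: MetricDatum }> {
  const start = now();
  const result = await op();
  return { result, datum: toMetricDatum(metricName, now() - start, service) };
}
```

The returned datum can be placed directly in the MetricData array of a PutMetricDataCommand input.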

Logs

# Log groups & streams
/ecs/api           → Log group
  ecs/api/task-123 → Log stream

# Query logs (CloudWatch Logs Insights)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Filter patterns
{ $.level = "error" }
{ $.statusCode >= 500 }
{ $.duration > 5000 }
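
To make the three patterns concrete, the sketch below applies equivalent predicates to JSON log lines locally. It only illustrates which events each pattern would select and is not how CloudWatch evaluates filters; the matches function and the sample lines are made up for this example.

```typescript
// A local stand-in for three JSON filter patterns, for illustration only.
type LogEvent = Record<string, unknown>;

const patterns: Record<string, (e: LogEvent) => boolean> = {
  '{ $.level = "error" }': (e) => e.level === 'error',
  '{ $.statusCode >= 500 }': (e) => typeof e.statusCode === 'number' && e.statusCode >= 500,
  '{ $.duration > 5000 }': (e) => typeof e.duration === 'number' && e.duration > 5000,
};

// Return the raw lines that are valid JSON objects and satisfy the named pattern.
function matches(pattern: string, lines: string[]): string[] {
  const predicate = patterns[pattern];
  if (!predicate) throw new Error(`Unknown pattern: ${pattern}`);
  return lines.filter((line) => {
    try {
      const parsed: unknown = JSON.parse(line);
      return typeof parsed === 'object' && parsed !== null && predicate(parsed as LogEvent);
    } catch {
      return false; // non-JSON lines never match a JSON pattern
    }
  });
}

// Invented sample data.
const sampleLines = [
  '{"level":"error","statusCode":502,"duration":120}',
  '{"level":"info","statusCode":200,"duration":6100}',
  'plain text line',
];
console.log(matches('{ $.statusCode >= 500 }', sampleLines));
```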

Structured Logging (Best Practice)

// ✅ Structured JSON logs
console.log(JSON.stringify({
  level: 'info',
  message: 'Order processed',
  orderId: '123',
  duration: 150,
  userId: 'user-456',
}));

// ❌ Unstructured logs (hard to query)
console.log('Order 123 processed in 150ms for user 456');
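
A small wrapper helps keep every log line in the same JSON shape. The sketch below is one possible helper; formatLog, log, the LogLevel type and the timestamp field are assumptions for illustration and not part of any AWS library.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Serialize one log entry as a single JSON line.
// The fixed fields are written last so that context cannot overwrite them.
function formatLog(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...context, level, message, timestamp: now.toISOString() });
}

// Emit the entry on stdout, one line per event.
function log(level: LogLevel, message: string, context?: Record<string, unknown>): void {
  console.log(formatLog(level, message, context));
}

log('info', 'Order processed', { orderId: '123', duration: 150, userId: 'user-456' });
```

Because each line is a single JSON object, fields such as level or orderId can be filtered on directly in a Logs Insights query instead of being extracted with regular expressions.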

Alarms

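# Alert when API errors exceed threshold
# ALB metrics are published per load balancer, so the alarm needs a LoadBalancer
# dimension (the value below and the SNS ARN are placeholders; use your own)
aws cloudwatch put-metric-alarm \
  --alarm-name "High-5XX-Errors" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --dimensions Name=LoadBalancer,Value=app/my-alb/1234567890abcdef \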
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123:alerts-topic

# Common alarms:
# CPU > 80% for 5 minutes
# 5XX errors > 10 in 5 minutes
# Lambda errors > 5% of invocations
# SQS queue depth > 1000
# RDS free storage < 5 GB
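
The period, threshold and evaluation-periods options interact: with a 300-second period, a threshold of 10 and 2 evaluation periods, the alarm fires only when the 5-minute sum exceeds 10 in two consecutive periods. The sketch below models that rule locally as a simplification. alarmState is a hypothetical function, and it ignores missing-data handling and the optional datapoints-to-alarm setting.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA';

// Simplified model of a GreaterThanThreshold alarm:
// ALARM when each of the most recent `evaluationPeriods` datapoints breaches.
function alarmState(datapoints: number[], threshold: number, evaluationPeriods: number): AlarmState {
  if (evaluationPeriods < 1) throw new Error('evaluationPeriods must be at least 1');
  if (datapoints.length < evaluationPeriods) return 'INSUFFICIENT_DATA';
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every((value) => value > threshold) ? 'ALARM' : 'OK';
}

// One number per 5-minute period: the sum of 5XX responses in that period.
console.log(alarmState([3, 12, 4], 10, 2));  // → OK (a single breach, then recovery)
console.log(alarmState([3, 12, 15], 10, 2)); // → ALARM (two consecutive breaches)
```

Requiring more than one evaluation period trades a slower alert for fewer false alarms from short spikes.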

X-Ray (Distributed Tracing)

Client → API Gateway → Lambda → DynamoDB
              │                     │
              └── X-Ray traces the entire request path

Shows:
  - Total latency breakdown per service
  - Error rates per service
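  - Service map (visual dependency graph)

# Enable active tracing on a Lambda function (the function name is a placeholder)
aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active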

// For custom subsegments
import AWSXRay from 'aws-xray-sdk';

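// getSegment() returns undefined when no trace context is active
const subsegment = AWSXRay.getSegment()?.addNewSubsegment('processOrder');
try {
  await processOrder(orderId);
} finally {
  subsegment?.close(); // close even if processOrder throws
}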

CloudTrail (Audit Logging)

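AWS API calls are logged as events (management events by default; data events such as S3 object access must be enabled on a trail):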
  Who: IAM user/role
  What: API action (ec2:TerminateInstances)
  When: Timestamp
  Where: Source IP, region
  Result: Success/failure

Use cases:
  - Security auditing
  - Compliance
  - Troubleshooting ("who deleted the bucket?")
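
The "who deleted the bucket?" question comes down to filtering event records by action. The sketch below does this over records that are already loaded in memory. The TrailRecord type lists only a small subset of the fields in a CloudTrail event record, findActors is a hypothetical helper, and the sample records are invented.

```typescript
// Subset of the fields found in a CloudTrail event record.
interface TrailRecord {
  eventTime: string; // ISO 8601, UTC
  eventSource: string;
  eventName: string;
  awsRegion: string;
  sourceIPAddress: string;
  userIdentity: { type: string; arn: string };
  errorCode?: string; // present only when the call failed
}

// Return who successfully performed a given action, newest first.
function findActors(records: TrailRecord[], eventName: string): { arn: string; when: string }[] {
  return records
    .filter((r) => r.eventName === eventName && r.errorCode === undefined)
    .sort((a, b) => b.eventTime.localeCompare(a.eventTime))
    .map((r) => ({ arn: r.userIdentity.arn, when: r.eventTime }));
}

// Invented sample records.
const sampleRecords: TrailRecord[] = [
  { eventTime: '2024-05-01T09:00:00Z', eventSource: 's3.amazonaws.com', eventName: 'CreateBucket',
    awsRegion: 'us-east-1', sourceIPAddress: '203.0.113.10',
    userIdentity: { type: 'IAMUser', arn: 'arn:aws:iam::123456789012:user/alice' } },
  { eventTime: '2024-05-01T10:15:00Z', eventSource: 's3.amazonaws.com', eventName: 'DeleteBucket',
    awsRegion: 'us-east-1', sourceIPAddress: '203.0.113.10',
    userIdentity: { type: 'IAMUser', arn: 'arn:aws:iam::123456789012:user/alice' } },
  { eventTime: '2024-05-01T10:20:00Z', eventSource: 's3.amazonaws.com', eventName: 'DeleteBucket',
    awsRegion: 'us-east-1', sourceIPAddress: '198.51.100.7', errorCode: 'AccessDenied',
    userIdentity: { type: 'IAMUser', arn: 'arn:aws:iam::123456789012:user/bob' } },
];
console.log(findActors(sampleRecords, 'DeleteBucket'));
```

Failed attempts are excluded here; dropping the errorCode check would instead list everyone who tried the action.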

Cost Monitoring

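# Enable Cost Explorer in the Billing console
# Enable "Receive Billing Alerts" in Billing preferences, then create the alarm
# Billing metrics are published only in us-east-1 and need a Currency dimension

aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "Monthly-Budget-Alert" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123:billing-alerts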

Key Takeaways

  • CloudWatch Metrics for monitoring; Alarms for alerting; Logs for debugging
  • Use structured JSON logging — enables powerful CloudWatch Logs Insights queries
  • Set up alarms for 5XX errors, CPU, queue depth, and billing
  • X-Ray for tracing requests across microservices (find bottlenecks)
  • CloudTrail for security auditing — always keep enabled
  • Monitor costs with billing alarms from day one