14 — CloudWatch & Monitoring
CloudWatch Overview
| Feature | Purpose |
|---|---|
| Metrics | Numerical data points (CPU, memory, request count) |
| Logs | Application and service log aggregation |
| Alarms | Alert when metrics cross thresholds |
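| Dashboards | Visualize metrics in near real time |
| X-Ray | Distributed tracing (request flow across services); a separate service that integrates with CloudWatch |
| CloudTrail | Audit log of AWS API calls; a separate service that can deliver events to CloudWatch Logs |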
Metrics
# Built-in metrics (no extra charge; EC2 basic monitoring reports at 5-min intervals, other services vary)
# EC2: CPUUtilization, NetworkIn, NetworkOut, DiskReadOps, DiskWriteOps
# RDS: DatabaseConnections, FreeStorageSpace, ReadLatency
# ALB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
# Lambda: Invocations, Duration, Errors, Throttles
# SQS: ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage
# Custom metrics
aws cloudwatch put-metric-data \
  --namespace MyApp \
  --metric-name ActiveUsers \
  --value 42 \
  --unit Count

import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

await cw.send(new PutMetricDataCommand({
  Namespace: 'MyApp',
  MetricData: [{
    MetricName: 'OrderProcessingTime',
    Value: 150,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'Service', Value: 'order-api' }],
  }],
}));
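To make the `OrderProcessingTime` example more concrete, here is a minimal TypeScript sketch of how such a datum might be built from two timestamps before it is handed to `PutMetricDataCommand`. The `MetricDatum` interface and the `buildDurationDatum` helper are illustrative names, not part of the AWS SDK; only the field names (`MetricName`, `Value`, `Unit`, `Dimensions`) come from the call shown above.

```typescript
// Local mirror of the fields used in the PutMetricDataCommand example above.
// This is an illustrative type, not the SDK's own MetricDatum definition.
interface MetricDatum {
  MetricName: string;
  Value: number;
  Unit: 'Milliseconds';
  Dimensions: { Name: string; Value: string }[];
}

// Hypothetical helper: turns a start/end timestamp pair (epoch milliseconds)
// into a duration datum tagged with the emitting service.
function buildDurationDatum(
  metricName: string,
  service: string,
  startedAtMs: number,
  finishedAtMs: number,
): MetricDatum {
  if (finishedAtMs < startedAtMs) {
    throw new RangeError('finishedAtMs must not be earlier than startedAtMs');
  }
  return {
    MetricName: metricName,
    Value: finishedAtMs - startedAtMs,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'Service', Value: service }],
  };
}

// Usage: measure some work, then pass the result as MetricData: [orderDatum].
const startedAt = Date.now();
// ... process the order here ...
const orderDatum = buildDurationDatum('OrderProcessingTime', 'order-api', startedAt, Date.now());
console.log(JSON.stringify(orderDatum));
```

Each distinct combination of dimension values is stored as its own metric, so dimensions work best with a small, fixed set of values such as a service name rather than per-user or per-order identifiers.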
Logs
# Log groups & streams
/ecs/api → Log group
ecs/api/task-123 → Log stream
# Query logs (CloudWatch Logs Insights)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
# Filter patterns
{ $.level = "error" }
{ $.statusCode >= 500 }
{ $.duration > 5000 }
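The three filter patterns above each compare one JSON field against a constant. As a rough local illustration of which log lines such a pattern would select, the TypeScript sketch below re-implements that comparison for top-level fields only. The `Condition` type, `matchesCondition`, and the sample log lines are invented for this example; the real matching happens inside CloudWatch Logs and supports more syntax than shown here.

```typescript
type LogEvent = Record<string, unknown>;

// One comparison in the spirit of { $.field OP value }; top-level fields only.
interface Condition {
  field: string;
  op: '=' | '>' | '>=';
  value: string | number;
}

// JSON-style patterns apply to log events that are JSON objects,
// so anything else is treated as "no match" in this sketch.
function parseJsonLine(line: string): LogEvent | undefined {
  try {
    const parsed: unknown = JSON.parse(line);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      return parsed as LogEvent;
    }
    return undefined;
  } catch (err) {
    return undefined;
  }
}

function matchesCondition(line: string, cond: Condition): boolean {
  const logEvent = parseJsonLine(line);
  if (logEvent === undefined) return false;
  const actual = logEvent[cond.field];
  if (cond.op === '=') return actual === cond.value;
  // Numeric operators require a number on both sides.
  if (typeof actual !== 'number' || typeof cond.value !== 'number') return false;
  return cond.op === '>' ? actual > cond.value : actual >= cond.value;
}

// Made-up sample lines: two structured, one unstructured.
const sampleLines = [
  '{"level":"info","statusCode":200,"duration":150}',
  '{"level":"error","statusCode":502,"duration":6200}',
  'Order 123 processed in 150ms',
];

const serverErrors = sampleLines.filter((line) =>
  matchesCondition(line, { field: 'statusCode', op: '>=', value: 500 }),
);
console.log(serverErrors.length); // 1
```

The unstructured third line can never match a field comparison, which is the practical reason to log JSON in the first place.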
Structured Logging (Best Practice)
// ✅ Structured JSON logs
console.log(JSON.stringify({
  level: 'info',
  message: 'Order processed',
  orderId: '123',
  duration: 150,
  userId: 'user-456',
}));
// ❌ Unstructured logs (hard to query)
console.log('Order 123 processed in 150ms for user 456');
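Calling `JSON.stringify` by hand at every log site tends to drift over time, so one common approach is a small wrapper that always emits the same core keys. The sketch below is one possible shape for that; `formatLogLine`, `LogLevel`, and the `timestamp` field are choices made for this example, not something CloudWatch requires.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Builds one JSON log line. Core keys are written last so that a
// caller-supplied field named "level" or "message" cannot overwrite them.
function formatLogLine(
  level: LogLevel,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    ...fields,
    timestamp: now.toISOString(),
    level,
    message,
  });
}

// Usage: the same data as the structured example above, emitted through the helper.
console.log(
  formatLogLine('info', 'Order processed', {
    orderId: '123',
    duration: 150,
    userId: 'user-456',
  }),
);
```

Keeping field names and units stable (for instance, `duration` always in milliseconds) is what lets a filter such as `{ $.duration > 5000 }` behave the same way across services.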
Alarms
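# Alert when target 5XX errors exceed 10 per 5-minute period for 2 consecutive periods
# ALB metrics carry a LoadBalancer dimension; the value below is a placeholder
aws cloudwatch put-metric-alarm \
  --alarm-name "High-5XX-Errors" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --dimensions Name=LoadBalancer,Value=app/my-alb/1234567890abcdef \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123:alerts-topic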
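# Common alarms:
# CPU > 80% for 5 minutes
# 5XX errors > 10 in 5 minutes
# Lambda errors > 5% of invocations (metric math: Errors / Invocations)
# SQS queue depth > 1000 (ApproximateNumberOfMessagesVisible)
# RDS free storage < 5 GB (FreeStorageSpace is reported in bytes)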
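How `--period`, `--evaluation-periods`, and `--threshold` interact is easier to see with numbers. The sketch below is a simplified local model, assuming one datapoint per period and that every evaluated period must breach; it ignores missing-data handling and the separate datapoints-to-alarm setting. `evaluateAlarm` and `errorRatePercent` are names made up for this illustration.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA';

interface AlarmConfig {
  threshold: number;         // corresponds to --threshold
  evaluationPeriods: number; // corresponds to --evaluation-periods (1 or more)
}

// `datapoints` holds the chosen statistic (for example Sum) for each period,
// oldest first. Models GreaterThanThreshold only.
function evaluateAlarm(datapoints: number[], config: AlarmConfig): AlarmState {
  if (datapoints.length < config.evaluationPeriods) return 'INSUFFICIENT_DATA';
  const recent = datapoints.slice(-config.evaluationPeriods);
  return recent.every((value) => value > config.threshold) ? 'ALARM' : 'OK';
}

// "Lambda errors > 5% of invocations" needs a ratio of two metrics.
function errorRatePercent(errors: number, invocations: number): number {
  return invocations === 0 ? 0 : (100 * errors) / invocations;
}

const alarmConfig: AlarmConfig = { threshold: 10, evaluationPeriods: 2 };
console.log(evaluateAlarm([3, 12, 15], alarmConfig)); // ALARM: the last two periods both exceed 10
console.log(evaluateAlarm([12, 15, 4], alarmConfig)); // OK: the latest period is below the threshold
console.log(errorRatePercent(6, 100) > 5);            // true
```

With a 300-second period and two evaluation periods, an alarm like this fires only after ten consecutive minutes over the threshold, trading alert speed for fewer false positives.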
X-Ray (Distributed Tracing)
Client → API Gateway → Lambda → DynamoDB
  │                                 │
  └── X-Ray traces the entire request path
Shows:
- Total latency breakdown per service
- Error rates per service
- Service map (visual dependency graph)
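As an illustration of what a per-service latency breakdown means, the sketch below walks a nested trace and attributes to each service the time not spent in its downstream calls. The `TraceNode` type and all timings are made up; `name`, `start_time`, `end_time`, and `subsegments` follow the field names of X-Ray segment documents. The code assumes child calls do not overlap in time, and nesting everything in one tree is a simplification: real traces consist of several segments linked by a trace ID.

```typescript
interface TraceNode {
  name: string;
  start_time: number; // seconds
  end_time: number;   // seconds
  subsegments?: TraceNode[];
}

// Total duration of a node in whole milliseconds.
const toMs = (node: TraceNode): number =>
  Math.round((node.end_time - node.start_time) * 1000);

// Returns self time per service in milliseconds: total duration minus
// the duration of direct children.
function selfTimeByService(
  root: TraceNode,
  out: Record<string, number> = {},
): Record<string, number> {
  const children = root.subsegments ?? [];
  const childMs = children.reduce((sum, child) => sum + toMs(child), 0);
  out[root.name] = (out[root.name] ?? 0) + toMs(root) - childMs;
  for (const child of children) selfTimeByService(child, out);
  return out;
}

// Made-up trace for API Gateway → Lambda → DynamoDB,
// with times given relative to the start of the request.
const trace: TraceNode = {
  name: 'API Gateway',
  start_time: 0,
  end_time: 0.5,
  subsegments: [{
    name: 'Lambda',
    start_time: 0.0625,
    end_time: 0.4375,
    subsegments: [{ name: 'DynamoDB', start_time: 0.125, end_time: 0.25 }],
  }],
};

const breakdown = selfTimeByService(trace);
console.log(breakdown); // { 'API Gateway': 125, Lambda: 250, DynamoDB: 125 }
```

In this invented trace the Lambda function accounts for half of the 500 ms request, which is the kind of bottleneck the service map is meant to surface.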
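// Enable active tracing on the Lambda function (a function setting, not an env var):
//   aws lambda update-function-configuration --function-name my-function --tracing-config Mode=Active
//   (my-function is a placeholder name)
// For custom subsegments
import AWSXRay from 'aws-xray-sdk';
const subsegment = AWSXRay.getSegment()!.addNewSubsegment('processOrder');
try {
  await processOrder(orderId);
} catch (err) {
  subsegment.addError(err as Error); // record the failure on the trace
  throw err;
} finally {
  subsegment.close(); // always close, even when processOrder throws
}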
CloudTrail (Audit Logging)
AWS API calls are logged (management events by default; data events such as S3 object access must be enabled on a trail):
  Who: IAM user/role
  What: API action (ec2:TerminateInstances)
  When: Timestamp
  Where: Source IP, region
  Result: Success/failure
Use cases:
- Security auditing
- Compliance
- Troubleshooting ("who deleted the bucket?")
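A question like "who deleted the bucket?" comes down to filtering event records by action and reading off the identity. The sketch below does this over an in-memory array. The `TrailRecord` type covers only a handful of the fields in a CloudTrail record (`eventTime`, `eventName`, `awsRegion`, `sourceIPAddress`, `userIdentity`, `errorCode`), `successfulCalls` is a hypothetical helper, and the sample records, ARNs, and IP addresses are invented.

```typescript
interface TrailRecord {
  eventTime: string;                 // ISO 8601, UTC
  eventName: string;                 // What
  awsRegion: string;                 // Where
  sourceIPAddress: string;           // Where
  userIdentity: { arn: string };     // Who
  errorCode?: string;                // present when the call failed
}

// Returns one summary line per successful call of `eventName`, oldest first.
function successfulCalls(records: TrailRecord[], eventName: string): string[] {
  return records
    .filter((r) => r.eventName === eventName && r.errorCode === undefined)
    .sort((a, b) => a.eventTime.localeCompare(b.eventTime)) // UTC ISO strings sort chronologically
    .map((r) => `${r.eventTime} ${r.userIdentity.arn} from ${r.sourceIPAddress} (${r.awsRegion})`);
}

// Invented sample records.
const trailRecords: TrailRecord[] = [
  {
    eventTime: '2024-03-01T10:15:00Z',
    eventName: 'DeleteBucket',
    awsRegion: 'us-east-1',
    sourceIPAddress: '203.0.113.10',
    userIdentity: { arn: 'arn:aws:iam::123456789012:user/alice' },
  },
  {
    eventTime: '2024-03-01T09:00:00Z',
    eventName: 'DeleteBucket',
    awsRegion: 'us-east-1',
    sourceIPAddress: '198.51.100.7',
    userIdentity: { arn: 'arn:aws:iam::123456789012:user/bob' },
    errorCode: 'AccessDenied',
  },
  {
    eventTime: '2024-03-01T11:00:00Z',
    eventName: 'TerminateInstances',
    awsRegion: 'us-east-1',
    sourceIPAddress: '203.0.113.10',
    userIdentity: { arn: 'arn:aws:iam::123456789012:user/alice' },
  },
];

console.log(successfulCalls(trailRecords, 'DeleteBucket'));
```

In practice the records would come from CloudTrail event history or from log files delivered to S3 rather than from a hard-coded array; keeping failed attempts (those with an `errorCode`) in a separate report can also be useful for security reviews.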
Cost Monitoring
# Enable Cost Explorer and "Receive Billing Alerts" in the Billing console
# Set up billing alerts (billing metrics are published only in us-east-1)
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "Monthly-Budget-Alert" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123:billing-alerts
Key Takeaways
- CloudWatch Metrics for monitoring; Alarms for alerting; Logs for debugging
- Use structured JSON logging — enables powerful CloudWatch Logs Insights queries
- Set up alarms for 5XX errors, CPU, queue depth, and billing
- X-Ray for tracing requests across microservices (find bottlenecks)
- CloudTrail for security auditing — always keep enabled
- Monitor costs with billing alarms from day one