14 — CloudWatch & Monitoring

CloudWatch Overview

Feature	Purpose
Metrics	Numerical data points (CPU, memory, request count)
Logs	Application and service log aggregation
Alarms	Alert when metrics cross thresholds
Dashboards	Visualize metrics in real-time
X-Ray	Distributed tracing (request flow across services)
CloudTrail	Audit log of all AWS API calls

Metrics

# Built-in metrics (free; EC2 basic monitoring reports every 5 min, most other services every 1 min)
# EC2: CPUUtilization, NetworkIn/Out, DiskRead/Write (memory needs the CloudWatch agent)
# RDS: DatabaseConnections, FreeStorageSpace, ReadLatency
# ALB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count
# Lambda: Invocations, Duration, Errors, Throttles
# SQS: NumberOfMessagesVisible, ApproximateAgeOfOldestMessage

# Custom metrics
aws cloudwatch put-metric-data \
  --namespace MyApp \
  --metric-name ActiveUsers \
  --value 42 \
  --unit Count

import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const cw = new CloudWatchClient({});

await cw.send(new PutMetricDataCommand({
  Namespace: 'MyApp',
  MetricData: [{
    MetricName: 'OrderProcessingTime',
    Value: 150,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'Service', Value: 'order-api' }],
  }],
}));
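CloudWatch can also extract custom metrics from log lines written in the Embedded Metric Format (EMF), which avoids one PutMetricData call per data point (handy in Lambda, where stdout already goes to CloudWatch Logs). A minimal sketch, assuming the EMF JSON layout (`_aws.CloudWatchMetrics`) and reusing the `MyApp` / `OrderProcessingTime` / `Service` names from the example above; `toEmf` is a hypothetical helper, not an SDK function:

```typescript
// Hypothetical helper: builds one EMF log line for a single metric.
// CloudWatch extracts the metric once this line is written to CloudWatch Logs.
interface EmfInput {
  namespace: string;
  metricName: string;
  value: number;
  unit: string;
  dimensions: Record<string, string>;
  timestampMs?: number; // defaults to now
}

function toEmf(input: EmfInput): string {
  const dimensionKeys = Object.keys(input.dimensions);
  return JSON.stringify({
    _aws: {
      Timestamp: input.timestampMs ?? Date.now(), // epoch milliseconds
      CloudWatchMetrics: [
        {
          Namespace: input.namespace,
          Dimensions: [dimensionKeys], // one dimension set with every key
          Metrics: [{ Name: input.metricName, Unit: input.unit }],
        },
      ],
    },
    // Dimension values and the metric value live at the top level
    ...input.dimensions,
    [input.metricName]: input.value,
  });
}

console.log(
  toEmf({
    namespace: 'MyApp',
    metricName: 'OrderProcessingTime',
    value: 150,
    unit: 'Milliseconds',
    dimensions: { Service: 'order-api' },
  }),
);
```

The trade-off: EMF costs log ingestion instead of API calls, and the raw values stay queryable in Logs Insights.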

Logs

# Log groups & streams
/ecs/api           → Log group
  ecs/api/task-123 → Log stream

# Query logs (CloudWatch Logs Insights)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Filter patterns
{ $.level = "error" }
{ $.statusCode >= 500 }
{ $.duration > 5000 }
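Both the Insights query and the filter patterns select events by content. To make their semantics concrete, here is a rough in-memory equivalent; the `LogEvent` shape and the hand-written predicates are assumptions for illustration, not a parser for either syntax, and the real matching runs inside CloudWatch Logs:

```typescript
// Assumed event shape for this illustration (not an AWS SDK type)
interface LogEvent {
  timestamp: number; // epoch ms, like @timestamp
  message: string; // raw line, like @message
}

// Mirrors: filter @message like /ERROR/ | sort @timestamp desc | limit n
function recentErrors(events: LogEvent[], n = 50): LogEvent[] {
  return events
    .filter((e) => /ERROR/.test(e.message)) // new array, input left untouched
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, n);
}

type Json = Record<string, unknown>;

// Hand-written stand-ins for the three JSON filter patterns
const patterns: Record<string, (e: Json) => boolean> = {
  '{ $.level = "error" }': (e) => e['level'] === 'error',
  '{ $.statusCode >= 500 }': (e) => {
    const v = e['statusCode'];
    return typeof v === 'number' && v >= 500;
  },
  '{ $.duration > 5000 }': (e) => {
    const v = e['duration'];
    return typeof v === 'number' && v > 5000;
  },
};

// Which patterns would a raw log line match? Non-JSON lines match none.
function matchingPatterns(line: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return [];
  }
  if (typeof parsed !== 'object' || parsed === null) return [];
  const event = parsed as Json;
  return Object.entries(patterns)
    .filter(([, test]) => test(event))
    .map(([pattern]) => pattern);
}

const sample: LogEvent[] = [
  { timestamp: 1000, message: 'INFO started' },
  { timestamp: 2000, message: 'ERROR db timeout' },
  { timestamp: 3000, message: 'ERROR payment failed' },
];
console.log(recentErrors(sample).map((e) => e.message));
console.log(matchingPatterns('{"level":"error","statusCode":503,"duration":120}'));
```

The JSON line in the last call matches the `level` and `statusCode` patterns but not `duration`, which is also why unstructured lines are hard to filter this way.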

Structured Logging (Best Practice)

// ✅ Structured JSON logs
console.log(JSON.stringify({
  level: 'info',
  message: 'Order processed',
  orderId: '123',
  duration: 150,
  userId: 'user-456',
}));

// ❌ Unstructured logs (hard to query)
console.log('Order 123 processed in 150ms for user 456');
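A small wrapper keeps every line in the same JSON shape, so Logs Insights can query fields such as `orderId` or `duration` directly. A minimal sketch; `formatLog`, `logInfo`, and the added `timestamp` field are illustrative choices, not a library API:

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error';

// Builds one JSON log line. Caller fields are spread first so the
// reserved keys (timestamp, level, message) cannot be overwritten.
function formatLog(
  level: Level,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...fields, timestamp: now.toISOString(), level, message });
}

function logInfo(message: string, fields?: Record<string, unknown>): void {
  console.log(formatLog('info', message, fields));
}

logInfo('Order processed', { orderId: '123', duration: 150, userId: 'user-456' });
```

Passing `now` explicitly is only there to make the output deterministic in tests; normal callers omit it.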

Alarms

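# Alert when API errors exceed threshold
# (ALB metrics are reported per load balancer; the LoadBalancer value below is a placeholder)
# (the 5XX count is not reported when there are no errors, so missing data counts as OK)
aws cloudwatch put-metric-alarm \
  --alarm-name "High-5XX-Errors" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --dimensions Name=LoadBalancer,Value=app/my-alb/0123456789abcdef \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts-topic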

# Common alarms:
# CPU > 80% for 5 minutes
# 5XX errors > 10 in 5 minutes
# Lambda errors > 5% of invocations
# SQS queue depth > 1000
# RDS free storage < 5 GB
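With `--period 300` and `--evaluation-periods 2`, the 5XX alarm above fires only when the 5-minute Sum exceeds 10 in both of the last two periods (`--datapoints-to-alarm` defaults to the evaluation-periods value). A simplified sketch of that "M out of N" check, plus the error-rate arithmetic behind the Lambda 5% rule (in CloudWatch itself that rate is usually a metric math expression). It ignores missing-data handling (`--treat-missing-data`) and is not CloudWatch's actual evaluator:

```typescript
// Simplified "M out of N" check for GreaterThanThreshold.
// sums: one aggregated value per period, oldest first.
function isBreaching(
  sums: number[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number = evaluationPeriods,
): boolean {
  if (sums.length < evaluationPeriods) return false; // not enough data yet
  const recent = sums.slice(-evaluationPeriods);
  const breaches = recent.filter((v) => v > threshold).length;
  return breaches >= datapointsToAlarm;
}

// Lambda error rate as a percentage; 0 when there were no invocations
function errorRatePercent(errors: number, invocations: number): number {
  return invocations === 0 ? 0 : (errors / invocations) * 100;
}

console.log(isBreaching([3, 12, 15], 10, 2)); // last two periods are 12 and 15
console.log(errorRatePercent(6, 100) > 5);
```

Requiring two consecutive breaching periods trades a few minutes of detection delay for fewer pages on one-off spikes.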

X-Ray (Distributed Tracing)

Client → API Gateway → Lambda → DynamoDB
              │                     │
              └── X-Ray traces the entire request path

Shows:
  - Total latency breakdown per service
  - Error rates per service
  - Service map (visual dependency graph)

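// Enable in Lambda: turn on active tracing in the function configuration
// (console toggle, or: aws lambda update-function-configuration --function-name my-fn --tracing-config Mode=Active)

// For custom subsegments (processOrder and orderId come from your handler)
import AWSXRay from 'aws-xray-sdk';

const subsegment = AWSXRay.getSegment()?.addNewSubsegment('processOrder');
try {
  await processOrder(orderId);
} catch (err) {
  subsegment?.addError(err as Error); // record the failure on the trace
  throw err;
} finally {
  subsegment?.close(); // always close, even when processOrder throws
}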
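To make sure a subsegment's timing and any error are recorded even when the traced call throws, the open/close logic can live in one reusable wrapper. A sketch: `withSubsegment` is a hypothetical helper, and `SegmentLike` / `SubsegmentLike` are minimal structural types assumed to match the `addNewSubsegment`, `addError`, and `close` methods of the X-Ray SDK, so the example also runs outside Lambda with a fake segment:

```typescript
// Minimal shapes assumed to match the parts of the X-Ray SDK used here
interface SubsegmentLike {
  addError(err: Error | string): void;
  close(): void;
}
interface SegmentLike {
  addNewSubsegment(name: string): SubsegmentLike;
}

// Runs fn inside a named subsegment; records errors and always closes it.
// With no active segment, fn still runs, just untraced.
async function withSubsegment<T>(
  segment: SegmentLike | undefined,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  const sub = segment?.addNewSubsegment(name);
  try {
    return await fn();
  } catch (err) {
    sub?.addError(err instanceof Error ? err : String(err));
    throw err;
  } finally {
    sub?.close();
  }
}

// Fake segment that records calls, for running outside Lambda
const events: string[] = [];
const fakeSegment: SegmentLike = {
  addNewSubsegment: (name) => {
    events.push(`open:${name}`);
    return {
      addError: (e) => events.push(`error:${e instanceof Error ? e.message : e}`),
      close: () => events.push(`close:${name}`),
    };
  },
};

withSubsegment(fakeSegment, 'processOrder', async () => 'ok').then((r) => console.log(r, events));
```

Inside a Lambda handler the first argument would be `AWSXRay.getSegment()`, assuming the SDK's types line up with the structural interfaces above.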

CloudTrail (Audit Logging)

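Management API calls are logged by default (Event history keeps 90 days; create a trail for longer retention or to capture data events such as S3 object reads). Each event records: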
  Who: IAM user/role
  What: API action (ec2:TerminateInstances)
  When: Timestamp
  Where: Source IP, region
  Result: Success/failure

Use cases:
  - Security auditing
  - Compliance
  - Troubleshooting ("who deleted the bucket?")
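CloudTrail delivers each call as a JSON record with fields such as `eventTime`, `eventSource`, `eventName`, `awsRegion`, `sourceIPAddress`, `userIdentity`, and `errorCode` (present only on failures). A sketch that answers "who deleted the bucket?" from records already loaded into memory, for example parsed from a trail's log files in S3; the trimmed `TrailRecord` type covers only the fields used here, and the sample record is hypothetical:

```typescript
// Trimmed CloudTrail record: only the fields this sketch reads
interface TrailRecord {
  eventTime: string;
  eventSource: string;
  eventName: string;
  awsRegion: string;
  sourceIPAddress: string;
  userIdentity: { type?: string; arn?: string };
  errorCode?: string; // only present when the call failed
  requestParameters?: { bucketName?: string } | null;
}

interface Finding {
  who: string;
  when: string;
  fromIp: string;
  region: string;
  succeeded: boolean;
}

// Answers "who deleted the bucket?" for one bucket name
function whoDeletedBucket(records: TrailRecord[], bucket: string): Finding[] {
  return records
    .filter(
      (r) =>
        r.eventSource === 's3.amazonaws.com' &&
        r.eventName === 'DeleteBucket' &&
        r.requestParameters?.bucketName === bucket,
    )
    .map((r) => ({
      who: r.userIdentity.arn ?? r.userIdentity.type ?? 'unknown',
      when: r.eventTime,
      fromIp: r.sourceIPAddress,
      region: r.awsRegion,
      succeeded: r.errorCode === undefined,
    }));
}

// Hypothetical sample record (placeholder account ID and documentation IP)
const records: TrailRecord[] = [
  {
    eventTime: '2024-05-01T12:00:00Z',
    eventSource: 's3.amazonaws.com',
    eventName: 'DeleteBucket',
    awsRegion: 'us-east-1',
    sourceIPAddress: '203.0.113.10',
    userIdentity: { type: 'IAMUser', arn: 'arn:aws:iam::123456789012:user/alice' },
    requestParameters: { bucketName: 'my-bucket' },
  },
];
console.log(whoDeletedBucket(records, 'my-bucket'));
```

Keeping failed attempts (`succeeded: false`) in the result is deliberate: a denied delete is often as interesting to an auditor as a successful one.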

Cost Monitoring

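# Enable Cost Explorer in the Billing console
# Turn on CloudWatch billing alerts in Billing preferences
# Billing metrics are published only in us-east-1, with a Currency dimension

aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "Monthly-Budget-Alert" \
  --metric-name EstimatedCharges \
  --namespace AWS/Billing \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 100 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts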

Key Takeaways

CloudWatch Metrics for monitoring; Alarms for alerting; Logs for debugging
Use structured JSON logging — enables powerful CloudWatch Logs Insights queries
Set up alarms for 5XX errors, CPU, queue depth, and billing
X-Ray for tracing requests across microservices (find bottlenecks)
CloudTrail for security auditing — always keep enabled
Monitor costs with billing alarms from day one
