nonobie — Real dev skills, explained like a friend would

What you'll learn

By the end of this chapter you will:

Make your app stateless so any instance can handle any request
Set up the classic scaling stack: CDN → ALB → Fargate auto-scaling → RDS + Redis
Add read replicas, PgBouncer connection pooling, and Redis caching to scale the database
Choose between ECS Fargate and Lambda based on traffic patterns and cold start tradeoffs
Understand event-driven architecture and when to split the monolith
Follow the pragmatic scaling roadmap from MVP to hyperscale without over-engineering

Two ways to handle more traffic

Vertical scaling — bigger box

Move from a 2 CPU / 4 GB instance to 8 CPU / 32 GB. Buys you maybe 3–5×. Cheap to do, capped by physics, and a single point of failure.

Horizontal scaling — more boxes

Run 10 copies of the same app behind a load balancer. Each request goes to whichever instance is least busy. This is the path that scales to billions of requests.

You will end up doing both. But horizontal is the strategic answer — design for it from day one. Most of this chapter is "what does it take to be horizontally scalable".
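To make "least busy" concrete, here is a minimal sketch of least-outstanding-requests routing. The `Backend` shape, the `inFlight` counter, and both function names are illustrative assumptions, not any real load balancer's API; a managed load balancer does this bookkeeping for you.

```typescript
// Illustrative shape: each backend tracks how many requests it is handling right now.
interface Backend {
  id: string;
  inFlight: number;
}

// Pick the backend with the fewest in-flight requests (ties go to the earliest in the list).
function pickLeastBusy(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("no healthy backends");
  return backends.reduce((best, candidate) =>
    candidate.inFlight < best.inFlight ? candidate : best
  );
}

// Route one request: choose a backend and count the request against it.
function routeRequest(backends: Backend[]): string {
  const target = pickLeastBusy(backends);
  target.inFlight += 1;
  return target.id;
}

const fleet: Backend[] = [
  { id: "api-1", inFlight: 3 },
  { id: "api-2", inFlight: 1 },
  { id: "api-3", inFlight: 2 },
];
const chosen = routeRequest(fleet); // "api-2", the only backend with a single request in flight
```

Because the choice depends only on load and not on who the user is, this only works when any instance can serve any request, which is what the rest of the chapter sets up.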

The non-negotiable foundation: stateless apps

A stateless app is one where any request can be handled by any instance, and each instance can be killed and replaced without losing data.

Anti-pattern	Why it kills horizontal scaling
In-memory session store (Express `MemoryStore`)	User logged in via instance A, request lands on B → 401
Local disk caching	Cache hits on A, misses on B; restart = wipe
Cron job inside the app	10 instances = job runs 10 times
In-memory rate limiter	Each instance has its own count
Sticky sessions	Locks a user to one instance — that instance dies, user logs out

The fix for all of them: push state to shared infrastructure.

Was	Should be
In-memory sessions	JWT (stateless) or Redis-backed sessions
Local file cache	Redis, Memcached, or CDN
In-process cron	A scheduler service (BullMQ, EventBridge, K8s CronJob) running on one node
In-memory rate limit	Redis-backed (`@nestjs/throttler` with Redis storage)
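As a rough illustration of the last row, here is a fixed-window rate limiter whose counter lives in a shared store. `SharedCounterStore` is a hypothetical in-memory stand-in for Redis so the sketch runs on its own; the limit of 3 per minute and the key format are assumptions too.

```typescript
// Stand-in for Redis: incr() mimics INCR on a key. A real Redis key would also get an expiry.
class SharedCounterStore {
  private counts = new Map<string, number>();
  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Fixed-window limiter: the window index is part of the key, so counts reset each window.
function allowRequest(
  store: SharedCounterStore,
  clientId: string,
  nowMs: number,
  limit = 3,
  windowMs = 60000
): boolean {
  const windowIndex = Math.floor(nowMs / windowMs);
  return store.incr(`rate:${clientId}:${windowIndex}`) <= limit;
}

// Whichever instance handles a request, it consults the same store, so the count is global.
const limiterStore = new SharedCounterStore();
const decisions = [0, 1, 2, 3].map((i) => allowRequest(limiterStore, "user-42", 1000 + i));
// decisions is [true, true, true, false]
```

With an in-memory counter per instance, ten instances would each allow three requests; with the shared counter the fourth request is rejected no matter where it lands.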

Once your app is stateless, scaling is just: "run more containers".
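Here is a small sketch of why the session row matters. `MapSessionStore` is an in-memory stand-in for a Redis-backed store, and `ApiNode` is a hypothetical name for one app instance; both are assumptions made so the example runs without any infrastructure.

```typescript
// Minimal session-store contract; a Redis-backed implementation would satisfy the same interface.
interface SessionStore {
  get(sessionId: string): string | undefined;
  set(sessionId: string, userId: string): void;
}

// In-memory stand-in so the example runs without Redis.
class MapSessionStore implements SessionStore {
  private sessions = new Map<string, string>();
  get(sessionId: string): string | undefined {
    return this.sessions.get(sessionId);
  }
  set(sessionId: string, userId: string): void {
    this.sessions.set(sessionId, userId);
  }
}

// An API instance keeps no session state itself; it only talks to the store it is given.
class ApiNode {
  private readonly store: SessionStore;
  constructor(store: SessionStore) {
    this.store = store;
  }
  login(sessionId: string, userId: string): void {
    this.store.set(sessionId, userId);
  }
  // Returns the HTTP status this instance would answer with.
  handleRequest(sessionId: string): 200 | 401 {
    return this.store.get(sessionId) === undefined ? 401 : 200;
  }
}

// Anti-pattern: every instance has its own private store.
const isolatedA = new ApiNode(new MapSessionStore());
const isolatedB = new ApiNode(new MapSessionStore());
isolatedA.login("sess-1", "user-42");
const isolatedResult = isolatedB.handleRequest("sess-1"); // 401

// Stateless: both instances share one store.
const sharedStore = new MapSessionStore();
const nodeA = new ApiNode(sharedStore);
const nodeB = new ApiNode(sharedStore);
nodeA.login("sess-1", "user-42");
const sharedResult = nodeB.handleRequest("sess-1"); // 200
```

The instances are identical in both halves; the only difference is where the state lives.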

The classic scaling stack (containers, not serverless)

plaintext

Internet
   │
   ▼
[CDN / WAF — Cloudflare / CloudFront]
   │
   ▼
[Load balancer — ALB / Nginx]
   │
   ├──► API container (instance 1)
   ├──► API container (instance 2)
   └──► API container (instance N)        ← auto-scaling group
         │
         ├──► Postgres primary (writes)
         ├──► Postgres read replica(s)
         ├──► Redis (cache, queue, rate limit)
         ├──► S3 (files)
         └──► SQS / RabbitMQ (background jobs)
 
[Worker container] (instance 1..M) ──► reads from queue

Run the containers on ECS Fargate or EKS. Auto-scale on CPU > 70% for 5 minutes. Each instance is identical and disposable.
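The "CPU > 70% for 5 minutes" rule can be sketched as a plain function. This is only an illustration of the rule's logic; in practice the cloud provider's auto-scaling policy evaluates it for you. The function names, the one-reading-per-minute input, the step of one task, and the cap of 20 tasks are all assumptions.

```typescript
// cpuPerMinute: one average-CPU reading per minute, oldest first, as a percentage (0-100).
function shouldScaleOut(cpuPerMinute: number[], thresholdPct = 70, sustainedMinutes = 5): boolean {
  if (cpuPerMinute.length < sustainedMinutes) return false; // not enough data yet
  return cpuPerMinute.slice(-sustainedMinutes).every((cpu) => cpu > thresholdPct);
}

// Add one task when the rule fires, never exceeding maxTasks.
function nextTaskCount(currentTasks: number, cpuPerMinute: number[], maxTasks = 20): number {
  return shouldScaleOut(cpuPerMinute) ? Math.min(currentTasks + 1, maxTasks) : currentTasks;
}

const sustainedLoad = [50, 72, 75, 80, 71, 90]; // last five readings are all above 70
const briefSpike = [72, 75, 69, 80, 90];        // one reading dips to 69, so the rule does not fire
```

Requiring the threshold to hold for several minutes is what stops a single short spike from adding capacity you then pay for.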

This is the boring, scalable, production architecture for 99% of products. Don't reach for serverless until you have a reason.

Database scaling — the actual bottleneck

Your API can scale horizontally indefinitely. PostgreSQL cannot — there's exactly one primary, all writes go there. So:

1. Read replicas

Add one or more read replicas. Route reads of non-critical, slightly-stale data (reports, search, profiles) to a replica. Writes and read-after-write paths stay on primary.

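// Sequelize with read replica
import { Sequelize } from 'sequelize';

const sequelize = new Sequelize('db', 'user', 'pass', {
  dialect: 'postgres',
  replication: {
    write: { host: 'primary.rds.amazonaws.com' },
    // Placeholder replica host (assumed); list one entry per replica.
    read: [{ host: 'replica-1.rds.amazonaws.com' }],
  },
});
// SELECTs go to a replica; pass { useMaster: true } on a query to force a read from the primary.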

Watch out

Replica lag is real — typically 100 ms to a few seconds. Reads immediately after a write may not see the write. For “show user their just-created order” → use primary.
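One common way to handle this is to pin a user's reads to the primary for a short window after they write. The sketch below assumes you can look up the user's last write time (for example from their session); the function name and the 5-second window are illustrative choices, not a recommendation from any library.

```typescript
type ReadTarget = "primary" | "replica";

// lastWriteAtMs: when this user last wrote, or undefined if they have not written recently.
// pinMs should comfortably exceed your observed replica lag.
function chooseReadTarget(
  lastWriteAtMs: number | undefined,
  nowMs: number,
  pinMs = 5000
): ReadTarget {
  const wroteRecently = lastWriteAtMs !== undefined && nowMs - lastWriteAtMs < pinMs;
  return wroteRecently ? "primary" : "replica";
}

const justOrdered = chooseReadTarget(9000, 10000);      // wrote 1 s ago: read from the primary
const browsing = chooseReadTarget(undefined, 10000);    // no recent write: a replica is fine
```

Keeping the function pure (the timestamp comes from outside) means the routing decision itself holds no per-instance state.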

2. Connection pooling

Each PostgreSQL connection costs ~10 MB of RAM on the server. 100 instances × 20 connections = 2,000 connections = 20 GB just for idle connections. PostgreSQL falls over around 500 concurrent connections.

Put PgBouncer (or AWS RDS Proxy) in front:

plaintext

API instances (each opens 5 connections to PgBouncer)
    │ × 100 instances = 500 connections to PgBouncer
    ▼
PgBouncer (multiplexes onto 50 real DB connections)
    ▼
PostgreSQL (only sees 50 connections — happy)

This is the single highest-leverage thing you can do for DB scaling.
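The arithmetic behind that claim, as a tiny calculator. It uses the chapter's rough figure of ~10 MB per connection and treats 1 GB as 1000 MB to keep the numbers round; the type and function names are made up for the example.

```typescript
interface Footprint {
  serverConnections: number;
  idleMemoryGb: number;
}

// Memory held by PostgreSQL for a given number of server-side connections.
function footprint(serverConnections: number, mbPerConnection = 10): Footprint {
  return { serverConnections, idleMemoryGb: (serverConnections * mbPerConnection) / 1000 };
}

// Without a pooler, every app connection is a real PostgreSQL connection.
function withoutPooler(instances: number, connectionsPerInstance: number): Footprint {
  return footprint(instances * connectionsPerInstance);
}

// With a pooler, PostgreSQL only sees the pooler's fixed server-side pool.
function withPooler(poolSize: number): Footprint {
  return footprint(poolSize);
}

const direct = withoutPooler(100, 20); // { serverConnections: 2000, idleMemoryGb: 20 }
const pooled = withPooler(50);         // { serverConnections: 50, idleMemoryGb: 0.5 }
```

The app side can keep growing (100 instances × 5 connections to the pooler in the diagram above) while the database side stays fixed at the pool size.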

3. Caching

Hit Redis before hitting Postgres. A 1 ms Redis lookup beats a 30 ms DB query.

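async getProduct(id: string) {
  const cached = await redis.get(`product:${id}`);
  if (cached) return JSON.parse(cached);

  // Assumed completion: `productRepo` is a hypothetical repository, and the
  // 300-second TTL ('EX', 300 in an ioredis-style call) is illustrative.
  const product = await this.productRepo.findById(id);
  if (product) await redis.set(`product:${id}`, JSON.stringify(product), 'EX', 300);
  return product;
}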

Patterns: cache-aside (above), read-through, write-through, write-behind. Cache-aside is the simplest and what 95% of apps need.

Tip

The hardest problem in caching is invalidation. Prefer short TTLs over manual invalidation when you can — it’s tolerant to bugs.
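To see why a short TTL is forgiving, here is a toy TTL cache with an injected clock. The `Map` is only a stand-in for Redis so the expiry behaviour is visible without a server (a real local in-process cache would bring back the statefulness problem from earlier). `TtlCache` and `getOrLoad` are hypothetical names.

```typescript
// TTL cache with an injected clock so expiry is easy to demonstrate.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside read: try the cache, fall back to the loader, store the result.
function getOrLoad<V>(cache: TtlCache<V>, key: string, load: () => V): V {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = load();
  cache.set(key, fresh);
  return fresh;
}
```

Even if some code path forgets to invalidate `product:1` after an update, the stale value can only live for one TTL before the next read reloads it.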

4. Partitioning / sharding

When one table has 5 billion rows, even with indexes things slow down. PostgreSQL native partitioning by date or tenant_id helps:

sql

CREATE TABLE transactions (
  id uuid, created_at timestamp, ...
) PARTITION BY RANGE (created_at);
 
CREATE TABLE transactions_2026_05 PARTITION OF transactions
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

Old partitions can be dropped instantly (no expensive DELETE). Queries that filter on created_at only touch the relevant partition.
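Partitions have to exist before rows arrive, so teams usually generate the DDL on a schedule. A minimal sketch of a generator that reproduces the statement above; the function name is an assumption, and the table name is interpolated directly, so it must come from trusted code, never from user input.

```typescript
// Build the CREATE TABLE statement for one monthly partition of a range-partitioned table.
function monthlyPartitionDdl(table: string, year: number, month: number): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  // The upper bound is exclusive: the first day of the following month.
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const from = `${year}-${pad(month)}-01`;
  const to = `${nextYear}-${pad(nextMonth)}-01`;
  return (
    `CREATE TABLE ${table}_${year}_${pad(month)} PARTITION OF ${table}\n` +
    `  FOR VALUES FROM ('${from}') TO ('${to}');`
  );
}

const mayPartition = monthlyPartitionDdl("transactions", 2026, 5);
```

The December case is the one that tends to bite: the upper bound has to roll over to January of the next year.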

Serverless — when (and when not) to use Lambda

AWS Lambda runs your code only when an event happens. No servers, no containers, no autoscaling configuration. You pay per millisecond of execution.

What Lambda is great at

Bursty, infrequent workloads — webhook receivers, file-processing jobs, weekend report generators.
Event-driven processing — "S3 object uploaded → resize image", "DynamoDB row changed → update search index".
APIs with unpredictable traffic — a marketing campaign that multiplies traffic 10× for a weekend.
Glue code between AWS services — small functions that wire S3 to SQS to SES.

What Lambda is bad at

Problem	Why
Long-running tasks (>15 min)	Hard limit — Lambda kills the function at 15 min
Always-on websockets	Stateless invocation, no persistent connection model
Heavy CPU work	Pricier than an EC2 box doing the same
DB-heavy APIs	Each concurrent execution environment opens its own DB connection — a traffic burst exhausts the database's connection limit fast
Tight latency (<100 ms p99)	Cold starts can add 500–2000 ms
Predictable steady traffic	A reserved EC2/ECS instance is much cheaper at constant load

Cold starts — the headline gotcha

When a Lambda hasn't run for a while, the first invocation has to:

Provision a micro-VM
Download your zip / image
Start the Node runtime
Run your global init code (DB connections, etc.)

This is the cold start. For a Node Lambda that's 100–500 ms; with a fat dependency tree it can hit 2 seconds.

Mitigations:

Keep the deployment package small (esbuild bundle, not the whole node_modules).
Use Provisioned Concurrency for hot paths (pre-warmed instances; you pay for idle).
Avoid VPC-attached Lambdas if you can — VPC attachment used to add ~10 seconds (now ~1 second).
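Use SnapStart where your runtime supports it (a snapshot of the initialised runtime). It launched for Java and was later extended to Python and .NET, so confirm Node.js support in the current AWS docs before relying on it.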

The DB-connection problem

Every Lambda invocation can open a fresh DB connection. Burst traffic = thousands of connections = Postgres falls over.

Solutions:

RDS Proxy sits between Lambda and RDS, multiplexing connections.
DynamoDB instead of RDS for Lambda-heavy workloads — designed for this.
Connection caching across invocations (declare the client at module scope, reuse warm container's connection).
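The last option is mostly about where you declare the client. A sketch with a fake client that counts connections; `FakeDbClient`, `getClient`, and `handleInvocation` are hypothetical names, and the handler is synchronous only to keep the example short (a real Lambda handler would be async).

```typescript
// Hypothetical stand-in for a real database client; it counts how many connections get opened.
class FakeDbClient {
  static connectionsOpened = 0;
  constructor() {
    FakeDbClient.connectionsOpened += 1;
  }
  query(sql: string, params: string[]): string {
    return `${sql} [${params.join(", ")}]`;
  }
}

// Module scope: this variable survives between invocations while the container stays warm.
let cachedClient: FakeDbClient | undefined;

function getClient(): FakeDbClient {
  if (cachedClient === undefined) cachedClient = new FakeDbClient(); // cold start only
  return cachedClient;
}

// The handler body runs on every invocation but reuses the cached client.
function handleInvocation(orderId: string): string {
  return getClient().query("SELECT * FROM orders WHERE id = $1", [orderId]);
}
```

This caps connections at one per warm container, not one per request. A burst that spins up a thousand containers still opens a thousand connections, which is why it pairs well with RDS Proxy.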

Lambda in NestJS — yes, it works

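You can deploy a NestJS app to Lambda using @vendia/serverless-express:

// lambda.ts
import { NestFactory } from '@nestjs/core';
import { ExpressAdapter } from '@nestjs/platform-express';
import serverlessExpress from '@vendia/serverless-express';
import express from 'express';
import { AppModule } from './app.module'; // path assumed (Nest's conventional layout)

// Assumed bootstrap: boot Nest once per container and reuse it on warm invocations.
const expressApp = express();
const ready = NestFactory.create(AppModule, new ExpressAdapter(expressApp)).then((app) => app.init());
const server = serverlessExpress({ app: expressApp });
export const handler = async (event: any, context: any, callback: any) =>
  ready.then(() => server(event, context, callback));

Watch out

Be careful — the whole point of NestJS is rich modules, DI, and complex business logic. If your Lambda cold-starts in 2 seconds because it loads 40 modules, you’ve defeated the cost model. For a NestJS-style app, ECS Fargate is usually the better target.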

Use Lambda for:

Specific event-driven workers (S3 event → resize image)
Cron jobs (EventBridge schedule → run the job)
Webhook receivers that defer to a queue

Use ECS / EKS for the main API.

Beyond Lambda — the broader serverless toolkit

AWS Step Functions

Visual orchestration of multiple Lambdas with retry, timeout, branching. Great for "a multi-step process that takes minutes/hours and must survive failures".

plaintext

StepFunction:
  1. Validate input (Lambda)
  2. Fan out: process each row (Lambda — parallel)
  3. Aggregate results (Lambda)
  4. Send notification (Lambda)
  retries: 3, on-failure: route to DLQ
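The same flow as an in-process sketch, just to show the control logic. Real Step Functions workflows are declared as a state machine, not written like this; `withRetry`, `runWorkflow`, and the array used as a dead-letter queue are assumptions, and the fan-out runs sequentially here to keep the example synchronous.

```typescript
// Run a step, retrying on failure. maxRetries = 3 mirrors "retries: 3" in the outline above.
function withRetry<T>(step: () => T, maxRetries = 3): T {
  let lastError: unknown = new Error("step never ran");
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return step();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

interface WorkflowResult {
  total?: number;
  notified: boolean;
}

function runWorkflow(
  rows: number[],
  processRow: (row: number) => number,
  deadLetterQueue: number[][]
): WorkflowResult {
  try {
    if (rows.length === 0) throw new Error("empty input");               // 1. validate input
    const results = rows.map((row) => withRetry(() => processRow(row))); // 2. process each row
    const total = results.reduce((sum, value) => sum + value, 0);        // 3. aggregate results
    return { total, notified: true };                                    // 4. notification (stubbed)
  } catch {
    deadLetterQueue.push(rows); // on-failure: route the input to the DLQ
    return { notified: false };
  }
}
```

What the managed service adds on top of this logic is durability: the workflow state survives even if the machine running a step dies halfway through.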

EventBridge

Pub-sub between AWS services. "When an order is created, fire 5 things" — payment, email, fraud check, audit log, analytics. Each subscriber is independent and can fail without breaking the others.
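The "can fail without breaking the others" part is the key property. A toy in-process version, assuming made-up names (`OrderCreated`, `Subscriber`, `publish`); EventBridge does this across services with its own delivery and retry machinery.

```typescript
interface OrderCreated {
  orderId: string;
  amountCents: number;
}

interface Subscriber {
  label: string;
  handle: (evt: OrderCreated) => void;
}

// Deliver the event to every subscriber; one failure never stops the rest.
// Returns the labels of subscribers that failed so they can be retried separately.
function publish(evt: OrderCreated, subscribers: Subscriber[]): string[] {
  const failed: string[] = [];
  for (const subscriber of subscribers) {
    try {
      subscriber.handle(evt);
    } catch {
      failed.push(subscriber.label);
    }
  }
  return failed;
}

const delivered: string[] = [];
const failedLabels = publish({ orderId: "ord-1", amountCents: 4200 }, [
  { label: "payment", handle: (evt) => { delivered.push(`payment:${evt.orderId}`); } },
  { label: "email", handle: () => { throw new Error("SMTP is down"); } },
  { label: "audit", handle: (evt) => { delivered.push(`audit:${evt.orderId}`); } },
]);
// failedLabels is ["email"]; payment and audit still ran
```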

SQS

Reliable queue. Messages persist until consumed and acknowledged. The default for "I want to do this work later, but can't lose it." Pair with Lambda or ECS workers.
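"Persist until consumed and acknowledged" is easier to see in code. This toy queue keeps a message until the worker explicitly acknowledges it. It is a deliberately simplified model (real SQS hides in-flight messages for a visibility timeout and deletes by receipt handle); `TinyQueue` and `workOnce` are hypothetical names.

```typescript
interface QueuedMessage<T> {
  id: number;
  body: T;
}

class TinyQueue<T> {
  private nextId = 1;
  private pending: QueuedMessage<T>[] = [];

  send(body: T): void {
    this.pending.push({ id: this.nextId, body });
    this.nextId += 1;
  }

  // Hand out the oldest message without deleting it.
  receive(): QueuedMessage<T> | undefined {
    return this.pending[0];
  }

  // Only an explicit acknowledgement removes the message.
  ack(id: number): void {
    this.pending = this.pending.filter((message) => message.id !== id);
  }

  depth(): number {
    return this.pending.length;
  }
}

// Worker: handle one message and acknowledge only on success.
function workOnce<T>(queue: TinyQueue<T>, work: (body: T) => void): boolean {
  const message = queue.receive();
  if (!message) return false;
  try {
    work(message.body);
  } catch {
    return false; // no ack: the message stays queued for another attempt
  }
  queue.ack(message.id);
  return true;
}
```

Because a crashed worker never acknowledges, the message is delivered again, which is why consumers need to tolerate seeing the same message twice.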

DynamoDB

Key-value store with single-digit-millisecond reads, infinite horizontal scaling, no schema. Great for session stores, leaderboards, IoT-style write-heavy workloads. Bad for ad-hoc queries — you must design access patterns up front.

API Gateway

Sits in front of Lambdas. Handles auth, throttling, custom domains, request transformation. Adds 30–50 ms latency vs ALB → Lambda direct, but is the standard pattern.

CloudFront

CDN. Cache static assets (JS, CSS, images) at edge locations near users. Also Lambda@Edge for tiny per-request transformations.

A pragmatic scaling roadmap

Rather than reaching for Kubernetes on day one, scale up the simplest thing that works:

Stage	Traffic	Stack
MVP	<1 req/sec	One ECS Fargate task or one EC2 box; one RDS; one S3. Deploy via GitHub Actions.
Early growth	1–100 req/sec	2–4 Fargate tasks behind ALB; RDS with PITR; Redis ElastiCache (1 node); CloudFront over S3.
Steady growth	100–1k req/sec	Fargate auto-scaling 4–20 tasks; RDS with read replica; PgBouncer / RDS Proxy; SQS workers for background work.
Scale	1k–10k req/sec	Multi-AZ everything; multi-region read replicas; partitioned tables; aggressive Redis caching; queues for non-critical writes.
Hyperscale	>10k req/sec	Sharded DB; CQRS (separate read & write models); event-sourced core; Kafka for events; some traffic on DynamoDB; multi-region active-active.

Almost no product ever reaches the bottom row. Most stay at "Steady growth". Pick the right tool for the traffic you actually have, not the traffic you imagine.

Event-driven thinking — the architectural shift that scales

A monolithic API does everything in the request:

plaintext

POST /orders
  ▶ insert order
  ▶ charge payment
  ▶ send email
  ▶ notify partner
  ▶ update analytics
  ▶ return 201

Each step adds latency. If "send email" is slow, the user waits. If "notify partner" fails, the whole request fails.

Event-driven splits the work:

plaintext

POST /orders
  ▶ insert order in DB
  ▶ publish "OrderCreated" event → EventBridge / Kafka / SQS
  ▶ return 201 (50 ms)
 
Subscribers (independent, can be slow, can fail):
  ▶ payment service charges
  ▶ email service sends confirmation
  ▶ partner integration notifies
  ▶ analytics service records

Benefits:

Fast user-facing latency (one DB write, that's it).
Each subscriber retries independently (Chapter 17 — idempotency matters here).
Add a new subscriber without touching the existing endpoint.

Costs:

Eventual consistency — the email lands 200 ms after the response. Frontend may show "order created" before the confirmation email exists. Plan UX around it.
Harder to debug — you must trace events with a correlation_id (Chapter 12).
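Two of those points, retries and tracing, show up directly in subscriber code. A minimal sketch of an idempotent subscriber that also carries a correlation ID; the event shape and class name are assumptions, and the in-memory `Set` stands in for a shared store or a unique constraint in the database, which is what you would need with more than one instance.

```typescript
interface OrderCreatedEvent {
  eventId: string;       // unique per event, used to detect duplicate deliveries
  correlationId: string; // ties every log line back to the original request
  orderId: string;
}

class EmailSubscriber {
  private processed = new Set<string>();
  readonly sent: string[] = [];

  handle(evt: OrderCreatedEvent): void {
    if (this.processed.has(evt.eventId)) return; // duplicate delivery: do nothing
    this.sent.push(`[${evt.correlationId}] confirmation for ${evt.orderId}`);
    this.processed.add(evt.eventId);
  }
}

const emailer = new EmailSubscriber();
const orderEvent: OrderCreatedEvent = { eventId: "evt-1", correlationId: "req-abc", orderId: "ord-1" };
emailer.handle(orderEvent);
emailer.handle(orderEvent); // redelivered after a retry: the customer still gets one email
```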

Most modern fintech / e-commerce backends look like this internally.

CQRS in one paragraph

Command Query Responsibility Segregation: separate the model that handles writes from the model that handles reads. Writes go to a normalised SQL DB. Reads come from a denormalised read model (Elasticsearch, materialised views, Redis projection) updated asynchronously from events.

You don't need CQRS until you have the problem it solves: a real read/write asymmetry (say, 1000 reads per write), or a search/analytics surface that's hard to serve from your transactional DB. When you hit it, the answer is to project events into a purpose-built read store.
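"Project events into a read store" can be sketched in a few lines. The event types, the `CustomerSummaryProjection` class, and the in-memory maps (standing in for something like a Redis projection) are all illustrative assumptions.

```typescript
type OrderEvt =
  | { kind: "OrderCreated"; orderId: string; customerId: string; totalCents: number }
  | { kind: "OrderCancelled"; orderId: string };

interface CustomerSummary {
  orderCount: number;
  totalSpentCents: number;
}

// Read model: a denormalised per-customer summary, updated from events.
class CustomerSummaryProjection {
  private orders = new Map<string, { customerId: string; totalCents: number }>();
  private summaries = new Map<string, CustomerSummary>();

  apply(evt: OrderEvt): void {
    if (evt.kind === "OrderCreated") {
      this.orders.set(evt.orderId, { customerId: evt.customerId, totalCents: evt.totalCents });
      this.adjust(evt.customerId, 1, evt.totalCents);
    } else {
      const order = this.orders.get(evt.orderId);
      if (!order) return; // unknown or already-cancelled order: nothing to undo
      this.orders.delete(evt.orderId);
      this.adjust(order.customerId, -1, -order.totalCents);
    }
  }

  // Reads never touch the transactional tables; they just return the precomputed summary.
  read(customerId: string): CustomerSummary {
    const summary = this.summaries.get(customerId);
    return summary ? { ...summary } : { orderCount: 0, totalSpentCents: 0 };
  }

  private adjust(customerId: string, countDelta: number, centsDelta: number): void {
    const current = this.read(customerId);
    this.summaries.set(customerId, {
      orderCount: current.orderCount + countDelta,
      totalSpentCents: current.totalSpentCents + centsDelta,
    });
  }
}
```

The write side stays normalised and simple; the read side answers "how much has this customer spent?" with a single lookup, at the price of being slightly behind.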

What to actually go learn after this chapter

If you read one resource per topic, you'll be in the top 5% of your peers:

Topic	Resource
Stateless apps & 12-factor	The 12-factor app website (1 hour)
AWS fundamentals	"AWS Solutions Architect — Associate" study guide
Caching strategies	AWS docs — Caching patterns whitepaper
Read replicas, PgBouncer	PostgreSQL docs — Replication chapter
Lambda gotchas	AWS docs — Lambda best practices + Operating Lambda series
Event-driven patterns	Martin Fowler — Event-Driven Architecture
DynamoDB modelling	Alex DeBrie's DynamoDB Book
Distributed systems	Designing Data-Intensive Applications — Kleppmann

Anti-patterns to avoid as you scale

❌ Reaching for Kubernetes on day one — the operational burden is enormous.
❌ Sticky sessions — locks you out of horizontal scaling.
❌ Serverless for everything — cold starts, DB connection limits, vendor lock-in. Use it where it fits.
❌ No connection pooling — DB falls over at 500 connections.
❌ Caching with no invalidation strategy — stale data is sometimes worse than slow data.
❌ Microservices for a 5-engineer team — Conway's Law (Chapter 17). You need the team to justify it.
❌ Premature sharding — sharding is irreversible and slows everything down for years.
❌ In-memory anything — sessions, rate limits, caches, cron, locks.
❌ Same DB instance for OLTP and reporting — reporting queries lock your live API.
❌ Skipping observability before scaling — you can't fix what you can't see.

Closing thought

"Scalable" is not a property of any single technology. It's a property of how your app handles state. Push state out, make every component disposable, and any single piece can be replaced or multiplied. Lambda, ECS, K8s, RDS, DynamoDB — they're all just delivery mechanisms for that one idea.

Build the boring, stateless, horizontally-scalable monolith first. Add event-driven pieces as you find real bottlenecks. Reach for serverless and exotic distributed-systems patterns only when the problem genuinely demands it.

The teams that ship the most are the ones that resisted complexity until it was forced on them.

One thing to remember

“Scalable” is not a property of any single technology — it’s a property of how your app handles state. Push state out, make every component disposable. Build the boring, stateless, horizontally-scalable monolith first. Add event-driven pieces as you find real bottlenecks. Reach for serverless and CQRS only when the problem genuinely demands it.

Chapter 23 — Scalable Backend & Serverless
