By the end of this chapter you will:
It's 3am. Your on-call phone rings. "The portal is down. Agents can't submit orders." Without observability, you're guessing. With it, you find the root cause in 12 minutes.
Without observability: You check logs that say Error: something went wrong and a stack trace pointing to Sequelize internals. You have no idea which request failed, who triggered it, or what the database saw.
With observability: You check the dashboard, see p99 latency spiked at 2:47am, find the trace for the slowest request, see it spent 28 seconds waiting for a lock on the orders table, grep the logs by trace ID, find the exact query. Fixed in 12 minutes.
Observability is not a nice-to-have. It's the difference between 12 minutes and 8 hours.
Every production system needs all three. Each answers a different question:
| Pillar | Question answered | Tool in this project |
|---|---|---|
| Logs | What happened to this specific request? | Winston |
| Metrics | How is the system behaving overall? | Prometheus |
| Traces | Where did this slow request spend its time? | Jaeger / OpenTracing |
// ❌ You can't query this, can't filter it, can't aggregate it
console.log('Order created: ' + orderId + ' for user ' + userId);
console.error(e);

// ✅ Every log is a queryable JSON object
this.logger.log('order.created', {
  order_id: orderId,
  user_id: userId,
  restaurant_id: restaurantId,
  trace_id: cid,
});
this.
Why structured? Because you can search for user_id:usr-123 in a log aggregator and see everything that happened for that user.
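A minimal sketch of what "queryable" means in practice, assuming each log line is emitted as one JSON object. `formatLogLine` and `filterByField` are hypothetical helpers written for this illustration, not Winston or NestJS APIs; in a real setup the log aggregator does the filtering for you.

```typescript
// Hypothetical helpers for illustration; not part of Winston or NestJS.
type LogFields = Record<string, string | number | boolean>;

// Serialize one event as a single JSON line, the shape log aggregators index.
function formatLogLine(eventName: string, fields: LogFields, now: Date = new Date()): string {
  return JSON.stringify({ timestamp: now.toISOString(), event: eventName, ...fields });
}

// Keep only the lines whose parsed JSON has an exact match on one field.
function filterByField(lines: string[], key: string, value: string): string[] {
  return lines.filter((line) => {
    const parsed = JSON.parse(line) as Record<string, unknown>;
    return parsed[key] === value;
  });
}

const sampleLines = [
  formatLogLine("order.created", { order_id: "ord-1", user_id: "usr-123", trace_id: "7f3a-abc" }),
  formatLogLine("order.created", { order_id: "ord-2", user_id: "usr-456", trace_id: "9c1d-def" }),
  formatLogLine("payment.processed", { order_id: "ord-1", user_id: "usr-123", trace_id: "7f3a-abc" }),
];

// Everything that happened for one user, with no regex over free text.
const linesForUser = filterByField(sampleLines, "user_id", "usr-123");
console.log(linesForUser.length); // 2
```

The same filter over the string-concatenated `console.log` output would need a fragile text pattern per message format, which is the practical cost of unstructured logs.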
| Level | When to use | Who gets paged? |
|---|---|---|
| error | Something broke: a request failed unexpectedly | Yes |
| warn | Something suspicious: high retry count, slow query | No |
| info | Normal events you want an audit trail of | No |
| debug | Detailed diagnostic info, OFF in production | No |
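The table implies a severity threshold: anything less severe than the configured level is dropped. A minimal sketch of that rule follows, assuming production runs at info. `shouldLog` and `thresholdFor` are hypothetical names for illustration; Winston applies an equivalent check through its own `level` option.

```typescript
// The four levels from the table, ordered from most to least severe.
const LEVEL_ORDER = ["error", "warn", "info", "debug"] as const;
type Level = (typeof LEVEL_ORDER)[number];

// A message is emitted only if it is at least as severe as the threshold.
function shouldLog(messageLevel: Level, threshold: Level): boolean {
  return LEVEL_ORDER.indexOf(messageLevel) <= LEVEL_ORDER.indexOf(threshold);
}

// Assumed convention: production runs at "info", so debug output is dropped.
function thresholdFor(nodeEnv: string | undefined): Level {
  return nodeEnv === "production" ? "info" : "debug";
}

console.log(shouldLog("debug", thresholdFor("production")));  // false
console.log(shouldLog("error", thresholdFor("production")));  // true
console.log(shouldLog("debug", thresholdFor("development"))); // true
```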
process.env dump

Every incoming request gets a unique ID. Every log line from that request includes that ID. When a bug is reported, you search for the ID and see the complete picture.
2026-05-01T10:23:01Z cid=7f3a-abc svc=api msg=POST /orders received
2026-05-01T10:23:01Z cid=7f3a-abc svc=api msg=placing order restaurant_id=rst-xyz
2026-05-01T10:23:01Z cid=7f3a-abc svc=orders msg=validating items count=3
2026-05-01T10:23:02Z cid=7f3a-abc svc=orders msg=restaurant closed opens_at=18:00 → RestaurantClosedException
2026-05-01T10:23:02Z cid=7f3a-abc svc=api msg=returning 422

Grep for cid=7f3a-abc and you get the entire story in order.
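The mechanism behind this can be sketched end to end with Node's built-in `AsyncLocalStorage`, which carries the ID across async calls so no function has to accept it as a parameter. This is a sketch under assumptions: `handleRequest`, `validateItems`, and `logWithCid` are hypothetical stand-ins for the middleware, a service method, and the logger, and log lines are collected in memory only so the result is easy to inspect.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Per-request context; every async call started inside run() can read it.
const requestContext = new AsyncLocalStorage<{ cid: string }>();

// Collected in memory here; a real logger would write these lines out.
const emitted: string[] = [];

function logWithCid(svc: string, msg: string): void {
  const cid = requestContext.getStore()?.cid ?? "no-cid";
  emitted.push(`cid=${cid} svc=${svc} msg=${msg}`);
}

// Hypothetical service function: it never receives the cid as a parameter.
async function validateItems(count: number): Promise<void> {
  await Promise.resolve(); // cross an async boundary; the store still follows
  logWithCid("orders", `validating items count=${count}`);
}

// Hypothetical handler: reuse the caller's x-request-id or mint a new one.
async function handleRequest(headers: Record<string, string | undefined>): Promise<string> {
  const cid = headers["x-request-id"] ?? randomUUID();
  await requestContext.run({ cid }, async () => {
    logWithCid("api", "POST /orders received");
    await validateItems(3);
  });
  return cid;
}

const done = handleRequest({ "x-request-id": "7f3a-abc" }).then((cid) => {
  // Both lines carry the same cid, even though only the handler ever saw it.
  console.log(emitted.filter((line) => line.includes(`cid=${cid}`)));
});
```

Reusing an incoming `x-request-id` instead of always generating a new one is what lets a single ID span several services.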
const cid = req.headers['x-request-id'] ?? randomUUID();
res.setHeader('x-request-id', cid); // send back to client
asyncLocalStorage.run({ cid }, () => next()); // handle the rest of the request inside this context

const { cid } = asyncLocalStorage.getStore() ?? {};
this.logger.log({ ...event, cid });

axios.post(url, body, {
  headers: { 'x-request-id': cid } // propagate the ID to downstream services
});

A trace is a recording of one request's journey, broken into spans:
Request total: 1.2 seconds
├── JwtGuard: 2ms
├── ValidationPipe: 1ms
├── OrdersService.create: 1197ms
│ ├── restaurantsService.isOpen: 5ms
│ ├── orderItems.validate: 12ms
│ ├── paymentsService.charge: 1150ms ← HERE IS THE PROBLEM
│ │ └── payment gateway API call: 1100ms (external timeout)
│ └── Order.create: 28ms
└── Response serialization: 1ms

Without tracing, you'd know the request took 1.2 seconds. With tracing, you know it spent 1.1 seconds waiting on the payment gateway API call, which timed out.
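Reading a trace by eye amounts to following the slowest child at each level until you reach a leaf. A minimal sketch of that walk is shown here, using the same span names and durations as the trace above. The `TraceSpan` shape and `slowestPath` are hypothetical and written for illustration; they are not the Jaeger or OpenTracing data model.

```typescript
// A span is one timed step; children are the steps it called.
interface TraceSpan {
  label: string;
  durationMs: number;
  children: TraceSpan[];
}

// Follow the slowest child at each level until reaching a leaf.
function slowestPath(root: TraceSpan): string[] {
  const path = [root.label];
  let current = root;
  while (current.children.length > 0) {
    current = current.children.reduce((a, b) => (b.durationMs > a.durationMs ? b : a));
    path.push(current.label);
  }
  return path;
}

const leaf = (label: string, durationMs: number): TraceSpan => ({ label, durationMs, children: [] });

// The same numbers as the trace above.
const requestTrace: TraceSpan = {
  label: "Request", durationMs: 1200, children: [
    leaf("JwtGuard", 2),
    leaf("ValidationPipe", 1),
    { label: "OrdersService.create", durationMs: 1197, children: [
      leaf("restaurantsService.isOpen", 5),
      leaf("orderItems.validate", 12),
      { label: "paymentsService.charge", durationMs: 1150, children: [
        leaf("payment gateway API call", 1100),
      ] },
      leaf("Order.create", 28),
    ] },
    leaf("Response serialization", 1),
  ],
};

console.log(slowestPath(requestTrace).join(" > "));
// Request > OrdersService.create > paymentsService.charge > payment gateway API call
```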
On every request (middleware/interceptor):
On every business event (service):
order.created, payment.processed, kyc.submitted

On every error:
| Metric | Why |
|---|---|
| HTTP request count by route × status | Are error rates rising? |
| HTTP p99 latency by route | Are routes getting slower? |
| DB query duration | Is the database the bottleneck? |
| DB connection pool usage | Are we running out of connections? |
| Outbound HTTP duration per vendor | Is a vendor degrading? |
| Queue depth | Are jobs piling up? |
| Business: orders created/min | Is the business healthy? |
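To make the first two rows concrete, here is a minimal in-memory sketch of a request counter keyed by route and status, plus a nearest-rank percentile over recorded latencies. `recordRequest` and `percentile` are hypothetical helpers for illustration only; a real deployment would use a metrics client and let the metrics backend compute percentiles.

```typescript
// In-memory stand-ins for a counter and a latency histogram (illustration only).
const requestCounts = new Map<string, number>();
const latenciesByRoute = new Map<string, number[]>();

function recordRequest(route: string, statusCode: number, durationMs: number): void {
  const key = `${route} ${statusCode}`;
  requestCounts.set(key, (requestCounts.get(key) ?? 0) + 1);
  const samples = latenciesByRoute.get(route) ?? [];
  samples.push(durationMs);
  latenciesByRoute.set(route, samples);
}

// Nearest-rank percentile: the sample at rank ceil(p% of n) in sorted order.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// 98 fast requests and two slow failures on the same route.
for (let i = 0; i < 98; i++) recordRequest("POST /orders", 201, 40);
recordRequest("POST /orders", 500, 28000);
recordRequest("POST /orders", 500, 28000);

const orderLatencies = latenciesByRoute.get("POST /orders") ?? [];
console.log(percentile(orderLatencies, 50));        // 40
console.log(percentile(orderLatencies, 99));        // 28000
console.log(requestCounts.get("POST /orders 500")); // 2
```

The median stays at 40 ms while p99 jumps to 28 seconds, which is why the table tracks p99 rather than a typical-case latency.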
console.log rule

// ❌ Banned everywhere in this codebase
console.log('debugging...');
console.error(e);
// ✅ Use NestJS Logger
private readonly logger = new Logger(OrdersService.name);
console.log bypasses the Winston logger, bypasses log levels, bypasses structured formatting, and bypasses the trace ID injection. It is useless in production.
Structured logs + correlation IDs = you can answer any "what happened?" question in under 2 minutes. Without them, you're debugging blind at 3am. Every log message should have a trace_id.