Warming up the neural circuits...
By the end of this chapter you will:
A web server without a database is a beautiful empty room. The database is where lives, where the truth lives, and — by total accident — where 80% of your production bugs come from.
Imagine running a clinic with a paper filing system. Patient records sit in folders, in drawers, sorted alphabetically. Three problems show up immediately:
A database is a filing cabinet with magic powers. Many people can read at once (concurrency). If the power fails mid-write, the file either fully wrote or didn't (atomicity). And finding "all patients in Mumbai with diabetes" is a single instruction (indexes).
Everything we'll learn — ACID, indexes, transactions — solves one of these three problems.
You could save data to JSON files. People do, for tiny apps. Three things break the moment you grow:
A database solves all three at the cost of being a separate process with its own language.
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id
HAVING
A grab-bag of four subfamilies:
| Type | Examples | Best for |
|---|---|---|
| Document | MongoDB, DynamoDB | Flexible-shaped data (profiles, content) |
| Key-value | Redis, Memcached | , session, counters |
| Wide-column | Cassandra, ScyllaDB | Massive write throughput, time series |
| Graph | Neo4j, JanusGraph | Friend-of-friend queries, fraud rings |
Most NoSQL DBs are schema-on-read — the database stores whatever you give it; your code is responsible for handling shape variation.
For a new backend, start with PostgreSQL. Move to NoSQL only when you have a real reason:
| Reason | Suitable NoSQL |
|---|---|
| Highly variable document shapes | MongoDB |
| Cache / session / pub-sub | Redis |
| 1M writes/sec across regions | Cassandra |
| Connected data (social, recommendation) | Neo4j |
If you can't name your specific reason, use Postgres.
Two completely different workloads, often confused.
The same database can do both badly. Use a dedicated OLAP store (BigQuery, Snowflake, ClickHouse, DuckDB) for analytics. Don't run a 1B-row aggregation on your production OLTP Postgres at lunchtime.
Relational DBs offer four guarantees that are easy to take for granted:
Example: transferring money. Two updates (debit account A, credit account B) wrapped in a transaction. If the second fails, the first is rolled back. The account never shows $-100.
Many NoSQL systems give up some of these for scale. Know what you're trading.
A database without indexes scans every row on every query. With indexes, it can find a row in O(log n).
SELECT * FROM users WHERE email = 'aditi@example.com';Without an → reads every row in users. 1M rows = slow.
With an index on email → reads ~20 rows along the B-tree. 1M rows = a few microseconds.
CREATE INDEX idx_users_email ON users(email);The cost. Indexes take space. Writes get slower (each insert updates every index). So you don't index everything — you index columns that show up in WHERE, JOIN, and ORDER BY.
Every SQL database has an EXPLAIN command that shows the plan:
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'aditi@example.com';Output:
Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=64)
(actual time=0.024..0.026 rows=1 loops=1)
Index Cond: (email = 'aditi@example.com'::text)
Planning Time: 0.080 ms
Execution Time: 0.044 msYou don't need to read every field. Look for:
ANALYZE).We'll go deep on this in L2.
Redis is not your primary database (don't use it as one). It's the swiss-army knife for:
You'll wire Redis into nearly every L4 project.
| Mistake | Why it's wrong | What to do |
|---|---|---|
| MongoDB for relational data | Joins become painful, transactions limited | Use Postgres when you have relationships |
| Postgres for unbounded JSON blobs with no shape | Hard to query, no benefits of relational | Use MongoDB or a document store |
| No indexes | Every query is a full scan | Index columns in WHERE/JOIN/ORDER BY |
| One index per column | Bloated writes, often unused | Composite indexes for common combinations |
| Storing dates as VARCHAR | Bad sorting, no time math | Use TIMESTAMPTZ |
Production. Take backups. Restore them periodically. An untested backup is a wish.
Performance. Watch pg_stat_statements for your top-10 slowest queries. The 80/20 rule rules — fixing the top 3 usually fixes everything.
Security. The database should be on a private network. Public DBs get scanned and breached within hours.
users table. Insert 1M rows (a script). Run SELECT * WHERE email = '...' with and without an index. Time both.EXPLAIN ANALYZE on a query you wrote at work. Identify whether it uses an index.Beginner. Why do we use indexes? So queries don't scan every row.
Senior. When would you choose NoSQL? When you have a specific workload that relational DBs handle poorly: flexible-shape documents, massive write throughput, deeply connected data, or pure caching.
Databases solve concurrency, durability, and search. Two big families: SQL (relational, ACID, schema-on-write — Postgres is the default) and NoSQL (document, KV, wide-column, graph — pick when you have a specific reason). Indexes are the biggest perf lever. ACID is what keeps your data sane. EXPLAIN is your debugger.