nonobie — Real dev skills, explained like a friend would

What you'll learn

By the end of this chapter you will:

Explain why apps need databases instead of files
Pick the right database family for a problem
Read an EXPLAIN output without panic
Understand what ACID actually buys you

The why

A web server without a database is a beautiful empty room. The database is where lives, where the truth lives, and — by total accident — where 80% of your production bugs come from.

💡The filing cabinet

Imagine running a clinic with a paper filing system. Patient records sit in folders, in drawers, sorted alphabetically. Three problems show up immediately:

Two nurses can't read the same folder at once.
If the fire alarm goes off mid-write, you have torn pages.
Finding "all patients in Mumbai with diabetes" means walking every drawer.

A database is a filing cabinet with magic powers. Many people can read at once (concurrency). If the power fails mid-write, the file either fully wrote or didn't (atomicity). And finding "all patients in Mumbai with diabetes" is a single instruction (indexes).

Everything we'll learn — ACID, indexes, transactions — solves one of these three problems.

Why not just files?

You could save data to JSON files. People do, for tiny apps. Three things break the moment you grow:

Concurrency. Two HTTP requests modifying the same file race and one wins.
Crash safety. A power loss mid-write leaves a half-written file. Goodbye data.
Query speed. Want all orders over $100? You have to read every file.

A database solves all three at the cost of being a separate process with its own language.

The two big families: SQL vs NoSQL

SQL (relational)

Examples: PostgreSQL, MySQL, Server, SQLite.
Data lives in tables with fixed columns and types.
You query with SQL — a declarative language.
Strong constraints: types, foreign keys, uniqueness, transactions.
"Schema-on-write" — the database refuses to store data that doesn't match the schema.

SQL

sql

SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id
HAVING

NoSQL (non-relational)

A grab-bag of four subfamilies:

Type	Examples	Best for
Document	MongoDB, DynamoDB	Flexible-shaped data (profiles, content)
Key-value	Redis, Memcached	, session, counters
Wide-column	Cassandra, ScyllaDB	Massive write throughput, time series
Graph	Neo4j, JanusGraph	Friend-of-friend queries, fraud rings

Most NoSQL DBs are schema-on-read — the database stores whatever you give it; your code is responsible for handling shape variation.

Which to pick?

For a new backend, start with PostgreSQL. Move to NoSQL only when you have a real reason:

Reason	Suitable NoSQL
Highly variable document shapes	MongoDB
Cache / session / pub-sub	Redis
1M writes/sec across regions	Cassandra
Connected data (social, recommendation)	Neo4j

If you can't name your specific reason, use Postgres.

OLTP vs OLAP

Two completely different workloads, often confused.

OLTP (Online Processing) — many small reads/writes, low latency. Your app. ("Insert this order. Update this stock. Read this user.")
OLAP (Online Analytical Processing) — few queries, huge scans. Dashboards, BI. ("Sum revenue by region by month over 3 years.")

The same database can do both badly. Use a dedicated OLAP store (BigQuery, Snowflake, ClickHouse, DuckDB) for analytics. Don't run a 1B-row aggregation on your production OLTP Postgres at lunchtime.

ACID — the four guarantees

Relational DBs offer four guarantees that are easy to take for granted:

Atomicity — a transaction either fully completes or fully reverts. No half-writes.
Consistency — the DB never enters a state that violates its rules (constraints, FK).
Isolation — concurrent transactions don't see each other's half-done work.
Durability — once committed, the data survives crashes.

Example: transferring money. Two updates (debit account A, credit account B) wrapped in a transaction. If the second fails, the first is rolled back. The account never shows $-100.

Many NoSQL systems give up some of these for scale. Know what you're trading.

Indexes — the single biggest perf lever

A database without indexes scans every row on every query. With indexes, it can find a row in O(log n).

The query

sql

SELECT * FROM users WHERE email = 'aditi@example.com';

Without an → reads every row in users. 1M rows = slow.

With an index on email → reads ~20 rows along the B-tree. 1M rows = a few microseconds.

Create one

sql

CREATE INDEX idx_users_email ON users(email);

The cost. Indexes take space. Writes get slower (each insert updates every index). So you don't index everything — you index columns that show up in WHERE, JOIN, and ORDER BY.

EXPLAIN — read what the database is doing

Every SQL database has an EXPLAIN command that shows the plan:

Postgres

sql

EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'aditi@example.com';

Output:

plaintext

Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=64)
                                            (actual time=0.024..0.026 rows=1 loops=1)
  Index Cond: (email = 'aditi@example.com'::text)
Planning Time: 0.080 ms
Execution Time: 0.044 ms

You don't need to read every field. Look for:

Seq Scan = full table scan. Usually bad on big tables.
Index Scan = using an index. Usually good.
rows= the planner's estimate. If it's wildly off from actual rows, your stats are stale (ANALYZE).

We'll go deep on this in L2.

Where Redis fits

Redis is not your primary database (don't use it as one). It's the swiss-army knife for:

Cache — store the result of an expensive query for 60s.
Sessions — fast key-value, TTL built in.
Counters & rate limits — atomic INCR.
Pub/sub — fan-out messages between services.
Queues — backed by lists or streams (used by BullMQ).

You'll wire Redis into nearly every L4 project.

Common mistakes

Mistake	Why it's wrong	What to do
MongoDB for relational data	Joins become painful, transactions limited	Use Postgres when you have relationships
Postgres for unbounded JSON blobs with no shape	Hard to query, no benefits of relational	Use MongoDB or a document store
No indexes	Every query is a full scan	Index columns in WHERE/JOIN/ORDER BY
One index per column	Bloated writes, often unused	Composite indexes for common combinations
Storing dates as VARCHAR	Bad sorting, no time math	Use `TIMESTAMPTZ`

Production · Performance · Security notes

Production. Take backups. Restore them periodically. An untested backup is a wish.

Performance. Watch pg_stat_statements for your top-10 slowest queries. The 80/20 rule rules — fixing the top 3 usually fixes everything.

Security. The database should be on a private network. Public DBs get scanned and breached within hours.

Exercises

Beginner. Install Postgres locally. Create a users table. Insert 1M rows (a script). Run SELECT * WHERE email = '...' with and without an index. Time both.
Intermediate. Run EXPLAIN ANALYZE on a query you wrote at work. Identify whether it uses an index.
Intermediate. Pick a feature in your favorite app and decide which DB family suits it.
Advanced. Set up Postgres + Redis in Compose. Cache one expensive query.

Interview questions

Beginner. Why do we use indexes? So queries don't scan every row.

Senior. When would you choose NoSQL? When you have a specific workload that relational DBs handle poorly: flexible-shape documents, massive write throughput, deeply connected data, or pure caching.

Summary

Databases solve concurrency, durability, and search. Two big families: SQL (relational, ACID, schema-on-write — Postgres is the default) and NoSQL (document, KV, wide-column, graph — pick when you have a specific reason). Indexes are the biggest perf lever. ACID is what keeps your data sane. EXPLAIN is your debugger.

Quick recall

Postgres is the default. Use it unless you have a specific reason not to.
NoSQL wins for document flexibility, KV caching, massive scale, or graph data.
ACID = atomicity, consistency, isolation, durability.
Indexes turn O(n) into O(log n) for reads — at a small cost to writes.
EXPLAIN shows the plan. Read it.

Quiz

What does the "A" in ACID stand for?
Why is an index a trade-off, not a free win?
Pick the best DB family for: a real-time leaderboard.
What is the difference between OLTP and OLAP?
What command shows how Postgres plans to execute a query?

Chapter 6 — Databases Overview