Warming up the neural circuits...
This module goes far beyond basic JOINs. You'll write queries that would make most developers' heads spin — and understand exactly how they work.
In the free course, you learned INNER JOIN and LEFT JOIN. Real production queries use:
A self-join joins a table to itself. Use this for hierarchical data.
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INTEGER REFERENCES employees(id)
);
INSERT INTO employees (name, manager_id) VALUES
('CEO', NULL),
('VP Engineering', 1),
('VP Marketing', 1),
('Engineering Manager', 2),
('Senior Developer', 4),
('Junior Developer', 4),
('Marketing Lead', 3);SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;Output:
employee | manager
-------------------+-----------------
CEO | NULL
VP Engineering | CEO
VP Marketing | CEO
Engineering Mgr | VP Engineering
Senior Developer | Engineering Mgr
Junior Developer | Engineering Mgr
Marketing Lead | VP MarketingSELECT e.name AS direct_report
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE m.name = 'VP Engineering';-- PostgreSQL recursive CTE
WITH RECURSIVE org_chain AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees who report to someone in the chain
SELECT e.id, e.name,
Recursive CTEs are PostgreSQL's secret weapon for tree structures — org charts, category trees, comment threads, and more.
A subquery is a query inside another query. They can appear in SELECT, FROM, or WHERE.
-- Find employees who earn more than the average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);-- Show each employee's salary and how it compares to average
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS difference
FROM employees;-- Find the top earner in each department
SELECT department, max_salary
FROM (
SELECT
department,
MAX(salary) AS max_salary
FROM employees
GROUP BY department
) AS dept_max
ORDER BY max_salary DESC;Performance trap: Correlated subqueries run once per outer row. On a table with 10,000 rows, that's 10,000 subquery executions. Often a JOIN or window function is faster.
EXISTS checks whether a subquery returns any rows. It's often faster than JOIN for existence checks.
SELECT u.username, u.email
FROM users u
WHERE EXISTS (
SELECT 1 FROM posts p
WHERE p.user_id = u.id
AND p.published = true
);SELECT u.username, u.email
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM posts p
WHERE p.user_id = u.id
AND p.published = true
);| Use case | Winner |
|---|---|
| "Find X that have Y" | EXISTS (stops at first match) |
| "Find X with Y data" | JOIN (needs the data) |
| "Find X without Y" | NOT EXISTS (avoids NULL handling) |
| "Count X per Y" | JOIN with GROUP BY |
-- EXISTS — more efficient for large datasets
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- Same result with JOIN — but requires DISTINCT and scans more rows
SELECT DISTINCT u.* FROM users u
JOIN orders o ON o.user_id = u.Real-world use case: Finding users who haven't logged in for 30 days:
SELECT email
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM activity_log a
WHERE a.user_id = u.id
AND a.event = 'login'
AND a
LATERAL lets a subquery reference columns from preceding tables in the same FROM clause.
-- Top 2 most recent posts per author
SELECT
u.username,
recent_posts.title,
recent_posts.created_at
FROM users u
CROSS JOIN LATERAL (
SELECT title, created_at
FROM posts
WHERE posts.user_id = u.id
ORDER BY created_at DESC
LIMIT 2
) ASWithout LATERAL, you'd need a complex window function or multiple queries.
Nearest locations:
SELECT u.name, nearby_branch.name AS nearest_branch
FROM users u
CROSS JOIN LATERAL (
SELECT name, ST_Distance(location, u.location) AS dist
FROM branches
ORDER BY dist
LIMIT 1
) AS nearby_branch;Latest order per customer:
SELECT c.name, last_order.total, last_order.ordered_at
FROM customers c
CROSS JOIN LATERAL (
SELECT total, ordered_at
FROM orders
WHERE orders.customer_id = c.id
ORDER BY ordered_at DESC
LIMIT 1
) AS last_order;LATERAL is PostgreSQL's superpower. It's like a correlated subquery that returns multiple columns and rows. Use it for "top N per group" queries.
-- BAD: DISTINCT hides duplicate rows from a bad JOIN
SELECT DISTINCT u.* FROM users u
JOIN orders o ON o.user_email = u.phone;✅ Fix: Use EXISTS instead, or fix the JOIN condition.
-- BAD: Runs the subquery for every row
SELECT name,
(SELECT MAX(amount) FROM payments p WHERE p.user_id = u.id)
FROM users u;✅ Fix: Use a window function or LATERAL.
-- BAD: 8 JOINs — hard to read, debug, and optimize
SELECT * FROM orders o
JOIN users u ... JOIN payments p ... JOIN addresses a ...
JOIN coupons c ... JOIN products pr ... JOIN reviews r ...
JOIN shipments s ...✅ Fix: Break into multiple queries or use CTEs.
You have a database with these tables:
users(id, name, email, created_at)orders(id, user_id, total_cents, status, created_at)order_items(id, order_id, product_name, quantity, unit_price_cents)refunds(id, order_id, amount_cents, reason, created_at)Write queries for:
✅ Self-joins for hierarchical data ✅ Subqueries in SELECT, FROM, and WHERE ✅ EXISTS / NOT EXISTS for efficient filtering ✅ LATERAL joins for top-N-per-group ✅ Recursive CTEs for tree structures
Next: CTEs and window functions — two of PostgreSQL's most powerful features.