SQL Frequently Asked Questions
What is the difference between WHERE and HAVING?
WHERE filters individual rows before grouping. HAVING filters groups after aggregation.
SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_priceFROM productsWHERE is_active = TRUE -- filter rows BEFORE groupingGROUP BY categoryHAVING COUNT(*) > 5 -- filter groups AFTER groupingORDER BY avg_price DESC;You can’t use aggregate functions (COUNT, SUM, AVG) in a WHERE clause — use HAVING for that.
What is the difference between DELETE, TRUNCATE, and DROP?
DELETE: Removes specific rows matching a WHERE clause. Transactional, slow on large tables.TRUNCATE: Removes ALL rows from a table. Faster than DELETE, resets auto-increment in most DBs.DROP: Removes the entire table including its structure, indexes, and constraints.DELETE FROM orders WHERE status = 'cancelled'; -- removes matching rowsTRUNCATE TABLE staging_data; -- removes all rows, keeps structureDROP TABLE staging_data; -- removes table entirelyWhat is NULL and why does NULL != NULL?
NULL means “unknown value.” Since the value is unknown, comparing two unknowns can’t produce a definitive TRUE — it produces NULL (also unknown).
NULL = NULL -- evaluates to NULL, not TRUENULL != NULL -- evaluates to NULL, not TRUENULL + 5 -- evaluates to NULLAlways use IS NULL / IS NOT NULL to test for NULLs:
WHERE phone IS NULL -- correctWHERE phone = NULL -- wrong: always returns no rowsWhat is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
INNER JOIN: Returns rows that have a match in BOTH tablesLEFT JOIN: Returns all rows from the left table + matches from right (NULLs for no match)RIGHT JOIN: Returns all rows from the right table + matches from left (NULLs for no match)FULL OUTER JOIN: Returns all rows from both tables (NULLs where no match on either side)CROSS JOIN: Returns every combination of rows from both tables (Cartesian product)-- LEFT JOIN: all customers, including those with no ordersSELECT c.name, o.order_idFROM customers cLEFT JOIN orders o ON c.customer_id = o.customer_id;What is an index and when should I add one?
An index is a data structure (usually a B-tree) that lets the database find rows without scanning the whole table. Think of it as a book’s index — instead of reading every page to find “normalization,” you jump directly to page 142.
Add indexes on columns you:
- Filter on frequently in WHERE clauses
- Join on (especially foreign key columns)
- Sort on with ORDER BY + LIMIT
Don’t index everything — indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE must update the index too).
What is the difference between UNION and UNION ALL?
UNION combines results from two queries and removes duplicate rows. UNION ALL combines results and keeps all rows, including duplicates.
-- UNION: no duplicates (extra cost to deduplicate)SELECT email FROM customersUNIONSELECT email FROM newsletter_subscribers;
-- UNION ALL: keeps duplicates, fasterSELECT email FROM customersUNION ALLSELECT email FROM newsletter_subscribers;Use UNION ALL unless you specifically need deduplication — it’s faster because it skips the dedup step.
What is the difference between PRIMARY KEY and UNIQUE constraint?
Both enforce that no two rows have the same value in the column. The differences:
- A table can have only one PRIMARY KEY but multiple UNIQUE constraints
- PRIMARY KEY columns are implicitly NOT NULL; UNIQUE columns can have multiple NULL values (NULLs are considered distinct from each other)
- PRIMARY KEY is typically used for the row identifier; UNIQUE is used for other columns that must be unique (email, username, SSN)
What is a subquery vs a CTE?
Both give you a way to reference a derived result set. A subquery is nested inside another statement. A CTE (Common Table Expression, defined with WITH) is named and defined before the main query.
-- SubquerySELECT * FROM orders WHERE customer_id IN ( SELECT customer_id FROM customers WHERE country = 'UK');
-- CTE (same logic, more readable)WITH uk_customers AS ( SELECT customer_id FROM customers WHERE country = 'UK')SELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM uk_customers);CTEs are preferred for complex queries because they’re named, can be referenced multiple times, and make the query read top-to-bottom like a series of steps.
Why is my query slow even though I have an index?
Several common reasons:
- Function on the indexed column:
WHERE YEAR(created_at) = 2025prevents index use. UseWHERE created_at >= '2025-01-01'instead. - Leading wildcard in LIKE:
WHERE name LIKE '%smith%'can’t use a B-tree index.WHERE name LIKE 'smith%'can. - Low cardinality column: If a column only has 3 distinct values (e.g., status = ‘active’/‘inactive’/‘pending’), a full table scan might be faster than using the index.
- Stale statistics: The optimizer uses statistics to choose a plan. Run
ANALYZE table_nameto refresh. - Wrong index: The query’s WHERE clause doesn’t match the index’s column order.
Run EXPLAIN ANALYZE to see exactly what the database is doing.
What is the difference between CHAR, VARCHAR, and TEXT?
CHAR(n): Fixed-length string. Always stores n bytes. Padded with spaces if shorter.VARCHAR(n): Variable-length string. Stores up to n characters, uses only what's needed.TEXT: Unlimited-length string (PostgreSQL). VARCHAR without a length limit.Use VARCHAR(n) when you have a meaningful max length (email addresses, phone numbers). Use TEXT in PostgreSQL when length is truly unbounded. CHAR is rarely the right choice for modern applications.
How do I generate a sequence of numbers or dates in SQL?
PostgreSQL:
-- Sequence of integers 1-10SELECT generate_series(1, 10) AS n;
-- Sequence of dates (every day in 2025)SELECT generate_series( '2025-01-01'::date, '2025-12-31'::date, '1 day'::interval)::date AS day;SQL Server:
-- Using a recursive CTEWITH numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n < 10)SELECT n FROM numbers;Date sequences are useful for filling gaps in time-series data.
What is a transaction and when do I need one?
A transaction is a group of SQL statements that succeed or fail as a unit. You need one whenever:
- Multiple statements must all succeed or all fail together
- You’re making large changes and want the ability to rollback
BEGIN; UPDATE accounts SET balance = balance - 500 WHERE account_id = 101; UPDATE accounts SET balance = balance + 500 WHERE account_id = 202;COMMIT; -- or ROLLBACK; to undoACID properties: Atomicity (all or nothing), Consistency (valid state after), Isolation (concurrent transactions don’t interfere), Durability (committed data survives crashes).
How do I find duplicate rows in a table?
-- Find duplicate emails (with count)SELECT email, COUNT(*) AS duplicate_countFROM customersGROUP BY emailHAVING COUNT(*) > 1ORDER BY duplicate_count DESC;
-- Get the full rows for duplicatesSELECT *FROM customersWHERE email IN ( SELECT email FROM customers GROUP BY email HAVING COUNT(*) > 1)ORDER BY email;To keep only the most recent duplicate:
DELETE FROM customersWHERE customer_id NOT IN ( SELECT MAX(customer_id) FROM customers GROUP BY email);