SQL Frequently Asked Questions

What is the difference between WHERE and HAVING?

WHERE filters individual rows before grouping. HAVING filters groups after aggregation.

SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price
FROM products
WHERE is_active = TRUE          -- filter rows BEFORE grouping
GROUP BY category
HAVING COUNT(*) > 5             -- filter groups AFTER grouping
ORDER BY avg_price DESC;

You can’t use aggregate functions (COUNT, SUM, AVG) in a WHERE clause — use HAVING for that.

What is the difference between DELETE, TRUNCATE, and DROP?

DELETE:   Removes specific rows matching a WHERE clause. Transactional, slow on large tables.
TRUNCATE: Removes ALL rows from a table. Faster than DELETE, resets auto-increment in most DBs.
DROP:     Removes the entire table including its structure, indexes, and constraints.

DELETE FROM orders WHERE status = 'cancelled';  -- removes matching rows
TRUNCATE TABLE staging_data;                    -- removes all rows, keeps structure
DROP TABLE staging_data;                        -- removes table entirely

What is NULL and why does NULL != NULL?

NULL means “unknown value.” Since the value is unknown, comparing two unknowns can’t produce a definitive TRUE — it produces NULL (also unknown).

NULL = NULL    -- evaluates to NULL, not TRUE
NULL != NULL   -- evaluates to NULL, not TRUE
NULL + 5       -- evaluates to NULL

Always use IS NULL / IS NOT NULL to test for NULLs:

WHERE phone IS NULL          -- correct
WHERE phone = NULL           -- wrong: always returns no rows

What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?

INNER JOIN:      Returns rows that have a match in BOTH tables
LEFT JOIN:       Returns all rows from the left table + matches from right (NULLs for no match)
RIGHT JOIN:      Returns all rows from the right table + matches from left (NULLs for no match)
FULL OUTER JOIN: Returns all rows from both tables (NULLs where no match on either side)
CROSS JOIN:      Returns every combination of rows from both tables (Cartesian product)

-- LEFT JOIN: all customers, including those with no orders
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

What is an index and when should I add one?

An index is a data structure (usually a B-tree) that lets the database find rows without scanning the whole table. Think of it as a book’s index — instead of reading every page to find “normalization,” you jump directly to page 142.

Add indexes on columns you:

Filter on frequently in WHERE clauses
Join on (especially foreign key columns)
Sort on with ORDER BY + LIMIT

Don’t index everything — indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE must update the index too).

What is the difference between UNION and UNION ALL?

UNION combines results from two queries and removes duplicate rows. UNION ALL combines results and keeps all rows, including duplicates.

-- UNION: no duplicates (extra cost to deduplicate)
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;

-- UNION ALL: keeps duplicates, faster
SELECT email FROM customers
UNION ALL
SELECT email FROM newsletter_subscribers;

Use UNION ALL unless you specifically need deduplication — it’s faster because it skips the dedup step.

What is the difference between PRIMARY KEY and UNIQUE constraint?

Both enforce that no two rows have the same value in the column. The differences:

A table can have only one PRIMARY KEY but multiple UNIQUE constraints
PRIMARY KEY columns are implicitly NOT NULL; UNIQUE columns can have multiple NULL values (NULLs are considered distinct from each other)
PRIMARY KEY is typically used for the row identifier; UNIQUE is used for other columns that must be unique (email, username, SSN)

What is a subquery vs a CTE?

Both give you a way to reference a derived result set. A subquery is nested inside another statement. A CTE (Common Table Expression, defined with WITH) is named and defined before the main query.

-- Subquery
SELECT * FROM orders WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE country = 'UK'
);

-- CTE (same logic, more readable)
WITH uk_customers AS (
    SELECT customer_id FROM customers WHERE country = 'UK'
)
SELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM uk_customers);

CTEs are preferred for complex queries because they’re named, can be referenced multiple times, and make the query read top-to-bottom like a series of steps.

Why is my query slow even though I have an index?

Several common reasons:

Function on the indexed column: WHERE YEAR(created_at) = 2025 prevents index use. Use WHERE created_at >= '2025-01-01' instead.
Leading wildcard in LIKE: WHERE name LIKE '%smith%' can’t use a B-tree index. WHERE name LIKE 'smith%' can.
Low cardinality column: If a column only has 3 distinct values (e.g., status = ‘active’/‘inactive’/‘pending’), a full table scan might be faster than using the index.
Stale statistics: The optimizer uses statistics to choose a plan. Run ANALYZE table_name to refresh.
Wrong index: The query’s WHERE clause doesn’t match the index’s column order.

Run EXPLAIN ANALYZE to see exactly what the database is doing.

What is the difference between CHAR, VARCHAR, and TEXT?

CHAR(n):    Fixed-length string. Always stores n bytes. Padded with spaces if shorter.
VARCHAR(n): Variable-length string. Stores up to n characters, uses only what's needed.
TEXT:       Unlimited-length string (PostgreSQL). VARCHAR without a length limit.

Use VARCHAR(n) when you have a meaningful max length (email addresses, phone numbers). Use TEXT in PostgreSQL when length is truly unbounded. CHAR is rarely the right choice for modern applications.

How do I generate a sequence of numbers or dates in SQL?

PostgreSQL:

-- Sequence of integers 1-10
SELECT generate_series(1, 10) AS n;

-- Sequence of dates (every day in 2025)
SELECT generate_series(
    '2025-01-01'::date,
    '2025-12-31'::date,
    '1 day'::interval
)::date AS day;

SQL Server:

-- Using a recursive CTE
WITH numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT n FROM numbers;

Date sequences are useful for filling gaps in time-series data.

What is a transaction and when do I need one?

A transaction is a group of SQL statements that succeed or fail as a unit. You need one whenever:

Multiple statements must all succeed or all fail together
You’re making large changes and want the ability to rollback

BEGIN;
    UPDATE accounts SET balance = balance - 500 WHERE account_id = 101;
    UPDATE accounts SET balance = balance + 500 WHERE account_id = 202;
COMMIT;  -- or ROLLBACK; to undo

ACID properties: Atomicity (all or nothing), Consistency (valid state after), Isolation (concurrent transactions don’t interfere), Durability (committed data survives crashes).

How do I find duplicate rows in a table?

-- Find duplicate emails (with count)
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

-- Get the full rows for duplicates
SELECT *
FROM customers
WHERE email IN (
    SELECT email
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
)
ORDER BY email;

To keep only the most recent duplicate:

DELETE FROM customers
WHERE customer_id NOT IN (
    SELECT MAX(customer_id)
    FROM customers
    GROUP BY email
);