SQL Tables and Schemas
Relational databases organize data into tables. Tables are grouped into schemas (namespaces). Understanding how these structures work is foundational to designing databases that are maintainable, fast, and correct.
What is a Table?
A table is a two-dimensional structure of rows and columns:
- Columns (attributes) define what data is stored — each has a name and a data type
- Rows (records) are individual data entries
customers table: customer_id | name | email | city | created_at ───────────────────────────────────────────────────────────────────────────── 1 | Alice Smith | alice@example.com | London | 2024-01-15 2 | Bob Chen | bob@example.com | New York | 2024-02-20 3 | Maria López | maria@example.com | Madrid | 2025-03-01Each row represents one customer. Each column holds one attribute of that customer.
Table Structure: CREATE TABLE
CREATE TABLE customers ( customer_id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, email VARCHAR(320) UNIQUE NOT NULL, city VARCHAR(100), is_active BOOLEAN DEFAULT TRUE, created_at TIMESTAMPTZ DEFAULT NOW());Every table should have:
- A primary key — a unique identifier for each row
- NOT NULL constraints on columns that are always required
- Appropriate data types for each column
Primary Keys
A primary key uniquely identifies each row. It must be unique and NOT NULL:
-- Integer auto-increment (PostgreSQL: SERIAL or BIGSERIAL)customer_id SERIAL PRIMARY KEY
-- Integer auto-increment (MySQL)customer_id INT AUTO_INCREMENT PRIMARY KEY
-- UUID (universally unique, good for distributed systems)id UUID DEFAULT gen_random_uuid() PRIMARY KEY
-- Composite primary key (multiple columns together are unique)PRIMARY KEY (order_id, product_id)Foreign Keys and Table Relationships
Foreign keys link tables together by referencing the primary key of another table:
CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES customers(customer_id), amount NUMERIC(10, 2), status VARCHAR(50), created_at TIMESTAMPTZ DEFAULT NOW());The customer_id in orders references customer_id in customers. This prevents orphaned orders (orders for non-existent customers) and enables JOIN queries.
Cascade behavior options:
REFERENCES customers(customer_id) ON DELETE CASCADE -- delete orders when customer is deleted ON DELETE SET NULL -- set customer_id to NULL when customer is deleted ON DELETE RESTRICT -- prevent deletion if orders exist (default)What is a Schema?
A schema is a namespace — a named container for tables, views, functions, and other database objects. Most databases start with a default schema (public in PostgreSQL, dbo in SQL Server).
-- Create schemas for organizationCREATE SCHEMA staging;CREATE SCHEMA reporting;CREATE SCHEMA audit;
-- Create tables in a schemaCREATE TABLE staging.raw_orders ( order_id INTEGER, raw_data JSONB, loaded_at TIMESTAMPTZ DEFAULT NOW());
CREATE TABLE reporting.daily_revenue ( report_date DATE, revenue NUMERIC(15, 2));
-- Reference a table with its schema prefixSELECT * FROM reporting.daily_revenue;INSERT INTO staging.raw_orders VALUES (...);If you don’t specify a schema, the database uses the default search path (typically public).
When to Use Schemas
Schemas separate objects by function or team:
public/ — default application tablesstaging/ — raw ingested data, pre-transformationreporting/ — aggregate tables for BI dashboardsaudit/ — change logs, access logsbackup_2025/ — point-in-time table copiesThis separation makes it easier to grant permissions (give analysts access to reporting.* only), organize dbt models, and keep raw data separate from curated data.
Normalization: Avoiding Redundancy
Normalization is the process of organizing tables to reduce redundant data. The goal: store each piece of information in one place.
Unnormalized (redundant):
orders table: order_id | customer_name | customer_email | product_name | price 1 | Alice Smith | alice@example.com | Keyboard | 149.99 2 | Alice Smith | alice@example.com | Mouse | 79.99Alice’s name and email appear twice. If her email changes, you must update multiple rows.
Normalized (separate tables):
customers: customer_id | name | emailorders: order_id | customer_id | product_id | priceproducts: product_id | name | base_priceEach piece of information appears once. Updates are simple and consistent.
Normal forms:
- 1NF: Each column holds atomic values; no repeating groups
- 2NF: All non-key columns depend on the full primary key (matters for composite keys)
- 3NF: No transitive dependencies — non-key columns don’t depend on other non-key columns
Most production OLTP databases aim for 3NF. Analytical databases (data warehouses) often intentionally denormalize for query performance.
Useful System Views for Schema Exploration
PostgreSQL:
-- List all tables in the current databaseSELECT table_schema, table_nameFROM information_schema.tablesWHERE table_type = 'BASE TABLE'ORDER BY table_schema, table_name;
-- Describe a table's columnsSELECT column_name, data_type, is_nullable, column_defaultFROM information_schema.columnsWHERE table_schema = 'public' AND table_name = 'customers'ORDER BY ordinal_position;
-- List all foreign keysSELECT tc.table_name, kcu.column_name, ccu.table_name AS references_table, ccu.column_name AS references_columnFROM information_schema.table_constraints tcJOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_nameJOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_nameWHERE tc.constraint_type = 'FOREIGN KEY';Physical vs Logical Design
The logical design is how you think about the data — entities, attributes, relationships (Entity-Relationship diagrams). The physical design is how you implement it in SQL — tables, columns, indexes, partitioning.
Good physical design considers:
- Index coverage: Which columns are queried in WHERE, JOIN, ORDER BY?
- Partitioning: Should large tables be partitioned by date or region?
- Compression: Should rarely-accessed columns use column-oriented storage?
- Denormalization: Should frequently joined tables be pre-joined into summary tables?
These decisions only matter at scale — for most applications, clean normalization is the right starting point.