SQL Tables and Schemas

Relational databases organize data into tables. Tables are grouped into schemas (namespaces). Understanding how these structures work is foundational to designing databases that are maintainable, fast, and correct.

What is a Table?

A table is a two-dimensional structure of rows and columns:

Columns (attributes) define what data is stored — each has a name and a data type
Rows (records) are individual data entries

customers table:
  customer_id | name          | email               | city       | created_at
  ─────────────────────────────────────────────────────────────────────────────
  1           | Alice Smith   | alice@example.com   | London     | 2024-01-15
  2           | Bob Chen      | bob@example.com     | New York   | 2024-02-20
  3           | Maria López   | maria@example.com   | Madrid     | 2025-03-01

Each row represents one customer. Each column holds one attribute of that customer.

Table Structure: CREATE TABLE

CREATE TABLE customers (
    customer_id  SERIAL PRIMARY KEY,
    name         VARCHAR(255)  NOT NULL,
    email        VARCHAR(320)  UNIQUE NOT NULL,
    city         VARCHAR(100),
    is_active    BOOLEAN       DEFAULT TRUE,
    created_at   TIMESTAMPTZ   DEFAULT NOW()
);

Every table should have:

A primary key — a unique identifier for each row
NOT NULL constraints on columns that are always required
Appropriate data types for each column

Primary Keys

A primary key uniquely identifies each row. It must be unique and NOT NULL:

-- Integer auto-increment (PostgreSQL: SERIAL or BIGSERIAL)
customer_id SERIAL PRIMARY KEY

-- Integer auto-increment (MySQL)
customer_id INT AUTO_INCREMENT PRIMARY KEY

-- UUID (universally unique, good for distributed systems)
id UUID DEFAULT gen_random_uuid() PRIMARY KEY

-- Composite primary key (multiple columns together are unique)
PRIMARY KEY (order_id, product_id)

Foreign Keys and Table Relationships

Foreign keys link tables together by referencing the primary key of another table:

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      NUMERIC(10, 2),
    status      VARCHAR(50),
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

The customer_id in orders references customer_id in customers. This prevents orphaned orders (orders for non-existent customers) and enables JOIN queries.

Cascade behavior options:

REFERENCES customers(customer_id)
    ON DELETE CASCADE      -- delete orders when customer is deleted
    ON DELETE SET NULL     -- set customer_id to NULL when customer is deleted
    ON DELETE RESTRICT     -- prevent deletion if orders exist (default)

What is a Schema?

A schema is a namespace — a named container for tables, views, functions, and other database objects. Most databases start with a default schema (public in PostgreSQL, dbo in SQL Server).

-- Create schemas for organization
CREATE SCHEMA staging;
CREATE SCHEMA reporting;
CREATE SCHEMA audit;

-- Create tables in a schema
CREATE TABLE staging.raw_orders (
    order_id    INTEGER,
    raw_data    JSONB,
    loaded_at   TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE reporting.daily_revenue (
    report_date DATE,
    revenue     NUMERIC(15, 2)
);

-- Reference a table with its schema prefix
SELECT * FROM reporting.daily_revenue;
INSERT INTO staging.raw_orders VALUES (...);

If you don’t specify a schema, the database uses the default search path (typically public).

When to Use Schemas

Schemas separate objects by function or team:

public/      — default application tables
staging/     — raw ingested data, pre-transformation
reporting/   — aggregate tables for BI dashboards
audit/       — change logs, access logs
backup_2025/ — point-in-time table copies

This separation makes it easier to grant permissions (give analysts access to reporting.* only), organize dbt models, and keep raw data separate from curated data.

Normalization: Avoiding Redundancy

Normalization is the process of organizing tables to reduce redundant data. The goal: store each piece of information in one place.

Unnormalized (redundant):

orders table:
  order_id | customer_name | customer_email     | product_name | price
  1        | Alice Smith   | alice@example.com  | Keyboard     | 149.99
  2        | Alice Smith   | alice@example.com  | Mouse        | 79.99

Alice’s name and email appear twice. If her email changes, you must update multiple rows.

Normalized (separate tables):

customers: customer_id | name        | email
orders:    order_id    | customer_id | product_id | price
products:  product_id  | name        | base_price

Each piece of information appears once. Updates are simple and consistent.

Normal forms:

1NF: Each column holds atomic values; no repeating groups
2NF: All non-key columns depend on the full primary key (matters for composite keys)
3NF: No transitive dependencies — non-key columns don’t depend on other non-key columns

Most production OLTP databases aim for 3NF. Analytical databases (data warehouses) often intentionally denormalize for query performance.

Useful System Views for Schema Exploration

PostgreSQL:

-- List all tables in the current database
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
ORDER BY table_schema, table_name;

-- Describe a table's columns
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'customers'
ORDER BY ordinal_position;

-- List all foreign keys
SELECT
    tc.table_name,
    kcu.column_name,
    ccu.table_name AS references_table,
    ccu.column_name AS references_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
    ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

Physical vs Logical Design

The logical design is how you think about the data — entities, attributes, relationships (Entity-Relationship diagrams). The physical design is how you implement it in SQL — tables, columns, indexes, partitioning.

Good physical design considers:

Index coverage: Which columns are queried in WHERE, JOIN, ORDER BY?
Partitioning: Should large tables be partitioned by date or region?
Compression: Should rarely-accessed columns use column-oriented storage?
Denormalization: Should frequently joined tables be pre-joined into summary tables?

These decisions only matter at scale — for most applications, clean normalization is the right starting point.