
🗄️ SQL 40 guides · updated 2026


Introduction to NoSQL Databases

NoSQL (“Not Only SQL”) describes databases that store and retrieve data using models other than the relational table-and-row structure. They trade some relational guarantees (joins, ACID transactions, strict schemas) for flexibility, horizontal scalability, and performance on specific access patterns.

The important framing: NoSQL isn’t a replacement for SQL. It’s a set of different tools for different problems. Many production systems use both.


Why NoSQL Emerged

Relational databases were designed in the 1970s for structured business data. The constraints they impose (fixed schemas, rows and columns, and a reliance on vertical scaling) became pain points at internet scale.

NoSQL databases address these specific problems, often by relaxing one or more of these constraints.


The Four Main NoSQL Types

1. Document Stores

Store data as self-contained documents — typically JSON or BSON. Each document can have a different structure, and documents are grouped into collections (analogous to tables).

Examples: MongoDB, CouchDB, Firestore, Amazon DocumentDB

{
  "_id": "prod_8821",
  "name": "Mechanical Keyboard",
  "category": "electronics",
  "price": 149.99,
  "specs": {
    "switch_type": "Cherry MX Blue",
    "layout": "TKL",
    "backlit": true
  },
  "tags": ["gaming", "mechanical", "rgb"],
  "inventory": { "us": 142, "uk": 38, "de": 0 }
}

Good for: Product catalogs, user profiles, CMS content, event data — anything with variable structure per entity.
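To make the "different structure per document" idea concrete, here is a minimal Python sketch that models a collection as a list of dicts and filters on a nested field. It is not a real document database: the second product, the `get_path` and `find` helpers, and the dotted-path syntax are all assumptions made for illustration.

```python
from typing import Any, Callable

# A "collection" modeled as a plain list of dicts (illustrative data only).
products: list[dict[str, Any]] = [
    {
        "_id": "prod_8821",
        "name": "Mechanical Keyboard",
        "category": "electronics",
        "price": 149.99,
        "specs": {"switch_type": "Cherry MX Blue", "layout": "TKL", "backlit": True},
        "tags": ["gaming", "mechanical", "rgb"],
        "inventory": {"us": 142, "uk": 38, "de": 0},
    },
    {
        # Hypothetical second document with a different shape: no specs, no tags.
        "_id": "prod_9001",
        "name": "USB-C Cable",
        "category": "electronics",
        "price": 9.99,
        "length_m": 2,
        "inventory": {"us": 900},
    },
]


def get_path(doc: dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'inventory.us' through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(
    collection: list[dict[str, Any]],
    path: str,
    predicate: Callable[[Any], bool],
) -> list[dict[str, Any]]:
    """Return documents whose value at `path` satisfies `predicate`.

    Documents that lack the field are skipped instead of raising an error.
    """
    results = []
    for doc in collection:
        value = get_path(doc, path)
        if value is not None and predicate(value):
            results.append(doc)
    return results


in_stock_uk = find(products, "inventory.uk", lambda qty: qty > 0)
backlit = find(products, "specs.backlit", lambda flag: flag is True)
print([d["_id"] for d in in_stock_uk])
print([d["_id"] for d in backlit])
```

Both queries skip the cable document without any schema change, which is the flexibility document stores offer. The cost is that every reader has to handle missing fields itself.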


2. Key-Value Stores

The simplest NoSQL model: a key maps to an opaque value. Extremely fast reads and writes. The database doesn’t understand the value’s structure.

Examples: Redis, DynamoDB (in its simplest usage), Memcached

SET session:user_12345 '{"user_id": 12345, "role": "admin", "expires": "2025-07-01"}' EX 3600
GET session:user_12345
TTL session:user_12345
HSET user:12345 name "Alice" email "alice@example.com"
LPUSH queue:jobs '{"task": "send_email", "to": "user@example.com"}'

Good for: Session caching, rate limiting, leaderboards, message queues, real-time counters. Redis in particular is a widely used caching layer in high-traffic web applications.
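The key-to-opaque-value model with expiry can be sketched in a few lines of Python. This is a toy in-memory store, not Redis: the `TTLStore` and `FakeClock` names are hypothetical, and the only detail borrowed from Redis is the convention that `TTL` reports -2 for a missing key and -1 for a key with no expiry.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class TTLStore:
    """Toy in-memory key-value store with per-key expiry (not Redis itself)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (opaque value, absolute expiry time or None)
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ex: Optional[float] = None) -> None:
        """Store a value; `ex` is an optional lifetime in seconds."""
        expires_at = None if ex is None else self._clock() + ex
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: evict on first read after deadline
            return None
        return value

    def ttl(self, key: str) -> float:
        """Seconds left; -2 if the key is absent, -1 if it never expires."""
        if self.get(key) is None:
            return -2
        _, expires_at = self._data[key]
        if expires_at is None:
            return -1
        return expires_at - self._clock()


clock = FakeClock()
store = TTLStore(clock=clock)
store.set("session:user_12345", '{"user_id": 12345, "role": "admin"}', ex=60)
store.set("config:theme", "dark")

clock.now = 30.0
remaining = store.ttl("session:user_12345")  # half the lifetime is left

clock.now = 61.0
expired = store.get("session:user_12345")  # past the deadline, so gone
```

The store never inspects the JSON string it holds, which is the point of the model: the value is opaque, and all lookups go through the key.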


3. Column-Family Stores (Wide Column)

Store data in rows, but each row can have different columns. Designed for massive write throughput and time-series workloads.

Examples: Apache Cassandra, HBase, Google Bigtable

-- Cassandra CQL
CREATE TABLE sensor_readings (
    sensor_id   UUID,
    recorded_at TIMESTAMP,
    temperature FLOAT,
    humidity    FLOAT,
    PRIMARY KEY (sensor_id, recorded_at)  -- partition key, then clustering column
) WITH CLUSTERING ORDER BY (recorded_at DESC);

-- Newest 1000 readings for one sensor (rows are stored newest-first)
SELECT * FROM sensor_readings
WHERE sensor_id = ? LIMIT 1000;

Good for: IoT sensor data, time-series metrics, activity feeds, and audit logs, where write volume is high and reads are keyed by partition.
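The partition-key-plus-clustering-order layout can be approximated in plain Python to show why "latest N readings for one sensor" is cheap. This is a rough sketch, not how Cassandra stores data on disk; the `SensorReadings` class, its method names, and the sample values are assumptions for illustration.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta

# One stored row: (recorded_at, temperature, humidity)
Row = tuple[datetime, float, float]


class SensorReadings:
    """Rows grouped by partition key and kept sorted by the clustering column."""

    def __init__(self) -> None:
        # partition key (sensor_id) -> rows sorted by recorded_at ascending
        self._partitions: dict[str, list[Row]] = defaultdict(list)

    def insert(
        self, sensor_id: str, recorded_at: datetime, temperature: float, humidity: float
    ) -> None:
        # insort keeps each partition ordered even when writes arrive out of order
        bisect.insort(self._partitions[sensor_id], (recorded_at, temperature, humidity))

    def latest(self, sensor_id: str, limit: int = 1000) -> list[Row]:
        """Newest-first rows for one partition; other partitions are never touched."""
        rows = self._partitions.get(sensor_id, [])
        return rows[::-1][:limit]


base = datetime(2025, 1, 1, 12, 0)
table = SensorReadings()
table.insert("sensor-a", base + timedelta(minutes=2), 22.5, 40.0)
table.insert("sensor-a", base, 20.0, 42.0)
table.insert("sensor-a", base + timedelta(minutes=1), 21.0, 41.0)
table.insert("sensor-b", base, 18.0, 55.0)

newest = table.latest("sensor-a", limit=2)
print([temperature for _, temperature, _ in newest])
```

A query that does not name a `sensor_id` has no single partition to read from, which is why access patterns in this model need to be known upfront.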


4. Graph Databases

Store data as nodes (entities) and edges (relationships). Optimized for queries that traverse relationships at depth.

Examples: Neo4j, Amazon Neptune, ArangoDB

// Find mutual friends between Alice and Bob (Neo4j Cypher)
MATCH (alice:User {name: "Alice"})-[:FRIENDS_WITH]->(friend:User)
      <-[:FRIENDS_WITH]-(bob:User {name: "Bob"})
RETURN friend.name AS mutual_friend;

Good for: Social networks, recommendation engines, fraud detection, knowledge graphs.
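The same mutual-friends pattern, plus a depth-limited traversal, can be sketched with adjacency sets in Python. This is a toy in-memory graph rather than a graph database; the `Graph` class, its method names, and the sample people are assumptions for illustration.

```python
from collections import defaultdict, deque


class Graph:
    """Directed graph stored as adjacency sets (node -> outgoing neighbors)."""

    def __init__(self) -> None:
        self._out: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, src: str, dst: str) -> None:
        self._out[src].add(dst)

    def neighbors(self, node: str) -> set[str]:
        return set(self._out.get(node, ()))

    def mutual(self, a: str, b: str) -> set[str]:
        """Nodes that both `a` and `b` point to, like the Cypher pattern above."""
        return self.neighbors(a) & self.neighbors(b)

    def within(self, start: str, max_depth: int) -> set[str]:
        """All nodes reachable from `start` in at most `max_depth` hops (BFS)."""
        seen = {start}
        found: set[str] = set()
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_depth:
                continue  # do not expand past the depth limit
            for nxt in self._out.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    found.add(nxt)
                    frontier.append((nxt, depth + 1))
        return found


g = Graph()
g.add_edge("Alice", "Carol")
g.add_edge("Alice", "Dave")
g.add_edge("Bob", "Carol")
g.add_edge("Bob", "Erin")
g.add_edge("Carol", "Frank")

mutual_friends = g.mutual("Alice", "Bob")
two_hops = g.within("Alice", max_depth=2)
print(sorted(mutual_friends), sorted(two_hops))
```

Each hop here is a direct set lookup on the current node. In a relational schema the equivalent query usually needs one self-join per hop, which is the cost graph databases are built to avoid.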


NoSQL vs SQL: When to Use Which

Use SQL (relational) when:
- Data is structured with well-defined relationships
- You need ACID transactions across multiple tables
- Queries are complex and vary (reporting, ad-hoc analysis)

Use NoSQL when:
- Schema is flexible or evolves rapidly
- You need to scale writes horizontally
- Access patterns are simple and known upfront
- Working with document, graph, or time-series data
- Extreme read/write performance is required (caching, sessions)

Consistency Trade-offs: CAP Theorem

Distributed systems must trade off three properties: consistency, availability, and partition tolerance.

C (Consistency): all nodes see the same data at the same time
A (Availability): every request gets a response, though not necessarily the most recent data
P (Partition tolerance): the system keeps working despite network splits between nodes

Network partitions cannot be ruled out in a distributed system, so P is effectively required. The real trade-off, made while a partition is in progress, is C vs A:

CP databases: HBase, ZooKeeper, MongoDB (default)
AP databases: Cassandra, CouchDB, DynamoDB (default)
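The C-versus-A choice during a partition can be illustrated with a deliberately simplified Python model. It makes no claim about how any of the databases listed above behave internally: the `Replica` class, the `replicate` helper, and the `Unavailable` exception are hypothetical, and real systems usually expose tunable consistency rather than a single switch.

```python
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    """One node holding a single value, configured as 'CP' or 'AP'."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value: Optional[str] = None
        self.partitioned = False  # True while cut off from the rest of the cluster

    def read(self) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Availability first (or no partition): answer with whatever is stored.
        return self.value


def replicate(value: str, replicas: list[Replica]) -> None:
    """Apply a write to every replica that is currently reachable."""
    for replica in replicas:
        if not replica.partitioned:
            replica.value = value


cp_node = Replica("CP")
ap_node = Replica("AP")

replicate("v1", [cp_node, ap_node])  # both nodes receive the first write

cp_node.partitioned = True
ap_node.partitioned = True
replicate("v2", [cp_node, ap_node])  # the partition blocks the second write

ap_result = ap_node.read()  # answers, but with the stale value
try:
    cp_result = cp_node.read()
except Unavailable:
    cp_result = "unavailable"
print(ap_result, cp_result)
```

The AP node stays responsive at the cost of serving `v1` after `v2` was written elsewhere. The CP node avoids the stale read by giving up availability until the partition heals.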

SQL and NoSQL Together

Most production systems use both — each serving its strongest use case:

Component         | Technology              | Reason
------------------|-------------------------|----------------------------
User sessions     | Redis (key-value)       | Fast reads, TTL expiry
Product catalog   | MongoDB (document)      | Variable product attributes
Orders, inventory | PostgreSQL (relational) | ACID transactions required
Analytics         | Snowflake / BigQuery    | Complex SQL queries
Activity feed     | Cassandra               | High write throughput

NoSQL in Data Engineering

Data engineers encounter NoSQL in two main contexts:

As sources: NoSQL databases (MongoDB, DynamoDB) are common operational sources that need ingesting into a warehouse. Change data capture (CDC) via Debezium or AWS DMS is a standard pipeline pattern.
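A CDC consumer boils down to replaying a stream of change events against a target table. The sketch below uses a simplified event shape loosely modeled on the `op` / `before` / `after` envelope that Debezium emits. The exact fields, the `apply_change` helper, and the sample events are assumptions, and real connectors carry considerably more metadata.

```python
from typing import Any

Event = dict[str, Any]
Table = dict[str, dict[str, Any]]


def apply_change(table: Table, event: Event) -> None:
    """Apply one change event to an in-memory copy of the source collection.

    Assumed op codes: 'c' create, 'r' snapshot read, 'u' update, 'd' delete.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row["_id"]] = row  # upsert: replaying the same event is harmless
    elif op == "d":
        table.pop(event["before"]["_id"], None)  # deleting a missing row is a no-op
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical event stream captured from a product collection.
events: list[Event] = [
    {"op": "c", "before": None, "after": {"_id": "prod_8821", "price": 149.99}},
    {"op": "c", "before": None, "after": {"_id": "prod_9001", "price": 9.99}},
    {
        "op": "u",
        "before": {"_id": "prod_8821", "price": 149.99},
        "after": {"_id": "prod_8821", "price": 129.99},
    },
    {"op": "d", "before": {"_id": "prod_9001", "price": 9.99}, "after": None},
]

warehouse_copy: Table = {}
for event in events:
    apply_change(warehouse_copy, event)
print(warehouse_copy)
```

Writing the apply step as an upsert and a tolerant delete makes it idempotent. That matters because many delivery pipelines guarantee at-least-once rather than exactly-once delivery.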

As infrastructure: Redis for pipeline state caching, Kafka (a log-structured store) for event streaming. Core SQL skills still transfer — Snowflake, BigQuery, and Redshift are relational and use standard SQL for analytics.