
AWS Amazon Web Services 61 guides · updated 2026

Hands-on guides to compute, storage, databases, networking, and serverless on the world's most widely adopted cloud platform.

Amazon DocumentDB: MongoDB-Compatible Document Database Fully Managed on AWS

Relational databases store data in tables with fixed schemas. If you are storing user profiles where each user might have a different set of preferences, addresses, or activity history, a relational schema requires either many nullable columns or additional child tables that must be joined back to the user row. Document databases store each record as a self-contained JSON-style document, and documents in the same collection can have different fields. This flexibility makes them a natural fit for data models that vary from record to record and evolve quickly.

Amazon DocumentDB is AWS’s fully managed document database. It uses the MongoDB query API, so applications built for MongoDB can connect to DocumentDB with minimal or no code changes. AWS manages the underlying infrastructure, handles replication, takes automated backups, and scales storage automatically.

What DocumentDB Is (and Is Not)

DocumentDB is MongoDB-compatible; it is not a fork or clone of MongoDB. Rather than running MongoDB's own server software, the service uses a purpose-built distributed storage layer similar to Aurora’s architecture. It implements the MongoDB wire protocol and supports the MongoDB 3.6, 4.0, and 5.0 APIs, so existing MongoDB drivers and tools can connect to it. But it is not MongoDB: some MongoDB features and operations are not supported.

The important implication: most applications running MongoDB can connect to DocumentDB by changing the connection string. The edge cases where behavior differs are in advanced features, specific aggregation pipeline stages, and certain administrative operations.
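As a minimal sketch of what that connection-string change can look like, the function below assembles a DocumentDB URI for a MongoDB driver. The query options (tls, tlsCAFile, replicaSet=rs0, readPreference=secondaryPreferred, retryWrites=false) follow the sample connection strings AWS documents for DocumentDB. The endpoint, user name, and password are hypothetical placeholders, `buildDocDbUri` is an illustrative helper, and global-bundle.pem is assumed to be the AWS-published CA bundle, already downloaded next to the application.

```javascript
// Sketch: build a DocumentDB connection string from its parts.
// The host, user, and password used below are hypothetical placeholders.
function buildDocDbUri({ host, port = 27017, user, password, caFile = "global-bundle.pem" }) {
  const params = new URLSearchParams({
    tls: "true",                          // TLS is enabled by default on DocumentDB clusters
    tlsCAFile: caFile,                    // CA bundle used to verify the cluster certificate
    replicaSet: "rs0",                    // connect as a replica set so the driver sees all instances
    readPreference: "secondaryPreferred", // send reads to replicas when any are available
    retryWrites: "false",                 // AWS sample strings turn off driver-level write retries
  });
  // Percent-encode credentials so characters like "@" or ":" cannot break the URI.
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${auth}@${host}:${port}/?${params.toString()}`;
}

const uri = buildDocDbUri({
  host: "my-cluster.cluster-abc123.us-east-1.docdb.amazonaws.com", // hypothetical endpoint
  user: "appUser",
  password: "p@ss:word",
});
console.log(uri);
// With the official Node.js driver (npm package "mongodb"), only the URI changes:
//   const client = new MongoClient(uri);
```

Application code that issues queries stays the same; only the string handed to the driver differs from a self-managed MongoDB deployment.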

Architecture

DocumentDB shares Aurora’s approach to storage: a distributed, fault-tolerant storage volume that spans three Availability Zones and keeps six copies of the data. This is fundamentally different from running MongoDB on EC2, where each replica-set member stores its own full copy of the data and the database replicates writes between members.

DocumentDB Cluster Architecture
=================================
Application Layer
├── Write operations → Primary instance
└── Read operations  → Reader instances (up to 15)
  Primary (compute)    Reader-1 (compute)    Reader-2 (compute)
          │                    │                     │
          └────────────────────┼─────────────────────┘
                               │
               ┌───────────────▼───────────────┐
               │  DocumentDB Storage Volume    │
               │  (spans 3 AZs, 6 copies)      │
               │  Grows in 10 GB increments    │
               │  Up to 64 TB per cluster      │
               └───────────────────────────────┘
Failover: if the primary fails, a reader is promoted automatically.
Failover typically completes in under 30 seconds.
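Because every write goes to a single primary, an application sees errors until failover finishes. A common client-side pattern is to retry transient failures with exponential backoff over a window longer than the failover. The sketch below is generic JavaScript, not a DocumentDB or driver API: `backoffDelays`, `withRetries`, and the `isTransient` predicate are hypothetical names, and deciding which driver errors count as transient is left to the application.

```javascript
// Exponential backoff schedule: base, 2*base, 4*base, ... each capped at capMs.
function backoffDelays(attempts, baseMs, capMs) {
  return Array.from({ length: attempts }, (_, i) => Math.min(capMs, baseMs * 2 ** i));
}

// Retry an async operation, sleeping between attempts according to `delays`.
// A non-transient error, or running out of delays, rethrows the last error.
async function withRetries(operation, delays, isTransient) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= delays.length || !isTransient(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}

// Seven retry delays starting at 500 ms and capped at 8 s add up to 31.5 s,
// which spans a failover that completes in under 30 seconds.
const schedule = backoffDelays(7, 500, 8000);
console.log(schedule, schedule.reduce((a, b) => a + b, 0));
```

Adding random jitter to each delay is a common refinement when many clients reconnect at once; it is left out here to keep the schedule deterministic.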

The Document Model

A document is a JSON object. A collection is a group of related documents. Collections do not enforce a fixed schema — two documents in the same collection can have completely different fields.

// User document - minimal profile
{
  "_id": "u1001",
  "email": "alice@example.com",
  "name": "Alice Chen",
  "tier": "premium"
}
// User document - extensive profile
{
  "_id": "u1002",
  "email": "bob@example.com",
  "name": "Bob Smith",
  "addresses": [
    {"type": "home", "city": "Seattle", "zip": "98101"},
    {"type": "work", "city": "Bellevue", "zip": "98004"}
  ],
  "preferences": {
    "notifications": {"email": true, "sms": false},
    "theme": "dark"
  },
  "purchase_history": ["prod-101", "prod-205", "prod-389"]
}

Both documents live in the same users collection. A relational database would require either nullable columns for the optional fields or separate tables for addresses and preferences, with join operations to reconstruct the full record.
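To make the contrast concrete, here is a hedged sketch in plain JavaScript of what a relational layout forces on the second document: the embedded addresses array becomes rows in a separate table keyed by user ID, the optional tier becomes a nullable column, and rebuilding the profile needs a join. The row shapes and the helper names `splitUser` and `joinAddresses` are illustrative only.

```javascript
// Split a user document into one "users" row and several "addresses" rows,
// mimicking a normalized relational schema (illustrative only).
function splitUser(doc) {
  const userRow = { id: doc._id, email: doc.email, name: doc.name, tier: doc.tier ?? null }; // nullable column
  const addressRows = (doc.addresses ?? []).map((a) => ({ user_id: doc._id, ...a }));
  return { userRow, addressRows };
}

// Rebuild the embedded shape by joining address rows back onto each user row.
function joinAddresses(userRows, addressRows) {
  return userRows.map((u) => ({
    ...u,
    addresses: addressRows
      .filter((a) => a.user_id === u.id)
      .map(({ user_id, ...rest }) => rest), // drop the foreign key again
  }));
}

const bob = {
  _id: "u1002",
  email: "bob@example.com",
  name: "Bob Smith",
  addresses: [
    { type: "home", city: "Seattle", zip: "98101" },
    { type: "work", city: "Bellevue", zip: "98004" },
  ],
};

const { userRow, addressRows } = splitUser(bob);
console.log(addressRows.length); // 2
console.log(joinAddresses([userRow], addressRows)[0].addresses);
```

In the document model the join step disappears, because the addresses are already stored inside the user document.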

Querying Documents

DocumentDB uses MongoDB Query Language (MQL). Queries can filter by any field at any nesting level, including fields inside arrays and nested objects.

// Find all premium users in Seattle
db.users.find({
  "tier": "premium",
  "addresses.city": "Seattle"
})
// Find users who purchased product prod-101
db.users.find({
  "purchase_history": "prod-101"
})
// Count users by tier
db.users.aggregate([
  { $group: { _id: "$tier", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
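The two find filters above rely on MQL's implicit array handling. A dotted path such as addresses.city is checked against every element of the addresses array, and an equality filter on an array field matches if any element equals the value. The plain-JavaScript sketch below imitates that behavior for simple equality filters only (no operators, no numeric array indexes) so you can see which of the two sample users each query would return. `find` and `countBy` are hypothetical helpers, not driver calls.

```javascript
const users = [
  { _id: "u1001", email: "alice@example.com", name: "Alice Chen", tier: "premium" },
  {
    _id: "u1002",
    email: "bob@example.com",
    name: "Bob Smith",
    addresses: [
      { type: "home", city: "Seattle", zip: "98101" },
      { type: "work", city: "Bellevue", zip: "98004" },
    ],
    purchase_history: ["prod-101", "prod-205", "prod-389"],
  },
];

// Collect every value reachable at a dotted path, descending into arrays.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAtPath(item, parts));
  if (value !== null && typeof value === "object" && parts[0] in value) {
    return valuesAtPath(value[parts[0]], parts.slice(1));
  }
  return [];
}

// Equality also matches when the field is an array that contains the value.
function fieldEquals(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some((v) =>
    Array.isArray(v) ? v.includes(expected) : v === expected
  );
}

// Every condition in the filter must hold (implicit AND, as in MQL).
function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([path, expected]) => fieldEquals(doc, path, expected))
  );
}

// Rough equivalent of $group on a field with {$sum: 1}, then $sort by count descending.
// A missing field is grouped under null, as $group does.
function countBy(collection, field) {
  const counts = new Map();
  for (const doc of collection) {
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count })).sort((a, b) => b.count - a.count);
}

console.log(find(users, { tier: "premium", "addresses.city": "Seattle" }).length); // 0
console.log(find(users, { purchase_history: "prod-101" }).map((u) => u._id));       // [ 'u1002' ]
console.log(countBy(users, "tier"));
```

With only these two documents, the "premium users in Seattle" query returns nothing: Alice is premium but has no addresses, and Bob lives in Seattle but has no tier field.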

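Indexes are essential for query performance. DocumentDB supports single-field, compound, multi-key (on array fields), text, and geospatial indexes. Without an index on the filtered fields, a query has to examine every document in the collection (a collection scan), so its cost grows with the size of the collection.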
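Conceptually, an index is a lookup structure from field values to documents, and a multi-key index on an array field holds one entry per array element. The sketch below models that with a JavaScript Map so the difference between an index lookup and a collection scan is visible. It is an illustration only and says nothing about how DocumentDB stores indexes internally; `buildIndex` and `scan` are hypothetical helpers, and document u1003 is an invented extra record. In the mongo shell, the actual index on this field would be created with db.users.createIndex({ purchase_history: 1 }).

```javascript
// Conceptual multi-key index: one entry per array element, value -> set of _ids.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    const raw = doc[field];
    if (raw === undefined) continue; // simplification: documents without the field are skipped
    const keys = Array.isArray(raw) ? raw : [raw];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(doc._id);
    }
  }
  return index;
}

// Without an index: examine every document (collection scan).
function scan(collection, field, value) {
  return collection
    .filter((doc) => (Array.isArray(doc[field]) ? doc[field].includes(value) : doc[field] === value))
    .map((doc) => doc._id);
}

const users = [
  { _id: "u1001", tier: "premium" },
  { _id: "u1002", purchase_history: ["prod-101", "prod-205", "prod-389"] },
  { _id: "u1003", purchase_history: ["prod-101"] }, // hypothetical extra document
];

const byProduct = buildIndex(users, "purchase_history");
console.log([...(byProduct.get("prod-101") ?? [])]); // [ 'u1002', 'u1003' ]
console.log(scan(users, "purchase_history", "prod-101")); // same ids, after checking all 3 documents
```

The lookup touches only the entry for prod-101, while the scan has to check every document, which is why the gap widens as the collection grows.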

When to Choose DocumentDB vs Alternatives

Choose DocumentDB when:

Consider self-managed MongoDB (on EC2) when:

Consider a relational database when:

Read Scaling

DocumentDB supports up to 15 read replicas per cluster. All replicas share the primary's storage volume, so adding a replica does not require copying the data set. Reads from replicas are still eventually consistent: replica lag is typically well under 100 milliseconds, but a read sent to a replica immediately after a write may not yet see that write. Reads that must observe the latest write should go to the primary.

Applications can connect to the reader endpoint, which distributes read-only connections across the available replicas. This lets read capacity scale horizontally, by adding replicas, independently of the primary.
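Because replica reads can lag slightly behind the primary, applications that use the reader endpoint often keep "read your own write" traffic on the primary. The sketch below shows one hedged way to do that: reads for a key go to the cluster (primary) endpoint for a short window after that key was written, and all other reads go to the reader endpoint. The `ReadRouter` class, the endpoint hostnames, and the 1-second window are hypothetical choices, not DocumentDB settings.

```javascript
// Hypothetical endpoints: the cluster endpoint always points at the primary,
// the reader endpoint spreads connections across replicas.
const CLUSTER_ENDPOINT = "my-cluster.cluster-abc123.us-east-1.docdb.amazonaws.com";
const READER_ENDPOINT = "my-cluster.cluster-ro-abc123.us-east-1.docdb.amazonaws.com";

class ReadRouter {
  constructor(stickyMs = 1000) {
    this.stickyMs = stickyMs;   // assumed safety margin over typical replica lag
    this.lastWrite = new Map(); // key -> time (ms) of its most recent write
  }

  // Writes always go to the primary; remember when this key changed.
  endpointForWrite(key, now = Date.now()) {
    this.lastWrite.set(key, now);
    return CLUSTER_ENDPOINT;
  }

  // Reads shortly after a write to the same key also go to the primary.
  endpointForRead(key, now = Date.now()) {
    const written = this.lastWrite.get(key);
    if (written !== undefined && now - written < this.stickyMs) return CLUSTER_ENDPOINT;
    return READER_ENDPOINT;
  }
}

const router = new ReadRouter(1000);
router.endpointForWrite("u1001", 0);
console.log(router.endpointForRead("u1001", 200));  // cluster endpoint: write was 200 ms ago
console.log(router.endpointForRead("u1001", 5000)); // reader endpoint: window has passed
console.log(router.endpointForRead("u1002", 200));  // reader endpoint: no recent write
```

In an application, the returned name would select between two driver clients, one per endpoint. The write timestamps live in one process's memory, so this only covers reads served by the same process that performed the write.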

Real-World Use Case: Content Management Platform

A media company runs a content management system that stores articles, videos, and podcasts. Each content type has different metadata — articles have word count and author credits, videos have duration and resolution, podcasts have episode number and transcript links. The structure varies per content type and evolves frequently as the product team adds new attributes.
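Here is a hedged illustration of what such a collection can look like: articles, videos, and podcasts stored side by side and distinguished by a type field, plus application code that tolerates an attribute that only newer documents carry. The field names (type, word_count, duration_sec, captions, and so on) and the `describe` helper are hypothetical; the case study does not specify its schema. In the mongo shell, one type would be selected with db.content.find({ type: "video" }).

```javascript
// Hypothetical documents from one "content" collection; each type has its own fields.
const content = [
  { _id: "c1", type: "article", title: "Launch recap", word_count: 1200, authors: ["R. Diaz"] },
  { _id: "c2", type: "video", title: "Keynote", duration_sec: 2700, resolution: "1080p" },
  { _id: "c3", type: "podcast", title: "Ep. 12", episode: 12, transcript_url: "https://example.com/t/12" },
  // "captions" was added later; older video documents simply lack it, and none were migrated.
  { _id: "c4", type: "video", title: "Demo", duration_sec: 300, resolution: "4k", captions: ["en", "es"] },
];

// Read type-specific fields, supplying a default for attributes older documents lack.
function describe(item) {
  switch (item.type) {
    case "article":
      return `${item.title}: ${item.word_count} words`;
    case "video": {
      const captions = (item.captions ?? []).join(",") || "none";
      return `${item.title}: ${Math.round(item.duration_sec / 60)} min, captions: ${captions}`;
    }
    case "podcast":
      return `${item.title}: episode ${item.episode}`;
    default:
      return `${item.title}: unknown type ${item.type}`;
  }
}

content.forEach((item) => console.log(describe(item)));
```

Adding the captions attribute needed no schema change to existing documents; the reading code supplies the default instead.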

With DocumentDB:

Key Interview Points