Amazon DocumentDB: MongoDB-Compatible Document Database Fully Managed on AWS

Relational databases store data in tables with fixed schemas. If you are storing user profiles where each user might have a different set of preferences, addresses, or activity history, a relational schema requires either many nullable columns or complex junction tables. Document databases store each record as a self-contained JSON document where every document can have different fields. This flexibility makes them a natural fit for dynamic, rapidly evolving data models.

Amazon DocumentDB is AWS’s fully managed document database. It uses the MongoDB query API, so applications built for MongoDB can connect to DocumentDB with minimal or no code changes. AWS manages the underlying infrastructure, handles replication, takes automated backups, and scales storage automatically.

What DocumentDB Is (and Is Not)

DocumentDB is MongoDB-compatible, not a fork or clone of MongoDB. The service uses a purpose-built distributed storage layer similar to Aurora’s architecture rather than running the open-source MongoDB software. It exposes the MongoDB wire protocol and supports the MongoDB 3.6, 4.0, and 5.0 APIs, meaning existing MongoDB drivers work. But it is not MongoDB — some MongoDB features and operations are not supported.

The important implication: most applications running MongoDB can connect to DocumentDB by changing the connection string. The edge cases where behavior differs are in advanced features, specific aggregation pipeline stages, and certain administrative operations.
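As a rough illustration of what "changing the connection string" tends to involve, the sketch below assembles a DocumentDB-style URI in plain JavaScript. The host name, credentials, and CA file path are placeholders invented for this example, and the helper name buildDocDbUri is hypothetical. The options shown (TLS, the rs0 replica set name, and retryWrites=false, since DocumentDB does not support retryable writes) are the ones most commonly needed, but confirm them against your own cluster.

```javascript
// Build a DocumentDB connection URI from its parts.
// All concrete values passed in below are placeholders, not real endpoints.
function buildDocDbUri({ user, password, host, port = 27017, caFile = "global-bundle.pem" }) {
  const params = new URLSearchParams({
    tls: "true",                          // TLS is enabled by default on DocumentDB clusters
    tlsCAFile: caFile,                    // CA bundle downloaded separately from AWS
    replicaSet: "rs0",                    // DocumentDB presents itself as replica set "rs0"
    readPreference: "secondaryPreferred", // send reads to readers when available
    retryWrites: "false",                 // retryable writes are not supported
  });
  // Percent-encode credentials so characters like "@" or "/" do not break the URI.
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${auth}@${host}:${port}/?${params.toString()}`;
}

const uri = buildDocDbUri({
  user: "appUser",
  password: "p@ss/word",
  host: "my-cluster.cluster-abc123.us-east-1.docdb.amazonaws.com",
});
console.log(uri);
```

The resulting string can be passed to any MongoDB driver in place of the old MongoDB URI; the rest of the application code normally stays the same.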

Architecture

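DocumentDB shares Aurora’s approach to storage: a distributed, fault-tolerant storage volume that spans three Availability Zones with six copies of the data. This is fundamentally different from running MongoDB on EC2, where replication is handled by the database nodes themselves through replica sets: each member keeps its own full copy of the data and replays the primary’s operations. Because DocumentDB separates compute from storage, adding a reader does not require copying the data set to a new node.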

  DocumentDB Cluster Architecture
  =================================

  Application Layer
        │
        ├── Write operations → Primary instance
        └── Read operations → Reader instances (up to 15)

  Primary (compute)   Reader-1 (compute)   Reader-2 (compute)
        │                  │                    │
        └──────────────────┼────────────────────┘
                           │
                 ┌─────────▼─────────────────────┐
                 │  DocumentDB Storage Volume     │
                 │  (spans 3 AZs, 6 copies)       │
                 │  Grows in 10 GB increments     │
                 │  Up to 128 TiB per cluster     │
                 └───────────────────────────────┘

  Failover: if primary fails, a reader is promoted automatically
  Typically completes in under 30 seconds

The Document Model

A document is a JSON object. A collection is a group of related documents. Collections do not enforce a fixed schema — two documents in the same collection can have completely different fields.

// User document - minimal profile
{
  "_id": "u1001",
  "email": "alice@example.com",
  "name": "Alice Chen",
  "tier": "premium"
}

// User document - extensive profile
{
  "_id": "u1002",
  "email": "bob@example.com",
  "name": "Bob Smith",
  "addresses": [
    {"type": "home", "city": "Seattle", "zip": "98101"},
    {"type": "work", "city": "Bellevue", "zip": "98004"}
  ],
  "preferences": {
    "notifications": {"email": true, "sms": false},
    "theme": "dark"
  },
  "purchase_history": ["prod-101", "prod-205", "prod-389"]
}

Both documents live in the same users collection. A relational database would require either nullable columns for the optional fields or separate tables for addresses and preferences, with join operations to reconstruct the full record.
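Because fields can be absent, application code that reads these documents has to tolerate missing data. The following is a minimal sketch in plain JavaScript using the two example documents; the helper name summarizeUser and the "default" theme fallback are assumptions made for illustration, not part of any DocumentDB API.

```javascript
// The two example user documents, as they might be returned by a driver.
const users = [
  { _id: "u1001", email: "alice@example.com", name: "Alice Chen", tier: "premium" },
  {
    _id: "u1002",
    email: "bob@example.com",
    name: "Bob Smith",
    addresses: [
      { type: "home", city: "Seattle", zip: "98101" },
      { type: "work", city: "Bellevue", zip: "98004" },
    ],
    preferences: { notifications: { email: true, sms: false }, theme: "dark" },
    purchase_history: ["prod-101", "prod-205", "prod-389"],
  },
];

// Read optional fields defensively: optional chaining and ?? supply
// fallbacks when a document simply does not carry a field.
function summarizeUser(doc) {
  return {
    id: doc._id,
    cities: (doc.addresses ?? []).map((address) => address.city),
    theme: doc.preferences?.theme ?? "default", // fallback value is an assumption
    purchases: doc.purchase_history?.length ?? 0,
  };
}

const summaries = users.map(summarizeUser);
console.log(summaries);
```

The same function handles both shapes without a schema migration, which is the trade-off of the document model: flexibility in storage, defensive reads in code.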

Querying Documents

DocumentDB uses MongoDB Query Language (MQL). Queries can filter by any field at any nesting level, including fields inside arrays and nested objects.

// Find all premium users in Seattle
db.users.find({
  "tier": "premium",
  "addresses.city": "Seattle"
})

// Find users who purchased product prod-101
db.users.find({
  "purchase_history": "prod-101"
})

// Count users by tier
db.users.aggregate([
  { $group: { _id: "$tier", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
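To make the aggregation concrete, here is a plain JavaScript sketch of what the $group and $sort stages compute, run against an in-memory array instead of a collection. It illustrates the result shape only and says nothing about how DocumentDB executes the pipeline. The function name countByTier and the sample data are invented for this example.

```javascript
// In-memory equivalent of:
//   [{ $group: { _id: "$tier", count: { $sum: 1 } } }, { $sort: { count: -1 } }]
function countByTier(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // As in MongoDB, documents missing the grouped field land under _id: null.
    const key = doc.tier ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts]
    .map(([tier, count]) => ({ _id: tier, count }))
    .sort((a, b) => b.count - a.count); // descending, like { count: -1 }
}

const sampleUsers = [
  { name: "Alice", tier: "premium" },
  { name: "Carol", tier: "free" },
  { name: "Dan", tier: "free" },
  { name: "Eve", tier: "free" },
  { name: "Bob" },   // no tier field
  { name: "Frank" }, // no tier field
];

const tierCounts = countByTier(sampleUsers);
console.log(tierCounts);
```

In production the pipeline should run in the database, so that only the small grouped result crosses the network rather than every user document.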

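Indexes are essential for query performance. DocumentDB supports single-field, compound, multi-key (on array fields), text, and geospatial (2dsphere) indexes, though some index types are only available on newer engine versions, so check the AWS documentation for your cluster. Every collection has a default index on _id; a query that cannot use any index falls back to scanning the entire collection.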
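One possible way to set up indexes for the users collection is sketched below. The index choices are assumptions based on the example queries in this article (in particular, treating email as unique is a guess about the data), and ensureUserIndexes is a hypothetical helper. The only driver call used is createIndex(keys, options), which the MongoDB Node.js driver provides on a collection and which resolves to the index name.

```javascript
// Index definitions kept as plain data: [keys, options].
const userIndexSpecs = [
  [{ email: 1 }, { unique: true }],       // single-field; uniqueness is an assumption
  [{ tier: 1, "addresses.city": 1 }, {}], // compound, covers the tier + city query
  [{ purchase_history: 1 }, {}],          // multi-key: purchase_history is an array
];

// Create each index in turn and collect the names the driver returns.
async function ensureUserIndexes(collection, specs = userIndexSpecs) {
  const names = [];
  for (const [keys, options] of specs) {
    names.push(await collection.createIndex(keys, options));
  }
  return names;
}
```

Usage, assuming db is a connected driver database handle: await ensureUserIndexes(db.collection("users")). Creating an index that already exists with the same definition is a no-op, so a helper like this can run at application startup.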

When to Choose DocumentDB vs Alternatives

Choose DocumentDB when:

You are running MongoDB in production and want to remove the operational burden (provisioning, patching, replication management, backup)
Your data is inherently document-shaped with variable schemas across records
You need MongoDB driver compatibility for an existing application
You want AWS-native integration (IAM, CloudWatch, VPC, KMS encryption)

Consider self-managed MongoDB (on EC2) when:

You need MongoDB features not yet supported in DocumentDB (check the AWS compatibility documentation)
You need the absolute latest MongoDB version immediately upon release
You require MongoDB-specific enterprise features

Consider a relational database when:

Your data has consistent, well-defined relationships between entities
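You depend heavily on multi-table ACID transactions and joins (DocumentDB 4.0 and later support multi-document transactions across collections, but with documented limits on transaction size and duration)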
Your team is more experienced with SQL

Read Scaling

DocumentDB supports up to 15 read replicas per cluster. All replicas share the same storage volume as the primary, so a new replica does not need its own copy of the data and replica lag stays low, usually under 100 milliseconds. Reads from replicas are still eventually consistent: a read issued immediately after a write can return stale data, so reads that must see the latest write should go to the primary.

Applications can connect to the reader endpoint, which distributes connections (not individual queries) across the available readers. This lets you scale read capacity horizontally, independently of the primary.
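When an application connects in replica set mode instead of through the reader endpoint, read routing can also be chosen per query. The sketch below assumes the MongoDB Node.js driver, whose find method accepts a readPreference option. The wrapper name findWithRouting and the needsLatestWrite flag are hypothetical names for this example.

```javascript
// Route a query to readers by default, or to the primary when the caller
// must observe a write it has just made (replica reads may lag slightly).
function findWithRouting(collection, filter, { needsLatestWrite = false } = {}) {
  const readPreference = needsLatestWrite ? "primary" : "secondaryPreferred";
  return collection.find(filter, { readPreference });
}
```

A public content API might call findWithRouting(articles, { status: "published" }) and accept slightly stale results, while an editor's save-then-reload flow would pass { needsLatestWrite: true }.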

Real-World Use Case: Content Management Platform

A media company runs a content management system that stores articles, videos, and podcasts. Each content type has different metadata — articles have word count and author credits, videos have duration and resolution, podcasts have episode number and transcript links. The structure varies per content type and evolves frequently as the product team adds new attributes.

With DocumentDB:

Each piece of content is a document with only the relevant fields for its type
New content types or attributes can be added without schema migrations or downtime
The existing editorial tools use the MongoDB Node.js driver, connecting to DocumentDB with only a connection string change
Read replicas serve the high-traffic public-facing content API while the primary handles editorial writes
Content older than one year is archived via a custom process — DocumentDB does not have built-in data archiving, so application logic writes old content to S3 Glacier and deletes from the active collection
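The custom archiving process in the last point could look roughly like the sketch below. It assumes each content document carries a publishedAt date field (a hypothetical field name) and takes the upload step as a caller-supplied archiveDoc function, so no particular S3 client API is implied. The collection calls used, find(...).toArray() and deleteOne, are standard MongoDB Node.js driver methods.

```javascript
// Archive documents older than one year, then remove them from the
// active collection. Each document is deleted only after its archive
// step has resolved, so a failed upload leaves the original in place.
async function archiveOldContent(collection, archiveDoc, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);

  const oldDocs = await collection
    .find({ publishedAt: { $lt: cutoff } }) // publishedAt is an assumed field
    .toArray();

  let archived = 0;
  for (const doc of oldDocs) {
    await archiveDoc(doc); // e.g. serialize to JSON and upload to S3
    await collection.deleteOne({ _id: doc._id });
    archived += 1;
  }
  return archived;
}
```

For large backlogs, a real job would likely process documents in batches instead of loading them all with toArray(), and would run on a schedule outside peak traffic.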

Key Interview Points

DocumentDB is compatible with MongoDB 3.6, 4.0, and 5.0 APIs — check the compatibility matrix before migrating; not all MongoDB aggregation operators and commands are supported
DocumentDB is not running MongoDB software — it uses AWS’s custom storage layer with the MongoDB wire protocol on top
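Change streams (the MongoDB counterpart to DynamoDB Streams) are supported but disabled by default: they must be enabled explicitly per collection, database, or cluster, and events are retained for a limited window (3 hours by default), so verify the settings for your engine version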
Storage auto-scales in 10 GB increments up to the cluster limit (128 TiB in current AWS documentation; older material cites 64 TB), with no manual storage provisioning
Encryption at rest uses AWS KMS and can only be chosen when a cluster is created; to encrypt an existing unencrypted cluster, take a snapshot and restore it to a new cluster with a KMS key specified
Instance-based clusters require you to provision instance classes for the primary and readers, which is a cost consideration for intermittent workloads; AWS has since introduced Amazon DocumentDB Serverless, so check current documentation before ruling out on-demand capacity
TTL indexes: DocumentDB supports TTL indexes to automatically expire and delete documents after a specified time — useful for session data and temporary records
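As an illustration of the TTL point, the sketch below creates a TTL index through the MongoDB Node.js driver's createIndex call with the expireAfterSeconds option. The createdAt field, the one-hour lifetime, and both function names are assumptions for this example. The isPastTtl helper only mirrors the expiry rule in application code; actual deletion is done by a background process in the database, so expired documents may remain visible for a short time.

```javascript
const SESSION_TTL_SECONDS = 3600; // assumed session lifetime: one hour

// Ask the database to expire documents ttlSeconds after their createdAt value.
async function createSessionTtlIndex(collection, ttlSeconds = SESSION_TTL_SECONDS) {
  return collection.createIndex({ createdAt: 1 }, { expireAfterSeconds: ttlSeconds });
}

// Application-side view of the same rule: a document is eligible for
// deletion once createdAt + ttlSeconds is at or before the current time.
function isPastTtl(doc, ttlSeconds = SESSION_TTL_SECONDS, now = new Date()) {
  return doc.createdAt.getTime() + ttlSeconds * 1000 <= now.getTime();
}
```

Since removal is not instantaneous, code that must never honor an expired session can apply a check like isPastTtl on read in addition to relying on the index.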