Amazon S3: Object Storage With 11 Nines of Durability and Virtually Unlimited Scale

Every cloud project eventually asks the same question — where do the files live? Log streams from a hundred servers, customer-uploaded photos, nightly database dumps, machine learning datasets: they all need somewhere reliable. Amazon S3 is that place for the majority of AWS workloads, and understanding how it operates shapes how you design everything that touches it.

S3 launched in March 2006, a few months before EC2, which makes it one of the earliest AWS services. Today it holds trillions of objects across multiple AWS regions and processes millions of API requests every second. The design choices made at launch (a flat key namespace, HTTP-based access, no file system abstraction) proved well suited to internet-scale storage.

What S3 Actually Is

S3 is an object store, and that term has a specific meaning. It is not a file system, not a block device, and not a database. S3 has no native file system semantics: there are no directories, no appends, and no way to update a portion of an object without replacing the whole thing. Reads are more flexible than writes, because an HTTP Range request can fetch a byte range of an existing object. What you can do is store any blob of bytes up to 5 TB, assign it a key that is unique within its bucket, and retrieve it over HTTPS from anywhere on the internet, subject to permissions.

Three concepts underpin everything:

Bucket — a globally unique container that belongs to one AWS region
Object — the actual bytes, plus system metadata and optional user-defined metadata
Key — the full name of an object within its bucket, e.g. logs/2025/06/app.log

S3 Structure
============

Bucket: company-assets (us-east-1)
├── logs/
│    ├── 2025/06/15/web.log    ← key = logs/2025/06/15/web.log
│    └── 2025/06/14/web.log
├── uploads/
│    ├── user-101/avatar.png
│    └── user-202/invoice.pdf
└── backups/
     └── db-20250615.sql.gz

Each object carries:
- Key (unique within the bucket)
- Data (up to 5 TB)
- Metadata (Content-Type, ETag, user-defined key-value pairs), plus optional object tags
- Version ID (when versioning is enabled)
- Storage class assignment
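
The folders in the diagram above are only a naming convention: the bucket holds a flat set of keys, and a listing with a prefix and a delimiter rolls them up into folder-like groups. The following is a minimal sketch of that roll-up in plain Python. It is a toy model rather than the S3 API, and the function name list_keys is an assumption made for illustration.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Toy model of a prefix + delimiter listing over a flat key set.

    Returns (objects, common_prefixes): keys that sit directly under the
    prefix, and the pseudo-folders one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# The keys from the diagram above, stored as one flat collection.
KEYS = [
    "logs/2025/06/15/web.log",
    "logs/2025/06/14/web.log",
    "uploads/user-101/avatar.png",
    "uploads/user-202/invoice.pdf",
    "backups/db-20250615.sql.gz",
]

top_objects, top_folders = list_keys(KEYS)
backup_objects, backup_folders = list_keys(KEYS, prefix="backups/")
```

Listing with no prefix yields three pseudo-folders and no objects, while listing under backups/ yields the single dump file. Nothing called a directory is ever created or deleted.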

Durability vs. Availability

S3 Standard advertises 99.999999999% durability — eleven nines. Store ten million objects and expect to lose one every ten thousand years on average. AWS achieves this by replicating every object across at least three physically separate Availability Zones within the same region. A single data center failure does not threaten your data.

Availability is a different number: 99.99% for S3 Standard. That corresponds to roughly 52 minutes of potential unavailability per year. Durability and availability measure different things — durability is whether the data still exists, availability is whether you can retrieve it right now. Both matter, and they should not be confused when discussing S3 in interviews.
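
The two figures above can be checked with a few lines of arithmetic. This sketch assumes the durability percentage is read as a per-object survival probability over one year and that a year has 365 days. The function names are illustrative.

```python
from decimal import Decimal

MINUTES_PER_YEAR = Decimal(365) * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes per year during which the service may be unavailable."""
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * MINUTES_PER_YEAR


def years_per_lost_object(durability_percent, object_count):
    """Average number of years between single-object losses."""
    annual_loss_probability = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    expected_losses_per_year = annual_loss_probability * object_count
    return 1 / expected_losses_per_year


# Decimal strings avoid the rounding noise that floats add at eleven nines.
downtime = downtime_minutes_per_year("99.99")               # 52.56 minutes
years = years_per_lost_object("99.999999999", 10_000_000)   # 10,000 years
```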

Storage Classes

S3 pricing depends on the storage class you choose, and each class is tuned for a different access frequency. Seven storage classes let you align cost with actual usage patterns:

S3 Storage Class Spectrum
=========================

More frequent access ──────────────────────────────► Less frequent

Standard → Standard-IA → One Zone-IA → Glacier Instant → Glacier Flexible → Deep Archive

Storage cost:  High ◄──────────────────────────────────────────────► Low
Retrieval fee: $0   ◄──────────────────────────────────────────────► $$$
Retrieval time: ms  ◄──────────────────────────────────────────────► 12 hrs

S3 Standard — default class; millisecond retrieval, no minimum duration
S3 Intelligent-Tiering — monitors per-object access; moves to cheaper tiers automatically
S3 Standard-IA — 45% cheaper storage, per-GB retrieval fee, 30-day minimum
S3 One Zone-IA — single AZ only; data could be lost if that AZ fails
S3 Glacier Instant Retrieval — millisecond access, archival pricing, 90-day minimum
S3 Glacier Flexible Retrieval — minutes to hours, three retrieval speed tiers
S3 Glacier Deep Archive — under $1/TB/month, 12-hour standard retrieval
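
To make the storage-cost spread concrete, here is a rough comparison in Python. The per-GB prices are assumptions chosen for illustration and are not taken from the text above. Real prices vary by region and change over time, and the sketch ignores retrieval fees, request fees, and minimum-duration charges.

```python
# Assumed storage prices in USD per GB-month (illustrative only).
ASSUMED_PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only cost for one month; no retrieval or request fees."""
    return size_gb * ASSUMED_PRICE_PER_GB_MONTH[storage_class]


def savings_vs_standard(storage_class):
    """Fractional storage saving relative to the assumed Standard price."""
    standard = ASSUMED_PRICE_PER_GB_MONTH["STANDARD"]
    return 1 - ASSUMED_PRICE_PER_GB_MONTH[storage_class] / standard


# Cost of keeping 1,000 GB for one month in each class.
costs = {name: monthly_storage_cost(1000, name) for name in ASSUMED_PRICE_PER_GB_MONTH}
```

Under these assumed prices, Standard-IA comes out a little over 45% cheaper than Standard for storage alone. The per-GB retrieval fee is what makes it a poor fit for frequently read data.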

Lifecycle Rules

Rather than manually reclassifying objects as they age, lifecycle rules automate the transitions. A rule consists of a prefix filter, an age condition, and a target action — transition to a cheaper class or delete entirely.

Lifecycle Rule: Web Server Access Logs
=======================================

Prefix: logs/

Day 0    → S3 Standard (uploaded by log agent)
Day 30   → transition to Standard-IA
Day 90   → transition to Glacier Flexible Retrieval
Day 730  → delete

Cost effect: pay Standard rates for 30 days,
then roughly 45% less for days 31-90 (Standard-IA),
then near-minimal cost until deletion.
No application code change required.

You define rules through the console, CLI, Terraform, or CloudFormation. S3 runs transitions in the background. The application writing the logs does not need to know any of this.
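
As a sketch, the log rule above could be written as the dictionary that boto3's put_bucket_lifecycle_configuration accepts. The rule ID is a hypothetical name, and the helper storage_class_on_day is only a local simulation of the schedule, added here for illustration.

```python
def build_log_lifecycle_config():
    """Lifecycle configuration for the logs/ prefix, in the dict shape that
    boto3's put_bucket_lifecycle_configuration accepts."""
    return {
        "Rules": [
            {
                "ID": "web-access-logs",  # hypothetical rule name
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # "GLACIER" is the API name for Glacier Flexible Retrieval.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 730},
            }
        ]
    }


def storage_class_on_day(rule, age_days):
    """Return the class an object should be in at a given age, or None once expired.

    Local simulation only: S3 applies rules asynchronously, so real
    transitions can lag the configured day.
    """
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


config = build_log_lifecycle_config()
rule = config["Rules"][0]

# Applying it would look roughly like this (needs boto3 and credentials;
# the bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="company-assets", LifecycleConfiguration=config)
```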

Versioning

When versioning is enabled, S3 retains every version of every object. A DELETE request without a version ID adds a delete marker rather than removing data, and the marker becomes the current version. An overwrite creates a new version and preserves the old one. You can restore any previous version, or undelete a mistakenly deleted object by removing its delete marker.

Versioning costs money, since every previous version consumes storage, but for datasets where accidental deletion or overwrite would be catastrophic, it is essential. Combine versioning with S3 Object Lock (WORM) in compliance mode and no user, not even the account root user, can delete or overwrite a protected object version during the retention period. Governance mode is softer: principals with the s3:BypassGovernanceRetention permission can still override it. Financial institutions and healthcare providers use Object Lock to meet regulatory retention mandates.
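
The delete-marker behavior is easier to follow with a small model. The class below is a toy in-memory sketch of versioning semantics, not the S3 API. The class name, the method names, and the v1, v2 style version IDs are all assumptions made for illustration.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of versioned PUT, GET, and DELETE."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data); data None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, data):
        """Every write appends a new version; older versions are kept."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None):
        stack = self._versions.setdefault(key, [])
        if version_id is None:
            # Simple DELETE: add a delete marker, keep all data.
            marker_id = f"v{next(self._ids)}"
            stack.append((marker_id, None))
            return marker_id
        # DELETE with a version ID permanently removes that one version.
        self._versions[key] = [v for v in stack if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        stack = self._versions.get(key, [])
        if version_id is None:
            if not stack or stack[-1][1] is None:
                raise KeyError(key)  # missing, or current version is a delete marker
            return stack[-1][1]
        for vid, data in stack:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.csv", b"first draft")
second = bucket.put("report.csv", b"second draft")
marker = bucket.delete("report.csv")               # object now looks deleted
old_copy = bucket.get("report.csv", first)         # older data is still there
bucket.delete("report.csv", version_id=marker)     # removing the marker undeletes
restored = bucket.get("report.csv")
```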

Access Control

Three layers determine who can read and write S3 objects:

IAM policies — attached to users, roles, or groups; govern what actions are permitted
Bucket policies — resource-based policies attached to the bucket itself; can allow access from specific AWS accounts, services, or IP ranges
S3 Block Public Access — a guard at the account or bucket level that overrides ACLs and policies to prevent any public exposure

A common production setup: the application runs under an IAM role with s3:GetObject and s3:PutObject scoped to one specific bucket. Block Public Access is enabled on the bucket. A CloudFront distribution uses Origin Access Control (OAC) to serve objects to end users without making the bucket itself public.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "cloudfront.amazonaws.com"},
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::company-assets/*",
    "Condition": {
      "StringEquals": {
        "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
      }
    }
  }]
}
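
When the same policy is needed for several buckets or environments, it can be generated instead of copied by hand. A minimal sketch, assuming a hypothetical helper named build_oac_bucket_policy. The bucket name, account ID, and distribution ID are the example values from the policy above.

```python
import json


def build_oac_bucket_policy(bucket, account_id, distribution_id):
    """Bucket policy that lets one CloudFront distribution read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringEquals": {
                    # Restricts access to requests made on behalf of this distribution.
                    "AWS:SourceArn": (
                        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
                    )
                }
            },
        }],
    }


policy_json = json.dumps(
    build_oac_bucket_policy("company-assets", "123456789012", "EDFDVBD6EXAMPLE"),
    indent=2,
)
```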

Encryption

Since January 2023, S3 encrypts all new objects by default using SSE-S3 (AES-256, AWS manages the keys). For tighter control:

SSE-KMS — keys managed in AWS KMS; full audit trail via CloudTrail; supports customer-managed keys with rotation policies
SSE-C — you provide the key on every request; AWS never stores it
Client-side encryption — encrypt locally before upload; S3 stores only ciphertext
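
For the server-side options, the choice shows up as extra parameters on the upload request. The sketch below builds the keyword arguments that boto3's put_object accepts for SSE-S3 and SSE-KMS. The helper name encryption_args and the key ARN are hypothetical, and SSE-C and client-side encryption are left out.

```python
def encryption_args(mode, kms_key_id=None):
    """Return put_object keyword arguments for a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without a key ID, S3 falls back to the AWS managed key for S3.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical customer-managed key ARN, for illustration only.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"

kms_args = encryption_args("SSE-KMS", EXAMPLE_KEY_ARN)

# With boto3 and credentials available, the upload would look roughly like:
#   boto3.client("s3").put_object(
#       Bucket="company-assets", Key="uploads/user-101/avatar.png",
#       Body=data, **kms_args)
```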

Real-World Scenario: Media Upload Pipeline

A video streaming startup receives uploads from mobile clients, processes them, and delivers globally. Here is how S3 fits across the entire workflow:

Video Upload Pipeline
=====================

Mobile Client
     │
     │ PUT raw video (presigned URL)
     ▼
S3: uploads/ prefix (Standard class)
     │
     │ S3 Event Notification → SQS queue
     ▼
Lambda function reads SQS message
→ starts AWS MediaConvert job
→ transcoded outputs written to S3: processed/ prefix
     │
     ▼
CloudFront distribution
→ signs and serves from processed/ prefix
→ users stream over edge network

Lifecycle rule on uploads/ prefix:
→ Day 7: move raw originals to Glacier Flexible Retrieval
→ Day 90: delete raw originals (billed through day 97, because Glacier Flexible Retrieval has a 90-day minimum storage duration)

No storage server provisioning. No capacity planning. S3 absorbs any upload volume the application sends.
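
The Lambda step in the middle of the pipeline mostly unwraps messages: each SQS record body is a JSON document containing the S3 event. The handler below is a sketch of that unwrapping. The function start_transcode_job is a hypothetical placeholder for the MediaConvert call, and the sample event is hand-written for illustration.

```python
import json
from urllib.parse import unquote_plus


def extract_uploads(sqs_event):
    """Return (bucket, key) pairs from an SQS batch of S3 event notifications."""
    uploads = []
    for sqs_record in sqs_event.get("Records", []):
        body = json.loads(sqs_record["body"])
        # S3 sends a one-off test message without "Records" when notifications
        # are first configured, so a missing list is treated as empty.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            uploads.append((bucket, key))
    return uploads


def start_transcode_job(bucket, key):
    """Hypothetical placeholder: a real version would create a MediaConvert job."""
    return f"s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point for the SQS trigger."""
    return [start_transcode_job(bucket, key) for bucket, key in extract_uploads(event)]


SAMPLE_EVENT = {
    "Records": [{
        "body": json.dumps({
            "Records": [{
                "s3": {
                    "bucket": {"name": "company-assets"},
                    "object": {"key": "uploads/user-101/my+clip%281%29.mp4"},
                }
            }]
        })
    }]
}

started = handler(SAMPLE_EVENT, None)
```

Decoding the key matters in practice: a file uploaded as "my clip(1).mp4" appears in the event as "my+clip%281%29.mp4", and passing the encoded form to a later S3 call would not find the object.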

Key Points Worth Knowing

S3 provides strong read-after-write consistency for all operations since December 2020 — the old eventual-consistency caveat no longer applies
Multipart upload is mandatory for objects over 5 GB and recommended above 100 MB; it improves throughput through parallel part uploads and lets you retry a failed part without restarting the whole transfer
Transfer Acceleration routes uploads through CloudFront edge locations to speed transfers from distant geographic regions
S3 Replication (Same-Region or Cross-Region) asynchronously copies new objects to another bucket after upload and requires versioning on both the source and destination buckets; it is useful for compliance, disaster recovery, or data locality requirements
Bucket names must be globally unique across every AWS account worldwide
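An object's storage class can be changed without downloading and re-uploading it: either copy the object onto itself server-side with aws s3 cp and the --storage-class option, or let a lifecycle rule transition it. The copy writes a new object (a new version when versioning is enabled) rather than modifying the existing one in place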
Enabling Block Public Access at the account level prevents any bucket in the account from being accidentally made public, even if someone adds a permissive bucket policy
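
The multipart guidance in the list above raises a practical question: how large should each part be? The sketch below picks a part size using S3's documented multipart limits (at most 10,000 parts, each between 5 MiB and 5 GiB except the last). Those limits come from the S3 documentation rather than from the text above. The 16 MiB default and the function name plan_multipart are assumptions.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_multipart(object_size, preferred_part_size=16 * MIB):
    """Return (part_size, part_count) in bytes for an upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE, ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for the multipart limits")
    return part_size, ceil_div(object_size, part_size)


small_plan = plan_multipart(1 * GIB)          # 64 parts of 16 MiB
large_plan = plan_multipart(5 * 1024 ** 4)    # part size grows to stay within 10,000 parts
```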