Amazon S3: Object Storage With 11 Nines of Durability and Infinite Scale
Every cloud project eventually asks the same question — where do the files live? Log streams from a hundred servers, customer-uploaded photos, nightly database dumps, machine learning datasets: they all need somewhere reliable. Amazon S3 is that place for the majority of AWS workloads, and understanding how it operates shapes how you design everything that touches it.
S3 launched in 2006 alongside EC2 as one of the two foundational AWS services. Today it holds trillions of objects across multiple AWS regions and processes millions of API requests every second. The design choices made at launch — a flat key namespace, HTTP-based access, no file system abstraction — turned out to be exactly right for internet-scale storage.
What S3 Actually Is
S3 is an object store. That is a specific term with a specific meaning. It is not a file system, not a block device, and not a database. You cannot natively mount a bucket like a drive, and you cannot modify an object in place: changing even one byte means writing the whole object again. Reads are more flexible, since a GET request can ask for a byte range. What you can do is store any blob of bytes up to 5 TB, assign it a unique key, and retrieve it over HTTPS from anywhere on the internet, provided the request is authorized.
Three concepts underpin everything:
- Bucket — a globally unique container that belongs to one AWS region
- Object — the actual bytes, plus system metadata and optional user-defined metadata
- Key — the full name of an object within its bucket, e.g. logs/2025/06/app.log
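Because keys live in a flat namespace, "folders" are only a listing convention: a prefix plus a delimiter. The sketch below imitates that grouping in plain Python. It is an illustration rather than the S3 API; the function name `list_prefix` and the sample keys are made up for the example.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing would.

    Returns (objects, common_prefixes): keys directly under the prefix,
    and the rolled-up "folder" prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is rolled up.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical sample keys; there are no real directories behind them.
KEYS = [
    "logs/2025/06/15/web.log",
    "logs/2025/06/14/web.log",
    "uploads/user-101/avatar.png",
    "uploads/user-202/invoice.pdf",
    "backups/db-20250615.sql.gz",
]

if __name__ == "__main__":
    print(list_prefix(KEYS))              # top level: only "folders"
    print(list_prefix(KEYS, "backups/"))  # one object, no sub-prefixes
```

The point of the design is that a "rename a folder" operation does not exist: every key under the prefix is a separate object with its own full name.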
S3 Structure
============

Bucket: company-assets (us-east-1)
├── logs/
│   ├── 2025/06/15/web.log   ← key = logs/2025/06/15/web.log
│   └── 2025/06/14/web.log
├── uploads/
│   ├── user-101/avatar.png
│   └── user-202/invoice.pdf
└── backups/
    └── db-20250615.sql.gz

Each object carries:
- Key (unique within the bucket)
- Data (up to 5 TB)
- Metadata (content-type, ETag, custom tags)
- Version ID (when versioning is enabled)
- Storage class assignment

Durability vs. Availability
S3 Standard advertises 99.999999999% durability — eleven nines. Store ten million objects and expect to lose one every ten thousand years on average. AWS achieves this by replicating every object across at least three physically separate Availability Zones within the same region. A single data center failure does not threaten your data.
Availability is a different number: S3 Standard is designed for 99.99%. That corresponds to roughly 52 minutes of potential unavailability per year. Durability and availability measure different things: durability is whether the data still exists, availability is whether you can retrieve it right now. Both matter, and they should not be confused when discussing S3 in interviews.
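The two headline numbers can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes the percentages are annual figures and that losses are independent per object; the function names are invented for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be unreachable at a given availability."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


def years_between_losses(object_count, durability_nines):
    """Average years between single-object losses.

    Assumes each object has an annual loss probability of 10 ** -nines.
    """
    return 10 ** durability_nines / object_count


if __name__ == "__main__":
    print(round(downtime_minutes_per_year(99.99), 2))  # availability: 99.99%
    print(years_between_losses(10_000_000, 11))        # durability: eleven nines
```

Run as written, the availability figure comes out at about 52.6 minutes per year, and ten million objects at eleven nines work out to one expected loss every 10,000 years.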
Storage Classes
S3 charges you differently based on how frequently data gets accessed. Seven storage classes let you align cost with actual usage patterns:
S3 Storage Class Spectrum
=========================

More frequent access ──────────────────────────────► Less frequent

Standard → Standard-IA → One Zone-IA → Glacier Instant → Glacier Flexible → Deep Archive

Storage cost:   High ◄──────────────────────────────────────────────► Low
Retrieval fee:  $0   ◄──────────────────────────────────────────────► $$$
Retrieval time: ms   ◄──────────────────────────────────────────────► 12 hrs

- S3 Standard — default class; millisecond retrieval, no minimum duration
- S3 Intelligent-Tiering — monitors per-object access; moves to cheaper tiers automatically
- S3 Standard-IA — 45% cheaper storage, per-GB retrieval fee, 30-day minimum
- S3 One Zone-IA — single AZ only; data could be lost if that AZ fails
- S3 Glacier Instant Retrieval — millisecond access, archival pricing, 90-day minimum
- S3 Glacier Flexible Retrieval — minutes to hours, three retrieval speed tiers
- S3 Glacier Deep Archive — under $1/TB/month, 12-hour standard retrieval
Lifecycle Rules
Rather than manually reclassifying objects as they age, lifecycle rules automate the transitions. A rule consists of a prefix filter, an age condition, and a target action — transition to a cheaper class or delete entirely.
Lifecycle Rule: Web Server Access Logs
=======================================

Prefix: logs/

Day 0   → S3 Standard (uploaded by log agent)
Day 30  → transition to Standard-IA
Day 90  → transition to Glacier Flexible Retrieval
Day 730 → delete

Cost effect: pay Standard rates for 30 days,
then roughly 45% less for days 31-90,
then near-minimal cost until deletion.
No application code change required.

You define rules through the console, CLI, Terraform, or CloudFormation. S3 runs transitions in the background. The application writing the logs does not need to know any of this.
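As a rough illustration, the access-log rule could be written as the dictionary below, which follows the shape that boto3's `put_bucket_lifecycle_configuration` takes for its `LifecycleConfiguration` argument, as far as I know it. The rule ID and the helper `storage_class_on_day` are hypothetical; the helper only models the schedule and ignores the fact that S3 applies transitions asynchronously.

```python
# Sketch of the log rule as a lifecycle configuration document.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "web-access-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 730},
        }
    ]
}


def storage_class_on_day(rule, age_days):
    """Return the class an object of this age would be in, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    rule = LIFECYCLE_CONFIGURATION["Rules"][0]
    for day in (0, 30, 90, 730):
        print(day, storage_class_on_day(rule, day))
```

Keeping the rule as data makes it easy to review in version control alongside the Terraform or CloudFormation that applies it.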
Versioning
When versioning is enabled, S3 retains every version of every object. A DELETE operation writes a delete marker rather than removing data. An overwrite creates a new version and preserves the old one. You can restore any previous version or undelete a mistakenly deleted object.
Versioning costs money, since every previous version consumes storage, but for datasets where accidental deletion or overwrite would be catastrophic, it is essential. Combine versioning with S3 Object Lock in compliance mode (a WORM model) and no user, not even the account root user, can delete or overwrite a protected object version during the retention period. Governance mode is looser: users with a special bypass permission can still remove the lock. Financial institutions and healthcare providers use Object Lock to meet regulatory retention mandates.
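The versioning behaviour described in this section (overwrites stack up, a plain delete only adds a marker, removing the marker undeletes) can be modelled in a few lines. This is a toy in-memory model, not the S3 API: the class name, method names, and the `v1`, `v2` style IDs are all invented, and real version IDs are opaque strings.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), oldest first
        self._counter = itertools.count(1)

    def _append(self, key, data):
        version_id = f"v{next(self._counter)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key, data):
        """An overwrite adds a new version; older versions stay retrievable."""
        return self._append(key, data)

    def delete(self, key):
        """A plain delete adds a delete marker (data=None) and removes nothing."""
        return self._append(key, None)

    def delete_version(self, key, version_id):
        """Permanently remove one specific version or delete marker."""
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v[0] != version_id
        ]

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]  # the current (latest) version
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is None:
            raise KeyError(key)  # missing, or hidden behind a delete marker
        return candidates[0][1]
```

In this model, undeleting an object is just `delete_version(key, marker_id)`: once the marker is gone, the previous version becomes current again.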
Access Control
Three layers determine who can read and write S3 objects:
- IAM policies — attached to users, roles, or groups; govern what actions are permitted
- Bucket policies — resource-based policies attached to the bucket itself; can allow access from specific AWS accounts, services, or IP ranges
- S3 Block Public Access — a guard at the account or bucket level that overrides ACLs and policies to prevent any public exposure
A common production setup: the application runs under an IAM role with s3:GetObject and s3:PutObject scoped to one specific bucket. Block Public Access is enabled on the bucket. A CloudFront distribution uses Origin Access Control (OAC) to serve objects to end users without making the bucket itself public.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "cloudfront.amazonaws.com"},
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::company-assets/*",
    "Condition": {
      "StringEquals": {
        "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
      }
    }
  }]
}

Encryption
Since January 2023, S3 encrypts all new objects by default using SSE-S3 (AES-256, AWS manages the keys). For tighter control:
- SSE-KMS — keys managed in AWS KMS; full audit trail via CloudTrail; supports customer-managed keys with rotation policies
- SSE-C — you provide the key on every request; AWS never stores it
- Client-side encryption — encrypt locally before upload; S3 stores only ciphertext
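To make the SSE-C option concrete: "you provide the key on every request" means sending three extra HTTP headers with each PUT and GET. The sketch below builds them using the header names from the S3 REST API as I understand them; the function name `sse_c_headers` is made up, and a real client would also need request signing, which is omitted here.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the per-request headers for SSE-C from a raw 256-bit key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself, base64-encoded.
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        # Base64 of the key's MD5 digest, used as a transmission integrity check.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


if __name__ == "__main__":
    demo_key = os.urandom(32)  # in practice the key comes from your own key store
    for name in sse_c_headers(demo_key):
        print(name)
```

Since AWS never stores the key, losing it means losing the object; that trade-off is the main reason many teams prefer SSE-KMS.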
Real-World Scenario: Media Upload Pipeline
A video streaming startup receives uploads from mobile clients, processes them, and delivers globally. Here is how S3 fits across the entire workflow:
Video Upload Pipeline
=====================

Mobile Client
  │
  │ PUT raw video (presigned URL)
  ▼
S3: uploads/ prefix (Standard class)
  │
  │ S3 Event Notification → SQS queue
  ▼
Lambda function reads SQS message
→ starts AWS MediaConvert job
→ transcoded outputs written to S3: processed/ prefix
  │
  ▼
CloudFront distribution
→ signs and serves from processed/ prefix
→ users stream over edge network

Lifecycle rule on uploads/ prefix:
→ Day 7: move raw originals to Glacier Flexible Retrieval
→ Day 90: delete raw originals

No storage server provisioning. No capacity planning. S3 absorbs any upload volume the application sends.
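The Lambda step in this pipeline mostly consists of unwrapping two layers of JSON: the SQS message, and the S3 event notification inside its body. Here is a minimal sketch of that unwrapping, assuming the standard notification layout and no SNS hop in between. The names `extract_uploads` and `handler` are illustrative, and the MediaConvert call is deliberately left out.

```python
import json
from urllib.parse import unquote_plus


def extract_uploads(sqs_event):
    """Return (bucket, key) pairs for newly created objects in an SQS batch."""
    uploads = []
    for message in sqs_event.get("Records", []):
        s3_event = json.loads(message["body"])
        # Test notifications carry no "Records" list, hence the default.
        for record in s3_event.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated:"):
                continue
            bucket = record["s3"]["bucket"]["name"]
            # Keys arrive URL-encoded in notifications (spaces become "+").
            key = unquote_plus(record["s3"]["object"]["key"])
            uploads.append((bucket, key))
    return uploads


def handler(event, context):
    """Lambda entry point; starting the transcoding job is omitted here."""
    uploads = extract_uploads(event)
    return {"queued": [f"s3://{bucket}/{key}" for bucket, key in uploads]}
```

Decoding the key matters in practice: a file uploaded as "my clip.mp4" would otherwise be looked up under a name that does not exist.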
Key Points Worth Knowing
- S3 provides strong read-after-write consistency for all operations since December 2020 — the old eventual-consistency caveat no longer applies
- Multipart upload is mandatory for objects over 5 GB and recommended above 100 MB — it improves throughput and allows resuming failed transfers
- Transfer Acceleration routes uploads through CloudFront edge locations to speed transfers from distant geographic regions
- S3 Replication (Same-Region or Cross-Region) copies objects to another bucket automatically after upload — useful for compliance, disaster recovery, or data locality requirements
- Bucket names must be globally unique across every AWS account worldwide
- An object’s storage class can be changed without downloading and re-uploading it: either run aws s3 cp with the --storage-class option to copy the object over itself server-side, or let a lifecycle rule transition it
- Enabling Block Public Access at the account level prevents any bucket in the account from being accidentally made public, even if someone adds a permissive bucket policy
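For the multipart upload point above, the main planning question is how large each part should be. The sketch below picks a part size under the commonly documented limits, which are treated here as assumptions: at most 10,000 parts, each between 5 MiB and 5 GiB, with only the last part allowed to be smaller. The function name and the 64 MiB default are arbitrary choices for the example.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # assumed minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # assumed maximum size of a single part
MAX_PARTS = 10_000               # assumed maximum number of parts per upload


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return (a + b - 1) // b


def plan_parts(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) for an upload of object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred one would need too many parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    return part_size, ceil_div(object_size, part_size)


if __name__ == "__main__":
    size, count = plan_parts(1024 ** 3)  # a 1 GiB object
    print(size // MIB, "MiB parts x", count)
```

Smaller parts make a failed transfer cheaper to resume, while larger parts reduce the number of requests; the preferred size is the knob for that trade-off.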