AWS Storage Gateway: Connecting On-Premises Storage to the AWS Cloud
Most cloud migrations do not happen all at once. Companies move workloads incrementally while existing on-premises infrastructure keeps running. The problem is the gap: applications in the data center need to read and write data, and they cannot suddenly start making S3 API calls instead of writing to a local NFS share or tape library. AWS Storage Gateway fills that gap.
Storage Gateway runs as a virtual machine (or a hardware appliance you rack in your data center) and presents local storage interfaces — NFS, SMB, iSCSI — to your applications. Behind the scenes, it durably stores the data in AWS. From the application’s perspective, nothing changes. From the data protection and cost perspective, everything does.
The Three Gateway Types
Storage Gateway has three distinct gateway types, each solving a different problem. Choosing the wrong one is a common mistake, so it is worth understanding exactly what each one does.
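As a rough first pass, the choice can be sketched as a lookup from the interface your application already speaks to the gateway type that presents it. The function name, the interface labels, and the `full_local_copy` flag below are hypothetical names chosen for illustration; none of them are part of any AWS API.

```python
# Hypothetical first-pass lookup; labels and names are illustrative only.
GATEWAY_BY_INTERFACE = {
    "nfs": "File Gateway",
    "smb": "File Gateway",
    "iscsi-block": "Volume Gateway",  # servers that expect a disk
    "iscsi-vtl": "Tape Gateway",      # backup software that expects a tape library
}


def suggest_gateway(interface: str, full_local_copy: bool = False) -> str:
    """Suggest a gateway type for the interface an application already uses."""
    key = interface.strip().lower()
    if key not in GATEWAY_BY_INTERFACE:
        raise ValueError(f"unknown interface: {interface!r}")
    gateway = GATEWAY_BY_INTERFACE[key]
    if gateway == "Volume Gateway":
        # Volume Gateway has two modes; the deciding question is whether
        # the full dataset must stay on-premises.
        mode = "Stored Mode" if full_local_copy else "Cached Mode"
        return f"{gateway} ({mode})"
    return gateway


print(suggest_gateway("NFS"))
print(suggest_gateway("iscsi-block", full_local_copy=True))
```

A real decision also weighs bandwidth, dataset size, and recovery requirements, so treat this as a starting point rather than a rule.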
Storage Gateway: Three Types
=============================

On-Premises                Gateway                 AWS Cloud
─────────────────────────────────────────────────────────
NFS/SMB apps ────────────► File Gateway ──────►    Amazon S3 (files as objects)

iSCSI apps   ────────────► Volume Gateway ─────►   Amazon S3 (snapshots)
                           (Cached Mode)           (EBS snapshots)
                           + local cache

iSCSI apps   ────────────► Volume Gateway ─────►   Amazon S3 (async backup)
                           (Stored Mode)
                           (local primary)

Tape backup  ────────────► Tape Gateway ───────►   S3 (active tapes)
software                   (virtual tape library)  Glacier (archived tapes)

File Gateway
File Gateway presents NFS or SMB file shares to on-premises clients. When an application writes a file to the share, it is stored as an object in S3. Reading a file retrieves the object from S3 through a local cache that holds recently accessed data.
The key insight: files that were on-premises as files are now in S3 as objects, with their original metadata preserved. Existing applications see an NFS or SMB share and work normally. AWS services like Athena, Lambda, or EMR see regular S3 objects and can process them directly without any intermediate step.
This makes File Gateway the right choice for:
- Backing up on-premises application data to S3 without changing the application
- Cloud-based analytics on data that originates on-premises
- Transitioning from on-premises file servers to S3-based storage without a flag day
- Giving on-premises workloads access to data created in AWS
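The one-file-to-one-object mapping can be illustrated with a small sketch. It assumes the object key is the file's path relative to the share root, optionally under a bucket prefix; the mount point, prefix, and function name are hypothetical values chosen for the example.

```python
from pathlib import PurePosixPath


def object_key_for(share_root: str, file_path: str, bucket_prefix: str = "") -> str:
    """Return the S3 key a file on the share would map to.

    Assumption: key = path relative to the share root, under an optional prefix.
    Raises ValueError if file_path is not inside share_root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(share_root))
    prefix = bucket_prefix.strip("/")
    return f"{prefix}/{relative}" if prefix else str(relative)


# Hypothetical mount point and file, for illustration only.
print(object_key_for("/mnt/gateway-share", "/mnt/gateway-share/reports/2024/q1.csv"))
# reports/2024/q1.csv
```

Because each file becomes its own object under a predictable key, a tool such as Athena can be pointed at a prefix like `reports/` without any export step.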
Volume Gateway
Volume Gateway presents iSCSI block volumes to on-premises servers. Applications see a standard disk and use it the same way they would use any SAN volume. What happens behind the scenes differs between two modes:
Cached Mode: the data primarily lives in S3. The gateway keeps recently accessed data in a local cache on-premises. This minimizes on-premises storage requirements while still providing low-latency access to hot data. A server writes to the iSCSI volume; the gateway stores the data in S3 and caches it locally.
Stored Mode: the full dataset lives on-premises. The gateway stores everything locally and asynchronously backs it up to S3 as point-in-time snapshots. This gives you low-latency access to all data (since it is all local) plus offsite backup in S3, without depending on a high-bandwidth internet connection for every read. The tradeoff is that you still need on-premises storage capacity for the entire dataset.
Volume Gateway: Cached vs Stored Mode
========================================

Cached Mode:
  Server → Gateway → [local SSD cache] → S3 (primary storage)
  Hot data served from cache, cold data pulled from S3
  Minimizes on-premises footprint

Stored Mode:
  Server → Gateway → [local disk, all data] → S3 (async backup)
  All data local for fast access, S3 for disaster recovery
  Keeps existing on-premises storage but adds cloud backup

Snapshots from Volume Gateway are stored in S3 as EBS snapshots. You can create an EBS volume from a snapshot and attach it to an EC2 instance, which enables recovery in the cloud when the on-premises site is unavailable.
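The difference between the two Volume Gateway modes can be made concrete with a toy in-memory model. This is a sketch for intuition only: the class names are hypothetical, plain dictionaries stand in for S3 and local disk, and the least-recently-used eviction policy is an assumption rather than a statement about how the gateway manages its cache.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy cached mode: 'S3' holds every block, a small LRU cache holds hot ones."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks
        self.s3 = {}                # stand-in for the primary copy in S3
        self.cache = OrderedDict()  # block_id -> data, least recently used first
        self.remote_reads = 0       # reads that had to go to 'S3'

    def _remember(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block

    def write(self, block_id, data):
        self.s3[block_id] = data    # a real gateway uploads asynchronously
        self._remember(block_id, data)

    def read(self, block_id):
        if block_id in self.cache:  # hot data: served locally
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.remote_reads += 1      # cold data: fetched from 'S3'
        data = self.s3[block_id]
        self._remember(block_id, data)
        return data


class StoredVolume:
    """Toy stored mode: everything is local, 'S3' only receives backups."""

    def __init__(self):
        self.local = {}
        self.pending = set()        # blocks changed since the last snapshot
        self.s3_backup = {}

    def write(self, block_id, data):
        self.local[block_id] = data
        self.pending.add(block_id)

    def read(self, block_id):
        return self.local[block_id]  # never touches 'S3'

    def snapshot(self) -> int:
        """Copy changed blocks to 'S3'; return how many were uploaded."""
        uploaded = len(self.pending)
        for block_id in self.pending:
            self.s3_backup[block_id] = self.local[block_id]
        self.pending.clear()
        return uploaded


cached = CachedVolume(cache_blocks=2)
for i in range(3):
    cached.write(i, f"block-{i}")
cached.read(0)  # block 0 was evicted, so this read goes to 'S3'
cached.read(0)  # now cached, served locally
print("remote reads:", cached.remote_reads)
```

The model shows the tradeoff in miniature: the cached volume needs room for only two blocks locally but pays a remote read for cold data, while the stored volume needs room for everything and touches the cloud only at snapshot time.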
Tape Gateway
Many enterprises still use tape-based backup software like Veeam, Veritas NetBackup, Commvault, or Dell EMC NetWorker. These applications expect a physical tape library connected via iSCSI. Tape Gateway presents a virtual tape library (VTL) with virtual tape drives and virtual tapes. The backup software writes to these virtual tapes exactly as it would to physical ones.
The actual data goes to S3 (for active virtual tapes) and Glacier or Glacier Deep Archive (for ejected/archived tapes). The result: existing backup software keeps working, no retraining required, but the physical tape racks, tape drives, offsite vaulting, and tape rotation costs disappear.
Cost is where Tape Gateway makes its case. A typical enterprise pays for tape hardware, replacement media, secure offsite transport, and vaulting fees. Glacier Deep Archive, at roughly $1 per TB per month, is significantly cheaper for archival storage. Retrieval for an audit or disaster recovery typically completes within hours (about 12 hours for a standard Deep Archive retrieval), which is usually faster than locating and shipping physical tapes from a vault.
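To put the archive price in perspective, here is a back-of-the-envelope calculation. It uses the rough $1 per TB per month figure quoted above and ignores retrieval, request, and early-deletion charges; the 500 TB archive size is a hypothetical input, and current AWS pricing should be checked before relying on the result.

```python
# Rough figure from the text; an approximation, not a quoted AWS price.
DEEP_ARCHIVE_USD_PER_TB_MONTH = 1.0


def archive_cost_usd(terabytes: float, months: int,
                     rate: float = DEEP_ARCHIVE_USD_PER_TB_MONTH) -> float:
    """Storage-only cost estimate: size x duration x rate."""
    return terabytes * months * rate


# Hypothetical archive: 500 TB kept for one year.
yearly = archive_cost_usd(500, 12)
print(f"500 TB for 12 months: about ${yearly:,.0f}")
# 500 TB for 12 months: about $6,000
```

Comparing that figure against your own annual spend on media, transport, and vaulting gives a first estimate of the savings.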
Running Storage Gateway
Storage Gateway runs as:
- A VMware ESXi or Microsoft Hyper-V virtual machine on your existing hypervisor infrastructure
- A Linux KVM virtual machine
- An AWS-supplied hardware appliance for environments without a hypervisor
- An EC2 instance (for specific use cases where you want a gateway running in AWS)
The software gateway is free to download; you pay for the AWS storage you use (S3, EBS snapshots, Glacier), plus request and data transfer charges.
Network Considerations
The gateway sits between your applications and AWS. Bandwidth determines how quickly data syncs to the cloud:
- For cached Volume Gateway, reads of cold data must be fetched from S3 over the WAN link, so latency matters
- For File Gateway, writes go to the gateway’s local buffer first (low latency) then sync to S3 asynchronously
- For Tape Gateway, backup windows determine how much data must transfer; schedule large backups overnight
AWS Direct Connect or VPN improves reliability and may reduce latency for latency-sensitive cached volumes.
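Whether a backup fits its window comes down to simple arithmetic on data size and usable bandwidth. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a hypothetical 80% link efficiency to account for protocol overhead and competing traffic; both function names and the example figures are illustrative.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to upload data_gb over a link of link_mbps.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000          # decimal GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb: float, link_mbps: float, window_hours: float,
                efficiency: float = 0.8) -> bool:
    """True if the estimated transfer completes within the backup window."""
    return transfer_hours(data_gb, link_mbps, efficiency) <= window_hours


# Hypothetical nightly backup of 2 TB with an 8-hour window.
for mbps in (100, 1000):
    hours = transfer_hours(2000, mbps)
    print(f"{mbps} Mbps: {hours:.1f} h, fits 8 h window: {fits_window(2000, mbps, 8)}")
```

With these assumptions the 1 Gbps link finishes in under six hours, while the 100 Mbps link would need more than two days, which signals that the backup set must shrink or the link must grow.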
Real-World Use Case: Branch Office Backup
A retail chain has 200 branch locations, each with a file server holding point-of-sale data, security camera recordings, and local documents. Running a VPN and backup agent at each site is complex and expensive. With File Gateway:
- Deploy a Storage Gateway virtual appliance at each branch
- Configure NFS or SMB shares pointing to S3 buckets in a central AWS account
- Existing applications write to the share as always
- Data is durably stored in S3 with lifecycle rules that move older data to Glacier
- Central IT team manages backup policies from one place without touching 200 sites
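The lifecycle step can be sketched as data. The code below builds a lifecycle configuration in the shape that S3 accepts (for example, as the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`), but it makes no AWS calls. It assumes one shared bucket with a key prefix per branch and a 90-day threshold; the branch naming scheme, the threshold, and the function names are hypothetical.

```python
import json


def branch_lifecycle_rule(branch_id: str, days_to_glacier: int = 90) -> dict:
    """One lifecycle rule: move a branch's objects to Glacier after N days."""
    return {
        "ID": f"archive-{branch_id}",
        "Filter": {"Prefix": f"{branch_id}/"},  # assumes one prefix per branch
        "Status": "Enabled",
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
    }


def lifecycle_configuration(branch_ids, days_to_glacier: int = 90) -> dict:
    """Combine per-branch rules into a single bucket lifecycle configuration."""
    return {"Rules": [branch_lifecycle_rule(b, days_to_glacier) for b in branch_ids]}


# Three hypothetical branches, for illustration only.
config = lifecycle_configuration([f"branch-{n:03d}" for n in range(1, 4)])
print(json.dumps(config, indent=2))
```

If every branch shares the same retention policy, a single rule with an empty prefix would do the same job with less configuration; per-branch rules are only worth it when policies differ by site.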
Key Interview Points
- File Gateway stores files as S3 objects — the mapping is one file equals one S3 object, and you can access the same data using S3 APIs or AWS analytics tools
- Cached vs Stored is a frequent exam distinction: Cached = primary in S3, Stored = primary on-premises with async backup
- Tape Gateway replaces physical tape without changing backup software — existing jobs work unchanged
- Volume Gateway snapshots are EBS snapshots, allowing cloud-based recovery on EC2
- Storage Gateway requires internet connectivity or Direct Connect to sync data; it does not work fully offline (unlike Snowball)
- S3 File Gateway vs FSx File Gateway: FSx File Gateway caches data for FSx for Windows File Server, enabling on-premises Windows clients to access FSx shares with low-latency local caching