AWS Interview Questions and Answers
From associate-level cloud concepts to architect-level design decisions — these questions reflect what hiring teams test across cloud engineer, DevOps, data engineer, and solutions architect roles.
Core Concepts
Q1. What is the difference between horizontal and vertical scaling in AWS?
Vertical scaling (scaling up) — upgrade to a larger instance type (e.g., t3.medium → m5.xlarge). Simple but has ceiling limits and requires downtime for EC2.
Horizontal scaling (scaling out) — add more instances behind a load balancer. AWS handles this via Auto Scaling Groups (ASG) with EC2, or automatically with services like Lambda, ECS Fargate, and DynamoDB.
Horizontal scaling is preferred for cloud-native architectures because it:
- Has no practical ceiling
- Maintains availability during scaling events
- Distributes failure blast radius
Q2. Explain the differences between S3 storage classes.
| Storage Class | Use Case | Retrieval | Min Duration |
|---|---|---|---|
| S3 Standard | Frequently accessed data | Instant | None |
| S3 Intelligent-Tiering | Unknown/changing access patterns | Instant | None |
| S3 Standard-IA | Infrequent access, must be fast | Instant | 30 days |
| S3 One Zone-IA | Infrequent, non-critical | Instant | 30 days |
| S3 Glacier Instant Retrieval | Archive with ms retrieval | Milliseconds | 90 days |
| S3 Glacier Flexible Retrieval | Archive, minutes–hours retrieval | Minutes–hours | 90 days |
| S3 Glacier Deep Archive | Lowest cost, long-term archive | Within 12h (standard), 48h (bulk) | 180 days |
S3 Lifecycle Policies automate transitions between classes based on object age.
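A lifecycle schedule of this kind can be sketched as follows. This is a minimal illustration, not a recommended policy: the day thresholds, the `TRANSITIONS` table, and both helper functions are assumptions made for the example. The rule dictionary mirrors the general shape of an S3 lifecycle configuration rule.

```python
# Hypothetical schedule: (minimum object age in days, storage class).
# The thresholds are illustrative only.
TRANSITIONS = [
    (0, "STANDARD"),
    (30, "STANDARD_IA"),
    (90, "GLACIER"),
    (365, "DEEP_ARCHIVE"),
]


def storage_class_for_age(age_days: int) -> str:
    """Return the class an object would be in once all due transitions ran."""
    current = TRANSITIONS[0][1]
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current


def lifecycle_rule(prefix: str) -> dict:
    """Build a rule shaped like one entry of an S3 lifecycle configuration."""
    return {
        "ID": f"tier-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in TRANSITIONS
            if days > 0  # objects start in STANDARD, so no transition at day 0
        ],
    }
```

For example, `storage_class_for_age(45)` falls in the Standard-IA window of this assumed schedule, and `lifecycle_rule("logs/")` yields a rule with three transitions.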
Q3. What is the difference between Security Groups and Network ACLs?
| Feature | Security Group | Network ACL |
|---|---|---|
| Level | Instance level | Subnet level |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow inbound AND outbound explicitly) |
| Rules | Allow rules only | Allow and Deny rules |
| Evaluation | All rules evaluated | Rules evaluated in number order; first match wins |
| Default | Deny all inbound, allow all outbound | Default NACL allows all in/out; a custom NACL denies all until rules are added |
Typical pattern: Security Groups for fine-grained per-instance control; NACLs as an extra layer to block known bad IP ranges at the subnet boundary.
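The difference in rule evaluation can be illustrated with a small simulation. This is a simplified sketch that matches on source IP only and ignores ports, protocols, and statefulness; the rule lists and function names are assumptions made for the example.

```python
from ipaddress import ip_address, ip_network


def nacl_allows(rules, source_ip):
    """NACL: evaluate rules in ascending rule number; the first match wins."""
    addr = ip_address(source_ip)
    for _number, cidr, action in sorted(rules):
        if addr in ip_network(cidr):
            return action == "allow"
    return False  # nothing matched: the implicit final deny applies


def security_group_allows(allowed_cidrs, source_ip):
    """Security group: allow rules only; any matching rule permits traffic."""
    addr = ip_address(source_ip)
    return any(addr in ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical rules: block a known-bad range at the subnet boundary,
# then allow everything else.
NACL_RULES = [
    (100, "203.0.113.0/24", "deny"),
    (200, "0.0.0.0/0", "allow"),
]
SG_RULES = ["0.0.0.0/0"]
```

With these rules, traffic from `203.0.113.7` passes the security group check but is rejected by the NACL, because rule 100 matches before rule 200 is reached.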
Q4. Describe the components of a VPC.
A VPC is a logically isolated virtual network within an AWS region:
- Subnets — segments of the VPC’s CIDR block in a single AZ. Public subnets have a route to an Internet Gateway; private subnets don’t.
- Internet Gateway (IGW) — enables bidirectional traffic between VPC resources and the internet
- NAT Gateway — allows private subnet resources to initiate outbound internet connections without being directly reachable from the internet
- Route Tables — define where traffic is directed (e.g., 0.0.0.0/0 → IGW for public subnets)
- VPC Peering — private connection between two VPCs (same or different accounts/regions)
- Transit Gateway — hub-and-spoke for connecting many VPCs and on-premise networks
- VPC Endpoints — private connectivity to AWS services without leaving the AWS network (Gateway endpoints for S3/DynamoDB; Interface endpoints via PrivateLink for others)
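To make the subnet and route table relationship concrete, here is a minimal sketch that carves a VPC CIDR block into one public and one private subnet per AZ. The CIDR, AZ names, and the choice of /24 subnets are assumptions for illustration; routing private subnets through a NAT Gateway is one common option, not a requirement.

```python
from ipaddress import ip_network


def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Carve one public and one private subnet per AZ from the VPC CIDR."""
    blocks = ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for az in azs:
        plan.append({"az": az, "tier": "public", "cidr": str(next(blocks))})
        plan.append({"az": az, "tier": "private", "cidr": str(next(blocks))})
    return plan


def default_route_target(tier):
    """Public subnets send 0.0.0.0/0 to the IGW; private ones to a NAT Gateway."""
    return "internet-gateway" if tier == "public" else "nat-gateway"


for subnet in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
    print(subnet["az"], subnet["tier"], subnet["cidr"],
          "0.0.0.0/0 ->", default_route_target(subnet["tier"]))
```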
Q5. What is IAM and what are its key components?
IAM (Identity and Access Management) controls who can do what in your AWS account:
- Users — individual human identities with permanent credentials
- Groups — collection of users sharing the same permissions
- Roles — assumed by services (EC2, Lambda), users, or cross-account principals; temporary credentials via STS
- Policies — JSON documents defining Allow/Deny permissions on specific actions and resources
- Permission boundaries — cap the maximum permissions a user or role can have, regardless of attached policies
Best practices: follow least privilege, use roles for all service-to-service authentication, require MFA for human users, and never use the root account for daily work.
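The interaction between Allow, Deny, and permission boundaries can be sketched with a toy evaluator. This is a deliberately simplified model, assuming only identity policies and one boundary: it ignores conditions, principals, resource policies, and SCPs, and it matches actions case-sensitively, which real IAM does not. The policies and function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(statement, action, resource):
    """True when the statement's patterns cover both action and resource."""
    return any(fnmatchcase(action, p) for p in statement["Action"]) and any(
        fnmatchcase(resource, p) for p in statement["Resource"]
    )


def decision(statements, action, resource):
    """Explicit Deny beats Allow; with no matching statement the result is an implicit deny."""
    effects = [s["Effect"] for s in statements if _matches(s, action, resource)]
    if "Deny" in effects:
        return "Deny"
    return "Allow" if "Allow" in effects else "ImplicitDeny"


def is_allowed(identity_policy, boundary, action, resource):
    """With a permission boundary, both the policy and the boundary must allow."""
    return (decision(identity_policy, action, resource) == "Allow"
            and decision(boundary, action, resource) == "Allow")


# Hypothetical policies for illustration.
IDENTITY = [
    {"Effect": "Allow", "Action": ["s3:*"], "Resource": ["*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"],
     "Resource": ["arn:aws:s3:::reports/*"]},
]
BOUNDARY = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
     "Resource": ["*"]},
]
```

Here `s3:ListBucket` is granted by the identity policy but still refused, because the boundary caps the effective permissions at `s3:GetObject` and `s3:PutObject`.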
Compute
Q6. When would you choose Lambda vs EC2 vs ECS Fargate?
| Service | Choose when |
|---|---|
| Lambda | Event-driven, short-lived (≤15 min), variable/spiky traffic, no infrastructure management |
| EC2 | Long-running processes, specific OS/kernel requirements, need for persistent local storage, GPU workloads |
| ECS Fargate | Containerized workloads, team knows Docker, want managed infrastructure without EC2 management |
| EKS | Kubernetes required, complex microservices, multi-cloud portability needed |
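Lambda’s cold start (roughly 100ms–3s, depending on runtime and package size) matters for latency-sensitive APIs. Provisioned Concurrency keeps execution environments initialized, which avoids cold starts for requests served within the provisioned capacity.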
Q7. What is an Auto Scaling Group and how does it scale?
An ASG maintains a fleet of EC2 instances between minimum and maximum limits, automatically adding or removing instances based on:
- Target tracking — maintain a target metric value (e.g., keep CPU at 50%). Simplest to configure.
- Step scaling — scale by specific amounts when alarms breach thresholds (e.g., add 2 instances when CPU >70%, add 5 when CPU >85%)
- Scheduled scaling — scale at predictable times (e.g., add capacity at 8 AM on weekdays)
- Predictive scaling — ML-based forecasting of demand to scale proactively
Lifecycle hooks allow running custom scripts during instance launch or termination (e.g., drain connections before termination).
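The step scaling example above (add 2 instances when CPU >70%, add 5 when CPU >85%) can be sketched as below, alongside a rough proportional estimate for target tracking. Both functions are illustrative assumptions: the actual target tracking algorithm is managed by AWS and is not simply this formula.

```python
import math

# Step adjustments from the example: (CPU threshold, instances to add),
# checked from the highest threshold down.
STEPS = [(85.0, 5), (70.0, 2)]


def step_scaling(current, cpu, min_size, max_size):
    """Apply the adjustment for the highest breached threshold, then clamp."""
    for threshold, add in STEPS:
        if cpu > threshold:
            return max(min_size, min(max_size, current + add))
    return current


def target_tracking(current, cpu, target, min_size, max_size):
    """Rough estimate: scale capacity in proportion to metric / target."""
    desired = math.ceil(current * cpu / target)
    return max(min_size, min(max_size, desired))
```

For a group of 4 instances with limits 2 to 10, `step_scaling(4, 90, 2, 10)` gives 9, while `step_scaling(8, 90, 2, 10)` is clamped to the maximum of 10.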
Storage & Databases
Q8. What is the difference between RDS and DynamoDB?
| Aspect | RDS | DynamoDB |
|---|---|---|
| Type | Relational (SQL) | NoSQL (key-value + document) |
| Schema | Fixed, predefined | Flexible per item |
| Scaling | Vertical + read replicas | Horizontal, automatic |
| Consistency | Strong by default | Eventually consistent reads by default (strongly consistent reads optional) |
| Query flexibility | Full SQL | Efficient queries limited to primary key and secondary indexes (GSIs/LSIs) |
| Best for | Complex queries, transactions, relational data | High-throughput, simple access patterns, gaming, sessions |
RDS Multi-AZ provides synchronous replication for HA. RDS Read Replicas (async) offload read traffic.
Q9. Explain S3 versioning and how it protects against accidental deletion.
When versioning is enabled on an S3 bucket:
- Every PUT creates a new version with a unique version ID
- DELETE on an object adds a delete marker (soft delete) — the object is hidden but not gone
- To permanently delete a versioned object, you must delete a specific version ID
Restoring a deleted file: delete the delete marker to make the previous version current again.
Lifecycle rules can automatically expire old versions after N days to control storage costs.
For extra protection: enable S3 Object Lock (WORM — Write Once Read Many) for immutable compliance storage.
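The delete marker behavior is easier to see in a toy model. The class below is an in-memory sketch of a single versioned bucket, not the S3 API; its name, methods, and version ID format are assumptions made for the example.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of one versioned bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every PUT appends a new version with a fresh version ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """DELETE without a version ID only stacks a delete marker on top."""
        return self.put(key, None)

    def delete_version(self, key, version_id):
        """Deleting a specific version ID removes that version permanently."""
        self._versions[key] = [
            v for v in self._versions[key] if v[0] != version_id
        ]

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is None:
            return None  # missing, or hidden behind a delete marker
        return versions[-1][1]
```

In this model, calling `delete("a.txt")` makes `get("a.txt")` return `None`, and passing the returned marker ID to `delete_version` makes the previous version current again.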
High Availability & Architecture
Q10. What is the difference between an Application Load Balancer, Network Load Balancer, and Gateway Load Balancer?
| Aspect | ALB | NLB | GWLB |
|---|---|---|---|
| OSI layer | Layer 7 (HTTP/HTTPS/WebSocket) | Layer 4 (TCP/UDP/TLS) | Layer 3 (IP) |
| Key capability | Content-based routing (path, host, header) | Ultra-low latency, millions of requests/sec | Routes traffic through third-party appliances |
| Best for | REST APIs, microservices | Gaming, IoT, financial services | Firewalls, intrusion detection |
ALB target groups can be EC2, Lambda, containers, or IPs. NLB can preserve the client IP without X-Forwarded-For headers.
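Content-based routing on an ALB can be approximated with a short sketch. It assumes listener rules are checked in ascending priority order with a default action as fallback; the hostnames, paths, and target group names are hypothetical, and real rules support more condition types than host and path.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules:
# (priority, host pattern, path pattern, target group).
RULES = [
    (10, "api.example.com", "/v1/*", "tg-api-v1"),
    (20, "api.example.com", "*", "tg-api-latest"),
    (30, "*", "/static/*", "tg-static"),
]
DEFAULT_TARGET = "tg-web"


def route(host, path):
    """Return the target group of the first matching rule by priority."""
    for _priority, host_pattern, path_pattern, target in sorted(RULES):
        if fnmatchcase(host, host_pattern) and fnmatchcase(path, path_pattern):
            return target
    return DEFAULT_TARGET  # no rule matched: the default action applies
```

A request for `api.example.com/v1/users` lands on `tg-api-v1`, while `www.example.com/` matches no rule and falls through to the default target.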
Q11. What is CloudFront and how does it work?
CloudFront is AWS’s CDN — it caches content at 400+ edge locations worldwide to reduce latency for end users.
Flow: User request → Nearest Edge Location → If cached: serve directly; If not: fetch from origin (S3, ALB, EC2) → cache → serve
Key features:
- Origin Shield — intermediate caching layer to reduce origin load
- Lambda@Edge / CloudFront Functions — run code at edge for auth, redirects, A/B testing
- Signed URLs/Cookies — restrict content access to authorized users
- WAF integration — filter malicious traffic at the edge
Cache behavior control: Cache-Control headers from origin, or TTL settings in CloudFront distribution.
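The request flow and TTL behavior described above can be modeled with a single toy edge cache. This is a sketch under simplifying assumptions: one edge location, one fixed TTL, and no Cache-Control parsing or invalidation. The class and its parameters are hypothetical.

```python
class EdgeCache:
    """Toy model of one edge location with a fixed TTL in seconds."""

    def __init__(self, fetch_from_origin, ttl):
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._store = {}  # path -> (body, expires_at)

    def get(self, path, now):
        """Serve from cache while fresh; otherwise fetch, cache, and serve."""
        cached = self._store.get(path)
        if cached and now < cached[1]:
            return cached[0], "Hit"
        body = self._fetch(path)  # cache miss or expired entry
        self._store[path] = (body, now + self._ttl)
        return body, "Miss"


origin_calls = []


def origin(path):
    """Stand-in for the origin (S3, ALB, EC2); records each fetch."""
    origin_calls.append(path)
    return f"content of {path}"


edge = EdgeCache(origin, ttl=60)
print(edge.get("/index.html", now=0))   # first request goes to the origin
print(edge.get("/index.html", now=30))  # still fresh, served from the edge
print(edge.get("/index.html", now=61))  # TTL expired, refetched from origin
```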
Monitoring & Costs
Q12. What are the key AWS cost optimization strategies?
Right-sizing: analyze CloudWatch metrics to find over-provisioned instances. Use AWS Compute Optimizer for recommendations.
Savings Plans & Reserved Instances: commit to consistent usage for a 1-year or 3-year term in exchange for a discount of up to 72% compared with On-Demand pricing.
Spot Instances: up to 90% discount for fault-tolerant, interruptible workloads (batch processing, CI/CD, rendering).
S3 cost optimization: lifecycle policies to move data to cheaper tiers; S3 Intelligent-Tiering for unknown access patterns.
Data transfer: keep traffic within the same AZ where possible, since cross-AZ traffic is billed; use Gateway VPC Endpoints so S3/DynamoDB traffic bypasses the NAT Gateway and its data processing charges; use CloudFront to cache at the edge.
Cost monitoring: AWS Cost Explorer for trends; AWS Budgets for alerts; Cost Allocation Tags to attribute costs to teams/projects.
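To put the discount figures in perspective, here is a small worked calculation. The hourly rate of $0.10 and the fleet of 10 instances are hypothetical, and the 72% and 90% values are the upper bounds quoted above, so actual savings will usually be lower.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12


def monthly_cost(hourly_rate, instances, discount=0.0):
    """Monthly cost of always-on instances at a fractional discount."""
    return round(hourly_rate * (1 - discount) * instances * HOURS_PER_MONTH, 2)


on_demand = monthly_cost(0.10, 10)                    # no commitment
savings_plan = monthly_cost(0.10, 10, discount=0.72)  # best-case commitment
spot = monthly_cost(0.10, 10, discount=0.90)          # best-case Spot price

print(f"On-Demand:    ${on_demand:,.2f}")
print(f"Savings Plan: ${savings_plan:,.2f}")
print(f"Spot:         ${spot:,.2f}")
```

Under these assumptions the same fleet costs about $730 per month On-Demand, roughly $204 at the maximum Savings Plan discount, and roughly $73 at the maximum Spot discount.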
Q13. How does CloudWatch differ from CloudTrail?
| CloudWatch | CloudTrail |
|---|---|
| Performance and operational monitoring | Audit and governance log of API calls |
| Metrics, logs, alarms, dashboards | “Who did what, when, from where” |
| Monitor CPU, latency, errors, and memory (EC2 memory requires the CloudWatch agent) | Record EC2 start/stop, S3 bucket policy change, IAM role assumption |
| Action: trigger autoscaling, SNS alerts | Action: compliance audits, security investigation |
Both should be enabled: CloudWatch for operational alerting, CloudTrail for security and compliance auditing. CloudTrail logs should be sent to a separate, protected S3 bucket with Object Lock for tamper-proof audit logs.
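The "who did what, when, from where" framing maps directly onto fields of a CloudTrail record. The sketch below uses a trimmed, made-up record; the values are invented, and `summarize` is a hypothetical helper rather than part of any AWS SDK.

```python
import json

# Trimmed, made-up record that uses CloudTrail record field names.
RECORD = json.loads("""
{
  "eventTime": "2024-01-15T09:30:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutBucketPolicy",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "198.51.100.23",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::123456789012:user/alice"
  }
}
""")


def summarize(record):
    """Answer who / what / when / from where for one record."""
    return {
        "who": record["userIdentity"]["arn"],
        "what": f'{record["eventSource"]}:{record["eventName"]}',
        "when": record["eventTime"],
        "where": record["sourceIPAddress"],
    }


print(summarize(RECORD))
```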
Q14. Describe the Shared Responsibility Model.
AWS is responsible for the cloud:
- Physical data centers, hardware, networking infrastructure
- Hypervisor and managed service infrastructure
- AZ and region fault isolation
Customer is responsible in the cloud:
- Guest OS patches (EC2)
- Application security, data encryption, network access controls
- IAM configuration, MFA enforcement
- Data classification and backup
For managed services (RDS, Lambda, S3), AWS takes more responsibility (OS, patching, replication), but the customer remains responsible for access control, encryption settings, and application-level security.
Q15. What is the AWS Well-Architected Framework and what are its pillars?
A set of design principles and best practices for building reliable, secure, efficient, and cost-effective systems on AWS:
- Operational Excellence — run and monitor systems, continually improve
- Security — protect data, systems, and assets via IAM, encryption, monitoring
- Reliability — recover from failures, scale to meet demand, manage change
- Performance Efficiency — use resources efficiently, select right instance types
- Cost Optimization — avoid unnecessary costs, understand spending over time
- Sustainability (added 2021) — minimize environmental impact
The Well-Architected Tool in the AWS console performs reviews against these pillars and surfaces actionable recommendations.