EC2 Auto Scaling: Dynamic Capacity That Responds to Real Demand
EC2 Auto Scaling solves a problem every application team faces: how much capacity do you provision? Too little and you get degraded performance under load. Too much and you waste money on idle servers.
Auto Scaling groups (ASGs) handle this by adding instances when metrics like CPU or request count exceed a threshold, and removing them when load drops. The group's instance count always stays between the minimum and maximum capacity you define.
Auto Scaling Group Components
┌───────────────────────────────────────────────────────────────────┐
│  Auto Scaling Group                                               │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │  Launch Template: AMI + instance type + SG + IAM role       │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
│  Min: 2     Desired: 4     Max: 10                                │
│                                                                   │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │  EC2 1   │  │  EC2 2   │  │  EC2 3   │  │  EC2 4   │           │
│  │  AZ-1a   │  │  AZ-1b   │  │  AZ-1a   │  │  AZ-1b   │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
│                                                                   │
│  Scaling Policy → CloudWatch Alarm → Scale Out/In                 │
└───────────────────────────────────────────────────────────────────┘

Launch Template defines what each new instance looks like: AMI, instance type, key pair, security groups, user data script, and IAM instance profile. Launch Templates replaced the older Launch Configurations and support versioning and mixed instance types.
Desired Capacity is the number of instances the group targets right now. Scaling policies change the desired capacity; the ASG reconciles to that number.
Min and Max are hard limits. The group never drops below min (even if a scale-in policy fires) and never exceeds max.
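The relationship between the three numbers can be sketched as a simple clamp. This is only an illustration of the rule described above, not how the service is implemented; `clamp_desired` is a hypothetical helper name.

```shell
# clamp_desired REQUESTED MIN MAX
# Prints the capacity the group would actually target: the requested value,
# bounded below by MIN and above by MAX.
clamp_desired() {
  requested=$1 min=$2 max=$3
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

clamp_desired 12 2 10   # a policy asks for 12, the group stops at max: prints 10
clamp_desired 1 2 10    # a scale-in asks for 1, the group holds at min: prints 2
```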
Creating an Auto Scaling Group
    # Step 1: Create a launch template
    aws ec2 create-launch-template \
      --launch-template-name web-server-template \
      --launch-template-data '{
        "ImageId": "ami-0c02fb55956c7d316",
        "InstanceType": "t3.medium",
        "SecurityGroupIds": ["sg-0abc123def"],
        "IamInstanceProfile": {"Name": "WebServerRole"},
        "UserData": "IyEvYmluL2Jhc2gKeXVtIGluc3RhbGwgLXkgaHR0cGQ="
      }'
    # Step 2: Create the Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name web-asg \
      --launch-template "LaunchTemplateName=web-server-template,Version=\$Latest" \
      --min-size 2 \
      --max-size 10 \
      --desired-capacity 2 \
      --vpc-zone-identifier "subnet-0a1b2c,subnet-0d4e5f" \
      --target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/abc123"

The --target-group-arns flag registers new instances with an ALB target group automatically, so they start receiving traffic once health checks pass.
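The UserData value in the launch template is a base64-encoded boot script. The sketch below decodes the string from the example and shows one way to produce such a string; `encode_user_data` is a hypothetical helper, and the example assumes a `base64` utility that accepts `--decode`.

```shell
# The UserData string from the launch template example.
USER_DATA_B64='IyEvYmluL2Jhc2gKeXVtIGluc3RhbGwgLXkgaHR0cGQ='

# Decode it to see the script the instance runs at first boot.
printf '%s' "$USER_DATA_B64" | base64 --decode
echo

# Going the other way: encode a script for use in --launch-template-data.
encode_user_data() {
  # tr strips the line wrapping that some base64 implementations add
  base64 | tr -d '\n'
}

printf '#!/bin/bash\nyum install -y httpd' | encode_user_data
echo
```

The decoded script is two lines: a bash shebang followed by `yum install -y httpd`.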
Scaling Policies
Target Tracking Scaling
The simplest and most recommended policy. You pick a metric and a target value; AWS handles the math.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name web-asg \
      --policy-name cpu-target-tracking \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
        "DisableScaleIn": false
      }'

This example keeps average CPU at 60%. When CPU rises above 60%, the policy adds instances. When it drops, it removes them. AWS automatically creates the CloudWatch alarms.
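To build intuition for "AWS handles the math", here is a rough proportional model: if the metric scales inversely with instance count, the capacity needed to bring it back to target is about current × metric / target. This is a simplification for illustration only, not AWS's actual algorithm; `target_tracking_estimate` is a hypothetical helper that works on whole-number percentages.

```shell
# target_tracking_estimate CURRENT_INSTANCES METRIC_VALUE TARGET_VALUE
# Prints ceil(current * metric / target) using integer arithmetic.
target_tracking_estimate() {
  current=$1 metric=$2 target=$3
  echo $(( (current * metric + target - 1) / target ))
}

target_tracking_estimate 4 90 60   # 4 instances at 90% CPU, target 60%: prints 6
target_tracking_estimate 4 30 60   # 4 instances at 30% CPU, target 60%: prints 2
```

In practice the result would still be bounded by the group's min and max.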
You can also use ALB request count per target as the metric, which is often more responsive than CPU for web workloads:
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name web-asg \
      --policy-name alb-request-tracking \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ALBRequestCountPerTarget",
          "ResourceLabel": "app/web-alb/abc123/targetgroup/web-tg/def456"
        },
        "TargetValue": 1000.0
      }'

Step Scaling
Step scaling gives you finer control by defining different scaling amounts depending on how far a metric is from your threshold.
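One detail that is easy to miss: the bounds in a step adjustment are offsets from the alarm threshold, not absolute metric values. The function below is a minimal sketch of that lookup, assuming a 70% CPU alarm threshold, steps of +1, +2, and +4 instances, and whole-number CPU readings. `step_adjustment` is a hypothetical helper, not part of the AWS CLI.

```shell
# step_adjustment CPU_PERCENT
# Prints how many instances to add for an assumed alarm threshold of 70%.
step_adjustment() {
  threshold=70
  offset=$(( $1 - threshold ))
  if [ "$1" -le "$threshold" ]; then
    echo 0        # alarm (CPU > 70%) not breached
  elif [ "$offset" -lt 10 ]; then
    echo 1        # offset 0 up to (not including) 10, i.e. 70% to 80%
  elif [ "$offset" -lt 20 ]; then
    echo 2        # offset 10 up to (not including) 20, i.e. 80% to 90%
  else
    echo 4        # offset 20 and above, i.e. 90%+
  fi
}

step_adjustment 75   # prints 1
step_adjustment 85   # prints 2
step_adjustment 95   # prints 4
```

For a breach above the threshold, the lower bound of each interval is inclusive and the upper bound is exclusive, which is why a reading of exactly 80% falls in the second step.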
    CloudWatch alarm: CPUUtilization > 70%

    Steps:
      70% – 80%  → add 1 instance
      80% – 90%  → add 2 instances
      90%+       → add 4 instances

    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name web-asg \
      --policy-name step-scale-out \
      --policy-type StepScaling \
      --adjustment-type ChangeInCapacity \
      --step-adjustments '[
        {"MetricIntervalLowerBound": 0,  "MetricIntervalUpperBound": 10, "ScalingAdjustment": 1},
        {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 4}
      ]'

Scheduled Scaling
For predictable load patterns, scheduled scaling pre-emptively adjusts capacity before the load arrives. This is faster than reactive scaling because you do not wait for metrics to breach a threshold.
    # Scale up before business hours (UTC)
    aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name web-asg \
      --scheduled-action-name scale-up-morning \
      --recurrence "0 8 * * MON-FRI" \
      --min-size 4 \
      --desired-capacity 6

    # Scale down after close of business
    aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name web-asg \
      --scheduled-action-name scale-down-evening \
      --recurrence "0 20 * * MON-FRI" \
      --min-size 2 \
      --desired-capacity 2

Cooldown Periods
After a scaling activity completes, the ASG waits for the cooldown period before evaluating another. The default is 300 seconds (5 minutes). This prevents the ASG from launching multiple rounds of instances before the first round has time to handle load.
Target tracking and step scaling policies do not use the cooldown period. They rely on an instance warmup time instead: until a newly launched instance finishes warming up, its metrics are not counted toward the group's aggregated metrics, so the policy does not overreact while the instance is still cold.
If your instances take a long time to start and warm up (for example, 3 minutes to pull a container image and run a database migration on startup), increase the cooldown, or the instance warmup for target tracking and step scaling policies, to avoid premature scale-out decisions.
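The cooldown rule itself is just a time-window check. A minimal sketch, assuming timestamps in epoch seconds; `cooldown_remaining` is a hypothetical helper used only to illustrate the idea.

```shell
# cooldown_remaining LAST_ACTIVITY_EPOCH NOW_EPOCH COOLDOWN_SECONDS
# Prints the seconds left before another scaling activity may start (0 if none).
cooldown_remaining() {
  elapsed=$(( $2 - $1 ))
  remaining=$(( $3 - elapsed ))
  if [ "$remaining" -gt 0 ]; then
    echo "$remaining"
  else
    echo 0
  fi
}

cooldown_remaining 1000 1120 300   # 120s elapsed of a 300s cooldown: prints 180
cooldown_remaining 1000 1400 300   # cooldown already over: prints 0
```

With a 3-minute startup like the one described above, a cooldown shorter than 180 seconds would let the group act again before the first round of instances can take load.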
Lifecycle Hooks
Lifecycle hooks pause an instance at a specific transition point and let you run custom logic before the transition completes.
    SCALE OUT: Pending → [Pending:Wait] → [Pending:Proceed] → InService
    SCALE IN:  InService → [Terminating:Wait] → [Terminating:Proceed] → Terminated

Common uses for lifecycle hooks:
- Scale out: wait for the instance to register with a service discovery system, complete a database migration, or warm up an application cache before the ALB sends traffic
- Scale in: drain connections gracefully, flush logs to S3, or deregister from external monitoring before termination
    aws autoscaling put-lifecycle-hook \
      --auto-scaling-group-name web-asg \
      --lifecycle-hook-name warm-up-hook \
      --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
      --heartbeat-timeout 300 \
      --default-result CONTINUE

A Lambda function or SSM Run Command then sends a complete-lifecycle-action signal when the instance is ready.
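The completion signal might look like the sketch below. It reuses the group and hook names from the example; the wrapper function `signal_instance_ready` and the instance ID in the usage comment are hypothetical, and how your automation obtains the instance ID is left out.

```shell
# signal_instance_ready INSTANCE_ID
# Tells the ASG that the instance held by warm-up-hook may proceed to InService.
signal_instance_ready() {
  aws autoscaling complete-lifecycle-action \
    --auto-scaling-group-name web-asg \
    --lifecycle-hook-name warm-up-hook \
    --instance-id "$1" \
    --lifecycle-action-result CONTINUE
}

# Example call (hypothetical instance ID):
# signal_instance_ready i-0123456789abcdef0
```

Sending ABANDON instead of CONTINUE as the result tells the group to give up on the instance rather than put it in service.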
Health Checks
ASGs can perform health checks using either EC2 status checks or ELB health checks.
EC2 health checks only catch hardware-level failures — the instance is unreachable or the system check fails. They miss cases where your application is running but returning 500 errors.
ELB health checks are more useful for web applications. If an instance fails the ALB health check (e.g., returns non-2xx for /health for 3 consecutive checks), the ASG marks it unhealthy and replaces it.
    aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name web-asg \
      --health-check-type ELB \
      --health-check-grace-period 120

The grace period (120 seconds here) is the time the ASG waits after an instance enters InService before starting health check evaluation. This prevents healthy-but-slow-starting instances from being immediately replaced.
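The "3 consecutive failed checks" rule mentioned above can be sketched as a streak counter over recent /health status codes. This only models the idea; the load balancer does the real evaluation, and `is_unhealthy` is a hypothetical helper.

```shell
# is_unhealthy THRESHOLD CODE [CODE...]
# Succeeds (exit 0) if the most recent THRESHOLD status codes are all non-2xx.
is_unhealthy() {
  threshold=$1
  shift
  streak=0
  for code in "$@"; do
    case "$code" in
      2??) streak=0 ;;                    # any 2xx response resets the streak
      *)   streak=$(( streak + 1 )) ;;    # anything else extends it
    esac
  done
  [ "$streak" -ge "$threshold" ]
}

if is_unhealthy 3 200 500 500 500; then
  echo "replace instance"
fi
```

A single successful check in between, as in `is_unhealthy 3 500 500 200 500`, resets the count, so that instance would not be replaced yet.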
Real-World Scenario: Media Processing Platform
A video encoding platform needs to process uploaded files quickly but faces unpredictable upload volumes:
- Files land in S3 and an event triggers an SQS message
- A scheduled action pre-scales to 4 instances at 08:00 UTC (morning upload peak)
- Target tracking on SQS queue depth adds more instances as the queue grows
- A lifecycle hook on scale-in flushes any in-progress encode to EFS before the instance terminates
- Spot Instances, configured through a mixed instances policy, cover 70% of the fleet to reduce cost
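For the queue-driven part of this design, a common way to reason about capacity is backlog per instance: how many queued messages each encoder can work through within an acceptable delay. The sketch below assumes that number is known; the figures in the examples (50 messages per instance, min 4, max 10) are hypothetical, as is the helper name `instances_for_backlog`.

```shell
# instances_for_backlog QUEUE_DEPTH BACKLOG_PER_INSTANCE MIN MAX
# Prints ceil(depth / per_instance), bounded by the group's min and max.
instances_for_backlog() {
  depth=$1 per_instance=$2 min=$3 max=$4
  needed=$(( (depth + per_instance - 1) / per_instance ))
  if [ "$needed" -lt "$min" ]; then
    needed=$min
  fi
  if [ "$needed" -gt "$max" ]; then
    needed=$max
  fi
  echo "$needed"
}

instances_for_backlog 450 50 4 10    # 450 queued files: prints 9
instances_for_backlog 20 50 4 10     # quiet period, held at min: prints 4
instances_for_backlog 5000 50 4 10   # large burst, capped at max: prints 10
```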
Common Interview Questions
Q: What is the difference between desired capacity, minimum, and maximum?
Desired is the target the ASG tries to maintain right now. Minimum is the floor: the group never goes below it, even during scale-in. Maximum is the ceiling: policies cannot add instances beyond it.
Q: How does an ASG know when a new instance is healthy enough to receive traffic?
If the group is associated with an ELB target group, the ALB health check must pass. The health check grace period gives the instance time to start up before health checks begin.
Q: What is the difference between target tracking and step scaling?
Target tracking is simpler: you define a goal metric value and AWS manages the scale-out and scale-in automatically. Step scaling gives you explicit control over how many instances to add at different metric levels, which is useful when you need to pre-empt overload by scaling aggressively.
Q: When would you use a lifecycle hook?
When your instance needs to perform work before it is ready to serve traffic (scale-out) or before it is terminated (scale-in). Without a hook, the ASG transitions immediately and you may route traffic to an unready instance or lose in-flight work.