Amazon ECS: Container Orchestration Without the Kubernetes Learning Curve

Running Docker containers in production requires more than docker run. You need to schedule containers across multiple hosts, handle failures by restarting crashed containers, distribute traffic with a load balancer, and manage secrets and configuration. This is container orchestration.

ECS provides all of this with less operational complexity than Kubernetes. There is no control plane to manage, no etcd to back up, and no CNI plugin to troubleshoot. You define tasks; ECS places and runs them.

ECS Core Concepts

┌──────────────────────────────────────────────────────────────────┐
│                         ECS Architecture                         │
│                                                                  │
│  Cluster                                                         │
│  └── Service (desired count, deployment config, LB attachment)   │
│      └── Task Definition (container image, CPU, memory, ports)   │
│          └── Task (running instance of task definition)          │
│              └── Container(s)                                    │
│                                                                  │
│  Launch Type:                                                    │
│    EC2     → Task runs on EC2 instances you manage               │
│    Fargate → Task runs on AWS-managed compute (serverless)       │
└──────────────────────────────────────────────────────────────────┘

Cluster: A logical grouping for services and tasks. If you use the EC2 launch type, the cluster also contains the EC2 instances (called container instances).

Task Definition: A JSON document that specifies how one or more containers should run — which image, how much CPU and memory, which ports to expose, what environment variables to set, and which IAM role the containers can use.

Task: A running instance of a task definition. A task may contain multiple containers (a main container and sidecars).

Service: Ensures a specified number of tasks run at all times. If a task crashes, the service scheduler starts a replacement. Services integrate with load balancers for traffic distribution.
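The relationship between a service's desired count and its running tasks can be sketched as a small reconciliation loop. This is a toy model for illustration only, not how the ECS scheduler is implemented; the `ToyService` class and the task IDs are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyService:
    """Toy model of a service: a desired count plus the set of running tasks."""
    desired_count: int
    running: set = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def task_stopped(self, task_id: str) -> None:
        """Record that a task crashed or exited."""
        self.running.discard(task_id)

    def reconcile(self) -> list:
        """Start tasks until the running set matches desired_count again."""
        started = []
        while len(self.running) < self.desired_count:
            task_id = f"task-{next(self._ids)}"  # hypothetical ID scheme
            self.running.add(task_id)
            started.append(task_id)
        return started


service = ToyService(desired_count=3)
service.reconcile()                  # starts task-1, task-2, task-3
service.task_stopped("task-2")       # simulate a crash
replacements = service.reconcile()   # the service notices the gap and fills it
print(replacements)                  # ['task-4']
```

The point of the sketch is that the service never "restarts" a specific task. It compares what is running with what is desired and launches new tasks to close the gap.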

Task Definitions

A task definition is the blueprint ECS uses to launch tasks. Here is a practical example:

{
  "family": "web-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/web-api-task-role",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:v1.2.0",
      "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
      "environment": [
        {"name": "NODE_ENV", "value": "production"},
        {"name": "PORT", "value": "8080"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "api"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}

The executionRoleArn gives ECS permission to pull the container image from ECR, send logs to CloudWatch, and fetch the values referenced under secrets. The taskRoleArn gives the application running inside the container permission to call AWS APIs (such as S3 or DynamoDB).
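To make the split between the two roles concrete, here is a minimal sketch of the IAM policy documents each role might carry. The execution-role actions follow the AWS managed policy AmazonECSTaskExecutionRolePolicy as I understand it; the task-role policy is purely an assumption for illustration, and the bucket name `example-bucket` is hypothetical.

```python
import json


def execution_role_policy() -> dict:
    """Permissions ECS itself needs to start the task: pull from ECR, write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            }
        ],
    }


def task_role_policy(bucket_name: str) -> dict:
    """Permissions the application code needs at runtime (assumed: read one bucket)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


execution_policy = execution_role_policy()
task_policy = task_role_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(task_policy, indent=2))
```

Keeping the two documents separate means the application never holds the ECR and logging permissions, and ECS never holds the application's data permissions.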

aws ecs register-task-definition \
  --cli-input-json file://task-definition.json

Launch Types: EC2 vs Fargate

EC2 Launch Type

You provision EC2 instances and join them to the cluster by running the ECS container agent, which comes preinstalled on the ECS-optimized AMIs. ECS schedules tasks onto these instances based on available CPU and memory.

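Advantages:

- Full control over instance types, including GPU instances and custom AMIs
- Can be cheaper at sustained high utilization, especially with Reserved Instances, Savings Plans, or Spot
- Supports features Fargate does not, such as privileged containers and the daemon scheduling strategy

Disadvantages:

- You patch, scale, and monitor the instances yourself
- You pay for instance capacity whether or not tasks fill it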

Fargate Launch Type

There are no EC2 instances to manage. You define CPU and memory at the task level, and AWS provisions the underlying compute for you.

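Advantages:

- No instances to provision, patch, or scale
- Billing is per task, based on the vCPU and memory each task requests
- Each task runs in its own isolated compute environment

Disadvantages:

- Usually costs more per vCPU-hour than well-utilized EC2 capacity
- Less control: no GPU support, no privileged containers, no access to the host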
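Because Fargate sizes tasks at the task level, only certain CPU and memory pairs are accepted. The sketch below checks a pair before you register a task definition. The table reflects the Fargate combinations as I understand them at the time of writing, so treat it as an assumption and confirm against the current AWS documentation; the function name is hypothetical, and it only handles the plain numeric strings used in the example task definition (not forms like "1 vCPU" or "2 GB").

```python
# CPU units -> allowed memory values in MiB (assumed from Fargate documentation).
FARGATE_SIZES = {
    256: {512, 1024, 2048},
    512: set(range(1024, 4096 + 1, 1024)),
    1024: set(range(2048, 8192 + 1, 1024)),
    2048: set(range(4096, 16384 + 1, 1024)),
    4096: set(range(8192, 30720 + 1, 1024)),
    8192: set(range(16384, 61440 + 1, 4096)),
    16384: set(range(32768, 122880 + 1, 8192)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Return True if the task-level cpu/memory strings form a known Fargate size."""
    try:
        return int(memory) in FARGATE_SIZES.get(int(cpu), set())
    except ValueError:
        # Non-numeric values such as "1 vCPU" are outside this sketch's scope.
        return False


print(is_valid_fargate_size("512", "1024"))  # True: matches the web-api example
print(is_valid_fargate_size("512", "512"))   # False: too little memory for 512 CPU units
```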

Creating a Service

# Create the cluster
aws ecs create-cluster --cluster-name production

# Create the service (subnet, security group, and target group values are placeholders)
aws ecs create-service \
  --cluster production \
  --service-name web-api \
  --task-definition web-api:3 \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0a1b2c,subnet-0d4e5f],securityGroups=[sg-api-tasks],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=api,containerPort=8080" \
  --health-check-grace-period-seconds 30

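The --health-check-grace-period-seconds option tells the service scheduler to ignore failing health checks for the first 30 seconds after a task starts. This gives containers time to boot before ECS treats them as unhealthy and replaces them.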
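If you drive ECS from Python rather than the CLI, the same request can be expressed as keyword arguments for boto3's `create_service` call. The sketch below only builds the arguments, so it runs without AWS credentials; the helper name is hypothetical, and the subnet, security group, and target group values are the same placeholders used in the CLI example (the target group ARN is left elided as in the original).

```python
def create_service_kwargs(target_group_arn, subnets, security_groups):
    """Build keyword arguments mirroring the `aws ecs create-service` example."""
    return {
        "cluster": "production",
        "serviceName": "web-api",
        "taskDefinition": "web-api:3",
        "desiredCount": 3,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                "containerName": "api",   # must match the container name in the task definition
                "containerPort": 8080,
            }
        ],
        "healthCheckGracePeriodSeconds": 30,
    }


kwargs = create_service_kwargs(
    target_group_arn="arn:aws:elasticloadbalancing:...",  # placeholder, elided as in the CLI example
    subnets=["subnet-0a1b2c", "subnet-0d4e5f"],
    security_groups=["sg-api-tasks"],
)

# With credentials configured, the call would look like:
#   import boto3
#   boto3.client("ecs").create_service(**kwargs)
```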

Networking: awsvpc Mode

The awsvpc networking mode is mandatory for Fargate and strongly recommended for EC2. Each task gets its own Elastic Network Interface (ENI) with its own private IP address in your VPC.

VPC 10.0.0.0/16
├── Public Subnet 10.0.1.0/24
│   └── Application Load Balancer
└── Private Subnet 10.0.2.0/24
    ├── ECS Task (10.0.2.45), ENI with security group sg-api-tasks
    ├── ECS Task (10.0.2.67), ENI with security group sg-api-tasks
    └── ECS Task (10.0.2.89), ENI with security group sg-api-tasks

With awsvpc, you can apply a security group directly to each task. For example, the security group on an RDS instance can allow inbound traffic only from the task security group, so nothing else in the VPC can reach the database.
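The database rule described above can be sketched as the `IpPermissions` structure that boto3's `authorize_security_group_ingress` accepts. The helper name and the security group ID are hypothetical, and port 5432 assumes a PostgreSQL database.

```python
def db_ingress_from_tasks(task_sg_id: str, port: int = 5432) -> list:
    """Allow TCP `port` only from network interfaces that carry `task_sg_id`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Referencing a security group instead of a CIDR range means the rule
            # follows the tasks, whatever private IPs they are assigned.
            "UserIdGroupPairs": [{"GroupId": task_sg_id}],
        }
    ]


rule = db_ingress_from_tasks("sg-0123456789abcdef0")  # hypothetical task security group ID

# With credentials configured, the rule would be applied to the database group with:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=db_sg_id, IpPermissions=rule)
```

Because the rule names a group rather than IP addresses, it keeps working as tasks are replaced and receive new addresses.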

Deployment Strategies

ECS services support multiple deployment types:

Rolling update (default): ECS replaces old tasks with new ones gradually. Configure minimumHealthyPercent and maximumPercent to control how many tasks ECS may stop or add at once.

aws ecs update-service \
  --cluster production \
  --service web-api \
  --task-definition web-api:4 \
  --deployment-configuration "minimumHealthyPercent=50,maximumPercent=200"

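This lets ECS temporarily run up to 200% of the desired count while replacing old tasks, and lets running capacity drop as low as 50% of the desired count. If the service cannot tolerate reduced capacity during a deployment, set minimumHealthyPercent to 100.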
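The two percentages translate into concrete task counts. The sketch below works through that arithmetic, assuming the rounding ECS documents for these settings (the minimum is rounded up, the maximum is rounded down); the function name is hypothetical.

```python
def rolling_update_bounds(desired_count: int,
                          minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple:
    """Return (fewest tasks that must stay running, most tasks allowed at once)."""
    # Integer arithmetic avoids floating-point surprises.
    min_running = (desired_count * minimum_healthy_percent + 99) // 100  # round up
    max_total = (desired_count * maximum_percent) // 100                 # round down
    return min_running, max_total


print(rolling_update_bounds(3, 50, 200))   # (2, 6)
print(rolling_update_bounds(3, 100, 200))  # (3, 6)
```

For the web-api service with a desired count of 3, the deployment may run as many as 6 tasks at once and must keep at least 2 running.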

Blue/Green with CodeDeploy: CodeDeploy launches a new “green” set of tasks alongside the existing “blue” set, shifts ALB traffic to the green tasks, and waits for validation before terminating the blue tasks. It can roll back automatically if CloudWatch alarms fire.

Secrets Management

Never put secrets in the task definition's environment block as plain text, because anyone who can read the task definition can read them. Reference them from Secrets Manager or SSM Parameter Store instead:

"secrets": [
  {
    "name": "DATABASE_URL",
    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/database-url-abc123"
  },
  {
    "name": "API_KEY",
    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/api-key"
  }
]

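ECS injects these as environment variables when the task starts. The task execution role needs secretsmanager:GetSecretValue for Secrets Manager secrets and ssm:GetParameters for Parameter Store parameters, plus kms:Decrypt if a value is encrypted with a customer managed key.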
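One way to keep the execution role in step with a task definition is to derive the required IAM actions from the secrets block itself. This is a minimal sketch under the assumption that every valueFrom is a full ARN; the function name is hypothetical.

```python
def required_execution_role_actions(secrets: list) -> set:
    """Map each secret reference to the IAM action the execution role needs."""
    action_by_service = {
        "secretsmanager": "secretsmanager:GetSecretValue",
        "ssm": "ssm:GetParameters",
    }
    actions = set()
    for entry in secrets:
        # ARN layout: arn:partition:service:region:account-id:resource
        parts = entry["valueFrom"].split(":")
        if len(parts) < 6 or parts[0] != "arn":
            raise ValueError(f"not a full ARN: {entry['valueFrom']}")
        service = parts[2]
        if service not in action_by_service:
            raise ValueError(f"unsupported secret source: {service}")
        actions.add(action_by_service[service])
    return actions


secrets = [
    {"name": "DATABASE_URL",
     "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/database-url-abc123"},
    {"name": "API_KEY",
     "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/api-key"},
]
needed = required_execution_role_actions(secrets)
print(sorted(needed))  # ['secretsmanager:GetSecretValue', 'ssm:GetParameters']
```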

Real-World Scenario: Multi-Tier E-Commerce Backend

A three-tier backend on ECS:

┌────────────────────────────────────────────────────────────┐
│  ALB (internet-facing)                                     │
│    /api/*   → api-service   (3 Fargate tasks)              │
│    /admin/* → admin-service (2 Fargate tasks)              │
│                                                            │
│  api-service Task Definition:                              │
│    - Container: api (Node.js 20, 512 CPU, 1024 MB RAM)     │
│    - Pulls image from ECR                                  │
│    - Reads DB_PASSWORD from Secrets Manager                │
│    - Logs to CloudWatch Logs /ecs/api                      │
│                                                            │
│  RDS PostgreSQL (private subnet)                           │
│    - sg-db allows port 5432 from sg-api-tasks              │
└────────────────────────────────────────────────────────────┘

Deployment pipeline: Developer pushes to Git → CodePipeline triggers → CodeBuild builds and pushes image to ECR → CodeDeploy performs blue/green deployment to ECS.

ECS vs EKS: When to Use Each

Consideration        | ECS       | EKS
---------------------|-----------|--------------------------
Kubernetes required  | No        | Yes
Learning curve       | Lower     | Higher
AWS integration      | Native    | Good (with add-ons)
Custom schedulers    | No        | Yes
Service mesh         | App Mesh  | Istio, Linkerd
Portability          | AWS only  | Portable to other clouds
Team expertise       | New team  | K8s-experienced team

Choose ECS when your team is new to containers, your workload is on AWS only, and you want simpler operations. Choose EKS when you need Kubernetes-specific features, have existing K8s expertise, or need portability.

Common Interview Questions

Q: What is the difference between a task and a service in ECS?
A: A task is a single running instance of a task definition. A service is a long-running controller that ensures a desired number of tasks are running and handles failure recovery, rolling deployments, and load balancer integration.

Q: What is the difference between the execution role and the task role?
A: The execution role is used by ECS itself to pull container images from ECR and write logs to CloudWatch. The task role is used by the containers to call AWS APIs from within the application code.

Q: Can ECS tasks communicate with each other without going through a load balancer?
A: Yes. With awsvpc networking, tasks have private IPs and can communicate directly if security groups allow it. AWS Cloud Map can provide service discovery: a task registers its IP and port, and other tasks look it up by name.
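When Cloud Map is set up with DNS-based discovery, the lookup side is an ordinary DNS query from the calling task. The sketch below shows that lookup with the standard library only; the service name in the comment is hypothetical, and the runnable call uses an IP literal so it works without any namespace configured.

```python
import socket


def resolve_service(name: str, port: int) -> list:
    """Return the distinct IPv4 addresses currently listed for `name`."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


# Inside a task, the name would be the one registered in the private namespace,
# for example resolve_service("api.production.local", 8080)  (hypothetical name).
addresses = resolve_service("127.0.0.1", 8080)
print(addresses)  # ['127.0.0.1']
```

A caller would then pick one of the returned addresses and connect to it directly, subject to the security group rules on both tasks.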

Q: What happens if an ECS task fails?
A: The ECS service scheduler detects the failure (via container exit code or ELB health check failure), deregisters the task from the load balancer, and starts a replacement task. The desired count is maintained automatically.