AWS Lambda Explained: How Serverless Compute Actually Works Under the Hood

Lambda abstracts away servers, but understanding what is happening underneath that abstraction makes you much more effective at building reliable serverless applications. This guide covers the execution model, billing mechanics, and the architecture patterns that Lambda enables.

The Execution Environment Lifecycle

AWS manages a fleet of worker machines that run Lambda execution environments. Each environment is an isolated microVM (using AWS Firecracker) with your function code, its dependencies, and the chosen runtime.

Lambda Invocation Lifecycle:

  INIT PHASE (cold start only)
  ┌──────────────────────────────────────────────────────┐
  │  1. Download deployment package or container image   │
  │  2. Start the runtime (Python, Node.js, Java etc.)   │
  │  3. Run init code (outside handler function)         │
  └──────────────────────────────────────────────────────┘
                           │
  INVOKE PHASE (every invocation)
  ┌──────────────────────────────────────────────────────┐
  │  4. Receive event payload                            │
  │  5. Call your handler function                       │
  │  6. Return response or error                         │
  └──────────────────────────────────────────────────────┘
                           │
  SHUTDOWN PHASE (after idle timeout)
  ┌──────────────────────────────────────────────────────┐
  │  7. Environment is frozen                            │
  │  8. Frozen environment may be reused for up to ~15m  │
  │  9. Environment is eventually destroyed              │
  └──────────────────────────────────────────────────────┘

The key insight is that init code (step 3) runs once per environment, not once per invocation. An environment that handles 1,000 requests only runs your database connection setup once. This is why you should initialise expensive resources in the module scope, not inside the handler.

import boto3

# INIT CODE — runs once per environment
s3_client = boto3.client('s3')
ssm_client = boto3.client('ssm')

# Cache parameter value during init
FEATURE_FLAG = ssm_client.get_parameter(
    Name='/app/feature-flags/new-ui'
)['Parameter']['Value']

def lambda_handler(event, context):
    # HANDLER CODE — runs every invocation
    # s3_client is already initialised, not created here
    key = event['key']
    obj = s3_client.get_object(Bucket='my-bucket', Key=key)
    return {'content': obj['Body'].read().decode()}
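
One caveat with caching a value during init is that it stays fixed for as long as the environment stays warm. The following is a minimal sketch of a time-bounded cache, not a definitive implementation. The `TTLCache` helper, the `load_flag` stand-in, and the 60-second refresh interval are assumptions made for illustration; in a real function the loader would wrap the `ssm_client.get_parameter` call shown above.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches one value and reloads it once ttl_seconds have passed (hypothetical helper)."""

    def __init__(self, loader: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0
        self._loaded = False

    def get(self) -> str:
        now = self._clock()
        # Reload on first use and again whenever the cached value has expired.
        if not self._loaded or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
            self._loaded = True
        return self._value


def load_flag() -> str:
    # Stand-in loader so the sketch runs anywhere; replace with the SSM lookup.
    return "enabled"


# Module scope: the cache object is created once per environment.
feature_flag = TTLCache(load_flag, ttl_seconds=60)


def lambda_handler(event, context):
    # Handler scope: cheap on most invocations, reloads at most once per TTL.
    return {"new_ui": feature_flag.get() == "enabled"}
```

The clock is injectable only so the expiry logic can be exercised without waiting; the default `time.monotonic` is what a deployed function would use.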

How Billing Works

Lambda billing has two dimensions:

Request charge: $0.20 per 1 million requests (first 1 million free each month).

Duration charge: based on memory allocated and execution time in milliseconds.

The price per GB-second is $0.0000166667 (the x86 rate in us-east-1 at the time of writing; arm64 is priced lower). If your function runs at 256 MB:

1 GB-second cost = $0.0000166667
256 MB function running for 500ms:
  = 0.25 GB × 0.5 seconds = 0.125 GB-seconds
  = 0.125 × $0.0000166667 = $0.0000020833 per invocation
  = $2.08 per million invocations

Duration billing is why the memory-performance tradeoff matters. A function that runs in 200ms at 1024 MB costs the same in duration charges as one that runs in 800ms at 256 MB, because both consume 0.2 GB-seconds, but the 1024 MB version finishes faster and frees the environment sooner.
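
The arithmetic above can be sketched as a small calculator. This is an illustrative sketch using only the two prices quoted in this section; the function names are assumptions, and the free tier is deliberately ignored.

```python
PRICE_PER_GB_SECOND = 0.0000166667      # duration price quoted in this section
PRICE_PER_MILLION_REQUESTS = 0.20       # request price quoted in this section


def duration_cost(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Duration charge in dollars: GB allocated x seconds run x price x invocations."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * PRICE_PER_GB_SECOND * invocations


def request_cost(invocations: int) -> float:
    """Request charge in dollars, ignoring the monthly free tier."""
    return invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS


def total_cost(memory_mb: int, duration_ms: float, invocations: int) -> float:
    return duration_cost(memory_mb, duration_ms, invocations) + request_cost(invocations)
```

Calling `duration_cost(256, 500, 1_000_000)` reproduces the roughly $2.08 per million invocations worked out above, and `duration_cost(1024, 200, n)` equals `duration_cost(256, 800, n)` for any `n`, since both configurations use 0.2 GB-seconds per invocation.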

Runtimes

Lambda supports managed runtimes maintained by AWS:

Runtime	Language	Version
python3.12	Python	3.12
nodejs20.x	Node.js	20
java21	Java	21 (Corretto)
dotnet8	.NET	8
ruby3.3	Ruby	3.3
provided.al2023	Custom	Any language via bootstrap

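The provided.al2023 runtime is for custom runtimes: you supply a bootstrap executable that Lambda starts, and that executable fetches events from the Lambda Runtime API and posts the results back. This is how Go, Rust, and C++ Lambda functions run, because there is no current managed runtime for them (the older go1.x managed runtime has been deprecated). For Go, the aws-lambda-go library implements the Runtime API client inside your compiled binary, which you deploy as the bootstrap file, rather than relying on a separate layer.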
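
To make the bootstrap contract concrete, here is a rough sketch of the loop a custom runtime performs against the Lambda Runtime API. It is written in Python purely for readability; a Go or Rust runtime does the same thing through its library. The helper names (`next_invocation`, `post`, `handle_one`, `run_forever`) are assumptions for illustration, and the transport is injectable so the control flow can be followed without a real Lambda environment.

```python
import json
import os
import urllib.request
from typing import Any, Callable, Dict, Tuple

API_VERSION = "2018-06-01"


def base_url() -> str:
    # Lambda sets AWS_LAMBDA_RUNTIME_API to the host:port of the Runtime API.
    return f"http://{os.environ['AWS_LAMBDA_RUNTIME_API']}/{API_VERSION}/runtime"


def next_invocation() -> Tuple[str, Dict[str, Any]]:
    """Long-poll for the next event; returns (request_id, event)."""
    with urllib.request.urlopen(f"{base_url()}/invocation/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        return request_id, json.loads(resp.read())


def post(path: str, payload: Any) -> None:
    """POST a JSON payload to a Runtime API path."""
    req = urllib.request.Request(
        f"{base_url()}{path}",
        data=json.dumps(payload).encode(),
        method="POST",
    )
    urllib.request.urlopen(req).close()


def handle_one(handler: Callable[[Dict[str, Any]], Any],
               fetch: Callable[[], Tuple[str, Dict[str, Any]]] = next_invocation,
               send: Callable[[str, Any], None] = post) -> None:
    """Fetch one event, run the handler, and report the result or the error."""
    request_id, event = fetch()
    try:
        result = handler(event)
    except Exception as exc:
        send(f"/invocation/{request_id}/error",
             {"errorMessage": str(exc), "errorType": type(exc).__name__})
    else:
        send(f"/invocation/{request_id}/response", result)


def run_forever(handler: Callable[[Dict[str, Any]], Any]) -> None:
    # The bootstrap process keeps looping; Lambda freezes it between invocations.
    while True:
        handle_one(handler)
```

A bootstrap would end by calling `run_forever(my_handler)`. Everything that runs before that call is the init code discussed earlier.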

Lambda Layers

Layers let you package dependencies separately from your function code. A function can use up to 5 layers, and the unzipped size of the function code plus all of its layers must stay within 250 MB in total. Layer contents are extracted under /opt/ in the execution environment.

Function code (code + layers: max 250 MB unzipped in total):
  └── lambda_function.py

Layer 1: numpy + pandas (scientific stack)
  └── /opt/python/numpy/...
  └── /opt/python/pandas/...

Layer 2: company shared utilities
  └── /opt/python/mycompany/auth.py
  └── /opt/python/mycompany/logging.py

# Create a layer with pandas
pip install pandas -t python/
zip -r pandas-layer.zip python/

aws lambda publish-layer-version \
  --layer-name pandas-layer \
  --zip-file fileb://pandas-layer.zip \
  --compatible-runtimes python3.12

# Attach layer to function
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:pandas-layer:1

Layers reduce deployment package size and let multiple functions share the same dependency version.
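
When a dependency exists both in a layer and in the function package, it helps to confirm which copy was actually imported. The sketch below is one hedged way to check that; the helper names are assumptions, and it relies on Python layers placing packages under /opt/python/ as in the layout above.

```python
import importlib.util

LAYER_PYTHON_ROOT = "/opt/python/"  # where Python layer packages land, per the layout above


def is_layer_path(path: str, layer_root: str = LAYER_PYTHON_ROOT) -> bool:
    """True if a file path sits under the layer's Python directory."""
    return path.startswith(layer_root)


def loaded_from_layer(module_name: str) -> bool:
    """True if the named top-level module would be imported from a layer."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin:
        # Module not installed, or it has no file origin (for example a namespace package).
        return False
    return is_layer_path(spec.origin)
```

Logging `loaded_from_layer("pandas")` once during init is a cheap way to catch a function package that accidentally shadows the shared layer version.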

Lambda in a VPC

By default, Lambda functions run in an AWS-managed VPC without access to resources in your VPC (like RDS databases or ElastiCache). To access private resources, configure the function with VPC settings.

Without VPC config:
  Lambda → Public internet → S3, DynamoDB (via public endpoints)
                           ✗ Cannot reach RDS in private subnet

With VPC config:
  Lambda → VPC → Private subnet → RDS, ElastiCache, EC2
                               → Needs NAT Gateway for public internet

aws lambda update-function-configuration \
  --function-name api-handler \
  --vpc-config SubnetIds=subnet-0a1b2c,subnet-0d4e5f,SecurityGroupIds=sg-lambda-outbound

The security group attached to the Lambda function governs its outbound traffic. On the RDS instance's security group, allow inbound traffic on the database port from the Lambda security group.

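Tradeoff: VPC-attached functions used to have much longer cold starts, because Lambda created and attached an ENI in your VPC during the cold start, which could add ten seconds or more. With the Hyperplane ENI model (rolled out in 2019), the network interface is created when the function is created or its VPC configuration changes, and it is shared across execution environments, so the remaining cold start penalty is typically small.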

Serverless Architecture Patterns

API Backend

Client → API Gateway → Lambda → DynamoDB
                             → RDS (via VPC)

API Gateway handles authentication, throttling, and request validation. Lambda handles business logic. This is the simplest serverless web API architecture.
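
A rough sketch of the Lambda side of this pattern follows. It assumes the REST API proxy integration event shape (`httpMethod`, `pathParameters`, `body`); HTTP APIs with the v2 payload format use different field names. The `ITEMS` dictionary is an in-memory stand-in for a DynamoDB table so that the example runs anywhere, and the route layout is hypothetical.

```python
import json
from typing import Any, Dict

# In-memory stand-in for a DynamoDB table (assumption, for illustration only).
ITEMS: Dict[str, Dict[str, Any]] = {"42": {"id": "42", "name": "widget"}}


def respond(status: int, body: Any) -> Dict[str, Any]:
    """Build the response shape that a proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    method = event.get("httpMethod")
    item_id = (event.get("pathParameters") or {}).get("id")

    if method == "GET" and item_id:
        item = ITEMS.get(item_id)
        return respond(200, item) if item else respond(404, {"error": "not found"})

    if method == "POST":
        try:
            item = json.loads(event.get("body") or "")
        except json.JSONDecodeError:
            return respond(400, {"error": "invalid JSON"})
        if not isinstance(item, dict) or "id" not in item:
            return respond(400, {"error": "id is required"})
        ITEMS[str(item["id"])] = item
        return respond(201, item)

    return respond(405, {"error": "method not allowed"})
```

The handler only maps requests to responses; authentication and throttling stay in API Gateway, as described above.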

Fan-Out

Lambda A → SNS Topic → Lambda B (email notification)
                     → Lambda C (push notification)
                     → Lambda D (audit log)

SNS delivers a single event to multiple subscribers simultaneously. Each subscriber processes independently.
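
Each subscriber receives the published message wrapped in an SNS record. Below is a minimal sketch of one subscriber (the audit log function), assuming the publisher sends a JSON message body with a hypothetical `order_id` field.

```python
import json
from typing import Any, Dict, List


def extract_messages(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the JSON message bodies out of an SNS-triggered Lambda event."""
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


def audit_log_handler(event: Dict[str, Any], context: Any) -> List[str]:
    # Each subscriber does its own work; this one only records what happened.
    entries = []
    for message in extract_messages(event):
        entries.append(f"order {message['order_id']} placed")
    return entries
```

The email and push notification functions would reuse `extract_messages` and differ only in what they do with each message.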

Queue-Based Worker

Application → SQS Queue → Lambda (polls)

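Lambda polls SQS, receives batches of messages, and invokes your function with each batch. If the function returns an error, the messages in that batch become visible in the queue again after the visibility timeout and are retried. A dead-letter queue, configured on the source queue, catches messages that fail repeatedly.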
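
One refinement worth knowing is partial batch responses, which let a handler report only the failed messages so that the successful ones are not retried. The sketch below assumes `ReportBatchItemFailures` is enabled on the event source mapping; the `process` function and its `task` field are hypothetical.

```python
import json
from typing import Any, Dict, List


def process(body: Dict[str, Any]) -> None:
    """Hypothetical unit of work; raises if the message cannot be handled."""
    if "task" not in body:
        raise ValueError("message has no task")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    failures: List[Dict[str, str]] = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message; the rest of the batch is deleted from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded.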

Event-Driven ETL

S3 (raw data) → EventBridge → Lambda (transform) → S3 (processed)
                                                  → Redshift (load)

Files land in S3 and trigger an event; Lambda transforms the data and loads it downstream. No servers run while no files arrive.
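
The transform step might look something like the sketch below. It assumes the EventBridge "Object Created" event shape for S3 (`detail.bucket.name` and `detail.object.key`) and a hypothetical CSV-to-JSON-Lines transform. The S3 reads and writes are passed in as callables so the example is self-contained; in a deployed function they would wrap `get_object` and `put_object` on a boto3 S3 client. `PROCESSED_BUCKET` is a made-up name.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, Tuple

PROCESSED_BUCKET = "my-processed-bucket"  # hypothetical destination bucket


def object_location(event: Dict[str, Any]) -> Tuple[str, str]:
    """Extract (bucket, key) from an S3 event delivered through EventBridge."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


def transform(raw_csv: str) -> str:
    """Convert CSV text with a header row into JSON Lines."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return "\n".join(json.dumps(row) for row in rows)


def run_etl(event: Dict[str, Any],
            read_object: Callable[[str, str], str],
            write_object: Callable[[str, str, str], None]) -> str:
    """Read the raw object, transform it, write the result; returns the output key."""
    bucket, key = object_location(event)
    output_key = key.rsplit(".", 1)[0] + ".jsonl"
    write_object(PROCESSED_BUCKET, output_key, transform(read_object(bucket, key)))
    return output_key
```

Writing to a separate bucket, as in the diagram above, avoids the function re-triggering itself on its own output.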

Lambda Extensions

Extensions run in the same execution environment as your function but as a separate process. They are used for:

Sending telemetry to monitoring tools (Datadog, New Relic Lambda Extension)
Retrieving secrets before the function handler runs
Flushing metrics after the handler completes

Lambda environment with extension:
  ┌────────────────────────────────────────┐
  │  Function process (your code)          │
  │  Extension process (monitoring agent)  │
  └────────────────────────────────────────┘

Extensions can add to cold start time because they initialise during the INIT phase.

Lambda@Edge and CloudFront Functions

Lambda@Edge runs Lambda functions within CloudFront's edge network (in its regional edge caches), closer to the user than a single-region function. Supported triggers:

Viewer Request (when CloudFront receives a request)
Origin Request (when CloudFront forwards to origin)
Origin Response (when origin returns a response)
Viewer Response (before CloudFront returns to viewer)

Use cases: A/B testing with cookie manipulation at the edge, custom authentication headers, image transformation at the CDN level.

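CloudFront Functions are a lighter, faster alternative for simple header manipulation and URL rewrites. They run at the edge locations themselves with sub-millisecond execution times and cost less, but they are limited to JavaScript, to the viewer request and viewer response triggers, and to logic that needs no network calls.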
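
As an illustration of the A/B testing use case, here is a hedged sketch of a Lambda@Edge viewer request handler. It assumes the CloudFront event structure (`Records[0].cf.request` with lowercase header keys mapping to lists of key/value pairs). The cookie name, the header name, and the hash-based assignment are all assumptions chosen for the example.

```python
import hashlib
from typing import Any, Dict, Optional

COOKIE_NAME = "ab-variant"  # hypothetical cookie name


def read_cookie(headers: Dict[str, Any], name: str) -> Optional[str]:
    """Return the value of a cookie from CloudFront-style request headers, if present."""
    for header in headers.get("cookie", []):
        for part in header["value"].split(";"):
            key, _, value = part.strip().partition("=")
            if key == name:
                return value
    return None


def pick_variant(client_ip: str) -> str:
    """Deterministically assign a variant so the same client always gets the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return "a" if digest[0] % 2 == 0 else "b"


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]
    # Honour an existing assignment; otherwise derive one from the client IP.
    variant = read_cookie(headers, COOKIE_NAME) or pick_variant(request["clientIp"])
    headers["x-ab-variant"] = [{"key": "X-AB-Variant", "value": variant}]
    # Returning the request lets CloudFront continue processing it.
    return request
```

The origin (or a cache policy keyed on the added header) can then serve the matching variant; a viewer response function would typically set the cookie so the assignment sticks.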

Common Interview Questions

Q: What is a cold start and how does it affect performance? A cold start is the initialisation of a new execution environment: downloading the package, starting the runtime, and running init code. This adds latency to the first invocation on a new environment. Warm invocations reuse the environment and skip this phase. Provisioned Concurrency avoids cold starts by keeping a configured number of environments initialised; traffic beyond that number falls back to on-demand environments, which can still cold start.

Q: Can Lambda access resources in a private VPC? Yes, by configuring VPC settings (subnet IDs and security groups). The function gets a network interface in your VPC and can reach private resources. For public internet access from a VPC-attached Lambda, place the function in private subnets whose route table points to a NAT Gateway in a public subnet.

Q: What is the difference between Lambda Layers and container images? Layers are zip archives extracted into the execution environment, allowing shared dependencies. Container images package everything (runtime, dependencies, code) into a Docker image. Container images support up to 10 GB, much larger than the 250 MB unzipped limit that applies to a zip-deployed function and its layers combined. Container images are better for ML workloads with large model files.

Q: How does Lambda handle errors differently for synchronous vs asynchronous invocations? Synchronous: errors are returned directly to the caller, with no automatic retries by Lambda; the caller decides whether to retry. Asynchronous: Lambda retries up to 2 times, with a delay between attempts. If all attempts fail, you can send the event to a dead-letter queue (SQS or SNS) or an on-failure destination for later inspection.
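
Because synchronous callers own the retry decision, a caller-side wrapper is a common companion to this answer. The sketch below is one possible shape, not a prescribed one: `call_with_retries` and its parameters are assumptions, and the `invoke` callable stands in for whatever makes the synchronous call and raises on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(invoke: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay: float = 0.2,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call invoke(), retrying with exponential backoff; re-raises the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying is only safe when the function is idempotent, which is the same requirement Lambda's own asynchronous retries place on your handler.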