Amazon Kinesis: Real-Time Data Streaming, Analytics, and Video Ingestion
Batch processing works fine when data arrives in daily dumps. It breaks down the moment you need to react to something within seconds — a fraud signal, a server error spike, a user abandoning a checkout flow. Amazon Kinesis is Amazon’s answer to that problem: a family of four managed services for ingesting, processing, and analysing data streams in real time.
The four services address different parts of the problem. Understanding which one to use — and when to combine them — is the practical skill this article builds.
The Four Kinesis Services
+---------------------+ +---------------------+| Kinesis Data | | Kinesis Data || Streams | | Firehose || | | || Custom consumers | | Managed delivery to || Full control | | S3, Redshift, || Millisecond latency | | OpenSearch, Splunk |+---------------------+ +---------------------+
+---------------------+ +---------------------+| Kinesis Data | | Kinesis Video || Analytics | | Streams || | | || SQL or Flink on | | Ingest, store, and || streams | | process video from || Real-time windowing | | cameras and devices |+---------------------+ +---------------------+Kinesis Data Streams
Kinesis Data Streams (KDS) is the core service — a durable, real-time data transport. Producers push records into the stream; consumers read from it.
Shards: A stream is divided into shards. Each shard provides:
- Write capacity: 1,000 records/second or 1 MB/second (whichever is reached first)
- Read capacity: 5 transactions/second or 2 MB/second per shard
If you need more throughput, you add shards. A stream with 10 shards can ingest 10,000 records/second. You can split or merge shards without downtime.
Producer A ─────────────┐Producer B ────────────┐| +----------+ +-----------+Producer C ───────────┐|| | Shard 1 | | Consumer 1| ||| | Shard 2 | | (Lambda) | |||───>| Shard 3 |───>| Consumer 2| ||| | ... | | (KCL app) | ||| +----------+ +-----------+ StreamRetention: Data is retained for 24 hours by default. You can extend this to 7 days (standard) or 365 days (extended). Extended retention costs more per GB-hour but is valuable for replay and audit scenarios.
Consumers: There are two consumption modes:
- Standard (polling): Each consumer polls its assigned shards. Up to 5 readers per shard share the 2 MB/second read throughput. This gets expensive if multiple independent consumers need the same shard.
- Enhanced fan-out: Each consumer gets its own 2 MB/second per shard, delivered via HTTP/2 push. Use this when you have multiple independent consumers and cannot afford to share throughput.
Sequence numbers: Every record written to KDS gets a sequence number that is monotonically increasing within a shard. Consumers track their position by sequence number, which is how they resume after a failure or restart.
Kinesis Data Firehose
Firehose is the managed delivery service. You do not manage consumers or shards — Firehose buffers records, optionally transforms them with a Lambda function, and delivers them to a destination.
Delivery destinations:
- Amazon S3
- Amazon Redshift (via S3 staging)
- Amazon OpenSearch Service
- Splunk
- Datadog, New Relic, MongoDB (via HTTP endpoint)
Producers | v[Kinesis Firehose] | +-- Optional Lambda transform | (filter, enrich, convert format) | +-- Buffer (by size or time) | e.g., 5 MB or 60 seconds | vDestination: S3 / Redshift / OpenSearch / SplunkFirehose buffers records until either a size threshold or a time threshold is met, then flushes the batch to the destination. Minimum buffer time is 60 seconds. This means Firehose is near-real-time, not real-time — if your use case requires sub-minute latency, use KDS with a custom consumer.
Firehose supports automatic format conversion: it can convert JSON to Parquet or ORC before writing to S3 using a schema from the Glue Data Catalogue. This eliminates the need for a separate conversion job.
When to use Firehose instead of KDS: When you need simple delivery to S3, Redshift, or OpenSearch without writing consumer code. Firehose handles scaling, retries, and format conversion. When you need custom processing logic or sub-minute latency, use KDS.
Kinesis Data Analytics
Kinesis Data Analytics (KDA) lets you process data streams using SQL or Apache Flink without managing infrastructure.
KDA for SQL (classic): Write standard SQL queries against an input stream. Use time-based windows (tumbling, sliding, session) to aggregate records over intervals. The output goes to another Kinesis stream or Firehose.
KDA for Apache Flink (Studio): A managed Apache Flink environment where you write Flink applications in Java, Python, or Scala. More powerful than SQL — supports complex stateful processing, custom windowing, joins across streams, and exactly-once semantics.
Example use case: A payment processor wants to flag any merchant whose transaction volume spikes more than 3x their 10-minute average. KDA for SQL:
CREATE OR REPLACE STREAM "ANOMALY_STREAM" ( merchant_id VARCHAR(32), tx_count INTEGER, avg_count DOUBLE);
CREATE OR REPLACE PUMP "ANOMALY_PUMP" ASINSERT INTO "ANOMALY_STREAM"SELECT STREAM merchant_id, COUNT(*) AS tx_count, AVG(COUNT(*)) OVER (ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS avg_countFROM "SOURCE_STREAM"GROUP BY merchant_id, STEP("SOURCE_STREAM".ROWTIME BY INTERVAL '1' MINUTE)HAVING COUNT(*) > AVG(COUNT(*)) OVER (...) * 3;Kinesis Video Streams
Kinesis Video Streams (KVS) is for ingesting, storing, and processing video, audio, and time-serialised data from cameras and IoT devices. It handles the protocol complexity of device connections and provides APIs for real-time streaming and on-demand playback.
Use cases:
- Smart camera feeds for computer vision pipelines
- Video security and surveillance with ML inference
- Telematics video from connected vehicles
- Live video streaming to applications
KVS integrates with Amazon Rekognition Video for real-time object and face detection, and with SageMaker for custom ML inference on video frames.
Choosing the Right Kinesis Service
Need to process data in real time with custom code? └── Kinesis Data Streams
Need to deliver data to S3, Redshift, or OpenSearch without writing consumers? └── Kinesis Data Firehose
Need to run SQL or Flink queries against a stream? └── Kinesis Data Analytics
Ingesting video or audio from cameras or devices? └── Kinesis Video StreamsCommon combination: IoT devices → KDS → (KDA for real-time anomaly detection AND Firehose for S3 archival). Both KDA and Firehose read from the same KDS stream independently.
Real-World Scenario: E-Commerce Real-Time Dashboard
An e-commerce platform needs a live dashboard showing orders per minute, revenue in the last 5 minutes, and alerts when the error rate on checkout exceeds 2%.
Architecture:
- Application servers → KDS (10 shards, ~2,000 events/second at peak)
- KDA for SQL → aggregates events in 1-minute tumbling windows → outputs to a second KDS stream
- Lambda consumer reads the aggregated stream → writes to DynamoDB → dashboard reads DynamoDB
- Firehose reads from the same source KDS → writes Parquet to S3 → Athena queries for historical analysis
This separates the real-time path (KDA + Lambda + DynamoDB) from the batch analytics path (Firehose + S3 + Athena), both reading from the same source stream.
Interview Notes
Q: How do you calculate the number of shards you need? Divide the peak ingestion rate by 1,000 records/second (or 1 MB/second, whichever is the binding constraint). Add headroom — typically 20-25% — for unexpected traffic spikes. If you have 5,000 records/second at peak, you need at least 5 shards, so 6-7 with headroom.
Q: What is the difference between Kinesis and SQS? SQS is a message queue — one consumer reads a message, it is deleted. Kinesis is a stream — multiple consumers can read the same record independently, and records are retained for hours or days. SQS is for task distribution; Kinesis is for stream processing with multiple downstream consumers.
Q: What happens when a shard is hot (one shard receives disproportionate traffic)?
The shard hits its 1 MB/second or 1,000 records/second limit and producers receive ProvisionedThroughputExceededException errors. Fix this by choosing a better partition key that distributes records more evenly across shards, or by adding a random suffix to the partition key.
Q: What is enhanced fan-out and when do you need it? Standard KDS consumers share 2 MB/second read throughput per shard across all consumers. Enhanced fan-out gives each registered consumer its own dedicated 2 MB/second per shard via HTTP/2 push. Use it when you have three or more independent consumers reading the same stream and standard throughput is insufficient.