Amazon Kinesis: Real-Time Data Streaming, Analytics, and Video Ingestion

Batch processing works fine when data arrives in daily dumps. It breaks down the moment you need to react to something within seconds — a fraud signal, a server error spike, a user abandoning a checkout flow. Amazon Kinesis is Amazon’s answer to that problem: a family of four managed services for ingesting, processing, and analysing data streams in real time.

The four services address different parts of the problem. Understanding which one to use — and when to combine them — is the practical skill this article builds.

The Four Kinesis Services

+---------------------+    +---------------------+
| Kinesis Data        |    | Kinesis Data        |
| Streams             |    | Firehose            |
|                     |    |                     |
| Custom consumers    |    | Managed delivery to |
| Full control        |    | S3, Redshift,       |
| Millisecond latency |    | OpenSearch, Splunk  |
+---------------------+    +---------------------+

+---------------------+    +---------------------+
| Kinesis Data        |    | Kinesis Video       |
| Analytics           |    | Streams             |
|                     |    |                     |
| SQL or Flink on     |    | Ingest, store, and  |
| streams             |    | process video from  |
| Real-time windowing |    | cameras and devices |
+---------------------+    +---------------------+

Kinesis Data Streams

Kinesis Data Streams (KDS) is the core service — a durable, real-time data transport. Producers push records into the stream; consumers read from it.

Shards: A stream is divided into shards. Each shard provides:

Write capacity: 1,000 records/second or 1 MB/second (whichever is reached first)
Read capacity: 5 transactions/second or 2 MB/second per shard

If you need more throughput, you add shards. A stream with 10 shards can ingest 10,000 records/second. You can split or merge shards without downtime.

Producer A ─────────────┐
Producer B ────────────┐|    +----------+    +-----------+
Producer C ───────────┐||    | Shard 1  |    | Consumer 1|
                      |||    | Shard 2  |    | (Lambda)  |
                      |||───>| Shard 3  |───>| Consumer 2|
                      |||    | ...      |    | (KCL app) |
                      |||    +----------+    +-----------+
                              Stream

Retention: Data is retained for 24 hours by default. You can extend this to 7 days (standard) or 365 days (extended). Extended retention costs more per GB-hour but is valuable for replay and audit scenarios.

Consumers: There are two consumption modes:

Standard (polling): Each consumer polls its assigned shards. Up to 5 readers per shard share the 2 MB/second read throughput. This gets expensive if multiple independent consumers need the same shard.
Enhanced fan-out: Each consumer gets its own 2 MB/second per shard, delivered via HTTP/2 push. Use this when you have multiple independent consumers and cannot afford to share throughput.

Sequence numbers: Every record written to KDS gets a sequence number that is monotonically increasing within a shard. Consumers track their position by sequence number, which is how they resume after a failure or restart.

Kinesis Data Firehose

Firehose is the managed delivery service. You do not manage consumers or shards — Firehose buffers records, optionally transforms them with a Lambda function, and delivers them to a destination.

Delivery destinations:

Amazon S3
Amazon Redshift (via S3 staging)
Amazon OpenSearch Service
Splunk
Datadog, New Relic, MongoDB (via HTTP endpoint)

Producers
  |
  v
[Kinesis Firehose]
  |
  +-- Optional Lambda transform
  |      (filter, enrich, convert format)
  |
  +-- Buffer (by size or time)
  |      e.g., 5 MB or 60 seconds
  |
  v
Destination: S3 / Redshift / OpenSearch / Splunk

Firehose buffers records until either a size threshold or a time threshold is met, then flushes the batch to the destination. Minimum buffer time is 60 seconds. This means Firehose is near-real-time, not real-time — if your use case requires sub-minute latency, use KDS with a custom consumer.

Firehose supports automatic format conversion: it can convert JSON to Parquet or ORC before writing to S3 using a schema from the Glue Data Catalogue. This eliminates the need for a separate conversion job.

When to use Firehose instead of KDS: When you need simple delivery to S3, Redshift, or OpenSearch without writing consumer code. Firehose handles scaling, retries, and format conversion. When you need custom processing logic or sub-minute latency, use KDS.

Kinesis Data Analytics

Kinesis Data Analytics (KDA) lets you process data streams using SQL or Apache Flink without managing infrastructure.

KDA for SQL (classic): Write standard SQL queries against an input stream. Use time-based windows (tumbling, sliding, session) to aggregate records over intervals. The output goes to another Kinesis stream or Firehose.

KDA for Apache Flink (Studio): A managed Apache Flink environment where you write Flink applications in Java, Python, or Scala. More powerful than SQL — supports complex stateful processing, custom windowing, joins across streams, and exactly-once semantics.

Example use case: A payment processor wants to flag any merchant whose transaction volume spikes more than 3x their 10-minute average. KDA for SQL:

CREATE OR REPLACE STREAM "ANOMALY_STREAM" (
    merchant_id VARCHAR(32),
    tx_count    INTEGER,
    avg_count   DOUBLE
);

CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
INSERT INTO "ANOMALY_STREAM"
SELECT STREAM
    merchant_id,
    COUNT(*) AS tx_count,
    AVG(COUNT(*)) OVER (ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS avg_count
FROM "SOURCE_STREAM"
GROUP BY merchant_id, STEP("SOURCE_STREAM".ROWTIME BY INTERVAL '1' MINUTE)
HAVING COUNT(*) > AVG(COUNT(*)) OVER (...) * 3;

Kinesis Video Streams

Kinesis Video Streams (KVS) is for ingesting, storing, and processing video, audio, and time-serialised data from cameras and IoT devices. It handles the protocol complexity of device connections and provides APIs for real-time streaming and on-demand playback.

Use cases:

Smart camera feeds for computer vision pipelines
Video security and surveillance with ML inference
Telematics video from connected vehicles
Live video streaming to applications

KVS integrates with Amazon Rekognition Video for real-time object and face detection, and with SageMaker for custom ML inference on video frames.

Choosing the Right Kinesis Service

Need to process data in real time with custom code?
  └── Kinesis Data Streams

Need to deliver data to S3, Redshift, or OpenSearch without writing consumers?
  └── Kinesis Data Firehose

Need to run SQL or Flink queries against a stream?
  └── Kinesis Data Analytics

Ingesting video or audio from cameras or devices?
  └── Kinesis Video Streams

Common combination: IoT devices → KDS → (KDA for real-time anomaly detection AND Firehose for S3 archival). Both KDA and Firehose read from the same KDS stream independently.

Real-World Scenario: E-Commerce Real-Time Dashboard

An e-commerce platform needs a live dashboard showing orders per minute, revenue in the last 5 minutes, and alerts when the error rate on checkout exceeds 2%.

Architecture:

Application servers → KDS (10 shards, ~2,000 events/second at peak)
KDA for SQL → aggregates events in 1-minute tumbling windows → outputs to a second KDS stream
Lambda consumer reads the aggregated stream → writes to DynamoDB → dashboard reads DynamoDB
Firehose reads from the same source KDS → writes Parquet to S3 → Athena queries for historical analysis

This separates the real-time path (KDA + Lambda + DynamoDB) from the batch analytics path (Firehose + S3 + Athena), both reading from the same source stream.

Interview Notes

Q: How do you calculate the number of shards you need? Divide the peak ingestion rate by 1,000 records/second (or 1 MB/second, whichever is the binding constraint). Add headroom — typically 20-25% — for unexpected traffic spikes. If you have 5,000 records/second at peak, you need at least 5 shards, so 6-7 with headroom.

Q: What is the difference between Kinesis and SQS? SQS is a message queue — one consumer reads a message, it is deleted. Kinesis is a stream — multiple consumers can read the same record independently, and records are retained for hours or days. SQS is for task distribution; Kinesis is for stream processing with multiple downstream consumers.

Q: What happens when a shard is hot (one shard receives disproportionate traffic)? The shard hits its 1 MB/second or 1,000 records/second limit and producers receive ProvisionedThroughputExceededException errors. Fix this by choosing a better partition key that distributes records more evenly across shards, or by adding a random suffix to the partition key.

Q: What is enhanced fan-out and when do you need it? Standard KDS consumers share 2 MB/second read throughput per shard across all consumers. Enhanced fan-out gives each registered consumer its own dedicated 2 MB/second per shard via HTTP/2 push. Use it when you have three or more independent consumers reading the same stream and standard throughput is insufficient.