Google Cloud Bigtable: Wide-Column NoSQL for Time Series, IoT, and High-Throughput Reads

Bigtable is the database Google has run since 2004 — it stores data for Search indexing, Maps, Gmail, and YouTube. If your application needs to write hundreds of thousands of rows per second, maintain millisecond read latency at petabyte scale, and handle time-series or sparse data efficiently, Bigtable is the right tool. It is not a general-purpose database and should not be used as one.


The Data Model: Rows, Column Families, and Cells

Bigtable’s data model is deceptively simple but has important implications for performance:

Table: sensor_readings (column family: data)

Row key                   │ data:temp │ data:humidity │ data:status
──────────────────────────┼───────────┼───────────────┼────────────
sensor_A#2025-03-15T10:00 │ 24.5      │ 62.1          │ OK
sensor_A#2025-03-15T10:01 │ 24.7      │ 62.3          │ OK
sensor_B#2025-03-15T10:00 │ 31.2      │ 48.7          │ WARN

Row key: The only index. Rows are sorted lexicographically by row key. All “reads in a range” are scans of contiguous row keys. The row key is the most important design decision.

Column families: Groups of columns that belong to the table schema. You create them explicitly, either when the table is created or later. Column qualifiers within a family can be anything; you do not need to predefine which columns exist. A row can have data in some qualifiers but not others (sparse).

Cells: The intersection of a row and a column qualifier. Each cell can store multiple versions, each with a different timestamp. Unless a garbage collection policy says otherwise, Bigtable retains every version (column families created through the HBase client default to a single version), so configure retention rules per column family.
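To make the sorted-row-key idea concrete, here is a minimal in-memory sketch, not the Bigtable client API. It models the sensor_readings table as nested dictionaries and shows that a prefix read is just a scan over a contiguous run of sorted keys. The `prefix_scan` helper and the version timestamps are hypothetical and exist only for illustration.

```python
from bisect import bisect_left

# Toy stand-in for a table: row key -> family -> qualifier -> versions.
# Each version is (timestamp, value), newest first. Timestamps are made up.
table = {
    b"sensor_B#2025-03-15T10:00": {"data": {b"temp": [(1, b"31.2")], b"status": [(1, b"WARN")]}},
    b"sensor_A#2025-03-15T10:01": {"data": {b"temp": [(1, b"24.7")], b"status": [(1, b"OK")]}},
    b"sensor_A#2025-03-15T10:00": {"data": {b"temp": [(1, b"24.5")], b"status": [(1, b"OK")]}},
}


def prefix_scan(rows, prefix):
    """Return the row keys that start with `prefix`, in lexicographic order."""
    keys = sorted(rows)                # rows are stored sorted by key
    start = bisect_left(keys, prefix)  # first key >= prefix
    matched = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break                      # the contiguous range has ended
        matched.append(key)
    return matched


sensor_a_rows = prefix_scan(table, b"sensor_A#")
# Last key in the range is the latest minute; take the newest temp version.
latest_temp = table[sensor_a_rows[-1]]["data"][b"temp"][0][1]
```

Because the only lookup structure is the sorted key list, any query that cannot be phrased as a key or key range has to read and filter every row.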


Row Key Design: The Most Critical Decision

Because rows are sorted by key and there is no secondary index, every access pattern must be served by a scan of the row key space. Bad row key design leads to hotspots — one tablet server receives all writes while others sit idle.

Common patterns:

Time series — reverse timestamp

Naive key: 2025-03-15T10:00#sensor_A ← timestamp first, chronological order
All new writes go to the end of the table
─► the server holding the end of the table gets all writes (hotspot)
Better key: sensor_A#<reversed_timestamp>
Reversed timestamp = MAX_LONG - unix_timestamp_millis
The leading sensor ID spreads writes across the key space
Within each sensor, the newest data sorts to the front of its prefix
Reads for recent data are fast (a short scan from the start of the prefix)
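The reversed-timestamp key can be sketched in a few lines of plain Python. The function name `reversed_ts_key` is hypothetical, and the zero-padding to 19 digits is an added assumption: it keeps lexicographic order aligned with numeric order even if the reversed value ever has fewer digits.

```python
MAX_LONG = 2**63 - 1  # largest signed 64-bit integer


def reversed_ts_key(sensor_id: str, unix_millis: int) -> bytes:
    """Build `<sensor_id>#<MAX_LONG - unix_millis>` as a row key."""
    reversed_ts = MAX_LONG - unix_millis
    # Fixed width so that byte-wise sorting matches numeric sorting.
    return f"{sensor_id}#{reversed_ts:019d}".encode()


older = reversed_ts_key("sensor_A", 1_742_032_800_000)  # 2025-03-15T10:00:00Z
newer = reversed_ts_key("sensor_A", 1_742_032_860_000)  # one minute later

# Sorting the keys mimics the order in which the table stores them.
keys_in_table_order = sorted([older, newer])
```

The newer reading sorts first, so a scan that starts at `sensor_A#` and stops after a fixed number of rows returns the most recent data without reading the rest of the sensor's history.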

Distributed write with hash prefix

Without prefix:
user_A_2025-03-01 ← sequential user IDs → sequential keys → hotspot
user_B_2025-03-01
user_C_2025-03-01
With salted prefix:
3_user_A_2025-03-01 ← hash(user_A) % 4 = 3
1_user_B_2025-03-01 ← hash(user_B) % 4 = 1
2_user_C_2025-03-01 ← distributes writes across tablets

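The trade-off with hashing is that a single range scan no longer covers the key space: keys that used to be adjacent are now spread across buckets. To read a range you must scan each bucket separately and merge the results.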
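A minimal sketch of the salted-prefix scheme and the per-bucket fan-out it forces, using only the standard library. The helper names are hypothetical, and SHA-256 is an arbitrary choice of stable hash (Python's built-in `hash()` for strings varies between processes, so it is unsuitable for keys).

```python
import hashlib

NUM_BUCKETS = 4  # matches the "% 4" in the example; a real value is workload-specific


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket number in [0, NUM_BUCKETS)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def salted_key(user_id: str, date: str) -> bytes:
    """Build `<bucket>_<user_id>_<date>`; same input always gives the same key."""
    return f"{bucket_for(user_id)}_{user_id}_{date}".encode()


def fan_out_ranges(start: str, end: str) -> list[tuple[bytes, bytes]]:
    """One (start_key, end_key) pair per bucket for a range over unsalted keys."""
    # With 10 or more buckets the prefix would need zero-padding to sort correctly.
    return [
        (f"{bucket}_{start}".encode(), f"{bucket}_{end}".encode())
        for bucket in range(NUM_BUCKETS)
    ]


key = salted_key("user_A", "2025-03-01")
ranges = fan_out_ranges("user_A", "user_D")
```

A lookup for one known user still needs only one read, because the bucket can be recomputed from the user ID. Only reads that span users pay the cost of `NUM_BUCKETS` separate scans.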


Column Families and Garbage Collection

Column families are part of the table schema and are created through admin operations (for example with the cbt CLI), not implicitly on write. Each family can have its own garbage collection policy:

# Create a table with two column families
cbt -instance=my-instance createtable telemetry
cbt -instance=my-instance createfamily telemetry readings
cbt -instance=my-instance createfamily telemetry metadata
# Set garbage collection on readings: a cell becomes eligible for removal once it
# falls outside the 10 most recent versions or is older than 90 days.
# Each setgcpolicy call replaces the family's previous policy, so both rules
# go into a single command.
cbt -instance=my-instance setgcpolicy telemetry readings maxversions=10 or maxage=90d

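Garbage collection runs in the background as part of compaction. Data past the policy threshold is not deleted immediately, and removal can lag by several days, so expired cells may still appear in reads until they are compacted away. If a read must never return expired data, apply a read filter (for example a cells-per-column limit or a timestamp range) rather than relying on the policy alone.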
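To illustrate how a version limit and an age limit interact, here is a small simulation in plain Python. It assumes the two rules are combined as a union (a cell version is eligible for removal as soon as it violates either rule); the function `gc_eligible` is hypothetical and does not call Bigtable.

```python
from datetime import datetime, timedelta, timezone


def gc_eligible(versions, now, max_versions=10, max_age=timedelta(days=90)):
    """Return the version timestamps a maxversions-OR-maxage policy would remove.

    `versions` holds the timestamps of every stored version of one column
    in one row.
    """
    newest_first = sorted(versions, reverse=True)
    eligible = []
    for index, timestamp in enumerate(newest_first):
        too_many = index >= max_versions     # beyond the N most recent
        too_old = now - timestamp > max_age  # older than the age limit
        if too_many or too_old:
            eligible.append(timestamp)
    return eligible


now = datetime(2025, 3, 15, tzinfo=timezone.utc)
versions = [now - timedelta(days=days) for days in (1, 30, 89, 91, 200)]
to_remove = gc_eligible(versions, now)
```

Here only the versions written 91 and 200 days ago are eligible: five versions is under the version limit, so the age rule is the one that applies.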


App Profiles: Routing and Replication

App profiles control how client traffic routes to Bigtable clusters. This matters for multi-cluster (replicated) Bigtable instances.

Bigtable Instance
├── Cluster 1: us-east1 (primary, write cluster)
└── Cluster 2: us-west1 (replica)

App profile: "backend-writes"
  Routing: Single cluster (us-east1)
  Reason: Strong consistency for writes

App profile: "analytics-reads"
  Routing: Multi-cluster (automatic load balancing)
  Reason: Reads can go to either cluster, lower latency
  Consistency: Eventual (reads may lag slightly behind writes)

This allows a single Bigtable instance to serve latency-sensitive transactional writes with strong consistency while simultaneously serving analytics reads with eventual consistency and load balancing.


Writing and Reading with Python

from google.cloud.bigtable import Client
from google.cloud.bigtable.row_filters import CellsColumnLimitFilter

MAX_LONG = 2**63 - 1  # reversed timestamp = MAX_LONG - unix_timestamp_millis

client = Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("telemetry")

# Write a row (the timestamp is in milliseconds, as in the key design above)
row_key = f"sensor_A#{MAX_LONG - 1716800000000}".encode()  # reversed timestamp
row = table.direct_row(row_key)
row.set_cell(
    column_family_id="readings",
    column="temperature",
    value=b"24.7",
)
row.set_cell(
    column_family_id="readings",
    column="humidity",
    value=b"62.1",
)
row.commit()

# Read a single row (read_row returns None if the key does not exist)
row = table.read_row(row_key)
if row:
    temp = row.cells["readings"][b"temperature"][0].value
    print(f"Temperature: {temp.decode()}")

# Scan a range (the 100 most recent rows for sensor_A, newest first)
prefix_start = b"sensor_A#"
prefix_end = b"sensor_A$"  # $ sorts immediately after # in ASCII
rows = table.read_rows(
    start_key=prefix_start,
    end_key=prefix_end,
    limit=100,
    filter_=CellsColumnLimitFilter(1),  # only the latest version of each column
)
for row in rows:
    print(row.row_key)

Bigtable vs HBase Compatibility

Google provides a Bigtable HBase client for Java that implements the HBase API, which means existing HBase applications can run against Cloud Bigtable with minimal code changes. In most cases the change is confined to replacing the HBase connection configuration.

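import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// HBase connection to Cloud Bigtable
Configuration config = BigtableConfiguration.configure(
    "my-project",
    "my-instance"
);
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf("telemetry"));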

This compatibility is intentional — Bigtable allows organizations to migrate on-premises HBase workloads to a fully managed cloud service without rewriting application code.


Sizing and Performance

Bigtable performance scales linearly with the number of nodes:

Per node (approximate):
  10,000 rows/second (small rows, simple lookups)
  10 MB/s read throughput
  10 MB/s write throughput

Starting point guidelines:
  Development: 1 node, SSD
  Moderate production: 3 nodes minimum (minimum for HA)
  High throughput: size based on throughput target
    e.g., 100,000 writes/second → ~10 nodes minimum
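The node estimate above is a ceiling division, sketched here as a small helper. The function name `nodes_needed` is hypothetical, and the default of 10,000 operations per second per node is simply the approximate figure quoted in this section, not a guaranteed number.

```python
import math


def nodes_needed(target_ops_per_sec, per_node_ops_per_sec=10_000, min_nodes=1):
    """Estimate node count from a throughput target, rounding up."""
    by_throughput = math.ceil(target_ops_per_sec / per_node_ops_per_sec)
    return max(min_nodes, by_throughput)


high_throughput = nodes_needed(100_000)            # the 100,000 writes/second example
small_production = nodes_needed(5_000, min_nodes=3)  # floor applies when load is low
```

Treat the result as a starting point: row size, read/write mix, and headroom for traffic peaks all move the real number, so it is worth validating with a load test.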

Nodes can be added or removed without downtime. Bigtable automatically rebalances tablet distribution across the new node count. There is a delay (typically 20 minutes to several hours for large datasets) before full performance improvement is realized as data rebalances.

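Storage is billed separately from nodes: Bigtable charges per node-hour and per GB stored. Adding nodes adds throughput, not data. Each node can only serve a bounded amount of storage, however, so very large tables may need extra nodes for capacity alone.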


Real-World Use Case: Fleet Telemetry

A logistics company tracks 50,000 vehicles. Each vehicle sends GPS location, speed, fuel level, and engine diagnostics every 10 seconds.

Write rate: 50,000 vehicles × 6 data points ÷ 10 seconds
  = 30,000 data points / second (5,000 rows / second at one row per report)
  = roughly 3-4 Bigtable nodes (SSD) if each data point is budgeted as one write

Row key design: {vehicle_id}#{reversed_timestamp_ms}
  Reverse timestamp ensures latest data sorts first
  Prefix scan on {vehicle_id}# retrieves recent data for that vehicle

Column family: "telemetry"
  Columns: lat, lon, speed, fuel, engine_code
  Garbage collection: keep 30 days of raw data

Aggregated summaries exported nightly to BigQuery for historical analysis

Querying the last 5 minutes for vehicle VH-4821:

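import time
MAX_LONG = 2**63 - 1
now_ms = int(time.time() * 1000)  # read the clock once so both keys agree
start_key = f"VH-4821#{MAX_LONG - now_ms}".encode()  # newest row sorts first
end_key = f"VH-4821#{MAX_LONG - (now_ms - 300_000)}".encode()  # 5 minutes ago
rows = table.read_rows(start_key=start_key, end_key=end_key)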

Summary

Bigtable excels at what it was designed for: high-throughput, low-latency, large-scale storage for time-series, IoT telemetry, financial tick data, and user activity logs. The row key is not just a lookup identifier; it is the entire access pattern. Design it wrong and you get hotspots. Design it right and Bigtable scales close to linearly as you add nodes, up to millions of operations per second. Column families group related columns and carry their own garbage collection policy, while column qualifiers need no predefined schema. HBase compatibility eases migration of existing workloads. The key discipline is matching Bigtable’s strengths (fast single-key lookups and range scans) to your access patterns, and using BigQuery or a separate analytics layer for complex multi-dimensional queries.