Inside TimescaleDB's Columnar Compression Pipeline
How the hypercore engine transforms row-based PostgreSQL chunks into highly compressed columnar batches for time-series efficiency.
Relational databases are historically built for transactional workloads where rows are written, updated, and read individually. But when a system is logging millions of sensor metrics, financial ticks, or application events per second, standard row-oriented storage quickly becomes a bottleneck. Storing every timestamp and float as an independent row is a fast path to exhausting your storage budget.
To solve this, TimescaleDB introduces a hybrid engine called "hypercore" that runs inside PostgreSQL. By transitioning older data from row-oriented chunks to a compressed columnar format, it can reduce the storage footprint by up to 98% for typical time-series workloads.
Understanding how this pipeline works requires looking under the hood at how TimescaleDB bypasses traditional PostgreSQL storage limits and applies specialized compression algorithms.
Why TOAST Falls Short for Time-Series
PostgreSQL has a native compression and storage mechanism called TOAST (The Oversized-Attribute Storage Technique). While TOAST is highly effective for its intended use case, it is fundamentally unsuited for time-series optimization.
TOAST is designed to handle individual large values, such as long text fields, jsonb documents, or bytea values, that would not fit comfortably within a standard PostgreSQL page (typically 8 kB). When a row exceeds the TOAST_TUPLE_THRESHOLD (around 2 kB), PostgreSQL compresses those large values, moves them out of line into a separate TOAST table, or both. The compression step uses general-purpose algorithms like pglz or lz4 (available since PostgreSQL 14).
However, TOAST operates on a per-value basis. It treats values as opaque byte streams and has no awareness of cross-row patterns. Because of this, TOAST cannot compress individual small, fixed-length fields like timestamps or floats. For a typical IoT workload, TOAST yields a 1.0× compression ratio (no compression at all) on these columns.
In contrast, TimescaleDB's hypercore engine targets cross-row patterns. Instead of compressing a single large field within one row, it groups data across up to 1000 rows and compresses them together. This allows the engine to exploit the mathematical structure, monotonicity, and repetitive nature of time-series data.
| Feature | TOAST (Vanilla PostgreSQL) | TimescaleDB Hypercore |
|---|---|---|
| Design Goal | Individual values > 2 kB | Cross-row patterns in time-series |
| Trigger | Row exceeds threshold (~2 kB) | Per-chunk policy (e.g., older than 7 days) |
| Supported Types | Variable-length only (text, jsonb, etc.) | All data types |
| Algorithms | pglz, lz4 | Delta, Delta-of-Delta, Simple-8b, RLE, Gorilla XOR, Dictionary |
| Granularity | Per value (1 value = 1 byte stream) | Per batch (~1000 rows together) |
| Typical Float Ratio | ~1.0× | 10–20× |
| Typical Timestamp Ratio | ~1.0× | 50–100× (for regular intervals) |
| Typical Text Ratio | 2–3× | 5–10× (via dictionary + RLE) |
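The granularity difference in the table can be illustrated with a small experiment. This is only a sketch: it uses Python's zlib as a stand-in for a general-purpose compressor (it is not pglz or lz4, and TOAST would not even attempt to compress values this small), and the timestamp values are made up for illustration.

```python
import struct
import zlib

# 1000 timestamps in microseconds, one reading every 5 seconds (illustrative data).
start = 1_700_000_000_000_000
timestamps = [start + i * 5_000_000 for i in range(1000)]
raw = struct.pack("<1000q", *timestamps)  # 8 bytes per value, 8000 bytes total

# Per value: each 8-byte value is compressed in isolation.
# There is no redundancy inside a single value, so the output only grows.
per_value_size = sum(len(zlib.compress(struct.pack("<q", ts))) for ts in timestamps)

# Per batch: all 1000 values are compressed together, so shared bytes can be found.
batch_size = len(zlib.compress(raw))

# Per batch with a type-aware transform first: store differences, then compress.
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
delta_size = len(zlib.compress(struct.pack("<1000q", *deltas)))

print(f"raw bytes:             {len(raw)}")
print(f"compressed per value:  {per_value_size}")
print(f"compressed per batch:  {batch_size}")
print(f"delta, then per batch: {delta_size}")
```

The exact byte counts depend on the compressor, but the ordering is the point: compressing values one at a time cannot beat the raw size, batching helps, and a transform that understands the data helps far more.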
The Hypercore Engine: Row to Columnar Transition
TimescaleDB uses a hybrid row-columnar architecture to balance write performance with storage efficiency.
When new data is written to TimescaleDB, it lands in standard, row-based PostgreSQL chunks. This ensures that INSERT and UPDATE operations remain fast and lightweight, as row-oriented storage is highly optimized for write-heavy transactional workloads.
Once a chunk reaches a certain age—defined by a developer-configured compression policy (for example, chunks older than 7 days)—the hypercore engine automatically converts it into a compressed, columnar format.
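The age-based selection behind such a policy can be sketched in a few lines. This is a simplified model, not TimescaleDB's implementation: the `Chunk` class, its fields, and `chunks_due_for_compression` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk's metadata."""
    name: str
    range_end: datetime  # upper bound of the time range the chunk covers
    compressed: bool = False


def chunks_due_for_compression(chunks, now, compress_after):
    """Return uncompressed chunks whose whole time range is older than the cutoff."""
    cutoff = now - compress_after
    return [c for c in chunks if not c.compressed and c.range_end < cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
chunks = [
    Chunk("chunk_a", datetime(2024, 1, 10, tzinfo=timezone.utc)),
    Chunk("chunk_b", datetime(2024, 1, 23, tzinfo=timezone.utc)),
    Chunk("chunk_c", datetime(2024, 1, 31, tzinfo=timezone.utc)),  # still being written
    Chunk("chunk_d", datetime(2024, 1, 3, tzinfo=timezone.utc), compressed=True),
]

due = chunks_due_for_compression(chunks, now, timedelta(days=7))
print([c.name for c in due])
```

Only chunks that are both old enough and not yet compressed are selected, which keeps the most recent, write-heavy chunk in row format.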
During this conversion, the engine groups up to 1000 rows into a single "batch." Each batch is stored as a single row in the compressed table, where the individual columns are represented as compressed arrays. This column-major format inside the batch colocates values of the same column. When an analytical query runs, it only needs to fetch and decompress the specific columns requested by the query, rather than scanning entire rows, significantly reducing I/O overhead.
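The row-to-batch pivot described above can be sketched as follows. This is a minimal model that assumes rows are plain dictionaries and ignores any ordering or grouping the real engine applies before batching; `to_columnar_batches` is a hypothetical helper, not a TimescaleDB function.

```python
def to_columnar_batches(rows, columns, batch_size=1000):
    """Group rows into batches of up to batch_size and pivot each batch to column arrays."""
    batches = []
    for start in range(0, len(rows), batch_size):
        group = rows[start:start + batch_size]
        # One entry per column; each holds that column's values for the whole batch.
        batches.append({col: [row[col] for row in group] for col in columns})
    return batches


# 2500 illustrative sensor rows.
rows = [
    {"time": 1_700_000_000 + i * 5, "device_id": "MACHINE_001", "temperature": 21.5}
    for i in range(2500)
]

batches = to_columnar_batches(rows, ["time", "device_id", "temperature"])
print(len(batches), [len(b["time"]) for b in batches])

# A query that only needs temperature touches a single array per batch.
temperatures = [t for b in batches for t in b["temperature"]]
```

Each dictionary in `batches` plays the role of one row in the compressed table, with every column held as its own array ready for a type-specific encoder.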
The Compression Toolkit
Once data is grouped into columnar arrays within a batch, the hypercore engine applies specialized algorithms based on the data type of each column.
Delta and Delta-of-Delta Encoding
For numeric values and timestamps that change incrementally, storing absolute values is highly redundant. Delta encoding solves this by storing only the difference between a value and the one preceding it.
If the data points occur at highly regular intervals (such as a sensor reporting exactly every 5 seconds), TimescaleDB applies delta-of-delta encoding. If the time difference between consecutive points is constant, the delta-of-delta is 0. Storing a sequence of zeros requires only a fraction of a bit per value, which explains why timestamp columns can reach compression ratios of 50–100×.
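A minimal sketch of both transforms, assuming integer timestamps, is shown below. It covers only the arithmetic; the bit-level packing that turns runs of zeros into very few bits is left out, and the function names are illustrative.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Invert delta_encode with a running sum."""
    return list(accumulate(encoded))


def delta_of_delta_encode(values):
    """Keep the first value and first delta, then store differences between deltas."""
    deltas = delta_encode(values)
    return deltas[:1] + delta_encode(deltas[1:])


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode: rebuild the deltas, then the values."""
    deltas = encoded[:1] + delta_decode(encoded[1:])
    return delta_decode(deltas)


# A sensor reporting exactly every 5 seconds.
timestamps = [1000, 1005, 1010, 1015, 1020]

print(delta_encode(timestamps))           # [1000, 5, 5, 5, 5]
print(delta_of_delta_encode(timestamps))  # [1000, 5, 0, 0, 0]
```

After the first two entries, a perfectly regular series is all zeros, which is what a downstream bit-packing or run-length step can shrink to almost nothing.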
Run-Length Encoding (RLE)
Time-series datasets frequently contain columns with highly repetitive metadata, such as device IDs, sensor types, or status codes.
Instead of writing "MACHINE_001" across hundreds of consecutive rows, Run-Length Encoding (RLE) stores the value once alongside the number of times it repeats consecutively. For repetitive text or low-cardinality status columns, combining RLE with dictionary compression improves text compression ratios to 5–10×, compared to the 2–3× typical of general-purpose LZ compression.
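The combination of dictionary and run-length encoding can be sketched like this. It is an illustration of the idea rather than the engine's on-disk format, and the function names are made up for the example.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []  # code -> original value
    index = {}       # original value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


device_ids = ["MACHINE_001"] * 300 + ["MACHINE_002"] * 200 + ["MACHINE_001"] * 100

dictionary, codes = dictionary_encode(device_ids)
runs = rle_encode(codes)

print(dictionary)  # ['MACHINE_001', 'MACHINE_002']
print(runs)        # [(0, 300), (1, 200), (0, 100)]
```

Six hundred strings reduce to a two-entry dictionary and three runs. RLE only pays off when equal values sit next to each other, which is why the order of rows within a batch matters.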
Gorilla XOR and Simple-8b
For floating-point metrics (like temperature or speed) that fluctuate slightly, TimescaleDB uses Gorilla XOR compression. This algorithm exploits the fact that consecutive floating-point values often share similar bit patterns, storing only the XOR-ed difference of the floating-point representations. For integers, algorithms like Simple-8b pack multiple small integer values into a single 64-bit word, maximizing bit-packing efficiency.
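The XOR step can be sketched as below, assuming IEEE 754 double-precision values. The sketch shows only why the XOR results are cheap to store (few meaningful bits); it does not implement the variable-length bit encoding that a Gorilla-style compressor writes out, and it does not cover Simple-8b. The helper names are illustrative.

```python
import struct
from itertools import accumulate
from operator import xor


def float_bits(x):
    """Return the 64-bit IEEE 754 representation of a float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_with_previous(values):
    """Keep the first value's bits, then XOR each value with its predecessor."""
    bits = [float_bits(v) for v in values]
    return bits[:1] + [b ^ a for a, b in zip(bits, bits[1:])]


def undo_xor(xors):
    """Invert xor_with_previous with a running XOR."""
    return [struct.unpack(">d", struct.pack(">Q", b))[0] for b in accumulate(xors, xor)]


def meaningful_bits(word):
    """Count the bits between the leading and trailing zeros of a 64-bit word."""
    if word == 0:
        return 0
    trailing_zeros = (word & -word).bit_length() - 1
    return word.bit_length() - trailing_zeros


temperatures = [21.5, 21.5, 21.75, 22.0]
xors = xor_with_previous(temperatures)

for value, word in zip(temperatures[1:], xors[1:]):
    print(f"{value:>6}: xor={word:064b} meaningful bits={meaningful_bits(word)}")
```

An unchanged reading XORs to zero, and small changes leave only a short window of set bits, so an encoder needs to store just that window plus its position instead of all 64 bits.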
Developer Tradeoffs
While a 98% reduction in storage footprint is highly compelling, developers must design their schemas and access patterns around the hybrid nature of this architecture.
Because each batch in a compressed chunk is stored as read-optimized columnar arrays inside a single PostgreSQL row, compressed chunks are not designed for frequent modifications. While uncompressed chunks support rapid INSERT and UPDATE operations, modifying data inside a compressed chunk requires decompressing the affected batch, applying the change, and recompressing it.
For this reason, compression policies should be configured to target chunks that have transitioned out of their active write cycle, ensuring that analytical queries reap the benefits of high-speed columnar scans without impacting transactional write performance.
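The cost of the decompress, modify, recompress cycle can be made tangible with a toy model. This sketch assumes a batch is a zlib-compressed array of doubles, which is a stand-in and not the hypercore storage format; the function names are hypothetical.

```python
import struct
import zlib


def compress_batch(values):
    """Serialize a column of floats and compress it as one blob."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


def decompress_batch(blob):
    """Recover every value in the batch; there is no way to read just one."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def update_in_compressed_batch(blob, index, new_value):
    """Change a single value: the whole batch is decompressed and recompressed."""
    values = decompress_batch(blob)  # touches all values in the batch
    values[index] = new_value
    return compress_batch(values)    # rewrites the entire blob


batch = compress_batch([20.0 + (i % 10) * 0.1 for i in range(1000)])
updated = update_in_compressed_batch(batch, 10, 99.9)

print(decompress_batch(updated)[10])
```

Changing one value out of a thousand still processes all thousand, which is the reason to compress only chunks that are no longer receiving writes.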
Lenn writes about cloud platforms, Kubernetes internals, and the infrastructure decisions that quietly make or break engineering organizations. Based in Berlin's vibrant tech scene, they have a talent for turning dense platform-engineering topics into prose that people actually finish reading.