Z-Order Sort Transform

Overview

The Z-Order Sort transform reorders records by interleaving the bits of multiple column values, producing a space-filling curve that preserves multi-dimensional locality. The result: query engines can skip far more data files when filtering on any combination of the chosen columns.

When to Use

Use Z-Order Sort when your downstream queries frequently filter on two or more columns simultaneously:

SELECT * FROM events
WHERE region = 'US' AND event_date > '2026-01-01';

SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';

Without z-ordering, data is typically sorted by a single column, so filters on any other column gain little from file pruning and tend to scan most of the table. Z-ordering distributes locality across all chosen dimensions.

Do not z-order by columns that are already partition keys. Partitioned columns are already isolated into separate directories, so z-ordering them wastes CPU without improving pruning.

How It Works

1. Normalize: Each column value is converted to an 8-byte comparable representation (sign-bit flip for integers, IEEE 754 encoding for floats, epoch microseconds for timestamps, first 8 bytes for strings).
2. Interleave: Bits from all columns are woven together: A[0] B[0] C[0] A[1] B[1] C[1] ...
3. Sort: Records are sorted by the composite z-value.
4. Write: Parquet files preserve the z-order, creating natural multi-dimensional clustering.
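
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the transform's actual implementation: the function names are hypothetical, only signed 64-bit integer columns are handled, and it assumes that A[0] denotes the most significant bit of each normalized value.

```python
from typing import Dict, List, Sequence

BITS = 64  # each normalized value is treated as an 8-byte unsigned integer


def normalize_int64(value: int) -> int:
    """Map a signed 64-bit integer to an unsigned one with the same ordering
    by flipping the sign bit."""
    return (value & 0xFFFFFFFFFFFFFFFF) ^ (1 << 63)


def interleave(keys: Sequence[int], bits: int = BITS) -> int:
    """Weave the bits of all keys together, most significant bit first:
    A[0] B[0] C[0] A[1] B[1] C[1] ... (MSB-first is an assumption)."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for key in keys:
            z = (z << 1) | ((key >> bit) & 1)
    return z


def z_order_sort(records: List[Dict[str, int]], columns: Sequence[str]) -> List[Dict[str, int]]:
    """Return the records sorted by the composite z-value of the given columns."""
    return sorted(
        records,
        key=lambda r: interleave([normalize_int64(r[c]) for c in columns]),
    )
```

With two columns, the record (x=1, y=0) sorts before (x=0, y=2) because the high bit of y is compared before the low bit of x, which is exactly how the curve spreads locality across both columns instead of favoring the first.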

Configuration

Z-Order Sort can be configured in two ways:

Option A: Standalone Transform Node

Drag the Z-Order Sort node from the Row Transforms section of the sidebar onto the canvas. Place it before any destination node.

1. Connect the upstream transform output to the Z-Order Sort node.
2. In the config panel, enter the columns to z-order (e.g. region, event_date, user_id).
3. Connect the Z-Order Sort output to the destination node.

This option works with any destination, not just Managed Lakehouse.

Option B: Managed Lakehouse Destination Setting

The Managed Lakehouse destination has a built-in sort strategy:

1. Open the Managed Lakehouse destination node settings.
2. Scroll to Sort Strategy and select Z-Order.
3. Enter the columns to z-order: region, event_date, user_id.

Pipeline JSON

// Standalone node
{
  "type": "z_order_sort",
  "columns": ["region", "event_date", "user_id"]
}

// Or as a destination setting
{
  "managedLakehouseSettings": {
    "sortStrategy": "z_order",
    "sortColumns": ["region", "event_date", "user_id"]
  }
}

If a pipeline uses both a standalone Z-Order Sort node and the Managed Lakehouse sort strategy, the standalone node's sort is applied first, which makes the destination setting redundant. Use one or the other.
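
A small check can catch the redundant combination before a pipeline is deployed. The sketch below is hypothetical: it assumes the pipeline is available as a flat list of node dictionaries shaped like the two JSON fragments above, which may not match the real pipeline document layout, and the function name is made up for illustration.

```python
from typing import Any, Dict, List


def z_order_sources(nodes: List[Dict[str, Any]]) -> List[str]:
    """List where z-ordering is configured in a (hypothetical) flat node list."""
    sources = []
    for node in nodes:
        # Standalone transform node, as in the first JSON fragment.
        if node.get("type") == "z_order_sort":
            sources.append("standalone")
        # Destination setting, as in the second JSON fragment.
        settings = node.get("managedLakehouseSettings", {})
        if settings.get("sortStrategy") == "z_order":
            sources.append("destination")
    return sources


pipeline = [
    {"type": "z_order_sort", "columns": ["region", "event_date", "user_id"]},
    {"managedLakehouseSettings": {"sortStrategy": "z_order",
                                  "sortColumns": ["region", "event_date", "user_id"]}},
]

if len(z_order_sources(pipeline)) > 1:
    print("z-order is configured twice; keep only one of the two options")
```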

Supported Column Types

Type	Normalization	Notes
`int`, `int32`, `int64`	Sign-bit flip	Preserves total order for signed integers
`float32`, `float64`	IEEE 754 total-order encoding	Handles NaN and negative zero correctly
`timestamp`, `date`	Microseconds since epoch	Timezone-normalized before encoding
`string`	First 8 bytes	Provides locality for short prefixes
`boolean`	0 or 1	Binary columns contribute 1 bit per level
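
To make the table concrete, here is one possible way to write the non-integer normalizations in Python. It is a sketch under assumptions, not the transform's code: the function names are hypothetical, strings are assumed to be UTF-8 encoded and zero-padded to 8 bytes, timestamps must be timezone-aware, and float32 values are assumed to be widened to float64 first (which preserves their order).

```python
import struct
from datetime import datetime, timedelta, timezone

MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize_int(value: int) -> int:
    """Sign-bit flip: signed 64-bit order becomes unsigned order."""
    return (value & MASK64) ^ SIGN_BIT


def normalize_float(value: float) -> int:
    """IEEE 754 total-order encoding of a float64 as an unsigned integer."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    if bits & SIGN_BIT:
        return bits ^ MASK64   # negative: flip every bit to reverse the order
    return bits | SIGN_BIT     # non-negative: set the sign bit


def normalize_timestamp(value: datetime) -> int:
    """Microseconds since the Unix epoch; aware datetimes are compared in UTC."""
    delta = value - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return normalize_int(micros)


def normalize_string(value: str) -> int:
    """First 8 bytes of the UTF-8 encoding, zero-padded on the right."""
    prefix = value.encode("utf-8")[:8].ljust(8, b"\x00")
    return int.from_bytes(prefix, "big")


def normalize_bool(value: bool) -> int:
    return int(value)


# Two spellings of the same instant normalize to the same key.
plus_one = datetime(2026, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=1)))
utc = datetime(2025, 12, 31, 23, 0, tzinfo=timezone.utc)
assert normalize_timestamp(plus_one) == normalize_timestamp(utc)
```

Because only the first 8 bytes of a string are kept, values that share an 8-byte prefix receive the same key, so long strings with common prefixes contribute little locality.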

Buffering Behavior

The Z-Order Sort is a buffering transform — it must see all records in the batch before producing sorted output. During a pipeline run:

1. Records accumulate in the transform's internal buffer via Apply().
2. At flush time, GetRecords() returns all records sorted by z-value.
3. The sorted batch is written to Parquet in one pass.

For streaming mode, each micro-batch is z-ordered independently.
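
The buffering contract can be illustrated with a small Python class. The method names mirror Apply() and GetRecords() from the description above, but the class itself, the injected z-value function, and the choice to clear the buffer on flush are assumptions made for this sketch.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class ZOrderBuffer:
    """Hypothetical buffering transform: emits nothing until flushed."""

    def __init__(self, z_value: Callable[[Record], int]) -> None:
        self._z_value = z_value          # computes the composite z-value of a record
        self._buffer: List[Record] = []

    def apply(self, record: Record) -> None:
        """Accumulate one record; no output is produced here."""
        self._buffer.append(record)

    def get_records(self) -> List[Record]:
        """Return the buffered batch sorted by z-value and start a new batch."""
        batch = sorted(self._buffer, key=self._z_value)
        self._buffer = []                # assumption: each flush starts from empty
        return batch


# Each micro-batch is sorted on its own, as in streaming mode.
buf = ZOrderBuffer(z_value=lambda r: r["z"])
for z in (5, 1, 3):
    buf.apply({"z": z})
first_batch = buf.get_records()

for z in (4, 2):
    buf.apply({"z": z})
second_batch = buf.get_records()
```

Note that the two batches are each sorted internally but are not ordered relative to one another, which is why per-batch z-ordering alone does not give a globally ordered table.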

Column Selection Tips

- 2–4 columns is the sweet spot. More columns dilute the locality benefit of each individual column.
- Timestamp columns are excellent candidates because they provide natural range-based locality.
- High-cardinality columns (IDs, hashes) are often better served by bucketing as a partition strategy than by z-ordering.
- Low-cardinality columns (enums, booleans) contribute fewer bits and should be supplemented with higher-cardinality columns.

Performance Impact

Z-ordering adds CPU cost during writes (normalization + sort) but significantly reduces I/O during reads:

Scenario	Without Z-Order	With Z-Order
2-column filter scan	~100% of files read	~15–30% of files read
3-column filter scan	~100% of files read	~10–25% of files read
Write overhead	Baseline	+5–15% CPU

Combine z-order with compaction for best results. Compaction merges small files and re-sorts across batches, achieving global z-order across the full table.
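
The read-side effect of file skipping can be reproduced on a toy dataset. The sketch below is illustrative only: it uses a 16x16 grid of integer points split into 16 files, counts the files whose min/max bounds overlap a query rectangle, and uses hypothetical helper names. Its numbers describe this toy grid, not the measured ranges in the table above.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def interleave2(x: int, y: int, bits: int = 4) -> int:
    """Z-value of two small unsigned integers, most significant bit first."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def split_into_files(points: List[Point], per_file: int) -> List[List[Point]]:
    """Cut an already-sorted list of points into fixed-size 'files'."""
    return [points[i:i + per_file] for i in range(0, len(points), per_file)]


def files_read(files: List[List[Point]], x_range: Point, y_range: Point) -> int:
    """Count files whose per-column min/max bounds overlap the query rectangle."""
    count = 0
    for f in files:
        xs = [p[0] for p in f]
        ys = [p[1] for p in f]
        overlaps_x = min(xs) <= x_range[1] and max(xs) >= x_range[0]
        overlaps_y = min(ys) <= y_range[1] and max(ys) >= y_range[0]
        if overlaps_x and overlaps_y:
            count += 1
    return count


points = [(x, y) for x in range(16) for y in range(16)]
linear = split_into_files(sorted(points), 16)  # sorted by x, then y
zorder = split_into_files(sorted(points, key=lambda p: interleave2(*p)), 16)

# Filter on the second column only: y between 4 and 7.
print(files_read(linear, (0, 15), (4, 7)), files_read(zorder, (0, 15), (4, 7)))
# Filter on both columns: x and y between 4 and 7.
print(files_read(linear, (4, 7), (4, 7)), files_read(zorder, (4, 7), (4, 7)))
```

In this grid the linear sort must read all 16 files for the y-only filter, while the z-ordered layout reads 4, because each z-ordered file covers a compact 4x4 block and therefore has tight min/max bounds on both columns.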

Related

Z-Order Overview

Concept overview and when to use z-ordering

Column Statistics

Z-ordered data produces tighter min/max bounds per file

Table Maintenance

Compaction achieves global z-order across batches

Transform-Before-Land

Other transforms to apply before lakehouse writes
