Overview

Streaming mode enables continuous data ingestion by committing data at regular intervals instead of waiting for the entire pipeline run to complete.

Configuration

Enable streaming mode in the Managed Lakehouse destination node:
| Setting | Default | Description |
| --- | --- | --- |
| Streaming Mode | Off | Enable micro-batch commits |
| Commit Interval | 60 seconds | Maximum time between commits |
| Row Limit | 100,000 | Maximum rows before forced commit |
Whichever threshold is reached first triggers the commit.
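The dual-threshold behavior can be sketched as follows. This is a minimal illustration, not the engine's actual implementation; the `CommitTrigger` class and its attribute names are hypothetical, while the defaults mirror the table above.

```python
import time

class CommitTrigger:
    """Illustrative dual-threshold commit trigger: commit fires when either
    the commit interval elapses or the row limit is reached, whichever
    comes first."""

    def __init__(self, interval_s=60.0, row_limit=100_000):
        self.interval_s = interval_s      # max time between commits
        self.row_limit = row_limit        # max buffered rows before a forced commit
        self.pending_rows = 0
        self.last_commit = time.monotonic()

    def add_rows(self, n):
        self.pending_rows += n

    def should_commit(self):
        # Either threshold crossing triggers a commit.
        elapsed = time.monotonic() - self.last_commit
        return self.pending_rows >= self.row_limit or elapsed >= self.interval_s

    def commit(self):
        # Reset both thresholds after a successful commit.
        self.pending_rows = 0
        self.last_commit = time.monotonic()
```

With the defaults, a slow trickle of rows still commits every 60 seconds, while a burst of 100,000+ rows commits immediately.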

Backpressure

When storage upload latency increases, the system automatically reduces batch sizes to prevent memory buildup:
| Avg Commit Latency | Batch Size | Action |
| --- | --- | --- |
| < 2 seconds | 10,000 (default) | Normal operation |
| 2–5 seconds | 5,000 | Moderate backpressure |
| > 5 seconds | 2,000 | Heavy backpressure, warning logged |
The engine queries the destination's `GetBackpressureProfile()` to adjust `ArrowTuning` accordingly.
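The latency-to-batch-size policy in the table above can be expressed as a simple mapping. The thresholds and sizes come from the documented tiers; the function name is illustrative, not the product API.

```python
import logging

def batch_size_for_latency(avg_commit_latency_s: float) -> int:
    """Map average commit latency to a target batch size, per the
    documented backpressure tiers."""
    if avg_commit_latency_s < 2.0:
        return 10_000   # normal operation (default batch size)
    if avg_commit_latency_s <= 5.0:
        return 5_000    # moderate backpressure
    # Heavy backpressure: shrink batches and log a warning.
    logging.warning(
        "heavy backpressure: avg commit latency %.1fs", avg_commit_latency_s
    )
    return 2_000
```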

Metrics

Monitor streaming throughput in the pipeline run summary:
```json
{
  "streamingMode": true,
  "totalCommits": 42,
  "totalRows": 4200000,
  "pendingRows": 15000,
  "avgCommitLatency": "850.3ms"
}
```
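A few useful figures can be derived from this payload, such as rows per commit. The field names match the documented JSON; the parsing code itself is an illustrative sketch (it assumes `avgCommitLatency` is always a millisecond string, which the example above shows but the docs do not guarantee).

```python
import json

summary = json.loads("""
{
  "streamingMode": true,
  "totalCommits": 42,
  "totalRows": 4200000,
  "pendingRows": 15000,
  "avgCommitLatency": "850.3ms"
}
""")

# Average rows committed per micro-batch.
rows_per_commit = summary["totalRows"] / summary["totalCommits"]

# Convert the "850.3ms" latency string to seconds.
latency_s = float(summary["avgCommitLatency"].rstrip("ms")) / 1000

print(f"{rows_per_commit:.0f} rows/commit, {latency_s:.4f}s avg latency")
```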

Tier Limits

| | Professional | Premium | Enterprise |
| --- | --- | --- | --- |
| Streaming tables | 2 | 10 | Unlimited |
| Min commit interval | 5 min | 1 min | 10 sec |
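When configuring pipelines programmatically, the per-tier minimums can be enforced client-side before submission. This is a hypothetical helper, not a product API; the values mirror the tier table above.

```python
# Minimum commit interval per tier, in seconds (from the tier table).
MIN_COMMIT_INTERVAL_S = {
    "professional": 300,  # 5 min
    "premium": 60,        # 1 min
    "enterprise": 10,     # 10 sec
}

def clamp_commit_interval(tier: str, requested_s: int) -> int:
    """Raise a requested commit interval to the tier minimum if needed."""
    return max(requested_s, MIN_COMMIT_INTERVAL_S[tier])
```

For example, a Professional-tier pipeline requesting a 60-second interval would be clamped to 300 seconds.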

Best Practices

Start with 60s intervals

Begin with the default 60-second interval and adjust based on observed latency.

Monitor backpressure

If the logs show repeated batch-size reductions, your storage layer may need scaling.

Use with compaction

Streaming creates many small files. Enable compaction via table maintenance.

Arrow-native sources

For maximum throughput, use Arrow-native sources (Kafka, Parquet) with streaming mode.