Managed Lakehouse is available on the Professional plan and above.
Architecture
Key design decisions:
- Single write, dual commit — Parquet files are uploaded once. Iceberg and Delta metadata are committed separately, eliminating data duplication.
- Iceberg-primary — Iceberg is the transactional source of truth. If the Iceberg commit succeeds but Delta fails, the pipeline retries Delta once and logs a warning without failing the run.
- Catalog-backed — Iceberg tables are registered in a catalog (AWS Glue Data Catalog or REST Catalog) for schema governance, time travel, and partition pruning.
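The Iceberg-primary commit order can be sketched as follows. This is an illustrative model of the behavior described above, not the actual implementation; the function names are hypothetical:

```python
import logging

logger = logging.getLogger("lakehouse")

def dual_commit(commit_iceberg, commit_delta):
    """Commit Iceberg first; retry Delta once on failure without failing the run.

    commit_iceberg / commit_delta are callables that raise on failure.
    Returns True if both formats committed, False if only Iceberg did.
    """
    commit_iceberg()  # source of truth — a failure here fails the whole run
    for attempt in (1, 2):  # initial attempt plus one retry
        try:
            commit_delta()
            return True
        except Exception as exc:
            logger.warning("Delta commit attempt %d failed: %s", attempt, exc)
    return False  # Iceberg committed; Delta gave up after one retry
```

Because Iceberg commits first, a partial failure can only leave Delta behind, never Iceberg — which is why the troubleshooting table treats "Delta commit failed, Iceberg succeeded" as a recoverable condition.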
Supported cloud providers
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
| Field | Description |
|---|---|
| Credential | AWS credential with s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket on the target bucket |
| Bucket | S3 bucket name (e.g., my-data-lake) |
| Prefix | Path prefix for data files (e.g., lakehouse/) |
| Region | AWS region (e.g., us-west-2) |
When using the AWS Glue catalog, the credential also needs the following Glue permissions: glue:GetDatabase, glue:GetDatabases, glue:GetTable, glue:GetTables, glue:CreateTable, glue:UpdateTable.
Iceberg catalog configuration
- AWS Glue Catalog
- REST Catalog
| Field | Description |
|---|---|
| Catalog Type | glue |
| Region | AWS region of the Glue Data Catalog |
| Namespace | Glue database name (e.g., analytics) |
| Warehouse | S3 path for Iceberg data files (e.g., s3://my-bucket/lakehouse/) |
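The four fields above map naturally onto a single catalog configuration object. A minimal sketch — the key names here are illustrative, not any specific client library's property names:

```python
def glue_catalog_config(region: str, namespace: str, warehouse: str) -> dict:
    """Assemble an Iceberg Glue-catalog configuration (illustrative keys)."""
    if not warehouse.startswith("s3://"):
        raise ValueError("warehouse must be an s3:// path")
    return {
        "catalog_type": "glue",
        "region": region,                          # must match the bucket region
        "namespace": namespace,                    # Glue database name
        "warehouse": warehouse.rstrip("/") + "/",  # normalized trailing slash
    }
```

Keeping the region consistent between the bucket and the Glue catalog avoids the "PermanentRedirect" error listed under Troubleshooting.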
Write modes
- Append
- Overwrite
- Merge (Upsert)
Append
Adds new Parquet files and commits a new snapshot to both Iceberg and Delta. Existing data is preserved.
Best for: event streams, logs, incremental loads, and any workload where historical data should not be modified.
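The difference between the modes is easiest to see with Merge (Upsert): rows whose key matches an existing row are updated, and new keys are appended. A hedged sketch of the upsert semantics over an in-memory batch — not the actual commit path:

```python
def merge_upsert(existing_rows, incoming_rows, key):
    """Upsert incoming_rows into existing_rows, matching on `key`.

    Both inputs are lists of dicts; incoming rows win on key collisions.
    Returns a new list: existing keys first (possibly updated), then new keys.
    """
    merged = {row[key]: row for row in existing_rows}
    for row in incoming_rows:
        merged[row[key]] = row  # update on match, insert otherwise
    return list(merged.values())
```

Append, by contrast, would simply concatenate the two lists; Overwrite would keep only the incoming rows.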
Table formats
You can enable one or both formats:
| Format | Description | When to use |
|---|---|---|
| Apache Iceberg | Catalog-backed, ACID-transactional table format with time travel, schema evolution, and hidden partitioning | Primary format for analytics engines (Spark, Trino, Athena, Flink) |
| Delta Lake | File-based transaction log with column statistics | When you also need Databricks/DuckDB compatibility or dual-engine access |
Advanced settings
Partition strategy
Partitioning organizes data files by column values for faster queries. Supported partition transforms:
| Transform | Syntax | Example |
|---|---|---|
| Identity | column_name | region |
| Year | year(column) | year(created_at) |
| Month | month(column) | month(created_at) |
| Day | day(column) | day(created_at) |
| Hour | hour(column) | hour(event_time) |
| Bucket | bucket(n, column) | bucket(16, user_id) |
| Truncate | truncate(w, column) | truncate(4, zip_code) |
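Each transform derives a partition value from a column value. The sketch below illustrates a few of them; it is simplified — in particular, the bucket transform here uses a CRC32 hash for stability, whereas the Iceberg spec defines bucketing via a Murmur3-32 hash:

```python
import zlib
from datetime import datetime

def transform_year(ts: datetime) -> int:
    """year(column): partition by calendar year."""
    return ts.year

def transform_month(ts: datetime) -> str:
    """month(column): partition by year-month."""
    return f"{ts.year:04d}-{ts.month:02d}"

def transform_truncate(width: int, value: str) -> str:
    """truncate(w, column): keep the first w characters."""
    return value[:width]

def transform_bucket(n: int, value) -> int:
    """bucket(n, column): stable hash modulo n (simplified, not Murmur3)."""
    return zlib.crc32(str(value).encode("utf-8")) % n
```

Bucket and truncate are useful for high-cardinality columns (user IDs, zip codes) where time-based transforms don't apply.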
Schema evolution
When enabled (default), the destination automatically adapts to upstream schema changes:
First batch — schema inference
Column types are inferred from the data and registered in both the Iceberg catalog and Delta log.
New columns
If a new column appears in a later batch, it is added to the schema. Existing columns retain their original types.
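The evolution rules above — new columns are added, existing columns never change type — can be sketched as a schema merge. Names and type strings are illustrative:

```python
def evolve_schema(current: dict, batch: dict) -> dict:
    """Merge an incoming batch schema into the current table schema.

    Both are dicts of column name -> type name. New columns are added;
    existing columns keep their original types even if the batch differs.
    """
    evolved = dict(current)
    for column, col_type in batch.items():
        if column not in evolved:
            evolved[column] = col_type
    return evolved
```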
Maintenance settings
| Setting | Default | Description |
|---|---|---|
| Snapshot Retention | 7 days | How long to keep expired Iceberg snapshots and Delta versions before cleanup |
| Compaction Target | 128 MB | Target file size for compaction operations — smaller values produce more files but faster writes |
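Compaction rewrites many small files into fewer files near the target size. A rough sketch of the grouping step, assuming greedy bin-packing (the actual planning algorithm is not specified here):

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into compaction bins close to target_mb.

    Returns a list of groups (lists of sizes in MB). Files already at or
    above the target are left alone in their own group.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            groups.append([size])  # already big enough; don't rewrite
            continue
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

This is why a smaller compaction target produces more files: each bin fills up sooner.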
Reading your tables
API reference
The Managed Lakehouse API provides endpoints for table management, commit history, and maintenance operations.
List registered tables
Register a new table
View commit history
Trigger maintenance
Supported maintenance operations: snapshot_expiry, orphan_cleanup, compaction, metadata_cleanup, delta_checkpoint, full_maintenance.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| "Iceberg catalog not found" | Glue database doesn't exist or wrong region | Create the Glue database first; verify the region matches your bucket |
| "io scheme not registered" | Missing S3/GCS I/O driver | This is an internal error — contact support |
| "PermanentRedirect" on S3 | Region mismatch between bucket and Glue | Ensure the bucket region and Glue catalog region are the same |
| Delta commit failed, Iceberg succeeded | Transient storage error | The destination retries Delta once automatically. Check S3 permissions. |
| ”Access denied” on Glue | IAM credential lacks Glue permissions | Add glue:CreateTable, glue:UpdateTable, glue:GetTable to the IAM policy |
| Table visible in Iceberg but not Delta | Only Iceberg format enabled | Check the format selection — enable Delta in the node configuration |
| Slow writes | Many small batches | Increase pipeline batch size to 10,000+ rows |
Comparison with other destinations
| Feature | Managed Lakehouse | Delta Lake | Iceberg | Fabric / OneLake |
|---|---|---|---|---|
| Formats | Iceberg + Delta | Delta only | Iceberg only | Delta only |
| Cloud support | S3, GCS, Azure | S3, GCS, Azure | S3, GCS, Azure | OneLake only |
| Catalog | Glue, REST | None (file-based) | Glue, REST | Fabric |
| Write modes | Append, Overwrite, Merge | Append, Overwrite | Append, Overwrite, Merge | Append, Upsert |
| Time travel | Yes (Iceberg snapshots) | Yes (Delta versions) | Yes | Yes |
| Schema evolution | Yes | Yes | Yes | Yes |
| Dual-engine access | Yes | Delta engines only | Iceberg engines only | Fabric only |
| Tier | Professional+ | All tiers | Professional+ | All tiers |
Related topics
Delta Lake destination
Standalone Delta Lake destination for simpler single-format workflows.
Destination nodes
All destination node types including Write, Cloud Destination, and Iceberg.
Cloud storage
Configure S3, GCS, and Azure Blob connections used by the lakehouse.
Data contracts
Enforce schema and quality rules before data lands in your lakehouse.