Overview
The Managed Lakehouse destination works best when data is cleaned, validated, and shaped before it lands. This guide covers the recommended transform pipeline and how contract enforcement gates data quality at the write boundary.

Recommended Transforms
1. Filter
Remove rows that should never reach the lakehouse — test records, internal traffic, incomplete events.

2. Cleanse
Standardize formats, trim whitespace, normalize casing, and coerce types before writing to Parquet.

| Transform | Example |
|---|---|
| Trim whitespace | " John " → "John" |
| Normalize email | "USER@Example.COM" → "user@example.com" |
| Coerce types | "123" → 123 (string to integer) |
| Default nulls | NULL → "unknown" for required string fields |
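The cleanse rules in the table above can be sketched as a single record-level function. This is a minimal illustration, not the product's transform API; the field names (`email`, `age`, `country`) are placeholders:

```python
def cleanse(record: dict) -> dict:
    """Illustrative cleanse step: trim, normalize, coerce, default."""
    out = dict(record)
    # Trim whitespace on every string value
    for key, value in out.items():
        if isinstance(value, str):
            out[key] = value.strip()
    # Normalize email casing
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].lower()
    # Coerce numeric strings to integers
    if isinstance(out.get("age"), str) and out["age"].isdigit():
        out["age"] = int(out["age"])
    # Default nulls for required string fields (field list is illustrative)
    for field in ("country",):
        if out.get(field) is None:
            out[field] = "unknown"
    return out
```

Running the function on one messy record applies all four rules at once, e.g. `cleanse({"name": " John ", "email": "USER@Example.COM", "age": "123", "country": None})` yields a trimmed name, lowercased email, integer age, and `"unknown"` country.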
3. Deduplicate
Remove duplicate records within a batch using a deduplication key.

4. PII Detection
Identify and mask personally identifiable information before landing in the lakehouse:

- Hash email addresses and phone numbers
- Redact free-text fields containing detected PII
- Tag columns with sensitivity classifications
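The first two bullets can be sketched as follows. This is a simplified stand-in for the product's PII detection: it hashes two known identifier fields and redacts only email-shaped strings in one free-text field, whereas real detection covers far more patterns:

```python
import hashlib
import re

# Simple email pattern for illustration only; production PII detection
# uses broader rules than a single regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_pii(record: dict) -> dict:
    out = dict(record)
    # Hash direct identifiers: unreadable, but still joinable across tables
    for field in ("email", "phone"):
        if isinstance(out.get(field), str):
            out[field] = hashlib.sha256(out[field].encode("utf-8")).hexdigest()
    # Redact email addresses detected inside a free-text field
    if isinstance(out.get("notes"), str):
        out["notes"] = EMAIL_RE.sub("[REDACTED]", out["notes"])
    return out
```

Hashing (rather than deleting) identifiers is a common choice because it preserves the ability to join or deduplicate on the masked value.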
5. Contract Validation
Assign a data contract to the Managed Lakehouse node to validate every record before writing. See the contract enforcement guide for full details.

Contract Enforcement at the Write Boundary
When a contract is assigned to a Managed Lakehouse destination:

| Mode | Valid Records | Invalid Records |
|---|---|---|
| Warn | Written to lakehouse | Written to lakehouse + violations logged |
| Block | Written to lakehouse | Routed to Dead Letter Queue |
What’s Validated
- Schema: Column names, types, nullability
- Value rules: Range checks, enum membership, regex patterns
- Freshness: Timestamp recency constraints
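The three rule categories above can be sketched as a per-record validator. The contract format, rule names, and thresholds here are assumptions for illustration, not the product's actual contract schema:

```python
import re
from datetime import datetime, timedelta, timezone

def validate(record: dict) -> list:
    """Return a list of violation messages; empty means the record passes."""
    violations = []
    # Schema: column presence, type, nullability
    if not isinstance(record.get("user_id"), int):
        violations.append("schema: user_id must be a non-null integer")
    # Value rules: range check, enum membership, regex pattern
    score = record.get("score")
    if not isinstance(score, int) or not 0 <= score <= 100:
        violations.append("value: score out of range [0, 100]")
    if record.get("status") not in {"active", "inactive"}:
        violations.append("value: status not in allowed set")
    if not re.fullmatch(r"[A-Z]{2}", record.get("country") or ""):
        violations.append("value: country must match [A-Z]{2}")
    # Freshness: event timestamp must be recent (24h window is illustrative)
    ts = record.get("event_time")
    if not isinstance(ts, datetime) or datetime.now(timezone.utc) - ts > timedelta(hours=24):
        violations.append("freshness: event_time older than 24h")
    return violations
```

A passing record returns an empty list; each failed rule contributes one message, which maps naturally onto the per-rule violation tracking described below.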
Violation Tracking
Violations appear in two places:

- Data Contracts page — full violation history with sample data
- Pipeline run summary — violation count badge on the destination node
In Block mode, rejected records appear on the DLQ page with error code CONTRACT_VIOLATION, including the original payload and the specific rule that failed.
DLQ Routing for Failed Records
When contract enforcement is in Block mode:
- The destination splits each batch into valid and invalid records
- Valid records are committed to the lakehouse
- Invalid records are written to the DLQ with:
- Original record payload
- Contract ID and version
- Failed rule name and expected vs actual values
- Pipeline run ID for traceability
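The routing steps above can be sketched as a batch-splitting function. The function and field names are illustrative, not the destination's internal API; only the `CONTRACT_VIOLATION` error code and the four metadata fields come from the docs above:

```python
def route_batch(batch, check, contract_id, contract_version, run_id):
    """Split a batch into (valid, dlq) lists in Block mode.

    `check(record)` is a stand-in for contract validation: it returns
    None when the record passes, or the name of the failed rule.
    """
    valid, dlq = [], []
    for record in batch:
        failed_rule = check(record)
        if failed_rule is None:
            valid.append(record)          # committed to the lakehouse
        else:
            dlq.append({                  # routed to the Dead Letter Queue
                "error_code": "CONTRACT_VIOLATION",
                "payload": record,                    # original record payload
                "contract_id": contract_id,           # contract ID and version
                "contract_version": contract_version,
                "failed_rule": failed_rule,           # which rule failed
                "pipeline_run_id": run_id,            # traceability
            })
    return valid, dlq
```

Keeping the original payload alongside the failure metadata is what makes later triage and reprocessing from the DLQ possible without re-running the source extraction.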
Pipeline Templates
A prebuilt pipeline template is available in the Template Library: Source → Contract → Lakehouse. This template provides:

- Pre-configured source extraction
- Data contract validation node
- Managed Lakehouse destination with recommended settings
- Automatic partition strategy via the Partition Advisor
Ordering Transforms
Place transforms in this order for optimal results:

| Order | Transform | Why |
|---|---|---|
| 1 | Filter | Reduce volume early |
| 2 | Cleanse / type coercion | Normalize before validation |
| 3 | Deduplicate | Remove duplicates before contract check |
| 4 | PII detection / masking | Protect sensitive data before storage |
| 5 | Contract validation | Final quality gate |
| 6 | Z-order sort | Applied automatically at the destination |
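The ordering in the table can be sketched end to end as a simple function pipeline. Each stage here is a deliberately tiny stub standing in for the corresponding product transform (and step 6 is omitted because the destination applies it automatically):

```python
def run_pipeline(records):
    # 1. Filter: drop test records early to reduce volume
    records = [r for r in records if not r.get("is_test")]
    # 2. Cleanse: normalize before validation (trim name as a stand-in)
    records = [{**r, "name": r.get("name", "").strip()} for r in records]
    # 3. Deduplicate: keep the first record per key, before the contract check
    seen, deduped = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            deduped.append(r)
    # 4. PII masking: protect sensitive fields before storage (stub)
    records = [{**r, "email": "***"} for r in deduped]
    # 5. Contract validation: final quality gate (stub rule: id is required)
    records = [r for r in records if r.get("id") is not None]
    return records
```

The ordering matters: filtering first shrinks every later stage, cleansing before validation avoids spurious contract failures on fixable formatting issues, and masking before the write ensures no raw PII ever reaches storage.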
Related
- Contract Enforcement: full guide to contract configuration and the Partition Advisor
- Dead Letter Queue: triage and reprocess contract-failed records
- Z-Order Sort: multi-dimensional sort applied at the destination
- Data Quality Nodes: general data quality transforms