
Overview

The Anomaly Detection transform identifies statistical outliers in numeric data columns. It supports multiple detection methods and configurable actions — from flagging anomalies for review to automatically removing or capping them. Available on Professional plans and above.

Detection Methods

| Method | Algorithm | Best For |
|---|---|---|
| Z-Score | Measures standard deviations from the mean | Normally distributed data |
| IQR | Uses Q1/Q3 interquartile range boundaries | Skewed distributions; robust to extreme outliers |
| Percentile | Flags values outside a percentile range | General-purpose, without distribution assumptions |
| Modified Z-Score | Uses the median absolute deviation (MAD) | Data with extreme outliers that distort the mean |
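The mean-based and MAD-based methods can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the product's implementation; the function names are our own:

```python
from statistics import mean, stdev, median

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean.
    Assumes non-constant input (stdev > 0)."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]

def modified_zscore_outliers(values, threshold=3.5):
    """Flag values via the median absolute deviation (MAD), which stays
    stable even when an extreme outlier distorts the mean."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 0.6745 rescales MAD so the score is comparable to a z-score
    return [0.6745 * abs(v - med) / mad > threshold for v in values]
```

Note the masking effect this illustrates: a single huge outlier inflates the mean and standard deviation, so the plain z-score may need a lower threshold to catch the very outlier that the modified z-score flags easily.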

Configuration

| Field | Description | Default |
|---|---|---|
| Columns | Numeric columns to analyze for anomalies | (required) |
| Method | Statistical detection algorithm | Z-Score |
| Threshold | Sensitivity level (method-dependent, see below) | 3.0 |
| Window Size | Number of recent rows for rolling statistics | All rows |

Threshold Interpretation

| Method | Threshold Meaning | Typical Values |
|---|---|---|
| Z-Score | Number of standard deviations | 2.0 (95%), 3.0 (99.7%) |
| IQR | Multiplier of the interquartile range | 1.5 (standard), 3.0 (extreme only) |
| Percentile | Upper and lower percentile bounds | 1.0 and 99.0 |
| Modified Z-Score | MAD-based deviation threshold | 3.5 (recommended) |
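To make the IQR and percentile thresholds concrete, here is how each setting translates into numeric bounds. A minimal sketch with assumed helper names; a real implementation may interpolate percentiles differently:

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Fences at Q1 - k*IQR and Q3 + k*IQR.
    k=1.5 is the standard fence; k=3.0 flags only extreme outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def percentile_bounds(values, lower=1.0, upper=99.0):
    """Nearest-rank percentile bounds; values outside them are flagged."""
    s = sorted(values)
    def pct(p):
        return s[round(p / 100 * (len(s) - 1))]
    return pct(lower), pct(upper)
```

Raising `k` (or widening the percentile range) widens the bounds, so fewer values are flagged; this is the sense in which lower thresholds mean higher sensitivity.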

Actions

| Action | Behavior |
|---|---|
| Flag | Adds a boolean `<column>_anomaly` column, set to `true` on anomalous rows |
| Remove | Drops anomalous rows from the output |
| Cap | Clamps anomalous values to the boundary thresholds |
| Replace | Sets anomalous values to NULL |
Start with the Flag action to review detected anomalies before committing to removal or capping. Switch to a destructive action only after confirming the detection method and threshold match your data.
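Each action reduces to a simple list operation. The sketch below illustrates the semantics; the function name and the paired-tuple output for Flag are our own illustrative choices, not the product's API:

```python
def apply_action(values, flags, action, bounds=None):
    """Apply one of the four anomaly actions to a column of values,
    given a parallel list of boolean anomaly flags."""
    if action == "flag":
        # keep every row, pairing each value with its anomaly marker
        return list(zip(values, flags))
    if action == "remove":
        # drop flagged rows entirely
        return [v for v, f in zip(values, flags) if not f]
    if action == "cap":
        # clamp flagged values into [lo, hi]; others pass through
        lo, hi = bounds
        return [min(max(v, lo), hi) if f else v for v, f in zip(values, flags)]
    if action == "replace":
        # null out flagged values, preserving row count
        return [None if f else v for v, f in zip(values, flags)]
    raise ValueError(f"unknown action: {action}")
```

Note that Remove changes the row count while Cap and Replace preserve it, which matters for downstream joins and aggregates.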

Examples

Financial Data Quality

Flag transactions with extreme amounts before loading to the warehouse:
Transactions Source → Anomaly Detection [amount, Z-Score, threshold=3.0, Flag] → Warehouse
Query flagged rows:
SELECT * FROM transactions WHERE amount_anomaly = true;

Sensor Data Cleaning

Cap extreme sensor readings to prevent dashboard spikes:
IoT Source → Anomaly Detection [temperature, IQR, threshold=1.5, Cap] → Time-Series DB

Rolling Window Detection

Detect anomalies based on recent trends rather than all-time statistics:
Metrics Source → Anomaly Detection [latency_ms, Z-Score, window=1000] → Alert
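Rolling detection scores each point against statistics of only the most recent window, so the baseline adapts as the distribution drifts. A standard-library sketch of the idea (an assumption about how windowing behaves, not the product's internals):

```python
from collections import deque
from statistics import mean, stdev

def rolling_zscore_flags(stream, window=1000, threshold=3.0):
    """Flag each point against the mean/stdev of the preceding
    `window` points rather than all-time statistics."""
    recent = deque(maxlen=window)  # automatically evicts old points
    flags = []
    for v in stream:
        if len(recent) >= 2:
            mu, sigma = mean(recent), stdev(recent)
            flags.append(sigma > 0 and abs(v - mu) / sigma > threshold)
        else:
            flags.append(False)  # not enough history to judge yet
        recent.append(v)
    return flags
```

A gradual upward trend in latency shifts the window's mean with it, so only sudden spikes relative to recent behavior are flagged.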

Pipeline Patterns

Review-then-Clean

Source → Anomaly Detection [Flag] → Filter (keep anomalies) → Review Table
Source → Anomaly Detection [Flag] → Filter (remove anomalies) → Production Table

Multi-Column Analysis

Apply detection to multiple numeric columns in one node:
Source → Anomaly Detection [price, quantity, discount — IQR, threshold=1.5] → Destination
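One reasonable semantics for multi-column detection is to score each configured column independently and treat a row as anomalous if any column is flagged. A hypothetical sketch of that policy; the source does not specify how the product combines per-column results:

```python
def detect_multi(rows, columns, detect_fn):
    """Run `detect_fn` (values -> list of booleans) on each configured
    column; a row is anomalous if any of its columns is flagged."""
    col_flags = {}
    for col in columns:
        values = [row[col] for row in rows]
        col_flags[col] = detect_fn(values)
    # OR the per-column flags together, row by row
    return [any(col_flags[c][i] for c in columns) for i in range(len(rows))]
```

Because each column gets its own statistics, a single threshold setting can behave very differently across columns with different scales; robust methods like IQR reduce that sensitivity.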

Tips

  • Z-Score assumes normally distributed data — verify with Data Profiling first
  • IQR is more robust to extreme outliers than Z-Score because it uses quartiles rather than mean
  • Lower thresholds flag more anomalies (higher sensitivity, more false positives)
  • Window Size enables rolling statistics — useful for time-series data where the distribution shifts over time
  • Combine with Validation nodes for comprehensive quality: Anomaly Detection catches statistical outliers while Validation catches rule-based violations

Related

  • Data Quality Nodes — all data quality transforms, including validation and profiling
  • Data Contracts — rule-based validation at the contract level
  • Observability — monitor anomaly rates across pipeline runs
  • Column Transforms — additional column-level operations