Overview

The Type Inference transform automatically detects and casts data types from string values. When data arrives as strings — common with CSV files, flat file sources, or text-based APIs — this transform analyzes each value and converts it to the appropriate native type (integer, float, boolean, date, or JSON).

When to Use

  • After CSV or flat file sources where all columns arrive as strings
  • After REST API sources that return numeric values as quoted strings
  • Before Aggregation or Window Function nodes that require properly typed numeric columns
  • When migrating from legacy systems that store everything as varchar

Type Detection Order

Values are tested in this priority order (first match wins):
Priority  Type       Examples
1         Boolean    true, false, TRUE, FALSE
2         Null       null, nil, "" (empty string)
3         Integer    42, -100, 0
4         Float      3.14, -0.5, 1.23e4
5         Date/Time  2024-01-15, 2024-01-15T10:30:00Z
6         JSON       {"key": "value"}, [1, 2, 3]
7         String     Everything else (fallback)
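The first-match-wins chain above can be sketched as a sequence of parse attempts. This is a minimal illustration in Go, not the engine's actual implementation; the function name inferType and the label strings are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// inferType tries each type in the documented priority order and
// returns a label for the first parse that succeeds. (Hypothetical
// sketch; the real transform returns typed values, not labels.)
func inferType(s string) string {
	switch strings.ToLower(s) {
	case "true", "false":
		return "boolean"
	case "null", "nil", "":
		return "null"
	}
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return "integer"
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return "float"
	}
	for _, layout := range []string{"2006-01-02", time.RFC3339} {
		if _, err := time.Parse(layout, s); err == nil {
			return "datetime"
		}
	}
	// Only treat objects/arrays as JSON; bare numbers were caught above.
	if (strings.HasPrefix(s, "{") || strings.HasPrefix(s, "[")) && json.Valid([]byte(s)) {
		return "json"
	}
	return "string" // fallback
}

func main() {
	for _, v := range []string{"TRUE", "", "42", "1.23e4", "2024-01-15", `[1, 2, 3]`, "hello"} {
		fmt.Printf("%q → %s\n", v, inferType(v))
	}
}
```

Note that the ordering matters: "42" is valid JSON and a valid float, but the integer check runs first, so it wins.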

Configuration

Field         Description                                                Default
Columns       Specific columns to infer types for (empty = all columns)  All columns
Date Formats  Custom date format patterns for non-standard formats       Standard formats

Built-in Date Formats

The following date formats are recognized automatically:
  • 2006-01-02 (ISO date)
  • 2006-01-02T15:04:05Z (ISO datetime / RFC 3339)
  • 2006-01-02 15:04:05 (datetime with space)
  • 01/02/2006 (US format)
  • 02-01-2006 (EU format)
Add custom patterns in the Date Formats field for non-standard sources.

Example

Input (all strings from CSV)

id    price    active   created
"1"   "29.99"  "true"   "2024-01-15"
"2"   "45"     "false"  "2024-01-16"

Output (typed values)

id (int64)  price (float64)  active (bool)  created (timestamp)
1           29.99            true           2024-01-15T00:00:00Z
2           45.0             false          2024-01-16T00:00:00Z

Pipeline Patterns

CSV Processing

CSV Source → Type Inference → Validation → Warehouse Write

Pre-Aggregation Typing

Source → Type Inference → Aggregate (numeric columns now work) → Dashboard

JSON Parsing Chain

Source → JSON Parser → Type Inference → Flatten → Destination

Legacy System Migration

Legacy DB (varchar) → Type Inference → Modern DB (proper types)

Troubleshooting

Wrong Type Detected

Problem: "123.0" detected as a string instead of float. Solution: Check for trailing whitespace. Add a Cleanse or trim step upstream.

Date Not Parsed

Problem: "15/01/2024" not recognized as a date. Solution: Add the custom format 02/01/2006 to the Date Formats configuration.

Integer Overflow

Problem: Very large numbers fail to parse. Solution: Values are parsed as int64 (max ~9.2 quintillion). Values exceeding this range fall through to float or string.

Tips

  • Apply early in the pipeline — before transforms that require specific types (aggregation, calculations, window functions)
  • Specify columns for large datasets to avoid unnecessary type detection overhead on columns you know are strings
  • Empty strings become null — plan for this in downstream logic
  • Combine with Validation — after type inference, validate that inferred types match your expected schema

Column Transforms

Date formatting, windowing, and schema mapping

Parsers

Parse raw CSV/JSON before type inference

Data Quality

Validate types and values after inference

Schema Mapping

Explicitly map and rename columns after typing