Data cleaning rules normalize and enrich raw values during ingestion: they trim whitespace, standardize case, extract values with patterns, and enrich attributes (for example, mapping raw colors to the color taxonomy) before data lands in the silver layer.

Rule types

Cleaning operations come in a few families:
  • Standard — direct text transforms.
  • Regex boolean — derive a true/false from a pattern match.
  • Cross-table — enrich using another table.
  • Color taxonomy enrichment — map raw colors to the taxonomy.
Typical text tools include regex replace / extract / match / conditional replace, capitalize words, trim, normalize whitespace, uppercase, lowercase, and take-while-pattern.
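
To make the rule families concrete, here is a minimal Python sketch of what a few of these tools might do to a single value. It is not the platform's implementation: every function name, the `COLOR_TAXONOMY` mapping, and the reading of take-while-pattern (keep leading characters for as long as each one matches) are assumptions made for illustration.

```python
import re

# Hypothetical taxonomy used only for this sketch; the real taxonomy
# lives in the platform and will differ.
COLOR_TAXONOMY = {"navy": "Blue", "crimson": "Red", "olive": "Green"}

def normalize_whitespace(value: str) -> str:
    """Standard tool: collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()

def capitalize_words(value: str) -> str:
    """Standard tool: upper-case the first letter of each word."""
    return " ".join(word.capitalize() for word in value.split(" "))

def regex_extract(value: str, pattern: str, group: int = 0):
    """Standard tool: return the first match (or capture group), else None."""
    match = re.search(pattern, value)
    return match.group(group) if match else None

def regex_match(value: str, pattern: str) -> bool:
    """Regex boolean: derive true/false from whether the pattern matches."""
    return re.search(pattern, value) is not None

def take_while_pattern(value: str, char_pattern: str) -> str:
    """Assumed semantics: keep leading characters while each one matches."""
    kept = []
    for ch in value:
        if not re.fullmatch(char_pattern, ch):
            break
        kept.append(ch)
    return "".join(kept)

def enrich_color(raw_color: str):
    """Color taxonomy enrichment: map a raw color to its taxonomy entry."""
    return COLOR_TAXONOMY.get(normalize_whitespace(raw_color).lower())
```

For example, under these assumptions `normalize_whitespace("  Navy   Blue \t")` yields `"Navy Blue"` and `enrich_color(" NAVY ")` yields `"Blue"`. A cross-table rule would follow the same lookup idea as `enrich_color`, with the mapping drawn from another table instead of a fixed dictionary.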

Working with data cleaning

1. Open data cleaning

Go to Data platform → Configuration → Data cleaning.

2. Create a rule

Choose the operation type and tool, configure the pattern/parameters, and target a column.
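
As a rough mental model, a rule combines an operation type, a tool, its parameters, and a target column. The sketch below expresses that in Python; the `CleaningRule` class, its field names, the `TOOLS` registry, and `apply_rule` are all hypothetical and do not reflect the platform's actual configuration schema.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CleaningRule:
    # All field names are assumptions for this sketch.
    operation_type: str          # e.g. "standard"
    tool: str                    # e.g. "regex_replace"
    column: str                  # column the rule targets
    params: dict = field(default_factory=dict)

# Hypothetical tool registry: each tool takes (value, params) and returns a value.
TOOLS = {
    "trim": lambda v, p: v.strip(),
    "uppercase": lambda v, p: v.upper(),
    "lowercase": lambda v, p: v.lower(),
    "regex_replace": lambda v, p: re.sub(p["pattern"], p["replacement"], v),
}

def apply_rule(rule: CleaningRule, rows: list) -> list:
    """Return copies of the rows with the rule applied to its target column."""
    tool = TOOLS[rule.tool]
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        value = new_row.get(rule.column)
        if isinstance(value, str):  # skip missing or non-text values
            new_row[rule.column] = tool(value, rule.params)
        cleaned.append(new_row)
    return cleaned

# Example: strip every non-alphanumeric character from a "sku" column.
rule = CleaningRule(
    operation_type="standard",
    tool="regex_replace",
    column="sku",
    params={"pattern": r"[^A-Za-z0-9]", "replacement": ""},
)
rows = apply_rule(rule, [{"sku": "ab-12 3", "color": "red"}])
```

Keeping the pattern and replacement in `params` rather than in code mirrors the idea that a rule is configuration: the same tool can be reused on different columns with different parameters.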

3. Validate it safely

Test the effect on a file in the sandbox before relying on it in production ingestion.
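
The same kind of check can be approximated offline before uploading anything. The sketch below is an assumption-laden stand-in for the sandbox, not a description of it: `preview_changes` is a hypothetical helper that applies a transform to one column of a sample CSV and reports only the values that would change.

```python
import csv
import io

def preview_changes(csv_text: str, column: str, transform) -> list:
    """List (file_line, before, after) for each value the transform would change."""
    changes = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        before = row[column]
        after = transform(before)
        if before != after:
            changes.append((line_no, before, after))
    return changes

# Made-up sample data for illustration only.
sample = "id,color\n1, Navy \n2,red\n3,GREEN\n"
changes = preview_changes(sample, "color", lambda v: v.strip().lower())

for line_no, before, after in changes:
    print(f"line {line_no}: {before!r} -> {after!r}")
```

Reviewing a before/after list like this makes it easier to spot a pattern that matches more than intended before the rule touches production ingestion.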