Promotion shape
| Field | Meaning |
|---|---|
| dataset_id | Logical dataset name. |
| domain | Logical domain (e.g. catalog, sales, inventory). |
| conformed_table | The target silver/gold table. |
| source_table / source_tables | The source table(s) this promotion reads. |
| merge_keys | Columns that identify a row for upsert. |
| update_columns | On upsert, which columns to update (null/omitted = all). |
| steps | The ordered pipeline (below). |
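For orientation, the shape can be sketched as a Python dict. This is a hypothetical example. The table above supplies the field names, but the values, the use of a "type" key for each step, and the check_promotion helper (including which fields it treats as mandatory) are assumptions, not the tool's own validator.

```python
from typing import Any

# Hypothetical promotion. Field names follow the shape table; every value is made up,
# and "type" as the step-kind key is an assumption.
promotion: dict[str, Any] = {
    "dataset_id": "stores",
    "domain": "catalog",
    "conformed_table": "silver.stores",
    "source_table": "bronze.stores_raw",
    "merge_keys": ["store_code"],
    "update_columns": None,  # None/omitted: update every column on a match
    "steps": [
        {"type": "filter", "condition": "store_code IS NOT NULL"},
        {"type": "generate_id", "key_columns": ["store_code"]},
    ],
}

# Assumption: which fields are mandatory.
REQUIRED_FIELDS = ("dataset_id", "domain", "conformed_table", "steps")


def check_promotion(p: dict[str, Any]) -> list[str]:
    """Hypothetical shape check: returns a list of problems (empty if none found)."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in p]
    # Assumption: exactly one of source_table / source_tables names the input.
    if ("source_table" in p) == ("source_tables" in p):
        problems.append("set exactly one of source_table / source_tables")
    for i, step in enumerate(p.get("steps", [])):
        if not isinstance(step, dict) or "type" not in step:
            problems.append(f"step {i} has no type")
    return problems


print(check_promotion(promotion))  # []
```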
Write behavior
Promotions write with one of three modes: merge (upsert on merge_keys), append
(insert only), or overwrite_partition (replace a partition). In practice, a promotion
that declares merge_keys upserts on those keys; update_columns narrows which columns are
written on a match (e.g. only enrich postal_code without touching the rest of the row).
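A minimal in-memory sketch of the three write modes follows, assuming rows are Python dicts and the target is a list. It is not the engine's implementation; it only illustrates how merge_keys and update_columns interact on a match. The write_rows function and its partition_column argument are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def write_rows(target: list[Row], incoming: list[Row], mode: str,
               merge_keys: Optional[list[str]] = None,
               update_columns: Optional[list[str]] = None,
               partition_column: Optional[str] = None) -> list[Row]:
    """Return the target's new contents; `target` itself is not modified."""
    if mode == "append":
        # Insert only: existing rows are never touched.
        return [dict(r) for r in target] + [dict(r) for r in incoming]
    if mode == "overwrite_partition":
        # Drop every target row in a partition present in the batch, then add the batch.
        replaced = {r[partition_column] for r in incoming}
        kept = [dict(r) for r in target if r[partition_column] not in replaced]
        return kept + [dict(r) for r in incoming]
    if mode == "merge":
        def key(r: Row) -> tuple:
            return tuple(r[k] for k in merge_keys)

        result = [dict(r) for r in target]
        index = {key(r): r for r in result}
        for row in incoming:
            match = index.get(key(row))
            if match is None:
                # No row with these merge_keys yet: insert the whole incoming row.
                new_row = dict(row)
                result.append(new_row)
                index[key(new_row)] = new_row
            else:
                # On a match, write only update_columns (None means all columns).
                columns = update_columns if update_columns is not None else list(row)
                for c in columns:
                    match[c] = row[c]
        return result
    raise ValueError(f"unknown write mode: {mode!r}")


target = [{"store_code": "S1", "name": "Main St", "postal_code": None}]
incoming = [
    {"store_code": "S1", "name": "Main Street", "postal_code": "10115"},
    {"store_code": "S2", "name": "Harbor", "postal_code": "20095"},
]
merged = write_rows(target, incoming, "merge",
                    merge_keys=["store_code"], update_columns=["postal_code"])
print(merged[0])  # {'store_code': 'S1', 'name': 'Main St', 'postal_code': '10115'}
```

With update_columns=["postal_code"], the matched row keeps its original name; dropping that argument would overwrite every column on a match.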
Ordering
Steps run top to bottom. Each step sees the output of the previous one, so order matters: for example, add or rename columns before generate_id so its key columns exist, and run id_mapping_output
only after the id exists.
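Because each step consumes the previous step's output, a pipeline behaves like a left fold over its steps. The sketch below models steps as plain Python functions over lists of dict rows (an illustrative assumption, not the engine's representation); rename and needs are hypothetical helpers.

```python
from functools import reduce
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


def run_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    # Top to bottom: each step receives exactly what the previous step returned.
    return reduce(lambda acc, step: step(acc), steps, rows)


def rename(mapping: dict[str, str]) -> Step:
    return lambda rows: [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def needs(column: str) -> Step:
    """Stand-in for any step that reads `column`, e.g. generate_id over key_columns=[column]."""
    def step(rows: list[Row]) -> list[Row]:
        if any(column not in r for r in rows):
            raise KeyError(f"column {column!r} does not exist yet")
        return rows
    return step


raw = [{"StoreCode": "S1"}]
print(run_steps(raw, [rename({"StoreCode": "store_code"}), needs("store_code")]))
# [{'store_code': 'S1'}]
# Swapping the two steps raises KeyError: needs() would run before the rename.
```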
Step catalog
Every step is an object with a type field. Below, steps are grouped by purpose.
Shaping
coerce_types — cast columns to SQL types
filter — keep rows matching a condition
condition is a SQL WHERE expression (without the WHERE keyword).
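Since condition is plain SQL without the WHERE keyword, applying it amounts to appending it to a SELECT. This sketch uses Python's built-in sqlite3 as a stand-in engine; the apply_filter helper, the staged table, and its columns are hypothetical, and SQL dialect details will differ on the real engine. Splicing SQL strings like this is only acceptable because the condition comes from a trusted spec.

```python
import sqlite3


def apply_filter(conn: sqlite3.Connection, table: str, condition: str) -> list[tuple]:
    # The spec supplies only the expression; the step adds the WHERE keyword itself.
    # Built by string formatting, so `table` and `condition` must come from a trusted spec.
    return conn.execute(f"SELECT * FROM {table} WHERE {condition}").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged (store_code TEXT, country TEXT, active INTEGER)")
conn.executemany("INSERT INTO staged VALUES (?, ?, ?)",
                 [("S1", "DE", 1), ("S2", "DE", 0), ("S3", "FR", 1)])

print(apply_filter(conn, "staged", "country = 'DE' AND active = 1"))  # [('S1', 'DE', 1)]
```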
deduplicate — remove duplicate rows
strategy: drop (default), keep_first, or keep_last. order_by / order_desc
decide which row survives for keep_first/keep_last.
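A sketch of keep_first and keep_last follows, assuming duplicates are identified by a list of key columns (the keys parameter is hypothetical) and that order_desc reverses the order_by sort. The default drop strategy is omitted because its exact semantics are not spelled out here.

```python
from typing import Any, Optional

Row = dict[str, Any]


def deduplicate(rows: list[Row], keys: list[str], strategy: str,
                order_by: Optional[str] = None, order_desc: bool = False) -> list[Row]:
    """keep_first / keep_last only; `keys` (the duplicate-defining columns) is hypothetical."""
    if strategy not in ("keep_first", "keep_last"):
        raise ValueError("this sketch covers keep_first and keep_last only")
    # Stable sort by order_by (descending if order_desc); otherwise keep input order.
    ordered = sorted(rows, key=lambda r: r[order_by], reverse=order_desc) if order_by else list(rows)
    if strategy == "keep_last":
        ordered.reverse()  # walking backwards, the first row seen per key is the last one
    survivors: dict[tuple, Row] = {}
    for r in ordered:
        survivors.setdefault(tuple(r[k] for k in keys), r)  # keep the first row seen per key
    return list(survivors.values())


rows = [
    {"sku": "A", "price": 10, "seen": 1},
    {"sku": "A", "price": 12, "seen": 3},
    {"sku": "B", "price": 7, "seen": 2},
]
print([r["price"] for r in deduplicate(rows, ["sku"], "keep_last", order_by="seen")])  # [12, 7]
```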
add_column — add a computed column
expression is any SQL scalar expression (literals, functions, CASE, references to
other columns).
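add_column and coerce_types both map onto ordinary SQL: a type coercion becomes CAST, and an added column becomes one more expression in the SELECT list. This again uses sqlite3 as a stand-in, with hypothetical helper, table, and column names; SQLite's type affinity rules are looser than most warehouses', so treat it as an illustration of the idea only.

```python
import sqlite3


def coerce_types(conn: sqlite3.Connection, src: str, dst: str, types: dict[str, str]) -> None:
    # Each listed column is wrapped in CAST(... AS type); other columns pass through.
    cols = [info[1] for info in conn.execute(f"PRAGMA table_info({src})")]
    items = [f"CAST({c} AS {types[c]}) AS {c}" if c in types else c for c in cols]
    conn.execute(f"CREATE TABLE {dst} AS SELECT {', '.join(items)} FROM {src}")


def add_column(conn: sqlite3.Connection, src: str, dst: str, name: str, expression: str) -> None:
    # Any scalar SQL expression over existing columns becomes a new column.
    conn.execute(f"CREATE TABLE {dst} AS SELECT *, {expression} AS {name} FROM {src}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (sku TEXT, price TEXT, qty TEXT)")
conn.executemany("INSERT INTO raw VALUES (?, ?, ?)", [("A", "9.5", "3"), ("B", "2", "0")])

coerce_types(conn, "raw", "typed", {"price": "REAL", "qty": "INTEGER"})
add_column(conn, "typed", "enriched", "stock_status",
           "CASE WHEN qty > 0 THEN 'in_stock' ELSE 'out_of_stock' END")
print(conn.execute("SELECT * FROM enriched ORDER BY sku").fetchall())
# [('A', 9.5, 3, 'in_stock'), ('B', 2.0, 0, 'out_of_stock')]
```

Running the coercion first matters here: the CASE compares qty numerically only because coerce_types already turned it into an integer.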
rename_columns — rename columns
Maps source → target names.
select_columns — keep only these columns
drop_columns — remove these columns
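rename_columns, select_columns, and drop_columns are plain projections. In the dict-of-rows sketch below, the function names mirror the step types, but the argument shapes (a source → target dict and lists of column names) are assumptions.

```python
from typing import Any

Row = dict[str, Any]


def rename_columns(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    # source -> target; columns absent from the mapping keep their names
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def select_columns(rows: list[Row], columns: list[str]) -> list[Row]:
    # keep only these columns, in the listed order
    return [{c: r[c] for c in columns} for r in rows]


def drop_columns(rows: list[Row], columns: list[str]) -> list[Row]:
    dropped = set(columns)
    return [{k: v for k, v in r.items() if k not in dropped} for r in rows]


rows = [{"StoreNo": "S1", "Name": "Main St", "_ingest_ts": 1700000000}]
renamed = rename_columns(rows, {"StoreNo": "store_code", "Name": "name"})
print(select_columns(renamed, ["store_code", "name"]))  # [{'store_code': 'S1', 'name': 'Main St'}]
print(drop_columns(renamed, ["_ingest_ts"]))            # [{'store_code': 'S1', 'name': 'Main St'}]
```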
inject_value — inject a runtime value
Adds a runtime value (e.g. organization_id, created_by) to every row as a literal column.
Joining
join — join another dataset
join_type: left (default), inner, outer, right. select_columns picks (and
optionally aliases) columns from the joined dataset.
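A left or inner join with an aliasing select_columns can be sketched in plain Python. The join_column parameter name and the dict form of select_columns (joined column → output alias) are assumptions; outer and right joins are left out for brevity.

```python
from typing import Any

Row = dict[str, Any]


def join(rows: list[Row], other: list[Row], join_column: str,
         select_columns: dict[str, str], join_type: str = "left") -> list[Row]:
    """Join `other` onto `rows`; select_columns maps other's column -> output alias."""
    if join_type not in ("left", "inner"):
        raise ValueError("this sketch covers left and inner joins only")
    index: dict[Any, list[Row]] = {}
    for o in other:
        index.setdefault(o[join_column], []).append(o)
    out: list[Row] = []
    for row in rows:
        key = row.get(join_column)
        # As in SQL, a NULL (None) key never matches anything.
        matches = index.get(key, []) if key is not None else []
        for m in matches:
            out.append({**row, **{alias: m[col] for col, alias in select_columns.items()}})
        if not matches and join_type == "left":
            # A left join keeps unmatched rows, with NULLs for the selected columns.
            out.append({**row, **{alias: None for alias in select_columns.values()}})
    return out


stores = [{"store_code": "S1"}, {"store_code": "S9"}]
regions = [{"store_code": "S1", "name": "North"}]
print(join(stores, regions, "store_code", {"name": "region_name"}))
# [{'store_code': 'S1', 'region_name': 'North'}, {'store_code': 'S9', 'region_name': None}]
```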
sequential_join — flexible join with fallback & expressions
Adds two options over join: a fallback_join_column (a COALESCE-style secondary
key) and select_expressions (custom SQL expressions selected from the target).
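One way to express the fallback in SQL is to join twice and COALESCE the results, so the secondary key is consulted only when the primary lookup finds nothing. Whether the real step falls back only on NULL keys or on any non-match is not stated here; this sqlite3 sketch assumes the latter, and its tables and columns are hypothetical. The UPPER(...) call stands in for a select_expressions entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, store_code TEXT, legacy_code TEXT);
    CREATE TABLE stores (store_code TEXT, legacy_code TEXT, name TEXT);
    INSERT INTO sales VALUES (1, 'S1', NULL), (2, 'S404', 'L2'), (3, NULL, 'L9');
    INSERT INTO stores VALUES ('S1', 'L1', 'Main St'), ('S2', 'L2', 'Harbor');
""")

# Join on store_code first; when that finds nothing, fall back to legacy_code.
# The selected value is a custom SQL expression over the target's columns.
resolved = conn.execute("""
    SELECT s.sale_id,
           UPPER(COALESCE(p.name, f.name)) AS store_name
    FROM sales AS s
    LEFT JOIN stores AS p ON p.store_code = s.store_code
    LEFT JOIN stores AS f ON f.legacy_code = s.legacy_code
    ORDER BY s.sale_id
""").fetchall()
print(resolved)  # [(1, 'MAIN ST'), (2, 'HARBOR'), (3, None)]
```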
aggregation_join — group + aggregate, then join
Groups and aggregates a dataset (function ∈ {avg, max, min, sum, count, first}),
then joins the result on join_column.
Reshaping
unpivot — wide to long
Turns { id, col1, col2, … } into { id, variable, value } rows.
variable_column/value_column name the output columns and default to variable/value.
Identity & taxonomy
generate_id — deterministic UUID from key columns
Derives the id from key_columns: the same key values produce the
same id across runs (idempotent), which is what makes re-ingestion safe. An optional
namespace scopes the derivation (UUID v5). Defaults: output_column = id,
key_columns = all non-null columns.
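Deterministic ids of this kind can be produced with Python's uuid.uuid5, which hashes a namespace UUID together with a name string. How the real step serializes key values into that name, and which namespace it uses by default, is not specified here, so the choices below (column=value pairs joined by "|", uuid.NAMESPACE_OID as the fallback) are assumptions and will not reproduce the real ids.

```python
import uuid
from typing import Any, Optional

# Hypothetical default namespace; the real step's default is not documented here.
DEFAULT_NAMESPACE = uuid.NAMESPACE_OID


def generate_id(row: dict[str, Any], key_columns: Optional[list[str]] = None,
                namespace: Optional[uuid.UUID] = None) -> uuid.UUID:
    if key_columns is None:
        # Documented default: all non-null columns (taken here in the row's column order).
        key_columns = [c for c, v in row.items() if v is not None]
    # Assumed serialization: column=value pairs joined by "|".
    name = "|".join(f"{c}={row[c]}" for c in key_columns)
    ns = namespace if namespace is not None else DEFAULT_NAMESPACE
    return uuid.uuid5(ns, name)


first_run = generate_id({"store_code": "S1", "name": "Main St"}, ["store_code"])
second_run = generate_id({"store_code": "S1", "name": "Renamed"}, ["store_code"])
print(first_run == second_run, first_run.version)  # True 5
```

Only the key columns feed the hash, so a changed name does not change the id, while a different namespace would.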
id_mapping_output — persist a code → id mapping
Persists the code_column → id_column mapping under mapping_type, so later
promotions (in this or another spec) can resolve foreign keys against it.
id_mapping_join — resolve an FK via a persisted mapping
Looks source_column up in the mapping created by an earlier id_mapping_output and
writes the resolved id into output_column.
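The id_mapping_output / id_mapping_join pair amounts to a persisted lookup table keyed by mapping_type and code. The sketch keeps that table in an in-memory dict (the real mapping is persisted by the platform) to show the flow: a stores promotion publishes its codes, then a sales promotion resolves its foreign key. The function shapes, the MAPPINGS store, pass-through rows, and NULL-on-miss behavior are assumptions.

```python
from typing import Any

Row = dict[str, Any]
# Hypothetical stand-in for the persisted mapping store: (mapping_type, code) -> id.
MAPPINGS: dict[tuple[str, Any], Any] = {}


def id_mapping_output(rows: list[Row], mapping_type: str,
                      code_column: str, id_column: str = "id") -> list[Row]:
    for r in rows:
        MAPPINGS[(mapping_type, r[code_column])] = r[id_column]
    return rows  # assumption: the step publishes a side output and passes rows through


def id_mapping_join(rows: list[Row], mapping_type: str,
                    source_column: str, output_column: str) -> list[Row]:
    # Unknown codes resolve to None (NULL) here; the real step may behave differently.
    return [{**r, output_column: MAPPINGS.get((mapping_type, r[source_column]))} for r in rows]


# Promotion 1 (stores) publishes the mapping after generate_id has produced `id`.
id_mapping_output([{"store_code": "S1", "id": "id-s1"}], "store", "store_code")
# Promotion 2 (sales) resolves its foreign key against it.
sales = id_mapping_join([{"sale_id": 1, "store_code": "S1"}, {"sale_id": 2, "store_code": "S7"}],
                        "store", "store_code", "store_id")
print(sales)
# [{'sale_id': 1, 'store_code': 'S1', 'store_id': 'id-s1'}, {'sale_id': 2, 'store_code': 'S7', 'store_id': None}]
```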
taxonomy_mapping — resolve a label to a taxonomy id
source_id_column optionally provides the POS reference for disambiguation.
Sub-option reference
| Enum | Values |
|---|---|
| Deduplicate strategy | drop, keep_first, keep_last |
| Join type | left, inner, outer, right |
| Aggregate function | avg, max, min, sum, count, first |
| Write mode | merge, append, overwrite_partition |
Foreign keys across promotions are resolved with the id_mapping_output → id_mapping_join pair: one
promotion publishes the mapping, and later promotions consume it. See the worked
examples.
