Skip to main content
A promotion transforms parsed data and writes it to a target table. A spec can have several promotions (one per output dataset), and each runs an ordered list of steps.

Promotion shape

{
  "dataset_id": "brands",
  "domain": "catalog",
  "conformed_table": "brands",
  "source_table": "marque",
  "source_tables": ["marque"],
  "merge_keys": ["organization_id", "code"],
  "update_columns": ["name", "updated_at"],
  "steps": [ /* ordered transformation steps */ ]
}
FieldMeaning
dataset_idLogical dataset name.
domainLogical domain (e.g. catalog, sales, inventory).
conformed_tableThe target silver/gold table.
source_table / source_tablesThe source table(s) this promotion reads.
merge_keysColumns that identify a row for upsert.
update_columnsOn upsert, which columns to update (null/omitted = all).
stepsThe ordered pipeline (below).

Write behavior

Promotions write with one of three modes — merge (upsert on merge_keys), append (insert only), or overwrite partition (replace a partition). In practice, a promotion with merge_keys upserts on those keys; update_columns narrows which columns are written on a match (e.g. only enrich postal_code without touching the rest of the row).

Ordering

Steps run top to bottom. Each step sees the output of the previous one, so order matters — e.g. add/rename columns before generate_id, and run an id_mapping_output only after the id exists.

Step catalog

Every step is an object with a type. Below, grouped by purpose.

Shaping

{ "type": "coerce_types", "columns": { "quantity": "INT", "price": "DOUBLE", "date": "TIMESTAMP" } }
Casts each listed column to the given Spark SQL type.
{ "type": "filter", "condition": "quantity > 0 AND date IS NOT NULL" }
condition is a SQL WHERE expression (without the WHERE keyword).
{ "type": "deduplicate", "columns": ["no_ligmvt"], "strategy": "keep_last", "order_by": ["updated_at"], "order_desc": true }
strategy: drop (default), keep_first, or keep_last. order_by / order_desc decide which row survives for keep_first/keep_last.
{ "type": "add_column", "column": "created_at", "expression": "current_timestamp()" }
expression is any SQL scalar expression (literals, functions, CASE, references to other columns).
{ "type": "rename_columns", "columns": { "no_marque": "code", "nom": "name" } }
Maps source → target names.
{ "type": "select_columns", "columns": ["id", "code", "name"] }
Drops everything not listed.
{ "type": "drop_columns", "columns": ["_tmp", "_extra"] }
{ "type": "inject_value", "column": "organization_id", "value_key": "organization_id" }
Writes a value from the runtime context (e.g. organization_id, created_by) as a literal column.

Joining

{
  "type": "join",
  "target_dataset": "shops",
  "source_column": "shop_code",
  "target_column": "code",
  "join_type": "left",
  "select_columns": [ { "source": "id", "alias": "shop_id" }, { "source": "name" } ],
  "columns_to_drop": ["_tmp"],
  "deduplicate_on": ["code"]
}
join_type: left (default), inner, outer, right. select_columns picks (and optionally aliases) columns from the joined dataset.
{
  "type": "sequential_join",
  "target_dataset": "libtaille",
  "join_column": "no_libtaille",
  "target_join_column": null,
  "fallback_join_column": null,
  "select_columns": [],
  "select_expressions": { "_size_label": "libelle" },
  "deduplicate_on": ["no_libtaille"],
  "join_type": "left"
}
Adds two capabilities over join: a fallback_join_column (COALESCE-style secondary key) and select_expressions (custom SQL expressions selected from the target).
{
  "type": "aggregation_join",
  "target_dataset": "detcde",
  "group_by_columns": ["no_ligcde"],
  "aggregate_columns": [
    { "source_column": "pa", "function": "avg", "alias": "_detcde_pa", "cast_type": "double" }
  ],
  "join_column": "no_ligcde",
  "join_type": "left",
  "order_by": null,
  "order_desc": false
}
Aggregates the target with functionavg, max, min, sum, count, first, then joins the result on join_column.

Reshaping

{
  "type": "unpivot",
  "id_columns": ["product_id"],
  "value_columns": ["jan", "feb", "mar"],
  "variable_column": "month",
  "value_column": "value"
}
Turns { id, col1, col2, … } into { id, variable, value } rows. variable_column/value_column default to variable/value.

Identity & taxonomy

{ "type": "generate_id", "output_column": "id", "key_columns": ["code"], "namespace": null }
Derives a deterministic id from key_columns — the same key values produce the same id across runs (idempotent), which is what makes re-ingestion safe. An optional namespace scopes the derivation (UUID v5). Defaults: output_column = id, key_columns = all non-null columns.
{ "type": "id_mapping_output", "mapping_type": "brands", "code_column": "code", "id_column": "id", "extra_columns": [] }
Persists a mapping of code_column → id_column under mapping_type, so later promotions (in this or another spec) can resolve foreign keys against it.
{ "type": "id_mapping_join", "mapping_type": "brands", "source_column": "brand_code", "output_column": "brand_id", "extra_output_columns": ["shop_id"], "deduplicate": true }
Looks source_column up in the mapping created by an earlier id_mapping_output and writes the resolved id into output_column.
{ "type": "taxonomy_mapping", "taxonomy_type": "size", "source_column": "_size_label", "source_id_column": "no_libtaille", "output_column": "taxonomy_id" }
Resolves a text label (e.g. a size or color) to a Solya taxonomy id. source_id_column optionally provides the POS reference for disambiguation.

Sub-option reference

EnumValues
Deduplicate strategydrop, keep_first, keep_last
Join typeleft, inner, outer, right
Aggregate functionavg, max, min, sum, count, first
Write modemerge, append, overwrite_partition
Cross-spec foreign keys work via the id_mapping_outputid_mapping_join pair: one promotion publishes the mapping, later promotions consume it. See worked examples.