Skip to main content
Parsing turns the raw file into a typed table. It has four parts:
"parsing": {
  "parser": { /* how to read the file */ },
  "mapping": { /* source column -> target field + type */ },
  "extra_fields": { /* what to do with unmapped columns */ },
  "validation": { /* required columns, min rows */ }
}

Parser

The parser object selects a reader (type) and configures it.
typeFor
csvCSV / delimited text
excelExcel workbooks
polaris_savPolaris .sav SQL-dump archives

Common options

FieldMeaning
header_row0-indexed row holding the column names (default 0).
skip_rowsRows to skip before the header.
max_rowsCap on rows parsed (null = no limit).
strip_columnsColumns to drop (by index array, or true for all).
supports_reparseWhether the file can be re-parsed after initial ingest.

CSV options

encoding (e.g. utf-8, iso-8859-1), delimiter (,, ;, \t, |…), has_header.

Excel options

sheet_names (array, or null for all sheets), plus section_context for files where data is grouped under section headers:
"section_context": [
  { "label": "Magasin", "output_column": "_section_magasin" },
  { "label": "Poste",   "output_column": "_section_poste" }
]
Each recognized section header value is carried onto the rows beneath it, into the named output column.

SAV options (Polaris)

sql_filename (path to the SQL dump inside the archive, e.g. 0-full.sql), tables (table names to extract), extract_media (pull binary assets).

Column mapping

mapping.columns maps each source column to a target field and a Spark SQL type:
"mapping": {
  "columns": {
    "Numéro":        { "target": "ticket_number", "type": "STRING" },
    "Montant":       { "target": "amount",         "type": "DOUBLE" },
    "Date":          { "target": "payment_date",   "type": "STRING" }
  }
}
type is any Spark SQL type (it’s a free-form string, not a fixed enum). Common values: STRING, INT, LONG, DOUBLE, TIMESTAMP.

Extra fields

Unmapped columns are handled by extra_fields.mode:
ModeBehavior
ignoreDrop unmapped columns.
passthroughKeep them as-is.
collectGather them into a list in target_column.
store_jsonStore them as a JSON object in target_column.
"extra_fields": { "mode": "store_json", "target_column": "_extra_fields" }

Validation

Guard rails applied after parsing:
"validation": {
  "required_columns": ["ticket_number", "payment_method"],
  "min_rows": 1
}
  • required_columns — must be present and non-null.
  • min_rows — minimum row count.
Parsed columns then flow into the promotion pipeline, where they’re transformed and written to silver/gold.