← Back to Pipeline
4
Raw Vault

Satellites

Satellites store descriptive data with full history. Every time an attribute changes — a customer's phone number, an order's status, a product's price — a new row is inserted with a new timestamp. Nothing is ever updated or deleted.

What a Satellite Contains

  • Hub/link hash key (FK) — which entity this data describes
  • Hashdiff — MD5 of all descriptive columns, used to detect changes
  • Descriptive columns — the actual attributes (name, status, amount, etc.)
  • Load datetime — when this version was loaded
  • Record source — which source provided this version

Satellite Models in This Pipeline

sat_customer_details    — name, address, region, segment
sat_customer_contact    — phone, email, contact person
sat_product             — name, category, price, weight
sat_order_status        — status (new → confirmed → shipped → delivered)
sat_order_financials    — total amount, discount, tax
sat_order_line          — quantity, unit price, line total
sat_invoice             — amount, due date, status
sat_payment             — amount, payment date, method
sat_warehouse           — name, location, capacity
sat_manager             — name, department, hire date
sat_contract            — terms, start/end date, value

Change Detection with Hashdiff

On each load, the satellite compares the incoming hashdiff against the latest row for that hub key. If the hash differs, a new row is inserted. If it's the same, the row is skipped. This is equivalent to SCD Type 2 but handled automatically by the automate_dv sat macro.

Why Split Satellites?

Customers have two satellites: sat_customer_details and sat_customer_contact. This is intentional — contact info changes more frequently than demographic details. By splitting, we avoid creating unnecessary history rows when only one group of attributes changes.