Raw Vault

Hubs

Hubs are the anchor points of the Data Vault. Each hub stores the unique business keys for one core entity. Hubs are insert-only: once a business key is loaded, its row is never updated or deleted.

What a Hub Contains

  • Hash key (PK) — MD5 of the business key, used as the primary key and join key
  • Business key — the natural key from the source system (e.g. customer_id)
  • Load datetime — when this key was first seen
  • Record source — which source system provided it
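As a rough illustration of the four columns above, here is a minimal Python sketch of how one hub row could be built. The names hash_key, HubCustomerRow, and make_hub_row are hypothetical, and the trim-and-upper-case normalization before hashing is an assumption about how keys are standardized; the actual hashing in this pipeline is done by automate_dv in SQL.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(business_key: str) -> str:
    """Return the MD5 hex digest of a normalized business key."""
    # Assumption: trim and upper-case first, so " c-1001 " and "C-1001"
    # map to the same hash key.
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


@dataclass(frozen=True)
class HubCustomerRow:
    customer_hk: str         # hash key (PK), also used as the join key
    customer_id: str         # business key from the source system
    load_datetime: datetime  # when this key was first seen
    record_source: str       # which source system provided it


def make_hub_row(customer_id: str, record_source: str, loaded_at: datetime) -> HubCustomerRow:
    """Build one hub row from a business key and its load metadata."""
    return HubCustomerRow(hash_key(customer_id), customer_id, loaded_at, record_source)


# Example values ("C-1001", "erp") are made up for illustration.
row = make_hub_row("C-1001", "erp", datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Because the hash key is derived only from the business key, any table that carries customer_id can compute the same customer_hk and join to the hub without a lookup.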

Hub Models in This Pipeline

hub_customer     — retail chains, distributors, stores
hub_product      — FMCG product SKUs
hub_order        — purchase orders
hub_order_line   — individual line items
hub_invoice      — financial invoices
hub_payment      — payment transactions
hub_warehouse    — distribution centers
hub_manager      — sales managers
hub_contract     — customer agreements

Why Hubs Matter

Hubs give the vault its stability. Business keys rarely change: a customer_id is a customer_id regardless of how many times the customer's name or address changes. Because keys are stored separately from descriptive data, descriptive attributes can change in the source systems without any change to the hubs.

Loading Pattern

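Hubs use incremental materialization. On each run, dbt compares the incoming business keys against the existing hub and inserts only the ones it has not seen before. The automate_dv hub macro does this filtering automatically, using a NOT EXISTS-style check against the hub table, so re-running a load does not create duplicate keys.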
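The insert-only behavior can be sketched outside dbt with Python's built-in sqlite3 module. This is an illustration of the NOT EXISTS idea only, not the SQL that automate_dv generates; the stg_customer table, the load_hub function, and the sample keys are hypothetical.

```python
import hashlib
import sqlite3


def hash_key(business_key: str) -> str:
    # Assumption: keys are trimmed and upper-cased before MD5 hashing.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest().upper()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_customer (
        customer_hk   TEXT PRIMARY KEY,
        customer_id   TEXT NOT NULL,
        load_datetime TEXT NOT NULL,
        record_source TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE stg_customer (
        customer_hk TEXT, customer_id TEXT, load_datetime TEXT, record_source TEXT
    )
""")

# Insert only keys that are not in the hub yet. GROUP BY collapses duplicates
# inside one batch; in SQLite, the bare record_source column is taken from the
# row that holds MIN(load_datetime).
INSERT_NEW_KEYS = """
    INSERT INTO hub_customer (customer_hk, customer_id, load_datetime, record_source)
    SELECT s.customer_hk, s.customer_id, MIN(s.load_datetime), s.record_source
    FROM stg_customer AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM hub_customer AS h WHERE h.customer_hk = s.customer_hk
    )
    GROUP BY s.customer_hk, s.customer_id
"""


def load_hub(conn, batch):
    """Stage (customer_id, load_datetime, record_source) rows, insert unseen keys,
    and return how many hub rows were added."""
    before = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    conn.execute("DELETE FROM stg_customer")
    conn.executemany(
        "INSERT INTO stg_customer VALUES (?, ?, ?, ?)",
        [(hash_key(cid), cid, ldts, src) for cid, ldts, src in batch],
    )
    conn.execute(INSERT_NEW_KEYS)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    return after - before


first = load_hub(conn, [("C-1001", "2024-01-01", "erp"), ("C-1002", "2024-01-01", "erp")])
second = load_hub(conn, [("C-1002", "2024-01-02", "crm"), ("C-1003", "2024-01-02", "crm")])
```

In this sketch the second batch adds only C-1003. C-1002 is already in the hub, so its original load datetime and record source are left untouched, which is the "first seen" semantics described for hubs.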