Hubs are the anchor points of the Data Vault. Each hub stores the unique business keys for one core entity. Hubs are insert-only: once a business key is loaded, it is never updated or deleted.
What a Hub Contains
- Hash key (PK): MD5 of the business key, used as the primary key and join key
- Business key: the natural key from the source system (e.g. customer_id)
- Load datetime: when this key was first seen
- Record source: which source system provided it
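As a rough illustration of the four columns listed above, the Python sketch below builds one hub row for a customer key. The function names, the column names customer_hk, load_datetime and record_source, the sample values, and the trim-and-upper-case normalization before hashing are assumptions made for this example; the hashing configuration used by the actual pipeline may differ.

```python
import hashlib
from datetime import datetime, timezone


def hub_hash_key(business_key: str) -> str:
    """Return the MD5 hex digest of a normalized business key."""
    # Assumption: trim and upper-case first, so " c-1001 " and "C-1001"
    # produce the same hash key.
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hub_row(business_key: str, record_source: str) -> dict:
    """Assemble the four hub columns for one business key."""
    return {
        "customer_hk": hub_hash_key(business_key),    # hash key (PK)
        "customer_id": business_key,                  # business key
        "load_datetime": datetime.now(timezone.utc),  # when first seen
        "record_source": record_source,               # originating system
    }


# "C-1001" and "erp" are made-up sample values.
row = hub_row("C-1001", "erp")
```

Because the hash key is derived only from the business key, the same customer_id always maps to the same hash key, whichever source system delivers it.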
Hub Models in This Pipeline
- hub_customer: retail chains, distributors, stores
- hub_product: FMCG product SKUs
- hub_order: purchase orders
- hub_order_line: individual line items
- hub_invoice: financial invoices
- hub_payment: payment transactions
- hub_warehouse: distribution centers
- hub_manager: sales managers
- hub_contract: customer agreements
Why Hubs Matter
Hubs give the vault its stability. Business keys rarely change — a customer_id is a customer_id regardless of how many times the customer's name or address changes. By separating keys from descriptive data, the vault structure survives source system changes.
Loading Pattern
Hubs use incremental materialization. On each run, dbt checks which business keys are new and inserts only those. Duplicates are automatically filtered by the automate_dv hub macro using a NOT EXISTS pattern.
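The insert-only behaviour can be sketched outside dbt with Python's built-in sqlite3 module. This is a minimal illustration of a NOT EXISTS filter, not the SQL that automate_dv generates; the staging table stg_customer, the column names, and the sample keys are assumptions made for the example.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("hub_customer", "stg_customer"):
    conn.execute(
        f"CREATE TABLE {table} (customer_hk TEXT, customer_id TEXT, "
        "load_datetime TEXT, record_source TEXT)"
    )

# Insert only keys that the hub does not hold yet. MIN() collapses keys
# repeated within one batch (a simplification for this sketch).
INSERT_NEW_KEYS = """
    INSERT INTO hub_customer (customer_hk, customer_id, load_datetime, record_source)
    SELECT s.customer_hk, s.customer_id, MIN(s.load_datetime), MIN(s.record_source)
    FROM stg_customer AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM hub_customer AS h WHERE h.customer_hk = s.customer_hk
    )
    GROUP BY s.customer_hk, s.customer_id
"""


def staged(customer_id: str, load_datetime: str) -> tuple:
    """Build one staging row; the hash key is the MD5 of the business key."""
    hash_key = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return (hash_key, customer_id, load_datetime, "erp")


def load_hub(batch: list) -> int:
    """Stage a batch, insert unseen keys, and return how many rows were added."""
    conn.execute("DELETE FROM stg_customer")
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?, ?)", batch)
    before = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    conn.execute(INSERT_NEW_KEYS)
    after = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    return after - before


first_run = load_hub([staged("C-1001", "2024-01-01"), staged("C-1002", "2024-01-01")])
second_run = load_hub([staged("C-1002", "2024-01-02"), staged("C-1003", "2024-01-02")])
```

In this sketch the first run adds two keys and the second adds only C-1003. C-1002 keeps its original load datetime, because existing hub rows are never touched.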