If you're working with enterprise data warehouses, you've probably heard the term Data Vault 2.0 thrown around. But what exactly is it, and why are so many organizations adopting it? This guide covers the three core table types, how Data Vault compares with Kimball and Inmon, when to use it, and how to get started with dbt.
What is Data Vault 2.0?
Data Vault 2.0 is a data warehouse modeling methodology created by Dan Linstedt. It provides a set of standards and patterns for building scalable, auditable, and flexible data warehouses.
Unlike traditional approaches (Kimball's dimensional modeling or Inmon's 3NF), Data Vault stores business keys, relationships, and descriptive context in separate tables. This separation makes it well suited to environments where:
- Source systems change frequently
- Full audit trails are required (finance, healthcare, government)
- Multiple teams need to load data in parallel
- The business model evolves faster than the warehouse can be rebuilt
The "2.0" version adds modern practices like hash keys, ghost records, and compatibility with NoSQL and big data platforms.
The Three Core Components
Every Data Vault model is built from just three table types:
1. Hubs — Business Entities
A Hub represents a core business concept — a customer, product, order, account, or any entity that your business tracks. Hubs contain:
- Hash Key — a hashed surrogate key derived from the business key
- Business Key — the natural key from the source system (e.g., customer_id)
- Load Date — when the record was first loaded
- Record Source — which system the record came from
Hubs are insert-only. Once a business key is recorded, it never changes. This is a key reason Data Vault provides full auditability.
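The hub pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `hub_customer` table, the `hash_key` and `load_hub` helpers, and the trim-and-upper-case normalization are assumptions made for the example, and MD5 is used only for brevity (SHA-256 would work the same way).

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash one or more business keys into a fixed-length surrogate key."""
    # Trim and upper-case so 'c-001 ' and 'C-001' hash to the same key.
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(conn, business_keys, record_source):
    """Insert business keys not yet in the hub; existing rows are untouched."""
    load_date = datetime.now(timezone.utc).isoformat()
    rows = [(hash_key(bk), bk, load_date, record_source) for bk in business_keys]
    # INSERT OR IGNORE skips rows whose hash key already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)", rows
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hub_customer (
           customer_hk   TEXT PRIMARY KEY,
           customer_id   TEXT NOT NULL,
           load_date     TEXT NOT NULL,
           record_source TEXT NOT NULL
       )"""
)
load_hub(conn, ["C-001", "C-002"], "crm")
load_hub(conn, ["C-002", "C-003"], "billing")  # C-002 is already present

hub_rows = conn.execute(
    "SELECT customer_id, record_source FROM hub_customer ORDER BY customer_id"
).fetchall()
```

After both loads the hub holds three rows, and C-002 keeps the record source of the system that delivered it first.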
2. Links — Relationships
A Link captures the relationship between two or more Hubs. For example, a Link can record which Customer placed an Order, or which Supplier provides a Product.
Links contain their own hash key, the hash keys of the connected hubs, a load date, and a record source. Like hubs, links are insert-only.
This separation means that when relationships change, you add new link records rather than updating existing ones — preserving the full history of how entities related over time.
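A link load can be sketched the same way. The example below is illustrative only: `link_customer_order`, `load_link`, and the sample keys are hypothetical names, and deriving the link's own key by hashing the combined business keys is a common convention assumed here.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash one or more normalized business keys into a single key."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE link_customer_order (
           customer_order_hk TEXT PRIMARY KEY,  -- hash of both business keys
           customer_hk       TEXT NOT NULL,
           order_hk          TEXT NOT NULL,
           load_date         TEXT NOT NULL,
           record_source     TEXT NOT NULL
       )"""
)


def load_link(conn, pairs, record_source):
    """Insert customer/order relationships that are not yet recorded."""
    load_date = datetime.now(timezone.utc).isoformat()
    rows = [
        (hash_key(c, o), hash_key(c), hash_key(o), load_date, record_source)
        for c, o in pairs
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO link_customer_order VALUES (?, ?, ?, ?, ?)", rows
    )


load_link(conn, [("C-001", "O-100")], "orders")
# The order is later reassigned: a new link row is added, the old one stays.
load_link(conn, [("C-001", "O-100"), ("C-002", "O-100")], "orders")

link_count = conn.execute(
    "SELECT COUNT(*) FROM link_customer_order"
).fetchone()[0]
```

Both relationships remain in the table, so the earlier association between C-001 and O-100 is still visible after the change.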
3. Satellites — Descriptive Context
Satellites store the descriptive attributes that change over time. A Customer Hub might have satellites for:
- Customer Details — name, email, phone (changes when customer updates profile)
- Customer Address — street, city, country (changes when customer moves)
- Customer Classification — segment, tier, risk score (changes as business rules evolve)
Each satellite tracks changes with a hashdiff — a hash of all descriptive columns. When the hashdiff changes, a new satellite record is inserted. This gives you complete history of every change without updating or deleting any rows.
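As a rough sketch of the hashdiff check, the example below inserts a satellite row only when the attributes differ from the latest stored version. The `sat_customer_details` table, the helper names, and the explicit `load_date` argument are assumptions for illustration; real loaders usually compare whole batches in SQL rather than row by row.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hashdiff(attributes: dict) -> str:
    """Hash all descriptive columns in a fixed (alphabetical) column order."""
    parts = [
        "" if attributes[col] is None else str(attributes[col]).strip()
        for col in sorted(attributes)
    ]
    return md5_hex("||".join(parts))


def load_satellite(conn, customer_id, attributes, load_date, record_source):
    """Insert a new satellite row only if the attributes changed."""
    customer_hk = md5_hex(customer_id.strip().upper())
    new_diff = hashdiff(attributes)
    latest = conn.execute(
        "SELECT hashdiff FROM sat_customer_details "
        "WHERE customer_hk = ? ORDER BY load_date DESC LIMIT 1",
        (customer_hk,),
    ).fetchone()
    if latest is not None and latest[0] == new_diff:
        return False  # unchanged: nothing to insert
    conn.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?, ?, ?)",
        (customer_hk, load_date, new_diff, attributes["name"],
         attributes["email"], attributes["phone"], record_source),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sat_customer_details (
           customer_hk TEXT, load_date TEXT, hashdiff TEXT,
           name TEXT, email TEXT, phone TEXT, record_source TEXT,
           PRIMARY KEY (customer_hk, load_date)
       )"""
)
v1 = {"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}
v2 = {"name": "Ada", "email": "ada@new.example", "phone": "555-0100"}
results = [
    load_satellite(conn, "C-001", v1, "2024-01-01", "crm"),  # first version
    load_satellite(conn, "C-001", v1, "2024-01-02", "crm"),  # no change
    load_satellite(conn, "C-001", v2, "2024-01-03", "crm"),  # email changed
]
sat_count = conn.execute(
    "SELECT COUNT(*) FROM sat_customer_details"
).fetchone()[0]
```

Sorting the column names before hashing keeps the hashdiff stable regardless of the order in which attributes arrive.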
Why Separate Everything Into Three Table Types?
This separation is the core insight of Data Vault. Here's why it matters:
- Parallel loading: Because hash keys are derived from business keys rather than looked up, Hubs, Links, and Satellites can be loaded independently by different teams or processes, with no locking conflicts
- Agility: Adding a new source system usually means adding new Satellites (plus new Hubs or Links where the source introduces new entities or relationships), not redesigning existing tables
- Auditability: Every record has a load date and record source, so you can trace any value back to its origin
- No destructive changes: Insert-only patterns mean you never lose historical data
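The parallel loading point is easier to see in code. In this sketch (the loader functions and sample rows are hypothetical), a hub loader and a satellite loader run in separate threads and each derives the hash key on its own, so neither has to wait for the other or look up a generated ID.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(business_key: str) -> str:
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


source_rows = [
    {"customer_id": "C-001", "name": "Ada"},
    {"customer_id": "C-002", "name": "Grace"},
]


def build_hub_rows(rows):
    # Hub loader: needs only the business key.
    return {hash_key(r["customer_id"]): r["customer_id"] for r in rows}


def build_satellite_rows(rows):
    # Satellite loader: derives the same key itself, with no hub lookup.
    return [(hash_key(r["customer_id"]), r["name"]) for r in rows]


with ThreadPoolExecutor(max_workers=2) as pool:
    hub_future = pool.submit(build_hub_rows, source_rows)
    sat_future = pool.submit(build_satellite_rows, source_rows)
    hub_rows, sat_rows = hub_future.result(), sat_future.result()

# Every satellite row points at a key the hub loader produced independently.
orphans = [hk for hk, _ in sat_rows if hk not in hub_rows]
```

With sequence-based surrogate keys, the satellite loader would have to wait for the hub load and then look up each generated ID.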
Data Vault vs. Kimball vs. Inmon
| | Kimball (Dimensional) | Inmon (3NF) | Data Vault 2.0 |
|---|---|---|---|
| Design focus | Business process / reporting | Enterprise data model | Business keys & relationships |
| Schema type | Star / Snowflake | 3rd Normal Form | Hub / Link / Satellite |
| History tracking | SCD Type 1/2 (limited) | Varies | Full history by default |
| Parallel loading | Difficult | Moderate | Built-in |
| Agility | Low (schema changes break reports) | Moderate | High (additive changes only) |
| Reporting | Direct querying | Needs data marts | Needs data marts (built on the Raw or Business Vault) |
| Best for | Small-medium, stable sources | Enterprise, governed | Complex, multi-source, auditable |
Important: Data Vault doesn't replace dimensional modeling — it complements it. You typically build a Data Vault as your integration layer, then create Kimball-style star schemas on top as data marts for reporting.
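To make the layering concrete, here is a small sketch of a current-state customer dimension defined as a view over a hub and its satellite, using Python's built-in `sqlite3`. All table, view, and column names, the placeholder key `hk1`, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE sat_customer_details (
        customer_hk TEXT, load_date TEXT, name TEXT, email TEXT,
        PRIMARY KEY (customer_hk, load_date)
    );
    INSERT INTO hub_customer VALUES ('hk1', 'C-001');
    INSERT INTO sat_customer_details VALUES
        ('hk1', '2024-01-01', 'Ada', 'ada@old.example'),
        ('hk1', '2024-03-01', 'Ada', 'ada@new.example');

    -- Current-state dimension: each hub row plus its latest satellite row.
    CREATE VIEW dim_customer AS
    SELECT h.customer_id, s.name, s.email, s.load_date AS valid_from
    FROM hub_customer h
    JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
    WHERE s.load_date = (
        SELECT MAX(load_date)
        FROM sat_customer_details
        WHERE customer_hk = h.customer_hk
    );
    """
)
dim_rows = conn.execute(
    "SELECT customer_id, email FROM dim_customer"
).fetchall()
```

Because the satellite keeps every version, the same tables could also feed a history-preserving (SCD Type 2) dimension by selecting all rows instead of only the latest.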
When Should You Use Data Vault 2.0?
Data Vault is the right choice when:
- You have multiple source systems feeding your warehouse
- You need full audit trails (regulatory compliance, finance, healthcare)
- Source systems change frequently and you can't keep redesigning your warehouse
- Multiple teams need to load data in parallel without conflicts
- You're building for the long term and expect the data landscape to evolve
Data Vault may be overkill if:
- You have a single source system with a stable schema
- Your only goal is a simple reporting dashboard
- Your team is small and doesn't need parallel loading
Data Vault 2.0 with dbt
The rise of dbt (data build tool) has made Data Vault significantly more accessible. Instead of writing hundreds of repetitive SQL scripts for hub, link, and satellite loading, you can use packages like dbtvault (since renamed AutomateDV) to generate the loading logic from YAML metadata.
A typical dbt + Data Vault workflow looks like this:
- Define sources in YAML — map business keys, relationships, and descriptive attributes
- Stage the data — dbtvault creates staging models that hash keys and prepare records
- Load the Raw Vault — dbtvault macros handle hub, link, and satellite loading with full idempotency
- Build the Business Vault — create Point-in-Time tables, Bridge tables, and derived business rules
- Create data marts — build star schemas for reporting on top of the vault
Because most of the loading SQL is generated from metadata rather than written by hand, this approach can take you from raw sources to a production-ready Data Vault in days rather than months.
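The idea behind metadata-driven loading can be illustrated without dbt at all. The sketch below is plain Python and is not the dbtvault/AutomateDV API or its YAML format: the `HUBS` metadata list, the `hub_load_sql` function, and the table names are hypothetical. It renders an insert-only hub load from a small metadata entry and runs it twice to show that a rerun adds nothing.

```python
import sqlite3

# Hypothetical metadata: one entry per hub to generate.
HUBS = [
    {"target": "hub_customer", "staging": "stg_customer",
     "hash_key": "customer_hk", "business_key": "customer_id"},
]


def hub_load_sql(meta: dict) -> str:
    """Render an insert-only hub load statement from a metadata entry."""
    hk, bk = meta["hash_key"], meta["business_key"]
    return (
        f"INSERT INTO {meta['target']} ({hk}, {bk}, load_date, record_source)\n"
        f"SELECT s.{hk}, s.{bk}, MIN(s.load_date), MIN(s.record_source)\n"
        f"FROM {meta['staging']} s\n"
        f"WHERE NOT EXISTS (\n"
        f"    SELECT 1 FROM {meta['target']} t WHERE t.{hk} = s.{hk}\n"
        f")\n"
        f"GROUP BY s.{hk}, s.{bk}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_customer (
        customer_hk TEXT, customer_id TEXT, load_date TEXT, record_source TEXT
    );
    CREATE TABLE hub_customer (
        customer_hk TEXT PRIMARY KEY, customer_id TEXT,
        load_date TEXT, record_source TEXT
    );
    INSERT INTO stg_customer VALUES
        ('hk1', 'C-001', '2024-01-01', 'crm'),
        ('hk1', 'C-001', '2024-01-02', 'crm'),
        ('hk2', 'C-002', '2024-01-01', 'crm');
    """
)
for meta in HUBS:
    sql = hub_load_sql(meta)
    conn.execute(sql)
    conn.execute(sql)  # rerunning the same load inserts no new rows

hub_count = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
first_seen = conn.execute(
    "SELECT load_date FROM hub_customer WHERE customer_hk = 'hk1'"
).fetchone()[0]
```

Adding another hub then only requires another metadata entry, which is the kind of repetition the dbt packages are meant to remove.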
Getting Started
Ready to build your first Data Vault? Here are your next steps:
- Learn the theory: read Building a Scalable Data Warehouse with Data Vault 2.0 by Dan Linstedt and Michael Olschimke
- Try dbtvault (AutomateDV): set up a dbt project and experiment with the package's example models
- Start small: pick 2-3 source tables and model them as hubs, links, and satellites
- Iterate: add more sources and build out your Business Vault layer
Or if you want to skip the learning curve, get in touch with our team — we've built hundreds of Data Vaults and can have yours running in a week.