What is Data Vault 2.0? A Complete Guide

If you're working with enterprise data warehouses, you've probably heard the term Data Vault 2.0 thrown around. But what exactly is it, and why are so many organizations adopting it? This guide covers the core building blocks, how Data Vault compares with Kimball and Inmon, when to use it, and how to get started with dbt.

What is Data Vault 2.0?

Data Vault 2.0 is a data warehouse modeling methodology created by Dan Linstedt. It provides a set of standards and patterns for building scalable, auditable, and flexible data warehouses.

Unlike traditional approaches such as Kimball's dimensional modeling or Inmon's 3NF, Data Vault stores business keys, relationships, and descriptive context in separate tables. This separation makes it well suited to environments where:

Source systems change frequently
Full audit trails are required (finance, healthcare, government)
Multiple teams need to load data in parallel
The business model evolves faster than the warehouse can be rebuilt

The "2.0" version adds modern practices such as hash keys in place of sequence-generated surrogate keys, ghost records, and compatibility with NoSQL and big data platforms.

The Three Core Components

Every Data Vault model is built from just three table types:

1. Hubs — Business Entities

A Hub represents a core business concept — a customer, product, order, account, or any entity that your business tracks. Hubs contain:

Hash Key — a hashed surrogate key derived from the business key
Business Key — the natural key from the source system (e.g., customer_id)
Load Date — when the record was first loaded
Record Source — which system the record came from

Hubs are insert-only. Once a business key is recorded, it never changes. This is a key reason Data Vault provides full auditability.
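As a rough illustration of the hub pattern described above, the Python sketch below derives a hash key from a business key and applies an insert-only load. The MD5 choice, the trim-and-uppercase normalization, and the names hash_key, load_hub, and customer_hk are assumptions made for this example, not part of any standard or library.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(business_key: str) -> str:
    """Hash a normalized business key (MD5 is an assumption; other hashes are also used)."""
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_keys: list, record_source: str, load_date: datetime) -> int:
    """Insert-only load: add rows for unseen business keys, never modify existing rows."""
    inserted = 0
    for bk in business_keys:
        hk = hash_key(bk)
        if hk not in hub:
            hub[hk] = {
                "customer_hk": hk,                 # hash key
                "customer_id": bk.strip().upper(),  # normalized business key
                "load_date": load_date,            # when the key was first seen
                "record_source": record_source,    # which system delivered it first
            }
            inserted += 1
    return inserted


hub_customer = {}
first = load_hub(hub_customer, ["C001", "C002"], "crm",
                 datetime(2024, 1, 1, tzinfo=timezone.utc))
# "c002 " normalizes to an existing key, so only C003 is inserted.
second = load_hub(hub_customer, ["c002 ", "C003"], "billing",
                  datetime(2024, 1, 2, tzinfo=timezone.utc))
print(first, second, len(hub_customer))  # 2 1 3
```

Because the second load skips C002 instead of overwriting it, the row keeps its original load date and record source ("crm"), which is what makes the hub a reliable audit anchor.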

2. Links — Relationships

A Link captures the relationship between two or more Hubs. For example, the relationship between a Customer and an Order, or between a Product and a Supplier.

Links contain a hash key of their own (derived from the combined business keys of the connected hubs), the hash keys of the connected hubs, plus load date and record source. Like hubs, links are insert-only.

This separation means that when relationships change, you add new link records rather than updating existing ones — preserving the full history of how entities related over time.
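To make the insert-only behaviour of links concrete, here is a minimal Python sketch. It assumes a Customer-Order link, an MD5 hash over the delimited business keys, and hypothetical names (hash_key, load_link, customer_order_hk) chosen only for this example.

```python
import hashlib
from datetime import date


def hash_key(*business_keys: str) -> str:
    """Hash one or more normalized business keys joined by a delimiter."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_link(link: dict, pairs: list, record_source: str, load_date: date) -> int:
    """Insert-only load: add a row for each relationship not seen before."""
    inserted = 0
    for customer_id, order_id in pairs:
        link_hk = hash_key(customer_id, order_id)
        if link_hk not in link:
            link[link_hk] = {
                "customer_order_hk": link_hk,          # the link's own hash key
                "customer_hk": hash_key(customer_id),  # reference to the Customer hub
                "order_hk": hash_key(order_id),        # reference to the Order hub
                "load_date": load_date,
                "record_source": record_source,
            }
            inserted += 1
    return inserted


link_customer_order = {}
n1 = load_link(link_customer_order, [("C001", "O100")], "orders", date(2024, 1, 1))
# Order O100 now also appears against customer C002: a new link row is added
# and the original C001-O100 row is left untouched.
n2 = load_link(link_customer_order, [("C001", "O100"), ("C002", "O100")],
               "orders", date(2024, 1, 2))
print(n1, n2, len(link_customer_order))  # 1 1 2
```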

3. Satellites — Descriptive Context

Satellites store the descriptive attributes that change over time. A Customer Hub might have satellites for:

Customer Details — name, email, phone (changes when customer updates profile)
Customer Address — street, city, country (changes when customer moves)
Customer Classification — segment, tier, risk score (changes as business rules evolve)

Each satellite tracks changes with a hashdiff — a hash of all descriptive columns. When the hashdiff changes, a new satellite record is inserted. This gives you complete history of every change without updating or deleting any rows.
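The hashdiff check can be sketched in a few lines of Python. This is an illustration under assumptions: MD5 as the hash, columns concatenated in sorted order with a "||" delimiter, and hypothetical names (hashdiff, load_satellite, customer_hk) that do not come from any particular tool.

```python
import hashlib
from datetime import date


def hashdiff(attributes: dict) -> str:
    """Hash all descriptive columns in a fixed (sorted) column order."""
    payload = "||".join(str(attributes[col]).strip() for col in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(sat: list, customer_hk: str, attributes: dict, load_date: date) -> bool:
    """Append a new row only when the hashdiff differs from the latest stored row."""
    new_diff = hashdiff(attributes)
    history = [row for row in sat if row["customer_hk"] == customer_hk]
    latest = max(history, key=lambda row: row["load_date"], default=None)
    if latest is not None and latest["hashdiff"] == new_diff:
        return False  # nothing changed, so nothing is inserted
    sat.append({"customer_hk": customer_hk, "load_date": load_date,
                "hashdiff": new_diff, **attributes})
    return True


sat_customer_details = []
a = load_satellite(sat_customer_details, "hk1",
                   {"name": "Ada", "email": "ada@example.com"}, date(2024, 1, 1))
b = load_satellite(sat_customer_details, "hk1",
                   {"name": "Ada", "email": "ada@example.com"}, date(2024, 1, 2))
c = load_satellite(sat_customer_details, "hk1",
                   {"name": "Ada", "email": "ada@new.example.com"}, date(2024, 1, 3))
print(a, b, c, len(sat_customer_details))  # True False True 2
```

Comparing one hash instead of every column keeps the change check cheap, and the unchanged second load shows why re-running a load does not create duplicate history.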

Want to see hubs, links, and satellites in action? Walk through a live FMCG sales pipeline — real data flowing from staging through raw vault to business marts, with row counts and DAG runs you can click into.

Why Separate Everything Into Three Table Types?

This separation is the core insight of Data Vault. Here's why it matters:

Parallel loading — Hubs, Links, and Satellites can be loaded independently by different teams or processes, with no locking conflicts
Agility — Adding a new source system means adding new Satellites, not redesigning existing tables
Auditability — Every record has a load date and record source, so you can trace any value back to its origin
No destructive changes — Insert-only patterns mean you never lose historical data

Data Vault vs. Kimball vs. Inmon

	Kimball (Dimensional)	Inmon (3NF)	Data Vault 2.0
Design focus	Business process / reporting	Enterprise data model	Business keys & relationships
Schema type	Star / Snowflake	3rd Normal Form	Hub / Link / Satellite
History tracking	SCD Type 1/2 (limited)	Varies	Full history by default
Parallel loading	Difficult	Moderate	Built-in
Agility	Low (schema changes break reports)	Moderate	High (additive changes only)
Reporting	Direct querying	Needs data marts	Needs data marts (often built via a Business Vault)
Best for	Small-medium, stable sources	Enterprise, governed	Complex, multi-source, auditable

Important: Data Vault doesn't replace dimensional modeling — it complements it. You typically build a Data Vault as your integration layer, then create Kimball-style star schemas on top as data marts for reporting.
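As a small sketch of how a star-schema dimension might be derived from the vault, the Python below joins a hub to the most recent row of one satellite to produce a "current state" customer dimension. The table contents, column names, and the build_dim_customer helper are hypothetical and exist only for this example.

```python
from datetime import date

hub_customer = [
    {"customer_hk": "hk1", "customer_id": "C001"},
    {"customer_hk": "hk2", "customer_id": "C002"},
]
sat_customer_details = [
    {"customer_hk": "hk1", "load_date": date(2024, 1, 1), "name": "Ada", "segment": "retail"},
    {"customer_hk": "hk1", "load_date": date(2024, 3, 1), "name": "Ada", "segment": "wholesale"},
    {"customer_hk": "hk2", "load_date": date(2024, 2, 1), "name": "Grace", "segment": "retail"},
]


def build_dim_customer(hub: list, sat: list) -> list:
    """Join each hub row to its latest satellite row (a current-state dimension)."""
    latest = {}
    for row in sat:
        current = latest.get(row["customer_hk"])
        if current is None or row["load_date"] > current["load_date"]:
            latest[row["customer_hk"]] = row
    dim = []
    for h in hub:
        details = latest.get(h["customer_hk"], {})
        dim.append({
            "customer_id": h["customer_id"],
            "name": details.get("name"),
            "segment": details.get("segment"),
        })
    return dim


dim_customer = build_dim_customer(hub_customer, sat_customer_details)
for row in dim_customer:
    print(row)
```

Because the satellite keeps every version, the same vault tables could also feed a history-preserving dimension; only the selection rule in the mart changes.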

When Should You Use Data Vault 2.0?

Data Vault is the right choice when:

You have multiple source systems feeding your warehouse
You need full audit trails (regulatory compliance, finance, healthcare)
Source systems change frequently and you can't keep redesigning your warehouse
Multiple teams need to load data in parallel without conflicts
You're building for the long term and expect the data landscape to evolve

Data Vault may be overkill if:

You have a single source system with a stable schema
Your only goal is a simple reporting dashboard
Your team is small and doesn't need parallel loading

Data Vault 2.0 with dbt

The rise of dbt (data build tool) has made Data Vault significantly more accessible. Instead of writing hundreds of repetitive SQL scripts for hub, link, and satellite loading, you can use a package such as AutomateDV (formerly dbtvault) to generate the loading logic from YAML metadata.

A typical dbt + Data Vault workflow looks like this:

Define sources in YAML — map business keys, relationships, and descriptive attributes
Stage the data — dbtvault creates staging models that hash keys and prepare records
Load the Raw Vault — dbtvault macros handle hub, link, and satellite loading with full idempotency
Build the Business Vault — create Point-in-Time tables, Bridge tables, and derived business rules
Create data marts — build star schemas for reporting on top of the vault

This approach means you can go from raw sources to a production-ready Data Vault in days, not months.
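The Point-in-Time tables mentioned in the Business Vault step can be pictured with a short Python sketch. It is a simplified illustration, not how any dbt package implements them: for each hub key and snapshot date it records which load date of each satellite was current. The names build_pit, sat_details, and sat_address are hypothetical.

```python
from datetime import date


def build_pit(hub_keys: list, satellites: dict, snapshot_dates: list) -> list:
    """For each snapshot date and hub key, find the latest load_date per satellite."""
    pit = []
    for snapshot in snapshot_dates:
        for hk in hub_keys:
            row = {"customer_hk": hk, "snapshot_date": snapshot}
            for sat_name, sat_rows in satellites.items():
                loads = [r["load_date"] for r in sat_rows
                         if r["customer_hk"] == hk and r["load_date"] <= snapshot]
                # None marks "no satellite row yet" in this sketch.
                row[f"{sat_name}_load_date"] = max(loads, default=None)
            pit.append(row)
    return pit


satellites = {
    "sat_details": [
        {"customer_hk": "hk1", "load_date": date(2024, 1, 1)},
        {"customer_hk": "hk1", "load_date": date(2024, 3, 1)},
    ],
    "sat_address": [
        {"customer_hk": "hk1", "load_date": date(2024, 2, 1)},
    ],
}
pit = build_pit(["hk1"], satellites, [date(2024, 1, 15), date(2024, 3, 15)])
for row in pit:
    print(row)
```

A mart query can then join each satellite on the hash key and the load date stored in the PIT row, instead of searching each satellite's history for every report.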

You can browse the rendered dbt schema from a working FMCG vault — every hub, link, satellite, PIT, and bridge model with the actual SQL behind it.

Getting Started

Ready to build your first Data Vault? Here are your next steps:

Learn the theory: read Building a Scalable Data Warehouse with Data Vault 2.0 by Dan Linstedt and Michael Olschimke
Try AutomateDV (formerly dbtvault): set up a dbt project and experiment with the package's example models
Start small: pick 2-3 source tables and model them as hubs, links, and satellites
Iterate: add more sources and build out your Business Vault layer

Or if you want to skip the learning curve, get in touch with our team — we've built hundreds of Data Vaults and can have yours running in a week.
