Your CEO doesn't care about Hubs and Satellites. Your CFO doesn't care about hash keys. They care about business questions they need answered. This article presents 7 real business requests that come up in enterprises every day — and shows why Data Vault 2.0 can handle them while traditional architectures cannot.
Request 1: "Show me exactly what risk score this customer had in our system on January 15th at 3 PM"
Who asks this: Compliance officer at a bank during a regulatory audit.
Why it matters: The regulator doesn't want to know the customer's current risk score. They want to know what your system said at a specific moment — because a lending decision was made based on that data.
In Kimball (star schema) — no provably correct answer
-- dim_customer with SCD Type 2
| customer_id | risk_score | valid_from | valid_to |
|-------------|------------|------------|------------|
| 1001 | Medium | 2025-11-01 | 2026-02-01 |
| 1001 | High | 2026-02-01 | 9999-12-31 |
-- Query: what did we have on Jan 15?
SELECT risk_score FROM dim_customer
WHERE customer_id = 1001
AND valid_from <= '2026-01-15' AND valid_to > '2026-01-15';
-- Returns: "Medium"
-- BUT: valid_from is the BUSINESS effective date.
-- The ETL might have loaded this record on Jan 20
-- (5 days late due to a pipeline failure).
-- On Jan 15 at 3 PM, the warehouse actually had NOTHING
-- for this customer. The answer "Medium" is WRONG.
-- You just lied to a regulator.
In Data Vault — exact answer
-- sat_customer_risk tracks LOAD_DATETIME (when warehouse received it)
| CUSTOMER_HK | RISK_SCORE | LOAD_DATETIME | RECORD_SOURCE |
|-------------|------------|---------------------|---------------|
| a1b2c3... | Low | 2025-06-15 06:00:00 | RISK_ENGINE |
| a1b2c3... | Medium | 2025-11-01 06:00:00 | RISK_ENGINE |
| a1b2c3... | High | 2026-02-01 06:00:00 | RISK_ENGINE |
-- Query: what did our WAREHOUSE have on Jan 15 at 3 PM?
SELECT RISK_SCORE, LOAD_DATETIME, RECORD_SOURCE
FROM sat_customer_risk
WHERE CUSTOMER_HK = 'a1b2c3...'
AND LOAD_DATETIME <= '2026-01-15 15:00:00'
ORDER BY LOAD_DATETIME DESC
LIMIT 1;
-- Returns: "Medium", loaded 2025-11-01 06:00:00, from RISK_ENGINE
-- This is EXACTLY what the warehouse contained. Provably correct.
Why Kimball fails: SCD Type 2 tracks when the business change happened, not when the warehouse loaded it. Data Vault stamps every row with LOAD_DATETIME (the business-effective date can still be carried as an ordinary satellite attribute) — and never updates or deletes records, so you can prove the answer is authentic.
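The point-in-time query above can be tried end to end. Below is a minimal, runnable sketch using SQLite; the table, column names, and hash value are illustrative stand-ins for the satellite shown above:

```python
import sqlite3

# Insert-only satellite queried "as of" a timestamp.
# Names are illustrative, not a full Data Vault model.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sat_customer_risk (
        customer_hk TEXT,
        risk_score  TEXT,
        load_dt     TEXT,   -- when the WAREHOUSE received the record
        record_src  TEXT
    )
""")
con.executemany(
    "INSERT INTO sat_customer_risk VALUES (?, ?, ?, ?)",
    [
        ("a1b2c3", "Low",    "2025-06-15 06:00:00", "RISK_ENGINE"),
        ("a1b2c3", "Medium", "2025-11-01 06:00:00", "RISK_ENGINE"),
        ("a1b2c3", "High",   "2026-02-01 06:00:00", "RISK_ENGINE"),
    ],
)

def risk_as_of(hk, ts):
    """Return the risk score the warehouse held at timestamp ts."""
    row = con.execute(
        """SELECT risk_score FROM sat_customer_risk
           WHERE customer_hk = ? AND load_dt <= ?
           ORDER BY load_dt DESC LIMIT 1""",
        (hk, ts),
    ).fetchone()
    return row[0] if row else None

print(risk_as_of("a1b2c3", "2026-01-15 15:00:00"))  # Medium
print(risk_as_of("a1b2c3", "2025-01-01 00:00:00"))  # None — nothing loaded yet
```

Note the second call: before the first load, the honest answer is "nothing", which is exactly what a regulator needs to hear instead of a back-dated guess.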
Request 2: "Our CRM vendor just changed their data model. Don't break anything."
Who asks this: Head of Data at any company after a SaaS vendor pushes a breaking change.
The situation: Your CRM renames customer_tier to customer_segment. Values change from "Gold/Silver/Bronze" to "Premium/Standard/Basic". This happens on March 1st with zero warning.
In Kimball — everything breaks
-- dim_customer had this column:
-- customer_tier VARCHAR(20)
--
-- Now the source sends "customer_segment" with different values.
--
-- Your options:
-- 1. Rename the column -> breaks all reports using "customer_tier"
-- 2. Add a new column -> dim_customer grows, old reports show NULLs
-- 3. Map new values to old -> the vendor's original values are lost
-- 4. Do nothing -> ETL fails every night until you fix it
--
-- ALL options require changing existing tables and reports.
In Data Vault — zero changes to existing models
You create a new satellite. The old one stays. Reports on old data keep working. New reports use the new satellite. A Business Vault model handles the mapping when you need a unified view.
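The pattern can be sketched in a few lines. This is a runnable illustration using SQLite; the satellite names, the Business Vault view, and the Gold→Premium mapping are assumptions made up for the example:

```python
import sqlite3

# Additive pattern: the old satellite is never touched, the new feed lands
# in a new satellite, and a Business Vault view unifies the two models.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Old satellite: frozen history from before March 1st
    CREATE TABLE sat_customer_crm_v1 (customer_hk TEXT, customer_tier TEXT, load_dt TEXT);
    -- New satellite: the vendor's new model
    CREATE TABLE sat_customer_crm_v2 (customer_hk TEXT, customer_segment TEXT, load_dt TEXT);

    INSERT INTO sat_customer_crm_v1 VALUES ('hk1', 'Gold',   '2026-01-05');
    INSERT INTO sat_customer_crm_v1 VALUES ('hk2', 'Silver', '2026-02-10');
    INSERT INTO sat_customer_crm_v2 VALUES ('hk1', 'Premium', '2026-03-02');

    -- Business Vault view: prefer the new model, map the old one into it
    CREATE VIEW bv_customer_segment AS
    SELECT s.customer_hk,
           COALESCE(
               (SELECT v2.customer_segment FROM sat_customer_crm_v2 v2
                 WHERE v2.customer_hk = s.customer_hk
                 ORDER BY v2.load_dt DESC LIMIT 1),
               CASE (SELECT v1.customer_tier FROM sat_customer_crm_v1 v1
                      WHERE v1.customer_hk = s.customer_hk
                      ORDER BY v1.load_dt DESC LIMIT 1)
                   WHEN 'Gold'   THEN 'Premium'
                   WHEN 'Silver' THEN 'Standard'
                   WHEN 'Bronze' THEN 'Basic'
               END
           ) AS segment
    FROM (SELECT customer_hk FROM sat_customer_crm_v1
          UNION
          SELECT customer_hk FROM sat_customer_crm_v2) s;
""")
segments = dict(con.execute("SELECT customer_hk, segment FROM bv_customer_segment"))
print(segments)  # hk1 -> Premium (new feed), hk2 -> Standard (mapped from old tier)
```

Existing reports keep reading the old satellite unchanged; only the Business Vault view knows both models exist.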
Request 3: "We acquired a company. Merge their customer data with ours without losing anything."
Who asks this: CTO after a company acquisition.
The situation: Your company uses Shopify (50,000 customers). The acquired company uses WooCommerce (30,000 customers). Some customers exist in both. Both systems have completely different schemas, IDs, and fields.
In Kimball — painful redesign
-- Your dim_customer was built for Shopify:
| customer_sk | shopify_id | name | shopify_tier | email |
|-------------|------------|------------|--------------|-----------------|
| 1 | SH-1001 | John Smith | Plus | john@gmail.com |
-- WooCommerce has different fields:
-- woo_customer_id, loyalty_level, membership_date, referral_code
--
-- To merge, you must either:
-- 1. Add 4 new nullable columns to dim_customer (messy, sparse)
-- 2. Create a separate dim_customer_woo (two customer dimensions??)
-- 3. Redesign dim_customer from scratch (breaks everything)
--
-- And for overlapping customers (same person in both systems),
-- you need to decide NOW which system's data "wins".
-- If you choose wrong, the other data is gone.
In Data Vault — purely additive
-- Unified view from both systems — nothing lost
SELECT
hub.CUSTOMER_ID,
hub.EMAIL,
sh.SHOPIFY_TIER,
sh.SHOPIFY_SINCE,
wc.LOYALTY_LEVEL,
wc.MEMBERSHIP_DATE,
wc.REFERRAL_CODE
FROM hub_customer hub
LEFT JOIN sat_customer_shopify sh ON hub.CUSTOMER_HK = sh.CUSTOMER_HK
LEFT JOIN sat_customer_woocommerce wc ON hub.CUSTOMER_HK = wc.CUSTOMER_HK;
-- John Smith appears ONCE in the hub (same email = same hash key).
-- His Shopify data is in one satellite.
-- His WooCommerce data is in another.
-- NOTHING was lost. NOTHING was merged destructively.
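The "same email = same hash key" step is worth seeing concretely. A common Data Vault 2.0 convention is to hash a normalized business key; MD5 and the normalization rule below are illustrative choices, not a mandate:

```python
import hashlib

# The hub hash key is derived from the business key alone (here, an email),
# so the same person hashes identically regardless of source system.
def customer_hk(email: str) -> str:
    business_key = email.strip().upper()   # normalize before hashing
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

shopify_hk = customer_hk("john@gmail.com")     # from the Shopify feed
woo_hk     = customer_hk(" John@Gmail.com ")   # from the WooCommerce feed

print(shopify_hk == woo_hk)  # True — one hub row, two satellites
```

Because both loaders derive the same key independently, John Smith collapses to one hub row without any cross-system lookup or destructive merge.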
Request 4: "Three departments report different revenue numbers. Which one is right?"
Who asks this: CFO at a board meeting.
The situation: Sales says Q1 revenue is $4.2M (from Salesforce). Finance says $3.8M (from SAP). Operations says $4.0M (from the billing system). The board wants one number, but they also want to understand why the systems disagree.
In Kimball — you already picked a winner
-- fct_revenue was built from ONE source (probably SAP, because Finance owns it).
-- Sales and Operations data was either:
-- a) Never loaded (discarded at ETL time)
-- b) "Reconciled" into SAP numbers (original values overwritten)
--
-- When the CFO asks "why does Sales say $4.2M?",
-- you can't answer. That data isn't in the warehouse.
In Data Vault — all three are preserved
-- Each source has its own satellite — compare directly:
SELECT 'SALESFORCE' AS SOURCE, SUM(REVENUE) AS Q1_TOTAL
FROM sat_order_revenue_salesforce
WHERE LOAD_DATETIME BETWEEN '2026-01-01' AND '2026-04-01'
UNION ALL
SELECT 'SAP', SUM(REVENUE)
FROM sat_order_revenue_sap
WHERE LOAD_DATETIME BETWEEN '2026-01-01' AND '2026-04-01'
UNION ALL
SELECT 'BILLING', SUM(REVENUE)
FROM sat_order_revenue_billing
WHERE LOAD_DATETIME BETWEEN '2026-01-01' AND '2026-04-01';
-- (Filtering on LOAD_DATETIME assumes same-day loads. In practice you would
--  filter on the order's business date, stored as a satellite attribute.)
-- Result:
-- SALESFORCE: $4,200,000
-- SAP: $3,800,000
-- BILLING: $4,000,000
-- Now drill into WHY they differ — per order:
SELECT
hub.ORDER_ID,
sf.REVENUE AS SALESFORCE_AMOUNT,
sap.REVENUE AS SAP_AMOUNT,
bill.REVENUE AS BILLING_AMOUNT,
sf.REVENUE - sap.REVENUE AS SF_VS_SAP_DIFF
FROM hub_order hub
LEFT JOIN sat_order_revenue_salesforce sf ON hub.ORDER_HK = sf.ORDER_HK
LEFT JOIN sat_order_revenue_sap sap ON hub.ORDER_HK = sap.ORDER_HK
LEFT JOIN sat_order_revenue_billing bill ON hub.ORDER_HK = bill.ORDER_HK
WHERE sf.REVENUE IS DISTINCT FROM sap.REVENUE  -- NULL-safe: also keeps orders missing from one source
ORDER BY ABS(sf.REVENUE - sap.REVENUE) DESC;
-- Now you see: SAP applies discounts and tax adjustments.
-- Salesforce shows gross. Billing is net.
-- You just gave the CFO a reconciliation report in 5 minutes.
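One subtlety in the per-order drill-down: an order missing from one source is itself a discrepancy, and a plain `!=` silently drops it because comparing against NULL is never true. Below is a runnable sketch using SQLite, whose NULL-safe inequality is spelled `IS NOT` (Snowflake and Postgres use `IS DISTINCT FROM`); table names and amounts are made up:

```python
import sqlite3

# NULL-safe per-order comparison: keep rows where the sources disagree,
# including orders that one source never received.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sat_rev_salesforce (order_hk TEXT, revenue REAL);
    CREATE TABLE sat_rev_sap        (order_hk TEXT, revenue REAL);

    INSERT INTO sat_rev_salesforce VALUES ('o1', 1000), ('o2', 500);
    INSERT INTO sat_rev_sap        VALUES ('o1',  900);   -- o2 never reached SAP
""")
diffs = con.execute("""
    SELECT sf.order_hk, sf.revenue AS sf_amount, sap.revenue AS sap_amount
    FROM sat_rev_salesforce sf
    LEFT JOIN sat_rev_sap sap ON sf.order_hk = sap.order_hk
    WHERE sf.revenue IS NOT sap.revenue   -- NULL-safe: o2 survives the filter
""").fetchall()
print(diffs)  # [('o1', 1000.0, 900.0), ('o2', 500.0, None)]
```

With `!=` instead, order o2 would vanish from the reconciliation report — the worst possible failure mode when the question is "why do the totals differ?".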
Request 5: "A patient's allergy was recorded wrong for 3 months. Show me every decision made using that wrong data."
Who asks this: Chief Medical Officer at a hospital after a near-miss incident.
The situation: A patient's allergy was entered incorrectly on January 10th and corrected on April 8th. During those 3 months, prescriptions were written based on the wrong data. The hospital needs to identify every affected prescription.
In Kimball — the wrong data is gone
-- dim_patient (SCD Type 1 — most common for allergies)
-- The wrong allergy was overwritten when corrected.
-- You can see the CURRENT allergy, but not what it was during Jan-April.
--
-- Even with SCD Type 2, you can't reliably cross-reference
-- with fct_prescription because valid_from is the business date,
-- not the warehouse load date. You don't know which prescriptions
-- were written WHILE the wrong allergy was in the warehouse.
In Data Vault — full traceability
-- sat_patient_allergy (insert-only, every change tracked)
| PATIENT_HK | ALLERGY | LOAD_DATETIME | RECORD_SOURCE |
|------------|-------------|---------------------|---------------|
| hash(5001) | None | 2025-06-01 06:00:00 | EHR_SYSTEM |
| hash(5001) | Penicillin | 2026-01-10 06:00:00 | EHR_SYSTEM | -- wrong
| hash(5001) | Amoxicillin | 2026-04-08 06:00:00 | EHR_SYSTEM | -- corrected
-- Find every prescription written while wrong allergy was in the system:
SELECT
hub_rx.PRESCRIPTION_ID,
sat_rx.DRUG_NAME,
sat_rx.PRESCRIBED_DATE,
sat_rx.PRESCRIBING_DOCTOR,
sat_allergy.ALLERGY AS ALLERGY_AT_TIME_OF_PRESCRIPTION
FROM hub_prescription hub_rx
JOIN link_patient_prescription lnk
ON hub_rx.PRESCRIPTION_HK = lnk.PRESCRIPTION_HK
JOIN sat_prescription_details sat_rx
ON hub_rx.PRESCRIPTION_HK = sat_rx.PRESCRIPTION_HK
JOIN sat_patient_allergy sat_allergy
ON lnk.PATIENT_HK = sat_allergy.PATIENT_HK
WHERE lnk.PATIENT_HK = 'hash(5001)'
AND sat_rx.LOAD_DATETIME BETWEEN '2026-01-10' AND '2026-04-08'
AND sat_allergy.LOAD_DATETIME <= sat_rx.LOAD_DATETIME
-- QUALIFY (Snowflake, Databricks, DuckDB) keeps only the allergy version
-- that was current when each prescription was loaded:
QUALIFY ROW_NUMBER() OVER (
PARTITION BY hub_rx.PRESCRIPTION_HK
ORDER BY sat_allergy.LOAD_DATETIME DESC
) = 1;
-- Returns every prescription during the 3-month window,
-- with proof that "Penicillin" (wrong) was what the system showed.
-- This is legally defensible evidence for the safety review.
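On engines without `QUALIFY`, the same "attribute in effect at load time" logic can be written with a correlated subquery. A runnable sketch using SQLite, with illustrative table names and a simplified schema (the link table is folded into the prescription satellite for brevity):

```python
import sqlite3

# For each prescription, find the allergy version the warehouse held
# at the moment the prescription was loaded.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sat_patient_allergy (patient_hk TEXT, allergy TEXT, load_dt TEXT);
    CREATE TABLE sat_prescription (rx_id TEXT, patient_hk TEXT, drug TEXT, load_dt TEXT);

    INSERT INTO sat_patient_allergy VALUES
        ('p1', 'None',        '2025-06-01 06:00:00'),
        ('p1', 'Penicillin',  '2026-01-10 06:00:00'),  -- wrong entry
        ('p1', 'Amoxicillin', '2026-04-08 06:00:00');  -- correction

    INSERT INTO sat_prescription VALUES
        ('rx1', 'p1', 'Drug A', '2026-02-01 09:00:00'),
        ('rx2', 'p1', 'Drug B', '2026-05-01 09:00:00');
""")
rows = con.execute("""
    SELECT rx.rx_id, rx.drug,
           (SELECT a.allergy FROM sat_patient_allergy a
             WHERE a.patient_hk = rx.patient_hk
               AND a.load_dt <= rx.load_dt
             ORDER BY a.load_dt DESC LIMIT 1) AS allergy_at_time
    FROM sat_prescription rx
    WHERE rx.patient_hk = 'p1'
    ORDER BY rx.load_dt
""").fetchall()
print(rows)
# rx1 was written while 'Penicillin' (the wrong value) was in the warehouse;
# rx2 was written after the correction.
```

The insert-only satellite is what makes this possible: the wrong value is still there, timestamped, next to its correction.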
Request 6: "Our nightly ETL takes 8 hours. The business needs data by 7 AM."
Who asks this: VP of Analytics at a retail company.
The situation: ETL starts at midnight. Dimensions must load before facts. Everything is sequential because of foreign key dependencies. It finishes at 8 AM — one hour late.
In Kimball — stuck with sequential loading
-- Kimball ETL must run in order:
-- 00:00 dim_customer (45 min) -- must finish first
-- 00:45 dim_product (30 min) -- must finish first
-- 01:15 dim_store (20 min) -- must finish first
-- 01:35 fct_daily_sales (3 hours) -- needs all 3 dimension SKs
-- 04:35 fct_inventory (2 hours) -- needs dim_product + dim_store
-- 06:35 agg_weekly (1 hour) -- needs fct_daily_sales
-- 07:35 DONE — 35 minutes late
--
-- fct_daily_sales CANNOT start until all dimensions are loaded
-- because it needs surrogate keys that don't exist yet.
In Data Vault — full parallel loading
-- Data Vault: everything runs at 00:00, in parallel
--
-- 00:00 ALL hubs, links, satellites at the same time (20 min)
-- 00:20 business_vault (PIT tables, bridge tables) (15 min)
-- 00:35 marts (star schemas for reporting) (10 min)
-- 00:45 DONE — 6 hours and 15 minutes EARLY
--
-- Why? Hash keys are computed from business keys.
-- The hash for customer 1001 is always the same,
-- regardless of whether hub_customer has loaded yet.
-- No table waits for another table.
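The independence claim can be demonstrated directly: because every loader derives keys from business keys alone, the load jobs can run in any order (or concurrently) and still join up. A toy sketch with illustrative names; real loaders would of course batch and deduplicate:

```python
import hashlib
import random
import sqlite3

# Each loader computes hash keys itself — no lookup against a table
# that another job must populate first.
def hk(business_key: str) -> str:
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE sat_customer (customer_hk TEXT, name TEXT);
    CREATE TABLE lnk_customer_order (customer_hk TEXT, order_hk TEXT);
""")

def load_hub():  con.execute("INSERT INTO hub_customer VALUES (?, ?)", (hk("1001"), "1001"))
def load_sat():  con.execute("INSERT INTO sat_customer VALUES (?, ?)", (hk("1001"), "John Smith"))
def load_link(): con.execute("INSERT INTO lnk_customer_order VALUES (?, ?)", (hk("1001"), hk("ORD-7")))

jobs = [load_hub, load_sat, load_link]
random.shuffle(jobs)        # any order works — no job waits for another
for job in jobs:
    job()

row = con.execute("""
    SELECT h.customer_id, s.name
    FROM hub_customer h
    JOIN sat_customer s ON h.customer_hk = s.customer_hk
""").fetchone()
print(row)  # ('1001', 'John Smith')
```

Contrast with Kimball, where the fact loader must look up surrogate keys that exist only after the dimension load commits — a hard sequential dependency.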
Request 7: "We're switching from Salesforce to HubSpot. Don't lose 5 years of history."
Who asks this: VP of Sales during a CRM migration.
The situation: The company used Salesforce for 5 years. They're moving to HubSpot. They need both historical and new data — without gaps or data loss.
In Kimball — risky and destructive
-- dim_customer and fct_opportunity were built for Salesforce.
--
-- Option 1: Redesign for HubSpot
-- -> 5 years of Salesforce data needs re-mapping
-- -> Fields that don't exist in HubSpot become NULL
-- -> Historical reports change their output
--
-- Option 2: Keep Salesforce schema, map HubSpot into it
-- -> HubSpot-specific fields are lost forever
-- -> New features can never be used in analytics
--
-- Option 3: Two separate warehouses
-- -> No unified view
-- -> "Compare 2025 vs 2026" requires querying two systems
--
-- No good option exists.
In Data Vault — seamless transition
-- Stop loading SAT_CUSTOMER_SALESFORCE (history preserved forever)
-- Start loading SAT_CUSTOMER_HUBSPOT (new data flows in)
-- Both attached to the same HUB_CUSTOMER
-- Unified view across both CRMs:
SELECT
hub.CUSTOMER_ID,
COALESCE(hs.CUSTOMER_NAME, sf.CUSTOMER_NAME) AS CUSTOMER_NAME,
COALESCE(hs.LIFECYCLE_STAGE, sf.LEAD_STATUS) AS STATUS,
CASE
WHEN hs.CUSTOMER_HK IS NOT NULL THEN 'HUBSPOT'
ELSE 'SALESFORCE'
END AS CURRENT_SOURCE,
COALESCE(sf.FIRST_CONTACT_DATE, hs.CREATED_DATE) AS FIRST_SEEN
FROM hub_customer hub
LEFT JOIN sat_customer_salesforce sf ON hub.CUSTOMER_HK = sf.CUSTOMER_HK
LEFT JOIN sat_customer_hubspot hs ON hub.CUSTOMER_HK = hs.CUSTOMER_HK;
-- Result:
-- | CUSTOMER_ID | NAME | STATUS | SOURCE | FIRST_SEEN |
-- |-------------|------------|-----------|------------|------------|
-- | C-1001 | John Smith | Customer | HUBSPOT | 2022-03-15 | migrated
-- | C-1002 | Jane Doe | Lead | SALESFORCE | 2023-07-01 | not yet
-- | C-1003 | New Lead | Subscriber| HUBSPOT | 2026-03-10 | new
--
-- 5 years of Salesforce history: preserved forever.
-- New HubSpot data: flowing.
-- Cross-CRM analytics: one query.
Summary: Business Problems Only Data Vault Can Solve
| Business Request | Kimball / 3NF | Data Vault 2.0 |
|---|---|---|
| "What did our system have at this exact moment?" | Can't prove it | LOAD_DATETIME on every record |
| "Source system changed, don't break anything" | Redesign required | Add new satellite, zero changes |
| "Merge acquired company's data" | Painful redesign | Add satellite, link duplicates |
| "Why do departments report different numbers?" | Other sources were discarded | All sources preserved, compare anytime |
| "Trace every decision made with wrong data" | Old data overwritten | Full insert-only history with timestamps |
| "ETL takes 8 hours, need it in 1" | Sequential FK dependencies | Full parallel loading, 45 min |
| "Switching CRM, keep 5 years of history" | Redesign or lose data | Old + new satellites coexist seamlessly |
Data Vault 2.0 isn't about technology — it's about giving the business flexibility it didn't know it needed. Every example above is a real situation that companies face. The ones using Data Vault handle them in hours. The ones using traditional architectures spend weeks or months — and often lose data in the process.
Facing any of these problems?
We've built Data Vaults for companies dealing with exactly these challenges — from regulatory audits to CRM migrations to multi-source reconciliation. Let's discuss your situation.
Get Started