Pipeline Walkthrough
How data flows from an ERP source to Power BI reports — a real Apache Airflow DAG running dbt models on a Data Vault 2.0 architecture. Click any task to see what happens inside.
Grid
| Task |
|---|
| staging |
| test_staging |
| hubs |
| links |
| sats |
| eff_sats |
| as_of_dates |
| pits |
| bridge |
| marts |
| test_marts |
Data Volume by Stage — idempotent: rerun produces identical counts
Try it: click Run Pipeline or any ↻ button above, then watch the Current Run column refill stage by stage — same numbers every time. That's Data Vault idempotency.
| Stage | Tables | Previous Run | Current Run |
|---|---|---|---|
| ERP Source | 6 | — | — |
| Staging | 4 | — | — |
| Hubs | 9 | — | — |
| Links | 13 | — | — |
| Sat (Order Status) | 1 | — | — |
| Sat (Order Financials) | 1 | — | — |
| Sat (Order Line) | 1 | — | — |
| Sat (Other) | 8 | — | — |
| Eff. Satellites | 3 | — | — |
| PITs | 3 | — | — |
| Bridge | 1 | — | — |
| Marts | 5 | — | — |
| Total | 55 | — | — |
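The "same numbers every time" behavior comes from insert-only loads keyed on hashes: a rerun finds every key already present and inserts nothing. A minimal in-memory Python sketch of that idea (the real pipeline does this in dbt SQL; names here are illustrative):

```python
import hashlib

def hash_key(business_key: str) -> str:
    """MD5 of the normalized business key, the usual Data Vault hash key."""
    return hashlib.md5(business_key.strip().upper().encode()).hexdigest()

def load_hub(hub: dict, source_keys: list) -> int:
    """Insert-only load: keys already present are skipped, so reruns change nothing."""
    for bk in source_keys:
        hk = hash_key(bk)
        if hk not in hub:
            hub[hk] = bk
    return len(hub)

hub = {}
batch = ["ORD-1001", "ORD-1002", "ORD-1003"]
first = load_hub(hub, batch)
second = load_hub(hub, batch)  # rerun the same batch
assert first == second == 3    # identical counts: idempotent
```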
What Happens at Each Step
ERP Source
A Python generator simulates a real ERP system — it creates orders, customers, products, invoices, and payments in PostgreSQL. New data is generated daily via Airflow.
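The daily generator can be pictured roughly like this — a hypothetical sketch, not the real generator (which also writes customers, products, invoices, and payments to PostgreSQL):

```python
import random
import datetime as dt

def generate_orders(n: int, run_date: dt.date, seed: int = 42) -> list:
    """Produce n synthetic ERP order rows for one daily run (illustrative shape)."""
    rng = random.Random(seed)  # seeded for reproducible demo data
    return [
        {
            "order_id": f"ORD-{1000 + i}",
            "customer_id": f"CUST-{rng.randint(1, 50)}",
            "status": rng.choice(["NEW", "PAID", "SHIPPED"]),
            "order_date": run_date.isoformat(),
        }
        for i in range(n)
    ]

orders = generate_orders(3, dt.date(2024, 3, 15))
```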
Staging
dbt views that add hash keys, hashdiffs, and load metadata. No data is stored — staging is a virtual layer that transforms source columns into Data Vault-ready inputs.
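What staging adds to each source row, sketched in Python (column names and the `||` delimiter are illustrative — delimiter and casing conventions vary between dbt Data Vault packages):

```python
import hashlib

DELIM = "||"  # delimiter between columns before hashing

def md5_hash(*cols: str) -> str:
    """Hash normalized column values joined by a delimiter."""
    return hashlib.md5(DELIM.join(c.strip().upper() for c in cols).encode()).hexdigest()

src = {"order_id": "ORD-1001", "status": "PAID", "amount": "99.50"}
staged = {
    **src,
    "order_hk": md5_hash(src["order_id"]),               # hash key: joins hubs/links
    "hashdiff": md5_hash(src["status"], src["amount"]),  # fingerprint: satellite change detection
    "load_ts": "2024-03-15T06:00:00",                    # load metadata
    "record_source": "erp.orders",
}
```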
Hubs
Unique business keys for each core entity: customer, order, product, warehouse, manager, contract, invoice, payment. Insert-only — once a key exists it stays forever.
Links
Relationships between hubs: order-customer, order_line-order+product, invoice-order, payment-invoice, and more. Tracks how entities connect without duplicating data.
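A link row is just one hash over the combined parent keys, so each unique relationship is stored exactly once. A hedged sketch (field names are illustrative):

```python
import hashlib

def link_hash_key(*business_keys: str) -> str:
    """One hash over the combined, normalized parent keys: one row per relationship."""
    joined = "||".join(bk.strip().upper() for bk in business_keys)
    return hashlib.md5(joined.encode()).hexdigest()

# order-customer link: the relationship itself, with no descriptive payload columns
link_row = {
    "link_hk": link_hash_key("ORD-1001", "CUST-3"),
    "order_id": "ORD-1001",
    "customer_id": "CUST-3",
}
```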
Satellites
Descriptive attributes with full history. Every change to a customer name, product price, or order status creates a new row with a timestamp — nothing is overwritten.
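The "new row per change" mechanic is driven by the hashdiff from staging: a row is appended only when the fingerprint differs from the latest stored version. A minimal sketch (in-memory stand-in for the dbt satellite load):

```python
import hashlib

def hashdiff(attrs: dict) -> str:
    """Fingerprint the descriptive payload in a stable column order."""
    payload = "||".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return hashlib.md5(payload.encode()).hexdigest()

def load_satellite(sat: list, hub_key: str, attrs: dict, load_date: str) -> None:
    """Append a version row only when the payload changed; nothing is overwritten."""
    latest = next((r for r in reversed(sat) if r["hub_key"] == hub_key), None)
    if latest is None or latest["hashdiff"] != hashdiff(attrs):
        sat.append({"hub_key": hub_key, "hashdiff": hashdiff(attrs),
                    "load_date": load_date, **attrs})

sat = []
load_satellite(sat, "ORD-1001", {"status": "NEW"}, "2024-03-14")
load_satellite(sat, "ORD-1001", {"status": "NEW"}, "2024-03-15")   # unchanged: no row
load_satellite(sat, "ORD-1001", {"status": "PAID"}, "2024-03-16")  # changed: new row
assert len(sat) == 2
```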
Effectivity Satellites
Track when relationships are active or ended. When an order is cancelled or a contract expires, the effectivity satellite records the exact date range.
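Like every vault table, the effectivity satellite is insert-only: a cancellation appends an "ended" row rather than updating the "active" one, and the date range is derived from the rows. A hedged sketch with illustrative names:

```python
from datetime import date

eff_sat = []  # insert-only, like every vault table

def record_status(link_hk: str, status: str, load_date: date) -> None:
    """Append a row marking the relationship active or ended."""
    eff_sat.append({"link_hk": link_hk, "status": status, "load_date": load_date})

def active_range(link_hk: str):
    """Derive (start, end) from the status rows; end is None while still active."""
    rows = sorted((r for r in eff_sat if r["link_hk"] == link_hk),
                  key=lambda r: r["load_date"])
    start = rows[0]["load_date"]
    end = next((r["load_date"] for r in rows if r["status"] == "ENDED"), None)
    return start, end

record_status("order|contract|1", "ACTIVE", date(2024, 1, 1))
record_status("order|contract|1", "ENDED", date(2024, 3, 15))  # e.g. contract expires
assert active_range("order|contract|1") == (date(2024, 1, 1), date(2024, 3, 15))
```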
As-of Dates
A calendar spine that defines which dates PITs and Bridges should be calculated for. Controls the time granularity of the business vault snapshot.
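Generating the spine is a one-liner; the chosen grain (daily here, but weekly or hourly are equally valid project choices) is what sets the snapshot granularity downstream:

```python
from datetime import date, timedelta

def as_of_dates(start: date, end: date) -> list:
    """Daily calendar spine, inclusive of both endpoints."""
    return [start + timedelta(days=d) for d in range((end - start).days + 1)]

spine = as_of_dates(date(2024, 3, 1), date(2024, 3, 5))
assert len(spine) == 5  # downstream: one PIT row per hub key per spine date
```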
Point-in-Time Tables
Snapshot of all satellite data for each hub at each as-of date. Lets you ask "what did this customer look like on March 15?" without scanning full history.
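The "March 15" question reduces to: for each as-of date, find the latest satellite row loaded on or before it. A minimal sketch with illustrative rows:

```python
from datetime import date

history = [  # satellite rows for one customer (illustrative)
    {"load_date": date(2024, 3, 1), "name": "Acme"},
    {"load_date": date(2024, 3, 10), "name": "Acme Corp"},
]

def pit_row(history: list, as_of: date):
    """Latest satellite row loaded on or before the as-of date, or None."""
    rows = [r for r in history if r["load_date"] <= as_of]
    return max(rows, key=lambda r: r["load_date"]) if rows else None

assert pit_row(history, date(2024, 3, 15))["name"] == "Acme Corp"
assert pit_row(history, date(2024, 3, 5))["name"] == "Acme"
```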
Bridge Table
Pre-joined link + hub keys for the order-line grain. Eliminates expensive multi-table joins at query time — Power BI reads one table instead of five.
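The trade is classic precomputation: do the multi-table join once at load time so queries hit a single wide table. Sketched with dictionaries standing in for vault tables (shapes are illustrative):

```python
# Link rows at order-line grain, plus a hub relationship they reference
order_line_links = [
    {"order_id": "ORD-1001", "product_id": "P-7", "line_no": 1},
    {"order_id": "ORD-1001", "product_id": "P-9", "line_no": 2},
]
order_customer = {"ORD-1001": "CUST-3"}

# Pre-join once during the load; BI queries then read `bridge` alone
bridge = [
    {**line, "customer_id": order_customer[line["order_id"]]}
    for line in order_line_links
]
assert bridge[0]["customer_id"] == "CUST-3"
```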
Marts
Star-schema fact and dimension tables built from PITs + Bridge. dim_customer, dim_product, fact_order_lines — ready for Power BI DirectQuery with zero additional joins.
◆ Live Airflow UI
This is the real Apache Airflow interface running this pipeline. You can see DAG runs, task statuses, and execution history — the same view our engineers use in production. Read-only access.
Want a pipeline like this?
I design, build, and maintain production Data Vault pipelines on dbt + Airflow.
Get in touch