
Pipeline Walkthrough

How data flows from an ERP source to Power BI reports — a real Apache Airflow DAG running dbt with Data Vault 2.0 modeling. Click any task to see what happens inside.


DAG tasks, in execution order:

staging → test_staging → hubs → links → sats → eff_sats → as_of_dates → pits → bridge → marts → test_marts
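This is not the project's actual DAG file, but a minimal stdlib sketch of how a scheduler derives a run order from the task chain above (the linear dependency map is an illustrative assumption):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map mirroring the task list above:
# each task depends on the one before it (a linear chain).
tasks = ["staging", "test_staging", "hubs", "links", "sats", "eff_sats",
         "as_of_dates", "pits", "bridge", "marts", "test_marts"]
deps = {t: {prev} for prev, t in zip(tasks, tasks[1:])}
deps["staging"] = set()  # first task has no upstream dependency

# A scheduler resolves this graph into a runnable order.
order = list(TopologicalSorter(deps).static_order())
print(order)  # staging first, test_marts last
```

In the real DAG, Airflow does the same resolution from `>>` dependencies between operators.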

Data Volume by Stage — idempotent: rerun produces identical counts

Try it: click Run Pipeline or any button above, then watch the Current Run column refill stage by stage — same numbers every time. That's Data Vault idempotency.

Stage                    Tables
ERP Source                    6
Staging                       4
Hubs                          9
Links                        13
Sat (Order Status)            1
Sat (Order Financials)        1
Sat (Order Line)              1
Sat (Other)                   8
Eff. Satellites               3
PITs                          3
Bridge                        1
Marts                         5
Total                        55

(The Previous Run and Current Run columns populate in the live demo.)

What Happens at Each Step

0

ERP Source

A Python generator simulates a real ERP system — creates orders, customers, products, invoices, and payments in PostgreSQL. New data is generated daily via Airflow.

1

Staging

dbt views that add hash keys, hashdiffs, and load metadata. No data is stored — staging is a virtual layer that transforms source columns into Data Vault-ready inputs.
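To make the mechanics concrete, here is a minimal Python sketch of the usual Data Vault hashing pattern. The real project does this in dbt SQL; the normalization rules and the MD5 choice here are common defaults, not the project's confirmed macros:

```python
import hashlib

def hash_key(*business_keys):
    # Normalize (trim, uppercase), join with a delimiter, then hash.
    # The same business key always yields the same surrogate key.
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def hashdiff(record, descriptive_cols):
    # One hash over all descriptive attributes, in a fixed column order;
    # any attribute change produces a different hashdiff.
    payload = "||".join(str(record.get(c, "")).strip().upper()
                        for c in sorted(descriptive_cols))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

print(hash_key(" ord-1001 ") == hash_key("ORD-1001"))  # True: normalization
```

Downstream layers compare hashdiffs instead of comparing every column, which is what makes change detection cheap.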

2

Hubs

Unique business keys for each core entity: customer, order, product, warehouse, manager, contract, invoice, payment. Insert-only — once a key exists it stays forever.
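A minimal sketch of an insert-only hub load, with in-memory lists standing in for tables (the real loads are dbt incremental models; names here are illustrative):

```python
def load_hub(hub_rows, staged_keys, load_ts):
    # Insert-only: add a row for each key not yet present.
    # Existing rows are never updated or deleted, so reruns are no-ops.
    existing = {row["hub_key"] for row in hub_rows}
    for key in staged_keys:
        if key not in existing:
            hub_rows.append({"hub_key": key, "load_ts": load_ts})
            existing.add(key)
    return hub_rows

hub = []
load_hub(hub, ["ORD-1", "ORD-2"], "2024-03-01")
load_hub(hub, ["ORD-1", "ORD-2"], "2024-03-02")  # rerun: nothing added
print(len(hub))  # 2
```

That no-op rerun is the idempotency the demo's volume table shows: the same counts every time.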

3

Links

Relationships between hubs: order-customer, order_line-order+product, invoice-order, payment-invoice, and more. Tracks how entities connect without duplicating data.
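Links follow the same insert-only pattern, keyed by a hash of the combined parent business keys (an illustrative sketch, not the production macro):

```python
import hashlib

def link_key(*business_keys):
    # A link row is identified by the hash of its parents' business keys.
    joined = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def load_link(link_rows, staged_pairs):
    # Each distinct key combination appears exactly once, however many
    # times the relationship shows up in the source.
    existing = {row["link_key"] for row in link_rows}
    for pair in staged_pairs:
        lk = link_key(*pair)
        if lk not in existing:
            link_rows.append({"link_key": lk, "parent_keys": pair})
            existing.add(lk)
    return link_rows

link = []
load_link(link, [("ORD-1", "CUST-9"), ("ORD-1", "CUST-9"), ("ORD-2", "CUST-9")])
print(len(link))  # 2
```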

4

Satellites

Descriptive attributes with full history. Every change to a customer name, product price, or order status creates a new row with a timestamp — nothing is overwritten.
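A minimal sketch of the satellite delta load, using the hashdiff to decide whether a new history row is needed (field names are illustrative):

```python
def load_satellite(sat_rows, staged, load_ts):
    # Append a new row only when the hashdiff differs from the latest
    # row for that hub key; earlier rows are never overwritten.
    latest = {}
    for row in sat_rows:  # rows were appended in load order
        latest[row["hub_key"]] = row["hashdiff"]
    for rec in staged:
        if latest.get(rec["hub_key"]) != rec["hashdiff"]:
            sat_rows.append({"hub_key": rec["hub_key"],
                             "hashdiff": rec["hashdiff"],
                             "attrs": rec["attrs"],
                             "load_ts": load_ts})
            latest[rec["hub_key"]] = rec["hashdiff"]
    return sat_rows

sat = []
load_satellite(sat, [{"hub_key": "ORD-1", "hashdiff": "aaa", "attrs": {"status": "open"}}], 1)
load_satellite(sat, [{"hub_key": "ORD-1", "hashdiff": "aaa", "attrs": {"status": "open"}}], 2)  # unchanged
load_satellite(sat, [{"hub_key": "ORD-1", "hashdiff": "bbb", "attrs": {"status": "paid"}}], 3)  # changed
print(len(sat))  # 2 history rows: open, then paid
```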

5

Effectivity Satellites

Track when relationships are active or ended. When an order is cancelled or a contract expires, the effectivity satellite records the exact date range.
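An effectivity satellite can be sketched the same way: insert-only rows that record each time a relationship's state flips, so active date ranges fall out of the history (field names are illustrative):

```python
def load_eff_sat(eff_rows, link_key, is_active, as_of):
    # Record a new row only when the relationship's state changes,
    # giving an exact date range for every active period.
    last = next((r for r in reversed(eff_rows) if r["link_key"] == link_key), None)
    if last is None or last["is_active"] != is_active:
        eff_rows.append({"link_key": link_key, "is_active": is_active, "as_of": as_of})
    return eff_rows

eff = []
load_eff_sat(eff, "L1", True, "2024-03-01")   # order placed
load_eff_sat(eff, "L1", True, "2024-03-02")   # no change, no row
load_eff_sat(eff, "L1", False, "2024-03-05")  # order cancelled
print(len(eff))  # 2 rows bracketing the active period
```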

6

As-of Dates

A calendar spine that defines which dates PITs and Bridges should be calculated for. Controls the time granularity of the business vault snapshot.
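A date spine is just a generated calendar; a minimal sketch at daily granularity (the real model is dbt SQL, and the granularity is configurable):

```python
from datetime import date, timedelta

def as_of_dates(start, end):
    # Daily calendar spine: every PIT and bridge snapshot is built
    # once per date in this list.
    spine, d = [], start
    while d <= end:
        spine.append(d)
        d += timedelta(days=1)
    return spine

spine = as_of_dates(date(2024, 3, 1), date(2024, 3, 31))
print(len(spine))  # 31 daily snapshot dates
```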

7

Point-in-Time Tables

Snapshot of all satellite data for each hub at each as-of date. Lets you ask "what did this customer look like on March 15?" without scanning full history.
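A minimal sketch of how a PIT row is derived. The real table keys into every satellite of the hub; here a single satellite stands in for all of them:

```python
def build_pit(hub_keys, sat_rows, spine):
    # For each hub key and as-of date, record the load timestamp of the
    # latest satellite row on or before that date.
    pit = []
    for as_of in spine:
        for hk in hub_keys:
            loads = [r["load_ts"] for r in sat_rows
                     if r["hub_key"] == hk and r["load_ts"] <= as_of]
            pit.append({"hub_key": hk, "as_of": as_of,
                        "sat_load_ts": max(loads) if loads else None})
    return pit

sat = [{"hub_key": "CUST-1", "load_ts": 1}, {"hub_key": "CUST-1", "load_ts": 5}]
pit = build_pit(["CUST-1"], sat, spine=[3, 6])
print([row["sat_load_ts"] for row in pit])  # [1, 5]: "as of" day 3 sees the day-1 row
```

Queries then join on the pre-computed pointers instead of scanning satellite history.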

8

Bridge Table

Pre-joined link + hub keys for the order-line grain. Eliminates expensive multi-table joins at query time — Power BI reads one table instead of five.
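A sketch of the pre-join idea, resolving the order_line → order → customer chain once at load time (link and column names are illustrative):

```python
def build_bridge(order_line_links, order_customer_links):
    # Resolve the key chain once here, so reporting queries read a
    # single pre-joined table instead of joining links at query time.
    customer_of = {l["order_key"]: l["customer_key"] for l in order_customer_links}
    return [{"order_line_key": l["order_line_key"],
             "order_key": l["order_key"],
             "product_key": l["product_key"],
             "customer_key": customer_of.get(l["order_key"])}
            for l in order_line_links]

bridge = build_bridge(
    order_line_links=[{"order_line_key": "OL-1", "order_key": "ORD-1", "product_key": "P-7"}],
    order_customer_links=[{"order_key": "ORD-1", "customer_key": "CUST-9"}],
)
print(bridge[0]["customer_key"])  # CUST-9
```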

9

Marts

Star-schema fact and dimension tables built from PITs + Bridge. dim_customer, dim_product, fact_order_lines — ready for Power BI DirectQuery with zero additional joins.

Live Airflow UI

This is the real Apache Airflow interface running this pipeline. You can see DAG runs, task statuses, and execution history — the same view our engineers use in production. Read-only access.

Open full Airflow UI in new tab ↗

Want a pipeline like this?

I design, build, and maintain production Data Vault pipelines on dbt + Airflow.

Get in touch

Connect on LinkedIn