Every pipeline starts with a source. In production, this would be a real ERP system — SAP, Oracle, 1C, or a custom solution. For this demo, a Python generator simulates a live ERP database.
## What the Generator Creates
Each run produces a realistic batch of FMCG transactional data:
- Customers — retail chains, distributors, independent stores
- Products — SKUs with categories, prices, and weight
- Orders — purchase orders with status lifecycle (new → confirmed → shipped → delivered)
- Order lines — individual items within each order (quantity, unit price, discounts)
- Invoices — financial documents tied to orders and contracts
- Payments — payment records against invoices
- Warehouses — distribution centers that fulfill orders
- Managers — sales managers assigned to customer accounts
- Contracts — agreements between customers and the company
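The entities above hang together relationally: an order belongs to a customer and contains several lines, each priced per unit with an optional discount. A minimal sketch of one generation step might look like this (the class and function names are illustrative, not the actual generator code):

```python
import random
from dataclasses import dataclass, field
from datetime import date

# Order status lifecycle, as described above.
STATUSES = ["new", "confirmed", "shipped", "delivered"]

@dataclass
class OrderLine:
    product_id: int
    quantity: int
    unit_price: float
    discount: float  # fraction, e.g. 0.05 for 5%

@dataclass
class Order:
    order_id: int
    customer_id: int
    status: str
    order_date: date
    lines: list = field(default_factory=list)

    @property
    def total(self) -> float:
        # Order total = sum of line amounts net of discount.
        return sum(l.quantity * l.unit_price * (1 - l.discount) for l in self.lines)

def generate_order(order_id: int, rng: random.Random) -> Order:
    """Build one synthetic purchase order with 1-5 lines."""
    order = Order(
        order_id=order_id,
        customer_id=rng.randint(1, 500),   # reference an existing customer
        status=rng.choice(STATUSES),       # a point in the status lifecycle
        order_date=date.today(),
    )
    for _ in range(rng.randint(1, 5)):
        order.lines.append(OrderLine(
            product_id=rng.randint(1, 200),
            quantity=rng.randint(1, 50),
            unit_price=round(rng.uniform(0.5, 30.0), 2),
            discount=rng.choice([0.0, 0.05, 0.10]),
        ))
    return order

order = generate_order(1, random.Random(42))
print(order.status, len(order.lines), round(order.total, 2))
```

Seeding the `random.Random` instance makes a run reproducible for debugging, while unseeded production runs keep the data unpredictable.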
## How It Runs
Airflow's `fmcg_generator` DAG triggers the generator every 30 minutes. Each run:
- Connects to PostgreSQL using Airflow Variables for credentials
- Generates new orders and related data for the current date
- Inserts rows into the `erp` schema source tables
- Triggers the `fmcg_pipeline` DAG automatically via `TriggerDagRunOperator`
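The wiring described above might look roughly like the DAG definition below. This is a sketch, not the project's actual file: only the two DAG ids come from the text, while the Variable keys, task ids, and 2024 start date are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

def run_generator(**_):
    # Credentials come from Airflow Variables, not hard-coded values.
    # These Variable keys are hypothetical placeholders.
    conn_params = {
        "host": Variable.get("erp_db_host"),
        "dbname": Variable.get("erp_db_name"),
        "user": Variable.get("erp_db_user"),
        "password": Variable.get("erp_db_password"),
    }
    # ... connect to PostgreSQL and insert a fresh batch into the erp schema ...

with DAG(
    dag_id="fmcg_generator",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/30 * * * *",  # every 30 minutes
    catchup=False,
) as dag:
    generate = PythonOperator(
        task_id="generate_batch",
        python_callable=run_generator,
    )
    trigger_pipeline = TriggerDagRunOperator(
        task_id="trigger_fmcg_pipeline",
        trigger_dag_id="fmcg_pipeline",  # kick off the downstream DAG
    )
    generate >> trigger_pipeline
```

Chaining `generate >> trigger_pipeline` guarantees the downstream `fmcg_pipeline` only starts after the batch has landed in PostgreSQL.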
## Source Tables
`erp.customers`, `erp.products`, `erp.orders`, `erp.order_lines`, `erp.invoices`, `erp.payments`, `erp.warehouses`, `erp.managers`, `erp.contracts`
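To make the orders/order-lines relationship concrete, here is a self-contained SQLite sketch of two of these tables. The column definitions are assumptions for illustration, and SQLite has no schema qualifiers, so the PostgreSQL `erp.` prefix is dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status      TEXT NOT NULL,     -- new / confirmed / shipped / delivered
    order_date  TEXT NOT NULL
);
CREATE TABLE order_lines (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    unit_price REAL NOT NULL,
    discount   REAL DEFAULT 0      -- fraction of the line amount
);
""")

conn.execute("INSERT INTO orders VALUES (1, 42, 'new', '2024-01-01')")
conn.execute("INSERT INTO order_lines VALUES (1, 7, 10, 2.50, 0.05)")

# Order total net of discounts: 10 * 2.50 * 0.95 = 23.75
total = conn.execute(
    "SELECT SUM(quantity * unit_price * (1 - discount)) "
    "FROM order_lines WHERE order_id = 1"
).fetchone()[0]
print(total)
```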
## Why This Matters
The generator proves the pipeline works with continuous, unpredictable data — not a static CSV loaded once. New customers appear, products change prices, orders progress through statuses. This is what a real pipeline must handle.