ERP Source

Every pipeline starts with a source. In production, the source would be a real ERP system such as SAP, Oracle, 1C, or a custom solution. For this demo, a Python generator writes to a PostgreSQL database to simulate a live ERP system.

What the Generator Creates

Each run produces a realistic batch of FMCG transactional data:

  • Customers — retail chains, distributors, independent stores
  • Products — SKUs with categories, prices, and weight
  • Orders — purchase orders with status lifecycle (new → confirmed → shipped → delivered)
  • Order lines — individual items within each order (quantity, unit price, discounts)
  • Invoices — financial documents tied to orders and contracts
  • Payments — payment records against invoices
  • Warehouses — distribution centers that fulfill orders
  • Managers — sales managers assigned to customer accounts
  • Contracts — agreements between customers and the company
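As a rough illustration of how orders, order lines, and the status lifecycle fit together, here is a minimal sketch using only the standard library. The class names, field names, discount values, and the 1 to 5 lines per order range are assumptions made for this example, not the demo's actual generator code.

```python
import random
from dataclasses import dataclass, field
from datetime import date

# Status lifecycle from the list above: new -> confirmed -> shipped -> delivered
STATUSES = ["new", "confirmed", "shipped", "delivered"]


@dataclass
class OrderLine:
    product_id: int
    quantity: int
    unit_price: float
    discount: float  # fraction of the line amount, e.g. 0.05 for 5%

    def total(self) -> float:
        return round(self.quantity * self.unit_price * (1 - self.discount), 2)


@dataclass
class Order:
    order_id: int
    customer_id: int
    order_date: date
    status: str = "new"
    lines: list = field(default_factory=list)

    def advance(self) -> None:
        """Move one step along the lifecycle; 'delivered' is terminal."""
        i = STATUSES.index(self.status)
        if i < len(STATUSES) - 1:
            self.status = STATUSES[i + 1]


def generate_orders(n, customer_ids, prices, rng, today):
    """Build n new orders, each with 1 to 5 distinct products (assumed range)."""
    product_ids = sorted(prices)
    orders = []
    for order_id in range(1, n + 1):
        order = Order(order_id, rng.choice(customer_ids), today)
        k = rng.randint(1, min(5, len(product_ids)))
        for product_id in rng.sample(product_ids, k=k):
            order.lines.append(
                OrderLine(
                    product_id=product_id,
                    quantity=rng.randint(1, 50),
                    unit_price=prices[product_id],
                    discount=rng.choice([0.0, 0.05, 0.10]),
                )
            )
        orders.append(order)
    return orders
```

Passing a seeded `random.Random` instance as `rng` makes a batch reproducible for testing, while an unseeded one gives a different batch on every run.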

How It Runs

The fmcg_generator DAG in Airflow runs the generator every 30 minutes. Each run:

  1. Connects to PostgreSQL using Airflow Variables for credentials
  2. Generates new orders and related data for the current date
  3. Inserts rows into the erp schema source tables
  4. Triggers the fmcg_pipeline DAG automatically via TriggerDagRunOperator
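The four steps above could be wired together roughly as follows. This is a sketch assuming Airflow 2.4 or later (for the `schedule` argument and these import paths). The DAG ids fmcg_generator and fmcg_pipeline come from the text; the Variable names, task ids, start date, and the `generate_and_insert` helper are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator


def generate_and_insert(conn_params: dict) -> None:
    """Hypothetical placeholder for steps 2 and 3: generate a batch for the
    current date and insert it into the erp schema tables."""


def run_generator() -> None:
    # Step 1: read credentials from Airflow Variables.
    # These variable names are assumptions for illustration.
    conn_params = {
        "host": Variable.get("erp_pg_host"),
        "dbname": Variable.get("erp_pg_database"),
        "user": Variable.get("erp_pg_user"),
        "password": Variable.get("erp_pg_password"),
    }
    generate_and_insert(conn_params)


with DAG(
    dag_id="fmcg_generator",
    schedule="*/30 * * * *",          # every 30 minutes
    start_date=datetime(2024, 1, 1),  # placeholder start date
    catchup=False,                    # do not backfill missed intervals
) as dag:
    generate = PythonOperator(
        task_id="generate_erp_data",
        python_callable=run_generator,
    )

    # Step 4: start the downstream pipeline once the batch is loaded.
    trigger_pipeline = TriggerDagRunOperator(
        task_id="trigger_fmcg_pipeline",
        trigger_dag_id="fmcg_pipeline",
    )

    generate >> trigger_pipeline
```

Because the trigger task sits downstream of the generator task, the pipeline only starts when the generator run succeeds.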

Source Tables

erp.customers
erp.products
erp.orders
erp.order_lines
erp.invoices
erp.payments
erp.warehouses
erp.managers
erp.contracts
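To show what loading a batch into schema-qualified tables can look like, the sketch below uses SQLite from the standard library as a stand-in for PostgreSQL. Only the table names erp.orders and erp.order_lines come from the list above; the column names and sample rows are assumptions.

```python
import sqlite3

# Stand-in for PostgreSQL: attaching a second in-memory database under the
# name "erp" lets SQLite resolve schema-qualified names such as erp.orders.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

# Hypothetical column layout for two of the source tables.
conn.execute(
    "CREATE TABLE erp.orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT)"
)
conn.execute(
    "CREATE TABLE erp.order_lines ("
    "order_id INTEGER, product_id INTEGER, quantity INTEGER, "
    "unit_price REAL, discount REAL)"
)


def insert_batch(conn, orders, lines):
    """Insert orders and their lines in one transaction, so a failed run
    leaves no half-loaded batch behind."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany("INSERT INTO erp.orders VALUES (?, ?, ?, ?)", orders)
        conn.executemany(
            "INSERT INTO erp.order_lines VALUES (?, ?, ?, ?, ?)", lines
        )


insert_batch(
    conn,
    orders=[(1, 7, "2024-01-01", "new")],
    lines=[(1, 10, 3, 1.5, 0.0), (1, 11, 1, 2.0, 0.05)],
)

order_count = conn.execute("SELECT COUNT(*) FROM erp.orders").fetchone()[0]
line_count = conn.execute(
    "SELECT COUNT(*) FROM erp.order_lines WHERE order_id = 1"
).fetchone()[0]
```

Against PostgreSQL the same pattern applies, though a driver such as psycopg uses `%s` placeholders instead of `?`.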

Why This Matters

The generator shows that the pipeline works with continuous, unpredictable data rather than a static CSV loaded once. New customers appear, products change prices, and orders progress through statuses. This is what a real pipeline must handle.