StreamCart Airflow Project
Step-by-step walkthrough: Build an e-commerce data pipeline with Apache Airflow
What You'll Build
Two production-ready Airflow DAGs for StreamCart, an e-commerce platform:
1. Web Events Ingestion
Ingest clickstream data every 15 minutes from a web analytics API
2. Daily Product Summary
Aggregate product metrics daily for business reporting
Prerequisites
- ✓ Docker Desktop installed and running
- ✓ Basic Python knowledge (variables, functions, loops)
- ✓ Terminal/command line familiarity
- ✓ 8GB+ RAM available
Set Up Airflow Environment
~20 min
We'll use Docker Compose to run Airflow locally with all necessary components.
ℹ️ This downloads the official Airflow docker-compose configuration
Expected Output:
airflow-init_1 exited with code 0
⏱️ This takes 2-3 minutes to start all services
Open browser to: http://localhost:8080
Login: airflow / airflow
What You Should See:
Airflow UI with "DAGs" tab showing example DAGs
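If the UI does not load, it can help to check the webserver's health endpoint from a script. The following is a minimal sketch, assuming an Airflow 2.x webserver that serves a JSON status document at /health with "metadatabase" and "scheduler" entries; other versions may expose it at a different path. The function names are hypothetical and not part of the tutorial.

```python
import json
import urllib.request


def components_healthy(payload, required=("metadatabase", "scheduler")):
    """Return True when every required component reports status 'healthy'."""
    return all(
        payload.get(name, {}).get("status") == "healthy" for name in required
    )


def check_airflow(base_url="http://localhost:8080", timeout=5):
    """Fetch the health document from a running webserver and evaluate it.

    Assumes the /health path used by Airflow 2.x; raises URLError if the
    webserver is not reachable yet.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return components_healthy(json.load(resp))
```

Calling check_airflow() once the containers are up should return True when both the metadata database and the scheduler are healthy. A connection error usually means the services are still starting.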
⚠️ Common Issues
- Port 8080 already in use → Change port in docker-compose.yaml
- "Permission denied" → Run: chmod -R 777 logs dags plugins
- Container fails to start → Ensure Docker has 4GB+ RAM allocated
Build Web Events Ingestion DAG
~30 min
Create a DAG that ingests clickstream data every 15 minutes.
💡 What This DAG Does
- Fetches web events from API (simulated)
- Validates events have required fields
- Loads events to data warehouse
- Cleans up temporary files
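The four steps above could be wired together roughly as follows. This is a sketch rather than the tutorial's own DAG file: it assumes Airflow 2.4 or later (for the `schedule` argument), and the required field names, sample events, and task ids are illustrative assumptions. Only the DAG id "ingest_web_events" comes from the walkthrough. The Airflow import is guarded so the plain functions can be tested without Airflow installed.

```python
from datetime import datetime, timedelta

# Assumed event schema; adjust to whatever the real API returns.
REQUIRED_FIELDS = ("event_id", "user_id", "event_type", "timestamp")


def fetch_events():
    """Simulated API call: returns a small, hard-coded batch of events."""
    return [
        {"event_id": "e1", "user_id": "u1", "event_type": "page_view",
         "timestamp": "2024-01-01T00:00:00Z"},
        {"event_id": "e2", "user_id": "u2", "event_type": "add_to_cart"},  # no timestamp
    ]


def validate_events(events):
    """Keep only events that have a non-empty value for every required field."""
    valid = [
        e for e in events
        if all(e.get(field) not in (None, "") for field in REQUIRED_FIELDS)
    ]
    print(f"{len(valid)} of {len(events)} events passed validation")
    return valid


def load_events(events):
    """Placeholder for the warehouse load; only reports the row count."""
    print(f"Loading {len(events)} events to the warehouse")
    return len(events)


def cleanup():
    """Placeholder for removing temporary files."""
    print("Removing temporary files")


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # lets the functions above run without Airflow
    DAG = None

if DAG is not None:
    def _validate(ti):
        # The return value of fetch_events is pulled from XCom.
        return validate_events(ti.xcom_pull(task_ids="fetch_events"))

    def _load(ti):
        return load_events(ti.xcom_pull(task_ids="validate_events"))

    with DAG(
        dag_id="ingest_web_events",
        schedule="*/15 * * * *",          # every 15 minutes
        start_date=datetime(2024, 1, 1),
        catchup=False,                     # do not backfill past intervals
        default_args={"retries": 2, "retry_delay": timedelta(minutes=1)},
    ) as dag:
        fetch = PythonOperator(task_id="fetch_events", python_callable=fetch_events)
        validate = PythonOperator(task_id="validate_events", python_callable=_validate)
        load = PythonOperator(task_id="load_events", python_callable=_load)
        clean = PythonOperator(task_id="cleanup", python_callable=cleanup)

        fetch >> validate >> load >> clean
```

Keeping the business logic in plain functions and the XCom handling in thin wrappers means the validation rule can be unit tested on its own. The print statements end up in each task's log in the Airflow UI.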
Build Daily Product Summary DAG
~25 min
Create a DAG that aggregates product metrics daily at 2 AM.
✨ Key Features
- Uses XCom to pass data between tasks
- Scheduled for daily execution at 2 AM
- Email alerts on failure
- Automatic retry with backoff
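A minimal sketch of how these features might fit together, assuming Airflow 2.4 or later. The order records, metric names, task ids, owner, and email address are made-up placeholders. Only the DAG id "daily_product_summary" and the 2 AM schedule come from the walkthrough. Email alerts also need SMTP settings in the Airflow configuration, which this sketch does not cover.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DEFAULT_ARGS = {
    "owner": "data-team",                      # hypothetical owner
    "email": ["alerts@example.com"],           # placeholder address
    "email_on_failure": True,                  # send an email when a task fails
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,         # wait longer after each retry
    "max_retry_delay": timedelta(minutes=30),  # cap on the backoff
}


def extract_orders():
    """Simulated source query: returns a few hard-coded order lines."""
    return [
        {"product_id": "p1", "quantity": 2, "unit_price": 10.0},
        {"product_id": "p2", "quantity": 1, "unit_price": 25.0},
        {"product_id": "p1", "quantity": 1, "unit_price": 10.0},
    ]


def summarize_products(orders):
    """Aggregate units sold and revenue per product."""
    totals = defaultdict(lambda: {"units_sold": 0, "revenue": 0.0})
    for order in orders:
        row = totals[order["product_id"]]
        row["units_sold"] += order["quantity"]
        row["revenue"] += order["quantity"] * order["unit_price"]
    return dict(totals)  # plain dict so it serializes cleanly for XCom


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # lets the functions above run without Airflow
    DAG = None

if DAG is not None:
    def _summarize(ti):
        # Pull the upstream return value from XCom; the summary returned
        # here is pushed to XCom for the next task.
        return summarize_products(ti.xcom_pull(task_ids="extract_orders"))

    def _report(ti):
        summary = ti.xcom_pull(task_ids="summarize_products")
        for product_id, metrics in sorted(summary.items()):
            print(product_id, metrics)

    with DAG(
        dag_id="daily_product_summary",
        schedule="0 2 * * *",              # daily at 2 AM
        start_date=datetime(2024, 1, 1),
        catchup=False,
        default_args=DEFAULT_ARGS,
    ) as dag:
        extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
        summarize = PythonOperator(task_id="summarize_products", python_callable=_summarize)
        report = PythonOperator(task_id="report", python_callable=_report)

        extract >> summarize >> report
```

XCom is intended for small payloads such as this summary. For larger datasets, a common choice is to write the data to storage and pass only its location between tasks.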
Test Your DAGs
~15 min
Go to http://localhost:8080
What You Should See:
Two new DAGs: "ingest_web_events" and "daily_product_summary"
Click the toggle switch next to each DAG name to enable them
- Click on "ingest_web_events" DAG name
- Click the "Play" button (▶) in top right
- Click "Trigger DAG"
- Watch the Graph view as tasks execute
Success!
All tasks should turn green (success) within 1-2 minutes
- Click on any green task box
- Click "Log" button
- Review the output to see what happened
💡 Logs show print statements from your Python functions
Repeat the trigger and log-review steps above for the "daily_product_summary" DAG
Congratulations! 🎉
You've successfully built and tested two production-ready Airflow DAGs!
What's Next?
Part 2: Advanced Scheduling
Learn cron expressions, dynamic DAGs, and branching logic
Part 3: Monitoring & Alerting
Set up SLAs, email alerts, and Slack notifications
Part 4: Production Deployment
Deploy to Kubernetes with proper secrets management