Marketing API Ingestion Service
Build a fault-tolerant Python service that paginates through third-party ad network APIs, handles rate limits, and loads data into the warehouse.
Ingestion Pipeline
What You'll Build
Foundation — REST API Ingestion with Retry Logic
3–4 hours
Build a production-grade REST API client with exponential backoff, cursor/offset/keyset pagination, token-bucket rate limiting, idempotent request handling, and structured error responses. The backbone of every data ingestion layer.
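As a rough illustration of how retry logic and cursor pagination fit together, here is a minimal sketch using only the standard library. The names RetryableError, call_with_retry, and paginate are hypothetical, and so is the page shape (a dict with "records" and "next_cursor" keys); a real ad network API will differ. The backoff uses the "full jitter" variant, which is one common choice among several.

```python
import random
import time
from typing import Callable, Iterator, Optional


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 or 5xx response from the ad network API."""


def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 5) -> list[float]:
    """Upper bound for each retry delay: base * 2**n, capped at `cap` seconds."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


def call_with_retry(fn: Callable[[], dict], *, attempts: int = 5, base: float = 0.5,
                    cap: float = 30.0, sleep=time.sleep, rng=random.random) -> dict:
    """Call fn(), retrying on RetryableError with full-jitter exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for n, bound in enumerate(backoff_delays(base, cap, attempts)):
        try:
            return fn()
        except RetryableError:
            if n == attempts - 1:
                raise  # retries exhausted; surface the error to the caller
            sleep(rng() * bound)  # full jitter: uniform in [0, bound)
    raise AssertionError("unreachable")


def paginate(fetch_page: Callable[[Optional[str]], dict], **retry_kwargs) -> Iterator[dict]:
    """Yield records from a cursor-paginated endpoint until no next cursor is returned.

    Assumes each page looks like {"records": [...], "next_cursor": str | None}.
    """
    cursor: Optional[str] = None
    while True:
        page = call_with_retry(lambda: fetch_page(cursor), **retry_kwargs)
        yield from page["records"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Passing sleep and rng as parameters keeps the retry loop testable without real waiting; in production the defaults (time.sleep and random.random) apply.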
Events — Webhook Receiver & S3 Batch Drops
3–4 hours
Handle push-based data: build a webhook receiver with HMAC signature verification and event deduplication, plus an S3 batch ingestion pipeline with manifest tracking, file format detection, and dead letter queues for failed events.
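The webhook half of this phase can be sketched as follows. This is an illustration under stated assumptions, not a specific provider's scheme: it assumes the sender signs the raw request body with HMAC-SHA256 and sends the hex digest, and that each event carries a unique "id" field. WebhookReceiver and its in-memory seen_ids set and dead_letters list are hypothetical; a real service would back them with durable storage.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison, so timing does not leak how much of the digest matched."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


class WebhookReceiver:
    """In-memory sketch: verify, parse, deduplicate, or dead-letter each delivery."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.seen_ids: set[str] = set()      # hypothetical dedup store
        self.dead_letters: list[dict] = []   # hypothetical dead letter queue

    def handle(self, body: bytes, signature: str) -> str:
        # Verify against the raw bytes before parsing anything.
        if not verify_signature(self.secret, body, signature):
            return "rejected"
        try:
            event = json.loads(body)
            event_id = str(event["id"])
        except (ValueError, KeyError, TypeError):
            # Authentic but unusable payloads are parked for inspection, not dropped.
            self.dead_letters.append({"body": body, "reason": "malformed"})
            return "dead_lettered"
        if event_id in self.seen_ids:
            return "duplicate"  # redelivery of an event already processed
        self.seen_ids.add(event_id)
        return "accepted"
```

Deduplicating on the event ID rather than on arrival order means duplicate and out-of-order deliveries are handled the same way.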
Connectors — SaaS Export & Schema Validation
3–4 hours
Build connectors for third-party SaaS platforms (Salesforce, Stripe, HubSpot patterns), implement JSON Schema validation on every record, handle schema evolution gracefully, and enforce data contracts between source and consumer.
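To make the idea of a data contract concrete, here is a small plain-Python stand-in for a JSON Schema validator. It is not the JSON Schema specification or any library's API, only a sketch of the policy: missing required fields and type changes count as breaking errors, while newly added fields are reported as non-breaking warnings. The CONTRACT fields (campaign_id, spend, clicks, region) are invented for the example.

```python
# JSON Schema type names mapped to the Python types json.loads produces.
JSON_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}

# Hypothetical contract: field name -> (JSON type name, required?)
CONTRACT = {
    "campaign_id": ("string", True),
    "spend": ("number", True),
    "clicks": ("integer", True),
    "region": ("string", False),
}


def check_record(record: dict, contract: dict = CONTRACT) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one record checked against the contract."""
    errors: list[str] = []
    warnings: list[str] = []
    for field, (type_name, required) in contract.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        matches = isinstance(value, JSON_TYPES[type_name])
        # bool is a subclass of int in Python, so exclude it from numeric types.
        if isinstance(value, bool) and type_name != "boolean":
            matches = False
        if not matches:
            errors.append(
                f"type change on {field}: expected {type_name}, got {type(value).__name__}"
            )
    # Fields the contract does not know about are additive, so only warn.
    for field in sorted(record.keys() - contract.keys()):
        warnings.append(f"new field (non-breaking): {field}")
    return errors, warnings
```

A pipeline built this way would typically route records with errors to a quarantine table and log warnings as schema drift, so additive changes from the source do not halt ingestion.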
Production — Unified Pipeline & Monitoring
3–4 hours
Orchestrate all four source types in a unified Airflow DAG, implement source-specific scheduling strategies, build freshness SLA monitoring with alerting on schema drift and volume anomalies, and deploy with blue/green rollback safety.
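The freshness and volume checks can be expressed as a plain function that an Airflow task (or any scheduler) could call. The sketch below deliberately uses no Airflow APIs. SourceStatus, check_source, the per-source SLA values, and the 50% volume tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceStatus:
    name: str
    last_loaded_at: datetime  # timezone-aware time of the last successful load
    rows_loaded: int          # rows in the most recent load
    expected_rows: int        # e.g. a trailing average; 0 disables the volume check


# Hypothetical per-source freshness SLAs.
SLAS = {
    "rest_api": timedelta(hours=1),
    "webhooks": timedelta(minutes=15),
    "s3_batch": timedelta(hours=24),
    "saas_export": timedelta(hours=6),
}


def check_source(status: SourceStatus, now: datetime, sla: timedelta,
                 volume_tolerance: float = 0.5) -> list[str]:
    """Return alert messages for one source; an empty list means healthy."""
    alerts: list[str] = []
    lag = now - status.last_loaded_at
    if lag > sla:
        alerts.append(f"{status.name}: stale, {lag - sla} past freshness SLA")
    if status.expected_rows > 0:
        deviation = abs(status.rows_loaded - status.expected_rows) / status.expected_rows
        if deviation > volume_tolerance:
            alerts.append(
                f"{status.name}: volume anomaly "
                f"({status.rows_loaded} rows vs ~{status.expected_rows} expected)"
            )
    return alerts


def check_all(statuses: list[SourceStatus], now: datetime) -> list[str]:
    """Check every source against its SLA from SLAS."""
    alerts: list[str] = []
    for status in statuses:
        alerts.extend(check_source(status, now, SLAS[status.name]))
    return alerts
```

Taking now as a parameter, for example datetime.now(timezone.utc) in production, keeps the check deterministic in tests.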
Skills This Project Reinforces
API Integration
M1: REST Fundamentals, M2: Auth & Secrets
Error Handling
Retry Logic, Dead Letter Queues, Circuit Breakers
Schema Validation
JSON Schema, Evolution, Contract Testing
Data Quality
Deduplication, Idempotency, Validation
Orchestration
Airflow DAGs, Scheduling Strategies, Dependencies
Observability
SLA Monitoring, Alerting, Drift Detection
Sample Datasets
Paginated REST API simulator with rate limiting, cursor pagination, and intermittent 429/500 errors
Simulated webhook payloads with HMAC signatures, duplicate events, and out-of-order delivery
S3 batch files in mixed formats (CSV, JSON, Parquet) with manifest files and checksums
Salesforce-style bulk API export with schema evolution scenarios (added fields, type changes)
Resume-Ready Bullets
Built a multi-source data ingestion layer handling REST APIs, webhooks, S3 batch drops, and SaaS exports, with exponential backoff retry logic that reduced failed extractions by 95%
Implemented an idempotent ingestion pipeline with request fingerprinting and Bloom filter deduplication, achieving exactly-once delivery across 4 heterogeneous data sources processing 500K+ daily records
Designed schema validation framework using JSON Schema with automated drift detection, preventing 100% of breaking schema changes from reaching the data warehouse
Orchestrated unified Airflow DAG with source-specific scheduling (cron, event-driven, file-arrival), freshness SLA monitoring, and blue/green deployment with automated rollback
Ready to Build Your Ingestion Layer?
Every data engineering role starts with getting data in. This project gives you the production patterns that separate "it works on my laptop" from "it runs in prod."