8 articles

Engineering Insights

Hard-won lessons, system design teardowns, and architecture guides from the frontlines of data engineering.

Architecture · Featured · 8 min read

Why We Migrated from Airflow to Kubernetes-Native Orchestration

After three years running Airflow at scale, we hit the ceiling: resource contention, slow DAG parsing, and a scheduler that became a single point of failure. Here's the full story of how we rebuilt orchestration on Argo Workflows — what we gained, what we lost, and the lessons you can steal.

Jordan Reeves · Mar 18, 2026


Streaming · Mar 14, 2026

The Reality of Streaming: When to Actually Use Apache Flink

Flink is extraordinarily powerful and extraordinarily complex. Most teams reach for it before they need it — and pay the operational price. Here's a framework for deciding when stream processing is justified, and when a micro-batch approach will serve you just as well.

6 min read
Core DE · Mar 10, 2026

Implementing Data Contracts in a dbt Monorepo

Data contracts promise to fix the silent breakage problem — upstream schema changes that quietly corrupt downstream reports. But the tooling is still maturing. Here's what actually worked for us: a lightweight contract layer built on dbt meta, JSON Schema, and a pre-merge CI check.

7 min read
AI/MLOps · Mar 6, 2026

Building a Cost-Efficient RAG Pipeline with Pinecone

RAG pipelines can get expensive fast: embedding costs, vector storage costs, LLM inference costs. After running our internal knowledge base RAG in production for six months, here's what we optimized to cut costs by 70% without sacrificing retrieval quality.

9 min read
AI/MLOps · Mar 21, 2026

Stop Building Toy Pipelines: The 2026 Data Engineering Portfolio Guide (with Code)

Hiring managers see hundreds of GitHub repos with a Jupyter notebook and a README promising an "end-to-end pipeline." They pass on all of them. Here's how to build a PySpark + dbt + Airflow portfolio project that demonstrates production-grade thinking — with full code.

14 min read
Platform · Feb 28, 2026

Snowflake vs BigQuery in 2026: A Cost Analysis

We ran the same workload — 8 TB scanned daily, mixed ad-hoc and scheduled queries, three BI tools — on both Snowflake and BigQuery for 30 days. The winner depends heavily on your query patterns. Here are the numbers.

10 min read
Architecture · Apr 15, 2026

How to Design a Modern Data + AI System: Control, Data, and Decision Planes

Most data teams build AI features by bolting an LLM onto their existing pipeline and calling it done. The systems that actually work in production separate concerns into three explicit planes: a Control Plane that orchestrates, a Data Plane that models, and a Decision Plane that decides. Here's the architecture.

12 min read
AI/MLOps · Apr 15, 2026

Build an AI Tactical Analyst with NFL Data, dbt, and RAG: A Full Data Engineering Pipeline

While everyone else argues about the halftime show, we're building the scouting report. This tutorial walks through a full production-style data + AI pipeline on real NFL play-by-play data: ingestion via nfl_data_py, dbt staging → marts with EPA and CPOE, quality gates, rolling features, and a RAG-powered tactical analyst that answers 'go for it or punt?' — all code included.

15 min read