
Engineering Insights

Hard-won lessons, system design teardowns, and architecture guides from the frontlines of data engineering.

Architecture · Featured · 8 min read

Why We Migrated from Airflow to Kubernetes-Native Orchestration

After three years running Airflow at scale, we hit the ceiling: resource contention, slow DAG parsing, and a scheduler that became a single point of failure. Here's the full story of how we rebuilt orchestration on Argo Workflows — what we gained, what we lost, and the lessons you can steal.

Jordan Reeves · Mar 18, 2026

Streaming · Mar 14, 2026

The Reality of Streaming: When to Actually Use Apache Flink

Flink is extraordinarily powerful and extraordinarily complex. Most teams reach for it before they need it — and pay the operational price. Here's a framework for deciding when stream processing is justified, and when a micro-batch approach will serve you just as well.

6 min read
Core DE · Mar 10, 2026

Implementing Data Contracts in a dbt Monorepo

Data contracts promise to fix the silent breakage problem — upstream schema changes that quietly corrupt downstream reports. But the tooling is still maturing. Here's what actually worked for us: a lightweight contract layer built on dbt meta, JSON Schema, and a pre-merge CI check.

7 min read
AI/MLOps · Mar 6, 2026

Building a Cost-Efficient RAG Pipeline with Pinecone

RAG pipelines can get expensive fast: embedding costs, vector storage costs, LLM inference costs. After running our internal knowledge base RAG in production for six months, here's what we optimized to cut costs by 70% without sacrificing retrieval quality.

9 min read
AI/MLOps · Mar 21, 2026

Stop Building Toy Pipelines: The 2026 Data Engineering Portfolio Guide (with Code)

Hiring managers see hundreds of GitHub repos with a Jupyter notebook and a README promising an "end-to-end pipeline." They pass on all of them. Here's how to build a PySpark + dbt + Airflow portfolio project that demonstrates production-grade thinking — with full code.

14 min read
Platform · Feb 28, 2026

Snowflake vs BigQuery in 2026: A Cost Analysis

We ran the same workload — 8 TB scanned daily, mixed ad-hoc and scheduled queries, three BI tools — on both Snowflake and BigQuery for 30 days. The winner depends heavily on your query patterns. Here are the numbers.

10 min read