Engineering essays · not corporate blog

Engineering Insights

Hard-won lessons, system design teardowns, and architecture guides from the frontlines of data engineering.

Articles: 10
Authors: 6

1 featured + 9 more posts

Why We Migrated from Airflow to Kubernetes-Native Orchestration

After three years running Airflow at scale, we hit the ceiling: resource contention, slow DAG parsing, and a scheduler that became a single point of failure. Here's the full story of how we rebuilt orchestration on Argo Workflows — what we gained, what we lost, and the lessons you can steal.

Jordan Reeves

Mar 18, 2026 · 8 min read

Platform · Jun 18, 2026

Data Observability in 2026: Monte Carlo vs Great Expectations vs Soda — A Data Engineer's Honest Comparison

A practitioner comparison of data observability tools in 2026 — Monte Carlo, Great Expectations, Soda, and Elementary — covering real integration code, production trade-offs, cost analysis, and when to build vs buy.

18 min

Platform · Jun 02, 2026

CI/CD for Data Pipelines: The Production Guide

Data pipelines are stateful, so software CI/CD patterns don't transfer cleanly. This guide covers schema validation, data quality gates, environment parity, and the 7-stage lifecycle that prevents 3 AM failures, with working dbt + Airflow + Great Expectations code.

15 min

Architecture · Apr 15, 2026

How to Design a Modern Data + AI System: Control, Data, and Decision Planes

Most data teams build AI features by bolting an LLM onto their existing pipeline and calling it done. The systems that actually work in production separate three planes. Here's why, with a reference architecture.

12 min

AI/MLOps · Apr 15, 2026

Build an AI Tactical Analyst with NFL Data, dbt, and RAG: A Full Data Engineering Pipeline

While everyone else argues about the halftime show, we're building the scouting report. This tutorial walks through a full end-to-end data + AI pipeline using NFL play-by-play data, dbt, and a RAG-powered analyst.

15 min

Streaming · Mar 14, 2026

The Reality of Streaming: When to Actually Use Apache Flink

Flink is extraordinarily powerful and extraordinarily complex. Most teams reach for it before they need it — and pay the operational price for years. A decision framework drawn from 14 production deployments.

6 min

Core DE · Mar 10, 2026

Implementing Data Contracts in a dbt Monorepo

Data contracts promise to fix the silent breakage problem — upstream schema changes that quietly corrupt downstream models. We implemented contracts across 87 dbt models and 12 source systems. Here's what worked, what didn't, and the YAML you can copy.

7 min

AI/MLOps · Mar 04, 2026

Building a Cost-Efficient RAG Pipeline with Pinecone

RAG pipelines can get expensive fast: embedding costs, vector storage costs, LLM inference costs. After running our production RAG system for 9 months, we cut costs by 73% with three architectural changes. None of them involve switching models.

9 min

Platform · Feb 28, 2026

Snowflake vs BigQuery in 2026: A Cost Analysis

We ran the same workload — 8 TB scanned daily, mixed ad-hoc and scheduled queries, three BI tools — on both Snowflake and BigQuery for 30 days. The winner depends on something most comparisons get wrong.

10 min

AI/MLOps · Feb 21, 2026

Stop Building Toy Pipelines: The 2026 Data Engineering Portfolio Guide

Hiring managers see hundreds of GitHub repos with a Jupyter notebook and a README promising an "end-to-end pipeline." None of them get interviews. Here's what does, with three project specs you can actually ship this quarter.

14 min
