Build an AI Tactical Analyst with NFL Data, dbt, and RAG: A Full Data Engineering Pipeline

While everyone else argues about the halftime show, we're building the scouting report. This tutorial walks through a full production-style data + AI pipeline on real NFL play-by-play data: ingestion via nfl_data_py, dbt staging → marts with EPA and CPOE, quality gates, rolling features, and a RAG-powered tactical analyst that answers "go for it or punt?" — all code included.

TL;DR

Ingestion — raw play-by-play via nfl_data_py into DuckDB
Modeling — dbt staging → intermediate → marts with EPA and CPOE
Quality — dbt tests as the defensive line against dirty data
Features — rolling QB metrics as context for the AI
Decision — a RAG-powered tactical analyst that answers "go for it or punt?"

Why NFL Data? (And Why It Beats E-Commerce)

E-commerce tutorials are played out. Everyone has built a churn model on the same synthetic dataset. NFL play-by-play is a better teacher because it has every property of a real production dataset: messy raw text, high dimensionality (300+ columns per play), derived metrics that matter (EPA, CPOE), and a clear decision layer (coaches actually use this data to decide whether to go for it on 4th-and-2).

If you can build this pipeline, you can build product analytics at any company.

Architecture Overview

NFL AI Tactical Analyst: full pipeline architecture

Ingestion (control plane): nfl_data_py → DuckDB; ~50k plays, 300+ columns
dbt run (data plane): staging → intermediate → marts; EPA, CPOE, per-QB aggregates
dbt test (data plane): uniqueness · not_null · range checks; quality gate, pipeline halts on failure
Feature engineering (data plane): rolling 5-game EPA + CPOE, computed with window functions in mart_qb_rolling_form
RAG tactical analyst (decision plane): dbt marts → LLM context → decision, e.g. "Go for it. 68% success at this down/distance."

Every step is idempotent, and a dbt test failure halts the pipeline before the LLM sees dirty data.

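This is the three-plane model applied to sports data: ingestion is the control plane; dbt models, tests, and features form the data plane; and the LLM sits in the decision plane. Every layer has one job. Execution authority stays with the orchestrator, whatever runs the steps in order and stops on failure. It's the same pattern you'd use to ship a real product analytics platform, just with football instead of funnels.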
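A minimal sketch of such an orchestrator, assuming each step is a command-line call. The script paths come from this post; PIPELINE and run_pipeline are hypothetical names. It leans on dbt build, which runs models and their tests in dependency order and exits non-zero if any of them fails, so a single exit-code check doubles as the quality gate.

```python
import subprocess
import sys

# Hypothetical pipeline definition: each step is a command, run in order.
# `dbt build` runs models and tests in DAG order and exits non-zero on failure.
PIPELINE = [
    ["python", "ingest_nfl.py"],
    ["dbt", "build"],
    ["python", "ai/tactical_analyst.py"],
]


def run_pipeline(steps: list[list[str]]) -> list[str]:
    """Run each step in order and stop at the first failure.

    Returns the steps that completed successfully, as command strings.
    """
    completed: list[str] = []
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # The orchestrator, not the LLM, decides whether later steps run.
            print(f"Step failed (exit {result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(" ".join(cmd))
    return completed
```

Calling run_pipeline(PIPELINE) from cron or a CI job gives you halt-on-failure behavior; a production scheduler would add retries and logging on top of the same idea.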

Step 1 — Ingestion: nfl_data_py + DuckDB

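Python: ingest_nfl.py

import duckdb
import nfl_data_py as nfl

# Pull 2024 play-by-play (nflverse data includes postseason games),
# then keep regular-season plays only
pbp = nfl.import_pbp_data([2024])
pbp = pbp[pbp["season_type"] == "REG"]

con = duckdb.connect("nfl_analytics.duckdb")
con.execute("CREATE SCHEMA IF NOT EXISTS raw")
con.register("pbp_df", pbp)
# CREATE OR REPLACE keeps the load idempotent: re-running never duplicates rows
con.execute("CREATE OR REPLACE TABLE raw.plays AS SELECT * FROM pbp_df")
print(f"Loaded {con.execute('SELECT COUNT(*) FROM raw.plays').fetchone()[0]:,} plays")
# Close the connection; DuckDB allows only one writing process per database file
con.close()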

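One call returns roughly 50,000 plays with 300+ columns, the wide, messy shape you'd get from most sports data APIs. Before dbt can read the table, declare it as a source (schema raw, table plays) in a sources YAML file under models/staging; that declaration is what source('raw', 'plays') in the staging model resolves against.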

Step 2 — The Staging Layer

SQL: models/staging/stg_plays.sql

with source as (
    select * from {{ source('raw', 'plays') }}
),
renamed as (
    select
        play_id, game_id,
        posteam              as possession_team,
        defteam              as defense_team,
        qtr                  as quarter,
        down, ydstogo        as yards_to_go,
        yardline_100, play_type,
        passer_player_id     as qb_id,
        passer_player_name   as qb_name,
        passing_yards, pass_attempt, complete_pass,
        epa, cpoe, success, week
    from source
    where play_type in ('pass', 'run')
      and down is not null
)
select * from renamed

Step 3 — Marts: EPA and CPOE

This is where we compute the signals that actually separate elite QBs from replacement-level ones.

SQL: models/marts/mart_qb_performance.sql

with game_stats as (
    select * from {{ ref('int_qb_game_stats') }}
),
season_agg as (
    select
        qb_id, qb_name,
        count(distinct game_id)                                   as games_played,
        sum(total_attempts)                                       as attempts,
        sum(completions) * 1.0 / sum(total_attempts)              as completion_pct,
        -- weight per-game averages by attempts so high-volume games count proportionally
        sum(avg_epa * total_attempts) / sum(total_attempts)       as season_epa_per_play,
        sum(avg_cpoe * total_attempts) / sum(total_attempts)      as season_cpoe,
        sum(success_rate * total_attempts) / sum(total_attempts)  as success_rate
    from game_stats
    group by 1, 2
    having sum(total_attempts) >= 100
)
select
    *,
    case
        when season_epa_per_play >= 0.20 then 'Elite'
        when season_epa_per_play >= 0.10 then 'Above Average'
        when season_epa_per_play >= 0.00 then 'Average'
        else 'Below Replacement'
    end as tier
from season_agg
order by season_epa_per_play desc

EPA per play above 0.20 is elite territory: Mahomes, Allen, Burrow in a good year. CPOE above +3 percentage points means the QB completes passes at a rate meaningfully higher than expected given the difficulty of each throw. These are the same derived metrics front offices pay for.
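Why weighting matters when game-level rows roll up to a season: a plain average of per-game averages counts a 10-attempt relief appearance the same as a 50-attempt start, so it isn't a per-play rate. This toy calculation uses hypothetical numbers, and tier is a hypothetical helper mirroring the CASE thresholds above. In SQL, the weighted version is sum(avg_epa * total_attempts) / sum(total_attempts).

```python
# Hypothetical per-game rows for one QB: (attempts, avg_epa).
games = [(50, 0.35), (10, -0.60), (40, 0.30)]


def unweighted_epa(games: list[tuple[int, float]]) -> float:
    # Mean of per-game means: every game counts the same, whatever its volume.
    return sum(epa for _, epa in games) / len(games)


def weighted_epa(games: list[tuple[int, float]]) -> float:
    # Attempt-weighted mean: equals total EPA / total attempts
    # when each avg_epa is a per-attempt average.
    total_attempts = sum(att for att, _ in games)
    return sum(att * epa for att, epa in games) / total_attempts


def tier(epa_per_play: float) -> str:
    # Same thresholds as the CASE expression in mart_qb_performance.
    if epa_per_play >= 0.20:
        return "Elite"
    if epa_per_play >= 0.10:
        return "Above Average"
    if epa_per_play >= 0.00:
        return "Average"
    return "Below Replacement"


print(f"unweighted: {unweighted_epa(games):.3f} ({tier(unweighted_epa(games))})")
print(f"weighted:   {weighted_epa(games):.3f} ({tier(weighted_epa(games))})")
```

The same QB lands in the Average tier under the unweighted mean and in Elite once each game counts in proportion to its attempts, which is why the aggregation choice matters more than it looks.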

Step 4 — Tests: The Defensive Line

YAML: models/marts/_marts.yml

version: 2
models:
  - name: mart_qb_performance
    description: "Season-level QB performance with EPA/CPOE tiers"
    columns:
      - name: qb_id
        tests:
          - unique
          - not_null
      - name: attempts
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
      - name: completion_pct
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1

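If a test fails, the pipeline stops: dbt build skips every model downstream of a failing test and exits with a non-zero code, so the AI never sees bad data. The accepted_range tests come from the dbt_utils package, which has to be listed in packages.yml and installed with dbt deps. This is the difference between a demo and a system you can trust on a live broadcast.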

Step 5 — Rolling Form Features

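SQL: models/marts/mart_qb_rolling_form.sql

-- ROWS (not RANGE) counts the QB's last five games played, so bye weeks don't shrink the window.
-- Ordering by week assumes a single season; with multiple seasons, order by season, week.
select
    qb_id, qb_name, game_id, week, avg_epa,
    avg(avg_epa) over (
        partition by qb_id
        order by week
        rows between 4 preceding and current row
    ) as rolling_5_game_epa,
    avg(avg_cpoe) over (
        partition by qb_id
        order by week
        rows between 4 preceding and current row
    ) as rolling_5_game_cpoe
from {{ ref('int_qb_game_stats') }}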

Rolling features are a staple of sports models because recent form can diverge sharply from a season average. They're also exactly the kind of windowed aggregation that shows up in real product analytics pipelines: same SQL pattern, different domain.

Step 6 — The Decision Layer: RAG Tactical Analyst

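Python: ai/tactical_analyst.py

import os

import duckdb
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# The analyst only reads dbt output, so open the database read-only.
# Assumes dbt writes its models to a schema named `analytics`.
con = duckdb.connect("nfl_analytics.duckdb", read_only=True)

def get_qb_context(qb_name: str) -> str:
    # Latest rolling-form row for this QB
    rolling = con.execute("""
        select week, rolling_5_game_epa, rolling_5_game_cpoe
        from analytics.mart_qb_rolling_form
        where qb_name = ? order by week desc limit 1
    """, [qb_name]).fetchone()

    # Season aggregates and tier
    season = con.execute("""
        select season_epa_per_play, season_cpoe, tier
        from analytics.mart_qb_performance
        where qb_name = ?
    """, [qb_name]).fetchone()

    if not rolling or not season:
        return f"No data found for {qb_name}."

    return (
        f"{qb_name}, tier: {season[2]}. "
        f"Season EPA/play: {season[0]:.3f}, CPOE: {season[1]:.2f}. "
        f"Last 5-game rolling EPA: {rolling[1]:.3f}, "
        f"rolling CPOE: {rolling[2]:.2f}."
    )

def get_situation_context(down: int, yards_to_go: int) -> str:
    # Retrieve the historical rate instead of letting the LLM invent one.
    # `success` is nflfastR's EPA > 0 flag, not "converted the first down".
    plays, success_rate = con.execute("""
        select count(*), avg(success)
        from analytics.stg_plays
        where down = ? and yards_to_go = ?
    """, [down, yards_to_go]).fetchone()
    if not plays:
        return f"No historical pass/run plays at down {down}, {yards_to_go} to go."
    return (
        f"Historical success rate (EPA > 0) on pass/run plays at "
        f"down {down}, {yards_to_go} to go: {success_rate:.0%} over {plays:,} plays."
    )

def tactical_call(qb_name: str, down: int, yards_to_go: int, yardline: int) -> str:
    context = get_qb_context(qb_name)
    situation = get_situation_context(down, yards_to_go)
    prompt = f"""
You are an NFL tactical analyst with access to real play-by-play data.

Context from dbt models:
{context}
{situation}

Situation:
- Down: {down}
- Yards to go: {yards_to_go}
- Yardline (distance to opponent end zone): {yardline}

Should the team go for it or punt/kick?
Justify using EPA, rolling form, and the historical success rate above.
Only cite numbers that appear in the context; if one is missing, say so.
"""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    # 4th and 2 at the opponent's 40, with Mahomes hot
    print(tactical_call("P.Mahomes", down=4, yards_to_go=2, yardline=40))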

Sample output: Recommendation: go for it. Mahomes is in Elite tier with a season EPA/play of 0.24 and a rolling 5-game EPA of 0.31 — he's trending up. On 4th-and-2 inside opponent territory, historical conversion rate is ~68%. EPA math favors going for it by a wide margin versus a 57-yard field goal attempt.

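You didn't train a model. You gave the LLM clean, trustworthy context pulled from a properly layered dbt project. That's RAG done right, with one rule to enforce: the LLM should only cite numbers that are in its context. A conversion rate you never retrieved is the model's guess, not your data.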

Why This Matters for Your Career

Pattern in this project	Same pattern at product companies
Ingestion (nfl_data_py → DuckDB)	Event stream → data warehouse
EPA / CPOE per QB	LTV / session quality / conversion propensity
Rolling 5-game window	Rolling 30-day engagement feature
dbt contract + not_null test	SLA on feature availability for ML model
RAG over mart_qb_performance	RAG over customer feature table

Ingestion → dbt layers → tests → features → decision layer is the canonical modern data stack. If you can explain this pipeline end-to-end in an interview, you're interviewing at the AI Data Engineer level.

Ready to ship it?

Build your own RAG pipeline

This post walked through a complete data + AI system end to end. The patterns carry over to most domains: layered dbt marts are a common backbone for analytics pipelines, rolling features usually capture current form better than single-point statistics, and RAG over clean context is often a cheaper, more auditable starting point than fine-tuning.

Dive into a runnable project and ship your own tactical analyst. You'll practice the same layering and quality gates that data-forward companies rely on.
