Skip to content
Engineering Insights
AI/MLOps

Build an AI Tactical Analyst with NFL Data, dbt, and RAG: A Full Data Engineering Pipeline

AI-DEApr 15, 202615 min read

Want to build this yourself?

This architecture is covered in our hands-on projects. Build it in the AI-DE sandbox.

Explore Projects

TL;DR

While everyone else is arguing about the halftime show, we're building the scouting report. This tutorial walks through a full production-style data + AI pipeline on real NFL play-by-play data:

  • Ingestion — raw play-by-play via nfl_data_py into DuckDB
  • Modeling — dbt staging → intermediate → marts with EPA and CPOE
  • Quality — dbt tests as the defensive line against dirty data
  • Features — rolling QB metrics as context for the AI
  • Decision — a RAG-powered tactical analyst that answers "go for it or punt?"
  • All code is embedded below and packaged as a runnable project on AI-DE.

    Key takeaways:

  • Real analytics is built on layered dbt models, not one giant SQL file
  • EPA and CPOE are the hard-mode metrics that separate good QBs from great ones — and they're the same kind of derived features every serious analytics org ships to production
  • Data quality tests are non-negotiable: an AI that reads dirty data is worse than no AI at all
  • RAG turns dbt marts into a 10-year-experience sideline coach — without fine-tuning a model
  • Why NFL Data? (And Why It's a Better Teacher Than E-Commerce)

    E-commerce tutorials are exhausted. Everyone has built a churn model on the same synthetic dataset.

    NFL play-by-play is a better teacher because it has every property of a real production dataset:

  • Messy raw text — plays arrive as unstructured commentary strings
  • High dimensionality — 100+ columns per play, hundreds of games per season
  • Derived metrics that matter — EPA and CPOE are real, hard-to-compute signals
  • A clear decision layer — coaches actually use this data to decide whether to go for it on 4th-and-2
  • If you can build this pipeline, you can build product analytics at any company.

    Architecture Overview

    NFL AI Tactical Analyst — Full Pipeline Architecture

    1
    IngestionControl Plane

    nfl_data_py → DuckDB

    ~50k plays, 300+ columns

    2
    dbt runData Plane

    staging → intermediate → marts

    EPA, CPOE, per-QB aggregates

    3
    dbt testData Plane

    uniqueness · not_null · range checks

    quality gate — pipeline halts on failure

    4
    Feature EngineeringData Plane

    rolling 5-game EPA + CPOE

    window functions over mart_qb_rolling_form

    5
    RAG Tactical AnalystDecision Plane

    dbt marts → LLM context → decision

    "Go for it. 68% success at this down/distance."

    Every step is idempotent — dbt test failure halts the pipeline before the LLM sees dirty data

    This is the three-plane model applied to sports data. Every layer has one job. Execution authority stays with the orchestrator. This is the same pattern you'd use to ship a real product analytics platform — we're just using football instead of funnels.

    Step 1 — Project Setup and Ingestion

    We use nfl_data_py to pull real play-by-play data and DuckDB as a zero-setup local warehouse.

    python
    import os
    import subprocess
    from pathlib import Path
    
    PROJECT_NAME = "nfl_ai_bootcamp"
    
    def create_project(base_path):
        """Scaffold the dbt + DuckDB project."""
        base = Path(base_path) / PROJECT_NAME
        base.mkdir(parents=True, exist_ok=True)
    
        (base / "requirements.txt").write_text(
            "dbt-duckdb>=1.7\n"
            "nfl_data_py>=0.3\n"
            "pandas\nnumpy\nopenai>=1.0\n"
        )
    
        subprocess.run(
            ["dbt", "init", PROJECT_NAME, "--skip-profile-setup"],
            cwd=base, check=True,
        )
        return base

    Load NFL play-by-play into DuckDB

    python
    import duckdb
    import nfl_data_py as nfl
    
    # Pull 2024 regular season play-by-play
    pbp = nfl.import_pbp_data([2024])
    
    con = duckdb.connect("nfl_analytics.duckdb")
    con.execute("CREATE SCHEMA IF NOT EXISTS raw")
    con.register("pbp_df", pbp)
    con.execute("CREATE OR REPLACE TABLE raw.plays AS SELECT * FROM pbp_df")
    print(f"Loaded {con.execute('SELECT COUNT(*) FROM raw.plays').fetchone()[0]:,} plays")

    That single call gives you ~50,000 plays with 300+ columns — this is the messy, real-world data you'd get from any sports API.

    Configure dbt profile

    Save this as `~/.dbt/profiles.yml`. Use an absolute path for DuckDB — otherwise dbt and your Python scripts will end up pointing at different database files depending on which directory you run them from.

    yaml
    # ~/.dbt/profiles.yml
    nfl_ai_bootcamp:
      target: dev
      outputs:
        dev:
          type: duckdb
          path: /absolute/path/to/nfl_ai_bootcamp/nfl_analytics.duckdb
          schema: analytics
          threads: 4

    Install dbt packages

    We use dbt_utils for range checks in the test layer:

    yaml
    # packages.yml
    packages:
      - package: dbt-labs/dbt_utils
        version: [">=1.1.0", "<2.0.0"]
    bash
    dbt deps

    Skip this step and the quality-gate tests in Step 5 will fail with `'dbt_utils' is undefined`.

    Step 2 — The dbt Staging Layer

    Raw play-by-play data is the NFL equivalent of commentary text. The staging layer's job is to clean it and expose typed, named columns. Nothing fancy — just discipline.

    sql
    -- models/staging/stg_plays.sql
    
    with source as (
    
        select * from {{ source('raw', 'plays') }}
    
    ),
    
    renamed as (
    
        select
            play_id,
            game_id,
            posteam                 as possession_team,
            defteam                 as defense_team,
            qtr                     as quarter,
            down,
            ydstogo                 as yards_to_go,
            yardline_100,
            play_type,
            passer_player_id        as qb_id,
            passer_player_name      as qb_name,
            passing_yards,
            pass_attempt,
            complete_pass,
            epa,
            cpoe,
            success,
            week
        from source
        where play_type in ('pass', 'run')
          and down is not null
    
    )
    
    select * from renamed
    yaml
    # models/staging/_sources.yml
    version: 2
    
    sources:
      - name: raw
        schema: raw
        tables:
          - name: plays
            description: "Raw NFL play-by-play from nfl_data_py"

    Step 3 — Intermediate Layer: Per-QB Aggregation

    Intermediate models are where business logic lives. Here we aggregate plays to QB-game level.

    sql
    -- models/intermediate/int_qb_game_stats.sql
    
    with plays as (
    
        select * from {{ ref('stg_plays') }}
        where pass_attempt = 1
          and qb_id is not null
    
    ),
    
    qb_game as (
    
        select
            qb_id,
            qb_name,
            game_id,
            week,
            possession_team,
            count(*)                    as total_attempts,
            sum(complete_pass)          as completions,
            sum(passing_yards)          as passing_yards,
            avg(epa)                    as avg_epa,
            avg(cpoe)                   as avg_cpoe,
            avg(success::int)           as success_rate
        from plays
        group by 1, 2, 3, 4, 5
    
    )
    
    select * from qb_game

    Step 4 — Marts Layer: EPA and CPOE (The Metrics That Matter)

    This is where we compute the signals that actually separate elite QBs from replacements.

    sql
    -- models/marts/mart_qb_performance.sql
    
    with game_stats as (
    
        select * from {{ ref('int_qb_game_stats') }}
    
    ),
    
    season_agg as (
    
        select
            qb_id,
            qb_name,
            count(distinct game_id)                         as games_played,
            sum(total_attempts)                             as attempts,
            sum(completions) * 1.0 / sum(total_attempts)    as completion_pct,
            avg(avg_epa)                                    as season_epa_per_play,
            avg(avg_cpoe)                                   as season_cpoe,
            avg(success_rate)                               as success_rate
        from game_stats
        group by 1, 2
        having sum(total_attempts) >= 100
    
    )
    
    select
        *,
        case
            when season_epa_per_play >= 0.20 then 'Elite'
            when season_epa_per_play >= 0.10 then 'Above Average'
            when season_epa_per_play >= 0.00 then 'Average'
            else 'Below Replacement'
        end as tier
    from season_agg
    order by season_epa_per_play desc

    EPA per play above 0.20 is elite territory — Mahomes, Allen, Burrow in a good year. CPOE above +3% means the QB completes passes at a rate meaningfully higher than league average given the difficulty of each throw. These are the same derived metrics front offices pay for.

    Step 5 — The Defensive Line: dbt Tests

    Before any of this reaches the AI, it has to pass quality gates. This is the CI/CD layer.

    yaml
    # models/marts/_marts.yml
    version: 2
    
    models:
      - name: mart_qb_performance
        description: "Season-level QB performance with EPA/CPOE tiers"
        columns:
          - name: qb_id
            tests:
              - unique
              - not_null
          - name: attempts
            tests:
              - not_null
              - dbt_utils.accepted_range:
                  min_value: 0
          - name: completion_pct
            tests:
              - dbt_utils.accepted_range:
                  min_value: 0
                  max_value: 1
    bash
    dbt deps   # one-time, installs dbt_utils
    dbt run
    dbt test

    If a test fails, the pipeline stops. The AI never sees bad data. This is the difference between a demo and a system you can trust on a live broadcast.

    Step 6 — Feature Engineering: Rolling Form

    One-season averages aren't enough — the AI needs to know if a QB is hot or cold right now. A 5-game rolling EPA is the standard.

    sql
    -- models/marts/mart_qb_rolling_form.sql
    
    with game_stats as (
    
        select * from {{ ref('int_qb_game_stats') }}
    
    )
    
    select
        qb_id,
        qb_name,
        game_id,
        week,
        avg_epa,
        avg(avg_epa) over (
            partition by qb_id
            order by week
            rows between 4 preceding and current row
        ) as rolling_5_game_epa,
        avg(avg_cpoe) over (
            partition by qb_id
            order by week
            rows between 4 preceding and current row
        ) as rolling_5_game_cpoe
    from game_stats

    Rolling features are what every serious sports model uses. They're also exactly the kind of windowed aggregation that shows up in every real product analytics pipeline — same SQL pattern, different domain.

    Step 7 — The Decision Layer: RAG Tactical Analyst

    Now the payoff. The AI reads the dbt marts as context and answers tactical questions in natural language.

    python
    # ai/tactical_analyst.py
    import os
    import duckdb
    from openai import OpenAI
    
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    con = duckdb.connect("nfl_analytics.duckdb")
    
    def get_qb_context(qb_name: str) -> str:
        """Pull rolling form + season tier from dbt marts."""
        rolling = con.execute("""
            select week, rolling_5_game_epa, rolling_5_game_cpoe
            from analytics.mart_qb_rolling_form
            where qb_name = ?
            order by week desc
            limit 1
        """, [qb_name]).fetchone()
    
        season = con.execute("""
            select season_epa_per_play, season_cpoe, tier
            from analytics.mart_qb_performance
            where qb_name = ?
        """, [qb_name]).fetchone()
    
        if not rolling or not season:
            return f"No data found for {qb_name}."
    
        return (
            f"{qb_name} — Tier: {season[2]}. "
            f"Season EPA/play: {season[0]:.3f}, CPOE: {season[1]:.2f}. "
            f"Last 5-game rolling EPA: {rolling[1]:.3f}, "
            f"rolling CPOE: {rolling[2]:.2f}."
        )
    
    def tactical_call(qb_name: str, down: int, yards_to_go: int, yardline: int) -> str:
        context = get_qb_context(qb_name)
    
        prompt = f"""
    You are an NFL tactical analyst with access to real play-by-play data.
    
    Context from dbt marts:
    {context}
    
    Situation:
    - Down: {down}
    - Yards to go: {yards_to_go}
    - Yardline (distance to opponent end zone): {yardline}
    
    Should the team go for it or punt/kick?
    Justify using EPA and rolling form. Give a probability of success
    based on historical patterns at this down/distance.
    """
    
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        return resp.choices[0].message.content
    
    # 4th and 2, opponent 40, with Mahomes hot
    print(tactical_call("P.Mahomes", down=4, yards_to_go=2, yardline=40))

    Sample output:

    > Recommendation: Go for it. Mahomes is in Elite tier with a season EPA/play of 0.24 and a rolling 5-game EPA of 0.31 — he's trending up, not down. On 4th-and-2 inside opponent territory, historical conversion rate is ~68%. EPA math favors going for it by a wide margin versus a 57-yard field goal attempt. Punt only if you're protecting a 2-score lead with under 4 minutes left.

    You didn't train a model. You gave the LLM a clean, trustworthy context pulled from a properly layered dbt project. That's RAG done right.

    Step 8 — Orchestration: The Full Pipeline Runner

    Tie it together with a single entry point. In production this would be an Airflow DAG; for local dev, a Python runner is fine.

    python
    # run_pipeline.py
    import subprocess
    import sys
    
    def run_step(name, cmd):
        print(f"\n▶ {name}")
        r = subprocess.run(cmd, shell=True)
        if r.returncode != 0:
            print(f"✖ {name} failed")
            sys.exit(1)
        print(f"✓ {name} ok")
    
    if __name__ == "__main__":
        run_step("0. Install dbt packages", "dbt deps")
        run_step("1. Ingest NFL data",      "python ingest_nfl.py")
        run_step("2. dbt run",              "dbt run")
        run_step("3. dbt test",             "dbt test")
        run_step("4. Tactical analyst",     "python ai/tactical_analyst.py")

    Every step is idempotent. Every failure halts the pipeline. This is the bare-minimum production discipline — and it's what separates an engineer from someone who "knows the tools."

    Common Mistakes (What NOT to Do)

  • Computing EPA in a Jupyter notebook. It works once, then you can't reproduce it. Put every derived metric in a dbt model with tests.
  • Skipping the staging layer. "It's just renames" — until a column changes upstream and your entire marts layer breaks silently.
  • Feeding raw stats to the LLM. Without tier labels and rolling context, the model has no grounding and will hallucinate confidently. RAG only works when your marts are good.
  • No dbt test in the pipeline. If a negative passing yardage value slips through, the AI will recommend benching your best QB.
  • Letting the LLM trigger anything. The tactical analyst is a recommendation layer. If you let it write back into your warehouse, you've broken the control/decision separation.
  • Why This Matters for Your Career

    This is the same architecture used in real product analytics orgs — just dressed up in football pads.

    Pattern in this projectSame pattern at product companies
    Ingestion (nfl_data_py → DuckDB)Event stream → data warehouse
    EPA / CPOE per QBLTV / session quality / conversion propensity
    Rolling 5-game windowRolling 30-day engagement feature
    dbt contract + not_null testSLA on feature availability for ML model
    RAG over mart_qb_performanceRAG over customer feature table

    Ingestion → dbt layers → tests → features → decision layer is the canonical modern data stack. If you can explain this pipeline end-to-end in an interview, you're interviewing at the AI Data Engineer level.

    What to Build Next

    Natural extensions on top of this foundation:

  • Airflow orchestration — replace the Python runner with a real control plane
  • Observability — lineage and data freshness monitoring across the pipeline
  • Model serving — expose the tactical analyst as an API for a Streamlit dashboard
  • Multi-sport — same pattern, different raw source (NBA, soccer, esports)
  • These are exactly the progressions in the AI Data Engineer path on AI-DE.

    Get the NFL AI Tactical Analyst project on AI-DE

    Ready to go deeper?

    Explore our full curriculum — hands-on skill toolkits built for production data engineering.

    Press Cmd+K to open