ai-de.net/Projects/P26 · Cloud Cost Optimization on Snowflake

Last updated 2026-05-22By AI-DE Engineering Team

PRO · part 01 free previewPlatform trackP26

Cut a
$300K
Snowflake bill to $120K

FinScale Analytics got a $300K invoice and leadership wants answers. You mine ACCOUNT_USAGE, right-size warehouses, compact Iceberg files, and ship a dbt-powered FinOps dashboard with Z-score anomaly alerts — all runnable offline against pre-built sample data.

Timeline

8-10 hours

Difficulty

Intermediate-Senior

Stack

Snowflake · dbt · Iceberg · Streamlit

See PRO benefits

The cost-engineering question every senior+ DE round asks at Instacart, DoorDash, Shopify and any company whose Snowflake bill has its own line on the P&L.

By the end you will have shipped

A SQL forensics suite over ACCOUNT_USAGE (top-20 expensive queries, idle warehouses, storage hot tables)
Right-sized warehouse DDL with auto-suspend, multi-cluster scaling, and resource-monitor guardrails
An Iceberg compaction pipeline (binpack + 256MB target) and a 4-tier time-travel retention policy
A dbt project (~9 models) computing daily cost attribution + monthly team chargeback
Z-score anomaly detection running as a daily Snowflake Task with email alerts
A Streamlit (or Grafana) cost dashboard + a FinOps weekly-review SQL + responsibility matrix

PREREQComfortable with SQL (CTEs, window functions) and dbt fundamentals. No prior Snowflake admin experience required — the starter kit ships seeded ACCOUNT_USAGE CSVs you can run locally before plugging in a free trial.

snowflake.account_usage.* · FinOps spine

anomaly task armed

Account_usage

Staging

Marts

Alerts + dash

QUERY_HISTORY

WAREHOUSE_METERING

TABLE_STORAGE_METRICS

STORAGE_USAGE

TEAM_MAPPING

5 views · 45-min latency

stg_query_historyincremental · per-query $

stg_warehouse_metering

stg_team_mapping

compaction.py · 256MB

dbt incremental

fct_daily_costs7d rolling avg · scan eff

fct_team_chargeback

fct_cost_anomalies

dim_cost_dashboard

~9 dbt marts

snowflake_task · 8AMz-score > 3 → CRITICAL

SYSTEM$SEND_EMAIL

streamlit · KPI cards

weekly-review.sql

governance spine

# Partition-prune rewrite

gb_scanned: 100 → 2 · cluster on event_date

partitions_scanned: 4096 → 31

credit-burn 50× cheaper · same query

→ top-20 list ranks the rewrites in $ order

● Z-score anomaly auto-page

daily_cost vs 30d baseline · σ window

z > 3 = CRITICAL_SPIKE · z > 2 = WARN

Snowflake Task @ 8AM EST → email

→ SYSTEM$SEND_EMAIL · responsibility matrix

$300K → $120K

annual savings

ACCOUNT_USAGE views

dbt models

Why cost engineering, why now

Cost is the line item every CFO is watching in 2026.

FinOps roles grew 35% YoY. Snowflake customers routinely waste 40-60% of compute on unoptimized queries. The engineer who can prove $180K in documented savings has outsized career leverage — and a portfolio bullet that recruiters actually read.

Immediate, measurable ROI

Unlike feature work that takes months to attribute, cost optimization shows up in the next invoice. $180K saved is $180K saved.

The promotion bullet

“I cut our Snowflake bill from $300K to $120K” is the kind of resume line that gets you on staff-level interview loops. Cost wins are unambiguous.

Senior+ system-design hook

Interviewers at Instacart, DoorDash, and Shopify ask cost questions because they pay the bill. This project gives you firsthand data to answer them.

Patterns transfer

Tutorial ships Snowflake code, but the playbook (forensics → right-size → tier storage → govern) maps cleanly to BigQuery slot reservations and Redshift WLM.

Curriculum · 4 parts · 8-10 hours

Part 01 is free. The rest unlocks with PRO.

Try the first 2 hours — connect to ACCOUNT_USAGE (or run against the seeded CSVs), find the 20 most expensive queries, and walk away with a dollar-figure hit list. If it clicks, upgrade to unlock compute right-sizing, storage compaction, and the FinOps dashboard.

P26 · 8-10 hours · 4 parts

Free preview PRO required

Part 01 is free — no card required. See the forensics SQL before paying.

M01

✓Audit — query forensics + cost baseline

Connect to ACCOUNT_USAGE (or seeded CSVs). Profile the top-20 most expensive queries with bytes_scanned + partition_scan_pct + spill metrics. Audit warehouse utilization, break down storage by table, and build a prioritized hit list with dollar impact.

2h8 lessonsFREE PREVIEW

Start →

M02

⊘Optimize — warehouse tuning + compute right-sizing

Right-size warehouses with a sizing rubric on bytes_scanned + spill_pct. Configure auto-suspend per workload, multi-cluster scaling for ETL, role-based isolation, and resource monitors as guardrails. Eliminate Spark shuffle spills with repartition tuning.

2-3h10 lessonsPRO TIER

Unlock with PRO →

M03

⊘Prevent — storage compaction + lifecycle

Diagnose small files in TABLE_STORAGE_METRICS. Run an Iceberg bin-pack compaction pipeline targeting 256MB files. Tier time-travel retention into 4 buckets. Cluster the 5-10 hottest tables. Reclaim TBs from zombie clones.

2-3h10 lessonsPRO TIER

Unlock with PRO →

M04

⊘Systemize — dbt + dashboard + Z-score alerts

Ship a dbt project: stg_query_history, stg_warehouse_metering, fct_daily_costs, fct_team_chargeback, fct_cost_anomalies (Z-score). Wire a daily Snowflake Task with SYSTEM$SEND_EMAIL alerts. Build a Streamlit (or Grafana) dashboard. Export the weekly-review SQL + responsibility matrix.

2h9 lessonsPRO TIER

Unlock with PRO →

3 parts locked · Unlock all PRO content for $29/mo

Upgrade to PRO →

Backed by curriculum

Cost Optimization for Data Engineers

10 modules·14 hours·FinOps·ACCOUNT_USAGE·warehouse tuning·chargeback·anomaly detection

Open curriculum→

This curriculum is the foundation for the project — not a sales add-on. PRO subscribers get full access to every module.

The build, in 3 phases

Three sprints. Three checkpoints. One $180K saved.

Each phase ends with a tagged commit, a runnable artifact, and a validated dollar drop.

01~2h

Audit — find the money

ACCOUNT_USAGE forensics complete. Top-20 expensive queries ranked by credits, warehouse utilization audited, storage cost broken down by table. Baseline locked at $300K.

✓Top-20 expensive-query report (cloud_credits, gb_scanned, partition_scan_pct, spill)
✓Warehouse utilization audit + idle-warehouse list
✓Storage cost breakdown + cost_multiplier (active + time-travel + fail-safe) per table

02~5h

Cut — compute, then storage

Compute right-sized with auto-suspend + multi-cluster + resource monitors ($300K → $180K). Storage compacted with binpack + clustering + 4-tier retention ($180K → $120K).

✓ALTER WAREHOUSE DDL + auto-suspend policies + resource monitors
✓Iceberg rewrite_data_files compaction (256MB target) + clustering keys
✓4-tier time-travel retention + zombie-clone cleanup

03~2h

Govern — keep it down

dbt project shipped (staging + marts). Z-score anomaly detection running as a daily Snowflake Task with email alerts. Cost dashboard live. Weekly-review SQL exported.

✓dbt project: ~9 models with is_incremental() and team chargeback
✓Z-score anomaly Task + SYSTEM$SEND_EMAIL alerts
✓Streamlit (or Grafana) dashboard + weekly-review SQL + responsibility matrix

Project setup · 10 minutes

Runs offline. Real Snowflake credentials optional.

The starter kit ships seeded ACCOUNT_USAGE CSVs and a self-running acceptance gate so you can build the dbt project before you ever plug in a trial account.

What lives in the repo

Everything you need to run all 4 parts on your laptop, plus the seed scripts that simulate Snowflake's ACCOUNT_USAGE views with realistic shapes and rowcounts.

seeds/ — 5 ACCOUNT_USAGE CSVs (query_history, warehouse_metering, table_storage, storage_usage, team_mapping)
models/ — dbt staging + marts (~9 models): chargeback, anomalies, daily costs
compaction/ — PySpark Iceberg compaction + clustering scripts
tasks/ — Snowflake Task + SYSTEM$SEND_EMAIL alert wiring
dashboards/ — Streamlit app + Grafana panel queries
scripts/validate_seed.py — offline acceptance gate (no Snowflake required)

Download · Starter Kit

Cloud Cost Optimization Starter Kit

Pre-extracted dbt models, Spark/Iceberg compaction, Streamlit dashboard, plus 5 seeded ACCOUNT_USAGE CSVs and the offline acceptance gate. Skip the boilerplate, start on Part 01.

~790 KB · 76 files · 5 ACCOUNT_USAGE seed CSVs · PRO required

~/projects/cloud-cost-optimization — zsh

1. Install Python deps

$ python3 -m venv .venv && source .venv/bin/activate

$ pip install -r requirements.txt

2. Validate offline (no Snowflake required)

$ python scripts/validate_seed.py

3. Configure dbt profile (only if running on real Snowflake)

$ cp profiles.yml.template profiles.yml

$ export SNOWFLAKE_ACCOUNT=... SNOWFLAKE_USER=... SNOWFLAKE_PASSWORD=...

4. Run dbt against your trial account

$ dbt deps --profiles-dir .

$ dbt seed --profiles-dir . --target dev

$ dbt run --profiles-dir . --target dev --select +fct_daily_costs

10K

QUERY_HISTORY rows

WAREHOUSE_METERING

TABLE_STORAGE_METRICS

45-min

ACCOUNT_USAGE latency

Production hardening

The same playbook — but built for the real account.

The tutorial ships against seeded ACCOUNT_USAGE CSVs and a single Snowflake account. Production requires orchestration, RBAC, hard budget caps, and team distribution. Here’s the diff.

Tutorial versionWhat you ship in 8-10 hrs

Cost data source

Seeded ACCOUNT_USAGE CSVs, runnable offline

Pipeline orchestration

Daily dbt run by hand or cron

Anomaly response

Z-score email alert via Snowflake Task

Dashboard access

Streamlit/Grafana with shared creds

Team chargeback

fct_team_chargeback fact table

Account scope

Single Snowflake account

Production versionModules 02-04 + ops

✓

Cost data source

Real ACCOUNT_USAGE with IMPORTED PRIVILEGES grant + 45-min staleness tolerance

✓

Pipeline orchestration

Airflow / Dagster / Snowflake Task DAG with retries, lineage, and freshness SLAs

✓

Anomaly response

Hard budget caps via RESOURCE_MONITOR ... SUSPEND + auto-pause routing

✓

Dashboard access

SSO + RBAC + per-team row-level filters on chargeback marts

✓

Team chargeback

Scheduled per-team email/Slack rollups + invoice templates per cost center

✓

Account scope

Multi-account / multi-region rollup via ORGANIZATION_USAGE views (Enterprise+)

PRO benefit · code review

Real review from senior engineers who’ve cut bills.

Submit your repo, get line-by-line feedback within 48 hours from someone who has actually owned a 7-figure Snowflake bill. The kind of review that's quietly worth thousands of dollars in time-to-staff.

4 reviews / month

Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — Snowflake/FinOps for this project. Async, comments inline, average turnaround 31 hours.

31h

avg turnaround

9.2/10

helpfulness

94%

return next month

2 office hours / month

Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard your warehouse-sizing rubric, mock a system-design interview on cost. Group sessions also available.

30 min

per session

2 / mo

included

+ group

unlimited

What PRO unlocks

One subscription. 15+ projects, all curriculum, code review.

PRO is built for senior+ engineers who want production-grade builds and feedback loops — not more tutorials.

What you getFREEPROEXPERT

Projects

Production-grade builds

15+

Curriculum modules

All 7 tracks

Phase 1 only

All

All + bonus

Code review credits

Senior engineer review

4 / month

Unlimited

Career path access

5 paths × full plans

1 path

All 5

All 5 + 1:1

Certificate

Verifiable on LinkedIn

—

Yes

Yes + portfolio review

Community

Discord + office hours

Read-only

Full + 2/mo

Full + 4/mo

$29/mo

billed monthly · cancel anytime

or annual

$249/yr save 28%

Upgrade to PRO →

Who this is for

Pick this if you own the cloud bill, not just write the queries.

Senior data engineers

You've shipped dbt + warehouse pipelines and now your CFO is asking why the bill is up 40% YoY. This gives you the forensics + governance to answer.

Platform / FinOps engineers

You're building cost visibility for 10+ teams. You need a chargeback fact table, anomaly detection, and a budget-cap pattern that actually pauses spend.

Analytics engineers running dbt

You're already in dbt. Adding cost models alongside your business marts turns you into the person finance calls before they call the head of data.

Engineering managers / tech leads

You sign off on warehouse spend. This is the project that lets you reason about right-sizing trade-offs and approve a multi-cluster reservation without guessing.

Related curriculum

Going deeper? Three tracks back this project.

Cost optimization is the spine. These three curriculums let you go deeper on the fluency this project assumes — query performance, cloud cost models, and warehouse internals.

FAQ

Quick answers.

How is part 01 different from a free Snowflake cost guide?+

Part 01 (free) gives you the exact ACCOUNT_USAGE forensics SQL — top-20 expensive queries with bytes_scanned + partition_scan_pct + spill — runnable offline against seeded data. Most free guides hand you the credit-pricing math; this one hands you the queries.

What's NOT in scope?+

No Airflow / Dagster orchestration code (Snowflake Tasks handle scheduling). No multi-cloud — Snowflake-only, though the patterns transfer to BigQuery slot reservations and Redshift WLM. No real-time streaming costs (Snowpipe + Snowpark Container Services not covered). No auto-remediation — the alerts notify; humans decide whether to pause.

How does this differ from DataGuard Observability?+

DataGuard watches data quality + freshness. This project watches the bill. Same primitives (dbt, ACCOUNT_USAGE, anomaly detection), different fact tables. They pair well — DataGuard tells you when something broke, this tells you when something got expensive.

Do I need a Snowflake account?+

No — the starter kit ships 5 seeded ACCOUNT_USAGE CSVs (10K query rows, 5K metering rows, 1K storage rows) you can run dbt models against offline. A Snowflake free trial unlocks Part 01's forensics SQL on real data.

Is the $300K → $120K (60%) savings claim realistic?+

It's the validated number for the seeded FinScale dataset — each part's optimization compounds against the dbt models that compute the running total. Real-world savings vary; most teams who do this work end up at 30-50% in 6 months. The 60% figure is the project's narrative ceiling, not a guaranteed outcome.

What does PRO actually unlock for $29/mo?+

Parts 02-04 (compute right-sizing, storage compaction, FinOps dashboard + governance), the starter kit ZIP, the seeded sample data, plus 4 code-review credits + 2 office-hours sessions per month and access to all 15+ PRO projects + curriculum across all 7 tracks.

Will this help with senior+ data engineering interviews?+

Yes. System-design rounds for senior+/staff DE roles increasingly include cost questions. After this you can talk through warehouse right-sizing rubrics, explain why partition pruning matters at the credit level, and reason about chargeback vs. showback trade-offs with concrete numbers.

Related projects

Paired with this project

P09·PAID·ai

AI cost optimization (CostGuard)

Cost-aware LLM platform: token tracking, dual-tier cache, 4-strategy router, three-tier budget governance. 5 ADRs + cost-model CSV bundled.

Explore project →

Ready to cut the bill?

Start with Part 01 — free, no card. About 2 hours. By the end you'll have the top-20 expensive queries ranked, the warehouse utilization audited, and a hit list with a dollar figure next to every line.

See PRO benefits

P26 · Cloud Cost Optimization · PRO · part 01 freeUpgrade to PRO →

Cut a$300KSnowflake bill to $120K

Cost is the line item every CFO is watching in 2026.

Immediate, measurable ROI

The promotion bullet

Senior+ system-design hook

Patterns transfer

Part 01 is free. The rest unlocks with PRO.

Cost Optimization for Data Engineers

Three sprints. Three checkpoints. One $180K saved.

Runs offline. Real Snowflake credentials optional.

What lives in the repo

Cloud Cost Optimization Starter Kit

The same playbook — but built for the real account.

Real review from senior engineers who’ve cut bills.

4 reviews / month

2 office hours / month

One subscription. 15+ projects, all curriculum, code review.

Pick this if you own the cloud bill, not just write the queries.

Senior data engineers

Platform / FinOps engineers

Analytics engineers running dbt

Engineering managers / tech leads

Going deeper? Three tracks back this project.

SQL Mastery for Data Engineers

Cloud Data Infrastructure & FinOps

Warehouse Internals & Query Performance

Quick answers.

Paired with this project

Ready to cut the bill?

Cut a
$300K
Snowflake bill to $120K