Skip to content
ai-de.net/Projects/P26 · Cloud Cost Optimization on Snowflake
PRO · part 01 free previewPlatform trackP26

Cut a
$300K
Snowflake bill to $120K

FinScale Analytics got a $300K invoice and leadership wants answers. You mine ACCOUNT_USAGE, right-size warehouses, compact Iceberg files, and ship a dbt-powered FinOps dashboard with Z-score anomaly alerts — all runnable offline against pre-built sample data.

Timeline
8-10 hours
Difficulty
Intermediate-Senior
Stack
Snowflake · dbt · Iceberg · Streamlit

The cost-engineering question every senior+ DE round asks at Instacart, DoorDash, Shopify and any company whose Snowflake bill has its own line on the P&L.

By the end you will have shipped
  • A SQL forensics suite over ACCOUNT_USAGE (top-20 expensive queries, idle warehouses, storage hot tables)
  • Right-sized warehouse DDL with auto-suspend, multi-cluster scaling, and resource-monitor guardrails
  • An Iceberg compaction pipeline (binpack + 256MB target) and a 4-tier time-travel retention policy
  • A dbt project (~9 models) computing daily cost attribution + monthly team chargeback
  • Z-score anomaly detection running as a daily Snowflake Task with email alerts
  • A Streamlit (or Grafana) cost dashboard + a FinOps weekly-review SQL + responsibility matrix
PREREQComfortable with SQL (CTEs, window functions) and dbt fundamentals. No prior Snowflake admin experience required — the starter kit ships seeded ACCOUNT_USAGE CSVs you can run locally before plugging in a free trial.
snowflake.account_usage.* · FinOps spine
anomaly task armed
Account_usage
Staging
Marts
Alerts + dash
QUERY_HISTORY
WAREHOUSE_METERING
TABLE_STORAGE_METRICS
STORAGE_USAGE
TEAM_MAPPING
5 views · 45-min latency
stg_query_historyincremental · per-query $
stg_warehouse_metering
stg_team_mapping
compaction.py · 256MB
dbt incremental
fct_daily_costs7d rolling avg · scan eff
fct_team_chargeback
fct_cost_anomalies
dim_cost_dashboard
~9 dbt marts
snowflake_task · 8AMz-score > 3 → CRITICAL
SYSTEM$SEND_EMAIL
streamlit · KPI cards
weekly-review.sql
governance spine
# Partition-prune rewrite
gb_scanned: 100 → 2 · cluster on event_date
partitions_scanned: 4096 → 31
credit-burn 50× cheaper · same query
→ top-20 list ranks the rewrites in $ order
● Z-score anomaly auto-page
daily_cost vs 30d baseline · σ window
z > 3 = CRITICAL_SPIKE · z > 2 = WARN
Snowflake Task @ 8AM EST → email
→ SYSTEM$SEND_EMAIL · responsibility matrix
$300K → $120K
annual savings
5
ACCOUNT_USAGE views
~9
dbt models
Why cost engineering, why now

Cost is the line item every CFO is watching in 2026.

FinOps roles grew 35% YoY. Snowflake customers routinely waste 40-60% of compute on unoptimized queries. The engineer who can prove $180K in documented savings has outsized career leverage — and a portfolio bullet that recruiters actually read.

Immediate, measurable ROI

Unlike feature work that takes months to attribute, cost optimization shows up in the next invoice. $180K saved is $180K saved.

The promotion bullet

“I cut our Snowflake bill from $300K to $120K” is the kind of resume line that gets you on staff-level interview loops. Cost wins are unambiguous.

Senior+ system-design hook

Interviewers at Instacart, DoorDash, and Shopify ask cost questions because they pay the bill. This project gives you firsthand data to answer them.

Patterns transfer

Tutorial ships Snowflake code, but the playbook (forensics → right-size → tier storage → govern) maps cleanly to BigQuery slot reservations and Redshift WLM.

Curriculum · 4 parts · 8-10 hours

Part 01 is free. The rest unlocks with PRO.

Try the first 2 hours — connect to ACCOUNT_USAGE (or run against the seeded CSVs), find the 20 most expensive queries, and walk away with a dollar-figure hit list. If it clicks, upgrade to unlock compute right-sizing, storage compaction, and the FinOps dashboard.

P26 · 8-10 hours · 4 parts
Free preview PRO required
Part 01 is free — no card required. See the forensics SQL before paying.
M01
Audit — query forensics + cost baseline
Connect to ACCOUNT_USAGE (or seeded CSVs). Profile the top-20 most expensive queries with bytes_scanned + partition_scan_pct + spill metrics. Audit warehouse utilization, break down storage by table, and build a prioritized hit list with dollar impact.
2h8 lessonsFREE PREVIEW
Start →
M02
Optimize — warehouse tuning + compute right-sizing
Right-size warehouses with a sizing rubric on bytes_scanned + spill_pct. Configure auto-suspend per workload, multi-cluster scaling for ETL, role-based isolation, and resource monitors as guardrails. Eliminate Spark shuffle spills with repartition tuning.
2-3h10 lessonsPRO TIER
Unlock with PRO →
M03
Prevent — storage compaction + lifecycle
Diagnose small files in TABLE_STORAGE_METRICS. Run an Iceberg bin-pack compaction pipeline targeting 256MB files. Tier time-travel retention into 4 buckets. Cluster the 5-10 hottest tables. Reclaim TBs from zombie clones.
2-3h10 lessonsPRO TIER
Unlock with PRO →
M04
Systemize — dbt + dashboard + Z-score alerts
Ship a dbt project: stg_query_history, stg_warehouse_metering, fct_daily_costs, fct_team_chargeback, fct_cost_anomalies (Z-score). Wire a daily Snowflake Task with SYSTEM$SEND_EMAIL alerts. Build a Streamlit (or Grafana) dashboard. Export the weekly-review SQL + responsibility matrix.
2h9 lessonsPRO TIER
Unlock with PRO →
3 parts locked · Unlock all PRO content for $29/mo
Upgrade to PRO →
Backed by curriculum

Cost Optimization for Data Engineers

10 modules·14 hours·FinOps·ACCOUNT_USAGE·warehouse tuning·chargeback·anomaly detection
Open curriculum

This curriculum is the foundation for the project — not a sales add-on. PRO subscribers get full access to every module.

The build, in 3 phases

Three sprints. Three checkpoints. One $180K saved.

Each phase ends with a tagged commit, a runnable artifact, and a validated dollar drop.

01~2h
Audit — find the money

ACCOUNT_USAGE forensics complete. Top-20 expensive queries ranked by credits, warehouse utilization audited, storage cost broken down by table. Baseline locked at $300K.

  • Top-20 expensive-query report (cloud_credits, gb_scanned, partition_scan_pct, spill)
  • Warehouse utilization audit + idle-warehouse list
  • Storage cost breakdown + cost_multiplier (active + time-travel + fail-safe) per table
02~5h
Cut — compute, then storage

Compute right-sized with auto-suspend + multi-cluster + resource monitors ($300K → $180K). Storage compacted with binpack + clustering + 4-tier retention ($180K → $120K).

  • ALTER WAREHOUSE DDL + auto-suspend policies + resource monitors
  • Iceberg rewrite_data_files compaction (256MB target) + clustering keys
  • 4-tier time-travel retention + zombie-clone cleanup
03~2h
Govern — keep it down

dbt project shipped (staging + marts). Z-score anomaly detection running as a daily Snowflake Task with email alerts. Cost dashboard live. Weekly-review SQL exported.

  • dbt project: ~9 models with is_incremental() and team chargeback
  • Z-score anomaly Task + SYSTEM$SEND_EMAIL alerts
  • Streamlit (or Grafana) dashboard + weekly-review SQL + responsibility matrix
Project setup · 10 minutes

Runs offline. Real Snowflake credentials optional.

The starter kit ships seeded ACCOUNT_USAGE CSVs and a self-running acceptance gate so you can build the dbt project before you ever plug in a trial account.

What lives in the repo

Everything you need to run all 4 parts on your laptop, plus the seed scripts that simulate Snowflake's ACCOUNT_USAGE views with realistic shapes and rowcounts.

  • seeds/ — 5 ACCOUNT_USAGE CSVs (query_history, warehouse_metering, table_storage, storage_usage, team_mapping)
  • models/ — dbt staging + marts (~9 models): chargeback, anomalies, daily costs
  • compaction/ — PySpark Iceberg compaction + clustering scripts
  • tasks/ — Snowflake Task + SYSTEM$SEND_EMAIL alert wiring
  • dashboards/ — Streamlit app + Grafana panel queries
  • scripts/validate_seed.py — offline acceptance gate (no Snowflake required)
Download · Starter Kit

Cloud Cost Optimization Starter Kit

Pre-extracted dbt models, Spark/Iceberg compaction, Streamlit dashboard, plus 5 seeded ACCOUNT_USAGE CSVs and the offline acceptance gate. Skip the boilerplate, start on Part 01.

~790 KB · 76 files · 5 ACCOUNT_USAGE seed CSVs · PRO required
~/projects/cloud-cost-optimization — zsh
1. Install Python deps
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
2. Validate offline (no Snowflake required)
$ python scripts/validate_seed.py
3. Configure dbt profile (only if running on real Snowflake)
$ cp profiles.yml.template profiles.yml
$ export SNOWFLAKE_ACCOUNT=... SNOWFLAKE_USER=... SNOWFLAKE_PASSWORD=...
4. Run dbt against your trial account
$ dbt deps --profiles-dir .
$ dbt seed --profiles-dir . --target dev
$ dbt run --profiles-dir . --target dev --select +fct_daily_costs
10K
QUERY_HISTORY rows
5K
WAREHOUSE_METERING
1K
TABLE_STORAGE_METRICS
45-min
ACCOUNT_USAGE latency
Production hardening

The same playbook — but built for the real account.

The tutorial ships against seeded ACCOUNT_USAGE CSVs and a single Snowflake account. Production requires orchestration, RBAC, hard budget caps, and team distribution. Here’s the diff.

Tutorial versionWhat you ship in 8-10 hrs
×
Cost data source
Seeded ACCOUNT_USAGE CSVs, runnable offline
×
Pipeline orchestration
Daily dbt run by hand or cron
×
Anomaly response
Z-score email alert via Snowflake Task
×
Dashboard access
Streamlit/Grafana with shared creds
×
Team chargeback
fct_team_chargeback fact table
×
Account scope
Single Snowflake account
Production versionModules 02-04 + ops
Cost data source
Real ACCOUNT_USAGE with IMPORTED PRIVILEGES grant + 45-min staleness tolerance
Pipeline orchestration
Airflow / Dagster / Snowflake Task DAG with retries, lineage, and freshness SLAs
Anomaly response
Hard budget caps via RESOURCE_MONITOR ... SUSPEND + auto-pause routing
Dashboard access
SSO + RBAC + per-team row-level filters on chargeback marts
Team chargeback
Scheduled per-team email/Slack rollups + invoice templates per cost center
Account scope
Multi-account / multi-region rollup via ORGANIZATION_USAGE views (Enterprise+)
PRO benefit · code review

Real review from senior engineers who’ve cut bills.

Submit your repo, get line-by-line feedback within 48 hours from someone who has actually owned a 7-figure Snowflake bill. The kind of review that's quietly worth thousands of dollars in time-to-staff.

CR

4 reviews / month

Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — Snowflake/FinOps for this project. Async, comments inline, average turnaround 31 hours.

31h
avg turnaround
9.2/10
helpfulness
94%
return next month
OH

2 office hours / month

Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard your warehouse-sizing rubric, mock a system-design interview on cost. Group sessions also available.

30 min
per session
2 / mo
included
+ group
unlimited
What PRO unlocks

One subscription. 15+ projects, all curriculum, code review.

PRO is built for senior+ engineers who want production-grade builds and feedback loops — not more tutorials.

What you getFREEPROEXPERT
Projects
Production-grade builds
2
15+
8
Curriculum modules
All 7 tracks
Phase 1 only
All
All + bonus
Code review credits
Senior engineer review
0
4 / month
Unlimited
Career path access
5 paths × full plans
1 path
All 5
All 5 + 1:1
Certificate
Verifiable on LinkedIn
Yes
Yes + portfolio review
Community
Discord + office hours
Read-only
Full + 2/mo
Full + 4/mo
$29/mo
billed monthly · cancel anytime
or annual
$249/yr save 28%
Upgrade to PRO
Who this is for

Pick this if you own the cloud bill, not just write the queries.

DE

Senior data engineers

You've shipped dbt + warehouse pipelines and now your CFO is asking why the bill is up 40% YoY. This gives you the forensics + governance to answer.

PE

Platform / FinOps engineers

You're building cost visibility for 10+ teams. You need a chargeback fact table, anomaly detection, and a budget-cap pattern that actually pauses spend.

AE

Analytics engineers running dbt

You're already in dbt. Adding cost models alongside your business marts turns you into the person finance calls before they call the head of data.

EM

Engineering managers / tech leads

You sign off on warehouse spend. This is the project that lets you reason about right-sizing trade-offs and approve a multi-cluster reservation without guessing.

FAQ

Quick answers.

Part 01 (free) gives you the exact ACCOUNT_USAGE forensics SQL — top-20 expensive queries with bytes_scanned + partition_scan_pct + spill — runnable offline against seeded data. Most free guides hand you the credit-pricing math; this one hands you the queries.
No Airflow / Dagster orchestration code (Snowflake Tasks handle scheduling). No multi-cloud — Snowflake-only, though the patterns transfer to BigQuery slot reservations and Redshift WLM. No real-time streaming costs (Snowpipe + Snowpark Container Services not covered). No auto-remediation — the alerts notify; humans decide whether to pause.
DataGuard watches data quality + freshness. This project watches the bill. Same primitives (dbt, ACCOUNT_USAGE, anomaly detection), different fact tables. They pair well — DataGuard tells you when something broke, this tells you when something got expensive.
No — the starter kit ships 5 seeded ACCOUNT_USAGE CSVs (10K query rows, 5K metering rows, 1K storage rows) you can run dbt models against offline. A Snowflake free trial unlocks Part 01's forensics SQL on real data.
It's the validated number for the seeded FinScale dataset — each part's optimization compounds against the dbt models that compute the running total. Real-world savings vary; most teams who do this work end up at 30-50% in 6 months. The 60% figure is the project's narrative ceiling, not a guaranteed outcome.
Parts 02-04 (compute right-sizing, storage compaction, FinOps dashboard + governance), the starter kit ZIP, the seeded sample data, plus 4 code-review credits + 2 office-hours sessions per month and access to all 15+ PRO projects + curriculum across all 7 tracks.
Yes. System-design rounds for senior+/staff DE roles increasingly include cost questions. After this you can talk through warehouse right-sizing rubrics, explain why partition pruning matters at the credit level, and reason about chargeback vs. showback trade-offs with concrete numbers.

Ready to cut the bill?

Start with Part 01 — free, no card. About 2 hours. By the end you'll have the top-20 expensive queries ranked, the warehouse utilization audited, and a hit list with a dollar figure next to every line.

P26 · Cloud Cost Optimization · PRO · part 01 freeUpgrade to PRO →
Press Cmd+K to open