How to Reduce Data Warehouse Costs: Step-by-Step Guide

Reduce data warehouse costs in 6 steps: audit to find waste → enable auto-suspend → add partition pruning → eliminate SELECT * → right-size compute → set budget alerts. Teams with no prior optimization typically see 40–70% cost reduction from this first pass, with the majority coming from auto-suspend and partition pruning alone.

1. Audit: find where money is going

Query account usage views to identify the top 20 most expensive queries, users, and warehouses. You cannot optimize what you cannot measure. Run this audit first before making any changes.

-- Snowflake: top 20 queries by credit spend (last 14 days)
SELECT
  query_text,
  user_name,
  warehouse_name,
  warehouse_size,
  ROUND(total_elapsed_time / 1000, 1)   AS seconds,
  -- Note: this is cloud services credits only. Per-query warehouse compute
  -- credits are not in query_history (they are billed per warehouse; see
  -- warehouse_metering_history), so treat seconds and gb_scanned as the
  -- primary cost proxies.
  ROUND(credits_used_cloud_services, 5) AS credits,
  bytes_scanned / 1e9                   AS gb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP)
  AND execution_status = 'SUCCESS'
ORDER BY credits DESC
LIMIT 20;
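Once the audit results are exported (for example to CSV), a few lines of Python make it easy to roll spend up by user or warehouse. The rows below are hypothetical; substitute your own export of the query above.

```python
from collections import defaultdict

# Hypothetical rows exported from the audit query above:
# (user_name, warehouse_name, credits)
rows = [
    ("alice", "analytics_wh", 12.4),
    ("bob",   "analytics_wh",  3.1),
    ("etl",   "etl_wh",       40.7),
    ("alice", "etl_wh",        1.2),
]

def spend_by(rows, key_index):
    """Sum credits grouped by one column (0 = user, 1 = warehouse)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    # Highest spenders first
    return sorted(totals.items(), key=lambda kv: -kv[1])

print(spend_by(rows, 1))  # spend per warehouse, largest first
```

The same grouping by user or by query tag usually surfaces two or three hotspots that account for most of the bill.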

2. Enable auto-suspend on all warehouses

Set auto-suspend to 60 seconds for interactive warehouses, 5 minutes for scheduled ETL. This single change is the highest-ROI optimization in almost every Snowflake environment.

-- Snowflake: update all warehouses to auto-suspend
ALTER WAREHOUSE analytics_wh
  SET AUTO_SUSPEND = 60        -- seconds of inactivity before suspend
      AUTO_RESUME  = TRUE;     -- auto-resume on next query

ALTER WAREHOUSE etl_wh
  SET AUTO_SUSPEND = 300;      -- 5 min for scheduled ETL jobs

-- Verify current settings
SHOW WAREHOUSES;
-- Check: AUTO_SUSPEND column should be 60 or 300, not 0 (never suspend)
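The savings here are simple arithmetic: a running warehouse burns credits whether or not queries are executing. A rough sketch, using Snowflake's standard credit rates per warehouse size (the idle-hours figure is an assumption to fill in from your audit):

```python
# Standard Snowflake credit rates per hour by warehouse size
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def idle_credits_per_month(size, idle_hours_per_day, days=30):
    """Credits burned by a running-but-idle warehouse each month."""
    return CREDITS_PER_HOUR[size] * idle_hours_per_day * days

# Example: a Medium warehouse left running 6 idle hours/day
print(idle_credits_per_month("M", 6))  # 720 credits/month of pure waste
```

Auto-suspend at 60 seconds converts almost all of that idle time into zero-cost suspended time.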

3. Add partitioning and eliminate full scans

Partition large tables by date. Add WHERE filters on the partition column to every query. In BigQuery, unfiltered queries scan the entire table — partitioning is mandatory for cost control at scale.

-- BigQuery: create partitioned + clustered table
-- (assumes created_date is a DATE column; for a TIMESTAMP column,
--  use PARTITION BY DATE(ts_col) instead)
CREATE TABLE analytics.orders
PARTITION BY created_date
CLUSTER BY customer_id, status
AS SELECT * FROM analytics.orders_raw;

-- Every query must filter on the partition column
SELECT order_id, customer_id, amount_usd
FROM analytics.orders
WHERE created_date BETWEEN '2025-01-01' AND '2025-01-31'  -- partition filter
  AND status = 'completed';                                -- cluster filter

-- Check bytes that would be billed before running: use a dry run
-- (bq query --dry_run, or the bytes estimate in the console), or inspect
-- total_bytes_billed in INFORMATION_SCHEMA.JOBS after the fact.
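To see why pruning matters at the bill level, here is a sketch converting bytes billed into on-demand cost. The per-TiB price is an assumption (check current pricing for your region and billing model), and the byte counts are illustrative:

```python
def on_demand_cost(bytes_billed, usd_per_tib=6.25):
    """Estimate BigQuery on-demand query cost.

    usd_per_tib is an assumption; verify against current
    pricing for your region and edition.
    """
    return bytes_billed / 2**40 * usd_per_tib

full_scan = on_demand_cost(500 * 10**9)  # 500 GB unpartitioned full scan
pruned    = on_demand_cost(16 * 10**9)   # ~16 GB after one-month pruning
print(f"${full_scan:.2f} per query vs ${pruned:.2f} after pruning")
```

Multiplied across dashboards that re-run hourly, the difference between these two numbers is where the 50–90% scan-cost savings in the table below comes from.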

4. Eliminate SELECT * in all production queries

In columnar warehouses, SELECT * scans every column even when only a few are needed. Replace with explicit column lists in all dbt models and production queries.

-- BAD: scans all 200 columns even though you need 4
SELECT * FROM orders WHERE created_date = CURRENT_DATE;

-- GOOD: only the 4 needed columns are scanned
SELECT order_id, customer_id, amount_usd, status
FROM orders
WHERE created_date = CURRENT_DATE;

-- dbt: run SQLFluff in CI to block SELECT *
# .sqlfluff
[sqlfluff]
# AM04 (ambiguous.column_count) flags queries whose final SELECT uses a
# wildcard; add it to your project's enabled rules
rules = AM04
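The savings from dropping SELECT * depend on per-column byte sizes, not just the column count: one wide column often dominates. A sketch with hypothetical column sizes for an orders table:

```python
def scanned_bytes(column_bytes, selected):
    """Bytes scanned when only `selected` columns are read."""
    return sum(column_bytes[c] for c in selected)

# Hypothetical per-column sizes (bytes) for an orders table
column_bytes = {
    "order_id":    8_000_000,
    "customer_id": 8_000_000,
    "amount_usd":  8_000_000,
    "status":      4_000_000,
    "raw_payload": 900_000_000,  # one wide JSON column dominates SELECT *
}

star   = scanned_bytes(column_bytes, column_bytes)  # SELECT * reads all
narrow = scanned_bytes(column_bytes,
                       ["order_id", "customer_id", "amount_usd", "status"])
print(f"SELECT * scans {star / narrow:.0f}x more bytes")
```

This is why the savings range in the table below is so wide (30–80%): tables with a heavy payload or JSON column sit at the high end.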


5. Right-size warehouses to match SLAs

Test each scheduled pipeline at progressively smaller warehouse sizes. Use the smallest size where the pipeline meets its SLA. Document the minimum size for each pipeline.

-- Snowflake: test pipeline at multiple warehouse sizes
-- Step 1: tag your ETL queries
ALTER SESSION SET QUERY_TAG = 'etl_nightly_v1';

-- Step 2: run at S, M, L and compare
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'SMALL';
-- run your pipeline...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'MEDIUM';
-- run again...

-- Step 3: compare results
SELECT warehouse_size,
       AVG(total_elapsed_time) / 60000 AS avg_minutes,
       -- proxy only: per-query warehouse compute credits are not exposed
       -- in query_history; compare avg_minutes against each size's
       -- credit rate for the real cost comparison
       AVG(credits_used_cloud_services) AS avg_credits
FROM snowflake.account_usage.query_history
WHERE query_tag = 'etl_nightly_v1'
GROUP BY 1
ORDER BY avg_credits;
-- Pick the smallest size where avg_minutes < SLA target
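The selection rule at the end ("smallest size that meets the SLA") can be made mechanical. A sketch with hypothetical measurements; fill in the numbers from the comparison query above:

```python
# Measured results per warehouse size for the tagged pipeline
# (hypothetical numbers)
runs = [
    {"size": "SMALL",  "avg_minutes": 52.0, "avg_credits": 1.7},
    {"size": "MEDIUM", "avg_minutes": 27.0, "avg_credits": 1.8},
    {"size": "LARGE",  "avg_minutes": 15.0, "avg_credits": 2.1},
]

def right_size(runs, sla_minutes):
    """Cheapest size whose runtime meets the SLA; None if none do."""
    ok = [r for r in runs if r["avg_minutes"] <= sla_minutes]
    return min(ok, key=lambda r: r["avg_credits"])["size"] if ok else None

print(right_size(runs, sla_minutes=30))  # MEDIUM meets a 30-min SLA cheapest
```

Note that a bigger warehouse is not always more expensive overall: if doubling the size more than halves the runtime, credits consumed can stay flat or drop, which is why you compare cost per run, not size.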

6. Set budget alerts and cost attribution

Create resource monitors that notify or suspend warehouses when spend exceeds a threshold. Tag all queries and warehouses by team so each team sees and owns their costs.

-- Snowflake: resource monitor with auto-suspend at 110% of budget
CREATE RESOURCE MONITOR analytics_budget
  WITH CREDIT_QUOTA = 500        -- monthly credit budget
  TRIGGERS
    ON 75 PERCENT DO NOTIFY      -- alert at 75%
    ON 100 PERCENT DO NOTIFY     -- alert at 100%
    ON 110 PERCENT DO SUSPEND;   -- auto-suspend at 110%

ALTER WAREHOUSE analytics_wh
  SET RESOURCE_MONITOR = analytics_budget;

-- Tag queries by team for attribution
ALTER SESSION SET QUERY_TAG = 'team:marketing|pipeline:daily_report';
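The trigger logic of the resource monitor above is straightforward threshold checking, which is worth internalizing when you wire alerts into your own tooling. A minimal sketch (the trigger list mirrors the monitor defined above):

```python
def triggered_actions(credits_spent, credit_quota, triggers):
    """Actions whose threshold percentage has been reached."""
    pct = credits_spent / credit_quota * 100
    return [action for threshold, action in triggers if pct >= threshold]

# Mirrors the resource monitor above: notify at 75% and 100%, suspend at 110%
TRIGGERS = [(75, "notify"), (100, "notify"), (110, "suspend")]

# At 520 of 500 credits (104%), both notify triggers have fired:
print(triggered_actions(520, 500, TRIGGERS))
```

The 110% suspend trigger is the safety net: alerts depend on humans responding, suspension does not.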

Expected Savings by Technique

Technique              | Typical savings                  | Effort
Auto-suspend (60s)     | 30–50% compute cost              | Low: 1 ALTER per warehouse
Partition pruning      | 50–90% scan cost on large tables | Medium: add filters to queries
Eliminate SELECT *     | 30–80% column scan cost          | Medium: update all dbt models
Warehouse right-sizing | 25–60% compute cost              | Medium: test and document SLAs
Storage tiering (cold) | 50–80% storage cost on old data  | Low: set lifecycle policy
Budget alerts          | 0% direct; prevents overruns     | Low: 1 resource monitor setup

Common Issues

Auto-suspend breaks streaming or always-on use cases

Warehouses that serve real-time dashboards or always-on applications should use a longer auto-suspend (10–15 min) or dedicated always-on warehouses. Never set 0 (no suspend) — always have a safety net.

Partition pruning not working — no savings

Check your WHERE clause includes the exact partition column, not a derived expression. In BigQuery: WHERE DATE(created_at) = '2025-01-01' does NOT prune partitions. Use WHERE created_date = '2025-01-01' on a DATE partition column.

Budget alerts fire but no one acts

Alerts only reduce costs if someone responds. Route alerts to a Slack channel with an on-call rotation and a runbook: "when this fires, suspend warehouse X and file a cost review ticket."

FAQ

What is the fastest way to reduce Snowflake costs?
Enable auto-suspend at 60 seconds on all warehouses. This alone reduces most teams' Snowflake bill by 30–50% immediately by eliminating idle compute.
How do you reduce BigQuery costs?
Partition tables by date, always filter on the partition column, and never use SELECT *. These three changes address the root cause of BigQuery overspend: scanning too much data.
How much can you realistically save?
40–70% from the first optimization pass is typical. Auto-suspend + partition pruning + column projection compound: each reduces a different cost component independently.
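The compounding claim works out as simple arithmetic, since each technique cuts a share of whatever bill remains after the previous one. A sketch with illustrative reduction figures:

```python
def combined_savings(*reductions):
    """Net savings when each technique cuts a fraction of the remaining bill.

    Assumes the techniques target independent cost components,
    so reductions multiply rather than add.
    """
    remaining = 1.0
    for r in reductions:
        remaining *= 1 - r
    return 1 - remaining

# 40% from auto-suspend, then 30% of what's left from pruning,
# then 20% of the remainder from column projection:
print(f"{combined_savings(0.40, 0.30, 0.20):.0%}")  # ~66% total
```

Three moderate wins multiply into a total in the 40–70% range quoted above, without any single technique needing to be dramatic.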
