
Airflow DAGs Explained: What They Are and How They Work

An Airflow DAG (Directed Acyclic Graph) is a Python file that defines a workflow as a graph of tasks and dependencies. Each task is a node; each dependency is a directed edge. The graph is acyclic — no circular dependencies allowed — so Airflow can always determine execution order.

A Minimal DAG

# dags/my_pipeline.py
from airflow.decorators import dag, task
from datetime import datetime

@dag(schedule='@daily', start_date=datetime(2024, 1, 1), catchup=False)
def my_pipeline():

    @task()
    def extract():
        # placeholder for real extraction logic
        return [1, 2, 3]

    @task()
    def load(data):
        # placeholder for real load logic
        print(f"loading {data}")

    load(extract())  # dependency: extract → load

my_pipeline()

Core Concepts

DAG

The Workflow Container

Defines schedule, start date, default args, and the task graph. Instantiated by calling the function at module level.

Task

The Unit of Work

A single step — a Python function, a Bash command, a SQL query, or a dbt run. Each task runs in isolation with its own logs.

DAG Run

A Scheduled Execution

Each time Airflow triggers a DAG, it creates a DAG Run with a logical date. Multiple runs can exist simultaneously for different dates.
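The logical-date bookkeeping can be sketched in plain Python, no Airflow required: for a daily schedule, each DAG Run's logical date marks the start of the interval it covers, and a run is only triggered once that interval has fully elapsed. The helper below is a hypothetical illustration, not Airflow's scheduler code.

```python
from datetime import datetime, timedelta

def daily_logical_dates(start_date, until):
    """Sketch: enumerate the logical dates a @daily schedule
    would produce between start_date and `until`."""
    dates = []
    current = start_date
    while current + timedelta(days=1) <= until:
        # each run covers [current, current + 1 day); it fires
        # only after that interval has fully elapsed
        dates.append(current)
        current += timedelta(days=1)
    return dates

runs = daily_logical_dates(datetime(2024, 1, 1), datetime(2024, 1, 4))
print(runs)  # logical dates for Jan 1, 2, and 3
```

With catchup=True, Airflow would create one DAG Run per missed interval like this; catchup=False skips straight to the most recent one.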

DAG Schedule Options

schedule value          Behavior
'@daily'                Run once per day at midnight UTC
'@hourly'               Run once per hour
'0 6 * * *'             Cron expression: run at 6am UTC daily
'@once'                 Run exactly once and never again
None                    Never schedule — only trigger manually or via API
timedelta(hours=6)      Run every 6 hours using a Python timedelta

Common Mistakes

Forgetting to instantiate the DAG

You must call the decorated DAG function (my_pipeline() in the example above) at module level. If you define the function but don't call it, Airflow will never see the DAG.

Circular dependencies

Airflow will raise an AirflowDagCycleException if task A depends on task B and task B depends on task A. Always verify your dependency graph is truly acyclic.
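The cycle check boils down to topological sorting: if the graph can't be fully ordered, there's a cycle. Below is a minimal plain-Python sketch using Kahn's algorithm, an illustration of the idea rather than Airflow's actual implementation.

```python
def check_acyclic(deps):
    """deps maps each task name to the tasks it depends on.
    Returns a valid execution order, or raises ValueError on a cycle."""
    indegree = {t: len(ups) for t, ups in deps.items()}
    downstream = {t: [] for t in deps}
    for t, ups in deps.items():
        for u in ups:
            downstream[u].append(t)
    ready = [t for t, d in indegree.items() if d == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(deps):
        raise ValueError("cycle detected in task graph")
    return order

print(check_acyclic({"extract": [], "load": ["extract"]}))
# check_acyclic({"a": ["b"], "b": ["a"]}) would raise ValueError
```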

Using dynamic dates in start_date

Never use datetime.now() as start_date — it changes on every parse cycle and breaks backfill. Use a fixed date like datetime(2024, 1, 1).
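The problem is visible without Airflow: every parse re-evaluates the file, so a dynamic start_date yields a different value each time, while a literal stays stable. The two functions below simulate repeated parse cycles; they are illustrative stand-ins, not Airflow APIs.

```python
import time
from datetime import datetime

def parse_with_dynamic_start():
    # simulates `start_date=datetime.now()` being re-evaluated
    # on every parse cycle
    return datetime.now()

def parse_with_fixed_start():
    # simulates a literal `start_date=datetime(2024, 1, 1)`
    return datetime(2024, 1, 1)

first = parse_with_dynamic_start()
time.sleep(0.01)
second = parse_with_dynamic_start()
print(first != second)  # True: a moving target breaks backfill math
print(parse_with_fixed_start() == parse_with_fixed_start())  # True: stable anchor
```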

Top-level code that makes network calls

Any code at the module level (outside task functions) runs on every DAG parse — potentially hundreds of times per minute. Never make database or API calls at DAG-file level.
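The cost is easy to demonstrate: importing a module runs its top-level statements immediately, while code inside a task function waits until the task executes. A sketch with a hypothetical expensive_call standing in for a database or API request:

```python
calls = {"count": 0}

def expensive_call():
    # stand-in for a database or API request
    calls["count"] += 1
    return "data"

# BAD: runs at import time, i.e. on every DAG parse
result_at_parse = expensive_call()

# GOOD: deferred until the task actually runs
def extract_task():
    return expensive_call()

# after "parsing" this module once, only the top-level call has fired
print(calls["count"])  # 1: extract_task hasn't run yet
```

Multiply that single parse-time call by hundreds of parse cycles per hour and the pressure on the upstream service adds up quickly.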

FAQ

What is an Airflow DAG?
An Airflow DAG (Directed Acyclic Graph) is a Python file that defines a workflow. Each task is a node, each dependency is a directed edge, and the graph is acyclic. The scheduler parses DAG files and runs task instances according to the defined schedule.
What does "acyclic" mean in a DAG?
Acyclic means there are no cycles — task A cannot depend on task B if task B already depends on task A. This guarantees a clear, deterministic execution order.
What is the difference between a DAG and a task in Airflow?
A DAG is the container defining workflow structure, schedule, and defaults. Tasks are the individual work units inside a DAG — each task runs its own code and has its own logs and state.
How does the Airflow scheduler read DAG files?
The scheduler's DAG processor re-parses each file in the dags/ directory at a configurable interval (30 seconds by default), imports it as a Python module, and builds the task graph. If a file fails to import, its DAGs are skipped and the error is recorded; check the Import Errors page in the UI to debug.
