Airflow DAGs Explained: What They Are and How They Work
An Airflow DAG (Directed Acyclic Graph) is a Python file that defines a workflow as a graph of tasks and dependencies. Each task is a node; each dependency is a directed edge. The graph is acyclic — no circular dependencies allowed — so Airflow can always determine execution order.
A Minimal DAG
```python
# dags/my_pipeline.py
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule='@daily', start_date=datetime(2024, 1, 1), catchup=False)
def my_pipeline():
    @task()
    def extract():
        return fetch_data()

    @task()
    def load(data):
        write_to_db(data)

    load(extract())  # dependency: extract → load


my_pipeline()
```
Core Concepts
DAG
The Workflow Container
Defines schedule, start date, default args, and the task graph. Instantiated by calling the function at module level.
Task
The Unit of Work
A single step — a Python function, a Bash command, a SQL query, or a dbt run. Each task runs in isolation with its own logs.
DAG Run
A Scheduled Execution
Each time Airflow triggers a DAG, it creates a DAG Run with a logical date. Multiple runs can exist simultaneously for different dates.
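The relationship between a schedule and its DAG Runs can be sketched in plain Python (no Airflow required): each run is stamped with the logical date of the interval it covers, which is why several runs for different dates can coexist.

```python
from datetime import datetime, timedelta

def daily_logical_dates(start, until):
    """Yield the logical dates a '@daily' schedule would produce.

    A plain-Python sketch of the idea, not Airflow's scheduler code:
    one DAG Run per day, each stamped with the start of its interval.
    """
    current = start
    while current < until:
        yield current
        current += timedelta(days=1)

runs = list(daily_logical_dates(datetime(2024, 1, 1), datetime(2024, 1, 4)))
# Three independent DAG Runs: Jan 1, Jan 2, Jan 3
```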
DAG Schedule Options
| schedule= | Behavior |
|---|---|
| '@daily' | Run once per day at midnight UTC |
| '@hourly' | Run once per hour |
| '0 6 * * *' | Cron expression: run at 6am UTC daily |
| '@once' | Run exactly once and never again |
| None | Never schedule — only trigger manually or via API |
| timedelta(hours=6) | Run every 6 hours using a Python timedelta |
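A `timedelta` schedule is plain interval arithmetic. The sketch below uses a hypothetical helper (`next_runs` is not an Airflow API) to show the run times that `schedule=timedelta(hours=6)` implies:

```python
from datetime import datetime, timedelta

def next_runs(start, interval, count):
    # Hypothetical helper: the first `count` run times for an
    # interval-based schedule such as schedule=timedelta(hours=6).
    return [start + i * interval for i in range(count)]

runs = next_runs(datetime(2024, 1, 1), timedelta(hours=6), 4)
# 00:00, 06:00, 12:00, 18:00 on 2024-01-01
```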
Common Mistakes
Forgetting to instantiate the DAG
You must call the @dag-decorated function (e.g. my_pipeline()) at module level. If you define the function but don't call it, Airflow will never see the DAG.
Circular dependencies
Airflow will raise an AirflowDagCycleException if task A depends on task B and task B depends on task A, whether directly or through a longer chain. Always verify your dependency graph is truly acyclic.
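The acyclicity check boils down to finding a back-edge with a depth-first search over the dependency graph. This is a sketch of the idea, not Airflow's actual implementation:

```python
def has_cycle(deps):
    """deps maps each task name to the tasks it depends on."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color = {t: WHITE for t in deps}

    def visit(task):
        color[task] = GRAY                      # task is on the DFS path
        for upstream in deps.get(task, ()):
            if color.get(upstream, WHITE) == GRAY:
                return True                     # back-edge: a cycle exists
            if color.get(upstream, WHITE) == WHITE and visit(upstream):
                return True
        color[task] = BLACK                     # fully explored, no cycle here
        return False

    return any(visit(t) for t in deps if color[t] == WHITE)

has_cycle({"load": ["extract"], "extract": []})  # False — valid DAG
has_cycle({"a": ["b"], "b": ["a"]})              # True — A ↔ B, as above
```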
Using dynamic dates in start_date
Never use datetime.now() as start_date — it changes on every parse cycle and breaks backfill. Use a fixed date like datetime(2024, 1, 1).
Top-level code that makes network calls
Any code at the module level (outside task functions) runs on every DAG parse — potentially hundreds of times per minute. Never make database or API calls at DAG-file level.
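The fix is to move the call inside a task so it runs at execution time, not at parse time. A pure-Python sketch of the pattern, where a stub decorator stands in for Airflow's `@task` and `fetch_config` is a hypothetical expensive API call:

```python
calls = []

def task(fn):
    # Stub standing in for airflow.decorators.task, so this runs anywhere.
    return fn

def fetch_config():
    # Hypothetical expensive API call.
    calls.append("api")
    return {"limit": 100}

# BAD: config = fetch_config()  <- would run on every DAG parse

@task
def use_config():
    config = fetch_config()  # GOOD: runs only when the task executes
    return config["limit"]

# Defining the task triggers nothing; calling it does.
use_config()
```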
FAQ
- What is an Airflow DAG?
- An Airflow DAG (Directed Acyclic Graph) is a Python file that defines a workflow. Each task is a node, each dependency is a directed edge, and the graph is acyclic. The scheduler parses DAG files and runs task instances according to the defined schedule.
- What does "acyclic" mean in a DAG?
- Acyclic means there are no cycles — task A cannot depend on task B if task B already depends on task A. This guarantees a clear, deterministic execution order.
- What is the difference between a DAG and a task in Airflow?
- A DAG is the container defining workflow structure, schedule, and defaults. Tasks are the individual work units inside a DAG — each task runs its own code and has its own logs and state.
- How does the Airflow scheduler read DAG files?
- The scheduler's DAG processor re-imports each Python file in the dags/ directory on a regular interval (every 30 seconds by default) and builds the task graph. If a file fails to parse, that DAG is skipped and the error is surfaced on the Import Errors page in the UI — check there to debug.