
Skill Toolkits

Production-grade playbooks, architecture templates, and deep dives for modern data engineering.

31 toolkits

Core DE · 7 Modules

SQL Mastery for Data Engineers

Move beyond basic SELECT statements. Learn to write highly optimized, analytical SQL to process massive datasets, handle data skew, and architect robust dimensional models for the enterprise.

PostgreSQL · Window Functions · CTEs · Query Plans
~8h
Free
Core DE · 7 Modules

Python for Data Engineers

Transition from messy Pandas scripts to production-grade software engineering. Build fault-tolerant, object-oriented ETL pipelines with robust testing and CI/CD integration.

Python · Polars · PySpark · pytest
~17h
Pro
Core DE · 7 Modules

Advanced Data Modeling & Architecture

Master enterprise data architecture. Design Kimball dimensional models, track Slowly Changing Dimensions (SCDs), and build AI-ready warehouse schemas.

Kimball · Star Schema · SCDs · dbt
~12h
Pro
Core DE · 7 Modules

API & External System Integration

Stop dropping data from flaky endpoints. Master REST, OAuth, and streaming API ingestion patterns to build resilient, rate-limit-aware connectors for enterprise data platforms.

REST · OAuth · Pagination · Webhooks
~12h
Pro
Core DE · 7 Modules

Cloud Data Infrastructure & FinOps

Master cloud data infrastructure across AWS, GCP, and Azure. Architect serverless pipelines, implement IaC, and govern enterprise cloud security.

AWS · GCP · Azure · Terraform
~15h
Pro
Core DE · 7 Modules

Warehouse Internals & Query Performance

Master how data warehouses execute SQL, partition data, and optimize queries. From reading query plans to staff-level cost strategy on BigQuery, Snowflake, and Spark.

BigQuery · Snowflake · Spark · EXPLAIN Plans
~7h
Pro
Analytics · 7 Modules

dbt & Analytics Engineering

Stop writing untestable, monolithic SQL scripts. Master Analytics Engineering by building modular data models, writing reusable Jinja macros, and deploying production-ready semantic layers.

dbt Core · Jinja · Snowflake · Semantic Layer
~9h
Free
Analytics · 7 Modules

Data Observability & Quality

Prevent silent data bugs before the CEO sees a broken dashboard. Implement automated anomaly detection, rigorous dbt testing, and strict data SLAs using modern observability platforms.

Great Expectations · Monte Carlo · dbt Tests · SLAs
~8h
Pro
Analytics · 7 Modules

Governance & Data Contracts

Stop downstream breakages from upstream schema changes. Design enterprise data governance frameworks with strict schema contracts, lineage tracking, and compliance controls.

Data Contracts · Lineage · Schema Registry · GDPR
~14h
Pro
Analytics · 7 Modules

Cost Optimization for Data Engineers

Stop burning money on inefficient queries. Master cloud FinOps across Snowflake and BigQuery to identify orphaned tables, tune compute engines, and slash your data infrastructure bill.

Snowflake · BigQuery · FinOps · Query Tuning
~14h
Pro
Analytics · 7 Modules

Product Thinking for Data Engineers

Bridge the gap between data engineering and business impact. Master KPIs, experimentation infrastructure, stakeholder communication, and data strategy for career advancement.

KPIs · A/B Testing · Experimentation · Product Analytics
~14h
Pro
Data Platform · 7 Modules

System Design for Data Engineers

Ace the Staff-level system design interview. Master data platform architecture patterns, evaluating tradeoffs between ingestion, distributed storage, and serving layers at massive scale.

Architecture · Distributed Systems · Storage Tradeoffs · Scale
~16h
Pro
Data Platform · 10 Modules

Apache Airflow: Production Orchestration

Move beyond simple cron jobs. Master Apache Airflow to build idempotent, resilient DAGs, integrate with external systems securely, and deploy at scale on Kubernetes.

Apache Airflow · DAGs · Kubernetes · KubernetesPodOperator
~205h
Pro
Data Platform · 7 Modules

DataOps: CI/CD & Infrastructure as Code

Eliminate manual deployment errors. Build reliable DataOps pipelines with git workflows, multi-environment deployments, and Infrastructure-as-Code for production-grade reliability.

GitHub Actions · Terraform · Helm · Docker
~14h
Pro
Data Platform · 7 Modules

Apache Iceberg & Modern Lakehouse Architecture

Modernize your legacy data lake. Implement Apache Iceberg to enable ACID transactions, time-travel queries, and schema evolution for multi-engine table formats at petabyte scale.

Apache Iceberg · ACID · Time Travel · Schema Evolution
~11h
Pro
Data Platform Modules

Staff Data Engineer: Leadership & Architecture

Bridge the gap between Senior and Staff. Master the art of writing technical RFCs, leading architecture reviews, and driving cross-team engineering standards.

RFCs · Architecture Reviews · OKRs · Staff+ Skills
~13h
Pro
Streaming Modules

Real-Time Streaming Architecture

Master the mechanics of distributed streaming. Build fault-tolerant real-time pipelines handling stateful processing, watermarks, and delivery guarantees.

Kafka · Watermarks · Delivery Guarantees · Stateful Processing
~13h
Pro
Streaming · 8 Modules

Apache Spark: Distributed Data Processing

Stop running out of memory on large datasets. Master Apache Spark for distributed data processing, diving deep into engine mechanics, performance tuning, and Kubernetes deployments.

Apache Spark · PySpark · Kubernetes · Delta Lake
~110h
Pro
Streaming · 7 Modules

Apache Flink & Stream Processing

Process massive data streams with sub-second latency. Master Apache Flink for stateful real-time pipelines, complex window operations, and reliable CDC integration.

Apache Flink · State Management · CDC · Windowing
~14h
Pro
Streaming · 7 Modules

Kafka Streams Learning Path

Build low-latency event-driven applications natively. Master Kafka Streams for exactly-once processing, stateful joins, and interactive queries without deploying separate clusters.

Kafka Streams · KStream · KTable · Exactly-Once
~12h
Pro
Streaming · 7 Modules

Event Design & Data Contracts

Design clean, reliable event streams that teams can trust. From first-principles event modeling to production-grade schema contracts, validation, and org-wide governance.

Kafka · Avro · JSON Schema · Confluent
~7h
Pro
AI / MLOps · 7 Modules

MLOps for Data Engineers

Bridge the gap between data engineering and machine learning. Build production ML systems with feature stores, scalable training infrastructure, and automated model deployment pipelines.

MLflow · Feature Stores · Seldon · Model Monitoring
~16h
Pro
AI / MLOps · 7 Modules

Feature Stores for ML

Stop re-computing ML features for every model. Build centralized, production feature stores with offline batch pipelines, online serving, and real-time streaming freshness.

Feast · Tecton · Redis · Point-in-Time Joins
~15h
Pro
AI / MLOps · 13 Modules

Data Curation & Dataset Engineering

Models are only as good as their training data. Architect scalable pipelines to clean, deduplicate, and version massive datasets, ensuring high-fidelity inputs for enterprise ML.

Deduplication · Dataset Versioning · DVC · Quality Filters
~38h
Pro
AI / MLOps · 14 Modules

Vector Databases & Retrieval Infrastructure

Power the next generation of AI search. Master vector embeddings and databases to build high-performance semantic search, hybrid retrieval, and enterprise RAG applications.

pgvector · Pinecone · Weaviate · Hybrid Search
~28h
Pro
AI / MLOps · 9 Modules

RAG Learning Path

Give LLMs a secure corporate memory. Architect Retrieval-Augmented Generation systems, mastering advanced document chunking, retrieval orchestration, and incremental syncing.

LangChain · LlamaIndex · Chunking · Reranking
~35h
Pro
AI / MLOps · 7 Modules

LLM Data Pipelines Deep Dive

Engineer the data infrastructure behind Generative AI. Build production LLM pipelines covering massive instruction datasets, fine-tuning alignment, and comprehensive LLMOps.

Training Data · Tokenization · Fine-tuning · LLMOps
~12h
Pro
AI / MLOps · 5 Modules

LLM Evaluation

Stop guessing if your LLM is hallucinating. Build automated evaluation systems to test LLM applications for accuracy, bias, and toxicity using multi-judge evaluation frameworks.

RAGAS · Multi-Judge · Benchmarks · Bias Testing
~14h
Pro
AI / MLOps · 7 Modules

Agentic Workflows

Move beyond simple chatbots to autonomous systems. Build production-grade AI agents using LangGraph state machines, resilient tool engineering, and multi-agent orchestration.

LangGraph · ReAct · Tool Use · Multi-Agent
~24h
Pro
AI / MLOps · 7 Modules

Enterprise Generative AI & LLM Security

Don't leak corporate data to public LLMs. Build secure, compliant, and production-ready Generative AI systems with enterprise RAG, PII protection, and strict governance.

Enterprise RAG · PII Detection · RBAC · Compliance
~21h
Pro
AI / MLOps · 8 Modules

AI Inference & Serving Systems

Deploy, optimize, and scale AI models in production. Learn every layer of the inference stack, from a basic FastAPI wrapper to an enterprise multi-model routing platform.

vLLM · FastAPI · Redis · Ray Serve
~9h
Pro