Description

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.

It is scalable, dynamic, extensible, and modular.

Mastering Airflow is fast becoming a must-have skill for anyone working with data.

What you will learn in the course:

  • The fundamentals of Airflow are explained, such as what Airflow is and how the scheduler and the web server work.

  • The Forex Data Pipeline project is a hands-on way to discover many Airflow operators and work with Slack, Spark, Hadoop, and more.

  • Mastering your DAGs is a top priority: you will work with timezones, unit test your DAGs, structure your DAG folder, and much more.

  • Scaling Airflow through different executors such as the Local Executor, the Celery Executor, and the Kubernetes Executor will be explained in detail. You will discover how to specialize your workers, add new workers, and what happens when a node crashes.

  • A local 3-node Kubernetes cluster will be set up with Vagrant and Rancher, and Airflow will run your data pipelines on it with the Kubernetes Executor.

  • Advanced concepts will be shown through practical examples, such as templating your DAGs, making a DAG dependent on another one, SubDAGs and deadlocks, and more.

  • You will set up a Kubernetes cluster in the cloud with AWS EKS and Rancher to use Airflow and the Kubernetes Executor.

  • Monitoring Airflow is extremely important! That's why you will learn how to do it with Elasticsearch and Grafana.

  • Security will also be addressed to make your Airflow instance compliant with your company's requirements: specifying roles and permissions for your users with RBAC, restricting access to the Airflow UI with password authentication, encrypting data, and more.
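Working with timezones matters because Airflow stores datetimes in UTC internally, so a DAG meant to run at a fixed local wall-clock time maps to different UTC instants on either side of a daylight-saving change. The sketch below shows this with the standard library only; the Europe/Paris offsets and the 2021-03-28 switch date are chosen for illustration, `to_utc` is a hypothetical helper, and in a real DAG you would attach a proper tz database zone (Airflow's documentation uses pendulum for timezone-aware `start_date` values) rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Paris before/after the 2021-03-28 DST switch (illustration only).
# Real code should use a tz database (zoneinfo or pendulum) instead of hard-coded offsets.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def to_utc(local_run: datetime) -> datetime:
    """Convert a timezone-aware local run time to the UTC value Airflow would store."""
    if local_run.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return local_run.astimezone(timezone.utc)


# The same 06:00 local run lands on different UTC hours across the DST change.
before_dst = to_utc(datetime(2021, 3, 27, 6, 0, tzinfo=CET))
after_dst = to_utc(datetime(2021, 3, 29, 6, 0, tzinfo=CEST))
print(before_dst.isoformat())  # 2021-03-27T05:00:00+00:00
print(after_dst.isoformat())   # 2021-03-29T04:00:00+00:00
```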

In addition:

  • Many practical exercises are given throughout the course so that you have opportunities to apply what you learn.

  • Best practices are highlighted where relevant so that you learn the best ways to use Airflow.

  • Quizzes are available at the end of each section to assess your comprehension.

  • Answering your questions fast is my top priority, and I will do my best for you.

I put a lot of effort into giving you the best content, and I hope you will enjoy it as much as I enjoyed making it.

At the end of the course, you will be more confident than ever in using Airflow.

I wish you great success!

Marc Lamberti

What you will learn

Coding Production-Grade Data Pipelines by Mastering Airflow through Hands-on Examples

How to Follow Best Practices with Apache Airflow

How to Scale Airflow with the Local, Celery and Kubernetes Executors

How to Set Up Monitoring with Elasticsearch and Grafana

How to Secure Airflow with Authentication, Encryption and the RBAC UI

Core and Advanced Concepts with Their Pros and Limitations

Mastering DAGs with Timezones, Unit Testing, Backfill and Catchup

Organising the DAG Folder and Keeping Things Clean

Requirements

  • Basic knowledge of Docker and Python
  • VirtualBox installed (only for the local Kubernetes cluster part)
  • Vagrant installed
  • The course "The Complete Hands-On Introduction to Apache Airflow" is a nice plus.

Course content

10 sections

Introduction

4 lectures
Important Prerequisites (01:09)
The Roadmap (01:42)
Who am I? (01:03)
Development Environment (01:26)

The basics of Apache Airflow

9 lectures
Introduction (00:57)
Why Airflow? (01:05)
What is Airflow? (06:48)
How does Airflow work? (07:02)
The little secret of the webserver and the scheduler (01:12)
[Practice] Installing Airflow (19:08)
[Practice] Quick tour of the Airflow UI (11:16)
[Practice] Quick tour of the Airflow CLI (09:50)
Quick side note (00:07)

The Forex Data Pipeline

19 lectures
Introduction (01:42)
Docker reminder (14:11)
Docker performance (01:55)
Project: The Forex Data Pipeline (05:25)
A bit more about the architecture (00:32)
What is a DAG? (02:45)
[Practice] Define your DAG (09:34)
What is an Operator? (03:50)
[Practice] Check if the API is available - HttpSensor (14:43)
[Practice] Check if the currency file is available - FileSensor (10:02)
[Practice] Download the forex rates from the API - PythonOperator (08:21)
[Practice] Save the forex rates into HDFS - BashOperator (07:06)
[Practice] Create the Hive table forex_rates - HiveOperator (08:35)
[Practice] Process the forex rates with Spark - SparkSubmitOperator (09:09)
[Practice] Send email notifications - EmailOperator (08:54)
[Practice] Send Slack notifications - SlackWebhookOperator (09:17)
[Practice] Add dependencies between tasks (06:18)
[Practice] The Forex Data Pipeline in action! (03:36)
Congratulations! (00:07)
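Adding dependencies between tasks turns the Forex pipeline into a directed acyclic graph, and the scheduler only starts a task once its upstream tasks are done. The stdlib sketch below computes one valid execution order with `graphlib.TopologicalSorter`, which, like Airflow, rejects a graph containing a cycle. The task ids are hypothetical names mirroring the lecture titles, and the simple linear chain is an assumption; the course's actual DAG may wire them differently.

```python
from graphlib import TopologicalSorter

# Hypothetical task ids mirroring the lecture titles; each key lists its upstream tasks.
FOREX_DAG = {
    "is_forex_rates_available": set(),
    "is_forex_currencies_file_available": {"is_forex_rates_available"},
    "downloading_rates": {"is_forex_currencies_file_available"},
    "saving_rates": {"downloading_rates"},
    "creating_forex_rates_table": {"saving_rates"},
    "forex_processing": {"creating_forex_rates_table"},
    "send_email_notification": {"forex_processing"},
    "send_slack_notification": {"send_email_notification"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(FOREX_DAG)
print(" >> ".join(order))
```

In an actual DAG file the same wiring is declared with Airflow's bitshift syntax, for example `task_a >> task_b`.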

Mastering your DAGs

16 lectures
Introduction (00:49)
The start_date and schedule_interval parameters demystified (06:46)
[Practice] Manipulating the start_date with schedule_interval (11:03)
Backfill and Catchup (04:01)
[Practice] Catching up non-triggered DagRuns (14:58)
Dealing with timezones in Airflow (06:50)
[Practice] Making your DAGs timezone-aware (13:54)
How to make your tasks dependent (03:57)
[Practice] Creating task dependencies between DagRuns (12:26)
How to structure your DAG folder (04:38)
[Practice] Organizing your DAG folder (09:34)
[Practice] How the Web Server works (07:16)
How to deal with failures in your DAGs (04:19)
[Practice] Retry and Alerting (18:32)
How to test your DAGs (07:17)
[Practice] Unit testing your DAGs (14:11)
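The start_date, schedule_interval and catchup lectures revolve around one rule that often surprises newcomers: the DagRun labelled with a given execution_date covers the interval starting at that date and is triggered only once that interval has ended. With catchup enabled, every completed interval since start_date gets a run; with catchup disabled, only the most recent one does. Below is a simplified stdlib model of that rule; `due_runs` is hypothetical, and the model ignores cron expressions, end_date, max_active_runs, and the timetables introduced in Airflow 2.2.

```python
from datetime import datetime, timedelta


def due_runs(start_date: datetime, interval: timedelta, now: datetime, catchup: bool = True):
    """Return (execution_date, trigger_time) pairs for every interval fully elapsed by `now`.

    Models the classic Airflow rule: the run labelled `execution_date` covers
    [execution_date, execution_date + interval) and is triggered once that interval ends.
    """
    runs = []
    execution_date = start_date
    while execution_date + interval <= now:
        runs.append((execution_date, execution_date + interval))
        execution_date += interval
    # With catchup disabled, only the most recent completed interval is scheduled.
    return runs if catchup else runs[-1:]


start = datetime(2021, 1, 1)
now = datetime(2021, 1, 4, 12, 0)
for execution_date, trigger_time in due_runs(start, timedelta(days=1), now):
    print(execution_date.date(), "triggered at", trigger_time)
```

With a daily interval and `now` at noon on 4 January, three runs are due (1, 2 and 3 January), the last one triggered at midnight on 4 January; with `catchup=False` only that last run remains.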

Improving your DAGs with advanced concepts

15 lectures
Introduction (00:55)
Minimising Repetitive Patterns With SubDAGs (02:36)
[Practice] Grouping your tasks with SubDAGs and Deadlocks (09:49)
Making different paths in your DAGs with Branching (03:10)
[Practice] Make Your First Conditional Task Using Branching (09:47)
Trigger rules for your tasks (04:38)
[Practice] Changing how your tasks are triggered (13:13)
Avoid hard-coding values with Variables, Macros and Templates (04:40)
[Practice] Templating your tasks (18:32)
How to share data between your tasks with XComs (03:59)
[Practice] Sharing (big?) data with XComs (09:58)
TriggerDagRunOperator, or when your DAG controls another DAG (02:17)
[Practice] Trigger a DAG from another DAG (05:24)
Dependencies between your DAGs with the ExternalTaskSensor (04:42)
[Practice] Make your DAGs dependent with the ExternalTaskSensor (03:47)
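Trigger rules decide whether a task runs based on the final states of its upstream tasks; the default is all_success. The sketch below is a simplified evaluator for six of the rules, assuming every upstream task has already finished (in Airflow, one_success and one_failed can fire earlier, as soon as one parent matches). The `State` enum and `rule_satisfied` function are illustrative, not Airflow's internals.

```python
from enum import Enum


class State(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"
    UPSTREAM_FAILED = "upstream_failed"
    SKIPPED = "skipped"


FAILED_STATES = {State.FAILED, State.UPSTREAM_FAILED}


def rule_satisfied(rule: str, upstream: list[State]) -> bool:
    """Simplified check of a few trigger rules once all upstream tasks have finished."""
    if rule == "all_success":
        return all(s == State.SUCCESS for s in upstream)
    if rule == "all_failed":
        return all(s in FAILED_STATES for s in upstream)
    if rule == "all_done":
        return True  # every upstream task is finished, whatever its final state
    if rule == "one_success":
        return any(s == State.SUCCESS for s in upstream)
    if rule == "one_failed":
        return any(s in FAILED_STATES for s in upstream)
    if rule == "none_failed":
        return all(s not in FAILED_STATES for s in upstream)
    raise ValueError(f"unsupported trigger rule: {rule}")
```

For example, a join task placed after a branch has one skipped parent: `rule_satisfied("all_success", [State.SUCCESS, State.SKIPPED])` is `False` (Airflow skips such a task), whereas `none_failed` is satisfied, which is why joins after branching commonly use a non-default rule.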

Distributing Apache Airflow

16 lectures
Introduction (01:03)
Sequential Executor with SQLite (03:38)
Local Executor with PostgreSQL (07:17)
[Practice] Executing tasks in parallel with the Local Executor (18:35)
[Practice] Ad Hoc Queries with the metadata database (15:39)
Scale out Apache Airflow with Celery Executors and Redis (05:01)
[Practice] Set up the Airflow cluster with Celery Executors and Docker (07:01)
[Practice] Distributing your tasks with the Celery Executor (11:15)
[Practice] Adding new worker nodes with the Celery Executor (20:59)
[Practice] Sending tasks to a specific worker with Queues (12:44)
[Practice] Pools and priority_weights: Limiting parallelism and prioritizing tasks (11:18)
Kubernetes Reminder (07:00)
Scaling Airflow with Kubernetes Executors (05:16)
[Practice] Set up a 3-node Kubernetes Cluster with Vagrant and Rancher (10:51)
[Practice] Installing Airflow with Rancher and the Kubernetes Executor (09:55)
[Practice] Running your DAGs with the Kubernetes Executor (10:45)
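Pools cap how many task instances can run at once against a shared resource, and when slots are scarce the scheduler favours queued tasks with a higher priority_weight. The toy model below groups queued tasks into "waves" that fit in a pool, assuming equal-duration tasks and using each task's weight as given (Airflow's default weight_rule, downstream, actually adds up the weights of downstream tasks). `schedule_waves` and the task ids are hypothetical.

```python
import heapq


def schedule_waves(tasks: dict[str, int], pool_slots: int) -> list[list[str]]:
    """Group queued tasks into waves that fit in a pool, highest priority_weight first.

    Simplified model: every task takes the same time, so a wave finishes before the next starts.
    Ties are broken alphabetically so the output is deterministic.
    """
    if pool_slots < 1:
        raise ValueError("a pool needs at least one slot")
    # heapq is a min-heap, so negate the weight to pop the highest priority first.
    queue = [(-weight, task_id) for task_id, weight in tasks.items()]
    heapq.heapify(queue)
    waves = []
    while queue:
        wave = [heapq.heappop(queue)[1] for _ in range(min(pool_slots, len(queue)))]
        waves.append(wave)
    return waves


# Hypothetical tasks and weights: three API calls sharing a 2-slot pool.
waves = schedule_waves({"fetch_eur": 1, "fetch_usd": 3, "fetch_jpy": 2}, pool_slots=2)
print(waves)  # [['fetch_usd', 'fetch_jpy'], ['fetch_eur']]
```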

Deploying Airflow on AWS EKS with Kubernetes Executors and Rancher

10 lectures
Introduction (01:28)
Quick overview of AWS EKS (03:45)
[Practice] Set up an EC2 instance for Rancher (08:17)
[Practice] Create an IAM User with permissions (02:34)
[Practice] Create an ECR repository (06:49)
[Practice] Create an EKS cluster with Rancher (06:21)
How to access your applications from the outside (04:19)
[Practice] Deploy Nginx Ingress with Catalogs (Helm) (04:56)
[Practice] Deploy and run Airflow with the Kubernetes Executor on EKS (05:21)
[Practice] Cleaning up your AWS services (02:50)

Monitoring Apache Airflow

11 lectures
Introduction (01:28)
How the logging system works in Airflow (03:43)
[Practice] Setting up custom logging (17:16)
[Practice] Storing your logs in AWS S3 (14:40)
Elasticsearch Reminder (04:13)
[Practice] Configuring Airflow with Elasticsearch (18:08)
[Practice] Monitoring your DAGs with Elasticsearch (10:40)
Introduction to metrics (04:33)
[Practice] Monitoring Airflow with the TIG stack (12:12)
[Practice] Triggering alerts for Airflow with Grafana (11:30)
Airflow maintenance DAGs (02:59)
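In a TIG (Telegraf, InfluxDB, Grafana) setup, Airflow can emit metrics over StatsD once that is enabled in its configuration; Telegraf's statsd input receives them and writes them to InfluxDB, and Grafana charts them. Each metric travels as a plain-text line of the form `name:value|type`, optionally followed by `|@rate`. The minimal parser below only shows what such datagrams look like; it is not Telegraf's implementation, it treats signed gauge values as absolute, and the sample metric names are illustrative.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str          # "c" counter, "g" gauge, "ms" timer, ...
    sample_rate: float


def parse_statsd(packet: str) -> list[Metric]:
    """Parse newline-separated StatsD lines of the form name:value|type[|@rate]."""
    metrics = []
    for line in packet.strip().splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split("|")
        if not name or len(fields) < 2:
            raise ValueError(f"malformed StatsD line: {line!r}")
        rate = 1.0
        if len(fields) > 2 and fields[2].startswith("@"):
            rate = float(fields[2][1:])
        metrics.append(Metric(name, float(fields[0]), fields[1], rate))
    return metrics


# Sample packet using Airflow's default "airflow" prefix; names are illustrative.
sample = "airflow.scheduler_heartbeat:1|c\nairflow.dagbag_size:12|g"
for metric in parse_statsd(sample):
    print(metric)
```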

Security in Apache Airflow

7 lectures
Introduction (00:54)
[Practice] Encrypting sensitive data with Fernet (16:54)
[Practice] Rotating the Fernet Key (07:19)
[Practice] Hiding variables (03:24)
[Practice] Password authentication and filtering by owner (09:38)
[Practice] RBAC UI (14:15)
What to expect from Airflow 2.0? (10:41)
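Airflow can mask, in its UI, the values of Variables whose keys look sensitive (controlled by the hide_sensitive_variable_fields option in Airflow 1.10). The sketch below shows that kind of check. The keyword tuple is assumed to mirror Airflow 1.10's defaults and should be verified against your version, and `display_value` is a hypothetical helper, not Airflow's code.

```python
# Assumed default keywords; verify against the Airflow version you run.
SENSITIVE_KEYWORDS = (
    "password", "secret", "passwd", "authorization", "api_key", "apikey", "access_token",
)


def display_value(key: str, value: str, hide_sensitive: bool = True) -> str:
    """Return the value to show in a UI listing, masking it when the key looks sensitive."""
    if hide_sensitive and any(word in key.lower() for word in SENSITIVE_KEYWORDS):
        return "*" * 8
    return value


print(display_value("SLACK_API_KEY", "xoxb-123"))  # ********
print(display_value("forex_path", "/tmp/forex"))   # /tmp/forex
```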

BONUS - APPENDIX

8 lectures
How to define variables through environment variables (00:10)
[BLOG POST] Running Apache Airflow on a multi-node Kubernetes cluster locally (00:20)
[BLOG POST] Best Practices in Apache Airflow (Part 1) (00:23)
COUPON FOR MY OTHER COURSES! (00:13)
[BLOG POST] The PostgresOperator: All you need to know (00:20)
[VIDEO] Running Airflow with the Official Helm Chart (00:24)
[VIDEO] The DockerOperator: The basics and more (19:35)
[VIDEO] Airflow with DBT: The best way! (00:07)
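Defining variables through environment variables relies on a naming convention: Airflow (recent 1.10 releases and 2.x) reads a Variable from an environment variable named `AIRFLOW_VAR_` followed by the upper-cased key, checking the environment before the metadata database. The stdlib mimic below takes an injectable `environ` mapping so it can be exercised without touching the real environment; `get_env_variable` is hypothetical, and inside Airflow you would simply call `Variable.get`.

```python
import json
import os


def get_env_variable(key: str, deserialize_json: bool = False, environ=None):
    """Look up a variable using the AIRFLOW_VAR_<KEY> naming convention."""
    env = os.environ if environ is None else environ
    raw = env.get(f"AIRFLOW_VAR_{key.upper()}")
    if raw is None:
        raise KeyError(f"variable {key!r} is not defined in the environment")
    # JSON values are decoded on request, similar in spirit to Variable.get(..., deserialize_json=True).
    return json.loads(raw) if deserialize_json else raw


# Hypothetical variable holding the forex pipeline's settings as JSON.
env = {"AIRFLOW_VAR_FOREX_CONFIG": '{"base": "EUR", "symbols": ["USD", "JPY"]}'}
config = get_env_variable("forex_config", deserialize_json=True, environ=env)
print(config["base"])  # EUR
```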
