Description

This course does not require any prior knowledge of Apache Spark or Hadoop. We have taken care to explain Spark architecture and the fundamental concepts so that you can get up to speed and grasp the content of this course.


About the Course

I am creating the Apache Spark 3 - Spark Programming in Python for Beginners course to help you understand Spark programming and apply that knowledge to build data engineering solutions. This course is example-driven and follows a working-session approach. We will code live and explain all the needed concepts along the way.

Who should take this Course?

I designed this course for software engineers who want to develop data engineering pipelines and applications using Apache Spark. I am also creating this course for data architects and data engineers who are responsible for designing and building their organization's data-centric infrastructure. A third group is managers and architects who do not work directly on Spark implementation but who work with the people who implement Apache Spark at the ground level.

Spark Version used in the Course

This course uses Apache Spark 3.x. I have tested all the source code and examples used in this course on the Apache Spark 3.0.0 open-source distribution.

What you will learn

Apache Spark Foundation and Spark Architecture

Data Engineering and Data Processing in Spark

Working with Data Sources and Sinks

Working with Data Frames and Spark SQL

Using PyCharm IDE for Spark Development and Debugging

Unit Testing, Managing Application Logs and Cluster Deployment

Requirements

  • Programming Knowledge of the Python Programming Language
  • A Recent 64-bit Windows/Mac/Linux Machine with 8 GB RAM

Course content

14 sections

Understanding Big Data and Data Lake

5 lectures
Section Overview
02:05
What is Big Data and How it Started
22:07
Hadoop Architecture, History, and Evolution
31:04
What is Data Lake and How it works
09:56
Introducing Apache Spark and Databricks Cloud
13:34

Installing and Using Apache Spark

9 lectures
Section Overview
01:16
Spark Development Environments
02:16
Setup your Databricks Community Cloud Environment
05:14
Introduction to Databricks Workspace
06:30
Create your First Spark Application in Databricks Cloud
06:44
Setup your Local Development IDE
21:22
Mac Users - Setup your Local Development IDE
12:08
Create your First Spark Application using IDE
05:08
Source Code and Other Resources
00:11

Getting Started with Apache Spark

10 lectures
Micro Project - Problem Statement
04:45
Introduction to Spark Data Frames
09:23
Creating Spark Dataframe
16:55
Creating Spark Tables
09:42
Common problem with Databricks Community
12:32
Working with Spark SQL
15:53
Dataframe Transformations and Actions
05:41
Applying Transformations
18:15
Querying Spark Dataframe
11:16
More Dataframe Transformations
17:32

Spark Execution Model and Architecture

12 lectures
Execution Methods - How to Run Spark Programs?
05:01
Check your knowledge
4 questions
Spark Distributed Processing Model - How your program runs?
03:11
Spark Execution Modes and Cluster Managers
04:55
Check your knowledge
10 questions
Summarizing Spark Execution Models - When to use What?
02:24
Working with PySpark Shell - Demo
04:31
Installing Multi-Node Spark Cluster - Demo
05:36
Working with Notebooks in Cluster - Demo
06:58
Working with Spark Submit - Demo
02:55
Section Summary
01:42
Check your knowledge
10 questions

Spark Programming Model and Developer Experience

13 lectures
Creating Spark Project Build Configuration
06:10
Configuring Spark Project Application Logs
10:50
Check your knowledge
5 questions
Creating Spark Session
08:26
Check your knowledge
5 questions
Configuring Spark Session
09:12
Data Frame Introduction
07:43
Data Frame Partitions and Executors
05:24
Spark Transformations and Actions
11:02
Spark Jobs Stages and Task
08:34
Understanding your Execution Plan
09:33
Unit Testing Spark Application
05:01
Rounding off Summary
05:27

Spark Structured API Foundation

5 lectures
Introduction to Spark APIs
05:11
Introduction to Spark RDD API
13:13
Working with Spark SQL
02:37
Spark SQL Engine and Catalyst Optimizer
02:53
Section Summary
01:18

Spark Data Sources and Sinks

8 lectures
Spark Data Sources and Sinks
06:44
Spark DataFrameReader API
05:00
Reading CSV, JSON and Parquet files
07:59
Creating Spark DataFrame Schema
06:06
Spark DataFrameWriter API
06:09
Writing Your Data and Managing Layout
12:51
Spark Databases and Tables
05:33
Working with Spark SQL Tables
08:41

Spark Dataframe and Dataset Transformations

7 lectures
Introduction to Data Transformation
02:44
Working with Dataframe Rows
05:02
DataFrame Rows and Unit Testing
04:02
Dataframe Rows and Unstructured data
06:08
Working with Dataframe Columns
10:33
Creating and Using UDF
10:01
Misc Transformations
15:34

Aggregations in Apache Spark

3 lectures
Aggregating Dataframes
08:58
Grouping Aggregations
04:25
Windowing Aggregations
05:27

Spark Dataframe Joins

5 lectures
Dataframe Joins and column name ambiguity
07:40
Outer Joins in Dataframe
07:25
Internals of Spark Join and shuffle
08:46
Optimizing your joins
12:17
Implementing Bucket Joins
08:57

Capstone Project

10 lectures
Project Scope and Background
09:40
Data Transformation Requirement
10:00
Setup your starter project
12:24
Test your starter project
06:30
Setup your source control and process
25:44
Creating your Project CI CD Pipeline
25:51
Develop Code
16:44
Write Test Cases
25:28
Working with Kafka integration
15:11
Estimating resources for your application
27:30

Keep Learning

2 lectures
Final Word
00:50
Bonus Lecture : Get Extra
00:27

Archived - Apache Spark Introduction

4 lectures
Big Data History and Primer
05:51
Understanding the Data Lake Landscape
06:42
What is Apache Spark - An Introduction and Overview
08:48
Check your knowledge
15 questions

Archived - Installing and Using Apache Spark

10 lectures
Spark Development Environments
02:52
Mac Users - Apache Spark in Local Mode Command Line REPL
12:08
Windows Users - Apache Spark in Local Mode Command Line REPL
05:49
Did you notice?
3 questions
Mac Users - Apache Spark in the IDE - PyCharm
07:59
Windows Users - Apache Spark in the IDE - PyCharm
08:38
Did you notice?
3 questions
Apache Spark in Cloud - Databricks Community and Notebooks
04:33
Check your knowledge
3 questions
Apache Spark in Anaconda - Jupyter Notebook
04:32
