Description

Are you ready to revolutionize your data analytics skills? Look no further. Welcome to our comprehensive course, where you'll delve deep into the world of Azure Synapse Analytics with PySpark and emerge equipped with the tools to excel in modern data analysis.

Unlock the Power of Azure Synapse Analytics!

18.5+ HOURS OF IN-DEPTH LEARNING CONTENT!

In this course, you will learn about:

  1. Serverless SQL Pool - Query structured data flexibly and perform initial data exploration.

  2. Spark Pools - Dive into advanced data processing and analytics with the power of Apache Spark.

  3. Spark SQL - Seamlessly query structured data using Spark's SQL capabilities.

  4. MSSpark Utils - Leverage MSSpark Utilities for enhanced Spark functionality in Synapse.

  5. 50+ PySpark Transformations - Harness over 50 PySpark transformations to manipulate and refine your data.

  6. Dedicated SQL Pool - Serve data efficiently for reporting in Power BI.

  7. Integrating Power BI with Azure Synapse Analytics - Seamlessly connect Power BI for enriched data visualization and insights.

  8. Delta Lake and its features - Integrate Delta Lake for reliable, ACID-compliant data.

  9. Spark Optimization Techniques - Employ optimization techniques to enhance Spark processing speed and efficiency.


    You will also learn how Python helps in data analysis. Our project-based approach ensures hands-on learning, giving you the practical experience needed to tackle real-world data challenges.

    While this course does not focus exclusively on certification, it also gives you the practical understanding of the Azure Synapse Analytics service needed to pass DP-203 ("Microsoft Certified Azure Data Engineer") and DP-500 ("Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI").


    Join me in mastering Azure Synapse Analytics!

What you'll learn

Requirements

Course content

23 sections

Introduction

3 lectures
Introduction
06:31
Project Architecture
05:25
Course Slides
00:02

Origin of Azure Synapse Analytics

9 lectures
Section Introduction
00:42
Need for a separate Analytical system
04:54
OLAP vs OLTP
04:02
A typical Data Warehouse
02:04
Data Lake Introduction
01:54
Modern Data Warehouse and its problems
08:06
The solution - Azure Synapse Analytics and its Components
04:58
Azure Synapse Analytics - A one-stop solution
10:18
Section Summary
00:36

Environment Setup

5 lectures
Section Introduction
00:40
Creating a resource group in Azure
02:45
Create Azure Synapse Analytics Service
06:50
Exploring Azure Synapse Analytics
07:50
Understanding the dataset
03:51

Serverless SQL Pool

17 lectures
Section Introduction
01:26
Serverless SQL Pool - Introduction
03:24
Serverless SQL Pool - Architecture
03:57
Serverless SQL Pool - Benefits and Pricing
05:27
Uploading files into Azure Data Lake Storage
06:36
Initial Data Exploration
14:36
How to import SQL scripts or ipynb notebooks to Azure Synapse
02:58
Fixing the Collation warning
09:39
Creating an External Data Source
09:13
Creating a database scoped credential using SAS
12:23
Creating a database scoped credential using Managed Identity (MI)
08:11
Deleting existing data sources for cleanup
03:51
Creating an external file format - Demo
05:36
Creating an External File Format - Practical
02:11
Creating an External Data Source for the Refined container
01:57
Creating an External Table
12:47
End of section
00:39

History and Data processing before Spark

5 lectures
Section Introduction
00:56
Big Data Approach
05:51
Understanding Hadoop YARN - Cluster Manager
05:26
Understanding Hadoop - HDFS
04:19
Understanding Hadoop - MapReduce Distributed Computing
07:11

Emergence of Spark

3 lectures
Section Introduction
00:49
Drawbacks of MapReduce Framework
03:24
Emergence of Spark
04:51

Spark Core Concepts

20 lectures
Section Introduction
00:51
Spark EcoSystem
06:18
Difference between Hadoop & Spark
03:37
Spark Architecture
02:40
Creating a Spark Pool & its benefits
09:02
RDD Overview
02:48
Functions Lambda, Map and Filter - Overview
04:19
Understanding RDD - Practical
10:53
RDD - Lazy loading - Transformations and Actions
06:40
What is RDD Lineage
05:07
RDD - Word count program - Demo
07:45
RDD - Word count - PySpark Program - Practical
11:40
Optimization - ReduceByKey vs GroupByKey Explanation
07:36
RDD - Understanding Jobs in Spark - Practical
03:44
RDD - Understanding Narrow and Wide Transformations
04:40
RDD - Understanding Stages - Practical
06:48
RDD - Understanding Tasks - Practical
06:13
Understanding DAG, RDD Lineage and their Differences
08:06
Spark Higher level APIs Intro
03:53
Synapse Notebook - Creating dataframes practical
16:11

PySpark Transformation 1 - Select and Filter functions

8 lectures
Introduction for PySpark Transformations
01:41
Walkthrough of Notebook and Markdown cells
08:38
Using Free Databricks Community Edition to practise and Save Costs
06:33
Display and show Functions
10:49
Stop Spark Session when not in use
01:11
Select and SelectExpr
13:52
Filter Function
13:36
Organizing notebooks into a folder
02:04

PySpark Transformation 2 - Handling Nulls, Duplicates and aggregation

5 lectures
Understanding fillna and na.fill
09:05
Identifying duplicates using Aggregations
10:25
Handling Duplicates using dropna
09:18
Organising notebooks into a folder
00:34
Transformations summary of this section
01:20

PySpark Transformation 3 - Data Transformation and Manipulation

2 lectures
withColumn to create or update columns
13:49
Transforming and updating columns with withColumnRenamed
06:56

PySpark 4 - Synapse Spark - MSSparkUtils

13 lectures
What are MSSpark Utilities
02:27
MSSpark Utils - Env utils
04:39
What is a mount point
03:16
Creating and accessing mount point in Notebook
10:26
All File System Utils
14:00
Notebook Utils - Exit command
04:32
Creating another Spark pool
07:43
Procedure to increase vCores request (optional)
01:32
Calling notebook from another notebook
02:52
Calling a notebook from another notebook using runtime parameters
07:33
Magic commands
06:05
Attaching two notebooks to a single Spark pool
07:39
Accessing Mount points from another notebook
11:19

PySpark 5 - Synapse - Spark SQL

7 lectures
Accessing data using Temporary Views - Practical
08:29
Lake Database - Overview
02:41
Understanding and creating database in Lake Database
10:51
Using Spark SQL in notebook
04:54
Managed vs External tables in Spark
13:50
Metadata sharing between Spark pool and Serverless SQL Pool
06:38
Deleting unwanted folders
01:15

PySpark Transformation 6 - Join Transformations

11 lectures
Uploading required files for Joins
02:00
Python notebooks till Union
00:01
Inner join
08:02
Left Join
02:46
Right Join
02:24
Full outer join
02:43
Left Semi Join
04:02
Left anti and Cross Join
03:28
Union Operation
03:10
Performing Join Transformation on Project Dataset
05:02
Summary of Transformations performed
01:01

PySpark Transformation 7 - String Manipulation and sorting

5 lectures
Replace function to change spaces
04:44
PySpark Notebook for this section
00:03
Split and concat functions
09:21
Order by and sort
07:30
Section Summary
01:31

PySpark Transformation 8 - Window Functions

4 lectures
Row number function
07:54
PySpark Notebook used in this section
00:02
Rank Function
04:47
Dense Rank function
07:25

PySpark Transformation 9 - Conversions and Pivoting

5 lectures
Conversion using cast function
09:09
PySpark Notebook needed for casting and pivoting lectures
00:03
Pivot function
05:10
Unpivot using stack function
06:07
Using to_date to convert a date column
08:51

PySpark Transformation 10 - Schema definition and Management

3 lectures
PySpark Notebook used in this lecture
00:04
StructType and StructField - Demo
03:05
Implementing explicit schema with StructType and StructField
13:31

PySpark Transformation 11 - UDFs

3 lectures
User Defined Functions - Demo
03:18
Implementing UDFs in Notebook
08:48
Writing transformed data to Processed container
03:17

Dedicated SQL Pool

10 lectures
Dedicated SQL pool - Demo
02:19
Dedicated SQL Pool Architecture
04:24
How distribution takes place based on DWU
05:58
Factors to consider when choosing dedicated SQL pool
02:43
Creating Dedicated SQL pool in Synapse
03:08
Ways to copy data into Dedicated SQL Pool
03:47
Copy command to copy to dedicated SQL pool
04:55
Clustered Columnstore Index (optional)
02:02
Types of Distributions or Sharding patterns
06:52
Using Pipeline to Copy to dedicated SQL Pool
06:57

Reporting data to Power BI

11 lectures
Section Introduction
01:18
Installing Power BI Desktop
01:20
Creating report from Power BI Desktop
04:22
Creating new user in Azure AD for creating workspace (if using personal account)
04:31
Creating a shared workspace in Power BI
03:46
Publishing report to Shared Workspace
01:32
Accessing Power BI from Azure Synapse Analytics
04:31
Download Power BI .pbix file from here
00:03
Creating Dataset and report from Synapse Analytics
06:31
Concluding the Power BI Section
02:41
Summary and end of project implementation
02:25

Spark - Optimisation Techniques

25 lectures
Optimisation Section Intro
00:56
Uploading required files for Optimisation
01:45
Spark Optimisation levels
02:48
Avoid using Collect function
07:37
Moving the notebook into a particular folder
01:22
Avoid InferSchema
09:34
Use Cache Persist 1 - Understanding Serialization and DeSerialization
06:31
Use Cache Persist 2 - How cache or persist will work - Demo
09:11
Use Cache Persist 3 - Understanding cache practically
09:47
Use Cache Persist 4 - Persist - What is persist and different storage levels
03:59
Use Cache Persist - Notebook for persist with all storage levels
00:03
Use Cache Persist 5 - Persist - MEMORY_ONLY
17:27
Use Cache Persist 6 - Persist - MEMORY_AND_DISK
08:18
Use Cache Persist 7 - Persist - MEMORY_ONLY_SER (Scala Only)
04:00
Use Cache Persist 8 - Persist - MEMORY_AND_DISK_SER (Scala Only)
02:57
Use Cache Persist 9 - Persist - DISK_ONLY
05:41
Use Cache Persist 10 - Persist - OFF_HEAP (Scala Only)
02:05
Use Cache Persist 11 - Persist - MEMORY_ONLY_2 (PySpark only)
02:34
Use Partitioning 1 - Understanding partitioning - Demo
05:24
Use Partitioning 2 - Understand partitioning - Practical
08:35
Repartition and coalesce 1 - Understanding repartition and coalesce - Demo
05:51
Repartition and coalesce 2 - Understanding repartition and coalesce - Practical
06:43
Broadcast variables 1 - Understanding broadcast variables - Demo
06:47
Broadcast variables 2 - Implementing broadcast variables in notebook
05:53
Use Kryo Serializer
03:10

Delta Lake

23 lectures
Section Introduction
00:48
Drawbacks of ADLS
06:08
What is Delta lake
02:00
Lakehouse Architecture
06:21
Uploading required file for Delta lake
01:32
Problems with Azure Data Lake - Practical
08:23
Creating a Delta lake
03:56
Understanding Delta format
04:50
Contents of Transaction Log or Delta log file - Practical
18:15
Contents of a transaction log demo
03:44
Creating delta table by Path using SQL
21:20
Creating delta table in Metastore using PySpark and SQL
07:30
Schema Enforcement - Files required for Understanding Schema Enforcement
00:39
What is schema enforcement - Demo
05:00
Schema Enforcement - Practical
08:00
Schema Evolution - Practical
05:52
Versioning and Time Travel
19:13
Vacuum command
13:41
Convert to Delta command
06:29
Checkpoints in delta log
06:48
Optimize command - Demo
08:27
Optimize command - Practical
15:35
Applying UPSERT using MERGE Command
09:37

Conclusion

2 lectures
Course Conclusion
01:14
Bonus Lecture
00:03
