Basics to Advanced: Azure Synapse Analytics Hands-On Project

Course category: Data Science

Build a complete project using only Azure Synapse Analytics, focused on PySpark and including Delta Lake and Spark optimizations.

50.000 VND

1.099.000 VND

Description

Are you ready to revolutionize your data analytics skills? Look no further. Welcome to our comprehensive course, where you'll delve deep into the world of Azure Synapse Analytics with PySpark and emerge equipped with the tools to excel in modern data analysis.

Unlock the Power of Azure Synapse Analytics!

18.5+ HOURS OF IN-DEPTH LEARNING CONTENT!

In this course, we will learn about:

Serverless SQL Pool - Run flexible queries over structured data and perform initial data exploration.
Spark Pools - Dive into advanced data processing and analytics with the power of Apache Spark.
Spark SQL - Seamlessly query structured data using Spark's SQL capabilities.
MSSpark Utils - Leverage MSSpark Utilities for enhanced Spark functionality in Synapse.
50+ PySpark Transformations - Harness over 50 PySpark transformations to manipulate and refine your data.
Dedicated SQL Pool - Serve data efficiently to Power BI for reporting.
Integrating Power BI with Azure Synapse Analytics - Seamlessly connect Power BI for enriched data visualization and insights.
Delta Lake and its features - Integrate Delta Lake for reliable, ACID-compliant data.
Spark Optimization Techniques - Employ optimization techniques to enhance Spark processing speed and efficiency.

You will also learn how Python helps in data analysis. Our project-based approach ensures hands-on learning, giving you the practical experience needed to conquer real-world data challenges.
Although this course does not focus exclusively on certification, it also gives you the practical understanding of the Azure Synapse Analytics service needed to pass DP-203 ("Microsoft Certified Azure Data Engineer") and DP-500 ("Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI").

Join me in mastering Azure Synapse Analytics!

Course content

23 sections

Introduction

3 lectures

Introduction

06:31

Project Architecture

05:25

Course Slides

00:02

Origin of Azure Synapse Analytics

9 lectures

Section Introduction

00:42

Need for a separate analytical system

04:54

OLAP vs OLTP

04:02

A typical data warehouse

02:04

Data lake introduction

01:54

Modern data warehouse and its problems

08:06

The solution - Azure Synapse Analytics and its Components

04:58

Azure Synapse Analytics - A one-stop solution

10:18

Section Summary

00:36

Environment Setup

5 lectures

Section Introduction

00:40

Creating a resource group in Azure

02:45

Create Azure Synapse Analytics Service

06:50

Exploring Azure Synapse Analytics

07:50

Understanding the dataset

03:51

Serverless SQL Pool

17 lectures

Section Introduction

01:26

Serverless SQL Pool - Introduction

03:24

Serverless SQL Pool - Architecture

03:57

Serverless SQL Pool - Benefits and Pricing

05:27

Uploading files into Azure Data Lake Storage

06:36

Initial Data Exploration

14:36

How to import SQL scripts or ipynb notebooks to Azure Synapse

02:58

Fixing the Collation warning

09:39

Creating External datasource

09:13

Creating database scoped credential Using SAS

12:23

Creating database scoped credential using Managed Identity

08:11

Deleting existing data sources for cleanup

03:51

Creating an external file format - Demo

05:36

Creating an External File Format - Practical

02:11

Creating External DataSource for Refined container

01:57

Creating an External Table

12:47

End of section

00:39

History and Data processing before Spark

5 lectures

Section Introduction

00:56

Big Data Approach

05:51

Understanding Hadoop YARN - Cluster Manager

05:26

Understanding Hadoop - HDFS

04:19

Understanding Hadoop - MapReduce Distributed Computing

07:11

Emergence of Spark

3 lectures

Section Introduction

00:49

Drawbacks of MapReduce Framework

03:24

Emergence of Spark

04:51

Spark Core Concepts

20 lectures

Section Introduction

00:51

Spark EcoSystem

06:18

Difference between Hadoop & Spark

03:37

Spark Architecture

02:40

Creating a Spark Pool & its benefits

09:02

RDD Overview

02:48

Functions Lambda, Map and Filter - Overview

04:19

Understanding RDD - Practical

10:53

RDD - Lazy loading - Transformations and Actions

06:40

What is RDD Lineage

05:07

RDD - Word count program - Demo

07:45

RDD - Word count - PySpark Program - Practical

11:40

Optimization - ReduceByKey vs GroupByKey Explanation

07:36

RDD - Understanding Jobs in Spark - Practical

03:44

RDD - Understanding Narrow and Wide Transformations

04:40

RDD - Understanding Stages - Practical

06:48

RDD - Understanding Tasks - Practical

06:13

Understand DAG, RDD Lineage and Differences

08:06

Spark Higher level APIs Intro

03:53

Synapse Notebook - Creating dataframes practical

16:11

PySpark Transformation 1 - Select and Filter functions

8 lectures

Introduction for PySpark Transformations

01:41

Walkthrough on Notebook, Markdown cells

08:38

Using Free Databricks Community Edition to practise and Save Costs

06:33

Display and show Functions

10:49

Stop Spark Session when not in use

01:11

Select and SelectExpr

13:52

Filter Function

13:36

Organizing notebooks into a folder

02:04

PySpark Transformation 2 - Handling Nulls, Duplicates and aggregation

5 lectures

Understanding fillna and na.fill

09:05

Identifying duplicates using Aggregations

10:25

Handling Duplicates using dropna

09:18

Organising notebooks into a folder

00:34

Transformations summary of this section

01:20

PySpark Transformation 3 - Data Transformation and Manipulation

2 lectures

withColumn to create or update columns

13:49

Transforming and updating column withColumnRenamed

06:56

PySpark 4 - Synapse Spark - MSSparkUtils

13 lectures

What are MSSpark Utilities

02:27

MSSpark Utils - Env utils

04:39

What is a mount point

03:16

Creating and accessing mount point in Notebook

10:26

All File System Utils

14:00

Notebook Utils - Exit command

04:32

Creating another spark pool

07:43

Procedure to increase vCores request (optional)

01:32

Calling notebook from another notebook

02:52

Calling notebook from another using runtime parameters

07:33

Magic commands

06:05

Attaching two notebooks to a single spark pool

07:39

Accessing Mount points from another notebook

11:19

PySpark 5 - Synapse - Spark SQL

7 lectures

Accessing data using Temporary Views - Practical

08:29

Lake Database - Overview

02:41

Understanding and creating database in Lake Database

10:51

Using Spark SQL in notebook

04:54

Managed vs External tables in Spark

13:50

Metadata sharing between Spark pool and Serverless SQL Pool

06:38

Deleting unwanted folders

01:15

PySpark Transformation 6 - Join Transformations

11 lectures

Uploading required files for Joins

02:00

Python notebooks till Union

00:01

Inner join

08:02

Left Join

02:46

Right Join

02:24

Full outer join

02:43

Left Semi Join

04:02

Left anti and Cross Join

03:28

Union Operation

03:10

Performing Join Transformation on Project Dataset

05:02

Summary of Transformations performed

01:01

PySpark Transformation 7 - String Manipulation and sorting

5 lectures

Replace function to change spaces

04:44

PySpark Notebook for this section

00:03

Split and concat functions

09:21

Order by and sort

07:30

Section Summary

01:31

PySpark Transformation 8 - Window Functions

4 lectures

Row number function

07:54

PySpark Notebook used in this section

00:02

Rank Function

04:47

Dense Rank function

07:25

PySpark Transformation 9 - Conversions and Pivoting

5 lectures

Conversion using cast function

09:09

PySpark Notebook needed for casting and pivoting lectures

00:03

Pivot function

05:10

Unpivot using stack function

06:07

Using to_date to convert date column

08:51

PySpark Transformation 10 - Schema definition and Management

3 lectures

PySpark Notebook used in this lecture

00:04

StructType and StructField - Demo

03:05

Implementing explicit schema with StructType and StructField

13:31

PySpark Transformation 11 - UDFs

3 lectures

User Defined Functions - Demo

03:18

Implementing UDFs in Notebook

08:48

Writing transformed data to Processed container

03:17

Dedicated SQL Pool

10 lectures

Dedicated SQL pool - Demo

02:19

Dedicated SQL Pool Architecture

04:24

How distribution takes place based on DWU

05:58

Factors to consider when choosing dedicated SQL pool

02:43

Creating Dedicated SQL pool in Synapse

03:08

Ways to copy data into Dedicated SQL Pool

03:47

Copy command to copy to dedicated SQL pool

04:55

Clustered Columnstore Index (optional)

02:02

Types of Distributions or Sharding patterns

06:52

Using Pipeline to Copy to dedicated SQL Pool

06:57

Reporting data to Power BI

11 lectures

Section Introduction

01:18

Installing Power BI Desktop

01:20

Creating report from Power BI Desktop

04:22

Creating new user in Azure AD for creating workspace (if using personal account)

04:31

Creating a shared workspace in Power BI

03:46

Publishing report to Shared Workspace

01:32

Accessing Power BI from Azure Synapse Analytics

04:31

Download Power BI .pbix file from here

00:03

Creating Dataset and report from Synapse Analytics

06:31

Concluding the Power BI Section

02:41

Summary and end of project implementation

02:25

Spark - Optimisation Techniques

25 lectures

Optimisation Section Intro

00:56

Uploading required files for Optimisation

01:45

Spark Optimisation levels

02:48

Avoid using Collect function

07:37

Moving notebook into a particular folder

01:22

Avoid InferSchema

09:34

Use Cache Persist 1 - Understanding Serialization and DeSerialization

06:31

Use Cache Persist 2 - How cache or persist will work - Demo

09:11

Use Cache Persist 3 - Understanding cache practically

09:47

Use Cache Persist 4 - Persist - What is persist and different storage levels

03:59

Use Cache Persist - Notebook for persist with all storage levels

00:03

Use Cache Persist 5 - Persist - MEMORY_ONLY

17:27

Use Cache Persist 6 - Persist - MEMORY_AND_DISK

08:18

Use Cache Persist 7 - Persist - MEMORY_ONLY_SER (Scala Only)

04:00

Use Cache Persist 8 - Persist - MEMORY_AND_DISK_SER (Scala Only)

02:57

Use Cache Persist 9 - Persist - DISK_ONLY

05:41

Use Cache Persist 10 - Persist - OFF_HEAP (Scala Only)

02:05

Use Cache Persist 11 - Persist - MEMORY_ONLY_2 (PySpark only)

02:34

Use Partitioning 1 - Understanding partitioning - Demo

05:24

Use Partitioning 2 - Understand partitioning - Practical

08:35

Repartition and coalesce 1 - Understanding repartition and coalesce - Demo

05:51

Repartition and coalesce 2 - Understanding repartition and coalesce - Practical

06:43

Broadcast variables 1 - Understanding broadcast variables - Demo

06:47

Broadcast variables 2 - Implementing broadcast variables in notebook

05:53

Use Kryo Serializer

03:10

Delta Lake

23 lectures

Section Introduction

00:48

Drawbacks of ADLS

06:08

What is Delta Lake

02:00

Lakehouse Architecture

06:21

Uploading required file for Delta lake

01:32

Problems with Azure Data Lake - Practical

08:23

Creating a Delta lake

03:56

Understanding Delta format

04:50

Contents of Transaction Log or Delta log file - Practical

18:15

Contents of a transaction log demo

03:44

Creating delta table by Path using SQL

21:20

Creating delta table in Metastore using Pyspark and SQL

07:30

Schema Enforcement - Files required for Understanding Schema Enforcement

00:39

What is schema enforcement - Demo

05:00

Schema Enforcement - Practical

08:00

Schema Evolution - Practical

05:52

Versioning and Time Travel

19:13

Vacuum command

13:41

Convert to Delta command

06:29

Checkpoints in delta log

06:48

Optimize command - Demo

08:27

Optimize command - Practical

15:35

Applying UPSERT using MERGE Command

09:37

Conclusion

2 lectures

Course Conclusion

01:14

Bonus Lecture

00:03
