Description

Data Engineering is all about building Data Pipelines that get data from multiple sources into Data Lakes or Data Warehouses, and then from Data Lakes or Data Warehouses to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the AWS Data Analytics Stack, which includes services such as Glue, Elastic Map Reduce (EMR), Lambda Functions, Athena, Kinesis, and many more.

Here are the high-level steps which you will follow as part of the course.

  • Setup Development Environment

  • Getting Started with AWS

  • Storage - All about AWS s3 (Simple Storage Service)

  • User Level Security - Managing Users, Roles, and Policies using IAM

  • Infrastructure - AWS EC2 (Elastic Compute Cloud)

  • Data Ingestion using AWS Lambda Functions

  • Overview of AWS Glue Components

  • Setup Spark History Server for AWS Glue Jobs

  • Deep Dive into AWS Glue Catalog

  • Exploring AWS Glue Job APIs

  • AWS Glue Job Bookmarks

  • Development Life Cycle of Pyspark

  • Getting Started with AWS EMR

  • Deploying Spark Applications using AWS EMR

  • Streaming Pipeline using AWS Kinesis

  • Consuming Data ingested using AWS Kinesis from AWS s3 using boto3

  • Populating GitHub Data to AWS Dynamodb

  • Overview of Amazon AWS Athena

  • Amazon AWS Athena using AWS CLI

  • Amazon AWS Athena using Python boto3

  • Getting Started with Amazon AWS Redshift

  • Copy Data from AWS s3 into AWS Redshift Tables

  • Develop Applications using AWS Redshift Cluster

  • AWS Redshift Tables with Distkeys and Sortkeys

  • AWS Redshift Federated Queries and Spectrum

Here are the details about what you will be learning as part of this course. We will cover most of the commonly used AWS Data Analytics services with hands-on practice.

Getting Started with AWS

As part of this section, you will be going through the details related to getting started with AWS.

  • Introduction - AWS Getting Started

  • Create s3 Bucket

  • Create AWS IAM Group and AWS IAM User to have required access on s3 Bucket and other services

  • Overview of AWS IAM Roles

  • Create and Attach Custom AWS IAM Policy to both AWS IAM Groups as well as Users

  • Configure and Validate AWS CLI to access AWS Services using AWS CLI Commands

Storage - All about AWS s3 (Simple Storage Service)

AWS s3 is one of the most prominent fully managed AWS services. All IT professionals who would like to work on AWS should be familiar with it. We will get into quite a few common features related to AWS s3 in this section.

  • Getting Started with AWS S3

  • Setup Data Set locally to upload to AWS s3

  • Adding AWS S3 Buckets and Managing Objects (files and folders) in AWS s3 buckets

  • Version Control for AWS S3 Buckets

  • Cross-Region Replication for AWS S3 Buckets

  • Overview of AWS S3 Storage Classes

  • Overview of AWS S3 Glacier

  • Managing AWS S3 using AWS CLI Commands

  • Managing Objects in AWS S3 using CLI - Lab

User Level Security - Managing Users, Roles, and Policies using IAM

Once you start working on AWS, you need to understand the permissions you have as a non-admin user. As part of this section, you will understand the details related to AWS IAM users, groups, roles as well as policies.

  • Creating AWS IAM Users

  • Logging into AWS Management Console using AWS IAM User

  • Validate Programmatic Access to AWS IAM User

  • AWS IAM Identity-based Policies

  • Managing AWS IAM Groups

  • Managing AWS IAM Roles

  • Overview of Custom AWS IAM Policies

  • Managing AWS IAM users, groups, roles as well as policies using AWS CLI Commands

Infrastructure - AWS EC2 (Elastic Compute Cloud) Basics

AWS EC2 Instances are virtual machines on AWS. As part of this section, we will go through the basics of AWS EC2.

  • Getting Started with AWS EC2

  • Create AWS EC2 Key Pair

  • Launch AWS EC2 Instance

  • Connecting to AWS EC2 Instance

  • AWS EC2 Security Groups Basics

  • AWS EC2 Public and Private IP Addresses

  • AWS EC2 Life Cycle

  • Allocating and Assigning AWS Elastic IP Address

  • Managing AWS EC2 Using AWS CLI

  • Upgrade or Downgrade AWS EC2 Instances

Infrastructure - AWS EC2 Advanced

In this section, we will continue with AWS EC2 to understand how we can manage EC2 instances using AWS CLI Commands and also how to install additional software leveraging bootstrap scripts.

  • Getting Started with AWS EC2

  • Understanding AWS EC2 Metadata

  • Querying on AWS EC2 Metadata

  • Filtering on AWS EC2 Metadata

  • Using Bootstrapping Scripts with AWS EC2 Instances to install additional software on AWS EC2 instances

  • Create an AWS AMI using AWS EC2 Instances

  • Validate AWS AMI - Lab
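
The metadata lectures above revolve around the EC2 instance metadata service. A minimal sketch of querying it in Python follows; the 169.254.169.254 endpoint and the IMDSv2 token header are the standard IMDS interface, but `get_metadata` only works from inside an EC2 instance.

```python
# Sketch of querying the EC2 instance metadata service (IMDS).
# The 169.254.169.254 endpoint only resolves from inside an EC2 instance.
from urllib import request

IMDS_BASE = "http://169.254.169.254/latest"

def metadata_url(path: str) -> str:
    """Build a metadata URL such as .../meta-data/instance-id."""
    return f"{IMDS_BASE}/meta-data/{path.lstrip('/')}"

def get_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value using an IMDSv2 session token."""
    token_req = request.Request(
        f"{IMDS_BASE}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    token = request.urlopen(token_req, timeout=timeout).read().decode()
    req = request.Request(
        metadata_url(path), headers={"X-aws-ec2-metadata-token": token}
    )
    return request.urlopen(req, timeout=timeout).read().decode()
```

On an instance, `get_metadata("instance-id")` returns the instance id; the same paths are what the `curl` based lectures query.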

Data Ingestion using Lambda Functions

AWS Lambda functions are serverless functions. In this section, we will understand how we can develop and deploy Lambda functions using Python as the programming language. We will also see how to maintain a bookmark or checkpoint using s3.

  • Hello World using AWS Lambda

  • Setup Project for local development of AWS Lambda Functions

  • Deploy Project to AWS Lambda console

  • Develop download functionality using requests for AWS Lambda Functions

  • Using 3rd party libraries in AWS Lambda Functions

  • Validating AWS s3 access for local development of AWS Lambda Functions

  • Develop upload functionality to s3 using AWS Lambda Functions

  • Validating AWS Lambda Functions using AWS Lambda Console

  • Run AWS Lambda Functions using AWS Lambda Console

  • Validating files incrementally downloaded using AWS Lambda Functions

  • Reading and Writing Bookmark to s3 using AWS Lambda Functions

  • Maintaining Bookmark on s3 using AWS Lambda Functions

  • Review the incremental upload logic developed using AWS Lambda Functions

  • Deploying AWS Lambda Functions

  • Schedule AWS Lambda Functions using AWS Event Bridge
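
The bookmark idea above can be sketched as follows: keep the last processed file name in a small s3 object and derive the next file to download from it. The hourly file naming (`YYYY-MM-DD-H.json.gz`) and the function names are assumptions for illustration, not the course's exact code.

```python
# Sketch of maintaining a bookmark on s3 for incremental downloads.
# The hourly file naming (YYYY-MM-DD-H.json.gz) is an assumption.
from datetime import datetime, timedelta

def next_file(bookmark: str) -> str:
    """Derive the next hourly file name from the last processed one."""
    dt = datetime.strptime(bookmark.split(".")[0], "%Y-%m-%d-%H")
    nxt = dt + timedelta(hours=1)
    return f"{nxt.strftime('%Y-%m-%d')}-{nxt.hour}.json.gz"

def read_bookmark(s3, bucket: str, key: str) -> str:
    """s3 is a boto3 s3 client; the bookmark object holds one file name."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

def write_bookmark(s3, bucket: str, key: str, value: str) -> None:
    s3.put_object(Bucket=bucket, Key=key, Body=value.encode())
```

Each Lambda invocation then reads the bookmark, downloads `next_file(...)`, uploads it, and writes the new bookmark back.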

Overview of AWS Glue Components

In this section, we will get a broad overview of all important Glue Components such as Glue Crawler, Glue Databases, Glue Tables, etc. We will also understand how to validate Glue tables using AWS Athena. AWS Glue (especially Glue Catalog) is one of the key components in the realm of AWS Data Analytics Services.

  • Introduction - Overview of AWS Glue Components

  • Create AWS Glue Crawler and AWS Glue Catalog Database as well as Table

  • Analyze Data using AWS Athena

  • Creating AWS S3 Bucket and Role to create AWS Glue Catalog Tables using Crawler on the s3 location

  • Create and Run the AWS Glue Job to process data in AWS Glue Catalog Tables

  • Validate using AWS Glue Catalog Table and by running queries using AWS Athena

  • Create and Run AWS Glue Trigger

  • Create AWS Glue Workflow

  • Run AWS Glue Workflow and Validate

Setup Spark History Server for AWS Glue Jobs

AWS Glue uses Apache Spark under the hood to process data. It is important that we set up the Spark History Server for AWS Glue Jobs to troubleshoot any issues.

  • Introduction - Spark History Server for AWS Glue

  • Setup Spark History Server on AWS

  • Clone AWS Glue Samples repository

  • Build AWS Glue Spark UI Container

  • Update AWS IAM Policy Permissions

  • Start AWS Glue Spark UI Container

Deep Dive into AWS Glue Catalog

AWS Glue has several components, but the most important ones are AWS Glue Crawlers, Databases, and Catalog Tables. In this section, we will go through some of the most important and commonly used features of the AWS Glue Catalog.

  • Prerequisites for AWS Glue Catalog Tables

  • Steps for Creating AWS Glue Catalog Tables

  • Download Data Set to use to create AWS Glue Catalog Tables

  • Upload data to s3 to crawl using AWS Glue Crawler to create required AWS Glue Catalog Tables

  • Create AWS Glue Catalog Database - itvghlandingdb

  • Create AWS Glue Catalog Table - ghactivity

  • Running Queries using AWS Athena - ghactivity

  • Crawling Multiple Folders using AWS Glue Crawlers

  • Managing AWS Glue Catalog using AWS CLI

  • Managing AWS Glue Catalog using Python Boto3
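
Managing the Glue Catalog with Python boto3, as in the last lecture above, can be sketched roughly like this; the database name is a placeholder.

```python
# Sketch of reading the Glue Catalog with boto3.
def table_names(response: dict) -> list:
    """Pull table names out of one glue get_tables response page."""
    return [t["Name"] for t in response.get("TableList", [])]

def list_catalog_tables(database: str) -> list:
    import boto3  # imported lazily so table_names stays testable offline
    glue = boto3.client("glue")
    names = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        names.extend(table_names(page))
    return names
```

`list_catalog_tables("itvghlandingdb")` would then return the table names the crawler created.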

Exploring AWS Glue Job APIs

Once we deploy AWS Glue jobs, we can manage them using AWS Glue Job APIs. In this section, we will get an overview of the AWS Glue Job APIs to run and manage jobs.

  • Update AWS IAM Role for AWS Glue Job

  • Generate baseline AWS Glue Job

  • Running baseline AWS Glue Job

  • AWS Glue Script for Partitioning Data

  • Validating using AWS Athena

Understanding AWS Glue Job Bookmarks

AWS Glue Job Bookmarks can be leveraged to maintain the bookmarks or checkpoints for incremental loads. In this section, we will go through the details related to AWS Glue Job Bookmarks.

  • Introduction to AWS Glue Job Bookmarks

  • Cleaning up the data to run AWS Glue Jobs

  • Overview of AWS Glue CLI and Commands

  • Run AWS Glue Job using AWS Glue Bookmark

  • Validate AWS Glue Bookmark using AWS CLI

  • Add new data to the landing zone to run AWS Glue Jobs using Bookmarks

  • Rerun AWS Glue Job using Bookmark

  • Validate AWS Glue Job Bookmark and Files for Incremental run

  • Recrawl the AWS Glue Catalog Table using AWS CLI Commands

  • Run AWS Athena Queries for Data Validation

Development Lifecycle for Pyspark

In this section, we will focus on the development of Spark applications using Pyspark. We will use this application later while exploring EMR in detail.

  • Setup Virtual Environment and Install Pyspark

  • Getting Started with Pycharm

  • Passing Run Time Arguments

  • Accessing OS Environment Variables

  • Getting Started with Spark

  • Create Function for Spark Session

  • Setup Sample Data

  • Read data from files

  • Process data using Spark APIs

  • Write data to files

  • Validating Writing Data to Files

  • Productionizing the Code
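
The runtime-argument and environment-variable lectures above boil down to a small pattern like the one below; the `ENVIRON` variable name and the directory defaults are illustrative assumptions, not the course's exact names.

```python
# Minimal pattern for runtime arguments with environment-variable
# fallbacks, as used when parameterizing a Pyspark application.
def get_config(argv: list, environ: dict) -> dict:
    """argv: [script, env, src_dir, tgt_dir]; environ supplies fallbacks."""
    return {
        "env": argv[1] if len(argv) > 1 else environ.get("ENVIRON", "dev"),
        "src_dir": argv[2] if len(argv) > 2 else "data/github/landing",
        "tgt_dir": argv[3] if len(argv) > 3 else "data/github/raw",
    }

# Typical call site inside the application:
# import os, sys
# conf = get_config(sys.argv, os.environ)
```

Keeping this in one function makes the application easy to run both locally (PyCharm run configuration) and later via spark-submit on EMR.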

Getting Started with AWS EMR (Elastic Map Reduce)

As part of this section, we will understand how to get started with AWS EMR Cluster. We will primarily focus on the AWS EMR Web Console. Elastic Map Reduce is one of the key services in AWS Data Analytics Services; it provides the capability to run applications that process large-scale data leveraging distributed computing frameworks such as Spark.

  • Planning for AWS EMR Cluster

  • Create AWS EC2 Key Pair for AWS EMR Cluster

  • Setup AWS EMR Cluster with Apache Spark

  • Understanding Summary of AWS EMR Cluster

  • Review AWS EMR Cluster Application User Interfaces

  • Review AWS EMR Cluster Monitoring

  • Review AWS EMR Cluster Hardware and Cluster Scaling Policy

  • Review AWS EMR Cluster Configurations

  • Review AWS EMR Cluster Events

  • Review AWS EMR Cluster Steps

  • Review AWS EMR Cluster Bootstrap Actions

  • Connecting to AWS EMR Master Node using SSH

  • Disabling Termination Protection for AWS EMR Cluster and Terminating the AWS EMR Cluster

  • Clone and Create a New AWS EMR Cluster

  • Listing AWS S3 Buckets and Objects using AWS CLI on AWS EMR Cluster

  • Listing AWS S3 Buckets and Objects using HDFS CLI on AWS EMR Cluster

  • Managing Files in AWS S3 using HDFS CLI on AWS EMR Cluster

  • Review AWS Glue Catalog Databases and Tables

  • Accessing AWS Glue Catalog Databases and Tables using AWS EMR Cluster

  • Accessing spark-sql CLI of AWS EMR Cluster

  • Accessing pyspark CLI of AWS EMR Cluster

  • Accessing spark-shell CLI of AWS EMR Cluster

  • Create AWS EMR Cluster for Notebooks

Deploying Spark Applications using AWS EMR

As part of this section, we will understand how we typically deploy Spark Applications using AWS EMR. We will be using the Spark Application we developed earlier.

  • Deploying Applications using AWS EMR - Introduction

  • Setup AWS EMR Cluster to deploy applications

  • Validate SSH Connectivity to Master node of AWS EMR Cluster

  • Setup Jupyter Notebook Environment on AWS EMR Cluster

  • Create required AWS s3 Bucket for AWS EMR Cluster

  • Upload GHActivity Data to s3 so that we can process using Spark Application deployed on AWS EMR Cluster

  • Validate Application using AWS EMR Compatible Versions of Python and Spark

  • Deploy Spark Application to AWS EMR Master Node

  • Create user space for ec2-user on AWS EMR Cluster

  • Run Spark Application using spark-submit on AWS EMR Master Node

  • Validate Data using Jupyter Notebooks on AWS EMR Cluster

  • Clone and Start Auto Terminated AWS EMR Cluster

  • Delete Data Populated by GHActivity Application using AWS EMR Cluster

  • Differences between Spark Client and Cluster Deployment Modes on AWS EMR Cluster

  • Running Spark Application using Cluster Mode on AWS EMR Cluster

  • Overview of Adding Pyspark Application as Step to AWS EMR Cluster

  • Deploy Spark Application to AWS S3 to run using AWS EMR Steps

  • Running Spark Applications as AWS EMR Steps in client mode

  • Running Spark Applications as AWS EMR Steps in cluster mode

  • Validate AWS EMR Step Execution of Spark Application
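
Submitting a Pyspark application as an EMR step, as covered above, can be sketched with boto3; `command-runner.jar` is the standard EMR step wrapper, while the application path and cluster id are placeholders.

```python
# Sketch of submitting a Pyspark application as an EMR step.
def spark_step(name: str, app_s3_path: str, deploy_mode: str = "cluster") -> dict:
    """Build the Step structure expected by emr.add_job_flow_steps()."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, app_s3_path],
        },
    }

def add_step(cluster_id: str, step: dict) -> str:
    import boto3  # lazy import keeps spark_step testable offline
    emr = boto3.client("emr")
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])["StepIds"][0]
```

Switching `deploy_mode` between "client" and "cluster" mirrors the two step lectures above.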

Streaming Data Ingestion Pipeline using AWS Kinesis

As part of this section, we will go through details related to the streaming data ingestion pipeline using AWS Kinesis which is a streaming service of AWS Data Analytics Services. We will use AWS Kinesis Firehose Agent and AWS Kinesis Delivery Stream to read the data from log files and ingest it into AWS s3.

  • Building Streaming Pipeline using AWS Kinesis Firehose Agent and Delivery Stream

  • Rotating Logs so that files are created frequently, which will eventually be ingested using the AWS Kinesis Firehose Agent and AWS Kinesis Firehose Delivery Stream

  • Set up AWS Kinesis Firehose Agent to get data from logs into AWS Kinesis Delivery Stream.

  • Create AWS Kinesis Firehose Delivery Stream

  • Planning the Pipeline to ingest data into s3 using AWS Kinesis Delivery Stream

  • Create AWS IAM Group and User for Streaming Pipelines using AWS Kinesis Components

  • Granting Permissions to AWS IAM User using Policy for Streaming Pipelines using AWS Kinesis Components

  • Configure AWS Kinesis Firehose Agent to read the data from log files and ingest it into AWS Kinesis Firehose Delivery Stream.

  • Start and Validate AWS Kinesis Firehose Agent

  • Conclusion - Building Simple Streaming Pipeline using AWS Kinesis Firehose
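
The Firehose Agent handles file tailing for us, but the same ingestion can be sketched programmatically with boto3's `put_record`; the stream name is a placeholder.

```python
# Sketch of pushing one log line into a Firehose delivery stream.
def to_record(line: str) -> dict:
    """Firehose delivers records as-is, so append the newline ourselves."""
    return {"Data": (line.rstrip("\n") + "\n").encode("utf-8")}

def send_log_line(stream_name: str, line: str) -> None:
    import boto3  # lazy import keeps to_record testable offline
    firehose = boto3.client("firehose")
    firehose.put_record(DeliveryStreamName=stream_name, Record=to_record(line))
```

Without the explicit newline, records delivered to s3 would run together on one line.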

Consuming Data ingested using AWS Kinesis from AWS s3 using Python boto3

As data lands in AWS s3 via the streaming pipeline, we will understand how the ingested data can be processed using Python boto3.

  • Customizing AWS s3 folder using AWS Kinesis Delivery Stream

  • Create AWS IAM Policy to read from AWS s3 Bucket

  • Validate AWS s3 access using AWS CLI

  • Setup Python Virtual Environment to explore boto3

  • Validating access to AWS s3 using Python boto3

  • Read Content from AWS s3 object

  • Read multiple AWS s3 Objects

  • Get the number of AWS s3 Objects using Marker

  • Get the size of AWS s3 Objects using Marker
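
Counting and sizing objects with a Marker, as in the last two lectures above, uses the older `list_objects` API; this sketch pages through a prefix (bucket and prefix are placeholders).

```python
# Sketch of paging through s3 objects with the Marker-based
# list_objects API, then sizing the result.
def total_size(objects: list) -> int:
    """Sum Size over Contents entries collected from list_objects."""
    return sum(o["Size"] for o in objects)

def list_all_objects(bucket: str, prefix: str) -> list:
    import boto3  # lazy import keeps total_size testable offline
    s3 = boto3.client("s3")
    objects, marker = [], ""
    while True:
        resp = s3.list_objects(Bucket=bucket, Prefix=prefix, Marker=marker)
        contents = resp.get("Contents", [])
        objects.extend(contents)
        if not resp.get("IsTruncated"):
            return objects
        marker = contents[-1]["Key"]  # resume after the last key seen
```

`len(list_all_objects(...))` gives the object count and `total_size(...)` the combined size in bytes.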

Populating GitHub Data to AWS Dynamodb

As part of this section, we will understand how we can populate data to AWS Dynamodb tables using Python as a programming language.

  • Install required libraries to get GitHub Data to AWS Dynamodb tables.

  • Understanding GitHub APIs

  • Setting up GitHub API Token

  • Understanding GitHub Rate Limit

  • Create New Repository for since

  • Extracting Required Information using Python

  • Processing Data using Python

  • Grant Permissions to create AWS dynamodb tables using boto3

  • Create AWS Dynamodb Tables

  • AWS Dynamodb CRUD Operations

  • Populate AWS Dynamodb Table

  • AWS Dynamodb Batch Operations
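
The batch operations lecture above maps to boto3's `batch_writer`; the item attribute names below are assumptions for illustration, not the course's exact schema.

```python
# Sketch of loading GitHub events into a Dynamodb table in batches.
def to_item(event: dict) -> dict:
    """Map a GitHub event payload to a Dynamodb item (attribute names assumed)."""
    return {
        "repo_id": event["repo"]["id"],
        "event_id": event["id"],
        "event_type": event.get("type", "unknown"),
    }

def load_events(table_name: str, events: list) -> None:
    import boto3  # lazy import keeps to_item testable offline
    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:  # buffers and batches PutItem calls
        for event in events:
            batch.put_item(Item=to_item(event))
```

`batch_writer` handles the 25-items-per-request limit and retries of unprocessed items for us.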

Overview of Amazon AWS Athena

As part of this section, we will understand how to get started with AWS Athena using AWS Web console. We will also focus on basic DDL and DML or CRUD Operations using AWS Athena Query Editor.

  • Getting Started with Amazon AWS Athena

  • Quick Recap of AWS Glue Catalog Databases and Tables

  • Access AWS Glue Catalog Databases and Tables using AWS Athena Query Editor

  • Create a Database and Table using AWS Athena

  • Populate Data into Table using AWS Athena

  • Using CTAS to create tables using AWS Athena

  • Overview of Amazon AWS Athena Architecture

  • Amazon AWS Athena Resources and relationship with Hive

  • Create a Partitioned Table using AWS Athena

  • Develop Query for Partitioned Column

  • Insert into Partitioned Tables using AWS Athena

  • Validate Data Partitioning using AWS Athena

  • Drop AWS Athena Tables and Delete Data Files

  • Drop Partitioned Table using AWS Athena

  • Data Partitioning in AWS Athena using CTAS
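
The CTAS-with-partitioning lectures above come down to a statement like the one this helper builds; table names are placeholders.

```python
# Sketch of building an Athena CTAS statement that writes
# partitioned Parquet output.
def ctas_query(target: str, source: str, partition_col: str) -> str:
    """Partition columns must appear last in the SELECT for Athena CTAS."""
    return (
        f"CREATE TABLE {target} "
        f"WITH (format = 'PARQUET', partitioned_by = ARRAY['{partition_col}']) "
        f"AS SELECT * FROM {source}"
    )
```

Running the generated statement in the Athena Query Editor creates the partitioned table in one step.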

Amazon AWS Athena using AWS CLI

As part of this section, we will understand how to interact with AWS Athena using AWS CLI Commands.

  • Amazon AWS Athena using AWS CLI - Introduction

  • Get help and list AWS Athena databases using AWS CLI

  • Managing AWS Athena Workgroups using AWS CLI

  • Run AWS Athena Queries using AWS CLI

  • Get AWS Athena Table Metadata using AWS CLI

  • Run AWS Athena Queries with a custom location using AWS CLI

  • Drop AWS Athena table using AWS CLI

  • Run CTAS under AWS Athena using AWS CLI

Amazon AWS Athena using Python boto3

As part of this section, we will understand how to interact with AWS Athena using Python boto3.

  • Amazon AWS Athena using Python boto3 - Introduction

  • Getting Started with Managing AWS Athena using Python boto3

  • List Amazon AWS Athena Databases using Python boto3

  • List Amazon AWS Athena Tables using Python boto3

  • Run Amazon AWS Athena Queries with boto3

  • Review AWS Athena Query Results using boto3

  • Persist Amazon AWS Athena Query Results in Custom Location using boto3

  • Processing AWS Athena Query Results using Pandas

  • Run CTAS against Amazon AWS Athena using Python boto3
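
Running queries with boto3, as above, follows a submit-then-fetch pattern; the database and output location are placeholders, and polling is left as a comment.

```python
# Sketch of running an Athena query via boto3 and flattening results.
def rows_to_dicts(result_set: dict) -> list:
    """Convert a get_query_results ResultSet; the first row is the header."""
    rows = result_set["Rows"]
    headers = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [
        dict(zip(headers, [col.get("VarCharValue") for col in row["Data"]]))
        for row in rows[1:]
    ]

def run_query(sql: str, database: str, output_s3: str) -> str:
    import boto3  # lazy import keeps rows_to_dicts testable offline
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    # Poll get_query_execution until SUCCEEDED, then call get_query_results.
    return resp["QueryExecutionId"]
```

The list of dicts from `rows_to_dicts` drops straight into `pandas.DataFrame(...)` for the Pandas lecture.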

Getting Started with Amazon AWS Redshift

As part of this section, we will understand how to get started with AWS Redshift using AWS Web console. We will also focus on basic DDL and DML or CRUD Operations using AWS Redshift Query Editor.

  • Getting Started with Amazon AWS Redshift - Introduction

  • Create AWS Redshift Cluster using Free Trial

  • Connecting to Database using AWS Redshift Query Editor

  • Get a list of tables querying information schema

  • Run Queries against AWS Redshift Tables using Query Editor

  • Create AWS Redshift Table using Primary Key

  • Insert Data into AWS Redshift Tables

  • Update Data in AWS Redshift Tables

  • Delete data from AWS Redshift tables

  • Redshift Saved Queries using Query Editor

  • Deleting AWS Redshift Cluster

  • Restore AWS Redshift Cluster from Snapshot

Copy Data from s3 into AWS Redshift Tables

As part of this section, we will go through the details about copying data from s3 into AWS Redshift tables using the AWS Redshift Copy command.

  • Copy Data from s3 to AWS Redshift - Introduction

  • Setup Data in s3 for AWS Redshift Copy

  • Create Database and Table for AWS Redshift Copy Command

  • Create IAM User with full access on s3 for AWS Redshift Copy

  • Run Copy Command to copy data from s3 to AWS Redshift Table

  • Troubleshoot Errors related to AWS Redshift Copy Command

  • Run Copy Command to copy from s3 to AWS Redshift table

  • Validate using queries against AWS Redshift Table

  • Overview of AWS Redshift Copy Command

  • Create IAM Role for AWS Redshift to access s3

  • Copy Data from s3 to AWS Redshift table using IAM Role

  • Setup JSON Dataset in s3 for AWS Redshift Copy Command

  • Copy JSON Data from s3 to AWS Redshift table using IAM Role
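
The COPY statements the section builds up to have a common shape; this helper sketches it, with the table, s3 path, and role ARN as placeholders.

```python
# Sketch of assembling a Redshift COPY statement that loads from s3
# using an IAM role for access.
def copy_command(table: str, s3_path: str, iam_role: str, fmt: str = "CSV") -> str:
    """Build a Redshift COPY statement; fmt can be CSV, JSON 'auto', etc."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS {fmt}"
    )
```

Errors from a failed COPY land in the `stl_load_errors` system table, which is where the troubleshooting lecture looks.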

Develop Applications using AWS Redshift Cluster

As part of this section, we will understand how to develop applications against databases and tables created as part of AWS Redshift Cluster.

  • Develop application using AWS Redshift Cluster - Introduction

  • Allocate Elastic IP for AWS Redshift Cluster

  • Enable Public Accessibility for AWS Redshift Cluster

  • Update Inbound Rules in Security Group to access AWS Redshift Cluster

  • Create Database and User in AWS Redshift Cluster

  • Connect to the database in AWS Redshift using psql

  • Change Owner on AWS Redshift Tables

  • Download AWS Redshift JDBC Jar file

  • Connect to AWS Redshift Databases using IDEs such as SQL Workbench

  • Setup Python Virtual Environment for AWS Redshift

  • Run Simple Query against AWS Redshift Database Table using Python

  • Truncate AWS Redshift Table using Python

  • Create IAM User to copy from s3 to AWS Redshift Tables

  • Validate Access of IAM User using Boto3

  • Run AWS Redshift Copy Command using Python

AWS Redshift Tables with Distkeys and Sortkeys

As part of this section, we will go through AWS Redshift-specific features such as distribution keys and sort keys to create AWS Redshift tables.

  • AWS Redshift Tables with Distkeys and Sortkeys - Introduction

  • Quick Review of AWS Redshift Architecture

  • Create multi-node AWS Redshift Cluster

  • Connect to AWS Redshift Cluster using Query Editor

  • Create AWS Redshift Database

  • Create AWS Redshift Database User

  • Create AWS Redshift Database Schema

  • Default Distribution Style of AWS Redshift Table

  • Grant Select Permissions on Catalog to AWS Redshift Database User

  • Update Search Path to query AWS Redshift system tables

  • Validate AWS Redshift table with DISTSTYLE AUTO

  • Create AWS Redshift Cluster from Snapshot to the original state

  • Overview of Node Slices in AWS Redshift Cluster

  • Overview of Distribution Styles related to AWS Redshift tables

  • Distribution Strategies for retail tables in AWS Redshift Databases

  • Create AWS Redshift tables with distribution style all

  • Troubleshoot and Fix Load or Copy Errors

  • Create AWS Redshift Table with Distribution Style Auto

  • Create AWS Redshift Tables using Distribution Style Key

  • Delete AWS Redshift Cluster with a manual snapshot
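
The DDL produced across the distribution-style lectures can be sketched with a small builder; the column definitions here are placeholders.

```python
# Sketch of assembling Redshift DDL with optional distribution
# and sort keys.
def create_table_ddl(table: str, columns: dict,
                     distkey: str = None, sortkey: str = None) -> str:
    """columns maps name -> type; DISTKEY/SORTKEY clauses are optional."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    ddl = f"CREATE TABLE {table} ({cols})"
    if distkey:
        ddl += f" DISTKEY({distkey})"
    if sortkey:
        ddl += f" SORTKEY({sortkey})"
    return ddl
```

Omitting both clauses leaves the table at DISTSTYLE AUTO, the default discussed above.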

AWS Redshift Federated Queries and Spectrum

As part of this section, we will go through some of the advanced features of Redshift such as AWS Redshift Federated Queries and AWS Redshift Spectrum.

  • AWS Redshift Federated Queries and Spectrum - Introduction

  • Overview of integrating AWS RDS and AWS Redshift for Federated Queries

  • Create IAM Role for AWS Redshift Cluster

  • Setup Postgres Database Server for AWS Redshift Federated Queries

  • Create tables in Postgres Database for AWS Redshift Federated Queries

  • Creating Secret using Secrets Manager for Postgres Database

  • Accessing Secret Details using Python Boto3

  • Reading Json Data to Dataframe using Pandas

  • Write JSON Data to AWS Redshift Database Tables using Pandas

  • Create AWS IAM Policy for Secret and associate with Redshift Role

  • Create AWS Redshift Cluster using AWS IAM Role with permissions on secret

  • Create AWS Redshift External Schema to Postgres Database

  • Update AWS Redshift Cluster Network Settings for Federated Queries

  • Performing ETL using AWS Redshift Federated Queries

  • Clean up resources added for AWS Redshift Federated Queries

  • Grant Access on AWS Glue Data Catalog to AWS Redshift Cluster for Spectrum

  • Setup AWS Redshift Clusters to run queries using Spectrum

  • Quick Recap of AWS Glue Catalog Database and Tables for AWS Redshift Spectrum

  • Create External Schema using AWS Redshift Spectrum

  • Run Queries using AWS Redshift Spectrum

  • Cleanup the AWS Redshift Cluster
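
The Spectrum external schema created above maps a Redshift schema onto a Glue Catalog database; this helper sketches the statement, with the schema, Glue database, and role ARN as placeholders.

```python
# Sketch of the external schema statement used for Redshift Spectrum.
def external_schema_ddl(schema: str, glue_db: str, iam_role: str) -> str:
    """Map a Redshift external schema onto a Glue Catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} FROM DATA CATALOG "
        f"DATABASE '{glue_db}' IAM_ROLE '{iam_role}'"
    )
```

Once created, the Glue tables become queryable as `schema.table` directly from the Redshift cluster.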

What you will learn

Data Engineering leveraging Services under AWS Data Analytics

AWS Essentials such as s3, IAM, EC2, etc

Understanding AWS s3 for cloud based storage

Understanding details related to virtual machines on AWS known as EC2

Managing AWS IAM users, groups, roles and policies for RBAC (Role Based Access Control)

Managing Tables using AWS Glue Catalog

Engineering Batch Data Pipelines using AWS Glue Jobs

Orchestrating Batch Data Pipelines using AWS Glue Workflows

Running Queries using AWS Athena - Serverless query engine service

Using AWS Elastic Map Reduce (EMR) Clusters for building Data Pipelines

Using AWS Elastic Map Reduce (EMR) Clusters for reports and dashboards

Data Ingestion using AWS Lambda Functions

Scheduling using AWS Events Bridge

Engineering Streaming Pipelines using AWS Kinesis

Streaming Web Server logs using AWS Kinesis Firehose

Overview of data processing using AWS Athena

Running AWS Athena queries or commands using CLI

Running AWS Athena queries using Python boto3

Creating AWS Redshift Cluster, Create tables and perform CRUD Operations

Copy data from s3 to AWS Redshift Tables

Understanding Distribution Styles and creating tables using Distkeys

Running queries on external RDBMS Tables using AWS Redshift Federated Queries

Running queries on Glue or Athena Catalog tables using AWS Redshift Spectrum

Requirements

  • A Computer with at least 8 GB RAM
  • Programming Experience using Python is highly desired as some of the topics are demonstrated using Python
  • SQL Experience is highly desired as some of the topics are demonstrated using SQL
  • Nice to have Data Engineering Experience using Pandas or Pyspark
  • This course is ideal for experienced data engineers to add AWS Analytics Services as key skills to their profile

Course content

29 sections

Introduction to the course

7 lectures
Introduction to Data Engineering using AWS Analytics Services
05:45
Video Lectures and Reference Material
03:01
Taking the Udemy Course for new Udemy Users
04:07
Additional Costs for AWS Infrastructure for Hands-on Practice
01:39
Signup for AWS Account
01:45
Logging into AWS Account
01:45
Overview of AWS Billing Dashboard - Cost Explorer and Budgets
03:16

Setup Local Development Environment for AWS on Windows 10 or Windows 11

11 lectures
Setup Local Environment on Windows for AWS
03:22
Overview of Powershell on Windows 10 or Windows 11
04:25
Setup Ubuntu VM on Windows 10 or 11 using wsl
06:07
Setup Ubuntu VM on Windows 10 or 11 using wsl - Contd...
05:17
Setup Python venv and pip on Ubuntu
08:49
Setup AWS CLI on Windows and Ubuntu using Pip
03:09
Create AWS IAM User and Download Credentials
03:49
Configure AWS CLI on Windows
07:36
Create Python Virtual Environment for AWS Projects
03:14
Setup Boto3 as part of Python Virtual Environment
02:29
Setup Jupyter Lab and Validate boto3
06:42

Setup Local Development Environment for AWS on Mac

7 lectures
Setup Local Environment for AWS on Mac
02:35
Setup AWS CLI on Mac
02:08
Setup AWS IAM User to configure AWS CLI
02:40
Configure AWS CLI using IAM User Credentials
06:25
Setup Python Virtual Environment on Mac using Python 3
04:43
Setup Boto3 as part of Python Virtual Environment
02:29
Setup Jupyter Lab and Validate boto3
06:42

Setup Environment for Practice using Cloud9

13 lectures
Introduction to Cloud9
00:50
Setup Cloud9
06:20
Overview of Cloud9 IDE
04:40
Docker and AWS CLI on Cloud9
03:03
Cloud9 and EC2
03:25
Accessing Web Applications
03:59
Allocate and Assign Static IP
04:11
Changing Permissions using IAM Policies
04:03
Increasing Size of EBS Volume
02:46
Opening ports for Cloud9 Instance
03:12
Setup Jupyter lab on Cloud9 Instance
07:06
Open SSH Port for Cloud9 EC2 Instance
03:15
Connect to Cloud9 EC2 Instance using SSH
06:33

AWS Getting Started with s3, IAM and CLI

12 lectures
Introduction - AWS Getting Started
01:44
[Instructions] Introduction - AWS Getting Started
00:27
Create AWS s3 Bucket using AWS Web Console
03:45
[Instructions] Create s3 Bucket
00:40
Create AWS IAM Group and User using AWS Web Console
04:26
[Instructions] Create IAM Group and User
00:45
Overview of AWS IAM Roles to grant permissions between AWS Services
02:18
[Instructions] Overview of Roles
00:22
Create and Attach AWS IAM Custom Policy using AWS Web Console
04:36
[Instructions and Code] Create and Attach Custom Policy
00:30
Configure and Validate AWS Command Line Interface to run AWS Commands
04:39
[Instructions and Code] Configure and Validate AWS CLI
00:25

Storage - Deep Dive into AWS Simple Storage Service aka s3

18 lectures
Getting Started with AWS Simple Storage aka S3
02:59
[Instructions] Getting Started with AWS S3
00:07
Setup Data Set locally to upload into AWS s3
02:17
[Instructions] Setup Data Set locally to upload into AWS s3
00:18
Adding AWS S3 Buckets and Objects using AWS Web Console
05:49
[Instruction] Adding AWS s3 Buckets and Objects
00:25
Version Control of AWS S3 Objects or Files
05:55
[Instructions] Version Control in AWS S3
01:01
AWS S3 Cross-Region Replication for fault tolerance
09:15
[Instructions] AWS S3 Cross-Region Replication for fault tolerance
00:49
Overview of AWS S3 Storage Classes or Storage Tiers
05:58
[Instructions] Overview of AWS S3 Storage Classes or Storage Tiers
00:51
Overview of Glacier in AWS s3
03:08
[Instructions] Overview of Glacier in AWS s3
00:19
Managing AWS S3 buckets and objects using AWS CLI
07:07
[Instructions and Commands] Managing AWS S3 buckets and objects using AWS CLI
00:27
Managing Objects in AWS S3 using AWS CLI - Lab
12:17
[Instructions] Managing Objects in AWS S3 using AWS CLI - Lab
00:34

AWS Security using IAM - Managing AWS Users, Roles and Policies using AWS IAM

16 lectures
Creating AWS IAM Users with Programmatic and Web Console Access
06:23
[Instructions] Creating IAM Users
00:07
Logging into AWS Management Console using AWS IAM User
02:24
[Instructions] Logging into AWS Management Console using IAM User
00:21
Validate Programmatic Access to AWS IAM User via AWS CLI
02:15
[Instructions and Commands] Validate Programmatic Access to IAM User
00:29
Getting Started with AWS IAM Identity-based Policies
09:08
[Instructions and Commands] IAM Identity-based Policies
01:04
Managing AWS IAM User Groups
06:20
[Instructions and Commands] Managing IAM Groups
00:50
Managing AWS IAM Roles for Service Level Access
09:38
[Instructions and Commands] Managing IAM Roles
00:46
Overview of AWS Custom Policies to grant permissions to Users, Groups, and Roles
09:00
[Instructions and Commands] Overview of Custom Policies
00:53
Managing AWS IAM Groups, Users, and Roles using AWS CLI
08:56
[Instructions and Commands] Managing IAM using AWS CLI
00:39

Infrastructure - Getting Started with AWS Elastic Compute Cloud aka EC2

20 lectures
Getting Started with AWS Elastic Compute Cloud aka EC2
02:59
[Instructions] Getting Started with EC2
00:36
Create AWS EC2 Key Pair for SSH Access
06:34
[Instructions] Create EC2 Key Pair
00:50
Launch AWS EC2 Instance or Virtual Machine
10:19
[Instructions] Launch EC2 Instance
00:15
Connecting to AWS EC2 Instance or Virtual Machine using SSH
03:07
[Instructions and Commands] Connecting to EC2 Instance
00:11
Overview of AWS Security Groups for firewall security of AWS EC2 Instance
08:28
[Instructions and Commands] Security Groups Basics
01:12
Overview of Public and Private IP Addresses of AWS EC2 Instance
07:21
[Instructions] Public and Private IP Addresses
00:31
Understanding AWS EC2 Instance or Virtual Machine Life Cycle
03:46
[Instructions] EC2 Life Cycle
00:23
Allocating and Assigning AWS Elastic IP or Static IP address to AWS EC2 Instance
05:06
[Instructions] Allocating and Assigning Elastic IP Addresses
00:29
Managing AWS EC2 Instances or Virtual Machines Using AWS CLI
08:52
[Instructions and Commands] Managing EC2 Using AWS CLI
01:01
Upgrade or Downgrade of AWS EC2 Instances or Virtual Machines
06:46
[Instructions and Commands] Upgrade or Downgrade EC2 Instances
01:10

Infrastructure - AWS EC2 Advanced

12 lectures
Understanding AWS EC2 Instance or Virtual Machine Metadata
04:00
[Instructions and Commands] Understanding EC2 Metadata
00:21
Querying on AWS EC2 Instance or Virtual Machine Metadata
05:22
[Instructions and Commands] Querying on EC2 Metadata
00:17
Filtering on AWS EC2 Instance or Virtual Machine Metadata
05:33
[Instructions and Commands] Filtering on EC2 Metadata
00:26
Using Bootstrapping Scripts on AWS EC2 Instance or Virtual Machine
07:33
[Instructions and Commands] Using Bootstrapping Scripts
00:26
Create an Amazon Machine Image aka AMI using AWS EC2 Instance
05:52
[Instructions and Commands] Create an AMI
00:22
Validate Amazon Machine Image aka AMI - Lab
04:08
[Instructions and Commands] Validate AMI - Lab
00:30

Data Ingestion using Lambda Functions

29 lectures
Hello World using AWS Lambda
04:01
[Instructions] Hello World using AWS Lambda
00:23
Setup Project for local development
06:04
[Instructions and Code] Setup Project for local development
00:44
Deploy Project to AWS Lambda console
04:23
[Instructions and Code] Deploy Project to AWS Lambda console
00:15
Develop download functionality using requests
07:06
[Instructions and Code] Develop download functionality using requests
00:26
Using 3rd party libraries in AWS Lambda
06:01
[Instructions and Code] Using 3rd party libraries in AWS Lambda
00:32
Validating s3 access for local development
09:25
[Instructions and Code] Validating s3 access for local development
00:22
Develop upload functionality to s3
08:53
[Instructions and Code] Develop upload functionality to s3
00:45
Validating using AWS Lambda Console
02:46
[Instructions and Code] Validating using AWS Lambda Console
00:22
Run using AWS Lambda Console
04:26
[Instructions] Run using AWS Lambda Console
00:27
Validating files incrementally
09:44
[Instructions and Code] Validating files incrementally
00:32
Reading and Writing Bookmark using s3
07:35
[Instructions and Code] Reading and Writing Bookmark using s3
00:23
Maintaining Bookmark using s3
07:58
[Instructions and Code] Maintaining Bookmark using s3
00:37
Review the incremental upload logic
05:55
Deploying lambda function
11:19
[Instructions and Source Code] - ghactivity-downloader Lambda Function
00:43
Schedule Lambda Function using AWS Event Bridge
04:56
[Instructions] Schedule Lambda Function using AWS Event Bridge
00:20
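
The bookmark lectures above track the last processed file so each Lambda run picks up where the previous one stopped. The GitHub Archive data used in the course is published as hourly files named like `2021-01-13-0.json.gz`; a sketch of the increment logic (the function name is illustrative):

```python
from datetime import datetime, timedelta

def next_file(bookmark):
    """Given the last processed hourly file name, e.g. '2021-01-13-0.json.gz',
    return the name of the next hourly file."""
    prefix = bookmark.replace(".json.gz", "")
    dt = datetime.strptime(prefix, "%Y-%m-%d-%H")
    nxt = dt + timedelta(hours=1)
    # gharchive hours are not zero-padded (…-0 through …-23)
    return f"{nxt.strftime('%Y-%m-%d')}-{nxt.hour}.json.gz"
```

In the pipeline, the returned name would be downloaded, uploaded to s3, and then written back to the bookmark object so the next scheduled run continues from there.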

Development Lifecycle for Pyspark

19 lectures
Setup Virtual Environment and Install Pyspark
04:45
[Commands] - Setup Virtual Environment and Install Pyspark
00:05
Getting Started with Pycharm
04:56
[Code and Instructions] - Getting Started with Pycharm
00:21
Passing Run Time Arguments
05:30
Accessing OS Environment Variables
04:39
Getting Started with Spark
02:53
Create Function for Spark Session
05:48
[Code and Instructions] - Create Function for Spark Session
00:28
Setup Sample Data
02:09
Read data from files
08:46
[Code and Instructions] - Read data from files
00:37
Process data using Spark APIs
06:27
[Code and Instructions] - Process data using Spark APIs
00:35
Write data to files
07:13
[Code and Instructions] - Write data to files
00:44
Validating Writing Data to Files
06:46
Productionizing the Code
04:36
[Code and Instructions] - Productionizing the code
01:10
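
The "Create Function for Spark Session" and "Productionizing the Code" lectures above separate environment-specific settings from the session creation itself. A minimal sketch of that pattern (the config keys and environment names are illustrative; `pyspark` is imported lazily):

```python
def spark_conf(env):
    """Resolve environment-specific Spark settings.
    DEV runs locally; other environments defer to cluster defaults."""
    if env == "DEV":
        return {"spark.master": "local[*]", "spark.app.name": "GHActivity - DEV"}
    return {"spark.app.name": "GHActivity"}

def get_spark_session(env):
    """Build a SparkSession from the resolved settings (requires pyspark)."""
    from pyspark.sql import SparkSession
    builder = SparkSession.builder
    for key, value in spark_conf(env).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```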

Overview of Glue Components

18 lectures
Introduction - Overview of Glue Components
02:40
[Instructions] Overview of Glue Components
00:36
Create Crawler and Catalog Table
06:07
[Instructions] Create Crawler and Catalog Table
00:06
Analyze Data using Athena
03:58
[Instructions] Analyze Data using Athena
00:14
Creating S3 Bucket and Role
04:15
[Instructions and Code] Creating S3 Bucket and Role
00:23
Create and Run the Glue Job
09:01
[Instructions] Create and Run the Glue Job
00:27
Validate using Glue CatalogTable and Athena
06:53
[Instructions and Code] Validate using Glue CatalogTable and Athena
00:28
Create and Run Glue Trigger
03:53
[Instructions and Code] Create and Run Glue Trigger
00:37
Create Glue Workflow
06:39
[Instructions] Create Glue Workflow
00:33
Run Glue Workflow and Validate
05:20
[Instructions] Run Glue Workflow and Validate
00:25

Setup Spark History Server for Glue Jobs

9 lectures
Introduction - Spark History Server for Glue
01:44
Setup Spark History Server on AWS
07:15
Clone AWS Glue Samples repository
01:53
[Instructions and Code] Clone AWS Glue Samples repository
00:22
Build Glue Spark UI Container
01:01
[Instructions and Code] Build Glue Spark UI Container
00:06
Update IAM Policy Permissions
04:25
Start Glue Spark UI Container
06:06
[Instructions and Code] Start Glue Spark UI Container
00:19

Deep Dive into Glue Catalog

20 lectures
Prerequisites for Glue Catalog Tables
01:00
[Instructions] Prerequisites for Glue Catalog Tables
00:20
Steps for Creating Catalog Tables
01:26
[Instructions] Steps for Creating Catalog Tables
00:08
Download Data Set
04:40
[Instructions and Code] Download Data Set
00:27
Upload data to s3
06:12
[Instructions and Code] Upload data to s3
00:38
Create Glue Catalog Database - itvghlandingdb
01:06
[Instructions] Create Glue Catalog Database - itvghlandingdb
00:07
Create Glue Catalog Table - ghactivity
05:11
[Instructions] Create Glue Catalog Table - ghactivity
00:43
Running Queries using Athena - ghactivity
04:14
[Instructions and Code] Running Queries using Athena - ghactivity
00:25
Crawling Multiple Folders
06:49
[Instructions] Crawling Multiple Folders
00:18
Managing Glue Catalog using AWS CLI
10:00
[Instructions and Code] Managing Glue Catalog using AWS CLI
00:32
Managing Glue Catalog using Python Boto3
12:12
[Instructions and Code] Managing Glue Catalog using Python Boto3
00:45
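
Managing the Glue Catalog with boto3, as in the last lecture above, typically means paging through `get_tables` responses. A sketch (requires AWS credentials for the live call; the helper is pure):

```python
def table_names(pages):
    """Flatten table names out of get_tables response pages."""
    return [t["Name"] for page in pages for t in page.get("TableList", [])]

def list_catalog_tables(database):
    """List all table names in a Glue Catalog database.
    Requires AWS credentials; boto3 is imported lazily."""
    import boto3
    glue = boto3.client("glue")
    paginator = glue.get_paginator("get_tables")
    return table_names(paginator.paginate(DatabaseName=database))
```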

Exploring Glue Job APIs

10 lectures
Update IAM Role for Glue Job
02:12
[Instructions and Code] Update IAM Role for Glue Job
00:31
Generate baseline Glue Job
04:45
[Instructions and Code] Generate baseline Glue Job
01:10
Running baseline Glue Job
09:14
[Instructions] Running baseline Glue Job
00:25
Glue Script for Partitioning Data
04:10
[Instructions and Code] Glue Script for Partitioning Data
01:01
Validating using Athena
05:39
[Instructions and Code] Validating using Athena
00:40

Glue Job Bookmarks

19 lectures
Introduction to Glue Job Bookmarks
00:59
Cleaning up the data
01:51
[Instructions and Code] Cleaning up the data
00:25
Overview of AWS Glue CLI
03:26
[Instructions and Code] Overview of AWS Glue CLI
00:25
Run Job using Bookmark
02:34
[Instructions and Code] Run Job using Bookmark
00:27
Validate Bookmark using AWS CLI
05:09
[Instructions and Code] Validate Bookmark using AWS CLI
00:39
Add new data to landing
02:22
[Instructions and Code] Add new data to landing
00:20
Rerun Glue Job using Bookmark
02:59
[Instructions and Code] Rerun Glue Job using Bookmark
00:36
Validate Job Bookmark and Files for Incremental run
02:02
[Instructions and Code] Validate Job Bookmark and Files for Incremental run
00:28
Recrawl the Glue Catalog Table using CLI
04:52
[Instructions and Code] Recrawl the Glue Catalog Table using CLI
00:39
Run Athena Queries for Data Validation
04:21
[Instructions and Code] Run Athena Queries for Data Validation
00:44

Getting Started with AWS EMR

17 lectures
Planning of EMR Cluster
01:20
Create EC2 Key Pair
04:30
Setup EMR Cluster with Spark
05:59
Understanding Summary of AWS EMR Cluster
03:28
Review EMR Cluster Application User Interfaces
02:23
Review EMR Cluster Monitoring
01:46
Review EMR Cluster Hardware and Cluster Scaling Policy
01:16
Review EMR Cluster Configurations
02:11
Review EMR Cluster Events
02:21
Review EMR Cluster Steps
01:48
Review EMR Cluster Bootstrap Actions
02:03
Connecting to EMR Master Node using SSH
02:20
Disabling Termination Protection and Terminating the Cluster
01:41
Clone and Create New Cluster
03:37
Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster
03:20
Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster
03:32
Managing Files in AWS s3 using HDFS CLI on EMR Cluster
04:51

Deploying Spark Applications using AWS EMR

20 lectures
Deploying Applications using AWS EMR - Introduction
00:26
Setup EMR Cluster to deploy applications
07:18
Validate SSH Connectivity to Master node of AWS EMR Cluster
02:20
Setup Jupyter Notebook Environment on EMR Cluster
05:49
Create required AWS s3 Bucket
02:40
Upload GHActivity Data to s3
06:19
Validate Application using AWS EMR Compatible Versions
06:30
Deploy Application to AWS EMR Master Node
03:13
Create user space for ec2-user on AWS EMR Cluster
03:56
Run Spark Application using spark-submit on AWS EMR Master Node
07:36
Validate Data using Jupyter Notebooks on AWS EMR Cluster
08:16
Clone and Start Auto Terminated AWS EMR Cluster
08:35
Delete Data Populated by GHActivity Application using AWS EMR Cluster
01:02
Differences between Spark Client and Cluster Deployment Modes
09:13
Running Spark Application using Cluster Mode on AWS EMR Cluster
05:14
Overview of Adding Pyspark Application as Step to AWS EMR Cluster
03:33
Deploy Spark Application to AWS S3
02:20
Running Spark Applications as AWS EMR Steps in client mode
03:46
Running Spark Applications as AWS EMR Steps in cluster mode
06:27
Validate AWS EMR Step Execution of Spark Application
02:58
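
Adding a Pyspark application as an EMR step, as covered above, comes down to submitting a step definition whose shape the EMR `add_job_flow_steps` API expects. A sketch building that definition (the application path is a placeholder):

```python
def spark_submit_step(name, app_s3_path, deploy_mode="cluster"):
    """Build one step definition for emr.add_job_flow_steps:
    command-runner.jar invokes spark-submit on the master node."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, app_s3_path],
        },
    }
```

The step would be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId=..., Steps=[step])`; switching `deploy_mode` between `"client"` and `"cluster"` mirrors the two step-execution lectures above.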

Streaming Pipeline using Kinesis

10 lectures
Building Streaming Pipeline using Kinesis
02:41
Rotating Logs
11:32
Setup Kinesis Firehose Agent
06:00
Create Kinesis Firehose Delivery Stream
07:23
Planning the Pipeline
03:46
Create IAM Group and User
07:13
Granting Permissions to IAM User using Policy
04:47
Configure Kinesis Firehose Agent
05:26
Start and Validate Agent
10:15
Conclusion - Building Simple Streaming Pipeline
01:03

Consuming Data from s3 using boto3

9 lectures
Customizing s3 folder using Kinesis Delivery Stream
05:21
Create Policy to read from s3 Bucket
04:25
Validate s3 access using AWS CLI
04:10
Setup Python Virtual Environment to explore boto3
03:14
Validating access to s3 using Python boto3
04:54
Read Content from s3 object
07:47
Read multiple s3 Objects
06:04
Get number of s3 Objects using Marker
06:15
Get size of s3 Objects using Marker
04:04
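
The last two lectures above page through objects using `Marker`, the v1 `list_objects` pagination mechanism, to count objects and sum their sizes. A sketch (the pure helper aggregates sizes from response pages; the live call requires AWS credentials):

```python
def total_size(pages):
    """Sum object sizes across list_objects response pages."""
    return sum(obj["Size"] for page in pages for obj in page.get("Contents", []))

def object_stats(bucket, prefix):
    """Return (count, total size) of objects under a prefix,
    paging with Marker. Requires AWS credentials; boto3 imported lazily."""
    import boto3
    s3 = boto3.client("s3")
    count, size, marker = 0, 0, ""
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if marker:
            kwargs["Marker"] = marker
        page = s3.list_objects(**kwargs)
        contents = page.get("Contents", [])
        count += len(contents)
        size += total_size([page])
        if not page.get("IsTruncated") or not contents:
            return count, size
        # Marker for the next page is the last key of this page
        marker = contents[-1]["Key"]
```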

Populating GitHub Data to Dynamodb

12 lectures
Install required libraries
02:04
Understanding GitHub APIs
05:09
Setting up GitHub API Token
03:27
Understanding GitHub Rate Limit
01:35
Create New Repository for since
02:04
Extracting Required Information
06:02
Processing Data
08:51
Grant Permissions to create dynamodb tables using boto3
02:57
Create Dynamodb Tables
06:37
Dynamodb CRUD Operations
08:08
Populate Dynamodb Table
04:58
Dynamodb Batch Operations
05:55
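
The Dynamodb batch-operations lecture above relies on the fact that `BatchWriteItem` accepts at most 25 requests per call; the resource-level `batch_writer` handles that batching automatically. A sketch of both sides (table and item shapes are illustrative):

```python
def chunks(items, size=25):
    """Split items into batches of at most `size` elements
    (BatchWriteItem's per-call limit is 25)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_put(table_name, items):
    """Write items using batch_writer, which batches and retries for us.
    Requires AWS credentials; boto3 is imported lazily."""
    import boto3
    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
```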

Overview of Amazon Athena

15 lectures
Getting Started with Amazon Athena
03:46
Quick Recap of Glue Catalog Databases and Tables
03:13
Access Glue Catalog Databases and Tables using Athena Query Editor
03:32
Create Database and Table using Athena
04:45
Populate Data into Table using Athena
04:19
Using CTAS to create tables using Athena
08:19
Overview of Amazon Athena Architecture
04:15
Amazon Athena Resources and relationship with Hive
03:39
Create Partitioned Table using Athena
04:13
Develop Query for Partitioned Column
07:04
Insert into Partitioned Tables using Athena
02:40
Validate Data Partitioning using Athena
04:13
Drop Athena Tables and Delete Data Files
04:10
Drop Partitioned Table using Athena
01:50
Data Partitioning in Athena using CTAS
05:17

Amazon Athena using AWS CLI

15 lectures
Amazon Athena using AWS CLI - Introduction
00:49
Get help and list Athena databases using AWS CLI
02:18
[Commands] Get help and list Athena databases using AWS CLI
00:03
Managing Athena Workgroups using AWS CLI
06:48
[Commands] Managing Athena Workgroups using AWS CLI
00:03
Run Athena Queries using AWS CLI
04:12
[Commands] Run Athena Queries using AWS CLI
00:07
Get Athena Table Metadata using AWS CLI
03:37
[Commands] Get Athena Table Metadata using AWS CLI
00:07
Run Athena Queries with custom location using AWS CLI
07:19
[Commands] Run Athena Queries with custom location
00:06
Drop Athena table using AWS CLI
03:53
[Commands] Drop Athena table using AWS CLI
00:07
Run CTAS under Athena using AWS CLI
04:35
[Commands] Run CTAS under Athena using AWS CLI
00:08

Amazon Athena using Python boto3

11 lectures
Amazon Athena using Python boto3 - Introduction
02:30
Getting Started with Managing Athena using Python boto3
06:38
[Code] Getting Started with Managing Athena using Python boto3
00:03
List Amazon Athena Databases using Python boto3
05:01
[Code] List Amazon Athena Databases using Python boto3
00:05
List Amazon Athena Tables using Python boto3
09:02
[Code] List Amazon Athena Tables using Python boto3
00:20
Run Amazon Athena Queries using Python boto3
06:15
[Code] Run Amazon Athena Queries using Python boto3
00:11
Review Athena Query Results using boto3
08:12
[Code] Review Athena Query Results using Python boto3
00:10
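
Running Athena queries with boto3, as in the lectures above, always pairs the query string with a database context and an s3 output location. A sketch (bucket path is a placeholder; the live call requires AWS credentials):

```python
def query_params(query, database, output_s3):
    """Build keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

def run_query(query, database, output_s3):
    """Submit a query and return its execution id, which is later
    passed to get_query_execution / get_query_results.
    Requires AWS credentials; boto3 is imported lazily."""
    import boto3
    athena = boto3.client("athena")
    response = athena.start_query_execution(**query_params(query, database, output_s3))
    return response["QueryExecutionId"]
```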

Getting Started with Amazon Redshift

16 lectures
Getting Started with Amazon Redshift - Introduction
00:56
Create Redshift Cluster using Free Trial
03:34
Connecting to Database using Redshift Query Editor
03:33
Get list of tables by querying information schema
03:34
[Queries] - Get list of tables by querying information schema
00:01
Run Queries against Redshift Tables using Query Editor
03:37
[Queries] - Validate users data using Query Editor
00:02
Create Redshift Table using Primary Key
03:35
[Queries] - Create Redshift Table
00:02
[Consolidated Queries] - CRUD Operations
00:19
Insert Data into Redshift Tables
07:17
Update Data in Redshift Tables
05:13
Delete data from Redshift tables
04:17
Redshift Saved Queries using Query Editor
03:40
Deleting Redshift Cluster
02:37
Restore Redshift Cluster from Snapshot
04:48

Copy Data from s3 into Redshift Tables

13 lectures
Copy Data from s3 to Redshift - Introduction
01:27
Setup Data in s3 for Redshift Copy
04:55
Create Database and Table for Redshift Copy Command
03:33
Create IAM User with full access on s3 for Redshift Copy
03:37
Run Copy Command to copy data from s3 to Redshift Table
03:15
Troubleshoot Errors related to Redshift Copy Command
02:17
Run Copy Command to copy from s3 to Redshift table
02:12
Validate using queries against Redshift Table
02:44
Overview of Redshift Copy Command
05:26
Create IAM Role for Redshift to access s3
04:49
Copy Data from s3 to Redshift table using IAM Role
06:05
Setup JSON Dataset in s3 for Redshift Copy Command
03:59
Copy JSON Data from s3 to Redshift table using IAM Role
03:57
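
The COPY lectures above exercise the same command with different credentials (IAM user vs. IAM role) and formats (CSV vs. JSON). A sketch that builds the statement (table, path, and role ARN are placeholders; the exact COPY options depend on the data set):

```python
def copy_command(table, s3_path, iam_role, fmt="CSV"):
    """Build a Redshift COPY statement using IAM role authentication.
    'auto' lets Redshift map JSON keys to columns by name."""
    options = "FORMAT AS JSON 'auto'" if fmt == "JSON" else "CSV IGNOREHEADER 1"
    return f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' {options};"
```

The resulting string would be executed through the Query Editor or any Postgres-protocol client connected to the cluster.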

Develop Applications using Redshift Cluster

15 lectures
Develop application using Redshift Cluster - Introduction
00:59
Allocate Elastic IP for Redshift Cluster
03:46
Enable Public Accessibility for Redshift Cluster
04:01
Update Inbound Rules in Security Group to access Redshift Cluster
05:16
Create Database and User in Redshift Cluster
04:57
Connect to database in Redshift using psql
03:47
Change Owner on Redshift Tables
03:06
Download Redshift JDBC Jar file
01:51
Connect to Redshift Databases using IDEs such as SQL Workbench
04:30
Setup Python Virtual Environment for Redshift
04:45
Run Simple Query against Redshift Database Table using Python
06:30
Truncate Redshift Table using Python
03:56
Create IAM User to copy from s3 to Redshift Tables
02:23
Validate Access of IAM User using Boto3
04:51
Run Redshift Copy Command using Python
06:31

Redshift Tables with Distkeys and Sortkeys

20 lectures
Redshift Tables with Distkeys and Sortkeys - Introduction
03:58
Quick Review of Redshift Architecture
03:34
Create multi-node Redshift Cluster
04:34
Connect to Redshift Cluster using Query Editor
02:47
Create Redshift Database
01:34
Create Redshift Database User
03:46
Create Redshift Database Schema
05:37
Default Distribution Style of Redshift Table
04:14
Grant Select Permissions on Catalog to Redshift Database User
03:22
Update Search Path to query Redshift system tables
07:09
Validate table with DISTSTYLE AUTO
06:27
Create Cluster from Snapshot to the original state
06:59
Overview of Node Slices in Redshift Cluster
03:39
Overview of Distribution Styles
03:48
Distribution Strategies for retail tables in Redshift
02:17
Create Redshift tables with distribution style all
05:50
Troubleshoot and Fix Load or Copy Errors
04:03
Create Redshift Table with Distribution Style Auto
03:49
Create Redshift Tables using Distribution Style Key
07:50
Delete Cluster with manual snapshot
01:27
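
The distkey/sortkey lectures above come down to where DISTKEY, SORTKEY, and DISTSTYLE land in the DDL. A sketch that assembles such a statement (table and column names are illustrative, based on the retail tables mentioned above):

```python
def create_table_ddl(table, columns, distkey=None, sortkey=None, diststyle=None):
    """Build a Redshift CREATE TABLE statement.
    DISTKEY is a column attribute; DISTSTYLE and SORTKEY are table attributes."""
    cols = []
    for name, ctype in columns:
        col = f"{name} {ctype}"
        if name == distkey:
            col += " DISTKEY"
        cols.append(col)
    ddl = f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n)"
    if diststyle:
        ddl += f" DISTSTYLE {diststyle}"
    if sortkey:
        ddl += f" SORTKEY ({sortkey})"
    return ddl + ";"
```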

Redshift Federated Queries and Spectrum

21 lectures
Redshift Federated Queries and Spectrum - Introduction
01:28
Overview of integrating RDS and Redshift for Federated Queries
05:30
Create IAM Role for Redshift Cluster
02:26
Setup Postgres Database Server for Redshift Federated Queries
07:27
Create tables in Postgres Database for Redshift Federated Queries
06:02
Creating Secret using Secrets Manager for Postgres Database
04:05
Accessing Secret Details using Python Boto3
06:47
Reading Json Data to Dataframe using Pandas
08:51
Write JSON Data to Database Tables using Pandas
10:43
Create IAM Policy for Secret and associate with Redshift Role
04:45
Create Redshift Cluster using IAM Role with permissions on secret
05:01
Create Redshift External Schema to Postgres Database
06:00
Update Redshift Cluster Network Settings for Federated Queries
09:43
Performing ETL using Redshift Federated Queries
04:46
Clean up resources added for Redshift Federated Queries
03:09
Grant Access on Glue Data Catalog to Redshift Cluster for Spectrum
03:51
Setup Redshift Clusters to run queries using Spectrum
02:33
Quick Recap of Glue Catalog Database and Tables for Redshift Spectrum
02:25
Create External Schema using Redshift Spectrum
03:21
Run Queries using Redshift Spectrum
03:37
Cleanup the Redshift Cluster
01:10
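
Creating the external schema for Spectrum, as in the lectures above, points Redshift at a Glue Catalog database through an IAM role. A sketch of the DDL builder (schema name and role ARN are placeholders):

```python
def spectrum_schema_ddl(schema, glue_database, iam_role):
    """Build a CREATE EXTERNAL SCHEMA statement mapping a Redshift schema
    onto a Glue Data Catalog database via Redshift Spectrum."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_database}' "
        f"IAM_ROLE '{iam_role}';"
    )
```

Once the schema exists, the Glue tables are queryable directly, e.g. `SELECT count(*) FROM myschema.ghactivity;`.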
