Description

In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.


Then you will be introduced to Sqoop Import:

  • Understand the lifecycle of a Sqoop command.

  • Use the Sqoop import command to migrate data from MySQL to HDFS.

  • Use the Sqoop import command to migrate data from MySQL to Hive.

  • Use various file formats, compression codecs, field delimiters, WHERE clauses, and free-form queries while importing data.

  • Understand split-by and boundary queries.

  • Use incremental mode to migrate data from MySQL to HDFS.


Next, you will learn how to use Sqoop Export to migrate data:

  • What Sqoop Export is.

  • Using Sqoop export, migrate data from HDFS to MySQL.

  • Using Sqoop export, migrate data from Hive to MySQL.



After that, you will learn about Apache Flume:

  • Understand Flume architecture.

  • Using Flume, ingest data from Twitter and save it to HDFS.

  • Using Flume, ingest data from netcat and save it to HDFS.

  • Using Flume, ingest data from an exec source and display it on the console.

  • Describe Flume interceptors and see examples of using them.

  • Flume multi-agent flows.

  • Flume consolidation.


In the next section, you will learn about Apache Hive:

  • Hive Intro

  • External & Managed Tables

  • Working with Different File Formats: Parquet, Avro

  • Compression

  • Hive Analysis

  • Hive String Functions

  • Hive Date Functions

  • Partitioning

  • Bucketing


You will then learn about Apache Spark:

  • Spark Intro

  • Cluster Overview

  • RDD

  • DAG/Stages/Tasks

  • Actions & Transformations

  • Transformation & Action Examples

  • Spark DataFrames

  • Spark DataFrames: Working with Different File Formats & Compression

  • DataFrame APIs

  • Spark SQL

  • DataFrame Examples

  • Spark with Cassandra Integration

  • Running Spark in the IntelliJ IDE

  • Running Spark on EMR


What you will learn

  • The Hadoop Distributed File System (HDFS) and its commands.

  • The lifecycle of a Sqoop command.

  • Using the Sqoop import command to migrate data from MySQL to HDFS.

  • Using the Sqoop import command to migrate data from MySQL to Hive.

  • Working with various file formats, compression codecs, field delimiters, WHERE clauses, and free-form queries while importing data.

  • Understanding split-by and boundary queries.

  • Using incremental mode to migrate data from MySQL to HDFS.

  • Using Sqoop export to migrate data from HDFS to MySQL.

  • Using Sqoop export to migrate data from Hive to MySQL.

  • Understanding Flume architecture.

  • Using Flume to ingest data from Twitter and save it to HDFS.

  • Using Flume to ingest data from netcat and save it to HDFS.

  • Using Flume to ingest data from an exec source and display it on the console.

  • Flume interceptors.

Requirements

  • None

Course content

22 sections

Big Data Introduction

4 lectures
Meet your Instructor
01:22
Course Intro
01:26
Big Data Intro
05:24
Understanding Big Data Ecosystem
10:27

Google Cloud Cluster Setup

5 lectures
Google Cloud Account Setup
02:06
DataProc Cluster Setup
05:11
Upload Files on Google Cloud
03:56
Sqoop Setup
05:31
Environment Update
00:42

Hadoop & Yarn

2 lectures
HDFS and Hadoop Commands
09:16
Yarn Cluster Overview
07:41

Sqoop Import

17 lectures
Sqoop Introduction
15:48
Managing Target Directories
07:26
Working with Parquet File Format
08:24
Working with Avro File Format
11:35
Working with Different Compressions
10:08
Conditional Imports
04:26
Split-by and Boundary Queries
08:27
Field Delimiters
03:18
Incremental Appends
11:38
Sqoop-Hive Cluster Fix
00:11
Access Hive on Google Cloud
00:50
Sqoop Hive Import
03:31
Sqoop List Tables/Database
04:13
Sqoop Assignment1
1 question
Sqoop Assignment2
1 question
Sqoop Import Practice1
04:57
Sqoop Import Practice2
03:32

Sqoop Export

4 lectures
Export from HDFS to MySQL
03:39
Export from Hive to MySQL
02:30
Export Compressed Avro to MySQL
07:30
Bonus Lecture: Sqoop with Airflow
02:57

Apache Flume

9 lectures
Flume Setup
01:44
Flume Introduction & Architecture
10:07
Exec Source and Logger Sink
03:41
Moving data from Twitter to HDFS
09:25
Moving data from NetCat to HDFS
04:39
Flume Interceptors
01:56
Flume Interceptor Example
04:53
Flume Multi-Agent Flow
06:49
Flume Consolidation
06:11

Apache Hive

15 lectures
Access Hive Shell on Google Cloud
00:50
Hive Introduction
03:41
Hive Database
08:29
Hive Managed Tables
06:23
Hive External Tables
02:26
Hive Inserts
05:30
Hive Analytics
04:21
Working with Parquet
03:29
Compressing Parquet
04:27
Working with Fixed File Format
03:04
Alter Command
06:12
Hive String Functions
06:21
Hive Date Functions
05:39
Hive Partitioning
07:16
Hive Bucketing
03:44

Spark with Yarn & HDFS

5 lectures
What is Apache Spark
02:47
Understanding Cluster Manager (Yarn)
04:25
Understanding Distributed Storage (HDFS)
03:38
Running Spark on Yarn/HDFS
08:31
Understanding Deploy Modes
01:23

GCS Cluster

2 lectures
Spark on GCS Cluster
01:48
Upload Data files for Spark
01:49

Spark Internals

6 lectures
Drivers & Executors
02:12
RDDs & Dataframes
04:28
Transformation & Actions
06:11
Wide & Narrow Transformations
05:22
Understanding Execution Plan
04:57
Different Plans by Driver
02:30

Spark RDD : Transformation & Actions

10 lectures
Map/FlatMap Transformation
04:28
Filter/Intersection
04:00
Union/Distinct Transformation
02:23
GroupByKey / Group People Based on Birth Month
05:53
ReduceByKey / Total Number of Students in Each Subject
06:44
SortByKey / Sort Students Based on Their Roll Number
06:03
MapPartitions / MapPartitionsWithIndex
06:20
Change Number of Partitions
03:34
Join / Join Email Addresses Based on Customer Name
03:06
Spark Actions
06:05

Spark RDD Practice

7 lectures
Upload Files
00:24
Scala Tuples
03:05
Filter Error Logs
10:22
Frequency of word in Text File
08:35
Population of each city
03:53
Orders placed by Customers
09:20
Average Rating of Movies
07:04

Spark Dataframes & Spark SQL

16 lectures
Dataframe Intro
02:16
DataFrame from JSON Files
08:42
Dataframe from Parquet Files
07:26
Dataframe from CSV Files
05:14
Dataframe from Avro File
07:13
Working with XML
03:22
Working with Columns
05:23
Working with String
04:05
Working with Dates
03:47
Dataframe Filter API
02:50
DataFrame API Part1
04:51
DataFrame API Part2
06:25
Spark SQL
01:41
Working with Hive Tables in Spark
02:34
Datasets versus Dataframe
03:28
User-Defined Functions (UDFs)
03:38

Using IntelliJ IDE

6 lectures
IntelliJ Setup
02:24
Project Setup
03:43
Writing Your First Spark Program in the IDE
07:55
Understanding Spark Configuration
07:00
Adding Actions/Transformations
07:55
Understanding Execution Plan
07:43

Running Spark on EMR (AWS Cloud)

5 lectures
EMR Cluster Overview
02:02
Cluster Setup
07:56
Setting Up Spark Code for EMR
06:31
Using spark-submit
05:42
Running Spark on EMR Cluster
04:54

Spark with Cassandra

5 lectures
Cassandra Course
00:08
Creating Spark RDD from Cassandra Table
09:13
Processing Cassandra data in Spark
08:18
Cassandra Rows to Case Class
02:33
Saving Spark RDD to Cassandra
02:58

Getting Started with MongoDB

3 lectures
MongoDB Intro
04:18
MongoDB Use Cases & Limitations
04:18
MongoDB Installation
08:03

CRUD Operations

7 lectures
Find
03:37
Find With Filter
02:09
Insert
04:20
Update
05:55
Update (Continued)
05:30
Projections
02:29
Delete
04:14

Working with Operators

4 lectures
In / Not In Operators
02:39
GTE / LTE Operators
02:16
And / Or Operators
03:03
Regex Operator
02:47

MongoDB Compass

1 lecture
Working with GUI
04:51

Advanced Mongo

2 lectures
Validation/Schema
03:41
Working with Indexes
05:18

Spark with Mongo

1 lecture
Spark Mongo Integration
00:19
