Description

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular data engineering technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.


  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Spark and Hadoop developers are hugely valued at companies with large amounts of data; these are very marketable skills to learn.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end! 

Please note that the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle "big data" is an incredibly valuable skill for today's top tech employers. Don't be left behind - enroll now!


  • "The Ultimate Hands-On Hadoop... was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations." - Aldo Serrano

  • "I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities." - Tyler Buck

What you'll learn

Design distributed systems that manage "big data" using Hadoop and related data engineering technologies.

Use HDFS and MapReduce for storing and analyzing data at scale.

Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.

Analyze relational data using Hive and MySQL

Analyze non-relational data using HBase, Cassandra, and MongoDB

Query data interactively with Drill, Phoenix, and Presto

Choose an appropriate data storage technology for your application

Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.

Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume

Consume streaming data using Spark Streaming, Flink, and Storm

Requirements

  • You will need access to an x86-based PC running 64-bit Windows, macOS, or Linux with an Internet connection and at least 8GB of *free* (not total) RAM if you want to participate in the hands-on activities and exercises. If your PC does not meet these requirements, or you only have an M1-based Mac available, you can still follow along in the course without doing the hands-on activities.
  • Some activities will require some prior programming experience, preferably in Python or Scala.
  • A basic familiarity with the Linux command line will be very helpful.

Course content

12 sections

Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.

9 lectures
Udemy 101: Getting the Most From This Course
02:10
Tips for Using This Course
01:09
If you have trouble downloading Hortonworks Data Platform...
00:29
Warning for Apple M1 users
00:26
Installing Hadoop [Step by Step]
17:44
The Hortonworks and Cloudera Merger, and how it affects this course.
03:01
Hadoop Overview and History
07:44
Overview of the Hadoop Ecosystem
16:46
Important note
00:24

Using Hadoop's Core: HDFS and MapReduce

13 lectures
HDFS: What it is, and how it works
13:53
Alternate MovieLens download location
00:04
Installing the MovieLens Dataset
06:20
[Activity] Install the MovieLens dataset into HDFS using the command line
07:50
MapReduce: What it is, and how it works
10:40
How MapReduce distributes processing
12:57
MapReduce example: Break down movie ratings by rating score
11:35
Notes on MRJob installation
00:55
[Activity] Installing Python, MRJob, and nano
13:19
[Activity] Code up the ratings histogram MapReduce job and run it
07:36
[Exercise] Rank movies by their popularity
07:06
Note: Sorting will only work by partition.
00:24
[Activity] Check your results against mine!
08:23

Programming Hadoop with Pig

7 lectures
Introducing Ambari
09:49
Introducing Pig
06:25
Example: Find the oldest movie with a 5-star rating using Pig
15:07
[Activity] Find old 5-star movies with Pig
09:40
More Pig Latin
07:34
[Exercise] Find the most-rated one-star movie
01:56
Pig Challenge: Compare Your Results to Mine!
05:37

Programming Hadoop with Spark

8 lectures
Why Spark?
10:06
The Resilient Distributed Dataset (RDD)
10:13
[Activity] Find the movie with the lowest average rating - with RDDs
15:33
Datasets and Spark 2.0
06:28
[Activity] Find the movie with the lowest average rating - with DataFrames
10:00
[Activity] Movie recommendations with MLlib
12:44
[Exercise] Filter the lowest-rated movies by number of ratings
02:51
[Activity] Check your results against mine!
06:40

Using relational data stores with Hadoop

10 lectures
What is Hive?
06:31
[Activity] Use Hive to find the most popular movie
10:45
How Hive works
09:10
[Exercise] Use Hive to find the movie with the highest average rating
01:55
Compare your solution to mine.
04:10
Integrating MySQL with Hadoop
08:00
Cheat sheet for the following lecture
00:25
[Activity] Install MySQL and import our movie data
07:46
[Activity] Use Sqoop to import data from MySQL to HDFS/Hive
07:01
[Activity] Use Sqoop to export data from Hadoop to MySQL
07:16

Using non-relational data stores with Hadoop

13 lectures
Why NoSQL?
13:54
What is HBase
12:55
[Activity] Import movie ratings into HBase
13:28
[Activity] Use HBase with Pig to import data at scale.
11:19
Cassandra overview
14:50
If you have trouble installing Cassandra...
00:58
[Activity] Installing Cassandra
10:53
[Activity] Write Spark output into Cassandra
11:00
MongoDB overview
17:19
[Activity] Install MongoDB, and integrate Spark with MongoDB
12:44
[Activity] Using the MongoDB shell
07:48
Choosing a database technology
15:59
[Exercise] Choose a database for a given problem
05:00

Querying your Data Interactively

9 lectures
Overview of Drill
07:55
[Activity] Setting up Drill
10:58
[Activity] Querying across multiple databases with Drill
07:07
Overview of Phoenix
08:55
[Activity] Install Phoenix and query HBase with it
07:02
[Activity] Integrate Phoenix with Pig
11:45
Overview of Presto
06:39
[Activity] Install Presto, and query Hive with it.
12:26
[Activity] Query both Cassandra and Hive using Presto.
09:01

Managing your Cluster

13 lectures
YARN explained
10:01
Tez explained
04:56
[Activity] Use Hive on Tez and measure the performance benefit
08:35
Mesos explained
07:13
ZooKeeper explained
13:10
[Activity] Simulating a failing master with ZooKeeper
06:47
Oozie explained
11:56
[Activity] Set up a simple Oozie workflow
16:54
Zeppelin overview
05:01
[Activity] Use Zeppelin to analyze movie ratings, part 1
12:28
[Activity] Use Zeppelin to analyze movie ratings, part 2
09:46
Hue overview
08:07
Other technologies worth mentioning
04:35

Feeding Data to your Cluster

6 lectures
Kafka explained
09:48
[Activity] Setting up Kafka, and publishing some data.
07:24
[Activity] Publishing web logs with Kafka
10:21
Flume explained
10:16
[Activity] Set up Flume and publish logs with it.
07:46
[Activity] Set up Flume to monitor a directory and store its data in HDFS
09:12

Analyzing Streams of Data

8 lectures
Spark Streaming: Introduction
14:27
[Activity] Analyze web logs published with Flume using Spark Streaming
14:20
[Exercise] Monitor Flume-published logs for errors in real time
02:02
Exercise solution: Aggregating HTTP access codes with Spark Streaming
04:24
Apache Storm: Introduction
09:27
[Activity] Count words with Storm
15:49
Flink: An Overview
06:53
[Activity] Counting words with Flink
10:20

Designing Real-World Systems

7 lectures
The Best of the Rest
09:24
Review: How the pieces fit together
06:29
Understanding your requirements
08:02
Sample application: consume webserver logs and keep track of top-sellers
10:06
Sample application: serving movie recommendations to a website
11:18
[Exercise] Design a system to report web sessions per day
02:52
Exercise solution: Design a system to count daily sessions
04:24

Learning More

2 lectures
Books and online resources
05:32
Bonus Lecture: More courses to explore!
00:51
