Description

  • In many data centers, different types of servers generate large amounts of data in real time. In this case, each event is the status of a server in the data center.

  • This data needs to be processed in real time to generate insights for the people who monitor the servers and the data center. They track server status regularly and resolve issues as they occur, which improves server stability.

  • Since the data is huge and arrives in real time, we need to choose an architecture with scalable storage and computation frameworks/technologies.

  • Hence we build a real-time data pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker to generate insights from this data.

  • The Spark project/data pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster, which runs on Docker.

  • Data visualization is built using the Django web framework and Flexmonster.

  • Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.

    Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

    Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

    A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.
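To make the idea of a server-status event concrete, here is a minimal sketch of an event simulator using only the Python standard library. The field names (`event_id`, `server_id`, `server_type`, `status`, `event_time`) and the status values are assumptions chosen for illustration; the course's own simulator may use a different schema. The sketch only builds JSON messages and counts them by status. Sending them to Kafka would need a client library and is left out.

```python
import json
import random
from collections import Counter
from datetime import datetime, timezone

# Hypothetical values for illustration; not taken from the course material.
SERVER_TYPES = ["web", "database", "cache"]
STATUSES = ["RUNNING", "WARNING", "DOWN"]


def make_event(rng, event_id):
    """Build one server-status event as a plain dict."""
    return {
        "event_id": event_id,
        "server_id": f"server-{rng.randint(1, 5)}",
        "server_type": rng.choice(SERVER_TYPES),
        "status": rng.choice(STATUSES),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def simulate(count, seed=42):
    """Return `count` events serialized as JSON strings."""
    rng = random.Random(seed)
    return [json.dumps(make_event(rng, i)) for i in range(count)]


def count_by_status(messages):
    """Parse the JSON messages and count events per status value."""
    return Counter(json.loads(message)["status"] for message in messages)


if __name__ == "__main__":
    messages = simulate(10)
    print(messages[0])                       # one raw event, as it would be published
    print(dict(count_by_status(messages)))   # a simple per-status summary
```

In a pipeline like the one described above, each JSON string would be published to a Kafka topic, and a per-status count of this kind is the sort of aggregate a streaming job could compute and store for a dashboard.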

What you will learn

Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker

Setting up Single Node Hadoop and Spark Cluster on Docker

Features of Spark Structured Streaming using Spark with Scala

Features of Spark Structured Streaming using Spark with Python (PySpark)

How to use PostgreSQL with Spark Structured Streaming

Basic understanding of Apache Kafka

How to build Data Visualisation using Django Web Framework and Flexmonster

Fundamentals of Docker and Containerization

Requirements

  • Basic understanding of a programming language
  • Basic understanding of Apache Hadoop
  • Basic understanding of Apache Spark

Course content

5 sections

Introduction

2 lectures
Introduction to Apache Spark
32:27
Real Time Spark Project Overview | Building End to End Streaming Data Pipeline
08:40

Environment Setup

6 lectures
Setting up Docker Environment
09:54
Create Single Node Kafka Cluster on Docker
08:15
Create Single Node Apache Hadoop and Spark Cluster on Docker
35:06
Setting up IntelliJ IDEA Community Edition(IDE)
21:00
Setting up PyCharm Community Edition(IDE)
16:40
Setting up Django Web Framework
07:09

Development | Project Code Walk-through

5 lectures
Event Simulator using Python(Server Status Detail)
19:15
Building Streaming Data Pipeline using Scala | Spark Structured Streaming
30:57
Building Streaming Data Pipeline using PySpark | Spark Structured Streaming
28:53
Setting up PostgreSQL Database(Events Database)
04:55
Building Dashboard using Django Web Framework and Flexmonster | Visualization
22:20

Complete Project Demo

2 lectures
Real Time Spark Project Demo
14:31
Running Real Time Streaming Data Pipeline using Spark Cluster On Docker
10:11

Docker Beginners Guide

9 lectures
Introduction to Docker
11:37
Install Docker on Ubuntu 18.04
09:56
Docker Commands | Commonly Used
10:33
Create First Docker Image and Container
09:48
Create MySQL Docker Container
10:58
Cassandra on Docker Container
09:04
MongoDB on Docker Container
08:00
Setting up Docker Compose
18:34
How to create Docker Volume
35:25
