Description

Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or to a database.

In this course, you will learn how to perform web scraping using Python 3 and Beautiful Soup, a free open-source library written in Python for parsing HTML.

We will use lxml, an extensive library that parses XML and HTML documents very quickly and can even handle malformed tags. We will also use the Requests module instead of the built-in urllib module (urllib2 in Python 2) because of its improvements in speed and readability.
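As a rough illustration of the parsing step described above, the sketch below builds a Beautiful Soup parse tree from an inline HTML string. The sample markup and the helper name `make_soup` are assumptions made for this example and are not taken from the course files. The helper prefers the lxml parser and falls back to Python's built-in `html.parser` when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample page; in practice the HTML would come from a web request.
SAMPLE_HTML = """
<html>
  <head><title>Sample Store</title></head>
  <body>
    <h1>Products</h1>
    <ul>
      <li class="product">Laptop</li>
      <li class="product">Phone</li>
    </ul>
  </body>
</html>
"""


def make_soup(html):
    """Parse HTML with lxml if it is available, else with the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python.
        return BeautifulSoup(html, "html.parser")


soup = make_soup(SAMPLE_HTML)

# Read the page title and the text of every <li class="product"> element.
page_title = soup.title.string
products = [li.get_text(strip=True) for li in soup.find_all("li", class_="product")]

print(page_title)
print(products)
```

Falling back to `html.parser` keeps the example runnable on a fresh Python install, at the cost of the speed that lxml offers.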

Finally, we will use Selenium alongside Beautiful Soup to crawl AJAX & JavaScript driven pages.
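One possible way to combine the two tools is sketched below: Selenium loads the page in a real browser so that JavaScript can run, and Beautiful Soup then parses the rendered HTML. The function names `extract_links` and `scrape_rendered_links` are hypothetical, and the sketch assumes Chrome and a matching driver are available on the machine.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def scrape_rendered_links(url, timeout=10):
    """Load a page in Chrome, wait for a link to appear, then parse it."""
    # Imported here so extract_links() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one <a> element exists in the rendered DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "a"))
        )
        # page_source holds the HTML after JavaScript has modified the page.
        return extract_links(driver.page_source)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `scrape_rendered_links("https://example.com")` would open a browser window; `extract_links` can be tried on any HTML string without one.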

The course covers the following topics: accessing web pages programmatically; parsing web pages with Beautiful Soup to extract the required data; interacting with web pages programmatically; and using Selenium for web scraping, including when it is needed.

By the end of this course, you will understand how websites and servers function, diverse data extraction techniques, and methods of handling and organizing data.

This Web Scraping course covers the following topics:

  • Review of data structures (Lists, Dictionaries, Tuples, File Handling)
  • How websites are hosted on servers
  • Calls to the server (GET, POST methods)
  • Review of HTML and CSS
  • Requests Module and BeautifulSoup Module overview
  • Parsing HTML using BeautifulSoup
  • Filtering elements using BeautifulSoup and navigating the Parse Tree
  • JavaScript and AJAX overview
  • Selenium and the need for it
  • Selecting elements using Selenium 
  • CSS selectors 
  • XPath selectors 
  • Navigating pages using Selenium 
  • Practical Projects



What you will learn

Python Refresher: Review of Data Structures, Conditionals, File Handling

How Websites are Hosted on Servers; Basic Calls to Server (GET, POST Methods)

Web Scraping with Python Beautiful Soup and Requests

Using Selenium to handle JavaScript and AJAX

Diverse Web Scraping Exercises

Source code (*.py files) for all exercises can be downloaded

Q&A board to send your questions and get them answered quickly

Requirements

  • Some prior programming experience in Python (e.g. Data Structures and OOP) will help. The course includes a full Python refresher section.
  • Complete beginners may wish to take a beginner Python course first, and then transition to this course afterwards.
  • This course adopts a step-by-step approach and requires you to open a Python editor, download available *.py code files, and start applying the provided examples and exercises.
  • Python 3: The code in this course is tested on Python 3. It is up to you to adapt it if you want to run it in Python 2.

Course content

19 sections

Web Scraping Course Overview

1 lecture
Web Scraping Course Overview
04:22

Python Refresher: Data Structures (Optional)

11 lectures
Lists
06:39
Dictionaries
09:11
Tuples
07:56
List Comprehensions - Part 1
07:06
List Comprehensions - Part 2
16:37
Inline - if else and List Comprehensions
03:35
Installing xlrd and XlsxWriter to Read/Write to Excel Files
00:19
Writing to Excel Files
10:02
Reading from Excel Files
05:09
Python Editor & Other Software
01:07
Exercise #1: YOU: Web Scraping Expert
00:18

How Servers Work

2 lectures
How Websites are Hosted
01:32
HTML Revision
02:22

BeautifulSoup Warm-up Exercise

2 lectures
BeautifulSoup Solved Exercise
01:38
Simple Scraper
1 question

Installing Required Python Packages

1 lecture
Installing Required Python Packages
03:19

Introduction to Requests Python Library

3 lectures
Requests Get Method
08:50
User Agent
06:55
Installing fake_useragent Package
00:11
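The lectures in this section deal with GET requests and the User-Agent header. A minimal sketch of both is shown below; the URL, the query parameter, and the User-Agent string are placeholder assumptions, and a fixed string stands in for whatever the fake_useragent package would supply. The request is prepared but not sent, so the example runs offline.

```python
import requests

# Hypothetical values for illustration only.
URL = "https://example.com/search"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; course-example/1.0)"}

# Build the GET request without sending it, so the final URL and headers
# can be inspected offline.
prepared = requests.Request(
    "GET", URL, params={"q": "python"}, headers=HEADERS
).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])


def fetch(url, timeout=10):
    """Send a GET request with the custom User-Agent and return the page text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text
```

Some servers respond differently to the default `python-requests` User-Agent, which is one reason a scraper may set its own.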

Introduction to Beautiful Soup Python Library

4 lectures
Web Scraping with Beautiful Soup - Overview
04:44
Web Scraping with Beautiful Soup - Overview P.2
02:56
Accessing Tags
07:08
Navigable Strings
03:30

Navigating with Beautiful Soup - Going Down

3 lectures
Navigating through Tag Names
03:33
Contents and Children Methods
07:02
Descendants Method
08:19

Navigating with Beautiful Soup - Going Up

2 lectures
Parent Method
07:02
Parents Method
04:40

Navigating with Beautiful Soup - Going Sideways

3 lectures
next_sibling
04:46
previous_sibling
03:52
next_siblings & previous_siblings
04:38
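The three navigation sections above (going down, up, and sideways) can be sketched on one small parse tree. The markup below is an assumption made for this example; it is written without whitespace between tags so that each sibling is a tag rather than a newline string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no whitespace between tags.
html = '<div id="box"><p id="a">one</p><p id="b">two</p><p id="c">three</p></div>'
soup = BeautifulSoup(html, "html.parser")

box = soup.find("div", id="box")
middle = soup.find("p", id="b")

# Going down: direct children versus all descendants (tags and strings).
child_ids = [child["id"] for child in box.children]
descendant_count = len(list(box.descendants))

# Going up: the immediate parent, then every ancestor in turn.
parent_id = middle.parent["id"]
ancestor_names = [parent.name for parent in middle.parents]

# Going sideways: neighbours that share the same parent.
next_id = middle.next_sibling["id"]
previous_id = middle.previous_sibling["id"]
following_ids = [sibling["id"] for sibling in middle.next_siblings]

print(child_ids, descendant_count)
print(parent_id, ancestor_names)
print(previous_id, next_id, following_ids)
```

On real pages the markup usually contains newlines between tags, so `next_sibling` often returns a whitespace string first; that is worth checking before indexing into the result.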

Regular Expressions with Python

7 lectures
Metacharacters Overview
07:50
Compile Function and Character Class
09:32
Special Sequences
02:02
* Repeating Things
04:04
+ Repeating Things
02:24
? and {m,n} Repeating Things
05:41
Metacharacters Part 2
01:52
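As a small illustration of the topics listed in this section (`compile`, character classes, special sequences, and the `*`, `+`, `?`, and `{m}` repetition operators), the sketch below pulls prices and product names out of a string. The sample text and pattern names are assumptions made for this example.

```python
import re

# Hypothetical listing text for illustration.
text = "Laptop $999, Phone $1,299.50, Cable $5"

# Pattern pieces:
#   \$             a literal dollar sign (escaped metacharacter)
#   \d+            one or more digits
#   (?:,\d{3})*    zero or more groups of a comma and exactly three digits
#   (?:\.\d{2})?   an optional decimal point followed by exactly two digits
price_pattern = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")
prices = price_pattern.findall(text)

# Character classes: one uppercase letter followed by one or more lowercase letters.
name_pattern = re.compile(r"[A-Z][a-z]+")
names = name_pattern.findall(text)

print(prices)
print(names)
```

Compiling a pattern once and reusing the compiled object is convenient when the same expression is applied to many scraped strings.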

Searching the Parse Tree Using Beautiful Soup

5 lectures
Introduction to Searching with BeautifulSoup
09:48
find_all Function
08:01
find_all More Parameters
09:47
find Function
03:07
Craigslist Scraper - Level 1
1 question
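The searching functions named in this section can be sketched as follows. The HTML snippet is a made-up example and is not taken from the Craigslist exercise; it shows `find_all` with a class filter, the `limit` parameter, a regular-expression attribute filter, and `find`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<ul>
  <li class="item featured"><a href="/post/1">First post</a></li>
  <li class="item"><a href="/post/2">Second post</a></li>
  <li class="ad"><a href="https://ads.example.com/x">Sponsored</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns every match; class_ matches any one of a tag's classes.
items = soup.find_all("li", class_="item")

# limit stops the search after the given number of matches.
first_two_links = soup.find_all("a", limit=2)

# Attribute values can be filtered with a compiled regular expression.
post_links = soup.find_all("a", href=re.compile(r"^/post/\d+$"))
post_hrefs = [a["href"] for a in post_links]

# find returns only the first match, or None when nothing matches.
first_ad = soup.find("li", class_="ad")
missing = soup.find("table")

print(len(items), len(first_two_links), post_hrefs)
print(first_ad.get_text(strip=True), missing)
```

Because `find` can return `None`, it is usually safer to check the result before reading attributes or text from it.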

Project 1: Scraping CustomerReports Website

2 lectures
Web Scraping CustomerReports - part 1
11:11
Web Scraping CustomerReports - part 2
08:32

Project 2: Web Scraping CodingBat Website with Beautiful Soup

4 lectures
Project 2 Description
04:13
Web Scraping CodingBat - part 1
10:58
Web Scraping CodingBat - part 2
21:16
Web Scraping CodingBat - part 3
14:55

Using Selenium to Handle AJAX & JavaScript Driven Web Pages

8 lectures
JavaScript, AJAX and Selenium intro
09:29
Installing Selenium
00:13
Installing ChromeDriver
00:26
Introduction to Selenium
07:24
Searching Elements and Inputting Data
06:37
Clicking Elements
03:58
XPath Introduction
14:39
XPath Examples
08:48

Project 3: Web Scraping Your Instagram Account

13 lectures
Project 3 Description
03:43
Logging in to Instagram
15:37
Settings Tab!
05:11
Opening Target Profile (NEW)
07:34
Scrolling Down v.1 (NEW)
13:34
Scrolling Down v.2 (NEW)
06:30
Exception Handling (NEW)
10:19
Making Folders (NEW)
05:10
Downloading Images v.1 (NEW)
05:16
Downloading Images v.2 (NEW)
12:45
Downloading Captions (NEW)
17:01
Writing Captions to Excel File (NEW)
16:17
Instagram Final Code - Updated: 2018-06-22
02:15

Web Scraping Best Practices

1 lecture
Web Scraping Best Practices
06:38

Bonus: Data Extraction with APIs

1 lecture
Data Extraction with APIs (Free Tutorial)
00:17

Bonus: Scrapy: Powerful Web Scraping and Crawling Framework in Python

1 lecture
Coupon for "Scrapy: Powerful Web Scraping & Crawling with Python" Course
01:38
