Description

Web scraping is simply the automated process of opening a website and grabbing the data you find important on it. It's fundamental to the internet, search engines, data science, automation, machine learning, and much more.

Opening websites and extracting data are only part of what makes web scraping great. The real value is in parsing that data.
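To make the parsing step concrete, here is a minimal sketch that uses only Python's standard library. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are hypothetical illustrations, not material from the course, which may well use a different parsing library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Turn raw HTML into structured (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/products/1">Blue Widget</a></li>
  <li><a href="/products/2">Red Widget</a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The raw page is just a string; it only becomes useful once it is reduced to structured records like these tuples.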

This project will cover:

  • Basic web scraping with Python

  • Web scraping with Selenium

  • Sync vs Async

  • Asynchronous web scraping with asyncio

But why asynchronous code? What is it? How does it benefit us?

Asynchronous code is a way to run multiple functions at nearly the same time. They don't execute at the exact same instant; they run concurrently, so while one task waits on a network response, another can make progress. This means we can do more in less time, and when it comes to mining or scraping data, the time savings are significant.
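The difference can be sketched with asyncio alone. In the example below, `fake_fetch` is a hypothetical stand-in for a real page request (it only sleeps to simulate network latency), and the URLs are placeholders; this is an illustration of the idea, not the course's code.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for a network request: waits, then returns fake HTML."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"<html>{url}</html>"


async def fetch_sequentially(urls):
    """Await each page before starting the next one (synchronous-style)."""
    return [await fake_fetch(url) for url in urls]


async def fetch_concurrently(urls):
    """Start every request up front and wait for all of them together."""
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def timed(coro_fn, urls):
    """Run a coroutine function to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(urls))
    return results, time.perf_counter() - start


# Placeholder URLs; nothing is actually downloaded.
URLS = [f"https://example.com/page/{i}" for i in range(5)]

if __name__ == "__main__":
    _, sequential_time = timed(fetch_sequentially, URLS)
    _, concurrent_time = timed(fetch_concurrently, URLS)
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With five simulated pages, the sequential version waits for each delay in turn, while the concurrent version overlaps the waits, so its total time stays close to a single delay.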


Imagine for a moment you're recreating Google's search engine. You'd have to scrape trillions (if not more) of web pages at regular intervals to keep the search results current. Of course, you're not going to scrape all of those pages at once, but the point is that scraping even 1,000 pages would take a very long time synchronously (for example, with Python requests and/or Selenium alone).


If you've done a lot of web scraping before but never used Python's asyncio, this course will help you better understand the fundamentals and take your scraping game to another level.


Let's get started!


What you'll learn

Requirements

Course content

6 sections

Welcome

3 lectures
Welcome
03:25
Project Demo
10:27
Requirements
00:59

Fundamentals

5 lectures
Sync vs Async
12:57
Blocking & Timeouts
10:25
Scraping with Selenium
09:01
Async Web Scraping with chrome driver and arsenic
15:00
Hide Arsenic logs
01:12

Extraction & Formatting

4 lectures
Async Data with Python Pandas
13:12
Prepare to Scrape Multiple URLs
11:32
Extract Product Data
13:24
Async Product Data Extraction
09:21

Prepare for Re-usability

3 lectures
Modules & Submodules
05:39
Service Specific Submodule
03:25
Decouple Logging & Scraper
05:23

Storing Data

6 lectures
Synchronous SQL Storage with Pandas
07:16
Store Scraped Data to SQL Tables
13:26
Inspect Stored Data in Jupyter
05:34
Scraping URLs from Stored Links Table
16:29
Scrape Paginated List View
13:30
Results & Timing
08:44

Thank you and next steps

1 lecture
Thank you & next steps
02:48
