Practical Web Scraping for Data Science : Best Practices and Examples with Python / Seppe vanden Broucke, Bart Baesens.

By: Broucke, Seppe vanden [author.].
Contributor(s): Baesens, Bart [author.].
Publisher: Berkeley, CA : Springer Science and Business Media : Apress, ©2018
Description: xvi, 306 pages : illustration ; 25 cm.
Content type: text
Media type: unmediated
Carrier type: volume
ISBN: 9781484235812.
Subject(s): Automatic data collection systems | Data mining | Python (Computer program language)
Genre/Form: Print books.
Contents:
Intro; Table of Contents; About the Authors; About the Technical Reviewer; Introduction; Part I: Web Scraping Basics; Chapter 1: Introduction; 1.1 What Is Web Scraping?; 1.1.1 Why Web Scraping for Data Science?; 1.1.2 Who Is Using Web Scraping?; 1.2 Getting Ready; 1.2.1 Setting Up; 1.2.2 A Quick Python Primer; Chapter 2: The Web Speaks HTTP; 2.1 The Magic of Networking; 2.2 The HyperText Transfer Protocol: HTTP; 2.3 HTTP in Python: The Requests Library; 2.4 Query Strings: URLs with Parameters; Chapter 3: Stirring the HTML and CSS Soup; 3.1 Hypertext Markup Language: HTML.
3.2 Using Your Browser as a Development Tool; 3.3 Cascading Style Sheets: CSS; 3.4 The Beautiful Soup Library; 3.5 More on Beautiful Soup; Part II: Advanced Web Scraping; Chapter 4: Delving Deeper in HTTP; 4.1 Working with Forms and POST Requests; 4.2 Other HTTP Request Methods; 4.3 More on Headers; 4.4 Dealing with Cookies; 4.5 Using Sessions with Requests; 4.6 Binary, JSON, and Other Forms of Content; Chapter 5: Dealing with JavaScript; 5.1 What Is JavaScript?; 5.2 Scraping JavaScript; 5.3 Scraping with Selenium; 5.4 More on Selenium; Chapter 6: From Web Scraping to Web Crawling.
6.1 What Is Web Crawling?; 6.2 Web Crawling in Python; 6.3 Storing Results in a Database; Part III: Managerial Concerns and Best Practices; Chapter 7: Managerial and Legal Concerns; 7.1 The Data Science Process; 7.2 Where Does Web Scraping Fit In?; 7.3 Legal Concerns; Chapter 8: Closing Topics; 8.1 Other Tools; 8.1.1 Alternative Python Libraries; 8.1.2 Scrapy; 8.1.3 Caching; 8.1.4 Proxy Servers; 8.1.5 Scraping in Other Programming Languages; 8.1.6 Command-Line Tools; 8.1.7 Graphical Scraping Tools; 8.2 Best Practices and Tips; Chapter 9: Examples; 9.1 Scraping Hacker News.
9.2 Using the Hacker News API; 9.3 Quotes to Scrape; 9.4 Books to Scrape; 9.5 Scraping GitHub Stars; 9.6 Scraping Mortgage Rates; 9.7 Scraping and Visualizing IMDB Ratings; 9.8 Scraping IATA Airline Information; 9.9 Scraping and Analyzing Web Forum Interactions; 9.10 Collecting and Clustering a Fashion Data Set; 9.11 Sentiment Analysis of Scraped Amazon Reviews; 9.12 Scraping and Analyzing News Articles; 9.13 Scraping and Analyzing a Wikipedia Graph; 9.14 Scraping and Visualizing a Board Members Graph; 9.15 Breaking CAPTCHA's Using Deep Learning; Index.
In: Springer e-books (online collection). Professional and applied computing. 2018
Summary: Including many larger, fully worked-out examples, this book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices.

Includes bibliographical references and index.

Access limited to UNC Chapel Hill-authenticated users. Unlimited simultaneous users.

Including many larger, fully worked out examples, this book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. --

Copyright © 2020 Alfaisal University Library. All Rights Reserved.
Tel: +966 11 2158948 Fax: +966 11 2157910 Email: librarian@alfaisal.edu