    Web Scraping Made Easy: Python Example Scripts for Beginners

By Rahul | April 21, 2023 | 3 Mins Read

    Web scraping is the process of extracting data from websites and online sources. It’s a valuable skill for data analysis, data mining, machine learning, and many other fields. Python, with its rich library ecosystem, has become a go-to language for web scraping. In this article, we will cover the basics of web scraping using Python, introducing you to example scripts for beginners.

    Table of Contents

    1. What is Web Scraping?
    2. Why Use Python for Web Scraping?
    3. Python Libraries for Web Scraping
    4. Setting Up Your Environment
    5. Example Script: Extracting Quotes from a Website
    6. Handling Pagination
    7. Exporting Scraped Data
    8. Conclusion

    1. What is Web Scraping?

Web scraping is the automated process of extracting structured data from websites. It involves making HTTP requests to web pages, parsing the HTML content, and extracting the desired information. This technique is commonly used for data analysis, sentiment analysis, price comparison, and more.

    2. Why Use Python for Web Scraping?

    Python is a versatile and beginner-friendly programming language, making it perfect for web scraping. It has a wide range of libraries that simplify the process, allowing users to focus on data extraction rather than dealing with the intricacies of HTTP requests and HTML parsing. Moreover, Python’s readability and maintainability make it an excellent choice for web scraping projects.

    3. Python Libraries for Web Scraping

    Several Python libraries can be used for web scraping, but the two most popular are Beautiful Soup and Requests. Beautiful Soup is a powerful library that makes it easy to parse and navigate HTML content, while Requests is used for making HTTP requests to websites.
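
Before wiring the two libraries together, it helps to see Beautiful Soup in isolation. The following minimal sketch parses a hard-coded HTML fragment (the markup here is invented purely for demonstration):

from bs4 import BeautifulSoup

# A small, invented HTML fragment used purely for illustration
html = '<html><body><h1>Demo</h1><p class="intro">Hello, scraper.</p></body></html>'

soup = BeautifulSoup(html, 'html.parser')
print(soup.h1.text)                          # Demo
print(soup.find('p', class_='intro').text)   # Hello, scraper.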

    4. Setting Up Your Environment

    Before diving into web scraping, ensure you have Python and the necessary libraries installed. You can use pip to install Beautiful Soup and Requests:

    pip install beautifulsoup4 
    pip install requests 
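
To verify that both packages installed correctly, you can import them and print their versions (both packages expose a __version__ attribute):

import requests
import bs4

# Print the installed versions to confirm the setup
print('requests', requests.__version__)
print('beautifulsoup4', bs4.__version__)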
    

    5. Example Script: Extracting Quotes from a Website

    We will use http://quotes.toscrape.com/ as an example website. This website contains quotes from famous authors, and we will extract them using Python. The following script demonstrates how to extract quotes from the first page:

import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
response = requests.get(url)

if response.status_code == 200:
    # Parse the returned HTML with Beautiful Soup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each quote on the page lives in a <div class="quote"> element
    quotes = soup.find_all('div', class_='quote')

    for quote in quotes:
        text = quote.find('span', class_='text').text
        author = quote.find('small', class_='author').text
        print(f'"{text}" - {author}')
else:
    print('Failed to fetch the web page')

    This script uses Requests to fetch the web page and Beautiful Soup to parse the HTML content. It then locates all div elements with the class ‘quote’ and extracts the quote text and author.
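
Beautiful Soup also supports CSS selectors through its select() and select_one() methods. The following sketch is an alternative take on the same task (not part of the original script), using raise_for_status() instead of a manual status-code check:

import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
response = requests.get(url)
response.raise_for_status()  # raises an HTTPError on a failed request

soup = BeautifulSoup(response.text, 'html.parser')

# 'div.quote' is the CSS-selector equivalent of find_all('div', class_='quote')
for quote in soup.select('div.quote'):
    text = quote.select_one('span.text').text
    author = quote.select_one('small.author').text
    print(f'"{text}" - {author}')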

    6. Handling Pagination

    To scrape data from multiple pages, we can modify our script to handle pagination:

import requests
from bs4 import BeautifulSoup

base_url = 'http://quotes.toscrape.com/'
page_number = 1

while True:
    # Build the URL for the current page, e.g. http://quotes.toscrape.com/page/2/
    url = f'{base_url}page/{page_number}/'
    response = requests.get(url)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        quotes = soup.find_all('div', class_='quote')

        # An empty result means we have run past the last page
        if not quotes:
            break

        for quote in quotes:
            text = quote.find('span', class_='text').text
            author = quote.find('small', class_='author').text
            print(f'"{text}" - {author}')

        # Move to the next page only after processing all quotes on this one
        page_number += 1
    else:
        break

This script uses a while loop to navigate through the pages, constructing each page's URL by appending the page number to the base URL. The page number is incremented only after all quotes on the current page have been processed, and the loop ends when a page returns no quotes or a request fails.
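
As an alternative sketch (not part of the original tutorial), the script below follows the site's own "Next" pagination link rather than constructing page numbers, and pauses briefly between requests as a courtesy to the server; the li.next selector reflects the markup quotes.toscrape.com uses for its pagination control:

import time

import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'

while url:
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    for quote in soup.find_all('div', class_='quote'):
        text = quote.find('span', class_='text').text
        author = quote.find('small', class_='author').text
        print(f'"{text}" - {author}')

    # The site renders its pagination control as <li class="next"><a href="/page/2/">
    next_link = soup.select_one('li.next a')
    url = 'http://quotes.toscrape.com' + next_link['href'] if next_link else None

    # Brief pause between requests as a courtesy to the server
    time.sleep(1)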

    7. Exporting Scraped Data

    Once you have extracted the desired data, you can export it to a file, such as a CSV or JSON, for further processing or analysis. The following code demonstrates how to export the scraped quotes to a CSV file:

import requests
from bs4 import BeautifulSoup
import csv

base_url = 'http://quotes.toscrape.com/'
page_number = 1
quote_list = []

while True:
    url = f'{base_url}page/{page_number}/'
    response = requests.get(url)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        quotes = soup.find_all('div', class_='quote')

        # Stop once a page no longer contains any quotes
        if not quotes:
            break

        # Collect each quote as a dictionary for easy CSV writing later
        for quote in quotes:
            text = quote.find('span', class_='text').text
            author = quote.find('small', class_='author').text
            quote_list.append({'quote': text, 'author': author})

        page_number += 1
    else:
        break

# Write the collected quotes to a CSV file with a header row
with open('quotes.csv', mode='w', newline='', encoding='utf-8') as file:
    fieldnames = ['quote', 'author']
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    writer.writeheader()
    for quote_data in quote_list:
        writer.writerow(quote_data)

print("Quotes have been exported to quotes.csv")

    This modified script appends each quote and author to a list, then exports the list to a CSV file using Python’s built-in CSV module.
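
The section also mentions JSON as an export format. A minimal sketch using Python's built-in json module on the same quote_list looks like this:

import json

# quote_list is the same list of {'quote': ..., 'author': ...} dictionaries built above
with open('quotes.json', mode='w', encoding='utf-8') as file:
    json.dump(quote_list, file, ensure_ascii=False, indent=2)

print("Quotes have been exported to quotes.json")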

8. Conclusion

    Web scraping with Python is a powerful and accessible technique for beginners to extract data from websites. In this article, we have demonstrated how to use Python’s Requests and Beautiful Soup libraries to fetch and parse web pages, handle pagination, and export the extracted data to a file. With these foundational skills, you can now apply web scraping to various projects and unlock valuable insights from online sources.
